* Problem with function select on kernel 2.6.29.6-rt23
@ 2009-09-10 10:47 Jürgen Mell
2009-09-10 11:33 ` Sujit K M
2009-09-18 19:40 ` Jürgen Mell
0 siblings, 2 replies; 8+ messages in thread
From: Jürgen Mell @ 2009-09-10 10:47 UTC (permalink / raw)
To: linux-rt-users
Hi,
I have an application which connects via a network socket to a server
running on the same machine (IP 127.0.0.1). This application uses the
function 'select' to wait for new data from the server or until a
two-second timeout expires. This works well until there is network
traffic on the external network interfaces (eth* or WLAN). When there is
network traffic on the external interfaces, select no longer waits but
returns with a return code of zero, indicating no data available on the
socket. This happens nearly immediately (after 8 to 9 microseconds) and
not after the specified two-second interval. The timeout parameter of
select is updated accordingly (it shows e.g. 1 s 999991 us).
Up to now I could not test this with another kernel but I will try to do
it this afternoon. Are there any known problems with select? Is there
any way to circumvent this?
Any help would be greatly appreciated!
Jürgen
--
Jürgen Mell (Software-Entwicklung) mell@hedrich-winders.com
Tel.: +49-511-762-18226 http://www.hedrich-winding.com
FAX : +49-511-762-18225
Mobil: +49-160-7428156
----------------------------------------------------------------------------
HEDRICH winding systems GmbH
An der Universität 2 (im PZH)
D-30823 Garbsen (GERMANY)
----------------------------------------------------------------------------
Geschäftsführer: Karsten Adam
Handelsregister: Wetzlar, HRB 4768
Steuernr.: 020/235/20110 USt-IdNr.: DE 258258279
----------------------------------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
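A minimal sketch of how the symptom reported above could be measured: it times a single select() call on an idle socket against CLOCK_MONOTONIC and reports the return code, the time actually spent blocked, and the timeout value left behind by the call. The names probe_select and probe_idle_socket are hypothetical, and a local socketpair() stands in for the TCP connection to 127.0.0.1; neither is taken from the original application.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Result of one measured select() call (hypothetical helper type). */
struct select_probe {
    int ret;              /* return value of select() */
    long long elapsed_us; /* time spent blocked, measured on CLOCK_MONOTONIC */
    struct timeval left;  /* timeout value as left behind by select() */
};

/* Wait at most timeout_ms for fd to become readable and measure how
 * long the call really blocked. */
static struct select_probe probe_select(int fd, long timeout_ms)
{
    struct select_probe p;
    struct timespec t0, t1;
    fd_set rfds;

    assert(timeout_ms >= 0);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    p.left.tv_sec = timeout_ms / 1000;
    p.left.tv_usec = (timeout_ms % 1000) * 1000;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    p.ret = select(fd + 1, &rfds, NULL, NULL, &p.left);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    p.elapsed_us = ((long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                    + (t1.tv_nsec - t0.tv_nsec)) / 1000;
    return p;
}

/* Create an idle local socket pair, probe one end and print the result.
 * Returns the elapsed time in microseconds, or -1 on error. */
static long long probe_idle_socket(long timeout_ms)
{
    int sv[2];
    struct select_probe p;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    p = probe_select(sv[0], timeout_ms);
    printf("ret=%d elapsed=%lld us left=%ld s %ld us\n", p.ret,
           p.elapsed_us, (long)p.left.tv_sec, (long)p.left.tv_usec);
    close(sv[0]);
    close(sv[1]);
    return p.ret < 0 ? -1 : p.elapsed_us;
}
```

On a system without the problem, a call such as probe_idle_socket(2000) would be expected to print ret=0 with an elapsed time of about two seconds. The behaviour reported in this thread would instead show ret=0 after only a few microseconds.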
* Re: Problem with function select on kernel 2.6.29.6-rt23
2009-09-10 10:47 Problem with function select on kernel 2.6.29.6-rt23 Jürgen Mell
@ 2009-09-10 11:33 ` Sujit K M
2009-09-10 11:50 ` Jürgen Mell
2009-09-18 19:40 ` Jürgen Mell
1 sibling, 1 reply; 8+ messages in thread
From: Sujit K M @ 2009-09-10 11:33 UTC (permalink / raw)
To: mell, linux-rt-users
Could you check what sort of load your server machine is able to
take? The site to look at is http://www.petefreitag.com/item/689.cfm.
I think the network may be rejecting the select call because the socket
is not getting created. Otherwise it would wait for the 2-second limit
you set.
Also, are you sure about the bind part of the socket creation? Making
it threaded is an option.

Thanks,
Sujit
--
-- Sujit K M
blog(http://kmsujit.blogspot.com/)
* Re: Problem with function select on kernel 2.6.29.6-rt23
2009-09-10 11:33 ` Sujit K M
@ 2009-09-10 11:50 ` Jürgen Mell
0 siblings, 0 replies; 8+ messages in thread
From: Jürgen Mell @ 2009-09-10 11:50 UTC (permalink / raw)
To: Sujit K M, linux-rt-users
Thanks for your reply!

The machine is not heavily loaded (CPU load below 30%). ANY network
traffic, even traffic that is not directed to the server process on the
machine, will cause the problem, e.g. a copy operation which copies the
Linux kernel source from an SMB server to the local disk.

The application program has established the socket connection. 'bind'
has completed without problems. It then runs in a loop where it calls
'select' to wait for new data, reads the data if any is available,
processes it and then repeats these steps.

Bye,
Jürgen
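The loop described above could be hardened against an early zero return roughly as follows. This is only a sketch under the assumption that CLOCK_MONOTONIC keeps advancing correctly on the affected system, which the thread does not confirm. The names mono_ns, wait_readable and demo_wait_readable are hypothetical, and a socketpair() again stands in for the real connection.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Current CLOCK_MONOTONIC reading in nanoseconds. */
static long long mono_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Wait until fd is readable or until timeout_ms has really elapsed on
 * CLOCK_MONOTONIC. If select() returns 0 (or is interrupted) before the
 * deadline, it is called again with the remaining time.
 * Returns 1 if readable, 0 on a genuine timeout, -1 on error. */
static int wait_readable(int fd, long timeout_ms)
{
    long long deadline;

    assert(timeout_ms >= 0);
    deadline = mono_ns() + (long long)timeout_ms * 1000000LL;

    for (;;) {
        long long remaining = deadline - mono_ns();
        struct timeval tv;
        fd_set rfds;
        int ret;

        if (remaining <= 0)
            return 0;                 /* deadline reached */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        /* Sub-microsecond remainders are truncated to zero, so the last
         * pass may poll once or twice before the deadline check fires. */
        tv.tv_sec = (time_t)(remaining / 1000000000LL);
        tv.tv_usec = (long)((remaining % 1000000000LL) / 1000);

        ret = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (ret > 0)
            return 1;                 /* data available */
        if (ret < 0 && errno != EINTR)
            return -1;                /* real error */
    }
}

/* Exercise wait_readable(): first on an idle socket, then with one
 * byte pending. Returns 0 if both calls behaved as expected. */
static int demo_wait_readable(void)
{
    int sv[2], first, second;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    first = wait_readable(sv[0], 50);     /* nothing sent yet */
    if (write(sv[1], "x", 1) != 1)
        first = -1;
    second = wait_readable(sv[0], 50);    /* one byte pending */
    close(sv[0]);
    close(sv[1]);
    return (first == 0 && second == 1) ? 0 : -1;
}
```

The design choice here is to trust an independent clock reading rather than the return value of select alone. This works around the symptom in the application; it does not address whatever causes the early return in the kernel.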
* Re: Problem with function select on kernel 2.6.29.6-rt23
2009-09-10 10:47 Problem with function select on kernel 2.6.29.6-rt23 Jürgen Mell
2009-09-10 11:33 ` Sujit K M
@ 2009-09-18 19:40 ` Jürgen Mell
2009-09-20 10:20 ` Sujit K M
1 sibling, 1 reply; 8+ messages in thread
From: Jürgen Mell @ 2009-09-18 19:40 UTC (permalink / raw)
To: linux-rt-users
Meanwhile I have dug a little deeper into this problem. The problem
occurs under the following conditions:
- the BIOS clock must be slow
- the NTP daemon is used to adjust the system time
The problem can be reproduced on real hardware as well as on a virtual
machine running under VMware. Set the BIOS clock back about ten minutes
against the 'real' time. Then start the NTP daemon and run this
little test program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <errno.h>
    #include <sys/select.h>

    int main(int argc, char *argv[])
    {
        time_t t;
        struct timeval timeout;
        int i;
        int ret;

        t = time (NULL);
        printf ("Current time before = %s", ctime (&t));

        /* 20 sleeps of one second each: about 20 s are expected */
        for (i = 0; i < 20; i++)
        {
            /* re-arm the timeout on every pass, select may modify it */
            timeout.tv_sec = 1;
            timeout.tv_usec = 0;

            if ((ret = select (FD_SETSIZE, NULL, NULL, NULL, &timeout)) < 0)
            {
                printf ("select returned %d, errno = %d\n", ret, errno);
                return EXIT_FAILURE;
            }
        }
        t = time (NULL);
        printf ("Current time after = %s", ctime (&t));
        return EXIT_SUCCESS;
    }

On a virtual machine under VMware I got the following result after some
minutes of system run time:

    hws@cwc-vmware:/home/hws > /space/software/select_test/debug/src/select_test
    Current time before = Fri Sep 18 20:05:51 2009
    Current time after = Fri Sep 18 20:06:11 2009
    hws@cwc-vmware:/home/hws > /space/software/select_test/debug/src/select_test
    Current time before = Fri Sep 18 20:14:29 2009
    Current time after = Fri Sep 18 20:14:33 2009
    hws@cwc-vmware:/home/hws > /space/software/select_test/debug/src/select_test
    Current time before = Fri Sep 18 20:14:57 2009
    Current time after = Fri Sep 18 20:14:57 2009
    hws@cwc-vmware:/home/hws > /space/software/select_test/debug/src/select_test
    Current time before = Fri Sep 18 20:15:20 2009
    Current time after = Fri Sep 18 20:15:40 2009
    hws@cwc-vmware:/home/hws >

Normally, the time difference between 'before' and 'after' should be 20
seconds, as in the first and last run of the program. For the second run
the time difference is only 4 seconds and for the third run it is even
zero.

On the real hardware I also see some other time-related issues when the
problem occurs. Keyboard input will often 'bounce' (key presses are
detected two or more times) and some delay times are prolonged (!). I
could not yet reproduce this in the virtual machine.

The problem does not always occur immediately after the system is
started; it may take several minutes until the first effects appear. I
have not tested this issue with other kernels yet but I will do so
during the weekend.

Are there any ideas what to do about this (besides buying a better BIOS
clock)? I would really like to have the NTP daemon running to keep the
system time accurate, but somehow it seems to affect wait queues in the
kernel pretty badly.

Bye,
Jürgen
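The test program above measures elapsed time with time(), i.e. with the same wall clock that the NTP daemon is adjusting. A variant that records both CLOCK_REALTIME and CLOCK_MONOTONIC around the same loop might help to tell an early return of select apart from a change of the wall clock. The following is only a sketch of that idea; run_select_sleeps and diff_s are hypothetical names, and it has not been run on the affected kernel.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <sys/select.h>

/* Difference b - a in seconds. */
static double diff_s(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec)
         + (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Sleep count times for usec_each microseconds via select() and report
 * the elapsed time on the wall clock and on the monotonic clock.
 * Returns 0 on success, -1 if select() failed. */
static int run_select_sleeps(int count, long usec_each,
                             double *wall_s, double *mono_s)
{
    struct timespec w0, w1, m0, m1;
    struct timeval timeout;
    int i;

    assert(count >= 0 && usec_each >= 0);
    clock_gettime(CLOCK_REALTIME, &w0);
    clock_gettime(CLOCK_MONOTONIC, &m0);

    for (i = 0; i < count; i++) {
        /* re-arm the timeout on every pass, select may modify it */
        timeout.tv_sec = usec_each / 1000000L;
        timeout.tv_usec = usec_each % 1000000L;
        if (select(0, NULL, NULL, NULL, &timeout) < 0)
            return -1;
    }

    clock_gettime(CLOCK_REALTIME, &w1);
    clock_gettime(CLOCK_MONOTONIC, &m1);
    *wall_s = diff_s(&w0, &w1);
    *mono_s = diff_s(&m0, &m1);
    printf("wall clock: %.3f s, monotonic: %.3f s\n", *wall_s, *mono_s);
    return 0;
}
```

Calling run_select_sleeps(20, 1000000, &w, &m) corresponds to the 20 one-second sleeps of the original program. If only the wall-clock figure deviates from 20 s, the clock adjustment itself would be the more likely explanation; if the monotonic figure is short as well, select really returned early.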
* Re: Problem with function select on kernel 2.6.29.6-rt23
2009-09-18 19:40 ` Jürgen Mell
@ 2009-09-20 10:20 ` Sujit K M
2009-09-21 9:23 ` Sujit K M
0 siblings, 1 reply; 8+ messages in thread
From: Sujit K M @ 2009-09-20 10:20 UTC (permalink / raw)
To: mell; +Cc: linux-rt-users
Hi,

One thing I would like you to check at the outset is what happens to the
program when the loop count is raised to something like 1,000, 10,000,
100,000, 1 million or 10 million. Does the measured time increase with
it? Try plotting the difference against the actual start time, using a
scripting language such as Tcl/Tk.

There is some information about the select system call which I think
pertains to this problem: http://linux.die.net/man/2/syscalls.
Basically, it is an optimization that current kernels look into.

Thanks,
Sujit
* Re: Problem with function select on kernel 2.6.29.6-rt23
2009-09-20 10:20 ` Sujit K M
@ 2009-09-21 9:23 ` Sujit K M
2009-09-21 9:58 ` Jürgen Mell
0 siblings, 1 reply; 8+ messages in thread
From: Sujit K M @ 2009-09-21 9:23 UTC (permalink / raw)
To: mell; +Cc: linux-rt-users
This seems to be normal functionality.

As quoted from http://linux.die.net/man/2/select:

(ii)
select() may update the timeout argument to indicate how much time was
left. pselect() does not change this argument.
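The difference quoted from the man page can be illustrated with a small sketch: select() receives a writable struct timeval, which on Linux is updated with the time not slept, while pselect() receives a const struct timespec that is left untouched. The names sleep_select, sleep_pselect and demo_timeout_update are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <sys/select.h>

/* Sleep via select(). *tv may be modified by the call, so a caller that
 * loops has to re-initialise it before every call. */
static int sleep_select(struct timeval *tv)
{
    assert(tv != NULL);
    return select(0, NULL, NULL, NULL, tv);
}

/* Sleep via pselect(). The timeout is const and is not changed. */
static int sleep_pselect(const struct timespec *ts)
{
    assert(ts != NULL);
    return pselect(0, NULL, NULL, NULL, ts, NULL);
}

/* Sleep 20 ms with each call and print what is left in the timeout
 * structures afterwards. Returns 0 if the pselect() timeout is
 * unchanged, 1 if it was changed, -1 if a call failed. */
static int demo_timeout_update(void)
{
    struct timeval tv = { 0, 20000 };       /* 20 ms */
    struct timespec ts = { 0, 20000000L };  /* 20 ms */

    if (sleep_select(&tv) < 0 || sleep_pselect(&ts) < 0)
        return -1;
    printf("select left  %ld s %ld us\n", (long)tv.tv_sec, (long)tv.tv_usec);
    printf("pselect left %ld s %ld ns\n", (long)ts.tv_sec, (long)ts.tv_nsec);
    return (ts.tv_sec == 0 && ts.tv_nsec == 20000000L) ? 0 : 1;
}
```

Note that the updated timeout only reports how much time was left when select returned. It does not by itself explain why a call with no ready descriptors and no signal would return before the timeout has expired.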
* Re: Problem with function select on kernel 2.6.29.6-rt23 2009-09-21 9:23 ` Sujit K M @ 2009-09-21 9:58 ` Jürgen Mell 2009-09-21 10:34 ` Jürgen Mell 0 siblings, 1 reply; 8+ messages in thread From: Jürgen Mell @ 2009-09-21 9:58 UTC (permalink / raw) To: Sujit K M; +Cc: linux-rt-users No, I do not think that this is intentional. Some lines later, you will find "Some code calls *select*() with all three sets empty, /n/ zero, and a non-NULL /timeout/ as a fairly portable way to sleep with subsecond precision." This cannot make any sense, if I have to call select several times to get the full delay period. The overhead for calling the function several times is significant. I have modified the test program according to your proposal to run the loop 2000 times with 10000 us delay and get - depending on the speed of the computer - times between 22 and 24 seconds total. I understand that the timeout argument of select is updated when select returns after one of the monitored file descriptors is ready for the selected operation. I have tested this issue now with the kernel 2.6.31-rt11 and got a new problem: this time select does not abort prematurely any more but now each second of computer time is about three seconds in reality (the computer clock is extremely slow). NTP is running. Somehow fiddling with NTP causes very strange side effects... Bye, Jürgen Sujit K M schrieb: > this seems to be normal functionality. > > As quoted from > > http://linux.die.net/man/2/select > > (ii) > select() may update the timeout argument to indicate how much time was > left. pselect() does not change this argument. > > > > On Sun, Sep 20, 2009 at 3:50 PM, Sujit K M <sjt.kar@gmail.com> wrote: > >> Hi, >> >> One thing at the onset I would like you to check is that what happens >> to the program when the loop >> count is made more like 1000/10,000/100000 - 1 Million/10 Million. >> Does the Time Graph Increase. >> Try Plotting the Difference with actual time start. 
Try Making Use of
>> Some scripting language like TCL/TK.
>>
>> There is some info regarding the select system call. I think it is
>> pertaining to this problem.
>> http://linux.die.net/man/2/syscalls. Basically It is an Optimization
>> that the Current Kernels Look Into.
>>
>> Thanks,
>> Sujit
>>
>> On Sat, Sep 19, 2009 at 1:10 AM, Jürgen Mell <mell@hedrich-winders.com> wrote:
>>
>>> Meanwhile I have dug a little deeper into this problem. The problem
>>> occurs under the following conditions:
>>> - the BIOS clock must be slow
>>> - the NTP daemon is used to adjust the system time
>>> The problem can be reproduced on real hardware as well as on a virtual
>>> machine running under VMware. Set the BIOS clock back about ten minutes
>>> against the 'real' time. Then start the NTP daemon and then run the
>>> little test program:
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <time.h>
>>> #include <errno.h>
>>> #include <sys/select.h>
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>   time_t t;
>>>   struct timeval timeout;
>>>   int i;
>>>   int ret;
>>>
>>>   t = time (NULL);
>>>   printf ("Current time before = %s", ctime (&t));
>>>
>>>   for (i = 0; i < 20; i++)
>>>   {
>>>     timeout.tv_sec = 1;
>>>     timeout.tv_usec = 0;
>>>
>>>     if ((ret = select (FD_SETSIZE, NULL, NULL, NULL, &timeout)) < 0)
>>>     {
>>>       printf ("select returned %d, errno = %d\n", ret, errno);
>>>       return EXIT_FAILURE;
>>>     }
>>>   }
>>>   t = time (NULL);
>>>   printf ("Current time after = %s", ctime (&t));
>>>   return EXIT_SUCCESS;
>>> }
>>>
>>> On a virtual machine under VMware I got the following result after some
>>> minutes of system run time:
>>>
>>> hws@cwc-vmware:/home/hws > /space/software/select_test/debug/src/select_test
>>> Current time before = Fri Sep 18 20:05:51 2009
>>> Current time after = Fri Sep 18 20:06:11 2009
>>> hws@cwc-vmware:/home/hws > /space/software/select_test/debug/src/select_test
>>> Current time before = Fri Sep 18 20:14:29 2009
>>> Current time after = Fri Sep 18 20:14:33 2009
>>> hws@cwc-vmware:/home/hws > /space/software/select_test/debug/src/select_test
>>> Current time before = Fri Sep 18 20:14:57 2009
>>> Current time after = Fri Sep 18 20:14:57 2009
>>> hws@cwc-vmware:/home/hws > /space/software/select_test/debug/src/select_test
>>> Current time before = Fri Sep 18 20:15:20 2009
>>> Current time after = Fri Sep 18 20:15:40 2009
>>> hws@cwc-vmware:/home/hws >
>>>
>>> Normally, the time distance between 'before' and 'after' should be 20
>>> seconds as in the first and last run of the program. For the second run
>>> the time difference is only 4 seconds and for the third run it is even zero.
>>>
>>> On the real hardware I have also some other time-related issues when the
>>> problem occurs. Keyboard input will often 'bounce' - key presses are
>>> detected two or more times and some delay times are prolonged (!). I
>>> could not yet reproduce this in the virtual machine.
>>>
>>> The problem will not always occur immediately after the system is
>>> started but it may take several minutes until the first effects occur. I
>>> have not tested this issue with other kernels yet but I will do so
>>> during the weekend.
>>>
>>> Are there any ideas what to do about this (beside buying a better BIOS
>>> clock)? I would really like to have the NTP daemon running to keep the
>>> system time accurate, but somehow it seems to effect wait queues in the
>>> kernel pretty badly.
>>>
>>> Bye,
>>> Jürgen
>>>
>>> Jürgen Mell wrote:
>>>
>>>> Hi,
>>>>
>>>> I have an application which connects via a network socket to a server
>>>> running on the same machine (IP 127.0.0.1) This application uses the
>>>> function 'select' to wait for new data from the server or until a two
>>>> seconds timeout. This works well until there is network traffic on the
>>>> external network interfaces (eth* or WLAN). When there is network
>>>> traffic on the external interfaces, the select function does not wait
>>>> anymore but it returns with a return code of zero, indicating not data
>>>> available on the socket. This happens nearly immediately (after 8 to 9
>>>> microseconds) and not after the specified two seconds interval. The
>>>> timeout parameter of select is updated accordingly (it shows eg. 1 s
>>>> 999991 us).
>>>> Up to now I could not test this with another kernel but I will try to
>>>> do it this afternoon. Are there any known problems with select? Is
>>>> there any way to circumvent this?
>>>>
>>>> Any help would be greatly appreciated!
>>>>
>>>> Jürgen
>>
>> --
>> -- Sujit K M
>>
>> blog(http://kmsujit.blogspot.com/)

^ permalink raw reply [flat|nested] 8+ messages in thread
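The test program quoted above measures the run with time(), which reads the wall clock that NTP is adjusting, so a short 'before'/'after' difference does not by itself show which side misbehaved. A minimal sketch of a complementary measurement follows, assuming that clock_gettime() with CLOCK_MONOTONIC is available on the system under test. The function names elapsed_ms and timed_select_sleep are made up for illustration and do not come from the thread. CLOCK_MONOTONIC does not jump when the system time is set, although NTP can still slew its rate.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime() under strict -std modes */
#endif
#include <stddef.h>
#include <time.h>
#include <sys/select.h>

/* Milliseconds between two CLOCK_MONOTONIC readings (hypothetical helper). */
long elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    long long ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
                 + (end->tv_nsec - start->tv_nsec);
    return (long)(ns / 1000000LL);
}

/*
 * Sleep once with select() and no file descriptors, as in the test program,
 * and return how long the call really blocked according to the monotonic
 * clock, in milliseconds. Returns -1 if a system call fails (including
 * select() being interrupted by a signal).
 */
long timed_select_sleep(long sec, long usec)
{
    struct timespec start, end;
    struct timeval timeout;

    timeout.tv_sec = sec;
    timeout.tv_usec = usec;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;
    if (select(0, NULL, NULL, NULL, &timeout) < 0)
        return -1;
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;

    return elapsed_ms(&start, &end);
}
```

Calling timed_select_sleep(1, 0) in the same 20-iteration loop and printing each result would show whether individual select() calls return early, independent of what ctime() reports. Older glibc versions may need -lrt at link time for clock_gettime().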
* Re: Problem with function select on kernel 2.6.29.6-rt23
  2009-09-21  9:58 ` Jürgen Mell
@ 2009-09-21 10:34 ` Jürgen Mell
  0 siblings, 0 replies; 8+ messages in thread
From: Jürgen Mell @ 2009-09-21 10:34 UTC (permalink / raw)
To: Sujit K M; +Cc: linux-rt-users

The slow clock was caused by the kernel suspecting a defective ACPI PM
timer. After fixing that, 2.6.31-rt11 has run without problems so far.

Jürgen

Jürgen Mell wrote:
> No, I do not think that this is intentional. Some lines later, you
> will find
>
> "Some code calls *select*() with all three sets empty, /n/ zero, and a
> non-NULL /timeout/ as a fairly portable way to sleep with subsecond
> precision."
>
> This cannot make any sense, if I have to call select several times to
> get the full delay period. The overhead for calling the function
> several times is significant. I have modified the test program
> according to your proposal to run the loop 2000 times with 10000 us
> delay and get - depending on the speed of the computer - times between
> 22 and 24 seconds total.
>
> I understand that the timeout argument of select is updated when
> select returns after one of the monitored file descriptors is ready
> for the selected operation.
>
> I have tested this issue now with the kernel 2.6.31-rt11 and got a new
> problem: this time select does not abort prematurely any more but now
> each second of computer time is about three seconds in reality (the
> computer clock is extremely slow). NTP is running.
>
> Somehow fiddling with NTP causes very strange side effects...
>
> Bye,
> Jürgen
>
> Sujit K M wrote:
>> this seems to be normal functionality.
>>
>> As quoted from
>>
>> http://linux.die.net/man/2/select
>>
>> (ii)
>> select() may update the timeout argument to indicate how much time was
>> left. pselect() does not change this argument.
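For readers who need a full delay even when an individual select() call returns before its timeout, the idea discussed in this thread of calling select() again with the time that is left can be sketched as below. This is only an illustration under stated assumptions, not the fix that resolved the thread (that was on the kernel timer side). It assumes CLOCK_MONOTONIC is usable as the reference clock, and the name sleep_at_least_ms is hypothetical. The remaining time is recomputed from the clock on every pass instead of trusting the updated timeout argument.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime() under strict -std modes */
#endif
#include <errno.h>
#include <stddef.h>
#include <time.h>
#include <sys/select.h>

/*
 * Block for at least `ms` milliseconds, measured on CLOCK_MONOTONIC.
 * If select() returns early (timeout cut short, or EINTR), the remaining
 * time is recomputed and select() is called again.
 * Returns 0 on success, -1 on an unexpected error.
 */
int sleep_at_least_ms(long ms)
{
    struct timespec now, deadline;

    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
        return -1;
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    for (;;) {
        struct timeval tv;
        long long remaining_ns;

        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;
        remaining_ns = (long long)(deadline.tv_sec - now.tv_sec) * 1000000000LL
                     + (deadline.tv_nsec - now.tv_nsec);
        if (remaining_ns <= 0)
            return 0;               /* deadline reached */

        /* Round up to whole microseconds so the wait never undershoots. */
        tv.tv_sec = (time_t)(remaining_ns / 1000000000LL);
        tv.tv_usec = (suseconds_t)((remaining_ns % 1000000000LL + 999) / 1000);
        if (tv.tv_usec >= 1000000) {
            tv.tv_sec += 1;
            tv.tv_usec -= 1000000;
        }

        if (select(0, NULL, NULL, NULL, &tv) < 0 && errno != EINTR)
            return -1;
    }
}
```

If select() keeps returning almost immediately, as in the runs reported above, this loop degenerates into busy-waiting, so it is a stopgap at best. clock_nanosleep() with TIMER_ABSTIME is a standard alternative for sleeping until an absolute monotonic deadline, though whether it would have behaved better on the affected kernel is not something this thread establishes.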
end of thread, other threads:[~2009-09-21 10:34 UTC | newest]

Thread overview: 8+ messages
2009-09-10 10:47 Problem with function select on kernel 2.6.29.6-rt23 Jürgen Mell
2009-09-10 11:33 ` Sujit K M
2009-09-10 11:50   ` Jürgen Mell
2009-09-18 19:40 ` Jürgen Mell
2009-09-20 10:20   ` Sujit K M
2009-09-21  9:23     ` Sujit K M
2009-09-21  9:58       ` Jürgen Mell
2009-09-21 10:34         ` Jürgen Mell