* [B.A.T.M.A.N.] times wrapper
@ 2009-04-30 15:43 Nathan Wharton
2009-04-30 16:27 ` L. Aaron Kaplan
2009-05-01 7:55 ` Marek Lindner
0 siblings, 2 replies; 3+ messages in thread
From: Nathan Wharton @ 2009-04-30 15:43 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
[-- Attachment #1: Type: text/plain, Size: 904 bytes --]
When I first started meshing, I used olsrd. There was a problem on
our platform that caused olsrd packets to stop going out after the
board had been up about 5 minutes. The outage lasted about 40
seconds, then came back. It turns out that there is a problem with
the times() call that will return -1 for the 4096 jiffies before wrap
around. This caused no apparent change in time during that duration,
so no packets were sent. Also, for some reason on our platform,
times() starts out about 5 minutes before wrap around.
I found a work around elsewhere (and I leave credit in the patch) for
some other software that was failing every 400 something days. I
applied this to olsrd, and the problem went away.
When I switched to batman, I saw that times() is used in it as well,
so I made the patch attached. So, if anyone sees packets fail to come
from batman for 40 seconds, they might need this.
[-- Attachment #2: 001-times.patch --]
[-- Type: application/octet-stream, Size: 2359 bytes --]
diff -ruN batmand-r1178.orig/posix/posix.c batmand-r1178/posix/posix.c
--- batmand-r1178.orig/posix/posix.c 2008-12-29 09:35:38.000000000 -0600
+++ batmand-r1178/posix/posix.c 2009-01-02 13:31:27.000000000 -0600
@@ -49,11 +49,49 @@
static pthread_mutex_t batman_clock_mutex = PTHREAD_MUTEX_INITIALIZER;
+static struct tms dummy_tms_struct;
+static clock_t
+times_wrapper(void) /* Make times(2) behave rationally on Linux */
+{
+ int save_errno = errno;
+ clock_t ret;
+
+ /*
+ * times(2) really returns an unsigned value ...
+ *
+ * We don't check to see if we got back the error value (-1), because
+ * the only possibility for an error would be if the address of
+ * dummy_tms_struct was invalid. Since it's a
+ * compiler-generated address, we assume that errors are impossible.
+ * And, unfortunately, it is quite possible for the correct return
+ * from times(2) to be exactly (clock_t)-1. Sigh...
+ *
+ */
+ errno = 0;
+ ret = times(&dummy_tms_struct);
+
+/*
+ * This is to work around a bug in the system call interface
+ * for times(2) found in glibc on Linux (and maybe elsewhere)
+ * It changes the return values from -1 to -4096 all into
+ * -1 and then dumps the -(return value) into errno.
+ *
+ * This totally bizarre behavior seems to be widespread in
+ * versions of Linux and glibc.
+ *
+ * Many thanks to Wolfgang Dumhs <wolfgang.dumhs (at) gmx.at>
+ * for finding and documenting this bizarre behavior.
+ */
+ if (errno != 0) {
+ ret = (clock_t) (-errno);
+ }
+ errno = save_errno;
+ return ret;
+}
static void update_internal_clock(void)
{
- struct tms tp;
- clock_t current_clock_tick = times(&tp);
+ clock_t current_clock_tick = times_wrapper();
batman_clock_ticks += (current_clock_tick - last_clock_tick);
last_clock_tick = current_clock_tick;
@@ -553,7 +591,6 @@
int main( int argc, char *argv[] ) {
int8_t res;
- struct tms tp;
/* check if user is root */
if ( ( getuid() ) || ( getgid() ) ) {
@@ -575,7 +612,7 @@
/* save start value */
system_tick = (float)sysconf(_SC_CLK_TCK);
- last_clock_tick = times(&tp);
+ last_clock_tick = times_wrapper();
update_internal_clock();
apply_init_args( argc, argv );
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [B.A.T.M.A.N.] times wrapper
2009-04-30 15:43 [B.A.T.M.A.N.] times wrapper Nathan Wharton
@ 2009-04-30 16:27 ` L. Aaron Kaplan
2009-05-01 7:55 ` Marek Lindner
1 sibling, 0 replies; 3+ messages in thread
From: L. Aaron Kaplan @ 2009-04-30 16:27 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking; +Cc: Henning Rogge
On Apr 30, 2009, at 5:43 PM, Nathan Wharton wrote:
> When I first started meshing, I used olsrd. There was a problem on
> our platform that caused olsrd packets to stop going out after the
> board had been up about 5 minutes. The outage lasted about 40
> seconds, then came back. It turns out that there is a problem with
> the times() call that will return -1 for the 4096 jiffies before wrap
FYI / maybe it helps BATMAN:
that has been fixed in olsrd in a more portable way in the meantime.
> around. This caused no apparent change in time during that duration,
> so no packets were sent. Also, for some reason on our platform,
> times() starts out about 5 minutes before wrap around.
>
> I found a work around elsewhere (and I leave credit in the patch) for
> some other software that was failing every 400 something days. I
> applied this to olsrd, and the problem went away.
>
yes, we had some looong discussions about this on the lists.
It is really not easy to use times() in a OS/HW independant way.
One way is to check if there is an overrun in times().
That is how me and Sven-Ola fixed it.
Henning provided a more general/cleaner solution which can even
survive if the clock goes backwards (sometimes happens on Xen
machines! you wont believe it) or jumps badly. This has been tested
with turning the time of the clock backwards etc.
Please also feel free to compare against the olsrd code in case that
helps / in case you are interested.
the man page for times() says that it is deprecated.
One more thing for Nathan:
your comment in the patch
(+ /*
+ * times(2) really returns an unsigned value ...
+ *
+ * We don't check to see if we got back the error value (-1),
because
+ * the only possibility for an error would be if the address of
+ * dummy_tms_struct was invalid. Since it's a
+ * compiler-generated address, we assume that errors are
impossible.
+ * And, unfortunately, it is quite possible for the correct
return
+ * from times(2) to be exactly (clock_t)-1. Sigh...
+ *
+ */
)
is only partly correct.
It really depends on the OS that you compile on.
Some OSes actually treat the return value as signed, some as unsigned.
You can compare the different manpages for the different OSes.
But -1 is the problem after all, agreed.
*Sigh* when does Marek learn how to code proper C ;-))
^^^ --- just teasing, don't take it seriously.
> When I switched to batman, I saw that times() is used in it as well,
> so I made the patch attached. So, if anyone sees packets fail to come
> from batman for 40 seconds, they might need this.
> <001-times.patch>_______________________________________________
> B.A.T.M.A.N mailing list
> B.A.T.M.A.N@open-mesh.net
> https://lists.open-mesh.net/mm/listinfo/b.a.t.m.a.n
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [B.A.T.M.A.N.] times wrapper
2009-04-30 15:43 [B.A.T.M.A.N.] times wrapper Nathan Wharton
2009-04-30 16:27 ` L. Aaron Kaplan
@ 2009-05-01 7:55 ` Marek Lindner
1 sibling, 0 replies; 3+ messages in thread
From: Marek Lindner @ 2009-05-01 7:55 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
Hi,
> When I switched to batman, I saw that times() is used in it as well,
> so I made the patch attached. So, if anyone sees packets fail to come
> from batman for 40 seconds, they might need this.
excellent finding! I immediately applied your patch.
For the curious reader:
When you develop a daemon that should do certain things in intervals (e.g.
sending packets, removing dead nodes from the routing table, etc) you will
need a reference clock to know when this interval is over. Using the system
time as reference is the obvious solution which theoretically offers what you
need.
As often, reality is much harder: time synchronisation approaches (e.g. NTP),
manually changed system time, winter / summer time switches, timezones, etc
might encourage the daemon to improvised reactions (e.g. sending the messages
of the last 3 years because ntp just finished the time sync).
A more stable reference clock can be found by using the times() system call.
It returns "the number of clock ticks that have elapsed since an arbitrary
point in the past" (often the uptime). The compelling advantage: it always
increments although it overflows from time to time.
Batman uses the times() function which has a bug that this patch deals with. I
had a short look in the olsr code (0.5.6-r4) to see how this bug is being
handled there. main.c line 896 reveals a function called olsr_times()
--- SNIP -->
clock_t
olsr_times(void)
{
struct tms tms_buf;
return times(&tms_buf);
}
<-- SNAP ---
which is called from vairous places that don't deal with this bug:
scheduler.c line 117
--- SNIP -->
void
olsr_scheduler(void)
{
[..]
/* Main scheduler loop */
for (;;) {
/*
* Update the global timestamp. We are using a non-wallclock timer here
* to avoid any undesired side effects if the system clock changes.
*/
now_times = olsr_times();
[..]
/* We are done, sleep until the next scheduling interval. */
olsr_scheduler_sleep(olsr_times() - now_times);
<-- SNAP ---
Regards,
Marek
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2009-05-01 7:55 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-30 15:43 [B.A.T.M.A.N.] times wrapper Nathan Wharton
2009-04-30 16:27 ` L. Aaron Kaplan
2009-05-01 7:55 ` Marek Lindner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox