From: Fabio Olive Leite <fleite@redhat.com>
To: linux-nfs <nfs@lists.sourceforge.net>
Subject: time_after cannot be used alone by NFS code in 32bit architectures
Date: Thu, 26 Apr 2007 01:33:44 -0300
Message-ID: <20070426043344.GO10449@sleipnir.redhat.com>
Hi nfs@,
This is going to be tough, so bear with me for a while. :)
While I was trying to understand the effects of wrapping jiffies in
the various timestamp comparisons the NFS code does among its data
structures, I was struck by cases where the code flow just did not
make sense (checking with nfs_debug set to 32767 and additional
systemtap scripts to extract debug info), considering the jiffy
timestamps involved. This is the investigation that ensued.
time_after and friends provide a very clever way to compare two jiffy
counts, and they do resist sign flips, as long as the two values are
close together. So they are ideal for short-lived timers and timeouts,
since one would usually be comparing values a few thousand or even a
few million apart.
But unfortunately for the NFS code, time_after and friends alone are
not well suited to comparing jiffy counts taken at arbitrary times in
the past rather than at moments close together. The very protection
against sign flips backfires when the timestamps are more than 2
billion counts (2^31) apart.
The NFS data structures can easily be left idle for over 2 billion
jiffies, which is about 25 days with an HZ of 1000. Suppose a
file's nfs_fattr data has a timestamp of 0x3fffffff. About 25 days
later this file is accessed again, and the new attributes from the
server get tagged with a timestamp of 0xc0000000. Those new attributes
are clearly newer than what we had before, yet
time_after(0xc0000000,0x3fffffff) returns 0, and the older attributes
remain in use. Yes, those values are corner cases, but they are
actually just the limits of a range of values with similar behavior.
The ascii chart below shows that the sign flip protection of
time_after extends to a lot more than what we'd like for the NFS
timestamp comparisons.
t b 0 0 0 3 4 7 7 7 8 8 8 b c f f f
i 0 0 0 f 0 f f f 0 0 0 f 0 f f f
m 0 0 0 f 0 f f f 0 0 0 f 0 f f f
e 0 0 0 f 0 f f f 0 0 0 f 0 f f f
_ 0 0 0 f 0 f f f 0 0 0 f 0 f f f
a 0 0 0 f 0 f f f 0 0 0 f 0 f f f
f 0 0 0 f 0 f f f 0 0 0 f 0 f f f
a t 0 1 2 f 0 d e f 0 1 2 f 0 d e f
00000000 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
00000001 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1
00000002 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1
3fffffff 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1
40000000 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1
7ffffffd 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1
7ffffffe 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1
7fffffff 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1
80000000 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
80000001 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
80000002 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0
bfffffff 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0
c0000000 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0
fffffffd 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0
fffffffe 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0
ffffffff 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0
This leads to several problematic situations in long-lived NFS mounts.
We can see old attributes not being updated, negative dentries that do
not revalidate even after touching their parent dir, etc.
So something more robust is needed when comparing two timestamps in
NFS data structures, or we should convert the code to use 64bit
jiffies everywhere, even on 32bit architectures. This probably affects
many other things in the rest of the kernel, but I don't think many
other data structures are as long-lived as the NFS ones.
Attached are two small programs to help understand the issue, one that
generates the matrix above, the other being a simple time_after
calculator for checking specific values. Compile and run on 32bit
boxes, of course.
Although this sounds like the perfect push for using autofs, I'm sure
long-lived (over 100 days, allowing for two rounds of u32 jiffies and
timestamps all over the place) NFS mounts on 32bit architectures will
still be very common for a long time. :)
Comments? I'd love to be proved wrong, as right now I'm very scared.
Regards,
Fábio
--
ex sed lex awk yacc, e pluribus unix, amem
[-- Attachment #2: time_after_matrix.c --]
[-- Type: text/plain, Size: 1521 bytes --]
/*
 * Prints time_after(a,b) for a grid of 32-bit jiffy values.
 *
 * The macros are user-space copies of the kernel's typecheck() and
 * time_after() family; they rely on GCC extensions (statement
 * expressions and typeof). long must be 32 bits wide, so build on a
 * 32-bit box (or e.g. with gcc -m32 where 32-bit libraries exist).
 */
#include <stdio.h>

#define typecheck(type,x) \
({	type __dummy; \
	typeof(x) __dummy2; \
	(void)(&__dummy == &__dummy2); \
	1; \
})

/*
 * The signed subtraction is meant to wrap; ISO C leaves signed
 * overflow undefined, so building with -fwrapv keeps it explicit.
 */
#define time_after(a,b) \
	(typecheck(unsigned long, a) && \
	 typecheck(unsigned long, b) && \
	 ((long)(b) - (long)(a) < 0))
#define time_before(a,b) time_after(b,a)

#define time_after_eq(a,b) \
	(typecheck(unsigned long, a) && \
	 typecheck(unsigned long, b) && \
	 ((long)(a) - (long)(b) >= 0))
#define time_before_eq(a,b) time_after_eq(b,a)

int main(void) {
	unsigned long i, j;
	unsigned long values[] = {
		0x00000000,
		0x00000001,
		0x00000002,
		0x3fffffff,
		0x40000000,
		0x7ffffffd,
		0x7ffffffe,
		0x7fffffff,
		0x80000000,
		0x80000001,
		0x80000002,
		0xbfffffff,
		0xc0000000,
		0xfffffffd,
		0xfffffffe,
		0xffffffff
	};

	/* Column headers: each b value, read top to bottom. Lame! =) */
	printf("t      b 0 0 0 3 4 7 7 7 8 8 8 b c f f f\n");
	printf(" i       0 0 0 f 0 f f f 0 0 0 f 0 f f f\n");
	printf("  m      0 0 0 f 0 f f f 0 0 0 f 0 f f f\n");
	printf("   e     0 0 0 f 0 f f f 0 0 0 f 0 f f f\n");
	printf("    _    0 0 0 f 0 f f f 0 0 0 f 0 f f f\n");
	printf("     a   0 0 0 f 0 f f f 0 0 0 f 0 f f f\n");
	printf("      f  0 0 0 f 0 f f f 0 0 0 f 0 f f f\n");
	printf("a      t 0 1 2 f 0 d e f 0 1 2 f 0 d e f\n");
	printf("\n");

	/* One row per a, one cell per b: time_after(a,b). */
	for (i = 0; i < 16; i++) {
		printf("%08lx ", values[i]);
		for (j = 0; j < 16; j++) {
			printf("%d ", time_after(values[i],values[j]));
		}
		printf("\n");
	}
	return 0;
}
[-- Attachment #3: time_after_calc.c --]
[-- Type: text/plain, Size: 802 bytes --]
/*
 * Reads pairs of hex values and prints time_after(a,b) for each.
 * Same macros and same 32-bit long requirement as time_after_matrix.c.
 */
#include <stdio.h>

#define typecheck(type,x) \
({	type __dummy; \
	typeof(x) __dummy2; \
	(void)(&__dummy == &__dummy2); \
	1; \
})

#define time_after(a,b) \
	(typecheck(unsigned long, a) && \
	 typecheck(unsigned long, b) && \
	 ((long)(b) - (long)(a) < 0))
#define time_before(a,b) time_after(b,a)

#define time_after_eq(a,b) \
	(typecheck(unsigned long, a) && \
	 typecheck(unsigned long, b) && \
	 ((long)(a) - (long)(b) >= 0))
#define time_before_eq(a,b) time_after_eq(b,a)

int main(void) {
	unsigned long a, b;

	printf("Type in series of two space-separated hex values, ^D to end.\n");
	/* Stop at end of input or at the first pair that fails to parse. */
	while (scanf("%lx %lx", &a, &b) == 2)
		printf("time_after(%lx,%lx) == %d\n", a, b, time_after(a,b));
	return 0;
}