* [RFC 00/32] making inode time stamps y2038 ready
@ 2014-05-30 20:01 Arnd Bergmann
2014-05-30 20:01 ` [RFC 11/32] xfs: convert to struct inode_time Arnd Bergmann
` (3 more replies)
0 siblings, 4 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-05-30 20:01 UTC (permalink / raw)
To: linux-kernel
Cc: hch, linux-mtd, hpa, logfs, linux-afs, joseph, linux-arch,
linux-cifs, linux-scsi, ceph-devel, codalist, cluster-devel, coda,
geert, linux-ext4, Arnd Bergmann, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs
Based on the recent discussion about 64-bit time_t for new
architectures, and for solving the year 2038 problem in general,
I decided to try out what it would take to solve part of the
kernel side of things.
This is a proof-of-concept work to get us to the point where
two system calls (utimes and stat) provide a working interface
to user space to pass 64-bit inode time stamps in and out of
the kernel all the way to the file systems.
I picked this because it is a fairly isolated problem, as the
inode time stamps are rarely assigned to any other time values.
As a byproduct of this work, I documented for each of the file
systems we support how long the on-disk format can work[1].
Obviously we also need to convert all the other syscalls and
have a proper libc implementation using those for this to
be really useful, but it's a start and it can be tested
independently (I haven't done so yet, as I want to wait for
initial feedback).
All the interesting stuff is in the first five patches here,
the rest is the straightforward conversion of all file systems
that use 'timespec' values internally.
There are of course a number of open questions:
a) is this the right approach in general? The previous discussion
pointed this way, but there may be other opinions.
b) what type should we use internally to represent inode time
stamps? The code contains three different versions that would
all work, we just have to pick a good tradeoff between
efficiency and the range of times we want to cover.
c) Should we continue this way for all 32-bit platforms for
consistency, including future ones, or should we go to
different 64-bit types right away? My feeling is that the
second approach would complicate this work.
Arnd
[1] http://kernelnewbies.org/y2038
Arnd Bergmann (32):
fs: introduce new 'struct inode_time'
uapi: add struct __kernel_timespec{32,64}
fs: introduce sys_utimens64at
fs: introduce sys_newfstat64/sys_newfstatat64
arch: hook up new stat and utimes syscalls
isofs: fix timestamps beyond 2027
fs/nfs: convert to struct inode_time
fs/ceph: convert to 'struct inode_time'
fs/pstore: convert to struct inode_time
fs/coda: convert to struct inode_time
xfs: convert to struct inode_time
btrfs: convert to struct inode_time
ext3: convert to struct inode_time
ext4: convert to struct inode_time
cifs: convert to struct inode_time
ntfs: convert to struct inode_time
ubifs: convert to struct inode_time
ocfs2: convert to struct inode_time
fs/fat: convert to struct inode_time
afs: convert to struct inode_time
udf: convert to struct inode_time
fs: convert simple fs to inode_time
logfs: convert to struct inode_time
hfs, hfsplus: convert to struct inode_time
gfs2: convert to struct inode_time
reiserfs: convert to struct inode_time
jffs2: convert to struct inode_time
adfs: convert to struct inode_time
f2fs: convert to struct inode_time
fuse: convert to struct inode_time
scsi: fnic: use current_kernel_time() for timestamp
fs: use new inode_time definition unconditionally
arch/alpha/kernel/osf_sys.c | 2 +-
arch/arm/include/asm/unistd.h | 2 +-
arch/arm/include/uapi/asm/stat.h | 25 +++++++++++++++++
arch/arm/include/uapi/asm/unistd.h | 3 +++
arch/arm/kernel/calls.S | 3 +++
arch/arm64/include/asm/unistd32.h | 5 +++-
arch/x86/include/uapi/asm/stat.h | 28 +++++++++++++++++++
arch/x86/syscalls/syscall_32.tbl | 3 +++
drivers/block/rbd.c | 2 +-
drivers/firmware/efi/efi-pstore.c | 28 +++++++++----------
drivers/scsi/fnic/fnic_trace.c | 2 +-
drivers/tty/tty_io.c | 2 +-
drivers/usb/gadget/f_fs.c | 2 +-
fs/adfs/inode.c | 4 +--
fs/afs/afs.h | 6 ++---
fs/afs/fsclient.c | 2 +-
fs/attr.c | 8 +++---
fs/btrfs/file.c | 6 ++---
fs/btrfs/inode.c | 4 +--
fs/btrfs/ioctl.c | 4 +--
fs/btrfs/root-tree.c | 2 +-
fs/btrfs/transaction.c | 2 +-
fs/ceph/cache.c | 2 +-
fs/ceph/caps.c | 6 ++---
fs/ceph/file.c | 4 +--
fs/ceph/inode.c | 20 +++++++-------
fs/ceph/super.h | 8 +++---
fs/cifs/cache.c | 6 ++---
fs/cifs/cifsglob.h | 6 ++---
fs/cifs/cifsproto.h | 6 ++---
fs/cifs/cifssmb.c | 5 ++--
fs/cifs/inode.c | 2 +-
fs/cifs/netmisc.c | 15 ++++++-----
fs/coda/coda_linux.c | 18 ++++++++-----
fs/compat.c | 19 ++-----------
fs/configfs/inode.c | 6 ++---
fs/cramfs/inode.c | 2 +-
fs/ext3/inode.c | 4 +--
fs/ext4/ext4.h | 10 +++----
fs/ext4/extents.c | 2 +-
fs/f2fs/file.c | 6 ++---
fs/fat/dir.c | 2 +-
fs/fat/fat.h | 6 ++---
fs/fat/misc.c | 4 +--
fs/fat/namei_msdos.c | 8 +++---
fs/fat/namei_vfat.c | 10 +++----
fs/fuse/inode.c | 6 ++---
fs/gfs2/dir.c | 6 ++---
fs/gfs2/glops.c | 4 +--
fs/hfs/hfs_fs.h | 2 +-
fs/hfsplus/hfsplus_fs.h | 2 +-
fs/inode.c | 18 ++++++-------
fs/isofs/util.c | 2 +-
fs/jffs2/os-linux.h | 2 +-
fs/locks.c | 4 +--
fs/logfs/readwrite.c | 18 ++++++-------
fs/nfs/callback.h | 4 +--
fs/nfs/callback_xdr.c | 6 ++---
fs/nfs/file.c | 2 +-
fs/nfs/fscache-index.c | 8 +++---
fs/nfs/inode.c | 10 +++----
fs/nfs/internal.h | 4 +--
fs/nfs/netns.h | 2 +-
fs/nfs/nfs2xdr.c | 8 +++---
fs/nfs/nfs3xdr.c | 10 +++----
fs/nfs/nfs4xdr.c | 20 +++++++-------
fs/nfsd/nfs3xdr.c | 6 ++---
fs/nfsd/nfsfh.h | 4 +--
fs/nfsd/nfsxdr.c | 2 +-
fs/ntfs/inode.c | 12 ++++-----
fs/ntfs/time.h | 8 +++---
fs/ocfs2/dlmglue.c | 16 +++++------
fs/ocfs2/file.c | 6 ++---
fs/ocfs2/ocfs2.h | 2 +-
fs/pstore/inode.c | 2 +-
fs/pstore/internal.h | 2 +-
fs/pstore/platform.c | 2 +-
fs/pstore/ram.c | 18 +++++++------
fs/reiserfs/namei.c | 2 +-
fs/reiserfs/xattr.c | 4 +--
fs/stat.c | 55 ++++++++++++++++++++++++++++++++++++++
fs/ubifs/dir.c | 2 +-
fs/ubifs/file.c | 16 +++++------
fs/ubifs/misc.h | 2 +-
fs/udf/udf_i.h | 2 +-
fs/udf/udf_sb.h | 2 +-
fs/udf/udfdecl.h | 7 ++---
fs/udf/udftime.c | 7 ++---
fs/utimes.c | 47 +++++++++++++++++++++++++++-----
fs/xfs/time.h | 4 +--
fs/xfs/xfs_inode.c | 2 +-
fs/xfs/xfs_iops.c | 2 +-
fs/xfs/xfs_trans_inode.c | 6 ++---
include/linux/ceph/decode.h | 8 +++---
include/linux/ceph/osd_client.h | 4 +--
include/linux/compat.h | 2 +-
include/linux/fs.h | 32 +++++++++++-----------
include/linux/nfs_fs_sb.h | 2 +-
include/linux/nfs_xdr.h | 14 +++++-----
include/linux/pstore.h | 4 +--
include/linux/stat.h | 6 ++---
include/linux/syscalls.h | 9 ++++++-
include/linux/time.h | 44 +++++++++++++++++++++++++++---
include/uapi/asm-generic/stat.h | 29 ++++++++++++++++++--
include/uapi/asm-generic/unistd.h | 8 +++++-
include/uapi/linux/coda.h | 1 +
include/uapi/linux/time.h | 40 ++++++++++++++++++++++++++-
init/initramfs.c | 2 +-
kernel/audit.c | 2 +-
kernel/auditsc.c | 2 +-
kernel/time.c | 44 +++++++++++++++++++++++++-----
kernel/time/timekeeping.c | 16 +++++++++++
net/ceph/auth_x.c | 2 +-
net/ceph/osd_client.c | 4 +--
114 files changed, 642 insertions(+), 333 deletions(-)
--
1.8.3.2
Bcc: "J. Bruce Fields" <bfields@fieldses.org>
Bcc: "Theodore Ts'o" <tytso@mit.edu>
Bcc: Adrian Hunter <adrian.hunter@intel.com>
Bcc: Andreas Dilger <adilger.kernel@dilger.ca>
Bcc: Andrew Morton <akpm@linux-foundation.org>
Bcc: Anton Altaparmakov <anton@tuxera.com>
Bcc: Anton Vorontsov <anton@enomsg.org>
Bcc: Artem Bityutskiy <dedekind1@gmail.com>
Bcc: Brian Uchino <buchino@cisco.com>
Bcc: Chris Mason <clm@fb.com>
Bcc: Colin Cross <ccross@android.com>
Bcc: Dave Chinner <david@fromorbit.com>
Bcc: David Howells <dhowells@redhat.com>
Bcc: David Woodhouse <dwmw2@infradead.org>
Bcc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bcc: Hiral Patel <hiralpat@cisco.com>
Bcc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Bcc: Jan Harkes <jaharkes@cs.cmu.edu>
Bcc: Jan Kara <jack@suse.cz>
Bcc: Joel Becker <jlbec@evilplan.org>
Bcc: Joern Engel <joern@logfs.org>
Bcc: Josef Bacik <jbacik@fb.com>
Bcc: Kees Cook <keescook@chromium.org>
Bcc: Mark Fasheh <mfasheh@suse.com>
Bcc: Miklos Szeredi <miklos@szeredi.hu>
Bcc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Bcc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Bcc: Sage Weil <sage@inktank.com>
Bcc: Steve French <sfrench@samba.org>
Bcc: Steven Whitehouse <swhiteho@redhat.com>
Bcc: Suma Ramars <sramars@cisco.com>
Bcc: Tony Luck <tony.luck@intel.com>
Cc: ceph-devel@vger.kernel.org
Cc: cluster-devel@redhat.com
Cc: coda@cs.cmu.edu
Cc: codalist@coda.cs.cmu.edu
Cc: fuse-devel@lists.sourceforge.net
Cc: linux-afs@lists.infradead.org
Cc: linux-btrfs@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: linux-ext4@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-mtd@lists.infradead.org
Cc: linux-nfs@vger.kernel.org
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: linux-scsi@vger.kernel.org
Cc: logfs@logfs.org
Cc: ocfs2-devel@oss.oracle.com
Cc: reiserfs-devel@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: xfs@oss.sgi.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 71+ messages in thread
* [RFC 11/32] xfs: convert to struct inode_time
2014-05-30 20:01 [RFC 00/32] making inode time stamps y2038 ready Arnd Bergmann
@ 2014-05-30 20:01 ` Arnd Bergmann
2014-05-31 0:37 ` Dave Chinner
2014-05-31 14:30 ` [RFC 00/32] making inode time stamps y2038 ready Vyacheslav Dubeyko
` (2 subsequent siblings)
3 siblings, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-05-30 20:01 UTC (permalink / raw)
To: linux-kernel
Cc: linux-arch, Arnd Bergmann, hpa, xfs, hch, john.stultz, lftan,
linux-fsdevel, geert, tglx, joseph
xfs uses unsigned 32-bit seconds for inode timestamps, which will work
for the next 92 years, but the VFS uses struct timespec for timestamps,
which is only good until 2038 on 32-bit CPUs.
This gets us one small step closer to lifting the VFS limit by using
struct inode_time in XFS.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
---
fs/xfs/time.h | 4 ++--
fs/xfs/xfs_inode.c | 2 +-
fs/xfs/xfs_iops.c | 2 +-
fs/xfs/xfs_trans_inode.c | 6 +++---
4 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/time.h b/fs/xfs/time.h
index 387e695..a490f1b 100644
--- a/fs/xfs/time.h
+++ b/fs/xfs/time.h
@@ -21,14 +21,14 @@
#include <linux/sched.h>
#include <linux/time.h>
-typedef struct timespec timespec_t;
+typedef struct inode_time timespec_t;
static inline void delay(long ticks)
{
schedule_timeout_uninterruptible(ticks);
}
-static inline void nanotime(struct timespec *tvp)
+static inline void nanotime(struct inode_time *tvp)
{
*tvp = CURRENT_TIME;
}
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index a6115fe..16d5392 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -654,7 +654,7 @@ xfs_ialloc(
xfs_inode_t *ip;
uint flags;
int error;
- timespec_t tv;
+ struct inode_time tv;
/*
* Call the space management code to pick
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 205613a..092ee7c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -956,7 +956,7 @@ xfs_vn_setattr(
STATIC int
xfs_vn_update_time(
struct inode *inode,
- struct timespec *now,
+ struct inode_time *now,
int flags)
{
struct xfs_inode *ip = XFS_I(inode);
diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index 50c3f56..bae2520 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -70,7 +70,7 @@ xfs_trans_ichgtime(
int flags)
{
struct inode *inode = VFS_I(ip);
- timespec_t tv;
+ struct inode_time tv;
ASSERT(tp);
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
@@ -78,13 +78,13 @@ xfs_trans_ichgtime(
tv = current_fs_time(inode->i_sb);
if ((flags & XFS_ICHGTIME_MOD) &&
- !timespec_equal(&inode->i_mtime, &tv)) {
+ !inode_time_equal(&inode->i_mtime, &tv)) {
inode->i_mtime = tv;
ip->i_d.di_mtime.t_sec = tv.tv_sec;
ip->i_d.di_mtime.t_nsec = tv.tv_nsec;
}
if ((flags & XFS_ICHGTIME_CHG) &&
- !timespec_equal(&inode->i_ctime, &tv)) {
+ !inode_time_equal(&inode->i_ctime, &tv)) {
inode->i_ctime = tv;
ip->i_d.di_ctime.t_sec = tv.tv_sec;
ip->i_d.di_ctime.t_nsec = tv.tv_nsec;
--
1.8.3.2
^ permalink raw reply related [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-30 20:01 ` [RFC 11/32] xfs: convert to struct inode_time Arnd Bergmann
@ 2014-05-31 0:37 ` Dave Chinner
2014-05-31 0:41 ` H. Peter Anvin
0 siblings, 1 reply; 71+ messages in thread
From: Dave Chinner @ 2014-05-31 0:37 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arch, hpa, linux-kernel, xfs, hch, john.stultz, lftan,
linux-fsdevel, geert, tglx, joseph
On Fri, May 30, 2014 at 10:01:35PM +0200, Arnd Bergmann wrote:
> xfs uses unsigned 32-bit seconds for inode timestamps, which will work
> for the next 92 years, but the VFS uses struct timespec for timestamps,
> which is only good until 2038 on 32-bit CPUs.
>
> This gets us one small step closer to lifting the VFS limit by using
> struct inode_time in XFS.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: xfs@oss.sgi.com
> ---
> fs/xfs/time.h | 4 ++--
> fs/xfs/xfs_inode.c | 2 +-
> fs/xfs/xfs_iops.c | 2 +-
> fs/xfs/xfs_trans_inode.c | 6 +++---
> 4 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/fs/xfs/time.h b/fs/xfs/time.h
> index 387e695..a490f1b 100644
> --- a/fs/xfs/time.h
> +++ b/fs/xfs/time.h
> @@ -21,14 +21,14 @@
> #include <linux/sched.h>
> #include <linux/time.h>
>
> -typedef struct timespec timespec_t;
> +typedef struct inode_time timespec_t;
>
> static inline void delay(long ticks)
> {
> schedule_timeout_uninterruptible(ticks);
> }
>
> -static inline void nanotime(struct timespec *tvp)
> +static inline void nanotime(struct inode_time *tvp)
> {
> *tvp = CURRENT_TIME;
> }
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index a6115fe..16d5392 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -654,7 +654,7 @@ xfs_ialloc(
> xfs_inode_t *ip;
> uint flags;
> int error;
> - timespec_t tv;
> + struct inode_time tv;
>
> /*
> * Call the space management code to pick
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 205613a..092ee7c 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -956,7 +956,7 @@ xfs_vn_setattr(
> STATIC int
> xfs_vn_update_time(
> struct inode *inode,
> - struct timespec *now,
> + struct inode_time *now,
> int flags)
> {
> struct xfs_inode *ip = XFS_I(inode);
> diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
> index 50c3f56..bae2520 100644
> --- a/fs/xfs/xfs_trans_inode.c
> +++ b/fs/xfs/xfs_trans_inode.c
> @@ -70,7 +70,7 @@ xfs_trans_ichgtime(
> int flags)
> {
> struct inode *inode = VFS_I(ip);
> - timespec_t tv;
> + struct inode_time tv;
>
> ASSERT(tp);
> ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> @@ -78,13 +78,13 @@ xfs_trans_ichgtime(
> tv = current_fs_time(inode->i_sb);
>
> if ((flags & XFS_ICHGTIME_MOD) &&
> - !timespec_equal(&inode->i_mtime, &tv)) {
> + !inode_time_equal(&inode->i_mtime, &tv)) {
> inode->i_mtime = tv;
> ip->i_d.di_mtime.t_sec = tv.tv_sec;
> ip->i_d.di_mtime.t_nsec = tv.tv_nsec;
> }
The problem I see here is that the code is now potentially stuffing
a variable that is larger than 32 bits into an on-disk structure
that is only 32 bits in size. You can't just change the in-memory
representation of inode timestamps and expect the problem to be
fixed - this just pushes the problem down a layer without any
infrastructure allowing filesystems to handle storage of the new
timestamp format sanely.
IOWs, the filesystem has to be able to reject any attempt to set a
timestamp that it can't represent on disk, otherwise Bad Stuff will
happen, and filesystems have to be able to specify in their on
disk format what timestamp encoding is being used. The solution will
be different for every filesystem that needs to support time beyond
2038.
Hence I think you are going to need superblock flags and/or
variables to indicate the epoch range the filesystem can support.
Then the filesystems need conversion functions from whatever the
internal VFS timestamp representation is to whatever their on-disk
format is, and only then can we switch the VFS to using a new
timestamp format.
At that point, filesystem developers can make the changes they need
to the on-disk format to support timestamps beyond 2038, and all
they need to do at the VFS layer is set the "supported range" fields
appropriately in the VFS superblock...
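To illustrate the range check described above, here is a minimal user-space sketch, not kernel code. struct fs_time_range, fs_time_check() and fs_time_to_disk32() are hypothetical names that do not exist in the kernel, and -ERANGE is just one possible choice of error code:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical "supported range" a filesystem would advertise. */
struct fs_time_range {
	int64_t min_sec;	/* earliest representable second */
	int64_t max_sec;	/* latest representable second */
};

/* Returns 0 if sec can be stored on disk, -ERANGE otherwise. */
static int fs_time_check(const struct fs_time_range *r, int64_t sec)
{
	if (sec < r->min_sec || sec > r->max_sec)
		return -ERANGE;
	return 0;
}

/*
 * Convert in-memory 64-bit seconds to a signed 32-bit on-disk field.
 * Assumes the advertised range itself fits in 32 bits. *disk is only
 * written when the value is representable.
 */
static int fs_time_to_disk32(const struct fs_time_range *r, int64_t sec,
			     int32_t *disk)
{
	int ret = fs_time_check(r, sec);

	if (ret)
		return ret;
	*disk = (int32_t)sec;
	return 0;
}
```

A filesystem with signed 32-bit seconds would use { INT32_MIN, INT32_MAX } as its range in this sketch; one with a wider on-disk format would need its own conversion function as well as wider limits.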
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 0:37 ` Dave Chinner
@ 2014-05-31 0:41 ` H. Peter Anvin
2014-05-31 1:14 ` Dave Chinner
0 siblings, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-05-31 0:41 UTC (permalink / raw)
To: Dave Chinner, Arnd Bergmann
Cc: linux-arch, linux-kernel, xfs, hch, john.stultz, lftan,
linux-fsdevel, geert, tglx, joseph
On 05/30/2014 05:37 PM, Dave Chinner wrote:
>
> IOWs, the filesystem has to be able to reject any attempt to set a
> timestamp that it can't represent on disk otherwise Bad Stuff will
> happen,
Actually it is questionable if it is worse to reject a timestamp or just
let it wrap. Rejecting a valid timestamp is a bit like "You don't
exist, go away."
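For reference, this is roughly what "letting it wrap" means for a signed 32-bit on-disk seconds field. It is a user-space sketch with made-up helper names (wrap_store(), wrap_load_signed()), not code from any filesystem:

```c
#include <stdint.h>

/*
 * Store: keep only the low 32 bits, as an unchecked assignment to a
 * 32-bit field would. Conversion to an unsigned type is modulo 2^32.
 */
static uint32_t wrap_store(int64_t sec)
{
	return (uint32_t)sec;
}

/* Load: interpret the stored bits as signed 32-bit seconds. */
static int64_t wrap_load_signed(uint32_t disk)
{
	if (disk >= 0x80000000u)
		return (int64_t)disk - 4294967296LL;
	return (int64_t)disk;
}
```

With these helpers, 2147483648 (the first second past the signed 32-bit limit) reads back as -2147483648, which a reader of the file system sees as a date in December 1901.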
> and filesystems have to be able to specify in their on
> disk format what timestamp encoding is being used. The solution will
> be different for every filesystem that needs to support time beyond
> 2038.
Actually the cutoff can be really different for each filesystem, not
necessarily 2038. However, I maintain the above still holds.
Consider a filesystem that kept timestamps in YYMMDDHHMMSS format. What
would you have expected such a filesystem to do on Jan 1, 2000?
-hpa
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 0:41 ` H. Peter Anvin
@ 2014-05-31 1:14 ` Dave Chinner
2014-05-31 1:22 ` H. Peter Anvin
2014-05-31 15:37 ` Arnd Bergmann
0 siblings, 2 replies; 71+ messages in thread
From: Dave Chinner @ 2014-05-31 1:14 UTC (permalink / raw)
To: H. Peter Anvin
Cc: linux-arch, Arnd Bergmann, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph
On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> On 05/30/2014 05:37 PM, Dave Chinner wrote:
> >
> > IOWs, the filesystem has to be able to reject any attempt to set a
> > timestamp that it can't represent on disk otherwise Bad Stuff will
> > happen,
>
> Actually it is questionable if it is worse to reject a timestamp or just
> let it wrap. Rejecting a valid timestamp is a bit like "You don't
> exist, go away."
I think having the new system calls be able to
return EINVAL if the value cannot be stored permanently on disk
correctly is the right thing to do. Having it silently mangled
by the filesystem and returning "everything is just fine, trust me"
is close to the worst solution I can think of. That's exactly what
leads to overflow bugs occurring....
> > and filesystems have to be able to specify in their on
> > disk format what timestamp encoding is being used. The solution will
> > be different for every filesystem that needs to support time beyond
> > 2038.
>
> Actually the cutoff can be really different for each filesystem, not
> necessarily 2038. However, I maintain the above still holds.
Sure, but all filesystems are supposed to handle at least the
current unix epoch.
> Consider a filesystem that kept timestamps in YYMMDDHHMMSS format. What
> would you have expected such a filesystem to do on Jan 1, 2000?
Strawman.
We don't need to cater for fundamentally broken designs that can't
even handle the current unix epoch correctly. If such filesystems
exist, then they can simply say "original unix epoch support only"
and do whatever crap they are doing right now.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 1:14 ` Dave Chinner
@ 2014-05-31 1:22 ` H. Peter Anvin
2014-05-31 5:54 ` Dave Chinner
2014-05-31 15:37 ` Arnd Bergmann
1 sibling, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-05-31 1:22 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, Arnd Bergmann, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph
No, not a strawman. Replace with Jan 19, 2038 and you have the same situation.
On May 30, 2014 6:14:50 PM PDT, Dave Chinner <david@fromorbit.com> wrote:
>On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
>> On 05/30/2014 05:37 PM, Dave Chinner wrote:
>> >
>> > IOWs, the filesystem has to be able to reject any attempt to set a
>> > timestamp that it can't represent on disk otherwise Bad Stuff will
>> > happen,
>>
>> Actually it is questionable if it is worse to reject a timestamp or just
>> let it wrap. Rejecting a valid timestamp is a bit like "You don't
>> exist, go away."
>
>I think having the new system calls be able to
>return EINVAL if the value cannot be stored permanently on disk
>correctly is the right thing to do. Having it silently mangled
>by the filesystem and returning "everything is just fine, trust me"
>is close to the worst solution I can think of. That's exactly what
>leads to overflow bugs occurring....
>
>> > and filesystems have to be able to specify in their on
>> > disk format what timestamp encoding is being used. The solution will
>> > be different for every filesystem that needs to support time beyond
>> > 2038.
>>
>> Actually the cutoff can be really different for each filesystem, not
>> necessarily 2038. However, I maintain the above still holds.
>
>Sure, but all filesystems are supposed to handle at least the
>current unix epoch.
>
>> Consider a filesystem that kept timestamps in YYMMDDHHMMSS format. What
>> would you have expected such a filesystem to do on Jan 1, 2000?
>
>Strawman.
>
>We don't need to cater for fundamentally broken designs that can't
>even handle the current unix epoch correctly. If such filesystems
>exist, then they can simply say "original unix epoch support only"
>and do whatever crap they are doing right now.
>
>Cheers,
>
>Dave.
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 1:22 ` H. Peter Anvin
@ 2014-05-31 5:54 ` Dave Chinner
2014-05-31 8:41 ` H. Peter Anvin
2014-06-02 14:00 ` Joseph S. Myers
0 siblings, 2 replies; 71+ messages in thread
From: Dave Chinner @ 2014-05-31 5:54 UTC (permalink / raw)
To: H. Peter Anvin
Cc: linux-arch, Arnd Bergmann, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph
[ Please don't top post. ]
On Fri, May 30, 2014 at 06:22:55PM -0700, H. Peter Anvin wrote:
> On May 30, 2014 6:14:50 PM PDT, Dave Chinner <david@fromorbit.com> wrote:
> >On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> >> On 05/30/2014 05:37 PM, Dave Chinner wrote:
> >> >
> >> > IOWs, the filesystem has to be able to reject any attempt to
> >> > set a timestamp that it can't represent on disk otherwise Bad
> >> > Stuff will happen,
> >>
> >> Actually it is questionable if it is worse to reject a
> >> timestamp or just let it wrap. Rejecting a valid timestamp is
> >> a bit like "You don't exist, go away."
> >
> >I think having the new system calls be able to return EINVAL
> >if the value cannot be stored permanently on disk correctly is
> >the right thing to do. Having it silently mangled by the
> >filesystem and returning "everything is just fine, trust me" is
> >close to the worst solution I can think of. That's exactly what
> >leads to overflow bugs occurring....
> >
> >> > and filesystems have to be able to specify in their on disk
> >> > format what timestamp encoding is being used. The solution
> >> > will be different for every filesystem that needs to support
> >> > time beyond 2038.
> >>
> >> Actually the cutoff can be really different for each
> >> filesystem, not necessarily 2038. However, I maintain the
> >> above still holds.
> >
> >Sure, but all filesystems are supposed to handle at least the
> >current unix epoch.
> >
> >> Consider a filesystem that kept timestamps in YYMMDDHHMMSS
> >> format. What would you have expected such a filesystem to do
> >> on Jan 1, 2000?
> >
> >Strawman.
> >
> >We don't need to cater for fundamentally broken designs that
> >can't even handle the current unix epoch correctly. If such
> >filesystems exist, then they can simply say "original unix epoch
> >support only" and do whatever crap they are doing right now.
>
> No, not a strawman. Replace with Jan 19, 2038 and you have the
> same situation.
But that's not the problem I'm talking about. The problem isn't the
roll-over date of the epoch - the problem is that we're changing the
in-memory meaning of time without changing what the filesystems
store on disk or how they translate them.
To use your example, what I'm actually talking about is the kernel
switching to CCYYMMDDHHMMSS while the filesystem has YYMMDDHHMMSS on
disk. The filesystem doesn't know the timestamp is now a different
format, so it could mangle it writing it to disk, or it could mangle
existing timestamps in the YY.. format reading them from disk and
putting them into CC.. format structures. IOWs, it will
incorrectly translate YY format dates to CC format, or translate
something in the CC format as though it was in YY format. And it
wouldn't even know what was the correct format because there's
nothing telling it on disk whether the date is in CC or YY format.
Either way, you get mangled timestamps, the filesystem doesn't know
about it because it's just storing what the kernel gives it, the
kernel thinks they are fine because they are just opaque when read
back, but the user says "what the fuck did a reboot do to all these
timestamps?".
Hence your example of roll-over dates is a strawman - you've
constructed a problem that is irrelevant to the issue being pointed
out.
FWIW, we already have code in the superblock and VFS to avoid such
problems on filesystems with limited timestamp resolution (i.e.
s_time_gran and current_fs_time()) so that what the VFS hands the
filesystem is exactly what the VFS expects to get back from disk
when comparing timestamps.
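As an illustration of that granularity handling, here is a user-space sketch of the kind of truncation applied before a timestamp reaches the filesystem. trunc_to_gran() is a hypothetical stand-in written for this example, not the kernel's own helper; its gran argument plays the role of s_time_gran, in nanoseconds:

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Round a timestamp down to a multiple of gran nanoseconds, so that
 * the in-memory value matches what the on-disk format can hold.
 * gran <= 1 means full nanosecond resolution (nothing to do);
 * gran >= NSEC_PER_SEC means whole seconds only.
 */
static struct timespec trunc_to_gran(struct timespec t, long gran)
{
	if (gran >= NSEC_PER_SEC)
		t.tv_nsec = 0;
	else if (gran > 1)
		t.tv_nsec -= t.tv_nsec % gran;
	return t;
}
```

The point of the sketch is that the truncation happens once, before the value is handed to the filesystem, so a later comparison against the value read back from disk sees identical timestamps.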
If we are changing the in-kernel timestamp to have a greater dynamic
range than anything we currently support on disk, then we need support
for all filesystems for similar translation and constraint. The
filesystems need to be able to tell the kernel what timestamp
range they support, and then the kernel needs to follow those
guidelines. And if the filesystem is mounted on a kernel that
doesn't support the current filesystem's timestamp format, then at
minimum that filesystem cannot do anything that writes a
timestamp....
Put simply: the filesystem defines the timestamp range that can be
used safely, not the userspace API. If the filesystem can't support
the date it is handed then that is an out-of-range error. Since
when have we accepted that it's OK to handle out-of-range data with
silent overflows or corruption of the data that we are attempting to
store? We're defining a new API to support a wider date range -
there is nothing that prevents us from saying ERANGE can be returned
for a timestamp that the filesystem cannot store correctly....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 5:54 ` Dave Chinner
@ 2014-05-31 8:41 ` H. Peter Anvin
2014-05-31 15:46 ` Nicolas Pitre
2014-06-01 0:39 ` Dave Chinner
2014-06-02 14:00 ` Joseph S. Myers
1 sibling, 2 replies; 71+ messages in thread
From: H. Peter Anvin @ 2014-05-31 8:41 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, Arnd Bergmann, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph
On 05/30/2014 10:54 PM, Dave Chinner wrote:
>
> If we are changing the in-kernel timestamp to have a greater dynamic
> range than anything we currently support on disk, then we need support
> for all filesystems for similar translation and constraint. The
> filesystems need to be able to tell the kernel what timestamp
> range they support, and then the kernel needs to follow those
> guidelines. And if the filesystem is mounted on a kernel that
> doesn't support the current filesystem's timestamp format, then at
> minimum that filesystem cannot do anything that writes a
> timestamp....
>
> Put simply: the filesystem defines the timestamp range that can be
> used safely, not the userspace API. If the filesystem can't support
> the date it is handed then that is an out-of-range error. Since
> when have we accepted that it's OK to handle out-of-range data with
> silent overflows or corruption of the data that we are attempting to
> store? We're defining a new API to support a wider date range -
> there is nothing that prevents us from saying ERANGE can be returned
> for a timestamp that the filesystem cannot store correctly....
>
I'm still puzzled.
Are you saying that you want a program that does:
/* Deliberately simplified */
gettimeofdayns(&now ...);
utimensat(... now);
... to suddenly start failing on Jan 19, 2038 (for a filesystem with
32-bit timestamps), or would you propose some ways for the filesystems
in question to extend the range of the timestamps?
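Spelled out as something that compiles, assuming clock_gettime(CLOCK_REALTIME) in place of the shorthand gettimeofdayns() above. touch_now() is a name made up for this sketch, and the -errno return convention is an assumption of the example:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>	/* AT_FDCWD */
#include <sys/stat.h>	/* utimensat() */
#include <time.h>	/* clock_gettime() */

/*
 * Set both atime and mtime of path to the current wall-clock time.
 * Returns 0 on success or a negative errno value, so any error the
 * kernel or filesystem reports for the timestamp reaches the caller.
 */
static int touch_now(const char *path)
{
	struct timespec now;
	struct timespec times[2];

	if (clock_gettime(CLOCK_REALTIME, &now) != 0)
		return -errno;

	times[0] = now;	/* access time */
	times[1] = now;	/* modification time */

	if (utimensat(AT_FDCWD, path, times, 0) != 0)
		return -errno;
	return 0;
}
```

This is the call that would begin to return an error if a filesystem rejected timestamps it cannot represent.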
What you seem to propose also seems to imply that on Jan 19, 2038
anything that writes a timestamp with the current date (which logically
ends up being almost every write operation) would be dead and frozen on
such a filesystem -- pretty much meaning the filesystem would become
readonly, if not in reality then in practice.
I strongly suspect that that would be a more catastrophic failure than
incorrect timestamps, as you suddenly have all kinds of machines
embedded in $DEITY knows what places just stop and refuse to run.
If that is not what you mean, I would genuinely like to understand the
situation better.
-hpa
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-05-30 20:01 [RFC 00/32] making inode time stamps y2038 ready Arnd Bergmann
2014-05-30 20:01 ` [RFC 11/32] xfs: convert to struct inode_time Arnd Bergmann
@ 2014-05-31 14:30 ` Vyacheslav Dubeyko
2014-06-03 12:21 ` Arnd Bergmann
2014-05-31 14:51 ` Richard Cochran
2014-06-02 13:52 ` Joseph S. Myers
3 siblings, 1 reply; 71+ messages in thread
From: Vyacheslav Dubeyko @ 2014-05-31 14:30 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, hpa, logfs, linux-afs, joseph, linux-arch,
linux-cifs, linux-scsi, ceph-devel, codalist, cluster-devel, coda,
geert, linux-ext4, fuse-devel, reiserfs-devel, xfs, john.stultz,
tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs
Hi Arnd,
On Fri, 2014-05-30 at 22:01 +0200, Arnd Bergmann wrote:
[snip]
>
> Arnd Bergmann (32):
> fs: introduce new 'struct inode_time'
> uapi: add struct __kernel_timespec{32,64}
> fs: introduce sys_utimens64at
> fs: introduce sys_newfstat64/sys_newfstatat64
> arch: hook up new stat and utimes syscalls
> isofs: fix timestamps beyond 2027
> fs/nfs: convert to struct inode_time
> fs/ceph: convert to 'struct inode_time'
> fs/pstore: convert to struct inode_time
> fs/coda: convert to struct inode_time
> xfs: convert to struct inode_time
> btrfs: convert to struct inode_time
> ext3: convert to struct inode_time
> ext4: convert to struct inode_time
> cifs: convert to struct inode_time
> ntfs: convert to struct inode_time
> ubifs: convert to struct inode_time
> ocfs2: convert to struct inode_time
> fs/fat: convert to struct inode_time
> afs: convert to struct inode_time
> udf: convert to struct inode_time
> fs: convert simple fs to inode_time
> logfs: convert to struct inode_time
> hfs, hfsplus: convert to struct inode_time
> gfs2: convert to struct inode_time
> reiserfs: convert to struct inode_time
> jffs2: convert to struct inode_time
> adfs: convert to struct inode_time
> f2fs: convert to struct inode_time
> fuse: convert to struct inode_time
> scsi: fnic: use current_kernel_time() for timestamp
> fs: use new inode_time definition unconditionally
>
By the way, what about NILFS2? Is NILFS2 ready for the suggested approach
without any changes?
Thanks,
Vyacheslav Dubeyko.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-05-30 20:01 [RFC 00/32] making inode time stamps y2038 ready Arnd Bergmann
2014-05-30 20:01 ` [RFC 11/32] xfs: convert to struct inode_time Arnd Bergmann
2014-05-31 14:30 ` [RFC 00/32] making inode time stamps y2038 ready Vyacheslav Dubeyko
@ 2014-05-31 14:51 ` Richard Cochran
[not found] ` <6347520.8jMPlVsFjM@wuerfel>
2014-06-02 13:52 ` Joseph S. Myers
3 siblings, 1 reply; 71+ messages in thread
From: Richard Cochran @ 2014-05-31 14:51 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, hpa, linux-f2fs-devel, ceph-devel, joseph,
linux-arch, linux-cifs, linux-scsi, codalist, cluster-devel, coda,
geert, linux-ext4, linux-afs, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-kernel, logfs, linux-btrfs, linux-fsdevel, lftan,
ocfs2-devel
On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote:
>
> I picked this because it is a fairly isolated problem, as the
> inode time stamps are rarely assigned to any other time values.
> As a byproduct of this work, I documented for each of the file
> systems we support how long the on-disk format can work[1].
Why are some of the time stamp expiration dates marked as "never"?
Thanks,
Richard
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 1:14 ` Dave Chinner
2014-05-31 1:22 ` H. Peter Anvin
@ 2014-05-31 15:37 ` Arnd Bergmann
2014-06-01 0:24 ` Dave Chinner
1 sibling, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-05-31 15:37 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Saturday 31 May 2014 11:14:50 Dave Chinner wrote:
> On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> > On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > >
> > > IOWs, the filesystem has to be able to reject any attempt to set a
> > > timestamp that is can't represent on disk otherwise Bad Stuff will
> > > happen,
> >
> > Actually it is questionable if it is worse to reject a timestamp or just
> > let it wrap. Rejecting a valid timestamp is a bit like "You don't
> > exist, go away."
>
> I think having the new systems calls being able to
> return EINVAL if the value cannot be stored permanently on disk
> correctly is the right thing to do. Having it silently mangled
> by the filesystem and returning "everything is just fine, trust me"
> is close to the worst solution I can think of. That's exactly what
> leads to overflow bugs occurring....
While going through the file systems, I was wondering whether
we should have the times stop at the end of each file system's
epoch rather than wrap around.
> > > and filesystems have to be able to specify in their on
> > > disk format what timestamp encoding is being used. The solution will
> > > be different for every filesystem that needs to support time beyond
> > > 2038.
> >
> > Actually the cutoff can be really different for each filesystem, not
> > necessarily 2038. However, I maintain the above still holds.
>
> Sure, but all filesystems are supposed to handle at least the
> current unix epoch.
In my list at http://kernelnewbies.org/y2038, I found that almost
all file systems handle times at least until 2106, because they treat
the on-disk value as unsigned on 64-bit systems, or they use
a completely different representation. My guess is that somebody
earlier spent a lot of work on making that happen.
The exceptions are:
* exofs uses signed values, which can probably be changed to be
consistent with the others.
* isofs has a bug that limits it until 2027 on architectures with
a signed 'char' type (otherwise it's 2155).
* udf can represent times for many thousands of years through a
16-bit year representation, but the code to convert to epoch
uses a const array that ends at 2038.
* afs uses signed seconds and can probably be fixed.
* coda relies on user space time representation getting passed
through an ioctl.
* I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
where they really use signed.
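To illustrate the 2038 versus 2106 distinction in the list above, here is a minimal sketch in C. It assumes a 32-bit on-disk seconds field; the helper names are made up for illustration and are not taken from any file system.

```c
#include <stdint.h>

/* Hypothetical helpers: widen a 32-bit on-disk seconds field to 64 bits. */

/*
 * Signed interpretation: covers roughly 1901..2038.
 * The cast to int32_t relies on the usual two's complement behaviour
 * for values above INT32_MAX (implementation-defined in ISO C).
 */
static int64_t disk_secs_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/* Unsigned interpretation: covers 1970..2106. */
static int64_t disk_secs_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```

Both helpers agree for every value up to 0x7fffffff; they only differ once the top bit is set, which is where the signed variant falls back to 1901 while the unsigned one keeps counting toward 2106.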
I was confused about XFS since I didn't notice that there are
separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
XFS to also use the 1970-2106 time range on 64-bit systems today.
If we are using the variant of my patch that extends
inode_time->tv_sec to s64, nothing should change for XFS
at all. The main difference is that if it gets extended
to wider on-disk timestamps, they will work the same way on
32-bit and 64-bit kernels.
Arnd
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 8:41 ` H. Peter Anvin
@ 2014-05-31 15:46 ` Nicolas Pitre
2014-06-01 19:56 ` Arnd Bergmann
2014-06-01 0:39 ` Dave Chinner
1 sibling, 1 reply; 71+ messages in thread
From: Nicolas Pitre @ 2014-05-31 15:46 UTC (permalink / raw)
To: H. Peter Anvin
Cc: linux-arch, Arnd Bergmann, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph
On Sat, 31 May 2014, H. Peter Anvin wrote:
> On 05/30/2014 10:54 PM, Dave Chinner wrote:
> >
> > If we are changing the in-kernel timestamp to have a greater dynamic
> > range that anything we current support on disk, then we need support
> > for all filesystems for similar translation and constraint. The
> > filesystems need to be able to tell the kernel what they timestamp
> > range they support, and then the kernel needs to follow those
> > guidelines. And if the filesystem is mounted on a kernel that
> > doesn't support the current filesystem's timestamp format, then at
> > minimum that filesystem cannot do anything that writes a
> > timestamp....
> >
> > Put simply: the filesystem defines the timestamp range that can be
> > used safely, not the userspace API. If the filesystem can't support
> > the date it is handed then that is an out-of-range error. Since
> > when have we accepted that it's OK to handle out-of-range data with
> > silent overflows or corruption of the data that we are attempting to
> > store? We're defining a new API to support a wider date range -
> > there is nothing that prevents us from saying ERANGE can be returned
> > to a timestamp that the file cannot store correctly....
> >
>
> I'm still puzzled.
>
> Are you saying that you want a program that does:
>
> /* Deliberately simplified */
> gettimeofdayns(&now ...);
> utimensat(... now);
>
> ... to suddenly start failing on Jan 19, 2038 (for a filesystem with
> 32-bit timestamps), or would you propose some ways for the filesystems
> in question to extend the range of the timestamps?
>
> What you seem to propose also seems to imply that on Jan 19, 2038
> anything that writes a timestamp with the current date (which logically
> ends up being almost every write operation) would be dead and frozen on
> such a filesystem -- pretty much meaning the filesystem would become
> readonly if not in reality than in practice.
For those (legacy) filesystems with signed 32-bit timestamps, any
attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
(silently) clamped to 0x7fffffff and that value (the last representable
time) used as an overflow indicator. The filesystem driver should
convert that value into a corresponding overflow value for whatever
kernel internal time representation is being used when read back, and this
should be propagated up to user space. It should not be a hard error;
otherwise, as you rightly stated, everything non read-only would come
to a halt on that day.
Inside the kernel, the overflow indicator could be as simple as
dedicating one of the top bits in a 64-bit time_t value in order to still
transmit the overflow limit. For example, in the above case, we could
use 0x40000000-7fffffff to indicate the actual time is unavailable due
to the filesystem's time representation being overflowed from
0x7fffffff.
If, for example, a filesystem cannot represent timestamps from Jan 1
00:00:00 2100 UTC onward, then the overflow representation for this
particular filesystem would be 0x40000000-f48656ff.
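Roughly, the clamp-and-mark scheme could be sketched in C as follows, for the signed 32-bit case. All names here are hypothetical and only meant to show the encoding; the choice of bit 62 as the marker follows the 0x40000000-7fffffff example, and the clamp at the lower end is an extra assumption not discussed above.

```c
#include <stdint.h>

/* Last representable on-disk value, reserved as the overflow marker. */
#define FS_SECS_MAX	INT32_MAX		/* 0x7fffffff */
/* Hypothetical in-kernel overflow flag: 0x40000000 in the upper 32 bits. */
#define TIME_OVERFLOW	((int64_t)1 << 62)

/* Write path: clamp anything the disk format cannot hold. */
static int32_t fs_encode_secs(int64_t t)
{
	if (t >= FS_SECS_MAX)
		return FS_SECS_MAX;	/* silently clamped */
	if (t < INT32_MIN)
		return INT32_MIN;	/* assumption: clamp at the low end too */
	return (int32_t)t;
}

/* Read path: turn the marker back into an overflow-flagged 64-bit value. */
static int64_t fs_decode_secs(int32_t raw)
{
	if (raw == FS_SECS_MAX)
		return TIME_OVERFLOW | (int64_t)raw;
	return raw;
}
```

With this, fs_decode_secs(fs_encode_secs(t)) returns t unchanged for any in-range time, and 0x400000007fffffff for anything that had to be clamped.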
Those syscalls with a 32-bit time_t would be returned 0x7fffffff
whenever there is an overflow being signaled. Whether 64-bit
overflow-marked time_t values, when passed to user space, should clear
the overflow bit, or use a unique time_t overflow value, could be
decided and even changed later after discussion with glibc people for
example.
Hard errors should be signaled to user space, and the actual operation
aborted, only with the presence of a new flag passed to the kernel.
By default, however, things should "just work" as much as possible,
albeit with the "wrong" (i.e. clamped) time being saved on disk.
Nicolas
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
[not found] ` <6347520.8jMPlVsFjM@wuerfel>
@ 2014-05-31 16:20 ` Geert Uytterhoeven
2014-05-31 18:22 ` Richard Cochran
2014-06-01 4:44 ` Richard Cochran
2 siblings, 0 replies; 71+ messages in thread
From: Geert Uytterhoeven @ 2014-05-31 16:20 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Christoph Hellwig, MTD Maling List, H. Peter Anvin,
linux-f2fs-devel, ceph-devel, Joseph S. Myers, Linux-Arch,
linux-cifs, scsi, codalist, cluster-devel, coda,
linux-ext4@vger.kernel.org, linux-afs, fuse-devel,
Richard Cochran, reiserfs-devel, xfs, John Stultz,
Thomas Gleixner, open list:NFS, SUNRPC, AND..., linux-ntfs-dev,
samba-technical, linux-kernel@vger.kernel.org, logfs, linux-btrfs,
Linux FS Devel, Ley Foon Tan, ocfs2-devel
On Sat, May 31, 2014 at 5:23 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Saturday 31 May 2014 16:51:15 Richard Cochran wrote:
>> On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote:
>> > I picked this because it is a fairly isolated problem, as the
>> > inode time stamps are rarely assigned to any other time values.
>> > As a byproduct of this work, I documented for each of the file
>> > systems we support how long the on-disk format can work[1].
>>
>> Why are some of the time stamp expiration dates marked as "never"?
>
> It's an approximation:
> with 64-bit timestamps, you can represent close to 300 billion
> years, which is way past the time that our planet can sustain
> life of any form[1].
FWIW, the 48-bit seconds limit of befs, marked "never", is reached sooner
than the 32-bit days limit of affs, marked as Y11760870.
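A quick back-of-the-envelope check in C, assuming plain 365-day years and ignoring each file system's actual epoch offset (so the numbers are only approximate and will not match the table exactly):

```c
#include <stdint.h>

#define DAYS_PER_YEAR	365ULL
#define SECS_PER_DAY	(3600ULL * 24)

/* Whole years covered by a 48-bit count of seconds (the befs case). */
static uint64_t years_in_48bit_seconds(void)
{
	return (1ULL << 48) / (SECS_PER_DAY * DAYS_PER_YEAR);
}

/* Whole years covered by a 32-bit count of days (the affs case). */
static uint64_t years_in_32bit_days(void)
{
	return (1ULL << 32) / DAYS_PER_YEAR;
}
```

The first comes out at a bit under 9 million years, the second at a bit under 12 million, so the ordering above holds.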
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
[not found] ` <6347520.8jMPlVsFjM@wuerfel>
2014-05-31 16:20 ` Geert Uytterhoeven
@ 2014-05-31 18:22 ` Richard Cochran
2014-05-31 19:34 ` H. Peter Anvin
2014-06-01 4:44 ` Richard Cochran
2 siblings, 1 reply; 71+ messages in thread
From: Richard Cochran @ 2014-05-31 18:22 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, hpa, linux-f2fs-devel, ceph-devel, joseph,
linux-arch, linux-cifs, linux-scsi, codalist, cluster-devel, coda,
geert, linux-ext4, linux-afs, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-kernel, logfs, linux-btrfs, linux-fsdevel, lftan,
ocfs2-devel
On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
>
> It's an approximation:
(Approximately never ;)
> with 64-bit timestamps, you can represent close to 300 billion
> years, which is way past the time that our planet can sustain
> life of any form[1].
Did you mean 64 bits worth of seconds?
2^64 / (3600*24*365) = 584,942,417,355
That is more than 300 billion years, and still, it is not quite the
same as "never".
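For reference, the arithmetic can be checked with a few lines of C, again assuming 365-day years. The helper names are just for illustration. A signed 64-bit count covers about half the unsigned figure after the epoch, which would be consistent with the "close to 300 billion" estimate if signed seconds were meant.

```c
#include <stdint.h>

/* 365-day years, as in the estimate above. */
#define SECS_PER_YEAR	(3600ULL * 24 * 365)

/* Whole years covered by an unsigned 64-bit count of seconds. */
static uint64_t years_unsigned64(void)
{
	return UINT64_MAX / SECS_PER_YEAR;
}

/* Whole years after the epoch covered by a signed 64-bit count. */
static uint64_t years_signed64(void)
{
	return (uint64_t)INT64_MAX / SECS_PER_YEAR;
}
```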
In any case, that term is not too helpful in the comparison table,
IMHO. One could think that some sort of clever running count relative
to the last mount time was implied.
Thanks,
Richard
[1] You are forgetting the immortal robotic overlords.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-05-31 18:22 ` Richard Cochran
@ 2014-05-31 19:34 ` H. Peter Anvin
2014-06-01 4:46 ` Richard Cochran
0 siblings, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-05-31 19:34 UTC (permalink / raw)
To: Richard Cochran, Arnd Bergmann
Cc: hch, linux-mtd, linux-f2fs-devel, ceph-devel, joseph, linux-arch,
linux-cifs, linux-scsi, linux-afs, cluster-devel, coda, geert,
linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-kernel, logfs, linux-btrfs, linux-fsdevel, lftan,
ocfs2-devel
Typically they are using 64-bit signed seconds.
On May 31, 2014 11:22:37 AM PDT, Richard Cochran <richardcochran@gmail.com> wrote:
>On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
>>
>> It's an approximation:
>
>(Approximately never ;)
>
>> with 64-bit timestamps, you can represent close to 300 billion
>> years, which is way past the time that our planet can sustain
>> life of any form[1].
>
>Did you mean mean 64 bits worth of seconds?
>
> 2^64 / (3600*24*365) = 584,942,417,355
>
>That is more than 300 billion years, and still, it is not quite the
>same as "never".
>
>In any case, that term is not too helpful in the comparison table,
>IMHO. One could think that some sort of clever running count relative
>to the last mount time was implied.
>
>Thanks,
>Richard
>
>[1] You are forgetting the immortal robotic overlords.
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 15:37 ` Arnd Bergmann
@ 2014-06-01 0:24 ` Dave Chinner
2014-06-02 0:28 ` Dave Chinner
0 siblings, 1 reply; 71+ messages in thread
From: Dave Chinner @ 2014-06-01 0:24 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> On Saturday 31 May 2014 11:14:50 Dave Chinner wrote:
> > On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> > > On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > > >
> > > > IOWs, the filesystem has to be able to reject any attempt to set a
> > > > timestamp that is can't represent on disk otherwise Bad Stuff will
> > > > happen,
> > >
> > > Actually it is questionable if it is worse to reject a timestamp or just
> > > let it wrap. Rejecting a valid timestamp is a bit like "You don't
> > > exist, go away."
> >
> > I think having the new systems calls being able to
> > return EINVAL if the value cannot be stored permanently on disk
> > correctly is the right thing to do. Having it silently mangled
> > by the filesystem and returning "everything is just fine, trust me"
> > is close to the worst solution I can think of. That's exactly what
> > leads to overflow bugs occurring....
>
> While going through the file systems, I was wondering whether
> we should have the times stop at the end of each file systems
> epoch rather than wrap around.
>
> > > > and filesystems have to be able to specify in their on
> > > > disk format what timestamp encoding is being used. The solution will
> > > > be different for every filesystem that needs to support time beyond
> > > > 2038.
> > >
> > > Actually the cutoff can be really different for each filesystem, not
> > > necessarily 2038. However, I maintain the above still holds.
> >
> > Sure, but all filesystems are supposed to handle at least the
> > current unix epoch.
>
> In my list at http://kernelnewbies.org/y2038, I found that almost
> all file systems at least times until 2106, because they treat
> the on-disk value as unsigned on 64-bit systems, or they use
> a completely different representation. My guess is that somebody
> earlier spent a lot of work on making that happen.
>
> The exceptions are:
>
> * exofs uses signed values, which can probably be changed to be
> consistent with the others.
> * isofs has a bug that limits it until 2027 on architectures with
> a signed 'char' type (otherwise it's 2155).
> * udf can represent times for many thousands of years through a
> 16-bit year representation, but the code to convert to epoch
> uses a const array that ends at 2038.
> * afs uses signed seconds and can probably be fixed
> * coda relies on user space time representation getting passed
> through an ioctl.
> * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> where they really use signed.
>
> I was confused about XFS since I didn't noticed that there are
> separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> XFS to also use the 1970-2106 time range on 64-bit systems today.
You've missed an awful lot more than just the implications for the
core kernel code.
There's a good chance such changes propagate to APIs elsewhere in
the filesystems, because something you haven't realised is that XFS
effectively exposes the on-disk timestamp format directly to
userspace via the bulkstat interface (see struct xfs_bstat). It also
affects the XFS open-by-handle ioctl and the swap extent ioctl used
by the online defragmenter.
IOWs, if we are changing the on-disk timestamp format then this
affects several ioctl()s and hence quite a few of the XFS userspace
utilities. The hardest to fix will be xfsdump which would need a new
dump format to store the extended timestamp ranges, and then
xfs_restore will need to be able to handle restoring such timestamps
on filesystems that don't have extended timestamp support...
Put simply, changing the structure of system time isn't as
straightforward as changing the kernel structures. System time gets stored
permanently, and that has a cascade effect through the kernel all
the way to all of the filesystem utilities that know about that permanent
storage in some way....
So yes, you can change the kernel definition, but until the
permanent storage of system time can be extended to support the same
range as the kernel the *system* will still have nasty, silent epoch
overflow, truncation or corruption issues.
> If we are using the variant of my patch that extends
> indode_time->tv_sec to s64, nothing should change for XFS
> at all, the main difference is that we if it gets extended
> to wider on-disk timestamps, they will work the same way on
> 32-bit and 64-bit kernels.
"nothing should change" except for the fact that a 64 bit timestamp
gets silently truncated to 32 bits and the timestamp is not what the
user expects it to be. The user does not find out until the inode
passes out of cache and is re-read from disk, and then it's wrong.
To put it politely: that is broken, obnoxious behaviour and we don't
design new interfaces with such ugly warts anymore. Define an
EOVERFLOW, EINVAL or ERANGE error in the new syscalls to handle this
case and *hard fail* if the storage cannot support the extended
timestamp being passed in. There is no excuse for silently mangling
out-of-range data, especially as we have plenty of time to add
support to the filesystems so that such errors don't occur. It might
take us a year to implement, but it will be done long before the
epoch overflows.
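A minimal sketch of what such a hard-fail check could look like, assuming each filesystem advertises the range of seconds it can store on disk. The struct and function names are hypothetical, not existing kernel interfaces, and ERANGE is just one of the candidate error codes mentioned above:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical description of what a filesystem can store on disk. */
struct fs_time_range {
	int64_t min_secs;	/* earliest representable time */
	int64_t max_secs;	/* latest representable time */
};

/*
 * Return 0 if the timestamp fits the on-disk format, -ERANGE otherwise.
 * No clamping and no truncation: the caller is expected to fail the
 * syscall rather than store a mangled value.
 */
static int fs_check_time(const struct fs_time_range *r, int64_t secs)
{
	if (secs < r->min_secs || secs > r->max_secs)
		return -ERANGE;
	return 0;
}
```

For a filesystem with signed 32-bit seconds the range would be { INT32_MIN, INT32_MAX }, and the check would start failing for any timestamp from 2^31 seconds onward.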
And, FWIW, this patchset needs a set of regression tests that ensure
timestamps beyond 2038 and 2106 don't change across unmount/mount.
Written for xfstests, preferably, so that it's run as part of every
filesystem developer's daily workflow. This is the only way we are
going to ensure that the filesystem and VFS code works correctly and
continues to work correctly up to the end of the current epoch....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 8:41 ` H. Peter Anvin
2014-05-31 15:46 ` Nicolas Pitre
@ 2014-06-01 0:39 ` Dave Chinner
1 sibling, 0 replies; 71+ messages in thread
From: Dave Chinner @ 2014-06-01 0:39 UTC (permalink / raw)
To: H. Peter Anvin
Cc: linux-arch, Arnd Bergmann, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph
On Sat, May 31, 2014 at 01:41:56AM -0700, H. Peter Anvin wrote:
> On 05/30/2014 10:54 PM, Dave Chinner wrote:
> >
> > If we are changing the in-kernel timestamp to have a greater dynamic
> > range that anything we current support on disk, then we need support
> > for all filesystems for similar translation and constraint. The
> > filesystems need to be able to tell the kernel what they timestamp
> > range they support, and then the kernel needs to follow those
> > guidelines. And if the filesystem is mounted on a kernel that
> > doesn't support the current filesystem's timestamp format, then at
> > minimum that filesystem cannot do anything that writes a
> > timestamp....
> >
> > Put simply: the filesystem defines the timestamp range that can be
> > used safely, not the userspace API. If the filesystem can't support
> > the date it is handed then that is an out-of-range error. Since
> > when have we accepted that it's OK to handle out-of-range data with
> > silent overflows or corruption of the data that we are attempting to
> > store? We're defining a new API to support a wider date range -
> > there is nothing that prevents us from saying ERANGE can be returned
> > to a timestamp that the file cannot store correctly....
> >
>
> I'm still puzzled.
>
> Are you saying that you want a program that does:
>
> /* Deliberately simplified */
> gettimeofdayns(&now ...);
> utimensat(... now);
>
> ... to suddenly start failing on Jan 19, 2038 (for a filesystem with
> 32-bit timestamps),
Yes. Hard fail so overflows are in your face and we know exactly
what is going to cause silent timestamp screwups when the epoch
> or would you propose some ways for the filesystems
> in question to extend the range of the timestamps?
Filesystems are going to have to change their on-disk formats, so
we'd do that just like we do every other on-disk format change. With
feature bits and translation layers, new ioctl structures, etc.
Depending on the amount of work necessary, some filesystems could do
this in 3.16, others it might be 3.20 before everything is sorted
out across the kernel and userspace code...
Either way, the hard fail problem goes away as each filesystem is
converted. Further, if we have regression tests then new filesystems
are guaranteed to be designed to handle 2038 epoch rollover, and so
in a year or two this "hard fail" is effectively a non-problem. If
someone breaks something in future, then we'll know about it pretty
quickly.
> What you seem to propose also seems to imply that on Jan 19, 2038
> anything that writes a timestamp with the current date (which logically
> ends up being almost every write operation) would be dead and frozen on
> such a filesystem -- pretty much meaning the filesystem would become
> readonly if not in reality than in practice.
Yup. If we can't do what the user wants without the user thinking
corruption has occurred, then the only thing we are left with is
"shut down the filesystem" error handling. Kind of like using BUG()
rather than returning an error. That's why we need to be able to
hard fail and return an error.
However, we've got 20+ years to fix our current filesystems and all
their support code to ensure this doesn't happen. In the mean time,
having stuff hard fail is a great way to ensure that filesystems get
fixed sooner rather than later...
> I strongly suspect that that would be a more catastrophic failure than
> incorrect timestamps, as you suddenly have all kinds of machines
> embedded in $DEITY knows what places just stop and refuse to run.
Yup, that's a great way of flushing out problems 20 years before
they really matter.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
[not found] ` <6347520.8jMPlVsFjM@wuerfel>
2014-05-31 16:20 ` Geert Uytterhoeven
2014-05-31 18:22 ` Richard Cochran
@ 2014-06-01 4:44 ` Richard Cochran
2 siblings, 0 replies; 71+ messages in thread
From: Richard Cochran @ 2014-06-01 4:44 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, hpa, linux-f2fs-devel, ceph-devel, joseph,
linux-arch, linux-cifs, linux-scsi, codalist, cluster-devel, coda,
geert, linux-ext4, linux-afs, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-kernel, logfs, linux-btrfs, linux-fsdevel, lftan,
ocfs2-devel
On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
> On Saturday 31 May 2014 16:51:15 Richard Cochran wrote:
> >
> > Why are some of the time stamp expiration dates marked as "never"?
>
> It's an approximation:
Also, the term "never" might mean using arbitrarily long integers
as in ASN.1.
Thanks,
Richard
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-05-31 19:34 ` H. Peter Anvin
@ 2014-06-01 4:46 ` Richard Cochran
0 siblings, 0 replies; 71+ messages in thread
From: Richard Cochran @ 2014-06-01 4:46 UTC (permalink / raw)
To: H. Peter Anvin
Cc: hch, linux-mtd, linux-f2fs-devel, ceph-devel, joseph, linux-arch,
linux-cifs, linux-scsi, linux-afs, cluster-devel, coda, geert,
linux-ext4, codalist, Arnd Bergmann, fuse-devel, reiserfs-devel,
xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev,
samba-technical, linux-kernel, logfs, linux-btrfs, linux-fsdevel,
lftan, ocfs2-devel
On Sat, May 31, 2014 at 12:34:12PM -0700, H. Peter Anvin wrote:
> Typically they are using 64-bit signed seconds.
Okay, that is what I wanted to know.
Thanks,
Richard
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 15:46 ` Nicolas Pitre
@ 2014-06-01 19:56 ` Arnd Bergmann
2014-06-01 20:26 ` H. Peter Anvin
2014-06-02 1:36 ` Nicolas Pitre
0 siblings, 2 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-01 19:56 UTC (permalink / raw)
To: Nicolas Pitre
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > readonly if not in reality than in practice.
>
> For those (legacy) filesystems with a signed 32-bit timestamps, any
> attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
> (silently) clamped to 0x7fffffff and that value (the last representable
> time) used as an overflow indicator. The filesystem driver should
> convert that value into a corresponding overflow value for whatever
> kernel internal time representation being used when read back, and this
> should be propagated up to user space. It should not be a hard error
> otherwise, as you rightfully stated, everything non read-only would come
> to a halt on that day.
I don't think there is much of a difference between not being able to
write at all and all newly written files having the same timestamp,
causing random things to break differently.
The clamp to the maximum supported time stamp sounds like a reasonable
choice for 'utimens' and related syscalls for the case of someone
setting an arbitrary future date beyond what the file system can
represent. Then again, I don't see a reason why that shouldn't just
cause an error to be returned.
For actually running kernels beyond 2038, the best idea I've seen so
far is to disallow all broken code at compile time. I don't see
a choice but to audit the entire kernel for invalid uses on both
32 and 64 bit in the next few years. A lot of code will get changed
in the process so we can actually keep running 32-bit kernels and
file systems, but other code will likely go away:
* any system calls that pass a time_t, timeval or timespec on
32-bit systems return -ENOSYS, to ensure all user land uses
the replacements we will put into place
* The definition of 'time_t', 'timeval' and 'timespec' can be hidden
from the kernel, and all code using it left out.
* ext2 and ext3 file system code will have to be disabled, but that's
fine since ext4 can mount old file systems.
* until xfs gets extended, we can also disable it at build time.
For most users, we probably want to leave all that enabled by
default until we get much closer to 2038, but a compile time
option should allow us to test what works or doesn't, and it
can be set by embedded developers that want to ensure their
code keeps running for the next few decades.
Arnd
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-01 19:56 ` Arnd Bergmann
@ 2014-06-01 20:26 ` H. Peter Anvin
2014-06-02 11:02 ` Arnd Bergmann
2014-06-02 1:36 ` Nicolas Pitre
1 sibling, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-06-01 20:26 UTC (permalink / raw)
To: Arnd Bergmann, Nicolas Pitre
Cc: linux-arch, linux-kernel, xfs, hch, john.stultz, lftan,
linux-fsdevel, geert, tglx, joseph
Perhaps we should make this a kernel command line option instead, with
two kinds of setting: error out on dates outside the standard window, or
give a date indicating the earliest time that should be recognized and
do windowing relative to it (0 for no windowing, 1970 for retconning the
Unix epoch as unsigned...)
But again, the kernel is probably the least problem here...
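One way to picture the windowing idea above is the following small C
sketch. It is an illustration under assumptions, not a proposal for the
actual option: the function names are made up, and the window start is
given here in seconds since 1970 rather than as a date. A base of
-2^31 corresponds to the standard signed window, and a base of 0 treats
the 32-bit on-disk value as unsigned.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Decode a 32-bit on-disk value into the 2^32-second window that starts
 * at 'base'. The result t satisfies base <= t < base + 2^32 and has the
 * same low 32 bits as 'raw'.
 */
static int64_t sketch_window_decode(uint32_t raw, int64_t base)
{
	/* modular distance from the start of the window */
	uint32_t off = raw - (uint32_t)base;

	return base + (int64_t)off;
}

/*
 * Encode a 64-bit time for storage. Times outside the window are
 * rejected; -ERANGE is an arbitrary choice for this sketch.
 */
static int sketch_window_encode(int64_t t, int64_t base, uint32_t *raw)
{
	if (t < base || t - base > UINT32_MAX)
		return -ERANGE;
	*raw = (uint32_t)t;	/* keep the low 32 bits */
	return 0;
}
```

With base 0, the raw value 0x80000000 decodes to the first second after
the signed range ends; with base -2^31 the same raw value decodes to the
earliest signed time instead.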
On June 1, 2014 12:56:52 PM PDT, Arnd Bergmann <arnd@arndb.de> wrote:
>On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
>> > readonly if not in reality then in practice.
>>
>> For those (legacy) filesystems with signed 32-bit timestamps, any
>> attempt to create a timestamp past Jan 19 03:14:07 2038 UTC should be
>> (silently) clamped to 0x7fffffff and that value (the last representable
>> time) used as an overflow indicator. The filesystem driver should
>> convert that value into a corresponding overflow value for whatever
>> kernel internal time representation being used when read back, and this
>> should be propagated up to user space. It should not be a hard error;
>> otherwise, as you rightfully stated, everything non read-only would come
>> to a halt on that day.
>
>I don't think there is much of a difference between not being able to
>write at all and all newly written files having the same timestamp,
>causing random things to break differently.
>
>The clamp to the maximum supported time stamp sounds like a reasonable
>choice for 'utimens' and related syscalls for the case of someone
>setting an arbitrary future date beyond what the file system can
>represent. Then again, I don't see a reason why that shouldn't just
>cause an error to be returned.
>
>For actually running kernels beyond 2038, the best idea I've seen so
>far is to disallow all broken code at compile time. I don't see
>a choice but to audit the entire kernel for invalid uses on both
>32 and 64 bit in the next few years. A lot of code will get changed
>in the process so we can actually keep running 32-bit kernels and
>file systems, but other code will likely go away:
>
>* any system calls that pass a time_t, timeval or timespec on
> 32-bit systems return -ENOSYS, to ensure all user land uses
> the replacements we will put into place
>* The definitions of 'time_t', 'timeval' and 'timespec' can be hidden
> from the kernel, and all code using them left out.
>* ext2 and ext3 file system code will have to be disabled, but that's
> fine since ext4 can mount old file systems.
>* until xfs gets extended, we can also disable it at build time.
>
>For most users, we probably want to leave all that enabled by
>default until we get much closer to 2038, but a compile time
>option should allow us to test what works or doesn't, and it
>can be set by embedded developers that want to ensure their
>code keeps running for the next few decades.
>
> Arnd
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-01 0:24 ` Dave Chinner
@ 2014-06-02 0:28 ` Dave Chinner
2014-06-02 11:35 ` Roger Willcocks
2014-06-02 11:43 ` Arnd Bergmann
0 siblings, 2 replies; 71+ messages in thread
From: Dave Chinner @ 2014-06-02 0:28 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > In my list at http://kernelnewbies.org/y2038, I found that almost
> > all file systems support times until at least 2106, because they treat
> > the on-disk value as unsigned on 64-bit systems, or they use
> > a completely different representation. My guess is that somebody
> > earlier spent a lot of work on making that happen.
> >
> > The exceptions are:
> >
> > * exofs uses signed values, which can probably be changed to be
> > consistent with the others.
> > * isofs has a bug that limits it until 2027 on architectures with
> > a signed 'char' type (otherwise it's 2155).
> > * udf can represent times for many thousands of years through a
> > 16-bit year representation, but the code to convert to epoch
> > uses a const array that ends at 2038.
> > * afs uses signed seconds and can probably be fixed
> > * coda relies on user space time representation getting passed
> > through an ioctl.
> > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > where they really use signed.
> >
> > I was confused about XFS since I didn't notice that there are
> > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > XFS to also use the 1970-2106 time range on 64-bit systems today.
>
> You've missed an awful lot more than just the implications for the
> core kernel code.
>
> There's a good chance such changes propagate to APIs elsewhere in
> the filesystems, because something you haven't realised is that XFS
> effectively exposes the on-disk timestamp format directly to
> userspace via the bulkstat interface (see struct xfs_bstat). It also
> affects the XFS open-by-handle ioctl and the swap extent ioctl used
> by the online defragmenter.
>
> IOWs, if we are changing the on-disk timestamp format then this
> affects several ioctl()s and hence quite a few of the XFS userspace
> utilities. The hardest to fix will be xfsdump which would need a new
> dump format to store the extended timestamp ranges, and then
> xfs_restore will need to be able to handle restoring such timestamps
> on filesystems that don't have extended timestamp support...
>
> Put simply, changing the structure of system time isn't as
> straightforward as changing the kernel structures. System time gets
> stored permanently, and that has a cascade effect through the kernel
> all the way to all of the filesystem utilities that know about that
> permanent storage in some way....
>
> So yes, you can change the kernel definition, but until the
> permanent storage of system time can be extended to support the same
> range as the kernel the *system* will still have nasty, silent epoch
> overflow, truncation or corruption issues.
Just to put that in context, here's the kernel patch to add extended
epoch support to XFS. It's completely untested as I haven't done any
userspace code changes to enable the feature. However, it should
give you an indication of how far the simple act of changing the
kernel time representation spreads through the filesystem. This does
not include any of the VFS infrastructure for specifying the range of
supported timestamps. It survives some smoke testing, but dies when
the online defragmenter starts using the bulkstat and swap extent
ioctls (the assert in xfs_inode_time_from_epoch() fires), so I
probably don't have that all sorted correctly yet...
To test extended epoch support, however, I need some fstests that
define and validate the behaviour of the new syscalls - until we get
those we can't validate that the filesystem follows the spec
properly. I also suspect we are going to need an interface to query
the supported range of timestamps from a filesystem so that we can
test boundary conditions in an automated fashion....
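For readers who want a feel for the seconds+epoch split before reading
the patch, here is a small user-space sketch in C. It is not what the
patch implements (the helpers there are still stubs that only accept
epoch 0); it assumes, purely for illustration, that one epoch is 2^31
seconds and that only times from 1970 onwards are stored. Under that
assumption 256 epochs cover 2^39 seconds, roughly 17,400 years, which is
in line with the "around 19,000 A.D." estimate in the changelog. All
sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: one epoch is 2^31 seconds (~68 years). */
#define SKETCH_EPOCH_SHIFT	31
#define SKETCH_EPOCH_MASK	((INT64_C(1) << SKETCH_EPOCH_SHIFT) - 1)

/* True if t fits in an 8-bit epoch plus 31 bits of seconds. */
static inline int sketch_time_in_range(int64_t t)
{
	return t >= 0 && t < (INT64_C(256) << SKETCH_EPOCH_SHIFT);
}

/* Epoch counter to store next to the 32-bit seconds field. */
static inline uint8_t sketch_timestamp_epoch(int64_t t)
{
	assert(sketch_time_in_range(t));
	return (uint8_t)(t >> SKETCH_EPOCH_SHIFT);
}

/* Seconds within the current epoch; always non-negative here. */
static inline int32_t sketch_timestamp_sec(int64_t t)
{
	assert(sketch_time_in_range(t));
	return (int32_t)(t & SKETCH_EPOCH_MASK);
}

/* Recombine the two stored fields; note the (epoch, seconds) order. */
static inline int64_t sketch_time_from_epoch(uint8_t epoch, int32_t seconds)
{
	return ((int64_t)epoch << SKETCH_EPOCH_SHIFT) + seconds;
}
```

A boundary test of the kind mentioned above would then check values such
as 2^31 - 1 (epoch 0) and 2^31 (epoch 1, seconds 0), plus the first
value that no longer fits.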
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
xfs: support timestamps beyond Unix epochs
From: Dave Chinner <dchinner@redhat.com>
The 32 bit second counters in timestamps are too small to represent
time beyond the unix epoch (jan 2038) correctly. Extend the on-disk
format for a timestamp to include an 8-bit epoch counter so that we
can extend time for up to 255 Unix epochs. This should be good for
representing timestamps from 1970 to somewhere around 19,000 A.D....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/time.h | 7 ------
fs/xfs/xfs_bmap_util.c | 35 +++++++++++++++++-----------
fs/xfs/xfs_dinode.h | 48 ++++++++++++++++++++++++++++++++++++++-
fs/xfs/xfs_fs.h | 9 +++++++-
fs/xfs/xfs_fsops.c | 5 +++-
fs/xfs/xfs_inode.c | 16 ++++++++++---
fs/xfs/xfs_inode_buf.c | 8 +++++++
fs/xfs/xfs_ioctl32.c | 3 +++
fs/xfs/xfs_ioctl32.h | 5 +++-
fs/xfs/xfs_iops.c | 59 +++++++++++++++++++++++++++++++-----------------
fs/xfs/xfs_itable.c | 12 ++++++++++
fs/xfs/xfs_log_format.h | 4 ++++
fs/xfs/xfs_sb.h | 12 +++++++++-
fs/xfs/xfs_trans_inode.c | 2 +-
14 files changed, 175 insertions(+), 50 deletions(-)
diff --git a/fs/xfs/time.h b/fs/xfs/time.h
index 387e695..9f38d60 100644
--- a/fs/xfs/time.h
+++ b/fs/xfs/time.h
@@ -21,16 +21,9 @@
#include <linux/sched.h>
#include <linux/time.h>
-typedef struct timespec timespec_t;
-
static inline void delay(long ticks)
{
schedule_timeout_uninterruptible(ticks);
}
-static inline void nanotime(struct timespec *tvp)
-{
- *tvp = CURRENT_TIME;
-}
-
#endif /* __XFS_SUPPORT_TIME_H__ */
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 703b3ec..dbc9a74 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1686,6 +1686,7 @@ xfs_swap_extents(
int aforkblks = 0;
int taforkblks = 0;
__uint64_t tmp;
+ struct timespec tv;
tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
if (!tempifp) {
@@ -1746,25 +1747,33 @@ xfs_swap_extents(
}
/*
- * Compare the current change & modify times with that
- * passed in. If they differ, we abort this swap.
- * This is the mechanism used to ensure the calling
- * process that the file was not changed out from
+ * Compare the current change & modify times with that passed in. If
+ * they differ, we abort this swap. This is the mechanism used to
+ * ensure the calling process that the file was not changed out from
* under it.
*/
- if ((sbp->bs_ctime.tv_sec != VFS_I(ip)->i_ctime.tv_sec) ||
- (sbp->bs_ctime.tv_nsec != VFS_I(ip)->i_ctime.tv_nsec) ||
- (sbp->bs_mtime.tv_sec != VFS_I(ip)->i_mtime.tv_sec) ||
- (sbp->bs_mtime.tv_nsec != VFS_I(ip)->i_mtime.tv_nsec)) {
+ tv.tv_sec = xfs_inode_time_from_epoch(sbp->bs_ctime_epoch,
+ sbp->bs_ctime.tv_sec);
+ tv.tv_nsec = sbp->bs_ctime.tv_nsec;
+ if (timespec_compare(&tv, &VFS_I(ip)->i_ctime)) {
error = XFS_ERROR(EBUSY);
goto out_unlock;
}
- /* We need to fail if the file is memory mapped. Once we have tossed
- * all existing pages, the page fault will have no option
- * but to go to the filesystem for pages. By making the page fault call
- * vop_read (or write in the case of autogrow) they block on the iolock
- * until we have switched the extents.
+ tv.tv_sec = xfs_inode_time_from_epoch(sbp->bs_mtime_epoch,
+ sbp->bs_mtime.tv_sec);
+ tv.tv_nsec = sbp->bs_mtime.tv_nsec;
+ if (timespec_compare(&tv, &VFS_I(ip)->i_mtime)) {
+ error = XFS_ERROR(EBUSY);
+ goto out_unlock;
+ }
+
+ /*
+ * We need to fail if the file is memory mapped. Once we have tossed
+ * all existing pages, the page fault will have no option but to go to
+ * the filesystem for pages. By making the page fault call vop_read (or
+ * write in the case of autogrow) they block on the iolock until we have
+ * switched the extents.
*/
if (VN_MAPPED(VFS_I(ip))) {
error = XFS_ERROR(EBUSY);
diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h
index 623bbe8..79f94722 100644
--- a/fs/xfs/xfs_dinode.h
+++ b/fs/xfs/xfs_dinode.h
@@ -21,11 +21,53 @@
#define XFS_DINODE_MAGIC 0x494e /* 'IN' */
#define XFS_DINODE_GOOD_VERSION(v) ((v) >= 1 && (v) <= 3)
+/*
+ * Inode timestamps get more complex when we consider supporting times beyond
+ * the standard unix epoch of Jan 2038. The struct xfs_timestamp cannot support
+ * more than a single extension by playing sign games, and that is still not
+ * reliable. We also can't extend the timestamp structure because there is no
+ * free space around them in the on-disk inode.
+ *
+ * Hence the simplest thing to do is to add an epoch counter for each timestamp
+ * in the inode. This can be a single byte for each timestamp and make use of
+ * a hole we currently pad. This gives us another 255 epochs range for the
+ * timestamps, but requires a superblock feature bit to indicate that these
+ * fields have meaning and can be non-zero.
+ *
+ * Provide wrapper functions for converting the kernel inode time format to
+ * the on-disk fields. The nanosecond counter is unlikely to change in future,
+ * so it's mostly just for the second+epoch counter conversion.
+ */
typedef struct xfs_timestamp {
__be32 t_sec; /* timestamp seconds */
__be32 t_nsec; /* timestamp nanoseconds */
} xfs_timestamp_t;
+static inline __uint8_t
+xfs_timestamp_epoch(
+ struct timespec *time)
+{
+ /* will be zero until the extended struct inode_time is introduced */
+ return 0;
+}
+
+static inline __int32_t
+xfs_timestamp_sec(
+ struct timespec *time)
+{
+ return time->tv_sec;
+}
+
+static inline __kernel_time_t
+xfs_inode_time_from_epoch(
+ __uint8_t epoch,
+ __int32_t seconds)
+{
+ /* need to handle non-zero epoch when struct inode_time is introduced */
+ ASSERT(epoch == 0);
+ return seconds;
+}
+
/*
* On-disk inode structure.
*
@@ -54,7 +96,11 @@ typedef struct xfs_dinode {
__be32 di_nlink; /* number of links to file */
__be16 di_projid_lo; /* lower part of owner's project id */
__be16 di_projid_hi; /* higher part owner's project id */
- __u8 di_pad[6]; /* unused, zeroed space */
+ __u8 di_atime_epoch; /* access time epoch */
+ __u8 di_mtime_epoch; /* modify time epoch */
+ __u8 di_ctime_epoch; /* change time epoch */
+ __u8 di_crtime_epoch;/* create time epoch */
+ __u8 di_pad[2]; /* unused, zeroed space */
__be16 di_flushiter; /* incremented on flush */
xfs_timestamp_t di_atime; /* time last accessed */
xfs_timestamp_t di_mtime; /* time last modified */
diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
index d34703d..fb0a0ea 100644
--- a/fs/xfs/xfs_fs.h
+++ b/fs/xfs/xfs_fs.h
@@ -239,6 +239,7 @@ typedef struct xfs_fsop_resblks {
#define XFS_FSOP_GEOM_FLAGS_V5SB 0x8000 /* version 5 superblock */
#define XFS_FSOP_GEOM_FLAGS_FTYPE 0x10000 /* inode directory types */
#define XFS_FSOP_GEOM_FLAGS_FINOBT 0x20000 /* free inode btree */
+#define XFS_FSOP_GEOM_FLAGS_EPOCH 0x40000 /* timestamp epochs */
/*
* Minimum and maximum sizes need for growth checks.
@@ -280,6 +281,9 @@ typedef struct xfs_growfs_rt {
/*
* Structures returned from ioctl XFS_IOC_FSBULKSTAT & XFS_IOC_FSBULKSTAT_SINGLE
+ *
+ * Time epoch structures are only used if the XFS_FSOP_GEOM_FLAGS_EPOCH flag is
+ * asserted in the geometry output.
*/
typedef struct xfs_bstime {
time_t tv_sec; /* seconds */
@@ -307,7 +311,10 @@ typedef struct xfs_bstat {
#define bs_projid bs_projid_lo /* (previously just bs_projid) */
__u16 bs_forkoff; /* inode fork offset in bytes */
__u16 bs_projid_hi; /* higher part of project id */
- unsigned char bs_pad[10]; /* pad space, unused */
+ __u8 bs_atime_epoch; /* access time epoch */
+ __u8 bs_mtime_epoch; /* modify time epoch */
+ __u8 bs_ctime_epoch; /* change time epoch */
+ unsigned char bs_pad[7]; /* pad space, unused */
__u32 bs_dmevmask; /* DMIG event mask */
__u16 bs_dmstate; /* DMIG state info */
__u16 bs_aextents; /* attribute number of extents */
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index d229556..7b8db57 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -103,7 +103,10 @@ xfs_fs_geometry(
(xfs_sb_version_hasftype(&mp->m_sb) ?
XFS_FSOP_GEOM_FLAGS_FTYPE : 0) |
(xfs_sb_version_hasfinobt(&mp->m_sb) ?
- XFS_FSOP_GEOM_FLAGS_FINOBT : 0);
+ XFS_FSOP_GEOM_FLAGS_FINOBT : 0) |
+ (xfs_sb_version_hasepoch(&mp->m_sb) ?
+ XFS_FSOP_GEOM_FLAGS_EPOCH : 0);
+
geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
mp->m_sb.sb_logsectsize : BBSIZE;
geo->rtsectsize = mp->m_sb.sb_blocksize;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index a6115fe..eecae93 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -654,7 +654,8 @@ xfs_ialloc(
xfs_inode_t *ip;
uint flags;
int error;
- timespec_t tv;
+ struct timespec tv;
+ bool has_epoch;
/*
* Call the space management code to pick
@@ -720,12 +721,19 @@ xfs_ialloc(
ip->i_d.di_nextents = 0;
ASSERT(ip->i_d.di_nblocks == 0);
- nanotime(&tv);
- ip->i_d.di_mtime.t_sec = (__int32_t)tv.tv_sec;
+ has_epoch = xfs_sb_version_hasepoch(&mp->m_sb);
+ tv = current_fs_time(mp->m_super);
+ ip->i_d.di_mtime.t_sec = xfs_timestamp_sec(&tv);
ip->i_d.di_mtime.t_nsec = (__int32_t)tv.tv_nsec;
ip->i_d.di_atime = ip->i_d.di_mtime;
ip->i_d.di_ctime = ip->i_d.di_mtime;
+ if (has_epoch) {
+ ip->i_d.di_mtime_epoch = xfs_timestamp_epoch(&tv);
+ ip->i_d.di_atime_epoch = ip->i_d.di_mtime_epoch;
+ ip->i_d.di_ctime_epoch = ip->i_d.di_mtime_epoch;
+ }
+
/*
* di_gen will have been taken care of in xfs_iread.
*/
@@ -743,6 +751,8 @@ xfs_ialloc(
ip->i_d.di_flags2 = 0;
memset(&(ip->i_d.di_pad2[0]), 0, sizeof(ip->i_d.di_pad2));
ip->i_d.di_crtime = ip->i_d.di_mtime;
+ if (has_epoch)
+ ip->i_d.di_crtime_epoch = ip->i_d.di_mtime_epoch;
}
diff --git a/fs/xfs/xfs_inode_buf.c b/fs/xfs/xfs_inode_buf.c
index cb35ae4..0459e3d 100644
--- a/fs/xfs/xfs_inode_buf.c
+++ b/fs/xfs/xfs_inode_buf.c
@@ -208,6 +208,10 @@ xfs_dinode_from_disk(
to->di_nlink = be32_to_cpu(from->di_nlink);
to->di_projid_lo = be16_to_cpu(from->di_projid_lo);
to->di_projid_hi = be16_to_cpu(from->di_projid_hi);
+ to->di_atime_epoch = from->di_atime_epoch;
+ to->di_mtime_epoch = from->di_mtime_epoch;
+ to->di_ctime_epoch = from->di_ctime_epoch;
+ to->di_crtime_epoch = from->di_crtime_epoch;
memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
to->di_flushiter = be16_to_cpu(from->di_flushiter);
to->di_atime.t_sec = be32_to_cpu(from->di_atime.t_sec);
@@ -255,6 +259,10 @@ xfs_dinode_to_disk(
to->di_nlink = cpu_to_be32(from->di_nlink);
to->di_projid_lo = cpu_to_be16(from->di_projid_lo);
to->di_projid_hi = cpu_to_be16(from->di_projid_hi);
+ to->di_atime_epoch = from->di_atime_epoch;
+ to->di_mtime_epoch = from->di_mtime_epoch;
+ to->di_ctime_epoch = from->di_ctime_epoch;
+ to->di_crtime_epoch = from->di_crtime_epoch;
memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
to->di_atime.t_sec = cpu_to_be32(from->di_atime.t_sec);
to->di_atime.t_nsec = cpu_to_be32(from->di_atime.t_nsec);
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 944d5ba..215324f 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -161,6 +161,9 @@ xfs_ioctl32_bstat_copyin(
get_user(bstat->bs_gen, &bstat32->bs_gen) ||
get_user(bstat->bs_projid_lo, &bstat32->bs_projid_lo) ||
get_user(bstat->bs_projid_hi, &bstat32->bs_projid_hi) ||
+ get_user(bstat->bs_atime_epoch, &bstat32->bs_atime_epoch) ||
+ get_user(bstat->bs_mtime_epoch, &bstat32->bs_mtime_epoch) ||
+ get_user(bstat->bs_ctime_epoch, &bstat32->bs_ctime_epoch) ||
get_user(bstat->bs_dmevmask, &bstat32->bs_dmevmask) ||
get_user(bstat->bs_dmstate, &bstat32->bs_dmstate) ||
get_user(bstat->bs_aextents, &bstat32->bs_aextents))
diff --git a/fs/xfs/xfs_ioctl32.h b/fs/xfs/xfs_ioctl32.h
index 80f4060..2a35c62 100644
--- a/fs/xfs/xfs_ioctl32.h
+++ b/fs/xfs/xfs_ioctl32.h
@@ -68,7 +68,10 @@ typedef struct compat_xfs_bstat {
__u16 bs_projid_lo; /* lower part of project id */
#define bs_projid bs_projid_lo /* (previously just bs_projid) */
__u16 bs_projid_hi; /* high part of project id */
- unsigned char bs_pad[12]; /* pad space, unused */
+ __u8 bs_atime_epoch; /* access time epoch */
+ __u8 bs_mtime_epoch; /* modify time epoch */
+ __u8 bs_ctime_epoch; /* change time epoch */
+ unsigned char bs_pad[9]; /* pad space, unused */
__u32 bs_dmevmask; /* DMIG event mask */
__u16 bs_dmstate; /* DMIG state info */
__u16 bs_aextents; /* attribute number of extents */
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 205613a..0588381 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -505,23 +505,34 @@ xfs_setattr_time(
struct iattr *iattr)
{
struct inode *inode = VFS_I(ip);
+ bool has_epoch;
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+ has_epoch = xfs_sb_version_hasepoch(&ip->i_mount->m_sb);
if (iattr->ia_valid & ATTR_ATIME) {
inode->i_atime = iattr->ia_atime;
- ip->i_d.di_atime.t_sec = iattr->ia_atime.tv_sec;
- ip->i_d.di_atime.t_nsec = iattr->ia_atime.tv_nsec;
+ ip->i_d.di_atime.t_sec = xfs_timestamp_sec(&iattr->ia_atime);
+ ip->i_d.di_atime.t_nsec = (__int32_t)iattr->ia_atime.tv_nsec;
+ if (has_epoch)
+ ip->i_d.di_atime_epoch =
+ xfs_timestamp_epoch(&iattr->ia_atime);
}
if (iattr->ia_valid & ATTR_CTIME) {
inode->i_ctime = iattr->ia_ctime;
- ip->i_d.di_ctime.t_sec = iattr->ia_ctime.tv_sec;
- ip->i_d.di_ctime.t_nsec = iattr->ia_ctime.tv_nsec;
+ ip->i_d.di_ctime.t_sec = xfs_timestamp_sec(&iattr->ia_ctime);
+ ip->i_d.di_ctime.t_nsec = (__int32_t)iattr->ia_ctime.tv_nsec;
+ if (has_epoch)
+ ip->i_d.di_ctime_epoch =
+ xfs_timestamp_epoch(&iattr->ia_ctime);
}
if (iattr->ia_valid & ATTR_MTIME) {
inode->i_mtime = iattr->ia_mtime;
- ip->i_d.di_mtime.t_sec = iattr->ia_mtime.tv_sec;
- ip->i_d.di_mtime.t_nsec = iattr->ia_mtime.tv_nsec;
+ ip->i_d.di_mtime.t_sec = xfs_timestamp_sec(&iattr->ia_mtime);
+ ip->i_d.di_mtime.t_nsec = (__int32_t)iattr->ia_mtime.tv_nsec;
+ if (has_epoch)
+ ip->i_d.di_mtime_epoch =
+ xfs_timestamp_epoch(&iattr->ia_mtime);
}
}
@@ -963,6 +974,7 @@ xfs_vn_update_time(
struct xfs_mount *mp = ip->i_mount;
struct xfs_trans *tp;
int error;
+ struct iattr iattr = {0};
trace_xfs_update_time(ip);
@@ -975,20 +987,19 @@ xfs_vn_update_time(
xfs_ilock(ip, XFS_ILOCK_EXCL);
if (flags & S_CTIME) {
- inode->i_ctime = *now;
- ip->i_d.di_ctime.t_sec = (__int32_t)now->tv_sec;
- ip->i_d.di_ctime.t_nsec = (__int32_t)now->tv_nsec;
+ iattr.ia_valid |= ATTR_CTIME;
+ iattr.ia_ctime = *now;
}
if (flags & S_MTIME) {
- inode->i_mtime = *now;
- ip->i_d.di_mtime.t_sec = (__int32_t)now->tv_sec;
- ip->i_d.di_mtime.t_nsec = (__int32_t)now->tv_nsec;
+ iattr.ia_valid |= ATTR_MTIME;
+ iattr.ia_mtime = *now;
}
if (flags & S_ATIME) {
- inode->i_atime = *now;
- ip->i_d.di_atime.t_sec = (__int32_t)now->tv_sec;
- ip->i_d.di_atime.t_nsec = (__int32_t)now->tv_nsec;
+ iattr.ia_valid |= ATTR_ATIME;
+ iattr.ia_atime = *now;
}
+ xfs_setattr_time(ip, &iattr);
+
xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP);
return -xfs_trans_commit(tp, 0);
@@ -1239,12 +1250,18 @@ xfs_setup_inode(
inode->i_generation = ip->i_d.di_gen;
i_size_write(inode, ip->i_d.di_size);
- inode->i_atime.tv_sec = ip->i_d.di_atime.t_sec;
- inode->i_atime.tv_nsec = ip->i_d.di_atime.t_nsec;
- inode->i_mtime.tv_sec = ip->i_d.di_mtime.t_sec;
- inode->i_mtime.tv_nsec = ip->i_d.di_mtime.t_nsec;
- inode->i_ctime.tv_sec = ip->i_d.di_ctime.t_sec;
- inode->i_ctime.tv_nsec = ip->i_d.di_ctime.t_nsec;
+ inode->i_atime.tv_sec = xfs_inode_time_from_epoch(
+ ip->i_d.di_atime_epoch,
+ ip->i_d.di_atime.t_sec);
+ inode->i_atime.tv_nsec = ip->i_d.di_atime.t_nsec;
+ inode->i_mtime.tv_sec = xfs_inode_time_from_epoch(
+ ip->i_d.di_mtime_epoch,
+ ip->i_d.di_mtime.t_sec);
+ inode->i_mtime.tv_nsec = ip->i_d.di_mtime.t_nsec;
+ inode->i_ctime.tv_sec = xfs_inode_time_from_epoch(
+ ip->i_d.di_ctime_epoch,
+ ip->i_d.di_ctime.t_sec);
+ inode->i_ctime.tv_nsec = ip->i_d.di_ctime.t_nsec;
xfs_diflags_to_iflags(inode, ip);
ip->d_ops = ip->i_mount->m_nondir_inode_ops;
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index cb64f22..e902418 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -97,12 +97,24 @@ xfs_bulkstat_one_int(
buf->bs_uid = dic->di_uid;
buf->bs_gid = dic->di_gid;
buf->bs_size = dic->di_size;
+
+ /* timestamp epochs are emitted only when configured */
buf->bs_atime.tv_sec = dic->di_atime.t_sec;
buf->bs_atime.tv_nsec = dic->di_atime.t_nsec;
buf->bs_mtime.tv_sec = dic->di_mtime.t_sec;
buf->bs_mtime.tv_nsec = dic->di_mtime.t_nsec;
buf->bs_ctime.tv_sec = dic->di_ctime.t_sec;
buf->bs_ctime.tv_nsec = dic->di_ctime.t_nsec;
+ if (xfs_sb_version_hasepoch(&mp->m_sb)) {
+ buf->bs_atime_epoch = dic->di_atime_epoch;
+ buf->bs_mtime_epoch = dic->di_mtime_epoch;
+ buf->bs_ctime_epoch = dic->di_ctime_epoch;
+ } else {
+ buf->bs_atime_epoch = 0;
+ buf->bs_mtime_epoch = 0;
+ buf->bs_ctime_epoch = 0;
+ }
+
buf->bs_xflags = xfs_ip2xflags(ip);
buf->bs_extsize = dic->di_extsize << mp->m_sb.sb_blocklog;
buf->bs_extents = dic->di_nextents;
diff --git a/fs/xfs/xfs_log_format.h b/fs/xfs/xfs_log_format.h
index f0969c7..abac6ad 100644
--- a/fs/xfs/xfs_log_format.h
+++ b/fs/xfs/xfs_log_format.h
@@ -374,6 +374,10 @@ typedef struct xfs_icdinode {
__uint32_t di_nlink; /* number of links to file */
__uint16_t di_projid_lo; /* lower part of owner's project id */
__uint16_t di_projid_hi; /* higher part of owner's project id */
+ __uint8_t di_atime_epoch; /* access time epoch */
+ __uint8_t di_mtime_epoch; /* modify time epoch */
+ __uint8_t di_ctime_epoch; /* change time epoch */
+ __uint8_t di_crtime_epoch;/* create time epoch */
__uint8_t di_pad[6]; /* unused, zeroed space */
__uint16_t di_flushiter; /* incremented on flush */
xfs_ictimestamp_t di_atime; /* time last accessed */
diff --git a/fs/xfs/xfs_sb.h b/fs/xfs/xfs_sb.h
index c43c2d6..1b3ccd8 100644
--- a/fs/xfs/xfs_sb.h
+++ b/fs/xfs/xfs_sb.h
@@ -509,8 +509,11 @@ xfs_sb_has_ro_compat_feature(
}
#define XFS_SB_FEAT_INCOMPAT_FTYPE (1 << 0) /* filetype in dirent */
+#define XFS_SB_FEAT_INCOMPAT_EPOCH (1 << 1) /* Time beyond 2038 */
#define XFS_SB_FEAT_INCOMPAT_ALL \
- (XFS_SB_FEAT_INCOMPAT_FTYPE)
+ (XFS_SB_FEAT_INCOMPAT_FTYPE | \
+ XFS_SB_FEAT_INCOMPAT_EPOCH | \
+ 0)
#define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL
static inline bool
@@ -558,6 +561,13 @@ static inline int xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FINOBT);
}
+static inline int xfs_sb_version_hasepoch(xfs_sb_t *sbp)
+{
+ return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
+ (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_EPOCH);
+}
+
+
/*
* end of superblock version macros
*/
diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index 50c3f56..cdb4d86 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -70,7 +70,7 @@ xfs_trans_ichgtime(
int flags)
{
struct inode *inode = VFS_I(ip);
- timespec_t tv;
+ struct timespec tv;
ASSERT(tp);
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
^ permalink raw reply related [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-01 19:56 ` Arnd Bergmann
2014-06-01 20:26 ` H. Peter Anvin
@ 2014-06-02 1:36 ` Nicolas Pitre
2014-06-02 2:22 ` Dave Chinner
2014-06-02 10:56 ` Arnd Bergmann
1 sibling, 2 replies; 71+ messages in thread
From: Nicolas Pitre @ 2014-06-02 1:36 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Sun, 1 Jun 2014, Arnd Bergmann wrote:
> On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > > readonly if not in reality then in practice.
> >
> > For those (legacy) filesystems with signed 32-bit timestamps, any
> > attempt to create a timestamp past Jan 19 03:14:07 2038 UTC should be
> > (silently) clamped to 0x7fffffff and that value (the last representable
> > time) used as an overflow indicator. The filesystem driver should
> > convert that value into a corresponding overflow value for whatever
> > kernel internal time representation being used when read back, and this
> > should be propagated up to user space. It should not be a hard error
> > otherwise, as you rightfully stated, everything non read-only would come
> > to a halt on that day.
>
> I don't think there is much of a difference between not being able to
> write at all and all newly written files having the same timestamp,
> causing random things to break differently.
Well, in one case a crash is a certainty. In the other case there is
some probability that your system might still be usable.
> The clamp to the maximum supported time stamp sounds like a reasonable
> choice for 'utimens' and related syscalls for the case of someone
> setting an arbitrary future date beyond what the file system can
> represent. Then again, I don't see a reason why that shouldn't just
> cause an error to be returned.
Resilience is better than outright failure.
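The clamping scheme argued for here can be sketched in a few lines of
user-space C. This is only an illustration: the sketch_* names and the
choice of INT64_MAX as the in-memory overflow marker are assumptions,
not anything taken from existing kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory marker meaning "later than the disk can say". */
#define SKETCH_TIME_OVERFLOW	INT64_MAX

/* Store path: silently clamp to what a signed 32-bit field can hold. */
static int32_t sketch_time_to_disk(int64_t t)
{
	if (t > INT32_MAX)
		return INT32_MAX;	/* 0x7fffffff doubles as the indicator */
	if (t < INT32_MIN)
		return INT32_MIN;
	return (int32_t)t;
}

/* Load path: turn the on-disk indicator back into the wide marker. */
static int64_t sketch_time_from_disk(int32_t disk)
{
	if (disk == INT32_MAX)
		return SKETCH_TIME_OVERFLOW;
	return disk;
}

/* What user space (or the VFS) could test for after reading back. */
static bool sketch_time_overflowed(int64_t t)
{
	return t == SKETCH_TIME_OVERFLOW;
}
```

One consequence of reusing 0x7fffffff is that a file genuinely stamped
in the very last representable second reads back as overflowed too.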
> For actually running kernels beyond 2038, the best idea I've seen so
> far is to disallow all broken code at compile time. I don't see
> a choice but to audit the entire kernel for invalid uses on both
> 32 and 64 bit in the next few years. A lot of code will get changed
> in the process so we can actually keep running 32-bit kernels and
> file systems, but other code will likely go away:
>
> * any system calls that pass a time_t, timeval or timespec on
> 32-bit systems return -ENOSYS, to ensure all user land uses
> the replacements we will put into place
> * The definitions of 'time_t', 'timeval' and 'timespec' can be hidden
> from the kernel, and all code using them left out.
> * ext2 and ext3 file system code will have to be disabled, but that's
> fine since ext4 can mount old file systems.
Syscalls and libs can be "fixed". Existing filesystem content might
not. So if you need to mount some old media in read-write mode after
2038 and that happens to contain an ext2 or similarly limited filesystem
then it'd better just "work". Having the kernel refuse to modify the
filesystem would be unacceptable.
Nicolas
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 1:36 ` Nicolas Pitre
@ 2014-06-02 2:22 ` Dave Chinner
2014-06-02 7:09 ` Geert Uytterhoeven
2014-06-02 10:56 ` Arnd Bergmann
1 sibling, 1 reply; 71+ messages in thread
From: Dave Chinner @ 2014-06-02 2:22 UTC (permalink / raw)
To: Nicolas Pitre
Cc: linux-arch, Arnd Bergmann, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph
On Sun, Jun 01, 2014 at 09:36:26PM -0400, Nicolas Pitre wrote:
> On Sun, 1 Jun 2014, Arnd Bergmann wrote:
> > On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > For actually running kernels beyond 2038, the best idea I've seen so
> > far is to disallow all broken code at compile time. I don't see
> > a choice but to audit the entire kernel for invalid uses on both
> > 32 and 64 bit in the next few years. A lot of code will get changed
> > in the process so we can actually keep running 32-bit kernels and
> > file systems, but other code will likely go away:
> >
> > * any system calls that pass a time_t, timeval or timespec on
> > 32-bit systems return -ENOSYS, to ensure all user land uses
> > the replacements we will put into place
> > * The definitions of 'time_t', 'timeval' and 'timespec' can be hidden
> > from the kernel, and all code using them left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's
> > fine since ext4 can mount old file systems.
>
> Syscalls and libs can be "fixed". Existing filesystem content might
> not. So if you need to mount some old media in read-write mode after
> 2038 and that happens to contain an ext2 or similarly limited filesystem
> then it'd better just "work". Having the kernel refuse to modify the
> filesystem would be unacceptable.
We can already tell the VFS/filesystems not to update timestamps:
inode->i_flags |= S_NOATIME | S_NOCMTIME;
Just enforce that everywhere (i.e. notify_change()) rather than just
on the IO path and the "legacy filesystem timestamp" problem is
"solved".
New interfaces need to return errors when an out-of-range parameter
is set. And right now, >epoch dates are out of range for most
filesystems, and so we need to handle that condition appropriately.
Silent date overflow == filesystem corruption, and as such I'm going
to error out such conditions in the filesystem regardless of what
the userspace API says.
Filesystems place all sorts of userspace visible limits on storage -
ever tried to create a file >16TB on ext4? The on-disk format
doesn't support it, so it returns an out of range error (EFBIG, I
think) if you try. XFS, OTOH, handles this just fine and so it
continues to work. It's exactly the same with timestamps - there's a
physical limit to what can sanely be stored in any given filesystem
and it's an *error condition* to go beyond that limit....
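As a sketch of the "out of range is an error" position, the check could
look something like the following user-space C. Everything named
sketch_* is hypothetical, the two example ranges are just the signed and
unsigned 32-bit cases discussed in this thread, and -ERANGE is picked
only for illustration since no specific error code is settled here.

```c
#include <errno.h>
#include <stdint.h>

/* Per-filesystem limits on what the on-disk format can store. */
struct sketch_fs_time_range {
	int64_t min_sec;
	int64_t max_sec;
};

/* Signed 32-bit seconds: runs out in January 2038. */
static const struct sketch_fs_time_range sketch_signed32 = {
	INT32_MIN, INT32_MAX
};

/* Unsigned 32-bit seconds: 1970 up to 2106. */
static const struct sketch_fs_time_range sketch_unsigned32 = {
	0, UINT32_MAX
};

/* Reject a timestamp the filesystem cannot represent. */
static int sketch_check_time(const struct sketch_fs_time_range *r, int64_t t)
{
	if (t < r->min_sec || t > r->max_sec)
		return -ERANGE;
	return 0;
}
```

Exposing such a range to user space would also give test suites the
boundary values to probe.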
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 2:22 ` Dave Chinner
@ 2014-06-02 7:09 ` Geert Uytterhoeven
0 siblings, 0 replies; 71+ messages in thread
From: Geert Uytterhoeven @ 2014-06-02 7:09 UTC (permalink / raw)
To: Dave Chinner
Cc: Nicolas Pitre, Linux-Arch, Arnd Bergmann,
linux-kernel@vger.kernel.org, xfs, Christoph Hellwig, John Stultz,
H. Peter Anvin, Linux FS Devel, Ley Foon Tan, Thomas Gleixner,
Joseph S. Myers
On Mon, Jun 2, 2014 at 4:22 AM, Dave Chinner <david@fromorbit.com> wrote:
> Filesystems place all sorts of userspace visible limits on storage -
> ever tried to create a file >16TB on ext4? The on-disk format
> doesn't support it, so it returns an out of range error (E2BIG, I
> think) if you try. XFS, OTOH, handles this just fine and so it
> continues to work. It's exactly the same with timestamps - there's a
> physical limit to what can sanely be stored in any given filesystem
> and it's an *error condition* to go beyond that limit....
This comparison doesn't fly.
File sizes do not depend on the current time (except for the increase of
megapixels in your new camera ;-).
Writing a 15 GiB file to ext4 is not something that magically stops working
tomorrow.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 1:36 ` Nicolas Pitre
2014-06-02 2:22 ` Dave Chinner
@ 2014-06-02 10:56 ` Arnd Bergmann
2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 15:04 ` Chuck Lever
1 sibling, 2 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 10:56 UTC (permalink / raw)
To: Nicolas Pitre
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote:
>
> > For actually running kernels beyond 2038, the best idea I've seen so
> > far is to disallow all broken code at compile time. I don't see
> > a choice but to audit the entire kernel for invalid uses on both
> > 32 and 64 bit in the next few years. A lot of code will get changed
> > in the process so we can actually keep running 32-bit kernels and
> > file systems, but other code will likely go away:
> >
> > * any system calls that pass a time_t, timeval or timespec on
> > 32-bit systems return -ENOSYS, to ensure all user land uses
> > the replacements we will put into place
> > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> > from the kernel, and all code using it left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's
> > fine since ext4 can mount old file systems.
>
> Syscalls and libs can be "fixed". Existing filesystem content might
> not. So if you need to mount some old media in read-write mode after
> 2038 and that happens to contain an ext2 or similarly limited filesystem
> then it'd better just "work". Having the kernel refuse to modify the
> filesystem would be unacceptable.
I think you misunderstood what I suggested: the intent is to avoid
seeing things break in 2038 by making them break much earlier. We have
a solution for ext2 file systems, it's called ext4, and we just need
to ensure that everybody knows they have to migrate eventually.
At some point before the mid-2030s, you should no longer be able to
build a kernel that has support for ext2 or any other module that will
run into bugs later. Until then (rather sooner than later), I'd like
to get to the point where you can choose whether to include those
modules at build time or not, and then get everybody to turn off that
option and fix the bugs they run into. You wouldn't need that for a
2014-generation long-term support distro (rhel 7, sles 12, debian 7,
ubuntu 14.04, ...), but perhaps for the next generation, or the
one after that.
Arnd
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-01 20:26 ` H. Peter Anvin
@ 2014-06-02 11:02 ` Arnd Bergmann
0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 11:02 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Nicolas Pitre, linux-arch, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph
On Sunday 01 June 2014 13:26:03 H. Peter Anvin wrote:
> Perhaps we should make this a kernel command line option instead, with the
> settings: error out on outside the standard window, or a date indicating the
> earliest date that should be recognized and do windowing (0 for no windowing,
> 1970 for retconning the Unix epoch as unsigned...)
What's wrong with compile-time errors? We have a pretty good understanding
of how time values are passed in the kernel, and we know they will all break
in 2038 for 32-bit kernels unless we do something about it.
> But again, the kernel is probably the least problem here...
I agree the glibc side is harder than this, but we have to get the kernel
into shape first (at the minimum we have to do the APIs), and there is enough
work to do here.
Arnd
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 0:28 ` Dave Chinner
@ 2014-06-02 11:35 ` Roger Willcocks
2014-06-02 11:43 ` Arnd Bergmann
1 sibling, 0 replies; 71+ messages in thread
From: Roger Willcocks @ 2014-06-02 11:35 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, Arnd Bergmann, linux-kernel, geert, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, lftan, tglx, xfs, joseph
On Mon, 2014-06-02 at 10:28 +1000, Dave Chinner wrote:
>
> The 32 bit second counters in timestamps are too small to represent
> time beyond the unix epoch (jan 2038) correctly. Extend the on-disk
> format for a timestamp to include an 8-bit epoch counter so that we
> can extend time for up to 255 Unix epochs. This should be good for
> representing timestamps from 1970 to somewhere around 19,000 A.D....
>
I assume you're using an 'epoch' variable and not simply using the
padding byte as an eight-bit prefix to the existing 32-bit counter
because the existing counter is signed ?
For long term sanity it might make more sense for the eight-bit value to
be a simple (sign-extended) prefix from 1970.
So if the feature bit is set it's a 40-bit signed time, which is good
for 1970 +/- 17400 years or so.
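To make the arithmetic concrete: 2^39 seconds is a little over 17,400
years, which is where the range above comes from. A minimal sketch of such
a 40-bit encoding follows. The helper names time40_decode() and
time40_encode() are hypothetical and not taken from any posted patch, and
the sketch assumes the low 32 bits are taken as plain bits once the feature
bit is set.

```c
#include <stdint.h>

/*
 * Hypothetical helpers: treat an 8-bit prefix plus the existing 32-bit
 * seconds field as one signed 40-bit count of seconds since 1970.
 */
static int64_t time40_decode(uint8_t prefix, uint32_t low)
{
	/* Sign-extend the prefix by hand to avoid implementation-defined casts. */
	int64_t hi = prefix >= 128 ? (int64_t)prefix - 256 : (int64_t)prefix;

	return hi * 4294967296LL + (int64_t)low;
}

static void time40_encode(int64_t sec, uint8_t *prefix, uint32_t *low)
{
	/* Caller must ensure -2^39 <= sec < 2^39. */
	*low = (uint32_t)((uint64_t)sec & 0xffffffffu);
	*prefix = (uint8_t)(((uint64_t)sec >> 32) & 0xff);
}
```

Under this reading, an existing pre-1970 timestamp (negative 32-bit value
with a zero padding byte) would decode as a date after 2038, which is
presumably why the feature bit is needed.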
--
Roger
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 0:28 ` Dave Chinner
2014-06-02 11:35 ` Roger Willcocks
@ 2014-06-02 11:43 ` Arnd Bergmann
2014-06-03 0:32 ` Dave Chinner
1 sibling, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 11:43 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > In my list at http://kernelnewbies.org/y2038, I found that almost
> > > all file systems at least handle times until 2106, because they treat
> > > the on-disk value as unsigned on 64-bit systems, or they use
> > > a completely different representation. My guess is that somebody
> > > earlier spent a lot of work on making that happen.
> > >
> > > The exceptions are:
> > >
> > > * exofs uses signed values, which can probably be changed to be
> > > consistent with the others.
> > > * isofs has a bug that limits it until 2027 on architectures with
> > > a signed 'char' type (otherwise it's 2155).
> > > * udf can represent times for many thousands of years through a
> > > 16-bit year representation, but the code to convert to epoch
> > > uses a const array that ends at 2038.
> > > * afs uses signed seconds and can probably be fixed
> > > * coda relies on user space time representation getting passed
> > > through an ioctl.
> > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > > where they really use signed.
> > >
> > > I was confused about XFS since I didn't notice that there are
> > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > > XFS to also use the 1970-2106 time range on 64-bit systems today.
> >
> > You've missed an awful lot more than just the implications for the
> > core kernel code.
> >
> > There's a good chance such changes propagate to APIs elsewhere in
> > the filesystems, because something you haven't realised is that XFS
> > effectively exposes the on-disk timestamp format directly to
> > userspace via the bulkstat interface (see struct xfs_bstat). It also
> > affects the XFS open-by-handle ioctl and the swap extent ioctl used
> > by the online defragmenter.
I really didn't look at them at all, as ioctl is very late on my
mental list of things to change. I do realize that a lot of drivers
and file systems do have ioctls that pass time values and we need to
address them one by one.
I just looked at the ioctls you mentioned but don't see how open-by-handle
is affected by this. Can you point me to what you mean?
> Just to put that in context, here's the kernel patch to add extended
> epoch support to XFS. It's completely untested as I haven't done any
> userspace code changes to enable the feature. However, it should
> give you an indication of how far the simple act of changing the
> kernel time representation spread through the filesystem. This does
> not include any of the VFS infrastructure for specifying the range of
> supported timestamps. It survives some smoke testing, but dies when
> the online defragmenter starts using the bulkstat and swap extent
> ioctls (the assert in xfs_inode_time_from_epoch() fires), so I
> probably don't have that all sorted correctly yet...
>
> To test extended epoch support, however, I need some fstests that
> define and validate the behaviour of the new syscalls - until we get
> those we can't validate that the filesystem follows the spec
> properly. I also suspect we are going to need an interface to query
> the supported range of timestamps from a filesystem so that we can
> test boundary conditions in an automated fashion....
Thanks a lot for having an initial look at this yourself!
I'd still consider the two problems largely orthogonal. My patch set
(at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
more like 64-bit kernels regarding inode time stamps, which does
impact all the file systems that use a 64-bit time or the NFS
unsigned epoch (1970-2106), while your patch extends the file
system internal epoch (1901-2038 for XFS) so it can be used by
anything that knows how to handle larger than 32-bit second values
(either 64-bit kernel or 32-bit with inode_time patch).
> diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h
> index 623bbe8..79f94722 100644
> --- a/fs/xfs/xfs_dinode.h
> +++ b/fs/xfs/xfs_dinode.h
> @@ -21,11 +21,53 @@
> #define XFS_DINODE_MAGIC 0x494e /* 'IN' */
> #define XFS_DINODE_GOOD_VERSION(v) ((v) >= 1 && (v) <= 3)
>
> +/*
> + * Inode timestamps get more complex when we consider supporting times beyond
> + * the standard unix epoch of Jan 2038. The struct xfs_timestamp cannot support
> + * more than a single extension by playing sign games, and that is still not
> + * reliable. We also can't extend the timestamp structure because there is no
> + * free space around them in the on-disk inode.
> + *
> + * Hence the simplest thing to do is to add an epoch counter for each timestamp
> + * in the inode. This can be a single byte for each timestamp and make use of
> + * a hole we currently pad. This gives us another 255 epochs range for the
> + * timestamps, but requires a superblock feature bit to indicate that these
> + * fields have meaning and can be non-zero.
Nice trick!
> +static inline __uint8_t
> +xfs_timestamp_epoch(
> + struct timespec *time)
> +{
> + /* will be zero until the extended struct inode_time is introduced */
> + return 0;
> +}
> +
> +static inline __int32_t
> +xfs_timestamp_sec(
> + struct timespec *time)
> +{
> + return time->tv_sec;
> +}
> +
> +static inline __kernel_time_t
> +xfs_inode_time_from_epoch(
> + __uint8_t epoch,
> + __int32_t seconds)
> +{
> + /* need to handle non-zero epoch when struct inode_time is introduced */
> + ASSERT(epoch == 0);
> + return seconds;
> +}
Why don't you already implement epoch conversion for 64-bit kernels that
are able to represent the time today? This is how ext4 does it (I mean
the sizeof() trick, not the bit stuffing they do):
static inline __le32 ext4_encode_extra_time(struct inode_time *time)
{
return cpu_to_le32((sizeof(time->tv_sec) > 4 ?
(time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) |
((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK));
}
static inline void ext4_decode_extra_time(struct inode_time *time, __le32 extra)
{
if (sizeof(time->tv_sec) > 4)
time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
<< 32;
time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
}
I guess if there is general agreement on introducing 'struct inode_time',
we can skip that intermediate step.
> @@ -509,8 +509,11 @@ xfs_sb_has_ro_compat_feature(
> }
>
> #define XFS_SB_FEAT_INCOMPAT_FTYPE (1 << 0) /* filetype in dirent */
> +#define XFS_SB_FEAT_INCOMPAT_EPOCH (1 << 1) /* Time beyond 2038 */
> #define XFS_SB_FEAT_INCOMPAT_ALL \
> - (XFS_SB_FEAT_INCOMPAT_FTYPE)
> + (XFS_SB_FEAT_INCOMPAT_FTYPE | \
> + XFS_SB_FEAT_INCOMPAT_EPOCH | \
> + 0)
>
> #define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL
How does this flag get set? Do you have to manually change it in the
superblock? Since most of the time I'd suspect you wouldn't actually
use it for the foreseeable future, would it make sense to have a mount
option that allows it to be set, but doesn't actually change the
superblock until the first inode gets written with a nonzero epoch?
That way, you'd still be able to mount it with an older kernel but
also be forward compatible with time moving on.
Arnd
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 10:56 ` Arnd Bergmann
@ 2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 12:38 ` Arnd Bergmann
` (2 more replies)
2014-06-02 15:04 ` Chuck Lever
1 sibling, 3 replies; 71+ messages in thread
From: Theodore Ts'o @ 2014-06-02 11:57 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Nicolas Pitre, linux-arch, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph
On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
>
> I think you misunderstood what I suggested: the intent is to avoid
> seeing things break in 2038 by making them break much earlier. We have
> a solution for ext2 file systems, it's called ext4, and we just need
> to ensure that everybody knows they have to migrate eventually.
>
> At some point before the mid-2030s, you should no longer be able to
> build a kernel that has support for ext2 or any other module that will
> run into bugs later....
Even for ext4, it's not quite so simple as that. You only have
support for times post 2038 if you are using an inode size > 128
bytes. There are a very, very large number of machines which even
today, are using 128 byte inodes with ext4 for performance reasons.
The vast majority of those machines which I know of can probably move
to 256 byte inodes relatively easily, since hard drive replacement
cycles are order 5-6 years tops, so I'm not that concerned, but it
just goes to show this is a very complicated problem.
And even if we're talking about flash and embedded devices, the good
news is if you assume that 10 years is enough time for people to
update their embedded OS builds, and that the vast majority of
deployed devices will probably only be in service for 10-15 years, we
do have enough time to make file system format changes, although
admittedly we can't afford to dilly-dally.
Regards,
- Ted
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 11:57 ` Theodore Ts'o
@ 2014-06-02 12:38 ` Arnd Bergmann
2014-06-02 13:15 ` Theodore Ts'o
2014-06-02 12:52 ` Arnd Bergmann
2014-06-02 14:52 ` H. Peter Anvin
2 siblings, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 12:38 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nicolas Pitre, linux-arch, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph
On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> >
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> >
> > At some point before the mid-2030s, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later....
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
>
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.
Ok, I see.
I also now noticed this comment above EXT4_FITS_IN_INODE():
"For new inodes we always reserve enough space for the kernel's known
extended fields, but for inodes created with an old kernel this might
not have been the case. None of the extended inode fields is critical
for correct filesystem operation."
Do we have to worry about this for inodes that contain extended
attributes and that get updated after 2038?
Arnd
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 12:38 ` Arnd Bergmann
@ 2014-06-02 12:52 ` Arnd Bergmann
2014-06-02 13:07 ` Theodore Ts'o
2014-06-02 14:52 ` H. Peter Anvin
2 siblings, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 12:52 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nicolas Pitre, linux-arch, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph
On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> >
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> >
> > At some point before the mid-2030s, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later....
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
>
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.
One stupid question about the current code:
static inline void ext4_decode_extra_time(struct inode_time *time, __le32 extra)
{
if (sizeof(time->tv_sec) > 4)
time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
<< 32;
time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
}
#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \
do { \
if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \
(einode)->xtime.tv_sec = \
(signed)le32_to_cpu((raw_inode)->xtime); \
else \
(einode)->xtime.tv_sec = 0; \
if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra)) \
ext4_decode_extra_time(&(einode)->xtime, \
raw_inode->xtime ## _extra); \
else \
(einode)->xtime.tv_nsec = 0; \
} while (0)
For a time between 2038 and 2106, this looks like xtime.tv_sec is
negative when ext4_decode_extra_time gets called, so the '|=' operator
doesn't actually do anything. Shouldn't that be '+='?
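The effect can be reproduced in userspace with a small sketch. The
functions decode_or() and decode_add() are hypothetical stand-ins for the
two variants, and the epoch width of 2 bits is an assumption about
EXT4_EPOCH_BITS that is not stated in this mail. Which variant is actually
right depends on how the encode side computes the epoch bits, which is left
open here.

```c
#include <stdint.h>

#define EPOCH_BITS 2	/* assumed width of the epoch field */
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

/* Mirrors the quoted decode: sign-extend the 32-bit field, then OR in the epoch. */
static int64_t decode_or(uint32_t raw, uint32_t extra)
{
	/* The cast wraps to a negative value on the usual two's complement targets. */
	int64_t sec = (int32_t)raw;

	sec |= (int64_t)(extra & EPOCH_MASK) << 32;
	return sec;
}

/* Same as above, but adding the epoch instead of OR-ing it in. */
static int64_t decode_add(uint32_t raw, uint32_t extra)
{
	int64_t sec = (int32_t)raw;

	sec += (int64_t)(extra & EPOCH_MASK) << 32;
	return sec;
}
```

With raw = 0x80000000 and an epoch of 1, decode_or() leaves the value at
-2^31 because the upper bits are already set by the sign extension, while
decode_add() yields 2^31. For raw values below 0x80000000 the two variants
agree.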
Arnd
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 12:52 ` Arnd Bergmann
@ 2014-06-02 13:07 ` Theodore Ts'o
2014-06-02 15:01 ` Arnd Bergmann
0 siblings, 1 reply; 71+ messages in thread
From: Theodore Ts'o @ 2014-06-02 13:07 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Nicolas Pitre, linux-arch, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph
Yes, there are some ongoing discussions about changing the post-2038
encoding of the timestamp in ext4, which is why this hasn't been fixed
yet. The main thing that's been missing is time for me to review the
patches, and a good way of writing regression tests that will work (or
at least not fail) on build environments with a 32-bit time_t and
32-bit-only capable versions of functions such as gmtime(3).
And given current discussions, I may want to think about some kind of
superblock flag to allow the use of a 32-bit unsigned encoding for
file systems using a 128-byte inode, with a way of setting that flag
after scanning the file system to make sure there are no times that
are previous to January 1, 1970. (Or more generally, allow any epoch
to be defined using a 64-bit time_t offset stored in the superblock...)
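A very rough sketch of what the more general variant might look like is
below. The struct sb_time_cfg type, its epoch_offset field and both helpers
are hypothetical and exist only for this example; no such superblock field
exists today. The sketch assumes the 32-bit on-disk value is an unsigned
count of seconds relative to the stored offset.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical superblock setting; not an existing on-disk field. */
struct sb_time_cfg {
	int64_t epoch_offset;	/* seconds since 1970 that raw value 0 maps to */
};

/* Interpret the 32-bit on-disk value as unsigned, relative to the offset. */
static int64_t sb_time_decode(const struct sb_time_cfg *sb, uint32_t raw)
{
	return sb->epoch_offset + (int64_t)raw;
}

/* Return 0 and fill *raw if sec is representable, -ERANGE otherwise. */
static int sb_time_encode(const struct sb_time_cfg *sb, int64_t sec,
			  uint32_t *raw)
{
	/* Assumes sec and epoch_offset are sane enough that this cannot overflow. */
	int64_t delta = sec - sb->epoch_offset;

	if (delta < 0 || delta > (int64_t)UINT32_MAX)
		return -ERANGE;
	*raw = (uint32_t)delta;
	return 0;
}
```

With epoch_offset set to 0 this is the plain unsigned encoding covering
1970 to 2106; times before the offset are rejected, which is why the file
system would have to be scanned before the flag is set.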
Cheers,
- Ted
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 12:38 ` Arnd Bergmann
@ 2014-06-02 13:15 ` Theodore Ts'o
0 siblings, 0 replies; 71+ messages in thread
From: Theodore Ts'o @ 2014-06-02 13:15 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Nicolas Pitre, linux-arch, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph
On Mon, Jun 02, 2014 at 02:38:09PM +0200, Arnd Bergmann wrote:
>
> "For new inodes we always reserve enough space for the kernel's known
> extended fields, but for inodes created with an old kernel this might
> not have been the case. None of the extended inode fields is critical
> for correct filesystem operation."
>
> Do we have to worry about this for inodes that contain extended
> attributes and that get updated after 2038?
In practice, the extended timestamps were one of the first things added
to ext4, so the vast majority of ext4 file systems with inode sizes >
128 bytes will have room for the extended timestamps. There are some
legacy ext3 file systems with 256-byte inodes (enabled for fast
storage of SELinux xattrs) that in theory, could have been converted
to ext4 and had enough xattrs so that the extended timestamps couldn't
be added. That would be a vanishingly small use case, and in
practice, it's not likely to be the case for the embedded market.
I could imagine someone worrying about file systems originally
formatted using RHEL 4 post-2038 (perhaps running in a VM), but I
don't work for IBM any more, and hopefully even IBM would just tell
such customers that they need to suck it up, and do a
backup/reformat/restore pass.
Cheers,
- Ted
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-05-30 20:01 [RFC 00/32] making inode time stamps y2038 ready Arnd Bergmann
` (2 preceding siblings ...)
2014-05-31 14:51 ` Richard Cochran
@ 2014-06-02 13:52 ` Joseph S. Myers
2014-06-02 19:19 ` Arnd Bergmann
3 siblings, 1 reply; 71+ messages in thread
From: Joseph S. Myers @ 2014-06-02 13:52 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, hpa, logfs, linux-afs, linux-arch, linux-cifs,
linux-scsi, ceph-devel, codalist, cluster-devel, coda, geert,
linux-ext4, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx,
linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs
On Fri, 30 May 2014, Arnd Bergmann wrote:
> a) is this the right approach in general? The previous discussion
> pointed this way, but there may be other opinions.
The syscall changes seem like the sort of thing I'd expect, although
patches adding new syscalls or otherwise affecting the kernel/userspace
interface (as opposed to those relating to an individual filesystem)
should go to linux-api as well as other relevant lists.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 5:54 ` Dave Chinner
2014-05-31 8:41 ` H. Peter Anvin
@ 2014-06-02 14:00 ` Joseph S. Myers
1 sibling, 0 replies; 71+ messages in thread
From: Joseph S. Myers @ 2014-06-02 14:00 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, Arnd Bergmann, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs
On Sat, 31 May 2014, Dave Chinner wrote:
> If we are changing the in-kernel timestamp to have a greater dynamic
> range than anything we currently support on disk, then we need support
> for all filesystems for similar translation and constraint. The
> filesystems need to be able to tell the kernel what timestamp
> range they support, and then the kernel needs to follow those
> guidelines. And if the filesystem is mounted on a kernel that
> doesn't support the current filesystem's timestamp format, then at
> minimum that filesystem cannot do anything that writes a
> timestamp....
>
> Put simply: the filesystem defines the timestamp range that can be
> used safely, not the userspace API. If the filesystem can't support
> the date it is handed then that is an out-of-range error. Since
> when have we accepted that it's OK to handle out-of-range data with
> silent overflows or corruption of the data that we are attempting to
> store? We're defining a new API to support a wider date range -
> there is nothing that prevents us from saying ERANGE can be returned
> to a timestamp that the file cannot store correctly....
I don't see anything new about this issue. All problems that could arise
from the kernel being able to represent a timestamp some filesystems can't
are problems that already apply with 64-bit kernels using 64-bit time_t
internally. So while as part of Y2038-preparedness we do need a clear
understanding of which filesystems have what timestamp limits and what
happens with timestamps beyond those limits, I think this is a separate
strand of the problem - one that applies to both 32-bit and 64-bit systems
- from the more general issue for 32-bit systems.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 12:38 ` Arnd Bergmann
2014-06-02 12:52 ` Arnd Bergmann
@ 2014-06-02 14:52 ` H. Peter Anvin
2 siblings, 0 replies; 71+ messages in thread
From: H. Peter Anvin @ 2014-06-02 14:52 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nicolas Pitre, linux-arch@vger.kernel.org, Arnd Bergmann,
linux-kernel@vger.kernel.org, xfs@oss.sgi.com, hch@infradead.org,
john.stultz@linaro.org, lftan@altera.com,
linux-fsdevel@vger.kernel.org, geert@linux-m68k.org,
tglx@linutronix.de, joseph@codesourcery.com
> On Jun 2, 2014, at 4:57, "Theodore Ts'o" <tytso@mit.edu> wrote:
>
>> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
>>
>> I think you misunderstood what I suggested: the intent is to avoid
>> seeing things break in 2038 by making them break much earlier. We have
>> a solution for ext2 file systems, it's called ext4, and we just need
>> to ensure that everybody knows they have to migrate eventually.
>>
>> At some point before the mid-2030s, you should no longer be able to
>> build a kernel that has support for ext2 or any other module that will
>> run into bugs later....
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
>
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.
>
> And even if we're talking about flash and embedded devices, the good
> news is if you assume that 10 years is enough time for people to
> update their embedded OS builds, and that the vast majority of
> deployed devices will probably only be in service for 10-15 years, we
> do have enough time to make file system format changes, although
> admittedly we can't afford to dilly-dally.
I have a number of file systems older than any device they are sitting on. RAID allows individual disks to be swapped out, and when all disks have been swapped out, extend the file system online. The system doesn't even have to be taken offline in the process if it is possible to physically get to the drives with the system powered (e.g. hot plug bays), which is really damned nice.
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 13:07 ` Theodore Ts'o
@ 2014-06-02 15:01 ` Arnd Bergmann
0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 15:01 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nicolas Pitre, linux-arch, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph
On Monday 02 June 2014 09:07:00 Theodore Ts'o wrote:
> Yes, there are some ongoing discussions about changing the post-2038
> encoding of the timestamp in ext4, which is why this hasn't been fixed
> yet. The main thing that's been missing is time for me to review the
> patches, and a good way of writing regression tests that will work (or
> at least not fail) on build environments with a 32-bit time_t and
> 32-bit-only capable versions of functions such as gmtime(3).
>
> And given current discussions, I may want to think about some kind of
> superblock flag to allow the use of a 32-bit unsigned encoding for
> file systems using a 128-byte inode, with a way of setting that flag
> after scanning the file system to make sure there are no times that
> are previous to January 1, 1970. (Or more generally, allow any epoch
> to be defined using a 64-bit time_t offset stored in the superblock...)
FWIW, I've gone through the other file system implementations once
more. The most common pattern I've encountered is to have a read_inode
function with
	inode->i_mtime = le32_to_cpu(raw_inode->mtime);
which results in interpreting the time as 'signed' on 32-bit
kernels, but as 'unsigned' on 64-bit kernels. This could have been
done intentionally to extend the valid time range to 2106 on 64-bit
kernels, but it seems more likely that the code was written with
no thought given to 64-bit time_t at all. I see this pattern on
p9fs (old protocol only), afs, bfs, ceph, efs, freevxfs, hpfs, jffs2,
jfs, minix, nfsv2/v3 (this was clearly intentional and is
spelled out in the RFC), qnx4, qnx6, reiserfs, squashfs, sysv,
and ufs (protocol version 1 only).
The other behavior I see is to treat the on-disk 32-bit value
as signed on both 32-bit and 64-bit kernels:
	inode->i_mtime = (signed)le32_to_cpu(raw_inode->mtime);
This seems to be done intentionally in all cases, to maintain
compatibility between 32-bit and 64-bit kernels, but it's
relatively rare: exofs, ext2/3/4 (good old inodes) and xfs
are the only ones doing this.
In the case of ext2/3/4, the sign handling was introduced here:
http://www.spinics.net/lists/linux-ext4/msg01758.html
exofs and xfs seem to have done it like this for all of git
history.
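As a rough illustration of the two patterns above, here is a minimal
user-space sketch (not code from any file system driver). It assumes a
64-bit in-memory seconds value, as on a 64-bit kernel; the helper names
decode_unsigned() and decode_signed() are made up for this example.

```c
#include <stdint.h>

/* Hypothetical helpers, not taken from any file system driver. */

/* Pattern 1: plain assignment of the u32 on-disk value. With a 64-bit
 * time type the value is zero-extended, so the range is 1970..2106. */
static int64_t decode_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}

/* Pattern 2: cast to a signed 32-bit type first. The value is then
 * sign-extended, so the range is 1901..2038 on any kernel. The cast
 * relies on the usual two's complement conversion. */
static int64_t decode_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}
```

With a 32-bit signed time_t, pattern 1 degenerates into pattern 2,
which is why the same on-disk value can mean different dates on 32-bit
and 64-bit kernels.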
Arnd
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 10:56 ` Arnd Bergmann
2014-06-02 11:57 ` Theodore Ts'o
@ 2014-06-02 15:04 ` Chuck Lever
2014-06-02 15:31 ` Theodore Ts'o
` (2 more replies)
1 sibling, 3 replies; 71+ messages in thread
From: Chuck Lever @ 2014-06-02 15:04 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, LKML Kernel,
lftan, Christoph Hellwig, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Jun 2, 2014, at 6:56 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote:
>>
>>> For actually running kernels beyond 2038, the best idea I've seen so
>>> far is to disallow all broken code at compile time. I don't see
>>> a choice but to audit the entire kernel for invalid uses on both
>>> 32 and 64 bit in the next few years. A lot of code will get changed
>>> in the process so we can actually keep running 32-bit kernels and
>>> file systems, but other code will likely go away:
>>>
>>> * any system calls that pass a time_t, timeval or timespec on
>>> 32-bit systems return -ENOSYS, to ensure all user land uses
>>> the replacements we will put into place
>>> * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
>>> from the kernel, and all code using it left out.
>>> * ext2 and ext3 file system code will have to be disabled, but that's
>>> fine since ext4 can mount old file systems.
>>
>> Syscalls and libs can be "fixed". Existing filesystem content might
>> not. So if you need to mount some old media in read-write mode after
>> 2038 and that happens to contain an ext2 or similarly limited filesystem
>> then it'd better just "work". Having the kernel refuse to modify the
>> filesystem would be unacceptable.
>
> I think you misunderstood what I suggested: the intent is to avoid
> seeing things break in 2038 by making them break much earlier. We have
> a solution for ext2 file systems, it's called ext4, and we just need
> to ensure that everybody knows they have to migrate eventually.
>
> At some point before the mid-2030s, you should no longer be able to
> build a kernel that has support for ext2 or any other module that will
> run into bugs later. Until then (rather sooner than later), I'd like
> to get to the point where you can choose whether to include those
> modules at build time or not, and then get everybody to turn off that
> option and fix the bugs they run into. You wouldn't need that for a
> 2014-generation long-term support distro (rhel 7, sles 12, debian 7,
> ubuntu 14.04, ...), but perhaps for the next generation, or the
> one after that.
I’m wondering what should be done about NFS. A solution for NFS should
match any scheme that is considered for local file systems, IMO.
NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
(See the definition of nfstime3 in RFC 1813).
NFSv4 uses a signed 64-bit value where zero represents midnight UTC
on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
the definition of nfstime4 in RFC 5661).
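For reference, the two layouts can be sketched as C structs. This is
only an approximation: on the wire both types are XDR-encoded, and the
struct names and the NFS3_MAX_SECONDS constant below are invented for
this sketch; only the field names and widths follow the RFCs.

```c
#include <stdint.h>

/* Field widths as described in RFC 1813 (nfstime3) and RFC 5661
 * (nfstime4). The struct names are illustrative only. */
struct nfstime3_sketch {
	uint32_t seconds;	/* unsigned: 1970-01-01 .. 2106-02-07 */
	uint32_t nseconds;
};

struct nfstime4_sketch {
	int64_t  seconds;	/* signed: no practical range limit */
	uint32_t nseconds;
};

/* Largest second count that fits in an NFSv3 timestamp. */
static const int64_t NFS3_MAX_SECONDS = UINT32_MAX;
```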
The NFSv4 protocol is probably not problematic, and NFSv3 should be out
of the picture by 2038. But if changes are planned for dealing _now_
with timestamp issues, compatibility with NFSv3 is a consideration.
It is already the case that, via NFSv3, the Linux NFS client transmits
timestamps earlier than 1970 as large positive numbers. Try this with
xfstests generic/258.
Maybe nfs3_proc_setattr() should recognize pre-epoch timestamps and
timestamps larger than can be represented in an unsigned 32-bit field
and return an immediate error to the requesting application (like EINVAL).
If the Linux NFS server encounters a local file with a timestamp that
cannot be represented via a u32, should it also return NFS3ERR_INVAL?
RFC 1813 does not provide guidance on the behavior nor does it suggest
a particular error status code. The Solaris 11 server appears to return
NFS3ERR_INVAL in this case.
An alternative would be to “cap” the timestamps transmitted via NFSv3 by
Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
timestamp is transmitted as UINT_MAX.
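The two policies discussed above (fail the request, or cap the value)
could look roughly like this. It is a user-space sketch under stated
assumptions, not the actual nfs3_proc_setattr() code: the function
names are hypothetical, and UINT32_MAX stands in for UINT_MAX.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: validate a 64-bit second count before storing it
 * in the unsigned 32-bit NFSv3 seconds field. Returns 0 on success or
 * -EINVAL if the value cannot be represented. */
static int nfs3_seconds_checked(int64_t sec, uint32_t *wire)
{
	if (sec < 0 || sec > (int64_t)UINT32_MAX)
		return -EINVAL;
	*wire = (uint32_t)sec;
	return 0;
}

/* Alternative policy: clamp to the representable range instead of
 * failing, so pre-epoch times become 0 and late times UINT32_MAX. */
static uint32_t nfs3_seconds_clamped(int64_t sec)
{
	if (sec < 0)
		return 0;
	if (sec > (int64_t)UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)sec;
}
```

Clamping keeps old applications working but silently changes the
timestamp; the checked variant makes the loss visible to the caller.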
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 15:04 ` Chuck Lever
@ 2014-06-02 15:31 ` Theodore Ts'o
2014-06-02 17:12 ` H. Peter Anvin
2014-06-02 18:52 ` Arnd Bergmann
2014-06-02 18:58 ` Roger Willcocks
2 siblings, 1 reply; 71+ messages in thread
From: Theodore Ts'o @ 2014-06-02 15:31 UTC (permalink / raw)
To: Chuck Lever
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Arnd Bergmann,
LKML Kernel, lftan, Christoph Hellwig, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph
On Mon, Jun 02, 2014 at 11:04:23AM -0400, Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> An alternative would be to “cap” the timestamps transmitted via NFSv3 by
> Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
> timestamp is transmitted as UINT_MAX.
I wonder if it would make sense to try to promulgate via the Austin
group, and possibly the C standards committee the concept of a bit
pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
unknown", or "time indefinite" or "we couldn't encode the time".
We would then teach gmtime(3) and asctime(3) to print some appropriate
message, and we could teach programs like find (with the -mtime
option), make, tmpwatch, et al., that they can't make any presumption
about the comparability of any timestamp which has a value of
TIME_UNDEFINED.
It would be problematic for time(2) or gettimeofday(2) to return
TIME_UNDEFINED, since there are programs that care about time ticking
forward, but I could imagine a new interface which would be permitted
to return a flag indicating that we don't know the current time
(because the CMOS battery had run down, etc.) so instead we're going
to be counting the number of seconds since the system was booted.
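What "no presumption about comparability" could mean for a program such
as make or find is sketched below. Everything here is hypothetical: no
such constant exists today, the value INT64_MAX is only one possible
choice, and time_compare() is a made-up name.

```c
#include <stdint.h>

/* Hypothetical sentinel; no value has been agreed on anywhere. */
#define TIME_UNDEFINED INT64_MAX

enum time_order {
	TIME_BEFORE = -1,
	TIME_SAME = 0,
	TIME_AFTER = 1,
	TIME_UNORDERED = 2,	/* at least one side is undefined */
};

/* Compare two timestamps, refusing to order undefined ones. */
static enum time_order time_compare(int64_t a, int64_t b)
{
	if (a == TIME_UNDEFINED || b == TIME_UNDEFINED)
		return TIME_UNORDERED;
	if (a < b)
		return TIME_BEFORE;
	return a > b ? TIME_AFTER : TIME_SAME;
}
```

A caller would have to pick an explicit policy for TIME_UNORDERED, for
example always rebuilding a target whose dependency time is unknown.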
- Ted
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 15:31 ` Theodore Ts'o
@ 2014-06-02 17:12 ` H. Peter Anvin
2014-06-02 18:50 ` Arnd Bergmann
2014-06-02 22:29 ` Theodore Ts'o
0 siblings, 2 replies; 71+ messages in thread
From: H. Peter Anvin @ 2014-06-02 17:12 UTC (permalink / raw)
To: Theodore Ts'o, Chuck Lever, Arnd Bergmann, Nicolas Pitre,
Dave Chinner, LKML Kernel, linux-arch, joseph, john.stultz,
Christoph Hellwig, tglx, geert, lftan, linux-fsdevel, xfs,
Linux NFS Mailing List
On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
>
> I wonder if it would make sense to try to promulgate via the Austin
> group, and possibly the C standards committee the concept of a bit
> pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> unknown", or "time indefinite" or "we couldn't encode the time".
>
(time_t)-1 already has this meaning for some calls (e.g. time(2)).
However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
something similar applies to all possible bit patterns, certainly within
the range of an int.
> We would then teach gmtime(3) and asctime(3) to print some appropriate
> message, and we could teach programs like find (with the -mtime
> option), make, tmpwatch, et al., that they can't make any presumption
> about the comparability of any timestamp which has a value of
> TIME_UNDEFINED.
>
> It would be problematic for time(2) or gettimeofday(2) to return
> TIME_UNDEFINED, since there are programs that care about time ticking
> forward, but I could imagine a new interface which would be permitted
> to return a flag indicating that we don't know the current time
> (because the CMOS battery had run down, etc.) so instead we're going
> to be counting the number of seconds since the system was booted.
This assumes that we actually know that that is the case, which may be
an aggressive assumption.
-hpa
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 17:12 ` H. Peter Anvin
@ 2014-06-02 18:50 ` Arnd Bergmann
2014-06-02 22:29 ` Theodore Ts'o
1 sibling, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 18:50 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List,
Theodore Ts'o, LKML Kernel, xfs, Christoph Hellwig,
Chuck Lever, john.stultz, lftan, linux-fsdevel, geert, tglx,
joseph
On Monday 02 June 2014 10:12:37 H. Peter Anvin wrote:
> On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
> >
> > I wonder if it would make sense to try to promulgate via the Austin
> > group, and possibly the C standards committee the concept of a bit
> > pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> > unknown", or "time indefinite" or "we couldn't encode the time".
> >
>
> (time_t)-1 already has this meaning for some calls (e.g. time(2)).
> However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
> something similar applies to all possible bit patterns, certainly within
> the range of an int.
Worse than Wed Dec 31 23:59:59 UTC 1969, on NFSv3 it also means
"Sun Feb 7 07:28:15 CET 2106", and that is much harder to distinguish
from a real future date.
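The two readings of the all-ones bit pattern can be checked with a small
sketch that converts a second count to a UTC calendar date using only
64-bit integer arithmetic, so the result does not depend on the width of
the platform's time_t. The struct and function names are made up for
this example; note that 07:28:15 CET corresponds to 06:28:15 UTC.

```c
#include <stdint.h>

struct utc_date {
	int64_t year;
	int month, day, hour, min, sec;
};

/* Hypothetical helper: seconds since 1970-01-01 00:00:00 UTC to a
 * calendar date in the proleptic Gregorian calendar. */
static struct utc_date utc_from_seconds(int64_t t)
{
	struct utc_date d;
	int64_t days = t / 86400;
	int64_t rem = t % 86400;

	if (rem < 0) {		/* round toward negative infinity */
		rem += 86400;
		days--;
	}

	int64_t z = days + 719468;	/* shift epoch to 0000-03-01 */
	int64_t era = (z >= 0 ? z : z - 146096) / 146097;
	int64_t doe = z - era * 146097;	/* day within 400-year era */
	int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
	int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
	int64_t mp = (5 * doy + 2) / 153;	/* month, March = 0 */

	d.day = (int)(doy - (153 * mp + 2) / 5 + 1);
	d.month = (int)(mp < 10 ? mp + 3 : mp - 9);
	d.year = yoe + era * 400 + (d.month <= 2);
	d.hour = (int)(rem / 3600);
	d.min = (int)(rem / 60 % 60);
	d.sec = (int)(rem % 60);
	return d;
}
```

Passing (int32_t)-1 gives the 1969 reading, while (uint32_t)-1, the
NFSv3 interpretation, gives the 2106 one.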
If we had the choice, I'd go for something like 1, i.e.
"Thu Jan 1 01:00:01 CET 1970".
> > We would then teach gmtime(3) and asctime(3) to print some appropriate
> > message, and we could teach programs like find (with the -mtime
> > option), make, tmpwatch, et al., that they can't make any presumption
> > about the comparability of any timestamp which has a value of
> > TIME_UNDEFINED.
> >
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.
It's harder for time(2), but for the inode case, we can definitely
detect when the file system specific representation overflows
or underflows, which may be at a number of very different points
in time.
Arnd
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 15:04 ` Chuck Lever
2014-06-02 15:31 ` Theodore Ts'o
@ 2014-06-02 18:52 ` Arnd Bergmann
2014-06-02 18:58 ` Roger Willcocks
2 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 18:52 UTC (permalink / raw)
To: Chuck Lever
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, LKML Kernel,
lftan, Christoph Hellwig, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Monday 02 June 2014 11:04:23 Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
>
> NFSv4 uses a signed 64-bit value where zero represents midnight UTC
> on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
> the definition of nfstime4 in RFC 5661).
>
> The NFSv4 protocol is probably not problematic, and NFSv3 should be out
> of the picture by 2038. But if changes are planned for dealing _now_
> with timestamp issues, compatibility with NFSv3 is a consideration.
>
> It is already the case that, via NFSv3, the Linux NFS client transmits
> timestamps earlier than 1970 as large positive numbers. Try this with
> xfstests generic/258.
If I read the code correctly, a pre-1970 timestamp will be sent as
a large unsigned integer, but received as a post-2038 timestamp on
64-bit kernels, both in the nfs client and server code.
This behavior is clearly wrong, but it's the same bug that we have
in lots of other file systems, and it makes sense to have the
same fix everywhere, at least in the cases where we know what interpretation
we actually want. NFS has the luxury of having an actual specification
saying that the value is unsigned. For most of the legacy file systems,
we can only make a guess at how other OSs would interpret the same
numbers.
Arnd
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 15:04 ` Chuck Lever
2014-06-02 15:31 ` Theodore Ts'o
2014-06-02 18:52 ` Arnd Bergmann
@ 2014-06-02 18:58 ` Roger Willcocks
2014-06-02 19:04 ` Chuck Lever
2 siblings, 1 reply; 71+ messages in thread
From: Roger Willcocks @ 2014-06-02 18:58 UTC (permalink / raw)
To: Chuck Lever
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Arnd Bergmann,
LKML Kernel, geert, Christoph Hellwig, john.stultz,
H. Peter Anvin, linux-fsdevel, lftan, tglx, xfs, joseph
On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
>
nfstime3 could be extended by redefining the otherwise unused
nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
seconds field and an unsigned 30-bit nanoseconds field.
This could represent 1970 +/- 272 years.
Servers could indicate they can understand the extended time format by
adding a new FSINFO capability - FSF3_CANSETTIME_EX.
Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
timestamps so old servers would be protected from new clients.
Old clients don't need to be protected from new servers because the
on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
so they're no worse off than they were before.
Arguably the new server ought to clamp out-of-range timestamps before
sending them to old clients but that would need per-client state (and
nfs3 is stateless.)
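The proposed bit layout can be sketched as follows. This is only an
illustration of the packing described above, not an implemented or
standardized format; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct nfstime3_wire {
	uint32_t seconds;
	uint32_t nseconds;
};

/* Pack a signed 34-bit second count and a 30-bit nanosecond count. */
static struct nfstime3_wire nfstime3_ex_encode(int64_t sec, uint32_t nsec)
{
	uint64_t u = (uint64_t)sec;	/* two's complement bit pattern */
	struct nfstime3_wire w;

	assert(sec >= -(INT64_C(1) << 33) && sec < (INT64_C(1) << 33));
	assert(nsec < 1000000000u);	/* always fits in 30 bits */

	w.seconds = (uint32_t)u;			/* seconds bits 31..0 */
	w.nseconds = ((uint32_t)((u >> 32) & 3) << 30) | nsec; /* bits 33..32 */
	return w;
}

/* Recover the second count, sign-extending from bit 33. */
static int64_t nfstime3_ex_seconds(struct nfstime3_wire w)
{
	int64_t v = (int64_t)(((uint64_t)(w.nseconds >> 30) << 32) | w.seconds);

	if (v >= (INT64_C(1) << 33))
		v -= INT64_C(1) << 34;
	return v;
}

static uint32_t nfstime3_ex_nseconds(struct nfstime3_wire w)
{
	return w.nseconds & 0x3fffffffu;
}
```

For any time between 1970 and 2106 the two extra bits are zero, so the
encoded words are identical to a plain nfstime3, which is the property
that keeps old clients no worse off than before.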
--
Roger
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 18:58 ` Roger Willcocks
@ 2014-06-02 19:04 ` Chuck Lever
2014-06-02 19:10 ` Arnd Bergmann
0 siblings, 1 reply; 71+ messages in thread
From: Chuck Lever @ 2014-06-02 19:04 UTC (permalink / raw)
To: Roger Willcocks
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Arnd Bergmann,
LKML Kernel, geert, Christoph Hellwig, john.stultz,
H. Peter Anvin, linux-fsdevel, lftan, tglx, xfs, joseph
On Jun 2, 2014, at 2:58 PM, Roger Willcocks <roger@filmlight.ltd.uk> wrote:
>
> On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
>
>> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
>> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
>> (See the definition of nfstime3 in RFC 1813).
>>
>
> nfstime3 could be extended by redefining the otherwise unused
> nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
> seconds field and an unsigned 30-bit nanoseconds field.
>
> This could represent 1970 +/- 272 years.
>
> Servers could indicate they can understand the extended time format by
> adding a new FSINFO capability - FSF3_CANSETTIME_EX.
>
> Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
> timestamps so old servers would be protected from new clients.
You would have to get the IETF’s NFSv4 working group to sign off on
this change. Otherwise, Linux would be the only NFSv3 implementation
that supports the extension.
But I suspect the answer you’d get is “Use NFSv4.”
> Old clients don't need to be protected from new servers because the
> on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
> so they're no worse off than they were before.
>
> Arguably the new server ought to clamp out-of-range timestamps before
> sending them to old clients but that would need per-client state (and
> nfs3 is stateless.)
There’s no reliable way in NFSv3 for clients and servers to identify
the software running on the peer.
Practically speaking, you should assume that the NFSv3 protocol is never
going to change.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 19:04 ` Chuck Lever
@ 2014-06-02 19:10 ` Arnd Bergmann
0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 19:10 UTC (permalink / raw)
To: Chuck Lever
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, xfs,
LKML Kernel, geert, Christoph Hellwig, john.stultz,
H. Peter Anvin, linux-fsdevel, lftan, tglx, Roger Willcocks,
joseph
On Monday 02 June 2014 15:04:27 Chuck Lever wrote:
> On Jun 2, 2014, at 2:58 PM, Roger Willcocks <roger@filmlight.ltd.uk> wrote:
>
> >
> > On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> >
> >> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> >> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> >> (See the definition of nfstime3 in RFC 1813).
> >>
> >
> > nfstime3 could be extended by redefining the otherwise unused
> > nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
> > seconds field and an unsigned 30-bit nanoseconds field.
> >
> > This could represent 1970 +/- 272 years.
> >
> > Servers could indicate they can understand the extended time format by
> > adding a new FSINFO capability - FSF3_CANSETTIME_EX.
> >
> > Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
> > timestamps so old servers would be protected from new clients.
>
> You would have to get the IETF’s NFSv4 working group to sign off on
> this change. Otherwise, Linux would be the only NFSv3 implementation
> that supports the extension.
>
> But I suspect the answer you’d get is “Use NFSv4.”
While I've never dealt with NFS standardization, I'd assume this is
a workable answer. The NFSv2 and NFSv3 specifications clearly define a valid
range of times until 2106 using unsigned seconds, and that should really
give enough time to migrate to something better (not necessarily NFSv4).
Arnd
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 13:52 ` Joseph S. Myers
@ 2014-06-02 19:19 ` Arnd Bergmann
2014-06-02 19:26 ` H. Peter Anvin
2014-06-02 21:02 ` Joseph S. Myers
0 siblings, 2 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 19:19 UTC (permalink / raw)
To: Joseph S. Myers
Cc: hch, linux-mtd, hpa, logfs, linux-afs, linux-arch, linux-cifs,
linux-scsi, ceph-devel, codalist, cluster-devel, coda, geert,
linux-ext4, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx,
linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs
On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
> On Fri, 30 May 2014, Arnd Bergmann wrote:
>
> > a) is this the right approach in general? The previous discussion
> > pointed this way, but there may be other opinions.
>
> The syscall changes seem like the sort of thing I'd expect, although
> patches adding new syscalls or otherwise affecting the kernel/userspace
> interface (as opposed to those relating to an individual filesystem)
> should go to linux-api as well as other relevant lists.
Ok. Sorry about missing linux-api, I confused it with linux-arch, which
may not be as relevant here, except for the one question whether we
actually want to have the new ABI on all 32-bit architectures or only
as an opt-in for those that expect to stay around for another 24 years.
Two more questions for you:
- are you (and others) happy with adding this type of stat syscall
(fstatat64/fstat64) as opposed to the more generic xstat that has
been discussed in the past and that never made it through the bike-
shedding discussion?
- once we have enough buy-in from reviewers to merge this initial
series, should we proceed to define the rest of the syscall ABI
(minus driver ioctls) so glibc and kernel can do the conversion
on top of that, or should we better try to do things one syscall
family at a time and actually get the kernel to handle them
correctly internally?
Arnd
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 19:19 ` Arnd Bergmann
@ 2014-06-02 19:26 ` H. Peter Anvin
2014-06-02 19:55 ` Arnd Bergmann
2014-06-02 21:02 ` Joseph S. Myers
1 sibling, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-06-02 19:26 UTC (permalink / raw)
To: Arnd Bergmann, Joseph S. Myers
Cc: hch, linux-mtd, logfs, linux-afs, linux-arch, linux-cifs,
linux-scsi, ceph-devel, cluster-devel, coda, geert, linux-ext4,
codalist, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx,
linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs
On 06/02/2014 12:19 PM, Arnd Bergmann wrote:
> On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
>> On Fri, 30 May 2014, Arnd Bergmann wrote:
>>
>>> a) is this the right approach in general? The previous discussion
>>> pointed this way, but there may be other opinions.
>>
>> The syscall changes seem like the sort of thing I'd expect, although
>> patches adding new syscalls or otherwise affecting the kernel/userspace
>> interface (as opposed to those relating to an individual filesystem)
>> should go to linux-api as well as other relevant lists.
>
> Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> may not be as relevant here, except for the one question whether we
> actually want to have the new ABI on all 32-bit architectures or only
> as an opt-in for those that expect to stay around for another 24 years.
>
> Two more questions for you:
>
> - are you (and others) happy with adding this type of stat syscall
> (fstatat64/fstat64) as opposed to the more generic xstat that has
> been discussed in the past and that never made it through the bike-
> shedding discussion?
>
> - once we have enough buy-in from reviewers to merge this initial
> series, should we proceed to define rest of the syscall ABI
> (minus driver ioctls) so glibc and kernel can do the conversion
> on top of that, or should we better try to do things one syscall
> family at a time and actually get the kernel to handle them
> correctly internally?
>
The bit that is really going to hurt is every single ioctl that uses a
timespec.
Honestly, though, I really don't understand the point with "struct
inode_time". It seems like the zeroeth-order thing is to change the
kernel internal version of struct timespec to have a 64-bit time... it
isn't just about inodes. We then should be explicit about the external
uses of time, and use accessors.
-hpa
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 19:26 ` H. Peter Anvin
@ 2014-06-02 19:55 ` Arnd Bergmann
2014-06-02 21:57 ` H. Peter Anvin
0 siblings, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 19:55 UTC (permalink / raw)
To: H. Peter Anvin
Cc: hch, linux-mtd, logfs, linux-afs, Joseph S. Myers, linux-arch,
linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda, geert,
linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan,
linux-btrfs
On Monday 02 June 2014 12:26:22 H. Peter Anvin wrote:
> On 06/02/2014 12:19 PM, Arnd Bergmann wrote:
> > On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
> >> On Fri, 30 May 2014, Arnd Bergmann wrote:
> >>
> >>> a) is this the right approach in general? The previous discussion
> >>> pointed this way, but there may be other opinions.
> >>
> >> The syscall changes seem like the sort of thing I'd expect, although
> >> patches adding new syscalls or otherwise affecting the kernel/userspace
> >> interface (as opposed to those relating to an individual filesystem)
> >> should go to linux-api as well as other relevant lists.
> >
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
> >
> > Two more questions for you:
> >
> > - are you (and others) happy with adding this type of stat syscall
> > (fstatat64/fstat64) as opposed to the more generic xstat that has
> > been discussed in the past and that never made it through the bike-
> > shedding discussion?
> >
> > - once we have enough buy-in from reviewers to merge this initial
> > series, should we proceed to define rest of the syscall ABI
> > (minus driver ioctls) so glibc and kernel can do the conversion
> > on top of that, or should we better try to do things one syscall
> > family at a time and actually get the kernel to handle them
> > correctly internally?
> >
>
> The bit that is really going to hurt is every single ioctl that uses a
> timespec.
>
> Honestly, though, I really don't understand the point with "struct
> inode_time". It seems like the zeroeth-order thing is to change the
> kernel internal version of struct timespec to have a 64-bit time... it
> isn't just about inodes. We then should be explicit about the external
> uses of time, and use accessors.
I picked these because they are fairly isolated from all other uses,
in particular since inode times are the only things where we really
care about times in the distant past or future (decades away as opposed
to things that happened between boot and shutdown).
For other kernel-internal uses, we may be better off migrating to
a completely different representation, such as nanoseconds since
boot or the architecture specific ktime_t, but this is really something
to decide for each subsystem.
I just tried building an arm32 kernel with an s64 time_t, and that
failed horribly: I get linker errors for missing 64-bit divides
and lots of warnings for code that passes time_t pointers to
functions taking a 'long' pointer, or vice versa. I also think the only
way to maintain ABI compatibility is to separate the internal uses
from the interface, which means auditing all code in the end.
Arnd
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 19:19 ` Arnd Bergmann
2014-06-02 19:26 ` H. Peter Anvin
@ 2014-06-02 21:02 ` Joseph S. Myers
2014-06-04 15:05 ` Arnd Bergmann
1 sibling, 1 reply; 71+ messages in thread
From: Joseph S. Myers @ 2014-06-02 21:02 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, hpa, logfs, linux-afs, linux-arch, linux-cifs,
linux-scsi, ceph-devel, codalist, cluster-devel, coda, geert,
linux-ext4, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx,
linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs
On Mon, 2 Jun 2014, Arnd Bergmann wrote:
> Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> may not be as relevant here, except for the one question whether we
> actually want to have the new ABI on all 32-bit architectures or only
> as an opt-in for those that expect to stay around for another 24 years.
For glibc I think it will make the most sense to add the support for
64-bit time_t across all architectures that currently have 32-bit time_t
(with the new interfaces having fallback support to implementation in
terms of the 32-bit kernel interfaces, if the 64-bit syscalls are
unavailable either at runtime or in the kernel headers against which glibc
is compiled - this fallback code will of course need to check for overflow
when passing a time value to the kernel, hopefully with error handling
consistent with whatever the kernel ends up doing when a filesystem can't
support a timestamp). If some architectures don't provide the new
interfaces in the kernel then that will mean the fallback code in glibc
can't be removed until glibc support for those architectures is removed
(as opposed to removing it when glibc no longer supports kernels predating
the kernel support).
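
The overflow check in such a fallback path could look roughly like the sketch below. The helper name time64_to_legacy() and the choice of EOVERFLOW are assumptions for illustration, not glibc or kernel interfaces; the real error code would have to match whatever the kernel ends up returning.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: narrow a 64-bit seconds value to the 32-bit
 * value a legacy syscall expects. Returns 0 on success, or -EOVERFLOW
 * (an assumed error code) if the value does not fit.
 */
static int time64_to_legacy(int64_t sec64, int32_t *sec32)
{
	if (sec64 < INT32_MIN || sec64 > INT32_MAX)
		return -EOVERFLOW;
	*sec32 = (int32_t)sec64;
	return 0;
}
```

A caller would only issue the 32-bit syscall when the helper returns 0, and otherwise report the error to the application without touching the kernel.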
> Two more questions for you:
>
> - are you (and others) happy with adding this type of stat syscall
> (fstatat64/fstat64) as opposed to the more generic xstat that has
> been discussed in the past and that never made it through the bike-
> shedding discussion?
I am.
> - once we have enough buy-in from reviewers to merge this initial
> series, should we proceed to define the rest of the syscall ABI
> (minus driver ioctls) so glibc and kernel can do the conversion
> on top of that, or should we better try to do things one syscall
> family at a time and actually get the kernel to handle them
> correctly internally?
I don't have any comments on that ordering question.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 19:55 ` Arnd Bergmann
@ 2014-06-02 21:57 ` H. Peter Anvin
2014-06-03 14:22 ` Arnd Bergmann
0 siblings, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-06-02 21:57 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, logfs, linux-afs, Joseph S. Myers, linux-arch,
linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda, geert,
linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan,
linux-btrfs
On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
>>
>> The bit that is really going to hurt is every single ioctl that uses a
>> timespec.
>>
>> Honestly, though, I really don't understand the point with "struct
>> inode_time". It seems like the zeroth-order thing is to change the
>> kernel internal version of struct timespec to have a 64-bit time... it
>> isn't just about inodes. We then should be explicit about the external
>> uses of time, and use accessors.
>
> I picked these because they are fairly isolated from all other uses,
> in particular since inode times are the only things where we really
> care about times in the distant past or future (decades away as opposed
> to things that happened between boot and shutdown).
>
If nothing else, I would expect to be able to set the system time to
weird values for testing. So I'm not so sure I agree with that...
> For other kernel-internal uses, we may be better off migrating to
> a completely different representation, such as nanoseconds since
> boot or the architecture specific ktime_t, but this is really something
> to decide for each subsystem.
Having a bunch of different time representations in the kernel seems
like a real headache...
-hpa
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 17:12 ` H. Peter Anvin
2014-06-02 18:50 ` Arnd Bergmann
@ 2014-06-02 22:29 ` Theodore Ts'o
2014-06-02 22:32 ` H. Peter Anvin
1 sibling, 1 reply; 71+ messages in thread
From: Theodore Ts'o @ 2014-06-02 22:29 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Arnd Bergmann,
LKML Kernel, xfs, Christoph Hellwig, Chuck Lever, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph
On Mon, Jun 02, 2014 at 10:12:37AM -0700, H. Peter Anvin wrote:
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.
We won't know if the RTC clock is wrong, true --- but the kernel will
know if (a) the hardware doesn't have RTC clock at all, or if (b) the
RTC clock is ticking some time that can't be encoded using the current
time_t type. So in that case, the fallback would be for the
kernel to tick starting with time_t == 0 when the system is initially
booted, and the "time indefinite flag" would be set.
Now assume that we have a new system call, gettimestampofday(2), which
returns a new timestamp structure which has a 64-bit ts_sec field, the
ts_nsec field (ala struct timespec), and a ts_flags field, where the
kernel could signal things like "time invalid", or "time can't be
encoded in the legacy time_t type", or "I'm not sure if the time is
correct" --- i.e., because the RTC battery isn't working.
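
As a rough sketch of what such a structure might look like: the field names ts_sec, ts_nsec and ts_flags come from the description above, while the struct name, the field widths, the flag names and their values, and the helper function are all assumptions made up for illustration.

```c
#include <stdint.h>

/* Hypothetical flag bits; names and values are illustrative only. */
#define TS_FLAG_INVALID    (1u << 0) /* time is not valid at all */
#define TS_FLAG_NO_LEGACY  (1u << 1) /* cannot be encoded in a 32-bit time_t */
#define TS_FLAG_UNCERTAIN  (1u << 2) /* e.g. RTC battery is not working */

struct timestamp {
	int64_t  ts_sec;   /* seconds since the epoch */
	uint32_t ts_nsec;  /* nanoseconds, 0..999999999 */
	uint32_t ts_flags; /* TS_FLAG_* bits */
};

/* Set or clear TS_FLAG_NO_LEGACY depending on whether ts_sec fits in 32 bits. */
static void timestamp_update_flags(struct timestamp *ts)
{
	if (ts->ts_sec < INT32_MIN || ts->ts_sec > INT32_MAX)
		ts->ts_flags |= TS_FLAG_NO_LEGACY;
	else
		ts->ts_flags &= ~TS_FLAG_NO_LEGACY;
}
```

With a layout like this, a caller that only understands the legacy type can test one flag bit instead of guessing from the value of the seconds field.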
Not all hardware might be able to support the last, of course, but if
the battery is low, or the system has been exposed to very low
temperatures (or large amounts of cosmic radiation, etc.) the RTC
time may just be plain wrong. No system is going to be perfect, but
it should be possible to make things better, at least for certain classes of
hardware.
And since we are already returning (time_t) -1 in some cases, we might
as well try to make things a bit more formal.
- Ted
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 22:29 ` Theodore Ts'o
@ 2014-06-02 22:32 ` H. Peter Anvin
2014-06-02 23:32 ` Theodore Ts'o
0 siblings, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-06-02 22:32 UTC (permalink / raw)
To: Theodore Ts'o, Chuck Lever, Arnd Bergmann, Nicolas Pitre,
Dave Chinner, LKML Kernel, linux-arch, joseph, john.stultz,
Christoph Hellwig, tglx, geert, lftan, linux-fsdevel, xfs,
Linux NFS Mailing List
On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
>
> And since we are already returning (time_t) -1 in some cases, we might
> as well try to make things a bit more formal.
>
Are we? I am not aware of *Linux* actually using that.
-hpa
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 22:32 ` H. Peter Anvin
@ 2014-06-02 23:32 ` Theodore Ts'o
2014-06-02 23:33 ` H. Peter Anvin
2014-06-03 13:09 ` Roger Willcocks
0 siblings, 2 replies; 71+ messages in thread
From: Theodore Ts'o @ 2014-06-02 23:32 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Arnd Bergmann,
LKML Kernel, xfs, Christoph Hellwig, Chuck Lever, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph
On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
> >
> > And since we are already returning (time_t) -1 in some cases, we might
> > as well try to make things a bit more formal.
> >
>
> Are we? I am not aware of *Linux* actually using that.
Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
the POSIX specification:
	SYSCALL_DEFINE1(time, time_t __user *, tloc)
	{
		time_t i = get_seconds();
		if (tloc) {
			if (put_user(i,tloc))
				return -EFAULT;
		}
		force_successful_syscall_return();
		return i;
	}
Cheers,
- Ted
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 23:32 ` Theodore Ts'o
@ 2014-06-02 23:33 ` H. Peter Anvin
2014-06-03 13:09 ` Roger Willcocks
1 sibling, 0 replies; 71+ messages in thread
From: H. Peter Anvin @ 2014-06-02 23:33 UTC (permalink / raw)
To: Theodore Ts'o, Chuck Lever, Arnd Bergmann, Nicolas Pitre,
Dave Chinner, LKML Kernel, linux-arch, joseph, john.stultz,
Christoph Hellwig, tglx, geert, lftan, linux-fsdevel, xfs,
Linux NFS Mailing List
On 06/02/2014 04:32 PM, Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
>> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
>>>
>>> And since we are already returning (time_t) -1 in some cases, we might
>>> as well try to make things a bit more formal.
>>>
>>
>> Are we? I am not aware of *Linux* actually using that.
>
> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the POSIX specification:
>
> 	SYSCALL_DEFINE1(time, time_t __user *, tloc)
> 	{
> 		time_t i = get_seconds();
>
> 		if (tloc) {
> 			if (put_user(i,tloc))
> 				return -EFAULT;
> 		}
> 		force_successful_syscall_return();
> 		return i;
> 	}
>
OK, I guess I should have said... other than for -EFAULT.
I just don't know of anyone using time(2) with an argument other than NULL.
-hpa
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 11:43 ` Arnd Bergmann
@ 2014-06-03 0:32 ` Dave Chinner
2014-06-03 7:33 ` Arnd Bergmann
0 siblings, 1 reply; 71+ messages in thread
From: Dave Chinner @ 2014-06-03 0:32 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > > In my list at http://kernelnewbies.org/y2038, I found that almost
> > > > all file systems at least handle times until 2106, because they treat
> > > > the on-disk value as unsigned on 64-bit systems, or they use
> > > > a completely different representation. My guess is that somebody
> > > > earlier spent a lot of work on making that happen.
> > > >
> > > > The exceptions are:
> > > >
> > > > * exofs uses signed values, which can probably be changed to be
> > > > consistent with the others.
> > > > * isofs has a bug that limits it until 2027 on architectures with
> > > > a signed 'char' type (otherwise it's 2155).
> > > > * udf can represent times for many thousands of years through a
> > > > 16-bit year representation, but the code to convert to epoch
> > > > uses a const array that ends at 2038.
> > > > * afs uses signed seconds and can probably be fixed
> > > > * coda relies on user space time representation getting passed
> > > > through an ioctl.
> > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > > > where they really use signed.
> > > >
> > > > I was confused about XFS since I didn't notice that there are
> > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > > > XFS to also use the 1970-2106 time range on 64-bit systems today.
> > >
> > > You've missed an awful lot more than just the implications for the
> > > core kernel code.
> > >
> > > There's a good chance such changes propagate to APIs elsewhere in
> > > the filesystems, because something you haven't realised is that XFS
> > > effectively exposes the on-disk timestamp format directly to
> > > userspace via the bulkstat interface (see struct xfs_bstat). It also
> > > affects the XFS open-by-handle ioctl and the swap extent ioctl used
> > > by the online defragmenter.
>
> I really didn't look at them at all, as ioctl is very late on my
> mental list of things to change. I do realize that a lot of drivers
> and file systems do have ioctls that pass time values and we need to
> address them one by one.
>
> I just looked at the ioctls you mentioned but don't see how open-by-handle
> is affected by this. Can you point me to what you mean?
Sorry, I misremembered how some of the XFS open-by-handle code works
in userspace (XFS has a pretty rich open-by-handle ioctl() interface
that predates the kernel syscalls by at least 10 years). Basically
there is code in userspace that uses the information returned from
bulkstat to construct file handles to pass to the open-by-handle
ioctls. xfs_fsr then uses the combination of open-by-handle from the
bulkstat output and the bulkstat output to feed into the swap extent
ioctls....
i.e. the filesystem's idea of what time is is passed to userspace as
an opaque cookie in this case, but it is not used directly by the
open-by-handle interfaces like I implied it was.
> > Just to put that in context, here's the kernel patch to add extended
> > epoch support to XFS. It's completely untested as I haven't done any
> > userspace code changes to enable the feature. However, it should
> > give you an indication of how far the simple act of changing the
> > kernel time representation spread through the filesystem. This does
> > not include any of the VFS infrastructure for specifying the range of
> > supported timestamps. It survives some smoke testing, but dies when
> > the online defragmenter starts using the bulkstat and swap extent
> > ioctls (the assert in xfs_inode_time_from_epoch() fires), so I
> > probably don't have that all sorted correctly yet...
> >
> > To test extended epoch support, however, I need some fstests that
> > define and validate the behaviour of the new syscalls - until we get
> > those we can't validate that the filesystem follows the spec
> > properly. I also suspect we are going to need an interface to query
> > the supported range of timestamps from a filesystem so that we can
> > test boundary conditions in an automated fashion....
>
> Thanks a lot for having an initial look at this yourself!
>
> I'd still consider the two problems largely orthogonal.
Depends how you look at it. You can't extend the kernel's idea of
time without permanent storage being able to specify the supported
bounds - that's a non-negotiable aspect of introducing extended
epoch timestamp support.
The actual addition of extended timestamp support to each individual
filesystem is orthogonal to the introduction of the struct
inode_time, but doing this addition properly is dependent on the VFS
infrastructure being there in the first place.
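
A minimal sketch of what "specifying the supported bounds" could mean in code, assuming a hypothetical per-filesystem range structure. The names fs_time_range and fs_time_check(), and the use of ERANGE, are invented for illustration and are not existing VFS interfaces.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-filesystem limits; not an existing VFS structure. */
struct fs_time_range {
	int64_t min_sec; /* earliest representable second */
	int64_t max_sec; /* latest representable second */
};

/* Example: a signed 32-bit on-disk seconds field (1901-2038). */
static const struct fs_time_range signed32_range = {
	.min_sec = INT32_MIN,
	.max_sec = INT32_MAX,
};

/* Return 0 if sec can be stored, or -ERANGE (assumed error code) if not. */
static int fs_time_check(const struct fs_time_range *r, int64_t sec)
{
	if (sec < r->min_sec || sec > r->max_sec)
		return -ERANGE;
	return 0;
}
```

Whether an out-of-range value should be rejected, as here, or clamped to the nearest bound is a policy question this sketch does not try to settle.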
> My patch set
> (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> more like 64-bit kernels regarding inode time stamps, which does
> impact all the file systems that have a 64-bit time or the NFS
> unsigned epoch (1970-2106), while your patch extends the file
> system internal epoch (1901-2038 for XFS) so it can be used by
> anything that knows how to handle larger than 32-bit second values
> (either 64-bit kernel or 32-bit with inode_time patch).
Right, but the issue is that 64 bit second counters are broken right
now because most filesystems can't support more than 32 bit values.
So it doesn't matter whether it's 32 bit or 64 bit machines, just
adding explicit support for >32 bit second counters without doing
anything else just extends that brokenness into the indefinite
future.
If we don't fix it now (i.e in the new user API and supporting
infrastructure), then we'll *never be able to fix it* and we'll be
stuck with timestamps that do really weird things when you pass
arbitrary future dates to the kernel.
> > diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h
> > index 623bbe8..79f94722 100644
> > --- a/fs/xfs/xfs_dinode.h
> > +++ b/fs/xfs/xfs_dinode.h
> > @@ -21,11 +21,53 @@
> > #define XFS_DINODE_MAGIC 0x494e /* 'IN' */
> > #define XFS_DINODE_GOOD_VERSION(v) ((v) >= 1 && (v) <= 3)
> >
> > +/*
> > + * Inode timestamps get more complex when we consider supporting times beyond
> > + * the standard unix epoch of Jan 2038. The struct xfs_timestamp cannot support
> > + * more than a single extension by playing sign games, and that is still not
> > + * reliable. We also can't extend the timestamp structure because there is no
> > + * free space around them in the on-disk inode.
> > + *
> > + * Hence the simplest thing to do is to add an epoch counter for each timestamp
> > + * in the inode. This can be a single byte for each timestamp and make use of
> > + * a hole we currently pad. This gives us another 255 epochs range for the
> > + * timestamps, but requires a superblock feature bit to indicate that these
> > + * fields have meaning and can be non-zero.
>
> Nice trick!
It's a pretty common way of extending the range of a variable for
on-disk formats. The on-disk format is completely disconnected from
the in-memory representation, so it's "easy" to play games like this
within the on-disk format.
If you look closely at ext4, you'll see all the lo/hi variables
where extension of 16->32 bits or 32->48 bits has occurred from
the ext2/3 variable formats... ;)
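
One possible way to combine such an epoch byte with the existing signed 32-bit seconds field is sketched here. The encoding (time = epoch * 2^32 + seconds) is an assumption for illustration only, since the patch quoted in this thread still asserts epoch == 0, and the function and macro names are hypothetical.

```c
#include <stdint.h>

#define EPOCH_SPAN      (INT64_C(1) << 32)              /* seconds per epoch */
#define EPOCH_TIME_MIN  ((int64_t)INT32_MIN)            /* lowest second of epoch 0 */
#define EPOCH_TIME_MAX  (255 * EPOCH_SPAN + INT32_MAX)  /* highest second of epoch 255 */

/* Rebuild a 64-bit time from the epoch byte and the signed seconds field. */
static int64_t time_from_epoch(uint8_t epoch, int32_t seconds)
{
	return (int64_t)epoch * EPOCH_SPAN + seconds;
}

/*
 * Split a 64-bit time into epoch and seconds. Returns 0 on success,
 * -1 if the time is outside what an 8-bit epoch can represent.
 */
static int time_to_epoch(int64_t t, uint8_t *epoch, int32_t *seconds)
{
	int64_t biased;

	if (t < EPOCH_TIME_MIN || t > EPOCH_TIME_MAX)
		return -1;
	/* Shift by 2^31 so that epoch boundaries fall on multiples of 2^32. */
	biased = t - EPOCH_TIME_MIN;
	*epoch = (uint8_t)(biased >> 32);
	*seconds = (int32_t)(t - (int64_t)*epoch * EPOCH_SPAN);
	return 0;
}
```

Under this assumed encoding, epoch 0 covers exactly the existing 1901-2038 range, so inodes written without the feature bit decode unchanged.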
>
> > +static inline __uint8_t
> > +xfs_timestamp_epoch(
> > + struct timespec *time)
> > +{
> > + /* will be zero until the extended struct inode_time is introduced */
> > + return 0;
> > +}
> > +
> > +static inline __int32_t
> > +xfs_timestamp_sec(
> > + struct timespec *time)
> > +{
> > + return time->tv_sec;
> > +}
> > +
> > +static inline __kernel_time_t
> > +xfs_inode_time_from_epoch(
> > + __uint8_t epoch,
> > + __int32_t seconds)
> > +{
> > + /* need to handle non-zero epoch when struct inode_time is introduced */
> > + ASSERT(epoch == 0);
> > + return seconds;
> > +}
>
> Why don't you already implement epoch conversion for 64-bit kernels that
> are able to represent the time today?
Because I wasn't trying to solve the entire problem, just
demonstrate the infrastructure needed to support extended
timestamps.....
> This is how ext4 does it (I mean
> the sizeof() trick, not the bit stuffing they do):
....
> I guess if there is general agreement on introducing 'struct inode_time',
> we can skip that intermediate step.
Also, I don't like the concept of having filesystems that will work
on 64 bit but not 32 bit machines. Over the past 10 years, we've
managed to remove most of those differences from the VFS and XFS,
so adding new distinctions between 32/64 bit machines is not the
direction I want to head in.
As it is, I'm expecting to do this only after the struct inode_time
and the superblock "time range" infrastructure have been added to
the kernel and VFS. If that change is not made, then we've still
only got 32 bit time....
> > @@ -509,8 +509,11 @@ xfs_sb_has_ro_compat_feature(
> > }
> >
> > #define XFS_SB_FEAT_INCOMPAT_FTYPE (1 << 0) /* filetype in dirent */
> > +#define XFS_SB_FEAT_INCOMPAT_EPOCH (1 << 1) /* Time beyond 2038 */
> > #define XFS_SB_FEAT_INCOMPAT_ALL \
> > - (XFS_SB_FEAT_INCOMPAT_FTYPE)
> > + (XFS_SB_FEAT_INCOMPAT_FTYPE | \
> > + XFS_SB_FEAT_INCOMPAT_EPOCH | \
> > + 0)
> >
> > #define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL
>
> How does this flag get set?
mkfs.xfs
> Do you have to manually change it in the
> superblock? Since most of the time I'd suspect you wouldn't actually
> use it for the foreseeable future, would it make sense to have a mount
> option that allows it to be set, but doesn't actually change the
> superblock until the first inode gets written with a nonzero epoch?
Yes, we could set the flag on the first timestamp that goes beyond
the current epoch, but that has two problems:
1. filesystem silently becomes incompatible with older
kernels so failed upgrade rollbacks become problematic; and
2. It adds unnecessary complexity, as this will end up being
the default behaviour for all new filesystems within a year.
Then we end up with a mount option and conversion functions
that never get used but we have to support for years....
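
To illustrate the first problem: an incompat feature bit is checked against the mask of bits the running kernel knows about, so a filesystem that gains the EPOCH bit stops being mountable by a kernel that predates it. The sketch below mirrors the XFS_SB_FEAT_INCOMPAT_* logic from the quoted patch, but the shortened names and the can_mount() helper are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEAT_INCOMPAT_FTYPE (1u << 0) /* filetype in dirent */
#define FEAT_INCOMPAT_EPOCH (1u << 1) /* time beyond 2038 */

/* Feature masks understood by a hypothetical old and new kernel. */
#define OLD_KERNEL_INCOMPAT_ALL (FEAT_INCOMPAT_FTYPE)
#define NEW_KERNEL_INCOMPAT_ALL (FEAT_INCOMPAT_FTYPE | FEAT_INCOMPAT_EPOCH)

/* A filesystem is mountable only if it sets no incompat bit unknown to us. */
static bool can_mount(uint32_t sb_features, uint32_t known)
{
	return (sb_features & ~known) == 0;
}
```

If the bit were set lazily on the first post-2038 timestamp, the switch from mountable to unmountable on the old kernel would happen at an unpredictable moment rather than at mkfs time.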
> That way, you'd still be able to mount it with an older kernel but
> also be forward compatible with time moving on.
We've got plenty of time to roll this out so I don't see any need
for putting in place temporary support mechanisms that unnecessarily
complicate the code.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-03 0:32 ` Dave Chinner
@ 2014-06-03 7:33 ` Arnd Bergmann
2014-06-03 8:41 ` Dave Chinner
0 siblings, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-03 7:33 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote:
> On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> > On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > > > In my list at http://kernelnewbies.org/y2038, I found that almost
> > > > > all file systems at least handle times until 2106, because they treat
> > > > > the on-disk value as unsigned on 64-bit systems, or they use
> > > > > a completely different representation. My guess is that somebody
> > > > > earlier spent a lot of work on making that happen.
> > > > >
> > > > > The exceptions are:
> > > > >
> > > > > * exofs uses signed values, which can probably be changed to be
> > > > > consistent with the others.
> > > > > * isofs has a bug that limits it until 2027 on architectures with
> > > > > a signed 'char' type (otherwise it's 2155).
> > > > > * udf can represent times for many thousands of years through a
> > > > > 16-bit year representation, but the code to convert to epoch
> > > > > uses a const array that ends at 2038.
> > > > > * afs uses signed seconds and can probably be fixed
> > > > > * coda relies on user space time representation getting passed
> > > > > through an ioctl.
> > > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > > > > where they really use signed.
> > > > >
> > > > > I was confused about XFS since I didn't notice that there are
> > > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > > > > XFS to also use the 1970-2106 time range on 64-bit systems today.
> > > >
> > > > You've missed an awful lot more than just the implications for the
> > > > core kernel code.
> > > >
> > > > There's a good chance such changes propagate to APIs elsewhere in
> > > > the filesystems, because something you haven't realised is that XFS
> > > > effectively exposes the on-disk timestamp format directly to
> > > > userspace via the bulkstat interface (see struct xfs_bstat). It also
> > > > affects the XFS open-by-handle ioctl and the swap extent ioctl used
> > > > by the online defragmenter.
> >
> > I really didn't look at them at all, as ioctl is very late on my
> > mental list of things to change. I do realize that a lot of drivers
> > and file systems do have ioctls that pass time values and we need to
> > address them one by one.
> >
> > I just looked at the ioctls you mentioned but don't see how open-by-handle
> > is affected by this. Can you point me to what you mean?
>
> Sorry, I misremembered how some of the XFS open-by-handle code works
> in userspace (XFS has a pretty rich open-by-handle ioctl() interface
> that predates the kernel syscalls by at least 10 years). Basically
> there is code in userspace that uses the information returned from
> bulkstat to construct file handles to pass to the open-by-handle
> ioctls. xfs_fsr then uses the combination of open-by-handle from the
> bulkstat output and the bulkstat output to feed into the swap extent
> ioctls....
>
> i.e. the filesystem's idea of what time is is passed to userspace as
> an opaque cookie in this case, but it is not used directly by the
> open-by-handle interfaces like I implied it was.
Ok, I see.
> > My patch set
> > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> > more like 64-bit kernels regarding inode time stamps, which does
> > impact all the file systems that have a 64-bit time or the NFS
> > unsigned epoch (1970-2106), while your patch extends the file
> > system internal epoch (1901-2038 for XFS) so it can be used by
> > anything that knows how to handle larger than 32-bit second values
> > (either 64-bit kernel or 32-bit with inode_time patch).
>
> Right, but the issue is that 64 bit second counters are broken right
> now because most filesystems can't support more than 32 bit values.
> So it doesn't matter whether it's 32 bit or 64 bit machines, just
> adding explicit support for >32 bit second counters without doing
> anything else just extends that brokenness into the indefinite
> future.
Of course, "most filesystems" are obsolete, and most of the modern
file systems already support >32 bit timestamps: ext4, btrfs, cifs,
f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else
except xfs, ext2/3 and exofs uses the nfsv3 interpretation on
64-bit systems, which interprets time stamps with the high bit
set as years 2038-2106 rather than 1901-1969.
> If we don't fix it now (i.e in the new user API and supporting
> infrastructure), then we'll *never be able to fix it* and we'll be
> stuck with timestamps that do really weird things when you pass
> arbitrary future dates to the kernel.
We already have that. I agree it's fixable and we should fix it,
but I don't see how this is different from what we had 20 years
ago when Linux on Alpha first introduced a 64-bit time_t. It's
been this way on every 64-bit Linux system since.
> > This is how ext4 does it (I mean
> > the sizeof() trick, not the bit stuffing they do):
> ....
> > I guess if there is general agreement on introducing 'struct inode_time',
> > we can skip that intermediate step.
>
> Also, I don't like the concept of having filesystems that will work
> on 64 bit but not 32 bit machines. Over the past 10 years, we've
> managed to remove most of those differences from the VFS and XFS,
> so adding new distinctions between 32/64 bit machines is not the
> direction I want to head in.
>
> As it is, I'm expecting to do this only after the struct inode_time
> and the superblock "time range" infrastructure have been added to
> the kernel and VFS. If that change is not made, then we've still
> only got 32 bit time....
Ok.
> > Do you have to manually change it in the
> > superblock? Since most of the time I'd suspect you wouldn't actually
> > use it for the foreseeable future, would it make sense to have a mount
> > option that allows it to be set, but doesn't actually change the
> > superblock until the first inode gets written with a nonzero epoch?
>
> Yes, we could set the flag on the first timestamp that goes beyond
> the current epoch, but that has two problems:
>
> 1. filesystem silently becomes incompatible with older
> kernels so failed upgrade rollbacks become problematic; and
>
> 2. It adds unnecessary complexity, as this will end up being
> the default behaviour for all new filesystems within a year.
> Then we end up with a mount option and conversion functions
> that never get used but we have to support for years....
>
> > That way, you'd still be able to mount it with an older kernel but
> > also be forward compatible with time moving on.
>
> We've got plenty of time to roll this out so I don't see any need
> for putting in place temporary support mechanisms that unnecessarily
> complicate the code.
Ok, fair enough.
Arnd
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-03 7:33 ` Arnd Bergmann
@ 2014-06-03 8:41 ` Dave Chinner
2014-06-03 9:16 ` Arnd Bergmann
0 siblings, 1 reply; 71+ messages in thread
From: Dave Chinner @ 2014-06-03 8:41 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Tue, Jun 03, 2014 at 09:33:36AM +0200, Arnd Bergmann wrote:
> On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote:
> > On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> > > On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > My patch set
> > > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> > > more like 64-bit kernels regarding inode time stamps, which does
> > > impact all the file systems that have a 64-bit time or the NFS
> > > unsigned epoch (1970-2106), while your patch extends the file
> > > system internal epoch (1901-2038 for XFS) so it can be used by
> > > anything that knows how to handle larger than 32-bit second values
> > > (either 64-bit kernel or 32-bit with inode_time patch).
> >
> > Right, but the issue is that 64 bit second counters are broken right
> > now because most filesystems can't support more than 32 bit values.
> > So it doesn't matter whether it's 32 bit or 64 bit machines, just
> > adding explicit support for >32 bit second counters without doing
> > anything else just extends that brokenness into the indefinite
> > future.
>
> Of course, "most filesystems" are obsolete, and most of the modern
> file systems already support >32 bit timestamps: ext4, btrfs, cifs,
> f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else
> except xfs, ext2/3 and exofs uses the nfsv3 interpretation on
> 64-bit systems, which interprets time stamps with the high bit
> set as years 2038-2106 rather than 1901-1969.
I'm not sure that's an entirely correct representation - the
remainder of the 32 bit-only timestamp filesystems don't actively
interpret the time stamp at all - it's just an opaque 32 bit value.
Hence the interpretation of the value is dependent on whether the
kernel treats it as signed or unsigned....
> > infrastructure), then we'll *never be able to fix it* and we'll be
> > stuck with timestamps that do really weird things when you pass
> > arbitrary future dates to the kernel.
>
> We already have that. I agree it's fixable and we should fix it,
> but I don't see how this is different from what we had 20 years
> ago when Linux on Alpha first introduced a 64-bit time_t. It's
> been this way on every 64-bit Linux system since.
I see it differently: we've got 20 years more experience than when
the 64 bit time_t was introduced. That experience tells us that best
practices for API design are to range check every input to prevent
unintended side effects from occurring due to out-of-range data....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-03 8:41 ` Dave Chinner
@ 2014-06-03 9:16 ` Arnd Bergmann
0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-03 9:16 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph
On Tuesday 03 June 2014 18:41:30 Dave Chinner wrote:
> On Tue, Jun 03, 2014 at 09:33:36AM +0200, Arnd Bergmann wrote:
> > On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote:
> > > On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> > > > On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > > > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > > My patch set
> > > > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> > > > more like 64-bit kernels regarding inode time stamps, which does
> > > > impact all the file systems that have a 64-bit time or the NFS
> > > > unsigned epoch (1970-2106), while your patch extends the file
> > > > system internal epoch (1901-2038 for XFS) so it can be used by
> > > > anything that knows how to handle larger than 32-bit second values
> > > > (either 64-bit kernel or 32-bit with inode_time patch).
> > >
> > > Right, but the issue is that 64 bit second counters are broken right
> > > now because most filesystems can't support more than 32 bit values.
> > > So it doesn't matter whether it's 32 bit or 64 bit machines, just
> > > adding explicit support for >32 bit second counters without doing
> > > anything else just extends that brokenness into the indefinite
> > > future.
> >
> > Of course, "most filesystems" are obsolete, and most of the modern
> > file systems already support >32 bit timestamps: ext4, btrfs, cifs,
> > f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else
> > except xfs, ext2/3 and exofs uses the nfsv3 interpretation on
> > 64-bit systems, which interprets time stamps with the high bit
> > set as years 2038-2106 rather than 1901-1969.
>
> I'm not sure that's an entirely correct representation - the
> remainder of the 32 bit-only timestamp filesystems don't actively
> interpret the time stamp at all - it's just an opaque 32 bit value.
> Hence the interpretation of the value is dependent on whether the
> kernel treats it as signed or unsigned....
As I mentioned elsewhere in the thread, I don't think the way it's handled
is intentional, but it's definitely the file system code that does
the assignment to the timeval and decides on the interpretation, doing
either

	inode->i_mtime.tv_sec = (signed)le32_to_cpu(raw_inode.mtime);

or

	inode->i_mtime.tv_sec = le32_to_cpu(raw_inode.mtime);
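
To make the difference between those two assignments concrete, here is a
minimal user-space sketch. It is not kernel code, and the helper names
disk_time_signed() and disk_time_unsigned() are made up for illustration.
It only shows how the same raw 32-bit value widens to two different
64-bit second counts depending on the cast.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not from any file system): widen a raw 32-bit
 * on-disk seconds field, already converted to CPU byte order, into a
 * 64-bit in-memory seconds value.
 */

/* Sign-extending: raw values with the high bit set land in 1901..1969.
 * The uint32_t -> int32_t conversion is implementation-defined in C,
 * but wraps in two's complement on the usual compilers. */
static int64_t disk_time_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/* Zero-extending: raw values with the high bit set land in 2038..2106. */
static int64_t disk_time_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```

Values below 0x80000000 come out identical either way, so the two
interpretations only disagree for time stamps outside 1970-2038.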
Arnd
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-05-31 14:30 ` [RFC 00/32] making inode time stamps y2038 ready Vyacheslav Dubeyko
@ 2014-06-03 12:21 ` Arnd Bergmann
0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-03 12:21 UTC (permalink / raw)
To: Vyacheslav Dubeyko
Cc: hch, linux-mtd, hpa, logfs, linux-afs, joseph, linux-arch,
linux-cifs, linux-scsi, ceph-devel, codalist, cluster-devel, coda,
geert, linux-ext4, fuse-devel, reiserfs-devel, xfs, john.stultz,
tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs
On Saturday 31 May 2014 18:30:49 Vyacheslav Dubeyko wrote:
> By the way, what about NILFS2? Is NILFS2 ready for suggested approach
> without any changes?
nilfs2 and a lot of other file systems don't need any changes for
this, because they don't assign the inode time stamp fields to
a 'struct timespec'.
FWIW, nilfs2 uses a 64-bit seconds value, which is always safe and
can represent the full range of user space timespec on all machines.
Arnd
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 23:32 ` Theodore Ts'o
2014-06-02 23:33 ` H. Peter Anvin
@ 2014-06-03 13:09 ` Roger Willcocks
1 sibling, 0 replies; 71+ messages in thread
From: Roger Willcocks @ 2014-06-03 13:09 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Arnd Bergmann,
LKML Kernel, geert, xfs, Christoph Hellwig, Chuck Lever,
john.stultz, H. Peter Anvin, linux-fsdevel, lftan, tglx, joseph
On Mon, 2014-06-02 at 19:32 -0400, Theodore Ts'o wrote:
> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the Posix specification:
>
> SYSCALL_DEFINE1(time, time_t __user *, tloc)
> {
> time_t i = get_seconds();
>
> if (tloc) {
> if (put_user(i,tloc))
> return -EFAULT;
> }
> force_successful_syscall_return();
> return i;
> }
get_seconds() returns an unsigned long so there's potential for overflow
here.
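
A small model of that overflow, assuming a 32-bit kernel where the
seconds counter is an unsigned 32-bit value and time_t is a signed
32-bit value. The function names are hypothetical, and the -4095..-1
window is the range that Linux system call wrappers conventionally
treat as an error code; that convention is stated here as an assumption
of the sketch rather than something taken from this thread.

```c
#include <stdint.h>

/* Model of "time_t i = get_seconds();" on a 32-bit kernel: the
 * unsigned counter is squeezed into a signed 32-bit return value. */
static int32_t model_time_return(uint32_t seconds)
{
	/* Wraps negative once seconds exceeds INT32_MAX (year 2038). */
	return (int32_t)seconds;
}

/* Would a syscall wrapper mistake this return value for -errno? */
static int looks_like_errno(int32_t ret)
{
	return ret < 0 && ret >= -4095;
}
```

In this model every value past 2038 comes back negative, and the last
4095 seconds before the unsigned counter wraps are indistinguishable
from error returns such as -EFAULT.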
--
Roger
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 21:57 ` H. Peter Anvin
@ 2014-06-03 14:22 ` Arnd Bergmann
2014-06-03 14:33 ` Joseph S. Myers
2014-06-03 21:38 ` Dave Chinner
0 siblings, 2 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-03 14:22 UTC (permalink / raw)
To: H. Peter Anvin
Cc: hch, linux-mtd, logfs, linux-afs, Joseph S. Myers, linux-arch,
linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda, geert,
linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan,
linux-btrfs
On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> >>
> >> The bit that is really going to hurt is every single ioctl that uses a
> >> timespec.
> >>
> >> Honestly, though, I really don't understand the point with "struct
> >> inode_time". It seems like the zeroeth-order thing is to change the
> >> kernel internal version of struct timespec to have a 64-bit time... it
> >> isn't just about inodes. We then should be explicit about the external
> >> uses of time, and use accessors.
> >
> > I picked these because they are fairly isolated from all other uses,
> > in particular since inode times are the only things where we really
> > care about times in the distant past or future (decades away as opposed
> > to things that happened between boot and shutdown).
> >
>
> If nothing else, I would expect to be able to set the system time to
> weird values for testing. So I'm not so sure I agree with that...
I think John Stultz and Thomas Gleixner have already started looking
at how the timekeeping code can be updated. Once that is done, we should
be able to add a functional 64-bit gettimeofday/settimeofday syscall
pair. While I definitely agree this is one of the most basic things to
have, it's also not an area of the kernel that is easy to change.
> > For other kernel-internal uses, we may be better off migrating to
> > a completely different representation, such as nanoseconds since
> > boot or the architecture specific ktime_t, but this is really something
> > to decide for each subsystem.
>
> Having a bunch of different time representations in the kernel seems
> like a real headache...
We already have time_t, ktime_t, timeval, timespec, compat_timespec,
clock_t, cputime_t, cputime64_t, tm, nanoseconds, jiffies, jiffies64,
and lots of driver or file system specific representations. I'm all for
removing a bunch of these from the kernel, but my feeling is that this is
one of the cases where we first have to add new ones in order to remove
those that are already there.
To complicate things further, we also have various time bases
(realtime/utc, realtime/tai, monotonic, monotonic_raw, boottime, ...),
and at least for the timespec values we pass around, it's not always
obvious which one is used, or if that's the right one.
We probably don't want to add a lot of new representations, and it's
possible that we can change most of the internal code we have to
ktime_t and then convert that to whatever user space wants at the
interfaces.
The possible uses I can see for non-ktime_t types in the kernel are:
* inodes need 96 bit timestamps to represent the full range of values
that can be stored in a file system, you made a convincing argument
for that. Almost everything else can fit into 64 bit on a 32-bit
kernel, in theory also on a 64-bit kernel if we want that.
* A number of interfaces pass relative timespecs: nanosleep(), poll(),
select(), sigtimedwait(), alarm(), futex() and probably more. There is
nothing wrong with the use of timespec here, and it may be good to
annotate that by using a new type (e.g. struct timeout) that is defined
as compatible with the current timespec.
* For new user interfaces, we need a new type such as the
__kernel_timespec64 I introduced, so it doesn't clash with the normal
user timespec that may be smaller, depending on the libc.
* A lot of drivers will need new ioctl commands, and for drivers that
just need time stamps (audio, v4l, sockets, ...) it may be more
efficient and more correct to use a new timestamp_t (e.g. boot time
64-bit nanoseconds) than __kernel_timespec64, which is not normally
monotonic and requires a normalization step. If we end up introducing
such a type in the user interface, we can also start using it in the
kernel.
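
The "normalization step" in the last item can be sketched as follows.
This is an illustration only: struct ts64 and both helpers are
hypothetical stand-ins, not the __kernel_timespec64 or timestamp_t
types discussed above. It contrasts a seconds/nanoseconds pair, which
has to be brought back into canonical form after arithmetic, with a
flat 64-bit nanosecond count, which needs no such step.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for a 64-bit seconds/nanoseconds pair. */
struct ts64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Bring tv_nsec back into [0, NSEC_PER_SEC) after arithmetic. */
static struct ts64 ts64_normalize(struct ts64 t)
{
	t.tv_sec += t.tv_nsec / NSEC_PER_SEC;
	t.tv_nsec %= NSEC_PER_SEC;
	if (t.tv_nsec < 0) {
		/* C division truncates toward zero, so fix up negatives. */
		t.tv_nsec += NSEC_PER_SEC;
		t.tv_sec -= 1;
	}
	return t;
}

/* Flat nanosecond count: adding or comparing needs no normalization.
 * Overflows for |tv_sec| beyond roughly 292 years. */
static int64_t ts64_to_ns(struct ts64 t)
{
	return t.tv_sec * NSEC_PER_SEC + t.tv_nsec;
}
```

The flat form trades range for simplicity, which is why it suits
boot-relative time stamps better than inode times.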
Arnd
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-03 14:22 ` Arnd Bergmann
@ 2014-06-03 14:33 ` Joseph S. Myers
2014-06-03 14:37 ` Arnd Bergmann
2014-06-03 21:38 ` Dave Chinner
1 sibling, 1 reply; 71+ messages in thread
From: Joseph S. Myers @ 2014-06-03 14:33 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, H. Peter Anvin, logfs, linux-afs, linux-arch,
linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda, geert,
linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan,
linux-btrfs
On Tue, 3 Jun 2014, Arnd Bergmann wrote:
> I think John Stultz and Thomas Gleixner have already started looking
> at how the timekeeping code can be updated. Once that is done, we should
> be able to add a functional 64-bit gettimeofday/settimeofday syscall
> pair. While I definitely agree this is one of the most basic things to
> have, it's also not an area of the kernel that is easy to change.
64-bit clock_gettime / clock_settime instead of gettimeofday /
settimeofday should avoid the need for the kernel to have a 64-bit version
of struct timeval. (Userspace 64-bit gettimeofday / settimeofday would
need to use a combination of the syscalls if the tz pointer is non-NULL.)
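
A rough user-space sketch of that layering, using today's
native-width types rather than the proposed 64-bit ones. The wrapper
name gettimeofday_via_clock() is hypothetical; clock_gettime() and
gettimeofday() are the existing POSIX calls. The time comes from
clock_gettime(CLOCK_REALTIME), and the legacy call is only used when
the caller also asks for the timezone.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical libc-side wrapper: no struct timeval crosses the
 * kernel boundary unless the tz pointer is non-NULL. */
static int gettimeofday_via_clock(struct timeval *tv, void *tz)
{
	struct timespec ts;

	if (tz != NULL) {
		/* Only the legacy interface knows about the timezone. */
		struct timeval unused;

		if (gettimeofday(&unused, tz) != 0)
			return -1;
	}

	if (tv != NULL) {
		if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
			return -1;
		tv->tv_sec = ts.tv_sec;
		tv->tv_usec = ts.tv_nsec / 1000; /* ns -> us, truncating */
	}
	return 0;
}
```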
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-03 14:33 ` Joseph S. Myers
@ 2014-06-03 14:37 ` Arnd Bergmann
0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-03 14:37 UTC (permalink / raw)
To: Joseph S. Myers
Cc: hch, linux-mtd, H. Peter Anvin, logfs, linux-afs, linux-arch,
linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda, geert,
linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan,
linux-btrfs
On Tuesday 03 June 2014 14:33:10 Joseph S. Myers wrote:
> On Tue, 3 Jun 2014, Arnd Bergmann wrote:
>
> > I think John Stultz and Thomas Gleixner have already started looking
> > at how the timekeeping code can be updated. Once that is done, we should
> > be able to add a functional 64-bit gettimeofday/settimeofday syscall
> > pair. While I definitely agree this is one of the most basic things to
> > have, it's also not an area of the kernel that is easy to change.
>
> 64-bit clock_gettime / clock_settime instead of gettimeofday /
> settimeofday should avoid the need for the kernel to have a 64-bit version
> of struct timeval. (Userspace 64-bit gettimeofday / settimeofday would
> need to use a combination of the syscalls if the tz pointer is non-NULL.)
Yes, that's what I meant.
Arnd
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-03 14:22 ` Arnd Bergmann
2014-06-03 14:33 ` Joseph S. Myers
@ 2014-06-03 21:38 ` Dave Chinner
2014-06-04 15:03 ` Arnd Bergmann
1 sibling, 1 reply; 71+ messages in thread
From: Dave Chinner @ 2014-06-03 21:38 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, H. Peter Anvin, logfs, linux-afs, Joseph S. Myers,
linux-arch, linux-cifs, linux-scsi, ceph-devel, cluster-devel,
coda, geert, linux-ext4, codalist, fuse-devel, reiserfs-devel,
xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev,
samba-technical, linux-kernel, linux-f2fs-devel, ocfs2-devel,
linux-fsdevel, lftan, linux-btrfs
On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote:
> On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> > On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> The possible uses I can see for non-ktime_t types in the kernel are:
> * inodes need 96 bit timestamps to represent the full range of values
> that can be stored in a file system, you made a convincing argument
> for that. Almost everything else can fit into 64 bit on a 32-bit
> kernel, in theory also on a 64-bit kernel if we want that.
Just to be pedantic, inodes don't *need* 96 bit timestamps - some
filesystems can *support up to* 96 bit timestamps. If the kernel
only supports 64 bit timestamps and that's all the kernel can
represent, then the upper bits of the 96 bit on-disk inode
timestamps simply remain zero.
If you move the filesystem between kernels with different time
ranges, then the filesystem needs to be able to tell the kernel what
its supported range is. This is where having the VFS limit the
range of supported timestamps is important: the limit is the
min(kernel range, filesystem range). This allows the filesystems
to be independent of the kernel time representation, and the kernel
to be independent of the physical filesystem time encoding....
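
As an illustration of the min(kernel range, filesystem range) idea,
a minimal sketch might look like the following. struct time_range and
both helpers are hypothetical names, not existing VFS interfaces, and
clamping is only one possible policy: rejecting out-of-range values
with an error would use the same intersection.

```c
#include <stdint.h>

/* Hypothetical description of a supported range, in seconds. */
struct time_range {
	int64_t min;
	int64_t max;
};

/* Effective range is the overlap of what the kernel can represent
 * and what the file system can store. */
static struct time_range range_intersect(struct time_range kernel,
					 struct time_range fs)
{
	struct time_range r;

	r.min = kernel.min > fs.min ? kernel.min : fs.min;
	r.max = kernel.max < fs.max ? kernel.max : fs.max;
	return r;
}

/* Policy used in this sketch: clamp to the nearest supported value. */
static int64_t clamp_time(int64_t sec, struct time_range r)
{
	if (sec < r.min)
		return r.min;
	if (sec > r.max)
		return r.max;
	return sec;
}
```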
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-03 21:38 ` Dave Chinner
@ 2014-06-04 15:03 ` Arnd Bergmann
2014-06-04 17:30 ` Nicolas Pitre
0 siblings, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-04 15:03 UTC (permalink / raw)
To: Dave Chinner
Cc: hch, linux-mtd, H. Peter Anvin, logfs, linux-afs, Joseph S. Myers,
linux-arch, linux-cifs, linux-scsi, ceph-devel, cluster-devel,
coda, geert, linux-ext4, codalist, fuse-devel, reiserfs-devel,
xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev,
samba-technical, linux-kernel, linux-f2fs-devel, ocfs2-devel,
linux-fsdevel, lftan, linux-btrfs
On Tuesday 03 June 2014, Dave Chinner wrote:
> On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote:
> > On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> > > On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> > The possible uses I can see for non-ktime_t types in the kernel are:
> > * inodes need 96 bit timestamps to represent the full range of values
> > that can be stored in a file system, you made a convincing argument
> > for that. Almost everything else can fit into 64 bit on a 32-bit
> > kernel, in theory also on a 64-bit kernel if we want that.
>
> Just to be pedantic, inodes don't need 96 bit timestamps - some
> filesystems can *support up to* 96 bit timestamps. If the kernel
> only supports 64 bit timestamps and that's all the kernel can
> represent, then the upper bits of the 96 bit on-disk inode
> timestamps simply remain zero.
I meant the reverse: since we have file systems that can store
96-bit timestamps when using 64-bit kernels, we need to extend
32-bit kernels to have the same internal representation so we
can actually read those file systems correctly.
> If you move the filesystem between kernels with different time
> ranges, then the filesystem needs to be able to tell the kernel what
> its supported range is. This is where having the VFS limit the
> range of supported timestamps is important: the limit is the
> min(kernel range, filesystem range). This allows the filesystems
> to be independent of the kernel time representation, and the kernel
> to be independent of the physical filesystem time encoding....
I agree it makes sense to let the kernel know about the limits
of the file system it accesses, but for the reverse, we're probably
better off just making the kernel representation large enough (i.e.
96 bits) so it can work with any known file system. We need another
check at the user space boundary to turn that into a value that the
user can understand, but that's another problem.
Arnd
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 21:02 ` Joseph S. Myers
@ 2014-06-04 15:05 ` Arnd Bergmann
0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-04 15:05 UTC (permalink / raw)
To: Joseph S. Myers
Cc: hch, linux-mtd, hpa, logfs, linux-afs, linux-arch, linux-cifs,
linux-scsi, ceph-devel, codalist, cluster-devel, coda, geert,
linux-ext4, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx,
linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs
On Monday 02 June 2014, Joseph S. Myers wrote:
> On Mon, 2 Jun 2014, Arnd Bergmann wrote:
>
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
>
> For glibc I think it will make the most sense to add the support for
> 64-bit time_t across all architectures that currently have 32-bit time_t
> (with the new interfaces having fallback support to implementation in
> terms of the 32-bit kernel interfaces, if the 64-bit syscalls are
> unavailable either at runtime or in the kernel headers against which glibc
> is compiled - this fallback code will of course need to check for overflow
> when passing a time value to the kernel, hopefully with error handling
> consistent with whatever the kernel ends up doing when a filesystem can't
> support a timestamp). If some architectures don't provide the new
> interfaces in the kernel then that will mean the fallback code in glibc
> can't be removed until glibc support for those architectures is removed
> (as opposed to removing it when glibc no longer supports kernels predating
> the kernel support).
Ok, that's a good reason to just provide the new interfaces on all
architectures right away. Thanks for the insight!
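
The overflow check that such fallback code needs could look roughly
like this. The helper name is hypothetical and the choice of EOVERFLOW
is an assumption of the sketch; as noted in the quoted text, the real
error handling should match whatever the kernel ends up doing.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical fallback helper: narrow a 64-bit seconds value so it
 * can be passed to a legacy 32-bit time_t interface. Returns 0 on
 * success or -EOVERFLOW if the value does not fit. */
static int narrow_time_for_legacy(int64_t sec64, int32_t *sec32)
{
	if (sec64 < INT32_MIN || sec64 > INT32_MAX)
		return -EOVERFLOW;

	*sec32 = (int32_t)sec64;
	return 0;
}
```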
Arnd
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-04 15:03 ` Arnd Bergmann
@ 2014-06-04 17:30 ` Nicolas Pitre
2014-06-04 19:24 ` Arnd Bergmann
0 siblings, 1 reply; 71+ messages in thread
From: Nicolas Pitre @ 2014-06-04 17:30 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, H. Peter Anvin, linux-f2fs-devel, ceph-devel,
Joseph S. Myers, linux-arch, linux-cifs, linux-scsi, linux-afs,
cluster-devel, coda, geert, linux-ext4, codalist, fuse-devel,
reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev,
samba-technical, linux-kernel, logfs, linux-btrfs, linux-fsdevel,
lftan, ocfs2-devel
On Wed, 4 Jun 2014, Arnd Bergmann wrote:
> On Tuesday 03 June 2014, Dave Chinner wrote:
> > Just to be pedantic, inodes don't need 96 bit timestamps - some
> > filesystems can *support up to* 96 bit timestamps. If the kernel
> > only supports 64 bit timestamps and that's all the kernel can
> > represent, then the upper bits of the 96 bit on-disk inode
> > timestamps simply remain zero.
>
> I meant the reverse: since we have file systems that can store
> 96-bit timestamps when using 64-bit kernels, we need to extend
> 32-bit kernels to have the same internal representation so we
> can actually read those file systems correctly.
>
> > If you move the filesystem between kernels with different time
> > ranges, then the filesystem needs to be able to tell the kernel what
> > its supported range is. This is where having the VFS limit the
> > range of supported timestamps is important: the limit is the
> > min(kernel range, filesystem range). This allows the filesystems
> > to be independent of the kernel time representation, and the kernel
> > to be independent of the physical filesystem time encoding....
>
> I agree it makes sense to let the kernel know about the limits
> of the file system it accesses, but for the reverse, we're probably
> better off just making the kernel representation large enough (i.e.
> 96 bits) so it can work with any known file system.
Depends... 96 bit handling may get prohibitive on 32-bit archs.
The important point here is for the kernel to be able to represent the
time _range_ used by any known filesystem, not necessarily the time
_precision_.
For example, a 64 bit representation can be made of 40 bits for seconds
spanning 34865 years, and 24 bits for fractional seconds providing
precision down to 60 nanosecs. That ought to be plenty good on 32 bit
systems while still being cheap to handle.
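
A sketch of that 40.24 split, assuming an unsigned seconds count and
binary fractional seconds (the message does not say whether the
seconds part is signed; the function names are made up). 2^40 seconds
is about 34865 years of 365 days, and one unit of the 24-bit fraction
is 1/2^24 s, or roughly 59.6 ns.

```c
#include <stdint.h>

#define FRAC_BITS 24

/* Pack seconds (must fit in 40 bits) and nanoseconds (< 1e9) into a
 * single 64-bit value: upper 40 bits seconds, lower 24 bits fraction. */
static uint64_t pack_40_24(uint64_t sec, uint32_t nsec)
{
	uint64_t frac = ((uint64_t)nsec << FRAC_BITS) / 1000000000u;

	return (sec << FRAC_BITS) | frac;
}

static uint64_t unpack_sec(uint64_t t)
{
	return t >> FRAC_BITS;
}

/* Recovers nanoseconds to within one fraction unit (about 60 ns). */
static uint32_t unpack_nsec(uint64_t t)
{
	uint64_t frac = t & ((1u << FRAC_BITS) - 1);

	return (uint32_t)((frac * 1000000000u) >> FRAC_BITS);
}
```

Comparing two such values is a single 64-bit integer comparison, which
is what makes the format cheap on 32-bit systems.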
Nicolas
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-04 17:30 ` Nicolas Pitre
@ 2014-06-04 19:24 ` Arnd Bergmann
2014-06-05 0:10 ` H. Peter Anvin
0 siblings, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-04 19:24 UTC (permalink / raw)
To: Nicolas Pitre
Cc: hch, linux-mtd, H. Peter Anvin, linux-f2fs-devel, ceph-devel,
Joseph S. Myers, linux-arch, linux-cifs, linux-scsi, linux-afs,
cluster-devel, coda, geert, linux-ext4, codalist, fuse-devel,
reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev,
samba-technical, linux-kernel, logfs, linux-btrfs, linux-fsdevel,
lftan, ocfs2-devel
On Wednesday 04 June 2014 13:30:32 Nicolas Pitre wrote:
> On Wed, 4 Jun 2014, Arnd Bergmann wrote:
>
> > On Tuesday 03 June 2014, Dave Chinner wrote:
> > > Just to be pedantic, inodes don't need 96 bit timestamps - some
> > > filesystems can *support up to* 96 bit timestamps. If the kernel
> > > only supports 64 bit timestamps and that's all the kernel can
> > > represent, then the upper bits of the 96 bit on-disk inode
> > > timestamps simply remain zero.
> >
> > I meant the reverse: since we have file systems that can store
> > 96-bit timestamps when using 64-bit kernels, we need to extend
> > 32-bit kernels to have the same internal representation so we
> > can actually read those file systems correctly.
> >
> > > If you move the filesystem between kernels with different time
> > > ranges, then the filesystem needs to be able to tell the kernel what
> > > its supported range is. This is where having the VFS limit the
> > > range of supported timestamps is important: the limit is the
> > > min(kernel range, filesystem range). This allows the filesystems
> > > to be independent of the kernel time representation, and the kernel
> > > to be independent of the physical filesystem time encoding....
> >
> > I agree it makes sense to let the kernel know about the limits
> > of the file system it accesses, but for the reverse, we're probably
> > better off just making the kernel representation large enough (i.e.
> > 96 bits) so it can work with any known file system.
>
> Depends... 96 bit handling may get prohibitive on 32-bit archs.
>
> The important point here is for the kernel to be able to represent the
> time _range_ used by any known filesystem, not necessarily the time
> _precision_.
>
> For example, a 64 bit representation can be made of 40 bits for seconds
> spanning 34865 years, and 24 bits for fractional seconds providing
> precision down to 60 nanosecs. That ought to be plenty good on 32 bit
> systems while still being cheap to handle.
I have checked earlier that we don't do any computation on inode
time stamps in common code, we just pass them around, so there is
very little runtime overhead. There is a small bit of space overhead
(12 bytes) per inode, but that structure is already on the order of
500 bytes.
For other timekeeping stuff in the kernel, I agree that using some
64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
...) has advantages, that's exactly the point I was making earlier
against simply extending the internal time_t/timespec to 64-bit
seconds for everything.
Arnd
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-04 19:24 ` Arnd Bergmann
@ 2014-06-05 0:10 ` H. Peter Anvin
2014-06-10 9:54 ` Arnd Bergmann
0 siblings, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-06-05 0:10 UTC (permalink / raw)
To: Arnd Bergmann, Nicolas Pitre
Cc: hch, linux-mtd, linux-f2fs-devel, ceph-devel, Joseph S. Myers,
linux-arch, linux-cifs, linux-scsi, linux-afs, cluster-devel,
coda, geert, linux-ext4, codalist, fuse-devel, reiserfs-devel,
xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev,
samba-technical, linux-kernel, logfs, linux-btrfs, linux-fsdevel,
lftan, ocfs2-devel
On 06/04/2014 12:24 PM, Arnd Bergmann wrote:
>
> For other timekeeping stuff in the kernel, I agree that using some
> 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
> ...) has advantages, that's exactly the point I was making earlier
> against simply extending the internal time_t/timespec to 64-bit
> seconds for everything.
>
How much of a performance issue is it to make time_t 64 bits, and for
the bits there are, how hard are they to fix?
-hpa
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-05 0:10 ` H. Peter Anvin
@ 2014-06-10 9:54 ` Arnd Bergmann
0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-10 9:54 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Nicolas Pitre, hch, linux-mtd, linux-f2fs-devel, ceph-devel,
Joseph S. Myers, linux-arch, linux-cifs, linux-scsi, linux-afs,
cluster-devel, coda, geert, linux-ext4, codalist, fuse-devel,
reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev,
samba-technical, linux-kernel, logfs, linux-btrfs, linux-fsdevel,
lftan, ocfs2-devel
On Wednesday 04 June 2014 17:10:24 H. Peter Anvin wrote:
> On 06/04/2014 12:24 PM, Arnd Bergmann wrote:
> >
> > For other timekeeping stuff in the kernel, I agree that using some
> > 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
> > ...) has advantages, that's exactly the point I was making earlier
> > against simply extending the internal time_t/timespec to 64-bit
> > seconds for everything.
> >
>
> How much of a performance issue is it to make time_t 64 bits, and for
> the bits there are, how hard are they to fix?
Probably very little overhead for most uses, it's more the regression
potential in the less common parts of the kernel I'm worried about.
There is a significant but not overwhelming number of uses of the
main problematic types in the kernel:
arnd@wuerfel:~/arm-soc$ git grep -wl time_t | wc
188 188 5566
arnd@wuerfel:~/arm-soc$ git grep -wl timeval | wc
320 320 10353
arnd@wuerfel:~/arm-soc$ git grep -wl timespec | wc
406 406 10886
I believe we have to audit all of them anyway if we want to change
the kernel to less problematic types and introduce new user
interfaces.
IMHO this work is helped if we change the uses to a new type
as we find the problems. This lets us do the work one subsystem
at a time and avoid accidental ABI changes. I don't care much what
type that will be, and having a 96-bit type will certainly work
well in a lot of cases, but I don't see a strong reason to use
that over other types, especially when they can be more efficient.
Arnd
end of thread, other threads:[~2014-06-10 9:57 UTC | newest]
Thread overview: 71+ messages
2014-05-30 20:01 [RFC 00/32] making inode time stamps y2038 ready Arnd Bergmann
2014-05-30 20:01 ` [RFC 11/32] xfs: convert to struct inode_time Arnd Bergmann
2014-05-31 0:37 ` Dave Chinner
2014-05-31 0:41 ` H. Peter Anvin
2014-05-31 1:14 ` Dave Chinner
2014-05-31 1:22 ` H. Peter Anvin
2014-05-31 5:54 ` Dave Chinner
2014-05-31 8:41 ` H. Peter Anvin
2014-05-31 15:46 ` Nicolas Pitre
2014-06-01 19:56 ` Arnd Bergmann
2014-06-01 20:26 ` H. Peter Anvin
2014-06-02 11:02 ` Arnd Bergmann
2014-06-02 1:36 ` Nicolas Pitre
2014-06-02 2:22 ` Dave Chinner
2014-06-02 7:09 ` Geert Uytterhoeven
2014-06-02 10:56 ` Arnd Bergmann
2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 12:38 ` Arnd Bergmann
2014-06-02 13:15 ` Theodore Ts'o
2014-06-02 12:52 ` Arnd Bergmann
2014-06-02 13:07 ` Theodore Ts'o
2014-06-02 15:01 ` Arnd Bergmann
2014-06-02 14:52 ` H. Peter Anvin
2014-06-02 15:04 ` Chuck Lever
2014-06-02 15:31 ` Theodore Ts'o
2014-06-02 17:12 ` H. Peter Anvin
2014-06-02 18:50 ` Arnd Bergmann
2014-06-02 22:29 ` Theodore Ts'o
2014-06-02 22:32 ` H. Peter Anvin
2014-06-02 23:32 ` Theodore Ts'o
2014-06-02 23:33 ` H. Peter Anvin
2014-06-03 13:09 ` Roger Willcocks
2014-06-02 18:52 ` Arnd Bergmann
2014-06-02 18:58 ` Roger Willcocks
2014-06-02 19:04 ` Chuck Lever
2014-06-02 19:10 ` Arnd Bergmann
2014-06-01 0:39 ` Dave Chinner
2014-06-02 14:00 ` Joseph S. Myers
2014-05-31 15:37 ` Arnd Bergmann
2014-06-01 0:24 ` Dave Chinner
2014-06-02 0:28 ` Dave Chinner
2014-06-02 11:35 ` Roger Willcocks
2014-06-02 11:43 ` Arnd Bergmann
2014-06-03 0:32 ` Dave Chinner
2014-06-03 7:33 ` Arnd Bergmann
2014-06-03 8:41 ` Dave Chinner
2014-06-03 9:16 ` Arnd Bergmann
2014-05-31 14:30 ` [RFC 00/32] making inode time stamps y2038 ready Vyacheslav Dubeyko
2014-06-03 12:21 ` Arnd Bergmann
2014-05-31 14:51 ` Richard Cochran
[not found] ` <6347520.8jMPlVsFjM@wuerfel>
2014-05-31 16:20 ` Geert Uytterhoeven
2014-05-31 18:22 ` Richard Cochran
2014-05-31 19:34 ` H. Peter Anvin
2014-06-01 4:46 ` Richard Cochran
2014-06-01 4:44 ` Richard Cochran
2014-06-02 13:52 ` Joseph S. Myers
2014-06-02 19:19 ` Arnd Bergmann
2014-06-02 19:26 ` H. Peter Anvin
2014-06-02 19:55 ` Arnd Bergmann
2014-06-02 21:57 ` H. Peter Anvin
2014-06-03 14:22 ` Arnd Bergmann
2014-06-03 14:33 ` Joseph S. Myers
2014-06-03 14:37 ` Arnd Bergmann
2014-06-03 21:38 ` Dave Chinner
2014-06-04 15:03 ` Arnd Bergmann
2014-06-04 17:30 ` Nicolas Pitre
2014-06-04 19:24 ` Arnd Bergmann
2014-06-05 0:10 ` H. Peter Anvin
2014-06-10 9:54 ` Arnd Bergmann
2014-06-02 21:02 ` Joseph S. Myers
2014-06-04 15:05 ` Arnd Bergmann