* [RFC 00/32] making inode time stamps y2038 ready
@ 2014-05-30 20:01 Arnd Bergmann
2014-05-30 20:01 ` [RFC 11/32] xfs: convert to struct inode_time Arnd Bergmann
` (3 more replies)
0 siblings, 4 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-05-30 20:01 UTC (permalink / raw)
To: linux-kernel
Cc: hch, linux-mtd, hpa, logfs, linux-afs, joseph, linux-arch,
linux-cifs, linux-scsi, ceph-devel, codalist, cluster-devel, coda,
geert, linux-ext4, Arnd Bergmann, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs
Based on the recent discussion about 64-bit time_t for new
architectures, and for solving the year 2038 problem in general,
I decided to try out what it would take to solve part of the
kernel side of things.
This is proof-of-concept work to get us to the point where
two system calls (utimes and stat) provide a working interface
to user space to pass 64-bit inode time stamps in and out of
the kernel all the way to the file systems.
I picked this because it is a fairly isolated problem, as the
inode time stamps are rarely assigned to any other time values.
As a byproduct of this work, I documented for each of the file
systems we support how long the on-disk format can work[1].
Obviously we also need to convert all the other syscalls and
have a proper libc implementation using those for this to
be really useful, but it's a start and it can be tested
independently (I haven't done so yet, as I want to wait for initial
feedback).
All the interesting stuff is in the first five patches here,
the rest is the straightforward conversion of all file systems
that use 'timespec' values internally.
There are of course a number of open questions:
a) is this the right approach in general? The previous discussion
pointed this way, but there may be other opinions.
b) what type should we use internally to represent inode time
stamps? The code contains three different versions that would
all work, we just have to pick a good tradeoff between
efficiency and the range of times we want to cover.
c) Should we continue this way for all 32-bit platforms for
consistency, including future ones, or should we go to
different 64-bit types right away? My feeling is that the
second approach would complicate this work.
Arnd
[1] http://kernelnewbies.org/y2038
Arnd Bergmann (32):
fs: introduce new 'struct inode_time'
uapi: add struct __kernel_timespec{32,64}
fs: introduce sys_utimens64at
fs: introduce sys_newfstat64/sys_newfstatat64
arch: hook up new stat and utimes syscalls
isofs: fix timestamps beyond 2027
fs/nfs: convert to struct inode_time
fs/ceph: convert to 'struct inode_time'
fs/pstore: convert to struct inode_time
fs/coda: convert to struct inode_time
xfs: convert to struct inode_time
btrfs: convert to struct inode_time
ext3: convert to struct inode_time
ext4: convert to struct inode_time
cifs: convert to struct inode_time
ntfs: convert to struct inode_time
ubifs: convert to struct inode_time
ocfs2: convert to struct inode_time
fs/fat: convert to struct inode_time
afs: convert to struct inode_time
udf: convert to struct inode_time
fs: convert simple fs to inode_time
logfs: convert to struct inode_time
hfs, hfsplus: convert to struct inode_time
gfs2: convert to struct inode_time
reiserfs: convert to struct inode_time
jffs2: convert to struct inode_time
adfs: convert to struct inode_time
f2fs: convert to struct inode_time
fuse: convert to struct inode_time
scsi: fnic: use current_kernel_time() for timestamp
fs: use new inode_time definition unconditionally
arch/alpha/kernel/osf_sys.c | 2 +-
arch/arm/include/asm/unistd.h | 2 +-
arch/arm/include/uapi/asm/stat.h | 25 +++++++++++++++++
arch/arm/include/uapi/asm/unistd.h | 3 +++
arch/arm/kernel/calls.S | 3 +++
arch/arm64/include/asm/unistd32.h | 5 +++-
arch/x86/include/uapi/asm/stat.h | 28 +++++++++++++++++++
arch/x86/syscalls/syscall_32.tbl | 3 +++
drivers/block/rbd.c | 2 +-
drivers/firmware/efi/efi-pstore.c | 28 +++++++++----------
drivers/scsi/fnic/fnic_trace.c | 2 +-
drivers/tty/tty_io.c | 2 +-
drivers/usb/gadget/f_fs.c | 2 +-
fs/adfs/inode.c | 4 +--
fs/afs/afs.h | 6 ++---
fs/afs/fsclient.c | 2 +-
fs/attr.c | 8 +++---
fs/btrfs/file.c | 6 ++---
fs/btrfs/inode.c | 4 +--
fs/btrfs/ioctl.c | 4 +--
fs/btrfs/root-tree.c | 2 +-
fs/btrfs/transaction.c | 2 +-
fs/ceph/cache.c | 2 +-
fs/ceph/caps.c | 6 ++---
fs/ceph/file.c | 4 +--
fs/ceph/inode.c | 20 +++++++-------
fs/ceph/super.h | 8 +++---
fs/cifs/cache.c | 6 ++---
fs/cifs/cifsglob.h | 6 ++---
fs/cifs/cifsproto.h | 6 ++---
fs/cifs/cifssmb.c | 5 ++--
fs/cifs/inode.c | 2 +-
fs/cifs/netmisc.c | 15 ++++++-----
fs/coda/coda_linux.c | 18 ++++++++-----
fs/compat.c | 19 ++-----------
fs/configfs/inode.c | 6 ++---
fs/cramfs/inode.c | 2 +-
fs/ext3/inode.c | 4 +--
fs/ext4/ext4.h | 10 +++----
fs/ext4/extents.c | 2 +-
fs/f2fs/file.c | 6 ++---
fs/fat/dir.c | 2 +-
fs/fat/fat.h | 6 ++---
fs/fat/misc.c | 4 +--
fs/fat/namei_msdos.c | 8 +++---
fs/fat/namei_vfat.c | 10 +++----
fs/fuse/inode.c | 6 ++---
fs/gfs2/dir.c | 6 ++---
fs/gfs2/glops.c | 4 +--
fs/hfs/hfs_fs.h | 2 +-
fs/hfsplus/hfsplus_fs.h | 2 +-
fs/inode.c | 18 ++++++-------
fs/isofs/util.c | 2 +-
fs/jffs2/os-linux.h | 2 +-
fs/locks.c | 4 +--
fs/logfs/readwrite.c | 18 ++++++-------
fs/nfs/callback.h | 4 +--
fs/nfs/callback_xdr.c | 6 ++---
fs/nfs/file.c | 2 +-
fs/nfs/fscache-index.c | 8 +++---
fs/nfs/inode.c | 10 +++----
fs/nfs/internal.h | 4 +--
fs/nfs/netns.h | 2 +-
fs/nfs/nfs2xdr.c | 8 +++---
fs/nfs/nfs3xdr.c | 10 +++----
fs/nfs/nfs4xdr.c | 20 +++++++-------
fs/nfsd/nfs3xdr.c | 6 ++---
fs/nfsd/nfsfh.h | 4 +--
fs/nfsd/nfsxdr.c | 2 +-
fs/ntfs/inode.c | 12 ++++-----
fs/ntfs/time.h | 8 +++---
fs/ocfs2/dlmglue.c | 16 +++++------
fs/ocfs2/file.c | 6 ++---
fs/ocfs2/ocfs2.h | 2 +-
fs/pstore/inode.c | 2 +-
fs/pstore/internal.h | 2 +-
fs/pstore/platform.c | 2 +-
fs/pstore/ram.c | 18 +++++++------
fs/reiserfs/namei.c | 2 +-
fs/reiserfs/xattr.c | 4 +--
fs/stat.c | 55 ++++++++++++++++++++++++++++++++++++++
fs/ubifs/dir.c | 2 +-
fs/ubifs/file.c | 16 +++++------
fs/ubifs/misc.h | 2 +-
fs/udf/udf_i.h | 2 +-
fs/udf/udf_sb.h | 2 +-
fs/udf/udfdecl.h | 7 ++---
fs/udf/udftime.c | 7 ++---
fs/utimes.c | 47 +++++++++++++++++++++++++++-----
fs/xfs/time.h | 4 +--
fs/xfs/xfs_inode.c | 2 +-
fs/xfs/xfs_iops.c | 2 +-
fs/xfs/xfs_trans_inode.c | 6 ++---
include/linux/ceph/decode.h | 8 +++---
include/linux/ceph/osd_client.h | 4 +--
include/linux/compat.h | 2 +-
include/linux/fs.h | 32 +++++++++++-----------
include/linux/nfs_fs_sb.h | 2 +-
include/linux/nfs_xdr.h | 14 +++++-----
include/linux/pstore.h | 4 +--
include/linux/stat.h | 6 ++---
include/linux/syscalls.h | 9 ++++++-
include/linux/time.h | 44 +++++++++++++++++++++++++++---
include/uapi/asm-generic/stat.h | 29 ++++++++++++++++++--
include/uapi/asm-generic/unistd.h | 8 +++++-
include/uapi/linux/coda.h | 1 +
include/uapi/linux/time.h | 40 ++++++++++++++++++++++++++-
init/initramfs.c | 2 +-
kernel/audit.c | 2 +-
kernel/auditsc.c | 2 +-
kernel/time.c | 44 +++++++++++++++++++++++++-----
kernel/time/timekeeping.c | 16 +++++++++++
net/ceph/auth_x.c | 2 +-
net/ceph/osd_client.c | 4 +--
114 files changed, 642 insertions(+), 333 deletions(-)
--
1.8.3.2
Bcc: "J. Bruce Fields" <bfields@fieldses.org>
Bcc: "Theodore Ts'o" <tytso@mit.edu>
Bcc: Adrian Hunter <adrian.hunter@intel.com>
Bcc: Andreas Dilger <adilger.kernel@dilger.ca>
Bcc: Andrew Morton <akpm@linux-foundation.org>
Bcc: Anton Altaparmakov <anton@tuxera.com>
Bcc: Anton Vorontsov <anton@enomsg.org>
Bcc: Artem Bityutskiy <dedekind1@gmail.com>
Bcc: Brian Uchino <buchino@cisco.com>
Bcc: Chris Mason <clm@fb.com>
Bcc: Colin Cross <ccross@android.com>
Bcc: Dave Chinner <david@fromorbit.com>
Bcc: David Howells <dhowells@redhat.com>
Bcc: David Woodhouse <dwmw2@infradead.org>
Bcc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bcc: Hiral Patel <hiralpat@cisco.com>
Bcc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Bcc: Jan Harkes <jaharkes@cs.cmu.edu>
Bcc: Jan Kara <jack@suse.cz>
Bcc: Joel Becker <jlbec@evilplan.org>
Bcc: Joern Engel <joern@logfs.org>
Bcc: Josef Bacik <jbacik@fb.com>
Bcc: Kees Cook <keescook@chromium.org>
Bcc: Mark Fasheh <mfasheh@suse.com>
Bcc: Miklos Szeredi <miklos@szeredi.hu>
Bcc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Bcc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Bcc: Sage Weil <sage@inktank.com>
Bcc: Steve French <sfrench@samba.org>
Bcc: Steven Whitehouse <swhiteho@redhat.com>
Bcc: Suma Ramars <sramars@cisco.com>
Bcc: Tony Luck <tony.luck@intel.com>
Cc: ceph-devel@vger.kernel.org
Cc: cluster-devel@redhat.com
Cc: coda@cs.cmu.edu
Cc: codalist@coda.cs.cmu.edu
Cc: fuse-devel@lists.sourceforge.net
Cc: linux-afs@lists.infradead.org
Cc: linux-btrfs@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: linux-ext4@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-mtd@lists.infradead.org
Cc: linux-nfs@vger.kernel.org
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: linux-scsi@vger.kernel.org
Cc: logfs@logfs.org
Cc: ocfs2-devel@oss.oracle.com
Cc: reiserfs-devel@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: xfs@oss.sgi.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [RFC 11/32] xfs: convert to struct inode_time
2014-05-30 20:01 [RFC 00/32] making inode time stamps y2038 ready Arnd Bergmann
@ 2014-05-30 20:01 ` Arnd Bergmann
2014-05-31 0:37 ` Dave Chinner
2014-05-31 14:30 ` [RFC 00/32] making inode time stamps y2038 ready Vyacheslav Dubeyko
` (2 subsequent siblings)
3 siblings, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-05-30 20:01 UTC (permalink / raw)
To: linux-kernel
Cc: linux-arch, Arnd Bergmann, hpa, xfs, hch, john.stultz, lftan,
linux-fsdevel, geert, tglx, joseph

xfs uses unsigned 32-bit seconds for inode timestamps, which will work
for the next 92 years, but the VFS uses struct timespec for timestamps,
which is only good until 2038 on 32-bit CPUs.

This gets us one small step closer to lifting the VFS limit by using
struct inode_time in XFS.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
---
 fs/xfs/time.h            | 4 ++--
 fs/xfs/xfs_inode.c       | 2 +-
 fs/xfs/xfs_iops.c        | 2 +-
 fs/xfs/xfs_trans_inode.c | 6 +++---
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/time.h b/fs/xfs/time.h
index 387e695..a490f1b 100644
--- a/fs/xfs/time.h
+++ b/fs/xfs/time.h
@@ -21,14 +21,14 @@
 #include <linux/sched.h>
 #include <linux/time.h>
 
-typedef struct timespec timespec_t;
+typedef struct inode_time timespec_t;
 
 static inline void delay(long ticks)
 {
 	schedule_timeout_uninterruptible(ticks);
 }
 
-static inline void nanotime(struct timespec *tvp)
+static inline void nanotime(struct inode_time *tvp)
 {
 	*tvp = CURRENT_TIME;
 }
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index a6115fe..16d5392 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -654,7 +654,7 @@ xfs_ialloc(
 	xfs_inode_t	*ip;
 	uint		flags;
 	int		error;
-	timespec_t	tv;
+	struct inode_time tv;
 
 	/*
 	 * Call the space management code to pick
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 205613a..092ee7c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -956,7 +956,7 @@ xfs_vn_setattr(
 STATIC int
 xfs_vn_update_time(
 	struct inode		*inode,
-	struct timespec		*now,
+	struct inode_time	*now,
 	int			flags)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index 50c3f56..bae2520 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -70,7 +70,7 @@ xfs_trans_ichgtime(
 	int			flags)
 {
 	struct inode		*inode = VFS_I(ip);
-	timespec_t		tv;
+	struct inode_time	tv;
 
 	ASSERT(tp);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
@@ -78,13 +78,13 @@ xfs_trans_ichgtime(
 	tv = current_fs_time(inode->i_sb);
 
 	if ((flags & XFS_ICHGTIME_MOD) &&
-	    !timespec_equal(&inode->i_mtime, &tv)) {
+	    !inode_time_equal(&inode->i_mtime, &tv)) {
 		inode->i_mtime = tv;
 		ip->i_d.di_mtime.t_sec = tv.tv_sec;
 		ip->i_d.di_mtime.t_nsec = tv.tv_nsec;
 	}
 	if ((flags & XFS_ICHGTIME_CHG) &&
-	    !timespec_equal(&inode->i_ctime, &tv)) {
+	    !inode_time_equal(&inode->i_ctime, &tv)) {
 		inode->i_ctime = tv;
 		ip->i_d.di_ctime.t_sec = tv.tv_sec;
 		ip->i_d.di_ctime.t_nsec = tv.tv_nsec;
-- 
1.8.3.2
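For readers who have not seen patch 01/32: the XFS patch above relies on a struct inode_time and an inode_time_equal() helper whose definitions are not shown in this thread. The following is only a rough user-space sketch of what such a type and comparison might look like. The struct name, field names and field widths are assumptions made for illustration; the cover letter says the series carries three candidate representations, and this is not claimed to be any of them.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct inode_time. The real definition
 * lives in patch 01/32 and may differ in layout and field widths.
 */
struct inode_time_sketch {
	int64_t tv_sec;		/* seconds since the Unix epoch, 64-bit on all CPUs */
	int32_t tv_nsec;	/* nanoseconds, 0..999999999 */
};

/*
 * Stand-in for the inode_time_equal() helper called in the patch:
 * two timestamps are equal when both fields match.
 */
static inline bool inode_time_sketch_equal(const struct inode_time_sketch *a,
					   const struct inode_time_sketch *b)
{
	return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}
```

With a 64-bit seconds field, the assignments to the on-disk di_mtime.t_sec and di_ctime.t_sec fields in xfs_trans_ichgtime() become narrowing conversions whenever the on-disk field is 32 bits wide.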
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-30 20:01 ` [RFC 11/32] xfs: convert to struct inode_time Arnd Bergmann
@ 2014-05-31 0:37 ` Dave Chinner
2014-05-31 0:41 ` H. Peter Anvin
0 siblings, 1 reply; 71+ messages in thread
From: Dave Chinner @ 2014-05-31 0:37 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arch, hpa, linux-kernel, xfs, hch, john.stultz, lftan,
linux-fsdevel, geert, tglx, joseph

On Fri, May 30, 2014 at 10:01:35PM +0200, Arnd Bergmann wrote:
> xfs uses unsigned 32-bit seconds for inode timestamps, which will work
> for the next 92 years, but the VFS uses struct timespec for timestamps,
> which is only good until 2038 on 32-bit CPUs.
>
> This gets us one small step closer to lifting the VFS limit by using
> struct inode_time in XFS.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: xfs@oss.sgi.com
> ---
>  fs/xfs/time.h            | 4 ++--
>  fs/xfs/xfs_inode.c       | 2 +-
>  fs/xfs/xfs_iops.c        | 2 +-
>  fs/xfs/xfs_trans_inode.c | 6 +++---
>  4 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/fs/xfs/time.h b/fs/xfs/time.h
> index 387e695..a490f1b 100644
> --- a/fs/xfs/time.h
> +++ b/fs/xfs/time.h
> @@ -21,14 +21,14 @@
>  #include <linux/sched.h>
>  #include <linux/time.h>
>
> -typedef struct timespec timespec_t;
> +typedef struct inode_time timespec_t;
>
>  static inline void delay(long ticks)
>  {
>  	schedule_timeout_uninterruptible(ticks);
>  }
>
> -static inline void nanotime(struct timespec *tvp)
> +static inline void nanotime(struct inode_time *tvp)
>  {
>  	*tvp = CURRENT_TIME;
>  }
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index a6115fe..16d5392 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -654,7 +654,7 @@ xfs_ialloc(
>  	xfs_inode_t	*ip;
>  	uint		flags;
>  	int		error;
> -	timespec_t	tv;
> +	struct inode_time tv;
>
>  	/*
>  	 * Call the space management code to pick
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 205613a..092ee7c 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -956,7 +956,7 @@ xfs_vn_setattr(
>  STATIC int
>  xfs_vn_update_time(
>  	struct inode		*inode,
> -	struct timespec		*now,
> +	struct inode_time	*now,
>  	int			flags)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
> diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
> index 50c3f56..bae2520 100644
> --- a/fs/xfs/xfs_trans_inode.c
> +++ b/fs/xfs/xfs_trans_inode.c
> @@ -70,7 +70,7 @@ xfs_trans_ichgtime(
>  	int			flags)
>  {
>  	struct inode		*inode = VFS_I(ip);
> -	timespec_t		tv;
> +	struct inode_time	tv;
>
>  	ASSERT(tp);
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> @@ -78,13 +78,13 @@ xfs_trans_ichgtime(
>  	tv = current_fs_time(inode->i_sb);
>
>  	if ((flags & XFS_ICHGTIME_MOD) &&
> -	    !timespec_equal(&inode->i_mtime, &tv)) {
> +	    !inode_time_equal(&inode->i_mtime, &tv)) {
>  		inode->i_mtime = tv;
>  		ip->i_d.di_mtime.t_sec = tv.tv_sec;
>  		ip->i_d.di_mtime.t_nsec = tv.tv_nsec;
>  	}

The problem I see here is that the code is now potentially stuffing
a variable that is larger than 32 bits into an on-disk structure
that is only 32 bits in size. You can't just change the in-memory
representation of inode timestamps and expect the problem to be
fixed - this just pushes the problem down a layer without any
infrastructure allowing filesystems to handle storage of the new
timestamp format sanely.

IOWs, the filesystem has to be able to reject any attempt to set a
timestamp that it can't represent on disk otherwise Bad Stuff will
happen, and filesystems have to be able to specify in their on
disk format what timestamp encoding is being used. The solution will
be different for every filesystem that needs to support time beyond
2038.

Hence I think you are going to need superblock flags and/or
variables to indicate the epoch range the filesystem can support.
Then the filesystems need conversion functions from whatever the
internal VFS timestamp representation is to whatever their on-disk
format is, and only then can we switch the VFS to using a new
timestamp format.

At that point, filesystem developers can make the changes they need
to the on-disk format to support timestamps beyond 2038, and all
they need to do at the VFS layer is set the "supported range" fields
appropriately in the VFS superblock...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
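To make the "supported range" idea in the message above concrete, here is a minimal user-space sketch of a per-superblock range check. Everything in it is hypothetical: the struct, its field names and the choice of -ERANGE as the error code are illustrative assumptions, not part of the posted series or of any existing VFS interface.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical "supported range" fields that a filesystem would fill
 * in at mount time. No such fields exist in the posted series.
 */
struct sb_time_range {
	int64_t min_sec;	/* earliest timestamp the on-disk format can hold */
	int64_t max_sec;	/* latest timestamp the on-disk format can hold */
};

/*
 * Return 0 when sec can be stored on disk unchanged, or -ERANGE when
 * it falls outside the range advertised by the filesystem.
 * The error code is an assumption made for this sketch.
 */
static int check_timestamp(const struct sb_time_range *r, int64_t sec)
{
	if (sec < r->min_sec || sec > r->max_sec)
		return -ERANGE;
	return 0;
}
```

A filesystem with signed 32-bit on-disk seconds would advertise INT32_MIN..INT32_MAX here, and the setattr path would consult the check before copying tv_sec into the on-disk inode.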
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 0:37 ` Dave Chinner
@ 2014-05-31 0:41 ` H. Peter Anvin
2014-05-31 1:14 ` Dave Chinner
0 siblings, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-05-31 0:41 UTC (permalink / raw)
To: Dave Chinner, Arnd Bergmann
Cc: linux-arch, linux-kernel, xfs, hch, john.stultz, lftan,
linux-fsdevel, geert, tglx, joseph

On 05/30/2014 05:37 PM, Dave Chinner wrote:
>
> IOWs, the filesystem has to be able to reject any attempt to set a
> timestamp that it can't represent on disk otherwise Bad Stuff will
> happen,

Actually it is questionable if it is worse to reject a timestamp or just
let it wrap. Rejecting a valid timestamp is a bit like "You don't
exist, go away."

> and filesystems have to be able to specify in their on
> disk format what timestamp encoding is being used. The solution will
> be different for every filesystem that needs to support time beyond
> 2038.

Actually the cutoff can be really different for each filesystem, not
necessarily 2038. However, I maintain the above still holds.

Consider a filesystem that kept timestamps in YYMMDDHHMMSS format. What
would you have expected such a filesystem to do on Jan 1, 2000?

	-hpa
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 0:41 ` H. Peter Anvin
@ 2014-05-31 1:14 ` Dave Chinner
2014-05-31 1:22 ` H. Peter Anvin
2014-05-31 15:37 ` Arnd Bergmann
0 siblings, 2 replies; 71+ messages in thread
From: Dave Chinner @ 2014-05-31 1:14 UTC (permalink / raw)
To: H. Peter Anvin
Cc: linux-arch, Arnd Bergmann, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph

On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> On 05/30/2014 05:37 PM, Dave Chinner wrote:
> >
> > IOWs, the filesystem has to be able to reject any attempt to set a
> > timestamp that it can't represent on disk otherwise Bad Stuff will
> > happen,
>
> Actually it is questionable if it is worse to reject a timestamp or just
> let it wrap. Rejecting a valid timestamp is a bit like "You don't
> exist, go away."

I think having the new system calls being able to
return EINVAL if the value cannot be stored permanently on disk
correctly is the right thing to do. Having it silently mangled
by the filesystem and returning "everything is just fine, trust me"
is close to the worst solution I can think of. That's exactly what
leads to overflow bugs occurring....

> > and filesystems have to be able to specify in their on
> > disk format what timestamp encoding is being used. The solution will
> > be different for every filesystem that needs to support time beyond
> > 2038.
>
> Actually the cutoff can be really different for each filesystem, not
> necessarily 2038. However, I maintain the above still holds.

Sure, but all filesystems are supposed to handle at least the
current unix epoch.

> Consider a filesystem that kept timestamps in YYMMDDHHMMSS format. What
> would you have expected such a filesystem to do on Jan 1, 2000?

Strawman.

We don't need to cater for fundamentally broken designs that can't
even handle the current unix epoch correctly. If such filesystems
exist, then they can simply say "original unix epoch support only"
and do whatever crap they are doing right now.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 1:14 ` Dave Chinner
@ 2014-05-31 1:22 ` H. Peter Anvin
2014-05-31 5:54 ` Dave Chinner
2014-05-31 15:37 ` Arnd Bergmann
1 sibling, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-05-31 1:22 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, Arnd Bergmann, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph

No, not a strawman. Replace with Jan 19, 2038 and you have the same situation.

On May 30, 2014 6:14:50 PM PDT, Dave Chinner <david@fromorbit.com> wrote:
>On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
>> On 05/30/2014 05:37 PM, Dave Chinner wrote:
>> >
>> > IOWs, the filesystem has to be able to reject any attempt to set a
>> > timestamp that it can't represent on disk otherwise Bad Stuff will
>> > happen,
>>
>> Actually it is questionable if it is worse to reject a timestamp or just
>> let it wrap. Rejecting a valid timestamp is a bit like "You don't
>> exist, go away."
>
>I think having the new system calls being able to
>return EINVAL if the value cannot be stored permanently on disk
>correctly is the right thing to do. Having it silently mangled
>by the filesystem and returning "everything is just fine, trust me"
>is close to the worst solution I can think of. That's exactly what
>leads to overflow bugs occurring....
>
>> > and filesystems have to be able to specify in their on
>> > disk format what timestamp encoding is being used. The solution will
>> > be different for every filesystem that needs to support time beyond
>> > 2038.
>>
>> Actually the cutoff can be really different for each filesystem, not
>> necessarily 2038. However, I maintain the above still holds.
>
>Sure, but all filesystems are supposed to handle at least the
>current unix epoch.
>
>> Consider a filesystem that kept timestamps in YYMMDDHHMMSS format. What
>> would you have expected such a filesystem to do on Jan 1, 2000?
>
>Strawman.
>
>We don't need to cater for fundamentally broken designs that can't
>even handle the current unix epoch correctly. If such filesystems
>exist, then they can simply say "original unix epoch support only"
>and do whatever crap they are doing right now.
>
>Cheers,
>
>Dave.

-- 
Sent from my mobile phone. Please pardon brevity and lack of formatting.
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 1:22 ` H. Peter Anvin
@ 2014-05-31 5:54 ` Dave Chinner
2014-05-31 8:41 ` H. Peter Anvin
2014-06-02 14:00 ` Joseph S. Myers
0 siblings, 2 replies; 71+ messages in thread
From: Dave Chinner @ 2014-05-31 5:54 UTC (permalink / raw)
To: H. Peter Anvin
Cc: linux-arch, Arnd Bergmann, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph

[ Please don't top post. ]

On Fri, May 30, 2014 at 06:22:55PM -0700, H. Peter Anvin wrote:
> On May 30, 2014 6:14:50 PM PDT, Dave Chinner <david@fromorbit.com> wrote:
> >On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> >> On 05/30/2014 05:37 PM, Dave Chinner wrote:
> >> >
> >> > IOWs, the filesystem has to be able to reject any attempt to
> >> > set a timestamp that it can't represent on disk otherwise Bad
> >> > Stuff will happen,
> >>
> >> Actually it is questionable if it is worse to reject a
> >> timestamp or just let it wrap. Rejecting a valid timestamp is
> >> a bit like "You don't exist, go away."
> >
> >I think having the new system calls being able to return EINVAL
> >if the value cannot be stored permanently on disk correctly is
> >the right thing to do. Having it silently mangled by the
> >filesystem and returning "everything is just fine, trust me" is
> >close to the worst solution I can think of. That's exactly what
> >leads to overflow bugs occurring....
> >
> >> > and filesystems have to be able to specify in their on disk
> >> > format what timestamp encoding is being used. The solution
> >> > will be different for every filesystem that needs to support
> >> > time beyond 2038.
> >>
> >> Actually the cutoff can be really different for each
> >> filesystem, not necessarily 2038. However, I maintain the
> >> above still holds.
> >
> >Sure, but all filesystems are supposed to handle at least the
> >current unix epoch.
> >
> >> Consider a filesystem that kept timestamps in YYMMDDHHMMSS
> >> format. What would you have expected such a filesystem to do
> >> on Jan 1, 2000?
> >
> >Strawman.
> >
> >We don't need to cater for fundamentally broken designs that
> >can't even handle the current unix epoch correctly. If such
> >filesystems exist, then they can simply say "original unix epoch
> >support only" and do whatever crap they are doing right now.
>
> No, not a strawman. Replace with Jan 19, 2038 and you have the
> same situation.

But that's not the problem I'm talking about. The problem isn't the
roll-over date of the epoch - the problem is that we're changing the
in-memory meaning of time without changing what the filesystems
store on disk or how they translate them.

To use your example, what I'm actually talking about is the kernel
switching to CCYYMMDDHHMMSS while the filesystem has YYMMDDHHMMSS on
disk.

The filesystem doesn't know the timestamp is now a different format,
so it could mangle it writing it to disk, or it could mangle
existing timestamps in the YY.. format reading them from disk and
putting them into CC.. format structures. IOWs, it will incorrectly
translate YY format dates to CC format, or translate something in
the CC format as though it was in YY format. And it wouldn't even
know what was the correct format because there's nothing telling it
on disk whether the date is in CC or YY format.

Either way, you get mangled timestamps, the filesystem doesn't know
about it because it's just storing what the kernel gives it, the
kernel thinks they are fine because they are just opaque when read
back, but the user says "what the fuck did a reboot do to all these
timestamps?".

Hence your example of roll-over dates is a strawman - you've
constructed a problem that is irrelevant to the issue being pointed
out.

FWIW, we already have code in the superblock and VFS to avoid such
problems on filesystems with limited timestamp resolution (i.e.
s_time_gran and current_fs_time()) so that what the VFS hands the
filesystem is exactly what the VFS expects to get back from disk
when comparing timestamps.

If we are changing the in-kernel timestamp to have a greater dynamic
range than anything we currently support on disk, then we need
support for all filesystems for similar translation and constraint.
The filesystems need to be able to tell the kernel what timestamp
range they support, and then the kernel needs to follow those
guidelines. And if the filesystem is mounted on a kernel that
doesn't support the current filesystem's timestamp format, then at
minimum that filesystem cannot do anything that writes a
timestamp....

Put simply: the filesystem defines the timestamp range that can be
used safely, not the userspace API. If the filesystem can't support
the date it is handed then that is an out-of-range error. Since
when have we accepted that it's OK to handle out-of-range data with
silent overflows or corruption of the data that we are attempting to
store? We're defining a new API to support a wider date range -
there is nothing that prevents us from saying ERANGE can be returned
to a timestamp that the file cannot store correctly....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 5:54 ` Dave Chinner
@ 2014-05-31 8:41 ` H. Peter Anvin
2014-05-31 15:46 ` Nicolas Pitre
2014-06-01 0:39 ` Dave Chinner
2014-06-02 14:00 ` Joseph S. Myers
1 sibling, 2 replies; 71+ messages in thread
From: H. Peter Anvin @ 2014-05-31 8:41 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, Arnd Bergmann, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph

On 05/30/2014 10:54 PM, Dave Chinner wrote:
>
> If we are changing the in-kernel timestamp to have a greater dynamic
> range than anything we currently support on disk, then we need
> support for all filesystems for similar translation and constraint.
> The filesystems need to be able to tell the kernel what timestamp
> range they support, and then the kernel needs to follow those
> guidelines. And if the filesystem is mounted on a kernel that
> doesn't support the current filesystem's timestamp format, then at
> minimum that filesystem cannot do anything that writes a
> timestamp....
>
> Put simply: the filesystem defines the timestamp range that can be
> used safely, not the userspace API. If the filesystem can't support
> the date it is handed then that is an out-of-range error. Since
> when have we accepted that it's OK to handle out-of-range data with
> silent overflows or corruption of the data that we are attempting to
> store? We're defining a new API to support a wider date range -
> there is nothing that prevents us from saying ERANGE can be returned
> to a timestamp that the file cannot store correctly....
>

I'm still puzzled.

Are you saying that you want a program that does:

	/* Deliberately simplified */
	gettimeofdayns(&now ...);
	utimensat(... now);

... to suddenly start failing on Jan 19, 2038 (for a filesystem with
32-bit timestamps), or would you propose some ways for the filesystems
in question to extend the range of the timestamps?

What you seem to propose also seems to imply that on Jan 19, 2038
anything that writes a timestamp with the current date (which logically
ends up being almost every write operation) would be dead and frozen on
such a filesystem -- pretty much meaning the filesystem would become
readonly if not in reality then in practice.

I strongly suspect that that would be a more catastrophic failure than
incorrect timestamps, as you suddenly have all kinds of machines
embedded in $DEITY knows what places just stop and refuse to run.

If that is not what you mean I would genuinely like to understand the
situation better.

	-hpa
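For reference, the "deliberately simplified" program in the message above could be written out roughly as follows. This is a sketch using the existing POSIX interfaces clock_gettime() and utimensat(); the gettimeofdayns() in the mail is shorthand rather than a real call, and the helper name touch_now() is made up for this example. Under the behaviour Dave argues for, the utimensat() call is the point where an out-of-range error would surface once the current time no longer fits the on-disk format.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>	/* AT_FDCWD */
#include <sys/stat.h>	/* utimensat() */
#include <time.h>	/* clock_gettime() */

/*
 * Set both atime and mtime of path to the current wall-clock time.
 * Returns 0 on success or a negative errno value on failure.
 * touch_now() is a hypothetical helper name, not an existing API.
 */
static int touch_now(const char *path)
{
	struct timespec ts[2];

	if (clock_gettime(CLOCK_REALTIME, &ts[0]) != 0)
		return -errno;
	ts[1] = ts[0];	/* ts[0] is atime, ts[1] is mtime */

	if (utimensat(AT_FDCWD, path, ts, 0) != 0)
		return -errno;
	return 0;
}
```

A caller that ignores the return value never notices a failure, which is why the choice between a hard error and a silently adjusted timestamp matters for the embedded systems mentioned above.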
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 8:41 ` H. Peter Anvin
@ 2014-05-31 15:46 ` Nicolas Pitre
2014-06-01 19:56 ` Arnd Bergmann
2014-06-01 0:39 ` Dave Chinner
1 sibling, 1 reply; 71+ messages in thread
From: Nicolas Pitre @ 2014-05-31 15:46 UTC (permalink / raw)
To: H. Peter Anvin
Cc: linux-arch, Arnd Bergmann, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph

On Sat, 31 May 2014, H. Peter Anvin wrote:

> On 05/30/2014 10:54 PM, Dave Chinner wrote:
> >
> > If we are changing the in-kernel timestamp to have a greater dynamic
> > range than anything we currently support on disk, then we need
> > support for all filesystems for similar translation and constraint.
> > The filesystems need to be able to tell the kernel what timestamp
> > range they support, and then the kernel needs to follow those
> > guidelines. And if the filesystem is mounted on a kernel that
> > doesn't support the current filesystem's timestamp format, then at
> > minimum that filesystem cannot do anything that writes a
> > timestamp....
> >
> > Put simply: the filesystem defines the timestamp range that can be
> > used safely, not the userspace API. If the filesystem can't support
> > the date it is handed then that is an out-of-range error. Since
> > when have we accepted that it's OK to handle out-of-range data with
> > silent overflows or corruption of the data that we are attempting to
> > store? We're defining a new API to support a wider date range -
> > there is nothing that prevents us from saying ERANGE can be returned
> > for a timestamp that the file cannot store correctly....
> >
>
> I'm still puzzled.
>
> Are you saying that you want a program that does:
>
>     /* Deliberately simplified */
>     gettimeofdayns(&now ...);
>     utimensat(... now);
>
> ... to suddenly start failing on Jan 19, 2038 (for a filesystem with
> 32-bit timestamps), or would you propose some ways for the filesystems
> in question to extend the range of the timestamps?
>
> What you seem to propose also seems to imply that on Jan 19, 2038
> anything that writes a timestamp with the current date (which logically
> ends up being almost every write operation) would be dead and frozen on
> such a filesystem -- pretty much meaning the filesystem would become
> readonly if not in reality then in practice.

For those (legacy) filesystems with signed 32-bit timestamps, any
attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
(silently) clamped to 0x7fffffff and that value (the last representable
time) used as an overflow indicator. The filesystem driver should
convert that value into a corresponding overflow value for whatever
kernel internal time representation is being used when read back, and
this should be propagated up to user space. It should not be a hard
error; otherwise, as you rightfully stated, everything non read-only
would come to a halt on that day.

Inside the kernel, the overflow indicator could be as simple as
dedicating one of the top bits in a 64-bit time_t value in order to
still transmit the overflow limit. For example, in the above case, we
could use 0x40000000-7fffffff to indicate the actual time is
unavailable due to the filesystem's time representation being
overflowed from 0x7fffffff. If for example a filesystem cannot
represent timestamps from Jan 1 00:00:00 2100 UTC then the overflow
representation for this particular filesystem would be
0x40000000-f48656ff.

Those syscalls with a 32-bit time_t would be returned 0x7fffffff
whenever there is an overflow being signaled.

Whether 64-bit overflow-marked time_t values, when passed to user
space, should clear the overflow bit, or use a unique time_t overflow
value, could be decided and even changed later after discussion with
glibc people for example.

Hard errors should be signaled to user space, and the actual operation
aborted, only with the presence of a new flag passed to the kernel.
However, by default, things should "just work", albeit with the "wrong"
(i.e. clamped) time being saved on disk, as much as possible otherwise.

Nicolas

^ permalink raw reply [flat|nested] 71+ messages in thread
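A minimal sketch of the clamping scheme Nicolas describes above, written
as plain user-space C rather than kernel code. The names
fs_clamp_time(), fs_decode_time() and TIME_OVERFLOW_FLAG are
hypothetical, and using bit 62 as the marker is only inferred from the
0x40000000-7fffffff example; nothing like this is part of the posted
patches.

```c
#include <stdint.h>

/* Hypothetical marker: bit 62 of a 64-bit time value, chosen to match the
 * 0x40000000-7fffffff example (an assumption, not an existing kernel flag). */
#define TIME_OVERFLOW_FLAG ((int64_t)1 << 62)

/* Clamp a 64-bit seconds value to the last time the filesystem can store. */
static int64_t fs_clamp_time(int64_t sec, int64_t fs_limit)
{
	return sec > fs_limit ? fs_limit : sec;
}

/* Convert an on-disk value back: the limit value itself is treated as the
 * overflow indicator and is tagged before being passed up the stack. */
static int64_t fs_decode_time(int64_t raw, int64_t fs_limit)
{
	if (raw == fs_limit)
		return TIME_OVERFLOW_FLAG | fs_limit;
	return raw;
}
```

With fs_limit = 0x7fffffff this yields 0x400000007fffffff for an
overflowed timestamp, and with fs_limit = 0xf48656ff (the last second
before Jan 1 2100 UTC) it yields 0x40000000f48656ff, matching the two
values given in the message.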
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 15:46 ` Nicolas Pitre
@ 2014-06-01 19:56 ` Arnd Bergmann
2014-06-01 20:26 ` H. Peter Anvin
2014-06-02 1:36 ` Nicolas Pitre
0 siblings, 2 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-01 19:56 UTC (permalink / raw)
To: Nicolas Pitre
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph

On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > readonly if not in reality then in practice.
>
> For those (legacy) filesystems with signed 32-bit timestamps, any
> attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
> (silently) clamped to 0x7fffffff and that value (the last representable
> time) used as an overflow indicator. The filesystem driver should
> convert that value into a corresponding overflow value for whatever
> kernel internal time representation being used when read back, and this
> should be propagated up to user space. It should not be a hard error
> otherwise, as you rightfully stated, everything non read-only would come
> to a halt on that day.

I don't think there is much of a difference between not being able to
write at all and all newly written files having the same timestamp,
causing random things to break differently.

The clamp to the maximum supported time stamp sounds like a reasonable
choice for 'utimens' and related syscalls for the case of someone
setting an arbitrary future date beyond what the file system can
represent. Then again, I don't see a reason why that shouldn't just
cause an error to be returned.

For actually running kernels beyond 2038, the best idea I've seen so
far is to disallow all broken code at compile time. I don't see
a choice but to audit the entire kernel for invalid uses on both
32 and 64 bit in the next few years. A lot of code will get changed
in the process so we can actually keep running 32-bit kernels and
file systems, but other code will likely go away:

* any system calls that pass a time_t, timeval or timespec on
  32-bit systems return -ENOSYS, to ensure all user land uses
  the replacements we will put into place
* The definition of 'time_t', 'timeval' and 'timespec' can be hidden
  from the kernel, and all code using it left out.
* ext2 and ext3 file system code will have to be disabled, but that's
  fine since ext4 can mount old file systems.
* until xfs gets extended, we can also disable it at build time.

For most users, we probably want to leave all that enabled by
default until we get much closer to 2038, but a compile time
option should allow us to test what works or doesn't, and it
can be set by embedded developers that want to ensure their
code keeps running for the next few decades.

	Arnd

^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-01 19:56 ` Arnd Bergmann
@ 2014-06-01 20:26 ` H. Peter Anvin
2014-06-02 11:02 ` Arnd Bergmann
2014-06-02 1:36 ` Nicolas Pitre
1 sibling, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-06-01 20:26 UTC (permalink / raw)
To: Arnd Bergmann, Nicolas Pitre
Cc: linux-arch, linux-kernel, xfs, hch, john.stultz, lftan,
linux-fsdevel, geert, tglx, joseph

Perhaps we should make this a kernel command line option instead, with
the settings: error out on dates outside the standard window, or a date
indicating the earliest date that should be recognized and do windowing
(0 for no windowing, 1970 for retconning the Unix epoch as unsigned...)

But again, the kernel is probably the least problem here...

On June 1, 2014 12:56:52 PM PDT, Arnd Bergmann <arnd@arndb.de> wrote:
>On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
>> > readonly if not in reality then in practice.
>>
>> For those (legacy) filesystems with signed 32-bit timestamps, any
>> attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
>> (silently) clamped to 0x7fffffff and that value (the last
>> representable time) used as an overflow indicator. The filesystem
>> driver should convert that value into a corresponding overflow value
>> for whatever kernel internal time representation being used when read
>> back, and this should be propagated up to user space. It should not
>> be a hard error otherwise, as you rightfully stated, everything non
>> read-only would come to a halt on that day.
>
>I don't think there is much of a difference between not being able to
>write at all and all newly written files having the same timestamp,
>causing random things to break differently.
>
>The clamp to the maximum supported time stamp sounds like a reasonable
>choice for 'utimens' and related syscalls for the case of someone
>setting an arbitrary future date beyond what the file system can
>represent. Then again, I don't see a reason why that shouldn't just
>cause an error to be returned.
>
>For actually running kernels beyond 2038, the best idea I've seen so
>far is to disallow all broken code at compile time. I don't see
>a choice but to audit the entire kernel for invalid uses on both
>32 and 64 bit in the next few years. A lot of code will get changed
>in the process so we can actually keep running 32-bit kernels and
>file systems, but other code will likely go away:
>
>* any system calls that pass a time_t, timeval or timespec on
>  32-bit systems return -ENOSYS, to ensure all user land uses
>  the replacements we will put into place
>* The definition of 'time_t', 'timeval' and 'timespec' can be hidden
>  from the kernel, and all code using it left out.
>* ext2 and ext3 file system code will have to be disabled, but that's
>  fine since ext4 can mount old file systems.
>* until xfs gets extended, we can also disable it at build time.
>
>For most users, we probably want to leave all that enabled by
>default until we get much closer to 2038, but a compile time
>option should allow us to test what works or doesn't, and it
>can be set by embedded developers that want to ensure their
>code keeps running for the next few decades.
>
>	Arnd

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

^ permalink raw reply [flat|nested] 71+ messages in thread
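One possible reading of the windowing idea in the message above, as a
user-space C sketch. window_decode() and its window_start parameter are
hypothetical names; the thread does not define the option's syntax, and
expressing the window start in seconds since 1970 (so that the "1970"
setting corresponds to window_start = 0) is an assumption.

```c
#include <stdint.h>

/* Map a 32-bit on-disk value to the unique 64-bit time in the range
 * [window_start, window_start + 2^32) that has the same low 32 bits. */
static int64_t window_decode(uint32_t raw, int64_t window_start)
{
	/* Distance from the window start, computed modulo 2^32. */
	uint32_t delta = raw - (uint32_t)window_start;

	return window_start + (int64_t)delta;
}
```

With window_start = INT32_MIN this reproduces today's signed
interpretation (1901 to 2038); with window_start = 0 the same on-disk
values are read as unsigned (1970 to 2106).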
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-01 20:26 ` H. Peter Anvin
@ 2014-06-02 11:02 ` Arnd Bergmann
0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 11:02 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Nicolas Pitre, linux-arch, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph

On Sunday 01 June 2014 13:26:03 H. Peter Anvin wrote:
> Perhaps we should make this a kernel command line option instead, with the
> settings: error out on dates outside the standard window, or a date
> indicating the earliest date that should be recognized and do windowing
> (0 for no windowing, 1970 for retconning the Unix epoch as unsigned...)

What's wrong with compile-time errors? We have a pretty good
understanding of how time values are passed in the kernel, and we know
they will all break in 2038 for 32-bit kernels unless we do something
about it.

> But again, the kernel is probably the least problem here...

I agree the glibc side is harder than this, but we have to get the
kernel into shape first (at the minimum we have to do the APIs), and
there is enough work to do here.

	Arnd

^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-01 19:56 ` Arnd Bergmann
2014-06-01 20:26 ` H. Peter Anvin
@ 2014-06-02 1:36 ` Nicolas Pitre
2014-06-02 2:22 ` Dave Chinner
2014-06-02 10:56 ` Arnd Bergmann
1 sibling, 2 replies; 71+ messages in thread
From: Nicolas Pitre @ 2014-06-02 1:36 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph

On Sun, 1 Jun 2014, Arnd Bergmann wrote:

> On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > > readonly if not in reality then in practice.
> >
> > For those (legacy) filesystems with signed 32-bit timestamps, any
> > attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
> > (silently) clamped to 0x7fffffff and that value (the last representable
> > time) used as an overflow indicator. The filesystem driver should
> > convert that value into a corresponding overflow value for whatever
> > kernel internal time representation being used when read back, and this
> > should be propagated up to user space. It should not be a hard error
> > otherwise, as you rightfully stated, everything non read-only would come
> > to a halt on that day.
>
> I don't think there is much of a difference between not being able to
> write at all and all newly written files having the same timestamp,
> causing random things to break differently.

Well, in one case you have a crash certitude. In the other case you
have some probability that your system might still be usable.

> The clamp to the maximum supported time stamp sounds like a reasonable
> choice for 'utimens' and related syscalls for the case of someone
> setting an arbitrary future date beyond what the file system can
> represent. Then again, I don't see a reason why that shouldn't just
> cause an error to be returned.

Resilience is better than outright failure.

> For actually running kernels beyond 2038, the best idea I've seen so
> far is to disallow all broken code at compile time. I don't see
> a choice but to audit the entire kernel for invalid uses on both
> 32 and 64 bit in the next few years. A lot of code will get changed
> in the process so we can actually keep running 32-bit kernels and
> file systems, but other code will likely go away:
>
> * any system calls that pass a time_t, timeval or timespec on
>   32-bit systems return -ENOSYS, to ensure all user land uses
>   the replacements we will put into place
> * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
>   from the kernel, and all code using it left out.
> * ext2 and ext3 file system code will have to be disabled, but that's
>   fine since ext4 can mount old file systems.

Syscalls and libs can be "fixed". Existing filesystem content might
not. So if you need to mount some old media in read-write mode after
2038 and that happens to contain an ext2 or similarly limited filesystem
then it'd better just "work". Having the kernel refuse to modify the
filesystem would be unacceptable.

Nicolas

^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 1:36 ` Nicolas Pitre
@ 2014-06-02 2:22 ` Dave Chinner
2014-06-02 7:09 ` Geert Uytterhoeven
2014-06-02 10:56 ` Arnd Bergmann
1 sibling, 1 reply; 71+ messages in thread
From: Dave Chinner @ 2014-06-02 2:22 UTC (permalink / raw)
To: Nicolas Pitre
Cc: linux-arch, Arnd Bergmann, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph

On Sun, Jun 01, 2014 at 09:36:26PM -0400, Nicolas Pitre wrote:
> On Sun, 1 Jun 2014, Arnd Bergmann wrote:
> > On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > For actually running kernels beyond 2038, the best idea I've seen so
> > far is to disallow all broken code at compile time. I don't see
> > a choice but to audit the entire kernel for invalid uses on both
> > 32 and 64 bit in the next few years. A lot of code will get changed
> > in the process so we can actually keep running 32-bit kernels and
> > file systems, but other code will likely go away:
> >
> > * any system calls that pass a time_t, timeval or timespec on
> >   32-bit systems return -ENOSYS, to ensure all user land uses
> >   the replacements we will put into place
> > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> >   from the kernel, and all code using it left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's
> >   fine since ext4 can mount old file systems.
>
> Syscalls and libs can be "fixed". Existing filesystem content might
> not. So if you need to mount some old media in read-write mode after
> 2038 and that happens to contain an ext2 or similarly limited filesystem
> then it'd better just "work". Having the kernel refuse to modify the
> filesystem would be unacceptable.

We can already tell the VFS/filesystems not to update timestamps:

	inode->i_flags |= S_NOATIME | S_NOCMTIME;

Just enforce that everywhere (i.e. notify_change()) rather than just
on the IO path and the "legacy filesystem timestamp" problem is
"solved".

New interfaces need to return errors when an out-of-range parameter
is set. And right now, >epoch dates are out of range for most
filesystems, and so we need to handle that condition appropriately.
Silent date overflow == filesystem corruption, and as such I'm going
to error out such conditions in the filesystem regardless of what
the userspace API says.

Filesystems place all sorts of userspace visible limits on storage -
ever tried to create a file >16TB on ext4? The on-disk format
doesn't support it, so it returns an out of range error (EFBIG, I
think) if you try. XFS, OTOH, handles this just fine and so it
continues to work. It's exactly the same with timestamps - there's a
physical limit to what can sanely be stored in any given filesystem
and it's an *error condition* to go beyond that limit....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 71+ messages in thread
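A sketch of the kind of range check argued for above, in user-space C.
struct fs_time_range and fs_check_time() are hypothetical names used
only for illustration; no such per-filesystem limit structure is defined
in this thread, and where such a check would sit in the VFS is left
open.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-filesystem limits, e.g. filled in at mount time. */
struct fs_time_range {
	int64_t min_sec;	/* earliest time the on-disk format can hold */
	int64_t max_sec;	/* latest time the on-disk format can hold */
};

/* Return 0 if 'sec' can be stored, -ERANGE if it is out of range. */
static int fs_check_time(const struct fs_time_range *r, int64_t sec)
{
	if (sec < r->min_sec || sec > r->max_sec)
		return -ERANGE;
	return 0;
}
```

For a filesystem with signed 32-bit timestamps the range would be
INT32_MIN to INT32_MAX, so any time at or after 2^31 seconds is
rejected instead of being silently truncated.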
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 2:22 ` Dave Chinner
@ 2014-06-02 7:09 ` Geert Uytterhoeven
0 siblings, 0 replies; 71+ messages in thread
From: Geert Uytterhoeven @ 2014-06-02 7:09 UTC (permalink / raw)
To: Dave Chinner
Cc: Nicolas Pitre, Linux-Arch, Arnd Bergmann,
linux-kernel@vger.kernel.org, xfs, Christoph Hellwig, John Stultz,
H. Peter Anvin, Linux FS Devel, Ley Foon Tan, Thomas Gleixner,
Joseph S. Myers

On Mon, Jun 2, 2014 at 4:22 AM, Dave Chinner <david@fromorbit.com> wrote:
> Filesystems place all sorts of userspace visible limits on storage -
> ever tried to create a file >16TB on ext4? The on-disk format
> doesn't support it, so it returns an out of range error (EFBIG, I
> think) if you try. XFS, OTOH, handles this just fine and so it
> continues to work. It's exactly the same with timestamps - there's a
> physical limit to what can sanely be stored in any given filesystem
> and it's an *error condition* to go beyond that limit....

This comparison doesn't fly.

File sizes do not depend on the current time (except for the increase
of megapixels in your new camera ;-). Writing a 15 GiB file to ext4 is
not something that magically stops working tomorrow.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 1:36 ` Nicolas Pitre
2014-06-02 2:22 ` Dave Chinner
@ 2014-06-02 10:56 ` Arnd Bergmann
2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 15:04 ` Chuck Lever
1 sibling, 2 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 10:56 UTC (permalink / raw)
To: Nicolas Pitre
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph

On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote:
>
> > For actually running kernels beyond 2038, the best idea I've seen so
> > far is to disallow all broken code at compile time. I don't see
> > a choice but to audit the entire kernel for invalid uses on both
> > 32 and 64 bit in the next few years. A lot of code will get changed
> > in the process so we can actually keep running 32-bit kernels and
> > file systems, but other code will likely go away:
> >
> > * any system calls that pass a time_t, timeval or timespec on
> >   32-bit systems return -ENOSYS, to ensure all user land uses
> >   the replacements we will put into place
> > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> >   from the kernel, and all code using it left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's
> >   fine since ext4 can mount old file systems.
>
> Syscalls and libs can be "fixed". Existing filesystem content might
> not. So if you need to mount some old media in read-write mode after
> 2038 and that happens to contain an ext2 or similarly limited filesystem
> then it'd better just "work". Having the kernel refuse to modify the
> filesystem would be unacceptable.

I think you misunderstood what I suggested: the intent is to avoid
seeing things break in 2038 by making them break much earlier. We have
a solution for ext2 file systems, it's called ext4, and we just need
to ensure that everybody knows they have to migrate eventually.

At some point before the mid-2030s, you should no longer be able to
build a kernel that has support for ext2 or any other module that will
run into bugs later. Until then (rather sooner than later), I'd like
to get to the point where you can choose whether to include those
modules at build time or not, and then get everybody to turn off that
option and fix the bugs they run into. You wouldn't need that for
a 2014-generation long-term support distro (rhel 7, sles 12, debian 7,
ubuntu 14.04, ...), but perhaps for the next generation, or the one
after that.

	Arnd

^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 10:56 ` Arnd Bergmann
@ 2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 12:38 ` Arnd Bergmann
` (2 more replies)
2014-06-02 15:04 ` Chuck Lever
1 sibling, 3 replies; 71+ messages in thread
From: Theodore Ts'o @ 2014-06-02 11:57 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Nicolas Pitre, linux-arch, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph

On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
>
> I think you misunderstood what I suggested: the intent is to avoid
> seeing things break in 2038 by making them break much earlier. We have
> a solution for ext2 file systems, it's called ext4, and we just need
> to ensure that everybody knows they have to migrate eventually.
>
> At some point before the mid-2030s, you should no longer be able to
> build a kernel that has support for ext2 or any other module that will
> run into bugs later....

Even for ext4, it's not quite so simple as that. You only have
support for times post 2038 if you are using an inode size > 128
bytes. There are a very, very large number of machines which even
today, are using 128 byte inodes with ext4 for performance reasons.

The vast majority of those machines which I know of can probably move
to 256 byte inodes relatively easily, since hard drive replacement
cycles are order 5-6 years tops, so I'm not that concerned, but it
just goes to show this is a very complicated problem.

And even if we're talking about flash and embedded devices, the good
news is if you assume that 10 years is enough time for people to
update their embedded OS builds, and that the vast majority of
deployed devices will probably only be in service for 10-15 years, we
do have enough time to make file system format changes, although
admittedly we can't afford to dilly-dally.

Regards,

					- Ted

^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 11:57 ` Theodore Ts'o
@ 2014-06-02 12:38 ` Arnd Bergmann
2014-06-02 13:15 ` Theodore Ts'o
2014-06-02 12:52 ` Arnd Bergmann
2014-06-02 14:52 ` H. Peter Anvin
2 siblings, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 12:38 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nicolas Pitre, linux-arch, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph

On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> >
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> >
> > At some point before the mid-2030s, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later....
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
>
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.

Ok, I see. I also now noticed this comment above EXT4_FITS_IN_INODE():

"For new inodes we always reserve enough space for the kernel's known
extended fields, but for inodes created with an old kernel this might
not have been the case. None of the extended inode fields is critical
for correct filesystem operation."

Do we have to worry about this for inodes that contain extended
attributes and that get updated after 2038?

	Arnd

^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 12:38 ` Arnd Bergmann
@ 2014-06-02 13:15 ` Theodore Ts'o
0 siblings, 0 replies; 71+ messages in thread
From: Theodore Ts'o @ 2014-06-02 13:15 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Nicolas Pitre, linux-arch, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph

On Mon, Jun 02, 2014 at 02:38:09PM +0200, Arnd Bergmann wrote:
>
> "For new inodes we always reserve enough space for the kernel's known
> extended fields, but for inodes created with an old kernel this might
> not have been the case. None of the extended inode fields is critical
> for correct filesystem operation."
>
> Do we have to worry about this for inodes that contain extended
> attributes and that get updated after 2038?

In practice, the extended timestamps were one of the first things added
to ext4, so the vast majority of ext4 file systems with inode sizes >
128 bytes will have room for the extended timestamps. There are some
legacy ext3 file systems with 256-byte inodes (enabled for fast
storage of SELinux xattrs) that in theory, could have been converted
to ext4 and had enough xattrs so that the extended timestamps couldn't
be added. That would be a vanishingly small use case, and in
practice, it's not likely to be the case for the embedded market.

I could imagine someone worrying about file systems originally
formatted using RHEL 4 post-2038 (perhaps running in a VM), but I
don't work for IBM any more, and hopefully even IBM would just tell
such customers that they need to suck it up, and do a
backup/reformat/restore pass.

Cheers,

					- Ted

^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 12:38 ` Arnd Bergmann
@ 2014-06-02 12:52 ` Arnd Bergmann
2014-06-02 13:07 ` Theodore Ts'o
2014-06-02 14:52 ` H. Peter Anvin
2 siblings, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 12:52 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nicolas Pitre, linux-arch, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph

On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> >
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> >
> > At some point before the mid-2030s, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later....
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
>
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.

One stupid question about the current code:

static inline void ext4_decode_extra_time(struct inode_time *time, __le32 extra)
{
	if (sizeof(time->tv_sec) > 4)
		time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
				<< 32;
	time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
}

#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode)			\
do {									\
	if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime))		\
		(einode)->xtime.tv_sec =				\
			(signed)le32_to_cpu((raw_inode)->xtime);	\
	else								\
		(einode)->xtime.tv_sec = 0;				\
	if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra))	\
		ext4_decode_extra_time(&(einode)->xtime,		\
				       raw_inode->xtime ## _extra);	\
	else								\
		(einode)->xtime.tv_nsec = 0;				\
} while (0)

For a time between 2038 and 2106, this looks like xtime.tv_sec is
negative when ext4_decode_extra_time gets called, so the '|=' operator
doesn't actually do anything. Shouldn't that be '+='?

	Arnd

^ permalink raw reply [flat|nested] 71+ messages in thread
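The effect asked about above can be reproduced outside the kernel. The
sketch below is not the ext4 code: decode_or() and decode_add() are
hypothetical helpers, EPOCH_MASK stands in for EXT4_EPOCH_MASK with an
assumed width of 2 bits, and it assumes the epoch bits are meant to
count multiples of 2^32 added to the signed 32-bit base value.

```c
#include <stdint.h>

#define EPOCH_BITS 2				/* assumed width of the epoch field */
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

/* '|=' variant: the base value is sign-extended first, so for a negative
 * base bits 32 and up are already all ones and the OR changes nothing. */
static int64_t decode_or(uint32_t raw, uint32_t extra)
{
	int64_t sec = (int32_t)raw;	/* mirrors the (signed) cast */

	sec |= (int64_t)(extra & EPOCH_MASK) << 32;
	return sec;
}

/* '+=' variant: the epoch bits add a multiple of 2^32 to the base. */
static int64_t decode_add(uint32_t raw, uint32_t extra)
{
	int64_t sec = (int32_t)raw;

	sec += (int64_t)(extra & EPOCH_MASK) << 32;
	return sec;
}
```

With raw = 0x80000000 and epoch bits = 1, decode_or() returns
-2147483648 (a date in 1901) while decode_add() returns 2147483648 (one
second past the signed 32-bit limit). Whether the encoding side stores
matching epoch bits is a separate question that this sketch does not
address.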
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 12:52 ` Arnd Bergmann
@ 2014-06-02 13:07 ` Theodore Ts'o
2014-06-02 15:01 ` Arnd Bergmann
0 siblings, 1 reply; 71+ messages in thread
From: Theodore Ts'o @ 2014-06-02 13:07 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Nicolas Pitre, linux-arch, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph

Yes, there are some ongoing discussions about changing the post-2038
encoding of the timestamp in ext4, which is why this hasn't been fixed
yet. The main thing that's been missing is time for me to review the
patches, and a good way of writing regression tests that will work (or
at least not fail) on build environments with a 32-bit time_t and
32-bit-only capable versions of functions such as gmtime(3).

And given current discussions, I may want to think about some kind of
superblock flag to allow the use of a 32-bit unsigned encoding for
file systems using a 128-byte inode, with a way of setting that flag
after scanning the file system to make sure there are no times that
are previous to January 1, 1970. (Or more generally, allow any epoch
to be defined using a 64-bit time_t offset stored in the superblock...)

Cheers,

					- Ted

^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-02 13:07 ` Theodore Ts'o @ 2014-06-02 15:01 ` Arnd Bergmann 0 siblings, 0 replies; 71+ messages in thread From: Arnd Bergmann @ 2014-06-02 15:01 UTC (permalink / raw) To: Theodore Ts'o Cc: Nicolas Pitre, linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph On Monday 02 June 2014 09:07:00 Theodore Ts'o wrote: > Yes, there are some ongoing dicussions about changing the post-2038 > encoding of the timestamp in ext4, which is why this hasn't been fixed > yet. The main thing that's been missing is time for me to review the > patches, and a good way of writing regression tests that will work (or > at least not fail) on build environments with a 32-bit time_t and > 32-bit-only capable versions of functions such as gmtime(3). > > And given current discussions, I may want to think about some kind of > superblock flag to allow the use of a 32-bit unsigned encoding for > file systems using a 128-byte inode, with a way of setting that flag > after scanning the file system to make sure there are no times that > are previous to January 1, 1970. (Or more generally, allow any epoch > to be defined using a 64-bit time_t offset stored in the superblock...) FWIW, I've gone through the other file system implementations once more. The most common pattern I've encountered is to have a read_inode function with inode->i_mtime = le32_to_cpu(raw_inode->mtime); which results in interpreting the time as 'signed' on 32-bit kernels, but as 'unsigned' on 64-bit kernels. This could have been done intentionally to extend the valid time range to 2106 on 64-bit kernels, but it seems more likely that the code was written with no thought given to 64-bit time_t at all. 
I see this pattern on p9fs (old protocol only), afs, bfs, ceph, efs,
freevxfs, hpfs, jffs2, jfs, minix, nfsv2/v3 (this was clearly
intentional and is spelled out in the RFC), qnx4, qnx6, reiserfs,
squashfs, sysv, and ufs (protocol version 1 only).

The other behavior I see is to treat the on-disk 32-bit value as
signed on both 32-bit and 64-bit kernels:

	inode->i_mtime = (signed)le32_to_cpu(raw_inode->mtime);

This seems to be done intentionally in all cases, to maintain
compatibility between 32-bit and 64-bit kernels, but it's relatively
rare: exofs, ext2/3/4 (good old inodes) and xfs are the only ones
doing this.

In the case of ext2/3/4, the sign handling was introduced here:

http://www.spinics.net/lists/linux-ext4/msg01758.html

exofs and xfs seem to have done it like this for all of git history.

	Arnd
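The difference between the two patterns can be shown with a small user-space sketch. The helper names are hypothetical, and a 64-bit integer stands in for the time_t of a 64-bit kernel. The cast from an out-of-range unsigned value to int32_t is implementation-defined in ISO C; the sketch assumes the usual two's-complement wraparound that GCC and Clang document.

```c
#include <stdint.h>

/*
 * 'raw' is the 32-bit on-disk value, already converted to CPU byte
 * order (what le32_to_cpu() would return).
 */

/* Plain assignment into a 64-bit time: zero-extends, range 1970..2106. */
static int64_t decode_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}

/* Explicit signed cast: sign-extends, range 1901..2038 on every kernel. */
static int64_t decode_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}
```

For values up to 0x7fffffff both helpers agree. For 0x80000000 and above, decode_unsigned() yields a date after January 2038 while decode_signed() yields a date before 1970, which is the 32-bit versus 64-bit discrepancy described above.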
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-02 11:57 ` Theodore Ts'o 2014-06-02 12:38 ` Arnd Bergmann 2014-06-02 12:52 ` Arnd Bergmann @ 2014-06-02 14:52 ` H. Peter Anvin 2 siblings, 0 replies; 71+ messages in thread From: H. Peter Anvin @ 2014-06-02 14:52 UTC (permalink / raw) To: Theodore Ts'o Cc: Nicolas Pitre, linux-arch@vger.kernel.org, Arnd Bergmann, linux-kernel@vger.kernel.org, xfs@oss.sgi.com, hch@infradead.org, john.stultz@linaro.org, lftan@altera.com, linux-fsdevel@vger.kernel.org, geert@linux-m68k.org, tglx@linutronix.de, joseph@codesourcery.com > On Jun 2, 2014, at 4:57, "Theodore Ts'o" <tytso@mit.edu> wrote: > >> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote: >> >> I think you misunderstood what I suggested: the intent is to avoid >> seeing things break in 2038 by making them break much earlier. We have >> a solution for ext2 file systems, it's called ext4, and we just need >> to ensure that everybody knows they have to migrate eventually. >> >> At some point before the mid 2030ies, you should no longer be able to >> build a kernel that has support for ext2 or any other module that will >> run into bugs later.... > > Even for ext4, it's not quite so simple as that. You only have > support for times post 2038 if you are using an inode size > 128 > bytes. There are a very, very large number of machines which even > today, are using 128 byte inodes with ext4 for performance reasons. > > The vast majority of those machines which I know of can probably move > to 256 byte inodes relatively easily, since hard drive replacement > cycles are order 5-6 years tops, so I'm not that concerned, but it > just goes to show this is a very complicated problem. 
> > And even if we're talking about flash and embedded devices, the good > news is if you assume that 10 years is enough time for people to > update their embedded OS builds, and that the vast majority of > deployed devices will probably only be in service for 10-15 years, we > do have enough time to make file system format changes, although > admittedly we can't afford to dilly-dally. I have a number of file systems older than any device they are sitting on. RAID allows individual disks to be swapped out, and when all disks have been swapped out, extend the file system online. The system doesn't even have to be taken offline in the process if it is possible to physically get to the drives with the system powered (e.g. hot plug bays), which is really damned nice. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-02 10:56 ` Arnd Bergmann 2014-06-02 11:57 ` Theodore Ts'o @ 2014-06-02 15:04 ` Chuck Lever 2014-06-02 15:31 ` Theodore Ts'o ` (2 more replies) 1 sibling, 3 replies; 71+ messages in thread From: Chuck Lever @ 2014-06-02 15:04 UTC (permalink / raw) To: Arnd Bergmann Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, LKML Kernel, lftan, Christoph Hellwig, john.stultz, H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph On Jun 2, 2014, at 6:56 AM, Arnd Bergmann <arnd@arndb.de> wrote: > On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote: >> >>> For actually running kernels beyond 2038, the best idea I've seen so >>> far is to disallow all broken code at compile time. I don't see >>> a choice but to audit the entire kernel for invalid uses on both >>> 32 and 64 bit in the next few years. A lot of code will get changed >>> in the process so we can actually keep running 32-bit kernels and >>> file systems, but other code will likely go away: >>> >>> * any system calls that pass a time_t, timeval or timespec on >>> 32-bit systems return -ENOSYS, to ensure all user land uses >>> the replacements we will put into place >>> * The definition of 'time_t', 'timval' and 'timespec' can be hidden >>> from the kernel, and all code using it left out. >>> * ext2 and ext3 file system code will have to be disabled, but that's >>> file since ext4 can mount old file systems. >> >> Syscalls and libs can be "fixed". Existing filesystem content might >> not. So if you need to mount some old media in read-write mode after >> 2038 and that happens to content an ext2 or similarly limited filesystem >> then it'd better just "work". Having the kernel refuse to modify the >> filesystem would be unacceptable. > > I think you misunderstood what I suggested: the intent is to avoid > seeing things break in 2038 by making them break much earlier. 
We have > a solution for ext2 file systems, it's called ext4, and we just need > to ensure that everybody knows they have to migrate eventually. > > At some point before the mid 2030ies, you should no longer be able to > build a kernel that has support for ext2 or any other module that will > run into bugs later. Until then (rather sooner than later), I'd like > to get to the point where you can choose whether to include those > modules at build time or not, and then get everybody to turn off that > option and fix the bugs they run into. You wouldn't need that for a > 2014-generation long-term support disto (rhel 7, sles 12, debian 7, > ubuntu 14.04, ...), but perhaps for the next generation, or the > one after that. I’m wondering what should be done about NFS. A solution for NFS should match any scheme that is considered for local file systems, IMO. NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds. (See the definition of nfstime3 in RFC 1813). NFSv4 uses a signed 64-bit value where zero represents midnight UTC on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See the definition of nfstime4 in RFC 5661). The NFSv4 protocol is probably not problematic, and NFSv3 should be out of the picture by 2038. But if changes are planned for dealing _now_ with timestamp issues, compatibility with NFSv3 is a consideration. It is already the case that, via NFSv3, the Linux NFS client transmits timestamps earlier than 1970 as large positive numbers. Try this with xfstests generic/258. Maybe nfs3_proc_setattr() should recognize pre-epoch timestamps and timestamps larger than can be represented in an unsigned 32-bit field and return an immediate error to the requesting application (like EINVAL). If the Linux NFS server encounters a local file with a timestamp that cannot be represented via a u32, should it also return NFS3ERR_INVAL? 
RFC 1813 does not provide guidance on the behavior nor does it suggest a particular error status code. The Solaris 11 server appears to return NFS3ERR_INVAL in this case. An alternative would be to “cap” the timestamps transmitted via NFSv3 by Linux, so that a pre-epoch timestamp is transmitted as zero, and a large timestamp is transmitted as UINT_MAX. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
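The two options discussed in this message (reject with an error, or cap the value) can be sketched as follows. These are hypothetical user-space helpers, not the actual nfs3_proc_setattr() code; the only facts taken from the message are that the nfstime3 seconds field is an unsigned 32-bit value and that EINVAL is the suggested error.

```c
#include <errno.h>
#include <stdint.h>

/* Option 1: refuse timestamps that do not fit the unsigned 32-bit field. */
static int nfs3_encode_seconds(int64_t sec, uint32_t *out)
{
	if (sec < 0 || sec > (int64_t)UINT32_MAX)
		return -EINVAL;
	*out = (uint32_t)sec;
	return 0;
}

/* Option 2: "cap" the value, pre-epoch becomes 0, too large becomes UINT32_MAX. */
static uint32_t nfs3_clamp_seconds(int64_t sec)
{
	if (sec < 0)
		return 0;
	if (sec > (int64_t)UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)sec;
}
```

Rejecting makes the loss visible to the application at the cost of failing the call; clamping keeps the call working but silently stores a different time than was requested.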
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 15:04 ` Chuck Lever
@ 2014-06-02 15:31 ` Theodore Ts'o
2014-06-02 17:12 ` H. Peter Anvin
2014-06-02 18:52 ` Arnd Bergmann
2014-06-02 18:58 ` Roger Willcocks
2 siblings, 1 reply; 71+ messages in thread
From: Theodore Ts'o @ 2014-06-02 15:31 UTC (permalink / raw)
To: Chuck Lever
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Arnd Bergmann,
LKML Kernel, lftan, Christoph Hellwig, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph

On Mon, Jun 02, 2014 at 11:04:23AM -0400, Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> An alternative would be to “cap” the timestamps transmitted via NFSv3 by
> Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
> timestamp is transmitted as UINT_MAX.

I wonder if it would make sense to try to promulgate via the Austin
group, and possibly the C standards committee, the concept of a bit
pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
unknown", or "time indefinite" or "we couldn't encode the time".

We would then teach gmtime(3) and asctime(3) to print some appropriate
message, and we could teach programs like find (with the -mtime
option), make, tmpwatch, et al., that they can't make any presumption
about the comparability of any timestamp which has a value of
TIME_UNDEFINED.

It would be problematic for time(2) or gettimeofday(2) to return
TIME_UNDEFINED, since there are programs that care about time ticking
forward, but I could imagine a new interface which would be permitted
to return a flag indicating that we don't know the current time
(because the CMOS battery had run down, etc.) so instead we're going
to be counting the number of seconds since the system was booted.
					- Ted
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-02 15:31 ` Theodore Ts'o @ 2014-06-02 17:12 ` H. Peter Anvin 2014-06-02 18:50 ` Arnd Bergmann 2014-06-02 22:29 ` Theodore Ts'o 0 siblings, 2 replies; 71+ messages in thread From: H. Peter Anvin @ 2014-06-02 17:12 UTC (permalink / raw) To: Theodore Ts'o, Chuck Lever, Arnd Bergmann, Nicolas Pitre, Dave Chinner, LKML Kernel, linux-arch, joseph, john.stultz, Christoph Hellwig, tglx, geert, lftan, linux-fsdevel, xfs, Linux NFS Mailing List On 06/02/2014 08:31 AM, Theodore Ts'o wrote: > > I wonder if it would make sense to try to promulgate via the Austin > group, and possibly the C standards committee the concept of a bit > pattern (that might commonly be INT_MAX or UINT_MAX) that means "time > unknown", or "time indefinite" or "we couldn't encode the time". > (time_t)-1 already has this meaning for some calls (e.g. time(2)). However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately something similar applies to all possible bit patterns, certainly within the range of an int. > We would then teach gmtime(3) and asctime(3) to print some appropriate > message, and we could teach programs like find (with the -mtime) > option, make, tmpwatch, et. al., that they can't make any presumption > about the comparibility of any timestamp which has a value of > TIME_UNDEFINIED. > > It would be problematic for time(2) or gettimeofday(2) to return > TIME_UNDEFINED, since there are programs that care about time ticking > forward, but I could imagine a new interface which would be permitted > to return a flag indicating that we don't know the current time > (because the CMOS battery had run down, etc.) so instead we're going > to be counting the number of seconds since the system was booted. This assumes that we actually know that that is the case, which may be an aggressive assumption. 
	-hpa
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-02 17:12 ` H. Peter Anvin @ 2014-06-02 18:50 ` Arnd Bergmann 2014-06-02 22:29 ` Theodore Ts'o 1 sibling, 0 replies; 71+ messages in thread From: Arnd Bergmann @ 2014-06-02 18:50 UTC (permalink / raw) To: H. Peter Anvin Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Theodore Ts'o, LKML Kernel, xfs, Christoph Hellwig, Chuck Lever, john.stultz, lftan, linux-fsdevel, geert, tglx, joseph On Monday 02 June 2014 10:12:37 H. Peter Anvin wrote: > On 06/02/2014 08:31 AM, Theodore Ts'o wrote: > > > > I wonder if it would make sense to try to promulgate via the Austin > > group, and possibly the C standards committee the concept of a bit > > pattern (that might commonly be INT_MAX or UINT_MAX) that means "time > > unknown", or "time indefinite" or "we couldn't encode the time". > > > > (time_t)-1 already has this meaning for some calls (e.g. time(2)). > However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately > something similar applies to all possible bit patterns, certainly within > the range of an int. Worse than Wed Dec 31 23:59:59 UTC 1969, on NFSv3 it also means "Sun Feb 7 07:28:15 CET 2106", and that is much harder to distinguish from a real future date. If we had the choice, I'd go for something like 1, i.e. "Thu Jan 1 01:00:01 CET 1970". > > We would then teach gmtime(3) and asctime(3) to print some appropriate > > message, and we could teach programs like find (with the -mtime) > > option, make, tmpwatch, et. al., that they can't make any presumption > > about the comparibility of any timestamp which has a value of > > TIME_UNDEFINIED. > > > > It would be problematic for time(2) or gettimeofday(2) to return > > TIME_UNDEFINED, since there are programs that care about time ticking > > forward, but I could imagine a new interface which would be permitted > > to return a flag indicating that we don't know the current time > > (because the CMOS battery had run down, etc.) 
> > so instead we're going to be counting the number of seconds since
> > the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.

It's harder for time(2), but for the inode case, we can definitely
detect when the file system specific representation overflows or
underflows, which may be at a number of very different points in
time.

	Arnd
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-02 17:12 ` H. Peter Anvin 2014-06-02 18:50 ` Arnd Bergmann @ 2014-06-02 22:29 ` Theodore Ts'o 2014-06-02 22:32 ` H. Peter Anvin 1 sibling, 1 reply; 71+ messages in thread From: Theodore Ts'o @ 2014-06-02 22:29 UTC (permalink / raw) To: H. Peter Anvin Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Arnd Bergmann, LKML Kernel, xfs, Christoph Hellwig, Chuck Lever, john.stultz, lftan, linux-fsdevel, geert, tglx, joseph On Mon, Jun 02, 2014 at 10:12:37AM -0700, H. Peter Anvin wrote: > > It would be problematic for time(2) or gettimeofday(2) to return > > TIME_UNDEFINED, since there are programs that care about time ticking > > forward, but I could imagine a new interface which would be permitted > > to return a flag indicating that we don't know the current time > > (because the CMOS battery had run down, etc.) so instead we're going > > to be counting the number of seconds since the system was booted. > > This assumes that we actually know that that is the case, which may be > an aggressive assumption. We won't know if the RTC clock is wrong, true --- but the kernel will know if (a) the hardware doesn't have RTC clock at all, or if (b) the RTC clock is ticking some time that can't be encoded using the current time_t type. So in that case, the fallback would be to be for the kernel to tick starting with time_t == 0 when the system is initially booted, and the "time indefinite flag" would be set. Now assume that we have a new system call, gettimestampofday(2), which returns a new timestamp structure which has a 64-bit ts_sec field, the ts_nsec field (ala struct timespec), and a ts_flags field, where the kernel could signal things like "time invalid", or "time can't be encoded in the legacy time_t type", or "I'm not sure if the time is correct" --- i.e., because the RTC battery isn't working. 
Not all hardware might be able to support the last, of course, but if
the battery is low, or the system has been exposed to very low
temperatures (or large amounts of cosmic radiation, etc.) the RTC time
may just be plain wrong.

No system is going to be perfect, but it should be possible to make
things better, at least for certain classes of hardware.

And since we are already returning (time_t) -1 in some cases, we might
as well try to make things a bit more formal.

					- Ted
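The structure proposed in this message might look roughly like the sketch below. Only the three field names (ts_sec, ts_nsec, ts_flags) come from the message; the struct name, the flag names and the make_timestamp() helper are hypothetical, and no such system call exists.

```c
#include <stdint.h>

/* Hypothetical flag bits for the ts_flags field. */
#define TS_F_INVALID	(1u << 0)	/* "time invalid" */
#define TS_F_NO_LEGACY	(1u << 1)	/* can't be encoded in a 32-bit time_t */
#define TS_F_UNCERTAIN	(1u << 2)	/* e.g. the RTC battery isn't working */

struct timestamp {
	int64_t  ts_sec;	/* 64-bit seconds */
	uint32_t ts_nsec;	/* nanoseconds, as in struct timespec */
	uint32_t ts_flags;	/* TS_F_* bits */
};

/* Build a timestamp and flag it when a legacy time_t could not hold it. */
static struct timestamp make_timestamp(int64_t sec, uint32_t nsec,
				       int clock_trusted)
{
	struct timestamp ts = { .ts_sec = sec, .ts_nsec = nsec, .ts_flags = 0 };

	if (sec < INT32_MIN || sec > INT32_MAX)
		ts.ts_flags |= TS_F_NO_LEGACY;
	if (!clock_trusted)
		ts.ts_flags |= TS_F_UNCERTAIN;
	return ts;
}
```

A caller that only understands the legacy type could then check TS_F_NO_LEGACY explicitly instead of relying on an in-band sentinel such as (time_t) -1.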
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-02 22:29 ` Theodore Ts'o @ 2014-06-02 22:32 ` H. Peter Anvin 2014-06-02 23:32 ` Theodore Ts'o 0 siblings, 1 reply; 71+ messages in thread From: H. Peter Anvin @ 2014-06-02 22:32 UTC (permalink / raw) To: Theodore Ts'o, Chuck Lever, Arnd Bergmann, Nicolas Pitre, Dave Chinner, LKML Kernel, linux-arch, joseph, john.stultz, Christoph Hellwig, tglx, geert, lftan, linux-fsdevel, xfs, Linux NFS Mailing List On 06/02/2014 03:29 PM, Theodore Ts'o wrote: > > And since we are already returning (time_t) -1 in some cases, we might > as well try to make things a bit more formal. > Are we? I am not aware of *Linux* actually using that. -hpa _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 22:32 ` H. Peter Anvin
@ 2014-06-02 23:32 ` Theodore Ts'o
2014-06-02 23:33 ` H. Peter Anvin
2014-06-03 13:09 ` Roger Willcocks
0 siblings, 2 replies; 71+ messages in thread
From: Theodore Ts'o @ 2014-06-02 23:32 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Arnd Bergmann,
LKML Kernel, xfs, Christoph Hellwig, Chuck Lever, john.stultz, lftan,
linux-fsdevel, geert, tglx, joseph

On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
> >
> > And since we are already returning (time_t) -1 in some cases, we might
> > as well try to make things a bit more formal.
> >
>
> Are we? I am not aware of *Linux* actually using that.

Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
the POSIX specification:

SYSCALL_DEFINE1(time, time_t __user *, tloc)
{
	time_t i = get_seconds();

	if (tloc) {
		if (put_user(i,tloc))
			return -EFAULT;
	}
	force_successful_syscall_return();
	return i;
}

Cheers,

					- Ted
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 23:32 ` Theodore Ts'o
@ 2014-06-02 23:33 ` H. Peter Anvin
2014-06-03 13:09 ` Roger Willcocks
1 sibling, 0 replies; 71+ messages in thread
From: H. Peter Anvin @ 2014-06-02 23:33 UTC (permalink / raw)
To: Theodore Ts'o, Chuck Lever, Arnd Bergmann, Nicolas Pitre,
Dave Chinner, LKML Kernel, linux-arch, joseph, john.stultz,
Christoph Hellwig, tglx, geert, lftan, linux-fsdevel, xfs,
Linux NFS Mailing List

On 06/02/2014 04:32 PM, Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
>> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
>>>
>>> And since we are already returning (time_t) -1 in some cases, we might
>>> as well try to make things a bit more formal.
>>>
>>
>> Are we? I am not aware of *Linux* actually using that.
>
> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the Posix specification:
>
> SYSCALL_DEFINE1(time, time_t __user *, tloc)
> {
> 	time_t i = get_seconds();
>
> 	if (tloc) {
> 		if (put_user(i,tloc))
> 			return -EFAULT;
> 	}
> 	force_successful_syscall_return();
> 	return i;
> }
>

OK, I guess I should have said... other than for -EFAULT.

I just don't know of anyone using time(2) with an argument other than
NULL.

	-hpa
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 23:32 ` Theodore Ts'o
2014-06-02 23:33 ` H. Peter Anvin
@ 2014-06-03 13:09 ` Roger Willcocks
1 sibling, 0 replies; 71+ messages in thread
From: Roger Willcocks @ 2014-06-03 13:09 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Arnd Bergmann,
LKML Kernel, geert, xfs, Christoph Hellwig, Chuck Lever, john.stultz,
H. Peter Anvin, linux-fsdevel, lftan, tglx, joseph

On Mon, 2014-06-02 at 19:32 -0400, Theodore Ts'o wrote:

> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the Posix specification:
>
> SYSCALL_DEFINE1(time, time_t __user *, tloc)
> {
> 	time_t i = get_seconds();
>
> 	if (tloc) {
> 		if (put_user(i,tloc))
> 			return -EFAULT;
> 	}
> 	force_successful_syscall_return();
> 	return i;
> }

get_seconds() returns an unsigned long so there's potential for overflow
here.

-- 
Roger
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 15:04 ` Chuck Lever
2014-06-02 15:31 ` Theodore Ts'o
@ 2014-06-02 18:52 ` Arnd Bergmann
2014-06-02 18:58 ` Roger Willcocks
2 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-02 18:52 UTC (permalink / raw)
To: Chuck Lever
Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, LKML Kernel,
lftan, Christoph Hellwig, john.stultz, H. Peter Anvin, linux-fsdevel,
geert, tglx, xfs, joseph

On Monday 02 June 2014 11:04:23 Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
>
> NFSv4 uses a signed 64-bit value where zero represents midnight UTC
> on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
> the definition of nfstime4 in RFC 5661).
>
> The NFSv4 protocol is probably not problematic, and NFSv3 should be out
> of the picture by 2038. But if changes are planned for dealing _now_
> with timestamp issues, compatibility with NFSv3 is a consideration.
>
> It is already the case that, via NFSv3, the Linux NFS client transmits
> timestamps earlier than 1970 as large positive numbers. Try this with
> xfstests generic/258.

If I read the code correctly, a pre-1970 timestamp will be sent as
a large unsigned integer, but received as a post-2038 timestamp on
64-bit kernels, both in the nfs client and server code.

This behavior is clearly wrong, but it's the same bug that we have
in lots of other file systems, and it makes sense to have the same
fix everywhere, at least in the cases where we know what
interpretation we actually want.

NFS has the luxury of having an actual specification saying that
the value is unsigned.
For most of the legacy file systems, we can only make a guess at how other OSs would interpret the same numbers. Arnd _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-02 15:04 ` Chuck Lever 2014-06-02 15:31 ` Theodore Ts'o 2014-06-02 18:52 ` Arnd Bergmann @ 2014-06-02 18:58 ` Roger Willcocks 2014-06-02 19:04 ` Chuck Lever 2 siblings, 1 reply; 71+ messages in thread From: Roger Willcocks @ 2014-06-02 18:58 UTC (permalink / raw) To: Chuck Lever Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Arnd Bergmann, LKML Kernel, geert, Christoph Hellwig, john.stultz, H. Peter Anvin, linux-fsdevel, lftan, tglx, xfs, joseph On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote: > NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for > seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds. > (See the definition of nfstime3 in RFC 1813). > nfstime3 could be extended by redefining the otherwise unused nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit seconds field and an unsigned 30-bit nanoseconds field. This could represent 1970 +/- 272 years. Servers could indicate they can understand the extended time format by adding a new FSINFO capability - FSF3_CANSETTIME_EX. Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending timestamps so old servers would be protected from new clients. Old clients don't need to be protected from new servers because the on-the-wire bit pattern for dates between 1970 and 2106 stays the same, so they're no worse off than they were before. Arguably the new server ought to clamp out-of-range timestamps before sending them to old clients but that would need per-client state (and nfs3 is stateless.) -- Roger _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
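The bit layout suggested in this message can be illustrated with a short encode/decode sketch. The function names are hypothetical and this is not part of any NFS specification; the sketch only follows the description above: nanoseconds are always below 10^9 < 2^30, so bits 31 and 30 of the nanoseconds word are free to carry seconds bits 33 and 32.

```c
#include <stdint.h>

/* Same shape as nfstime3 in RFC 1813: two unsigned 32-bit words. */
struct nfstime3 {
	uint32_t seconds;
	uint32_t nseconds;
};

/* Pack a signed 34-bit seconds value and a 30-bit nanoseconds value. */
static struct nfstime3 nfstime3_ex_encode(int64_t sec, uint32_t nsec)
{
	uint64_t u = (uint64_t)sec;	/* two's-complement bit pattern */
	struct nfstime3 t;

	t.seconds  = (uint32_t)u;				/* bits 31..0 */
	t.nseconds = (nsec & 0x3FFFFFFFu) |
		     ((uint32_t)((u >> 32) & 0x3u) << 30);	/* bits 33..32 */
	return t;
}

/* Recover the seconds value, sign-extending from bit 33. */
static int64_t nfstime3_ex_decode_sec(struct nfstime3 t)
{
	int64_t v = (int64_t)(((uint64_t)(t.nseconds >> 30) << 32) | t.seconds);

	if (v & (1LL << 33))
		v -= (1LL << 34);
	return v;
}

static uint32_t nfstime3_ex_decode_nsec(struct nfstime3 t)
{
	return t.nseconds & 0x3FFFFFFFu;
}
```

For any time between 1970 and 2106 the two extra bits are zero, so the words on the wire are identical to the existing format, which is the compatibility property the message relies on. The representable span is 2^33 seconds on either side of 1970, roughly 272 years.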
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-02 18:58 ` Roger Willcocks @ 2014-06-02 19:04 ` Chuck Lever 2014-06-02 19:10 ` Arnd Bergmann 0 siblings, 1 reply; 71+ messages in thread From: Chuck Lever @ 2014-06-02 19:04 UTC (permalink / raw) To: Roger Willcocks Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, Arnd Bergmann, LKML Kernel, geert, Christoph Hellwig, john.stultz, H. Peter Anvin, linux-fsdevel, lftan, tglx, xfs, joseph On Jun 2, 2014, at 2:58 PM, Roger Willcocks <roger@filmlight.ltd.uk> wrote: > > On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote: > >> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for >> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds. >> (See the definition of nfstime3 in RFC 1813). >> > > nfstime3 could be extended by redefining the otherwise unused > nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit > seconds field and an unsigned 30-bit nanoseconds field. > > This could represent 1970 +/- 272 years. > > Servers could indicate they can understand the extended time format by > adding a new FSINFO capability - FSF3_CANSETTIME_EX. > > Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending > timestamps so old servers would be protected from new clients. You would have to get the IETF’s NFSv4 working group to sign off on this change. Otherwise, Linux would be the only NFSv3 implementation that supports the extension. But I suspect the answer you’d get is “Use NFSv4.” > Old clients don't need to be protected from new servers because the > on-the-wire bit pattern for dates between 1970 and 2106 stays the same, > so they're no worse off than they were before. > > Arguably the new server ought to clamp out-of-range timestamps before > sending them to old clients but that would need per-client state (and > nfs3 is stateless.) There’s no reliable way in NFSv3 for clients and servers to identify the software running on the peer. 
Practically speaking, you should assume that the NFSv3 protocol is never going to change. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-02 19:04 ` Chuck Lever @ 2014-06-02 19:10 ` Arnd Bergmann 0 siblings, 0 replies; 71+ messages in thread From: Arnd Bergmann @ 2014-06-02 19:10 UTC (permalink / raw) To: Chuck Lever Cc: Nicolas Pitre, linux-arch, Linux NFS Mailing List, xfs, LKML Kernel, geert, Christoph Hellwig, john.stultz, H. Peter Anvin, linux-fsdevel, lftan, tglx, Roger Willcocks, joseph On Monday 02 June 2014 15:04:27 Chuck Lever wrote: > On Jun 2, 2014, at 2:58 PM, Roger Willcocks <roger@filmlight.ltd.uk> wrote: > > > > > On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote: > > > >> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for > >> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds. > >> (See the definition of nfstime3 in RFC 1813). > >> > > > > nfstime3 could be extended by redefining the otherwise unused > > nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit > > seconds field and an unsigned 30-bit nanoseconds field. > > > > This could represent 1970 +/- 272 years. > > > > Servers could indicate they can understand the extended time format by > > adding a new FSINFO capability - FSF3_CANSETTIME_EX. > > > > Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending > > timestamps so old servers would be protected from new clients. > > You would have to get the IETF’s NFSv4 working group to sign off on > this change. Otherwise, Linux would be the only NFSv3 implementation > that supports the extension. > > But I suspect the answer you’d get is “Use NFSv4.” While I've never dealt with an NFS standardization, I'd assume this is a workable answer. The NFSv2 and NFSv3 definition clearly defines a valid range of times until 2106 using unsigned seconds, and that should really give enough time to migrate to something better (not necessarily NFSv4). 
	Arnd
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 8:41 ` H. Peter Anvin
2014-05-31 15:46 ` Nicolas Pitre
@ 2014-06-01 0:39 ` Dave Chinner
1 sibling, 0 replies; 71+ messages in thread
From: Dave Chinner @ 2014-06-01 0:39 UTC (permalink / raw)
To: H. Peter Anvin
Cc: linux-arch, Arnd Bergmann, linux-kernel, xfs, hch, john.stultz,
lftan, linux-fsdevel, geert, tglx, joseph

On Sat, May 31, 2014 at 01:41:56AM -0700, H. Peter Anvin wrote:
> On 05/30/2014 10:54 PM, Dave Chinner wrote:
> >
> > If we are changing the in-kernel timestamp to have a greater dynamic
> > range than anything we currently support on disk, then we need support
> > for all filesystems for similar translation and constraint. The
> > filesystems need to be able to tell the kernel what timestamp
> > range they support, and then the kernel needs to follow those
> > guidelines. And if the filesystem is mounted on a kernel that
> > doesn't support the current filesystem's timestamp format, then at
> > minimum that filesystem cannot do anything that writes a
> > timestamp....
> >
> > Put simply: the filesystem defines the timestamp range that can be
> > used safely, not the userspace API. If the filesystem can't support
> > the date it is handed then that is an out-of-range error. Since
> > when have we accepted that it's OK to handle out-of-range data with
> > silent overflows or corruption of the data that we are attempting to
> > store? We're defining a new API to support a wider date range -
> > there is nothing that prevents us from saying ERANGE can be returned
> > to a timestamp that the file cannot store correctly....
> >
>
> I'm still puzzled.
>
> Are you saying that you want a program that does:
>
> /* Deliberately simplified */
> gettimeofdayns(&now ...);
> utimensat(... now);
>
> ... to suddenly start failing on Jan 19, 2038 (for a filesystem with
> 32-bit timestamps),

Yes. Hard fail so overflows are in your face and we know exactly
what is going to cause silent timestamp screwups when the epoch

> or would you propose some ways for the filesystems
> in question to extend the range of the timestamps?

Filesystems are going to have to change their on-disk formats, so
we'd do that just like we do every other on-disk format change.
With feature bits and translation layers, new ioctl structures, etc.
Depending on the amount of work necessary, some filesystems could do
this in 3.16, others it might be 3.20 before everything is sorted
out across the kernel and userspace code...

Either way, the hard fail problem goes away as each filesystem is
converted. Further, if we have regression tests then new
filesystems are guaranteed to be designed to handle 2038 epoch
rollover, and so in a year or two this "hard fail" is effectively a
non-problem. If someone breaks something in future, then we'll know
about it pretty quickly.

> What you seem to propose also seems to imply that on Jan 19, 2038
> anything that writes a timestamp with the current date (which logically
> ends up being almost every write operation) would be dead and frozen on
> such a filesystem -- pretty much meaning the filesystem would become
> readonly if not in reality then in practice.

Yup. If we can't do what the user wants without the user thinking
corruption has occurred, then the only thing we are left with is
"shut down the filesystem" error handling. Kind of like using BUG()
rather than returning an error. That's why we need to be able to hard
fail and return an error.

However, we've got 20+ years to fix our current filesystems and all
their support code to ensure this doesn't happen. In the mean time,
having stuff hard fail is a great way to ensure that filesystems get
fixed sooner rather than later...

> I strongly suspect that that would be a more catastrophic failure than
> incorrect timestamps, as you suddenly have all kinds of machines
> embedded in $DEITY knows what places just stop and refuse to run.

Yup, that's a great way of flushing out problems 20 years before
they really matter.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

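The "hard fail" policy argued for above can be sketched as a plain
range check. This is only an illustration under stated assumptions:
struct fs_time_range, fs_check_timestamp() and the two example ranges
are hypothetical names invented for this sketch, not part of any posted
patch, and ERANGE is just one of the error codes floated in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical: the range of seconds one on-disk format can store. */
struct fs_time_range {
	int64_t min_sec;
	int64_t max_sec;
};

/* Signed 32-bit seconds: 1901-12-13 .. 2038-01-19 (UTC). */
static const struct fs_time_range fs_range_s32 = { INT32_MIN, INT32_MAX };

/* Unsigned 32-bit seconds: 1970-01-01 .. 2106-02-07 (UTC). */
static const struct fs_time_range fs_range_u32 = { 0, UINT32_MAX };

/*
 * Return 0 if @sec can be stored without loss, -ERANGE otherwise.
 * The caller is expected to propagate the error instead of storing
 * a wrapped or clamped value.
 */
static int fs_check_timestamp(const struct fs_time_range *r, int64_t sec)
{
	assert(r->min_sec <= r->max_sec);

	if (sec < r->min_sec || sec > r->max_sec)
		return -ERANGE;
	return 0;
}
```

With such a check in the setattr path, a timestamp one second past the
signed 32-bit limit would be refused on a filesystem described by
fs_range_s32 but accepted on one described by fs_range_u32.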
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 5:54 ` Dave Chinner
2014-05-31 8:41 ` H. Peter Anvin
@ 2014-06-02 14:00 ` Joseph S. Myers
1 sibling, 0 replies; 71+ messages in thread
From: Joseph S. Myers @ 2014-06-02 14:00 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, Arnd Bergmann, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs

On Sat, 31 May 2014, Dave Chinner wrote:

> If we are changing the in-kernel timestamp to have a greater dynamic
> range than anything we currently support on disk, then we need support
> for all filesystems for similar translation and constraint. The
> filesystems need to be able to tell the kernel what timestamp
> range they support, and then the kernel needs to follow those
> guidelines. And if the filesystem is mounted on a kernel that
> doesn't support the current filesystem's timestamp format, then at
> minimum that filesystem cannot do anything that writes a
> timestamp....
>
> Put simply: the filesystem defines the timestamp range that can be
> used safely, not the userspace API. If the filesystem can't support
> the date it is handed then that is an out-of-range error. Since
> when have we accepted that it's OK to handle out-of-range data with
> silent overflows or corruption of the data that we are attempting to
> store? We're defining a new API to support a wider date range -
> there is nothing that prevents us from saying ERANGE can be returned
> to a timestamp that the file cannot store correctly....

I don't see anything new about this issue. All problems that could arise
from the kernel being able to represent a timestamp some filesystems can't
are problems that already apply with 64-bit kernels using 64-bit time_t
internally. So while as part of Y2038-preparedness we do need a clear
understanding of which filesystems have what timestamp limits and what
happens with timestamps beyond those limits, I think this is a separate
strand of the problem - one that applies to both 32-bit and 64-bit systems
- from the more general issue for 32-bit systems.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 1:14 ` Dave Chinner
2014-05-31 1:22 ` H. Peter Anvin
@ 2014-05-31 15:37 ` Arnd Bergmann
2014-06-01 0:24 ` Dave Chinner
1 sibling, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-05-31 15:37 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph

On Saturday 31 May 2014 11:14:50 Dave Chinner wrote:
> On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> > On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > >
> > > IOWs, the filesystem has to be able to reject any attempt to set a
> > > timestamp that it can't represent on disk otherwise Bad Stuff will
> > > happen,
> >
> > Actually it is questionable if it is worse to reject a timestamp or just
> > let it wrap. Rejecting a valid timestamp is a bit like "You don't
> > exist, go away."
>
> I think having the new system calls being able to
> return EINVAL if the value cannot be stored permanently on disk
> correctly is the right thing to do. Having it silently mangled
> by the filesystem and returning "everything is just fine, trust me"
> is close to the worst solution I can think of. That's exactly what
> leads to overflow bugs occurring....

While going through the file systems, I was wondering whether
we should have the times stop at the end of each file system's
epoch rather than wrap around.

> > > and filesystems have to be able to specify in their on
> > > disk format what timestamp encoding is being used. The solution will
> > > be different for every filesystem that needs to support time beyond
> > > 2038.
> >
> > Actually the cutoff can be really different for each filesystem, not
> > necessarily 2038. However, I maintain the above still holds.
>
> Sure, but all filesystems are supposed to handle at least the
> current unix epoch.

In my list at http://kernelnewbies.org/y2038, I found that almost
all file systems at least handle times until 2106, because they treat
the on-disk value as unsigned on 64-bit systems, or they use
a completely different representation. My guess is that somebody
earlier spent a lot of work on making that happen.

The exceptions are:

* exofs uses signed values, which can probably be changed to be
  consistent with the others.
* isofs has a bug that limits it until 2027 on architectures with
  a signed 'char' type (otherwise it's 2155).
* udf can represent times for many thousands of years through a
  16-bit year representation, but the code to convert to epoch
  uses a const array that ends at 2038.
* afs uses signed seconds and can probably be fixed
* coda relies on user space time representation getting passed
  through an ioctl.
* I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
  where they really use signed.

I was confused about XFS since I didn't notice that there are
separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
XFS to also use the 1970-2106 time range on 64-bit systems today.

If we are using the variant of my patch that extends
inode_time->tv_sec to s64, nothing should change for XFS
at all; the main difference is that if it gets extended
to wider on-disk timestamps, they will work the same way on
32-bit and 64-bit kernels.

	Arnd

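The 2038 versus 2106 cutoffs in the list above come down to whether the
same 32-bit on-disk seconds field is read back as signed or unsigned.
A small sketch of the two interpretations follows; disk_sec_signed()
and disk_sec_unsigned() are hypothetical helper names used only for
this illustration, not functions from any filesystem.

```c
#include <stdint.h>

/*
 * Interpret a raw 32-bit on-disk seconds value as signed.
 * Values with the top bit set become times before 1970, so the last
 * representable second is 2^31 - 1 (2038-01-19 03:14:07 UTC).
 */
static int64_t disk_sec_signed(uint32_t raw)
{
	if (raw & 0x80000000u)
		return (int64_t)raw - 4294967296LL;
	return (int64_t)raw;
}

/*
 * Interpret the same field as unsigned.
 * Nothing before 1970 can be represented, but the range extends to
 * 2^32 - 1 (2106-02-07 06:28:15 UTC).
 */
static int64_t disk_sec_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```

The raw value 0x80000000 is the interesting case: read as unsigned it is
the first second after the signed limit in January 2038, read as signed
it lands in December 1901.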
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 15:37 ` Arnd Bergmann
@ 2014-06-01 0:24 ` Dave Chinner
2014-06-02 0:28 ` Dave Chinner
0 siblings, 1 reply; 71+ messages in thread
From: Dave Chinner @ 2014-06-01 0:24 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph

On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> On Saturday 31 May 2014 11:14:50 Dave Chinner wrote:
> > On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> > > On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > > >
> > > > IOWs, the filesystem has to be able to reject any attempt to set a
> > > > timestamp that it can't represent on disk otherwise Bad Stuff will
> > > > happen,
> > >
> > > Actually it is questionable if it is worse to reject a timestamp or just
> > > let it wrap. Rejecting a valid timestamp is a bit like "You don't
> > > exist, go away."
> >
> > I think having the new system calls being able to
> > return EINVAL if the value cannot be stored permanently on disk
> > correctly is the right thing to do. Having it silently mangled
> > by the filesystem and returning "everything is just fine, trust me"
> > is close to the worst solution I can think of. That's exactly what
> > leads to overflow bugs occurring....
>
> While going through the file systems, I was wondering whether
> we should have the times stop at the end of each file system's
> epoch rather than wrap around.
>
> > > > and filesystems have to be able to specify in their on
> > > > disk format what timestamp encoding is being used. The solution will
> > > > be different for every filesystem that needs to support time beyond
> > > > 2038.
> > >
> > > Actually the cutoff can be really different for each filesystem, not
> > > necessarily 2038. However, I maintain the above still holds.
> >
> > Sure, but all filesystems are supposed to handle at least the
> > current unix epoch.
>
> In my list at http://kernelnewbies.org/y2038, I found that almost
> all file systems at least handle times until 2106, because they treat
> the on-disk value as unsigned on 64-bit systems, or they use
> a completely different representation. My guess is that somebody
> earlier spent a lot of work on making that happen.
>
> The exceptions are:
>
> * exofs uses signed values, which can probably be changed to be
>   consistent with the others.
> * isofs has a bug that limits it until 2027 on architectures with
>   a signed 'char' type (otherwise it's 2155).
> * udf can represent times for many thousands of years through a
>   16-bit year representation, but the code to convert to epoch
>   uses a const array that ends at 2038.
> * afs uses signed seconds and can probably be fixed
> * coda relies on user space time representation getting passed
>   through an ioctl.
> * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
>   where they really use signed.
>
> I was confused about XFS since I didn't notice that there are
> separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> XFS to also use the 1970-2106 time range on 64-bit systems today.

You've missed an awful lot more than just the implications for the
core kernel code.

There's a good chance such changes propagate to APIs elsewhere in
the filesystems, because something you haven't realised is that XFS
effectively exposes the on-disk timestamp format directly to
userspace via the bulkstat interface (see struct xfs_bstat). It also
affects the XFS open-by-handle ioctl and the swap extent ioctl used
by the online defragmenter.

IOWs, if we are changing the on-disk timestamp format then this
affects several ioctl()s and hence quite a few of the XFS userspace
utilities. The hardest to fix will be xfsdump which would need a new
dump format to store the extended timestamp ranges, and then
xfs_restore will need to be able to handle restoring such timestamps
on filesystems that don't have extended timestamp support...

Put simply, changing the structure of system time isn't as straight
forward as changing the kernel structures. System time gets stored
permanently, and that has a cascade effect through the kernel all
the way to all of the filesystem utilities that know about that
permanent storage in some way....

So yes, you can change the kernel definition, but until the
permanent storage of system time can be extended to support the same
range as the kernel the *system* will still have nasty, silent epoch
overflow, truncation or corruption issues.

> If we are using the variant of my patch that extends
> inode_time->tv_sec to s64, nothing should change for XFS
> at all; the main difference is that if it gets extended
> to wider on-disk timestamps, they will work the same way on
> 32-bit and 64-bit kernels.

"nothing should change" except for the fact that a 64 bit timestamp
gets silently truncated to 32 bits and the timestamp is not what the
user expects it to be. The user does not find out until the inode
passes out of cache and is re-read from disk, and then it's wrong.

To put it politely: that is broken, obnoxious behaviour and we don't
design new interfaces with such ugly warts anymore. Define an
EOVERFLOW, EINVAL or ERANGE error in the new syscalls to handle this
case and *hard fail* if the storage cannot support the extended
timestamp being passed in.

There is no excuse for silently mangling out-of-range data,
especially as we have plenty of time to add support to the
filesystems so that such errors don't occur. It might take us a year
to implement, but it will be done long before the epoch overflows.

And, FWIW, this patchset needs a set of regression tests that ensure
timestamps beyond 2038 and 2106 don't change across unmount/mount.
Written for xfstests, preferably, so that it's run as part of every
filesystem developer's daily workflow. This is the only way we are
going to ensure that the filesystem and VFS code works correctly and
continues to work correctly up to the end of the current epoch....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

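The silent truncation described above, where a 64-bit in-memory
timestamp is squeezed into a signed 32-bit on-disk field and only looks
wrong after the inode is re-read, can be modelled in a few lines. This
is a user-space sketch, not kernel or xfstests code; store_sec32(),
load_sec32() and sec_survives_roundtrip() are hypothetical names chosen
for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of a naive "write to disk": keep only the low 32 bits.
 * Conversion to an unsigned type is well defined (modulo 2^32).
 */
static uint32_t store_sec32(int64_t sec)
{
	return (uint32_t)sec;
}

/* Model of "read back from disk" for a signed 32-bit seconds field. */
static int64_t load_sec32(uint32_t raw)
{
	if (raw & 0x80000000u)
		return (int64_t)raw - 4294967296LL;
	return (int64_t)raw;
}

/*
 * The property a regression test would check across unmount/mount:
 * the value read back must equal the value that was set.
 */
static bool sec_survives_roundtrip(int64_t sec)
{
	return load_sec32(store_sec32(sec)) == sec;
}
```

Any value outside the signed 32-bit range fails this check, which is the
case the thread wants turned into an explicit error at set time rather
than a surprise at read time.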
* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-01 0:24 ` Dave Chinner
@ 2014-06-02 0:28 ` Dave Chinner
2014-06-02 11:35 ` Roger Willcocks
2014-06-02 11:43 ` Arnd Bergmann
0 siblings, 2 replies; 71+ messages in thread
From: Dave Chinner @ 2014-06-02 0:28 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin,
linux-fsdevel, geert, tglx, xfs, joseph

On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > In my list at http://kernelnewbies.org/y2038, I found that almost
> > all file systems at least handle times until 2106, because they treat
> > the on-disk value as unsigned on 64-bit systems, or they use
> > a completely different representation. My guess is that somebody
> > earlier spent a lot of work on making that happen.
> >
> > The exceptions are:
> >
> > * exofs uses signed values, which can probably be changed to be
> >   consistent with the others.
> > * isofs has a bug that limits it until 2027 on architectures with
> >   a signed 'char' type (otherwise it's 2155).
> > * udf can represent times for many thousands of years through a
> >   16-bit year representation, but the code to convert to epoch
> >   uses a const array that ends at 2038.
> > * afs uses signed seconds and can probably be fixed
> > * coda relies on user space time representation getting passed
> >   through an ioctl.
> > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> >   where they really use signed.
> >
> > I was confused about XFS since I didn't notice that there are
> > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > XFS to also use the 1970-2106 time range on 64-bit systems today.
>
> You've missed an awful lot more than just the implications for the
> core kernel code.
>
> There's a good chance such changes propagate to APIs elsewhere in
> the filesystems, because something you haven't realised is that XFS
> effectively exposes the on-disk timestamp format directly to
> userspace via the bulkstat interface (see struct xfs_bstat). It also
> affects the XFS open-by-handle ioctl and the swap extent ioctl used
> by the online defragmenter.
>
> IOWs, if we are changing the on-disk timestamp format then this
> affects several ioctl()s and hence quite a few of the XFS userspace
> utilities. The hardest to fix will be xfsdump which would need a new
> dump format to store the extended timestamp ranges, and then
> xfs_restore will need to be able to handle restoring such timestamps
> on filesystems that don't have extended timestamp support...
>
> Put simply, changing the structure of system time isn't as straight
> forward as changing the kernel structures. System time gets stored
> permanently, and that has a cascade effect through the kernel all
> the way to all of the filesystem utilities that know about that
> permanent storage in some way....
>
> So yes, you can change the kernel definition, but until the
> permanent storage of system time can be extended to support the same
> range as the kernel the *system* will still have nasty, silent epoch
> overflow, truncation or corruption issues.

Just to put that in context, here's the kernel patch to add extended
epoch support to XFS. It's completely untested as I haven't done any
userspace code changes to enable the feature. However, it should
give you an indication of how far the simple act of changing the
kernel time representation spreads through the filesystem. This does
not include any of the VFS infrastructure for specifying the range of
supported timestamps. It survives some smoke testing, but dies when
the online defragmenter starts using the bulkstat and swap extent
ioctls (the assert in xfs_inode_time_from_epoch() fires), so I
probably don't have that all sorted correctly yet...

To test extended epoch support, however, I need some fstests
that define and validate the behaviour of the new syscalls - until
we get those we can't validate that the filesystem follows the spec
properly. I also suspect we are going to need an interface to query
the supported range of timestamps from a filesystem so that we can
test boundary conditions in an automated fashion....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: support timestamps beyond Unix epochs

From: Dave Chinner <dchinner@redhat.com>

The 32 bit second counters in timestamps are too small to represent
time beyond the unix epoch (jan 2038) correctly. Extend the on-disk
format for a timestamp to include an 8-bit epoch counter so that we
can extend time for up to 255 Unix epochs. This should be good for
representing timestamps from 1970 to somewhere around 19,000 A.D....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/time.h            |  7 ------
 fs/xfs/xfs_bmap_util.c   | 35 +++++++++++++++++-----------
 fs/xfs/xfs_dinode.h      | 48 ++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_fs.h          |  9 +++++++-
 fs/xfs/xfs_fsops.c       |  5 +++-
 fs/xfs/xfs_inode.c       | 16 ++++++++++---
 fs/xfs/xfs_inode_buf.c   |  8 +++++++
 fs/xfs/xfs_ioctl32.c     |  3 +++
 fs/xfs/xfs_ioctl32.h     |  5 +++-
 fs/xfs/xfs_iops.c        | 59 +++++++++++++++++++++++++++++++-----------------
 fs/xfs/xfs_itable.c      | 12 ++++++++++
 fs/xfs/xfs_log_format.h  |  4 ++++
 fs/xfs/xfs_sb.h          | 12 +++++++++-
 fs/xfs/xfs_trans_inode.c |  2 +-
 14 files changed, 175 insertions(+), 50 deletions(-)

diff --git a/fs/xfs/time.h b/fs/xfs/time.h
index 387e695..9f38d60 100644
--- a/fs/xfs/time.h
+++ b/fs/xfs/time.h
@@ -21,16 +21,9 @@
 #include <linux/sched.h>
 #include <linux/time.h>
 
-typedef struct timespec timespec_t;
-
 static inline void delay(long ticks)
 {
 	schedule_timeout_uninterruptible(ticks);
 }
 
-static inline void nanotime(struct timespec *tvp)
-{
-	*tvp = CURRENT_TIME;
-}
-
 #endif /* __XFS_SUPPORT_TIME_H__ */
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 703b3ec..dbc9a74 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1686,6 +1686,7 @@ xfs_swap_extents(
 	int		aforkblks = 0;
 	int		taforkblks = 0;
 	__uint64_t	tmp;
+	struct timespec	tv;
 
 	tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
 	if (!tempifp) {
@@ -1746,25 +1747,33 @@ xfs_swap_extents(
 	}
 
 	/*
-	 * Compare the current change & modify times with that
-	 * passed in. If they differ, we abort this swap.
-	 * This is the mechanism used to ensure the calling
-	 * process that the file was not changed out from
+	 * Compare the current change & modify times with that passed in. If
+	 * they differ, we abort this swap. This is the mechanism used to
+	 * ensure the calling process that the file was not changed out from
 	 * under it.
 	 */
-	if ((sbp->bs_ctime.tv_sec != VFS_I(ip)->i_ctime.tv_sec) ||
-	    (sbp->bs_ctime.tv_nsec != VFS_I(ip)->i_ctime.tv_nsec) ||
-	    (sbp->bs_mtime.tv_sec != VFS_I(ip)->i_mtime.tv_sec) ||
-	    (sbp->bs_mtime.tv_nsec != VFS_I(ip)->i_mtime.tv_nsec)) {
+	tv.tv_sec = xfs_inode_time_from_epoch(sbp->bs_ctime.tv_sec,
+					      sbp->bs_ctime_epoch);
+	tv.tv_nsec = sbp->bs_ctime.tv_nsec;
+	if (timespec_compare(&tv, &VFS_I(ip)->i_ctime)) {
 		error = XFS_ERROR(EBUSY);
 		goto out_unlock;
 	}
 
-	/* We need to fail if the file is memory mapped. Once we have tossed
-	 * all existing pages, the page fault will have no option
-	 * but to go to the filesystem for pages. By making the page fault call
-	 * vop_read (or write in the case of autogrow) they block on the iolock
-	 * until we have switched the extents.
+	tv.tv_sec = xfs_inode_time_from_epoch(sbp->bs_mtime.tv_sec,
+					      sbp->bs_mtime_epoch);
+	tv.tv_nsec = sbp->bs_mtime.tv_nsec;
+	if (timespec_compare(&tv, &VFS_I(ip)->i_mtime)) {
+		error = XFS_ERROR(EBUSY);
+		goto out_unlock;
+	}
+
+	/*
+	 * We need to fail if the file is memory mapped. Once we have tossed
+	 * all existing pages, the page fault will have no option but to go to
+	 * the filesystem for pages. By making the page fault call vop_read (or
+	 * write in the case of autogrow) they block on the iolock until we have
+	 * switched the extents.
 	 */
 	if (VN_MAPPED(VFS_I(ip))) {
 		error = XFS_ERROR(EBUSY);
diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h
index 623bbe8..79f94722 100644
--- a/fs/xfs/xfs_dinode.h
+++ b/fs/xfs/xfs_dinode.h
@@ -21,11 +21,53 @@
 #define	XFS_DINODE_MAGIC		0x494e	/* 'IN' */
 #define XFS_DINODE_GOOD_VERSION(v)	((v) >= 1 && (v) <= 3)
 
+/*
+ * Inode timestamps get more complex when we consider supporting times beyond
+ * the standard unix epoch of Jan 2038. The struct xfs_timestamp cannot support
+ * more than a single extension by playing sign games, and that is still not
+ * reliable. We also can't extend the timestamp structure because there is no
+ * free space around them in the on-disk inode.
+ *
+ * Hence the simplest thing to do is to add an epoch counter for each timestamp
+ * in the inode. This can be a single byte for each timestamp and make use of
+ * a hole we currently pad. This gives us another 255 epochs range for the
+ * timestamps, but requires a superblock feature bit to indicate that these
+ * fields have meaning and can be non-zero.
+ *
+ * Provide wrapper functions for converting the kernel inode time format to
+ * the on-disk fields. The nanosecond counter is unlikely to change in future,
+ * so it's mostly just for the second+epoch counter conversion.
+ */
 typedef struct xfs_timestamp {
 	__be32		t_sec;		/* timestamp seconds */
 	__be32		t_nsec;		/* timestamp nanoseconds */
 } xfs_timestamp_t;
 
+static inline __uint8_t
+xfs_timestamp_epoch(
+	struct timespec	*time)
+{
+	/* will be zero until the extended struct inode_time is introduced */
+	return 0;
+}
+
+static inline __int32_t
+xfs_timestamp_sec(
+	struct timespec	*time)
+{
+	return time->tv_sec;
+}
+
+static inline __kernel_time_t
+xfs_inode_time_from_epoch(
+	__uint8_t	epoch,
+	__int32_t	seconds)
+{
+	/* need to handle non-zero epoch when struct inode_time is introduced */
+	ASSERT(epoch == 0);
+	return seconds;
+}
+
 /*
  * On-disk inode structure.
  *
@@ -54,7 +96,11 @@ typedef struct xfs_dinode {
 	__be32		di_nlink;	/* number of links to file */
 	__be16		di_projid_lo;	/* lower part of owner's project id */
 	__be16		di_projid_hi;	/* higher part owner's project id */
-	__u8		di_pad[6];	/* unused, zeroed space */
+	__u8		di_atime_epoch;	/* access time epoch */
+	__u8		di_mtime_epoch;	/* modify time epoch */
+	__u8		di_ctime_epoch;	/* change time epoch */
+	__u8		di_crtime_epoch;/* create time epoch */
+	__u8		di_pad[2];	/* unused, zeroed space */
 	__be16		di_flushiter;	/* incremented on flush */
 	xfs_timestamp_t	di_atime;	/* time last accessed */
 	xfs_timestamp_t	di_mtime;	/* time last modified */
diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
index d34703d..fb0a0ea 100644
--- a/fs/xfs/xfs_fs.h
+++ b/fs/xfs/xfs_fs.h
@@ -239,6 +239,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_V5SB	0x8000	/* version 5 superblock */
 #define XFS_FSOP_GEOM_FLAGS_FTYPE	0x10000	/* inode directory types */
 #define XFS_FSOP_GEOM_FLAGS_FINOBT	0x20000	/* free inode btree */
+#define XFS_FSOP_GEOM_FLAGS_EPOCH	0x40000	/* timestamp epochs */
 
 /*
  * Minimum and maximum sizes need for growth checks.
@@ -280,6 +281,9 @@ typedef struct xfs_growfs_rt {
 
 /*
  * Structures returned from ioctl XFS_IOC_FSBULKSTAT & XFS_IOC_FSBULKSTAT_SINGLE
+ *
+ * Time epoch structures are only used if the XFS_FSOP_GEOM_FLAGS_EPOCH flag is
+ * asserted in the geometry output.
  */
 typedef struct xfs_bstime {
 	time_t		tv_sec;		/* seconds */
@@ -307,7 +311,10 @@ typedef struct xfs_bstat {
 #define	bs_projid	bs_projid_lo	/* (previously just bs_projid) */
 	__u16		bs_forkoff;	/* inode fork offset in bytes */
 	__u16		bs_projid_hi;	/* higher part of project id */
-	unsigned char	bs_pad[10];	/* pad space, unused */
+	__u8		bs_atime_epoch;	/* access time epoch */
+	__u8		bs_mtime_epoch;	/* modify time epoch */
+	__u8		bs_ctime_epoch;	/* change time epoch */
+	unsigned char	bs_pad[7];	/* pad space, unused */
 	__u32		bs_dmevmask;	/* DMIG event mask */
 	__u16		bs_dmstate;	/* DMIG state info */
 	__u16		bs_aextents;	/* attribute number of extents */
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index d229556..7b8db57 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -103,7 +103,10 @@ xfs_fs_geometry(
 			(xfs_sb_version_hasftype(&mp->m_sb) ?
 				XFS_FSOP_GEOM_FLAGS_FTYPE : 0) |
 			(xfs_sb_version_hasfinobt(&mp->m_sb) ?
-				XFS_FSOP_GEOM_FLAGS_FINOBT : 0);
+				XFS_FSOP_GEOM_FLAGS_FINOBT : 0) |
+			(xfs_sb_version_hasepoch(&mp->m_sb) ?
+				XFS_FSOP_GEOM_FLAGS_EPOCH : 0);
+
 		geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
 				mp->m_sb.sb_logsectsize : BBSIZE;
 		geo->rtsectsize = mp->m_sb.sb_blocksize;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index a6115fe..eecae93 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -654,7 +654,8 @@ xfs_ialloc(
 	xfs_inode_t	*ip;
 	uint		flags;
 	int		error;
-	timespec_t	tv;
+	struct timespec	tv;
+	bool		has_epoch;
 
 	/*
 	 * Call the space management code to pick
@@ -720,12 +721,19 @@ xfs_ialloc(
 	ip->i_d.di_nextents = 0;
 	ASSERT(ip->i_d.di_nblocks == 0);
 
-	nanotime(&tv);
-	ip->i_d.di_mtime.t_sec = (__int32_t)tv.tv_sec;
+	has_epoch = xfs_sb_version_hasepoch(&mp->m_sb);
+	tv = current_fs_time(mp->m_super);
+	ip->i_d.di_mtime.t_sec = xfs_timestamp_sec(&tv);
 	ip->i_d.di_mtime.t_nsec = (__int32_t)tv.tv_nsec;
 	ip->i_d.di_atime = ip->i_d.di_mtime;
 	ip->i_d.di_ctime = ip->i_d.di_mtime;
 
+	if (has_epoch) {
+		ip->i_d.di_mtime_epoch = xfs_timestamp_epoch(&tv);
+		ip->i_d.di_atime_epoch = ip->i_d.di_mtime_epoch;
+		ip->i_d.di_ctime_epoch = ip->i_d.di_mtime_epoch;
+	}
+
 	/*
 	 * di_gen will have been taken care of in xfs_iread.
 	 */
@@ -743,6 +751,8 @@ xfs_ialloc(
 		ip->i_d.di_flags2 = 0;
 		memset(&(ip->i_d.di_pad2[0]), 0, sizeof(ip->i_d.di_pad2));
 		ip->i_d.di_crtime = ip->i_d.di_mtime;
+		if (has_epoch)
+			ip->i_d.di_crtime_epoch = ip->i_d.di_mtime_epoch;
 	}
 
 
diff --git a/fs/xfs/xfs_inode_buf.c b/fs/xfs/xfs_inode_buf.c
index cb35ae4..0459e3d 100644
--- a/fs/xfs/xfs_inode_buf.c
+++ b/fs/xfs/xfs_inode_buf.c
@@ -208,6 +208,10 @@ xfs_dinode_from_disk(
 	to->di_nlink = be32_to_cpu(from->di_nlink);
 	to->di_projid_lo = be16_to_cpu(from->di_projid_lo);
 	to->di_projid_hi = be16_to_cpu(from->di_projid_hi);
+	to->di_atime_epoch = from->di_atime_epoch;
+	to->di_mtime_epoch = from->di_mtime_epoch;
+	to->di_ctime_epoch = from->di_ctime_epoch;
+	to->di_crtime_epoch = from->di_crtime_epoch;
 	memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
 	to->di_flushiter = be16_to_cpu(from->di_flushiter);
 	to->di_atime.t_sec = be32_to_cpu(from->di_atime.t_sec);
@@ -255,6 +259,10 @@ xfs_dinode_to_disk(
 	to->di_nlink = cpu_to_be32(from->di_nlink);
 	to->di_projid_lo = cpu_to_be16(from->di_projid_lo);
 	to->di_projid_hi = cpu_to_be16(from->di_projid_hi);
+	to->di_atime_epoch = from->di_atime_epoch;
+	to->di_mtime_epoch = from->di_mtime_epoch;
+	to->di_ctime_epoch = from->di_ctime_epoch;
+	to->di_crtime_epoch = from->di_crtime_epoch;
 	memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
 	to->di_atime.t_sec = cpu_to_be32(from->di_atime.t_sec);
 	to->di_atime.t_nsec = cpu_to_be32(from->di_atime.t_nsec);
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 944d5ba..215324f 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -161,6 +161,9 @@ xfs_ioctl32_bstat_copyin(
 	    get_user(bstat->bs_gen, &bstat32->bs_gen) ||
 	    get_user(bstat->bs_projid_lo, &bstat32->bs_projid_lo) ||
 	    get_user(bstat->bs_projid_hi, &bstat32->bs_projid_hi) ||
+	    get_user(bstat->bs_atime_epoch, &bstat32->bs_atime_epoch) ||
+	    get_user(bstat->bs_mtime_epoch, &bstat32->bs_mtime_epoch) ||
+	    get_user(bstat->bs_ctime_epoch, &bstat32->bs_ctime_epoch) ||
 	    get_user(bstat->bs_dmevmask, &bstat32->bs_dmevmask) ||
 	    get_user(bstat->bs_dmstate, &bstat32->bs_dmstate) ||
 	    get_user(bstat->bs_aextents, &bstat32->bs_aextents))
diff --git a/fs/xfs/xfs_ioctl32.h b/fs/xfs/xfs_ioctl32.h
index 80f4060..2a35c62 100644
--- a/fs/xfs/xfs_ioctl32.h
+++ b/fs/xfs/xfs_ioctl32.h
@@ -68,7 +68,10 @@ typedef struct compat_xfs_bstat {
 	__u16		bs_projid_lo;	/* lower part of project id */
 #define	bs_projid	bs_projid_lo	/* (previously just bs_projid) */
 	__u16		bs_projid_hi;	/* high part of project id */
-	unsigned char	bs_pad[12];	/* pad space, unused */
+	__u8		bs_atime_epoch;	/* access time epoch */
+	__u8		bs_mtime_epoch;	/* modify time epoch */
+	__u8		bs_ctime_epoch;	/* change time epoch */
+	unsigned char	bs_pad[9];	/* pad space, unused */
 	__u32		bs_dmevmask;	/* DMIG event mask */
 	__u16		bs_dmstate;	/* DMIG state info */
 	__u16		bs_aextents;	/* attribute number of extents */
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 205613a..0588381 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -505,23 +505,34 @@ xfs_setattr_time(
 	struct iattr		*iattr)
 {
 	struct inode		*inode = VFS_I(ip);
+	bool			has_epoch;
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 
+	has_epoch = xfs_sb_version_hasepoch(&ip->i_mount->m_sb);
 	if (iattr->ia_valid & ATTR_ATIME) {
 		inode->i_atime = iattr->ia_atime;
-		ip->i_d.di_atime.t_sec = iattr->ia_atime.tv_sec;
-		ip->i_d.di_atime.t_nsec = iattr->ia_atime.tv_nsec;
+		ip->i_d.di_atime.t_sec = xfs_timestamp_sec(&iattr->ia_atime);
+		ip->i_d.di_atime.t_nsec = (__int32_t)iattr->ia_atime.tv_nsec;
+		if (has_epoch)
+			ip->i_d.di_atime_epoch =
+				xfs_timestamp_epoch(&iattr->ia_atime);
 	}
 	if (iattr->ia_valid & ATTR_CTIME) {
 		inode->i_ctime = iattr->ia_ctime;
-		ip->i_d.di_ctime.t_sec = iattr->ia_ctime.tv_sec;
-		ip->i_d.di_ctime.t_nsec = iattr->ia_ctime.tv_nsec;
+		ip->i_d.di_ctime.t_sec = xfs_timestamp_sec(&iattr->ia_ctime);
+		ip->i_d.di_ctime.t_nsec = (__int32_t)iattr->ia_ctime.tv_nsec;
+		if (has_epoch)
+			ip->i_d.di_ctime_epoch =
+				xfs_timestamp_epoch(&iattr->ia_ctime);
 	}
 	if (iattr->ia_valid & ATTR_MTIME) {
 		inode->i_mtime = iattr->ia_mtime;
-		ip->i_d.di_mtime.t_sec = iattr->ia_mtime.tv_sec;
-		ip->i_d.di_mtime.t_nsec = iattr->ia_mtime.tv_nsec;
+		ip->i_d.di_mtime.t_sec = xfs_timestamp_sec(&iattr->ia_mtime);
+		ip->i_d.di_mtime.t_nsec = (__int32_t)iattr->ia_mtime.tv_nsec;
+		if (has_epoch)
+			ip->i_d.di_mtime_epoch =
+				xfs_timestamp_epoch(&iattr->ia_mtime);
 	}
 }
 
@@ -963,6 +974,7 @@ xfs_vn_update_time(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
 	int			error;
+	struct iattr		iattr = {0};
 
 	trace_xfs_update_time(ip);
 
@@ -975,20 +987,19 @@ xfs_vn_update_time(
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	if (flags & S_CTIME) {
-		inode->i_ctime = *now;
-		ip->i_d.di_ctime.t_sec = (__int32_t)now->tv_sec;
-		ip->i_d.di_ctime.t_nsec = (__int32_t)now->tv_nsec;
+		iattr.ia_valid |= ATTR_CTIME;
+		iattr.ia_ctime = *now;
 	}
 	if (flags & S_MTIME) {
-		inode->i_mtime = *now;
-		ip->i_d.di_mtime.t_sec = (__int32_t)now->tv_sec;
-		ip->i_d.di_mtime.t_nsec = (__int32_t)now->tv_nsec;
+		iattr.ia_valid |= ATTR_MTIME;
+		iattr.ia_mtime = *now;
 	}
 	if (flags & S_ATIME) {
-		inode->i_atime = *now;
-		ip->i_d.di_atime.t_sec = (__int32_t)now->tv_sec;
-		ip->i_d.di_atime.t_nsec = (__int32_t)now->tv_nsec;
+		iattr.ia_valid |= ATTR_ATIME;
+		iattr.ia_atime = *now;
 	}
+	xfs_setattr_time(ip, &iattr);
+
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP);
 	return -xfs_trans_commit(tp, 0);
@@ -1239,12 +1250,18 @@ xfs_setup_inode(
 	inode->i_generation = ip->i_d.di_gen;
 	i_size_write(inode, ip->i_d.di_size);
 
-	inode->i_atime.tv_sec = ip->i_d.di_atime.t_sec;
-	inode->i_atime.tv_nsec = ip->i_d.di_atime.t_nsec;
-	inode->i_mtime.tv_sec = ip->i_d.di_mtime.t_sec;
-	inode->i_mtime.tv_nsec = ip->i_d.di_mtime.t_nsec;
-	inode->i_ctime.tv_sec = ip->i_d.di_ctime.t_sec;
-	inode->i_ctime.tv_nsec = ip->i_d.di_ctime.t_nsec;
+	inode->i_atime.tv_sec = xfs_inode_time_from_epoch(
+						ip->i_d.di_atime_epoch,
+						ip->i_d.di_atime.t_sec);
+	inode->i_atime.tv_nsec = ip->i_d.di_atime.t_nsec;
+	inode->i_mtime.tv_sec = xfs_inode_time_from_epoch(
+						ip->i_d.di_mtime_epoch,
+						ip->i_d.di_mtime.t_sec);
+	inode->i_mtime.tv_nsec = ip->i_d.di_mtime.t_nsec;
+	inode->i_ctime.tv_sec = xfs_inode_time_from_epoch(
+						ip->i_d.di_ctime_epoch,
+						ip->i_d.di_ctime.t_sec);
+	inode->i_ctime.tv_nsec = ip->i_d.di_ctime.t_nsec;
 	xfs_diflags_to_iflags(inode, ip);
 
 	ip->d_ops = ip->i_mount->m_nondir_inode_ops;
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index cb64f22..e902418 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -97,12 +97,24 @@ xfs_bulkstat_one_int(
 	buf->bs_uid = dic->di_uid;
 	buf->bs_gid = dic->di_gid;
 	buf->bs_size = dic->di_size;
+
+	/* timestamp epochs are emitted only when configured */
 	buf->bs_atime.tv_sec = dic->di_atime.t_sec;
 	buf->bs_atime.tv_nsec = dic->di_atime.t_nsec;
 	buf->bs_mtime.tv_sec = dic->di_mtime.t_sec;
 	buf->bs_mtime.tv_nsec = dic->di_mtime.t_nsec;
 	buf->bs_ctime.tv_sec = dic->di_ctime.t_sec;
 	buf->bs_ctime.tv_nsec = dic->di_ctime.t_nsec;
+	if (xfs_sb_version_hasepoch(&mp->m_sb)) {
+		buf->bs_atime_epoch = dic->di_atime_epoch;
+		buf->bs_mtime_epoch = dic->di_mtime_epoch;
+		buf->bs_ctime_epoch = dic->di_ctime_epoch;
+	} else {
+		buf->bs_atime_epoch = 0;
+		buf->bs_mtime_epoch = 0;
+		buf->bs_ctime_epoch = 0;
+	}
+
 	buf->bs_xflags = xfs_ip2xflags(ip);
 	buf->bs_extsize = dic->di_extsize << mp->m_sb.sb_blocklog;
 	buf->bs_extents = dic->di_nextents;
diff --git a/fs/xfs/xfs_log_format.h b/fs/xfs/xfs_log_format.h
index f0969c7..abac6ad 100644
--- a/fs/xfs/xfs_log_format.h
+++ b/fs/xfs/xfs_log_format.h
@@ -374,6 +374,10 @@ typedef struct xfs_icdinode {
 	__uint32_t	di_nlink;	/* number of links to file */
 	__uint16_t	di_projid_lo;	/* lower part of owner's project id */
 	__uint16_t	di_projid_hi;	/* higher part of owner's project id */
+	__uint8_t	di_atime_epoch;	/* access time epoch */
+	__uint8_t	di_mtime_epoch;	/* modify time epoch */
+	__uint8_t	di_ctime_epoch;	/* change time epoch */
+	__uint8_t	di_crtime_epoch;/* create time epoch */
 	__uint8_t	di_pad[6];	/* unused, zeroed space */
 	__uint16_t	di_flushiter;	/* incremented on flush */
 	xfs_ictimestamp_t di_atime;	/* time last accessed */
diff --git a/fs/xfs/xfs_sb.h b/fs/xfs/xfs_sb.h
index c43c2d6..1b3ccd8 100644
--- a/fs/xfs/xfs_sb.h
+++ b/fs/xfs/xfs_sb.h
@@ -509,8 +509,11 @@ xfs_sb_has_ro_compat_feature(
 }
 
 #define XFS_SB_FEAT_INCOMPAT_FTYPE	(1 << 0)	/* filetype in dirent */
+#define XFS_SB_FEAT_INCOMPAT_EPOCH	(1 << 1)	/* Time beyond 2038 */
 #define XFS_SB_FEAT_INCOMPAT_ALL \
-		(XFS_SB_FEAT_INCOMPAT_FTYPE)
+		(XFS_SB_FEAT_INCOMPAT_FTYPE | \
+		 XFS_SB_FEAT_INCOMPAT_EPOCH | \
+		 0)
 
 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool
@@ -558,6 +561,13 @@ static inline int xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FINOBT);
 }
 
+static inline int xfs_sb_version_hasepoch(xfs_sb_t *sbp)
+{
+	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
+		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_EPOCH);
+}
+
+
 /*
  * end of superblock version macros
  */
diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index 50c3f56..cdb4d86 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -70,7 +70,7 @@ xfs_trans_ichgtime(
 	int			flags)
 {
 	struct inode		*inode = VFS_I(ip);
-	timespec_t tv;
+	struct timespec tv;
 
 	ASSERT(tp);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));

* Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 0:28 ` Dave Chinner
@ 2014-06-02 11:35 ` Roger Willcocks
2014-06-02 11:43 ` Arnd Bergmann
1 sibling, 0 replies; 71+ messages in thread
From: Roger Willcocks @ 2014-06-02 11:35 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-arch, Arnd Bergmann, linux-kernel, geert, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, lftan, tglx, xfs, joseph

On Mon, 2014-06-02 at 10:28 +1000, Dave Chinner wrote:
>
> The 32 bit second counters in timestamps are too small to represent
> time beyond the unix epoch (jan 2038) correctly. Extend the on-disk
> format for a timestamp to include an 8-bit epoch counter so that we
> can extend time for up to 255 Unix epochs. This should be good for
> representing timestamps from 1970 to somewhere around 19,000 A.D....
>

I assume you're using an 'epoch' variable and not simply using the
padding byte as an eight-bit prefix to the existing 32-bit counter
because the existing counter is signed ?

For long term sanity it might make more sense for the eight-bit value to
be a simple (sign-extended) prefix from 1970. So if the feature bit is
set it's a 40-bit signed time, which is good for 1970 +/- 17400 years or
so.

--
Roger
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-02 0:28 ` Dave Chinner 2014-06-02 11:35 ` Roger Willcocks @ 2014-06-02 11:43 ` Arnd Bergmann 2014-06-03 0:32 ` Dave Chinner 1 sibling, 1 reply; 71+ messages in thread From: Arnd Bergmann @ 2014-06-02 11:43 UTC (permalink / raw) To: Dave Chinner Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph On Monday 02 June 2014 10:28:22 Dave Chinner wrote: > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote: > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: > > > In my list at http://kernelnewbies.org/y2038, I found that almost > > > all file systems at least times until 2106, because they treat > > > the on-disk value as unsigned on 64-bit systems, or they use > > > a completely different representation. My guess is that somebody > > > earlier spent a lot of work on making that happen. > > > > > > The exceptions are: > > > > > > * exofs uses signed values, which can probably be changed to be > > > consistent with the others. > > > * isofs has a bug that limits it until 2027 on architectures with > > > a signed 'char' type (otherwise it's 2155). > > > * udf can represent times for many thousands of years through a > > > 16-bit year representation, but the code to convert to epoch > > > uses a const array that ends at 2038. > > > * afs uses signed seconds and can probably be fixed > > > * coda relies on user space time representation getting passed > > > through an ioctl. > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds, > > > where they really use signed. > > > > > > I was confused about XFS since I didn't noticed that there are > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected > > > XFS to also use the 1970-2106 time range on 64-bit systems today. > > > > You've missed an awful lot more than just the implications for the > > core kernel code. 
> > > > There's a good chance such changes propagate to APIs elsewhere in > > the filesystems, because something you haven't realised is that XFS > > effectively exposes the on-disk timestamp format directly to > > userspace via the bulkstat interface (see struct xfs_bstat). It also > > affects the XFS open-by-handle ioctl and the swap extent ioctl used > > by the online defragmenter. I really didn't look at them at all, as ioctl is very late on my mental list of things to change. I do realize that a lot of drivers and file systems do have ioctls that pass time values and we need to address them one by one. I just looked at the ioctls you mentioned but don't see how open-by-handle is affected by this. Can you point me to what you mean? > Just to put that in context, here's the kernel patch to add extended > epoch support to XFS. It's completely untested as I haven't done any > userspace code changes to enable the feature. However, it should > give you an indication of how far the simple act of changing the > kernel time representation spread through the filesystem. This does > not include any of the VFS infrastructure to specifying the range of > supported timestamps. It survives some smoke testing, but dies when > the online defragmenter starts using the bulkstat and swap extent > ioctls (the assert in xfs_inode_time_from_epoch() fires), so I > probably don't have that all sorted correctly yet... > > To test extended epoch support, however, I need to some fstests that > define and validate the behaviour of the new syscalls - until we get > those we can't validate that the filesystem follows the spec > properly. I also suspect we are going to need an interface to query > the supported range of timestamps from a filesystem so that we can > test boundary conditions in an automated fashion.... Thanks a lot for having an initial look at this yourself! I'd still consider the two problems largely orthogonal. 
My patch set (at least with the 64-bit tv_sec) just gets 32-bit kernels
to behave more like 64-bit kernels regarding inode time stamps, which
does impact all the file systems that use a 64-bit time or the NFS
unsigned epoch (1970-2106), while your patch extends the file
system internal epoch (1901-2038 for XFS) so it can be used by
anything that knows how to handle larger than 32-bit second values
(either 64-bit kernel or 32-bit with inode_time patch).

> diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h
> index 623bbe8..79f94722 100644
> --- a/fs/xfs/xfs_dinode.h
> +++ b/fs/xfs/xfs_dinode.h
> @@ -21,11 +21,53 @@
>  #define	XFS_DINODE_MAGIC		0x494e	/* 'IN' */
>  #define XFS_DINODE_GOOD_VERSION(v)	((v) >= 1 && (v) <= 3)
>
> +/*
> + * Inode timestamps get more complex when we consider supporting times beyond
> + * the standard unix epoch of Jan 2038. The struct xfs_timestamp cannot support
> + * more than a single extension by playing sign games, and that is still not
> + * reliable. We also can't extend the timestamp structure because there is no
> + * free space around them in the on-disk inode.
> + *
> + * Hence the simplest thing to do is to add an epoch counter for each timestamp
> + * in the inode. This can be a single byte for each timestamp and make use of
> + * a hole we currently pad. This gives us another 255 epochs range for the
> + * timestamps, but requires a superblock feature bit to indicate that these
> + * fields have meaning and can be non-zero.

Nice trick!
> +static inline __uint8_t > +xfs_timestamp_epoch( > + struct timespec *time) > +{ > + /* will be zero until the extended struct inode_time is introduced */ > + return 0; > +} > + > +static inline __int32_t > +xfs_timestamp_sec( > + struct timespec *time) > +{ > + return time->tv_sec; > +} > + > +static inline __kernel_time_t > +xfs_inode_time_from_epoch( > + __uint8_t epoch, > + __int32_t seconds) > +{ > + /* need to handle non-zero epoch when struct inode_time is introduced */ > + ASSERT(epoch == 0); > + return seconds; > +} Why don't you already implement epoch conversion for 64-bit kernels that are able to represent the time today? This is how ext4 does it (I mean the sizeof() trick, not the bit stuffing they do): static inline __le32 ext4_encode_extra_time(struct inode_time *time) { return cpu_to_le32((sizeof(time->tv_sec) > 4 ? (time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) | ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK)); } static inline void ext4_decode_extra_time(struct inode_time *time, __le32 extra) { if (sizeof(time->tv_sec) > 4) time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) << 32; time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS; } I guess if there is general agreement on introducing 'struct inode_time', we can skip that intermediate step. > @@ -509,8 +509,11 @@ xfs_sb_has_ro_compat_feature( > } > > #define XFS_SB_FEAT_INCOMPAT_FTYPE (1 << 0) /* filetype in dirent */ > +#define XFS_SB_FEAT_INCOMPAT_EPOCH (1 << 1) /* Time beyond 2038 */ > #define XFS_SB_FEAT_INCOMPAT_ALL \ > - (XFS_SB_FEAT_INCOMPAT_FTYPE) > + (XFS_SB_FEAT_INCOMPAT_FTYPE | \ > + XFS_SB_FEAT_INCOMPAT_EPOCH | \ > + 0) > > #define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL How does this flag get set? Do you have to manually change it in the superblock? 
Since most of the time I'd suspect you wouldn't actually use it for the
foreseeable future, would it make sense to have a mount option that
allows it to be set, but doesn't actually change the superblock until
the first inode gets written with a nonzero epoch?

That way, you'd still be able to mount it with an older kernel but
also be forward compatible with time moving on.

	Arnd
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-02 11:43 ` Arnd Bergmann @ 2014-06-03 0:32 ` Dave Chinner 2014-06-03 7:33 ` Arnd Bergmann 0 siblings, 1 reply; 71+ messages in thread From: Dave Chinner @ 2014-06-03 0:32 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote: > On Monday 02 June 2014 10:28:22 Dave Chinner wrote: > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote: > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: > > > > In my list at http://kernelnewbies.org/y2038, I found that almost > > > > all file systems at least times until 2106, because they treat > > > > the on-disk value as unsigned on 64-bit systems, or they use > > > > a completely different representation. My guess is that somebody > > > > earlier spent a lot of work on making that happen. > > > > > > > > The exceptions are: > > > > > > > > * exofs uses signed values, which can probably be changed to be > > > > consistent with the others. > > > > * isofs has a bug that limits it until 2027 on architectures with > > > > a signed 'char' type (otherwise it's 2155). > > > > * udf can represent times for many thousands of years through a > > > > 16-bit year representation, but the code to convert to epoch > > > > uses a const array that ends at 2038. > > > > * afs uses signed seconds and can probably be fixed > > > > * coda relies on user space time representation getting passed > > > > through an ioctl. > > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds, > > > > where they really use signed. > > > > > > > > I was confused about XFS since I didn't noticed that there are > > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected > > > > XFS to also use the 1970-2106 time range on 64-bit systems today. 
> > > > > > You've missed an awful lot more than just the implications for the > > > core kernel code. > > > > > > There's a good chance such changes propagate to APIs elsewhere in > > > the filesystems, because something you haven't realised is that XFS > > > effectively exposes the on-disk timestamp format directly to > > > userspace via the bulkstat interface (see struct xfs_bstat). It also > > > affects the XFS open-by-handle ioctl and the swap extent ioctl used > > > by the online defragmenter. > > I really didn't look at them at all, as ioctl is very late on my > mental list of things to change. I do realize that a lot of drivers > and file systems do have ioctls that pass time values and we need to > address them one by one. > > I just looked at the ioctls you mentioned but don't see how open-by-handle > is affected by this. Can you point me to what you mean? Sorry, I misremembered how some of the XFS open-by-handle code works in userspace (XFS has a pretty rich open-by-handle ioctl() interface that predates the kernel syscalls by at least 10 years). Basically there is code in userspace that uses the information returned from bulkstat to construct file handles to pass to the open-by-handle ioctls. xfs_fsr then uses the combination of open-by-handle from the bulkstat output and the bulkstat output to feed into the swap extent ioctls.... i.e. the filesystem's idea of what time is is passed to userspace as an opaque cookie in this case, but it is not used directly by the open-by-handle interfaces like I implied it was. > > Just to put that in context, here's the kernel patch to add extended > > epoch support to XFS. It's completely untested as I haven't done any > > userspace code changes to enable the feature. However, it should > > give you an indication of how far the simple act of changing the > > kernel time representation spread through the filesystem. 
> > This does not include any of the VFS infrastructure to specifying
> > the range of supported timestamps. It survives some smoke testing,
> > but dies when the online defragmenter starts using the bulkstat and
> > swap extent ioctls (the assert in xfs_inode_time_from_epoch()
> > fires), so I probably don't have that all sorted correctly yet...
> >
> > To test extended epoch support, however, I need to some fstests that
> > define and validate the behaviour of the new syscalls - until we get
> > those we can't validate that the filesystem follows the spec
> > properly. I also suspect we are going to need an interface to query
> > the supported range of timestamps from a filesystem so that we can
> > test boundary conditions in an automated fashion....
>
> Thanks a lot for having an initial look at this yourself!
>
> I'd still consider the two problems largely orthogonal.

Depends how you look at it. You can't extend the kernel's idea of
time without permanent storage being able to specify the supported
bounds - that's a non-negotiable aspect of introducing extended
epoch timestamp support. The actual addition of extended timestamp
support to each individual filesystem is orthogonal to the
introduction of the struct inode_time, but doing this addition
properly is dependent on the VFS infrastructure being there in the
first place.

> My patch set
> (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> more like 64-bit kernels regarding inode time stamps, which does
> impact all the file systems that the a 64-bit time or the NFS
> unsigned epoch (1970-2106), while your patch extends the file
> system internal epoch (1901-2038 for XFS) so it can be used by
> anything that knows how to handle larger than 32-bit second values
> (either 64-bit kernel or 32-bit with inode_time patch).

Right, but the issue is that 64 bit second counters are broken right
now because most filesystems can't support more than 32 bit values.
So it doesn't matter whether it's 32 bit or 64 bit machines, just adding explicit support for >32 bit second counters without doing anything else just extends that brokenness into the indefinite future. If we don't fix it now (i.e in the new user API and supporting infrastructure), then we'll *never be able to fix it* and we'll be stuck with timestamps that do really weird things when you pass arbitrary future dates to the kernel. > > diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h > > index 623bbe8..79f94722 100644 > > --- a/fs/xfs/xfs_dinode.h > > +++ b/fs/xfs/xfs_dinode.h > > @@ -21,11 +21,53 @@ > > #define XFS_DINODE_MAGIC 0x494e /* 'IN' */ > > #define XFS_DINODE_GOOD_VERSION(v) ((v) >= 1 && (v) <= 3) > > > > +/* > > + * Inode timestamps get more complex when we consider supporting times beyond > > + * the standard unix epoch of Jan 2038. The struct xfs_timestamp cannot support > > + * more than a single extension by playing sign games, and that is still not > > + * reliable. We also can't extend the timestamp structure because there is no > > + * free space around them in the on-disk inode. > > + * > > + * Hence the simplest thing to do is to add an epoch counter for each timestamp > > + * in the inode. This can be a single byte for each timestamp and make use of > > + * a hole we currently pad. This gives us another 255 epochs range for the > > + * timestamps, but requires a superblock feature bit to indicate that these > > + * fields have meaning and can be non-zero. > > Nice trick! It's a pretty common way of extending the range of a variable for on-disk formats. The on-disk format is completely disconnected from the in-memory representation, so it's "easy" to play games like this within the on-disk format. If you look closely at ext4, you'll see all the lo/hi variables where extension of 16->32 bits or 32->48 bits has occurred from the ext2/3 variable formats... 
;) > > > +static inline __uint8_t > > +xfs_timestamp_epoch( > > + struct timespec *time) > > +{ > > + /* will be zero until the extended struct inode_time is introduced */ > > + return 0; > > +} > > + > > +static inline __int32_t > > +xfs_timestamp_sec( > > + struct timespec *time) > > +{ > > + return time->tv_sec; > > +} > > + > > +static inline __kernel_time_t > > +xfs_inode_time_from_epoch( > > + __uint8_t epoch, > > + __int32_t seconds) > > +{ > > + /* need to handle non-zero epoch when struct inode_time is introduced */ > > + ASSERT(epoch == 0); > > + return seconds; > > +} > > Why don't you already implement epoch conversion for 64-bit kernels that > are able to represent the time today? Because I wasn't trying to solve the entire problem, just demonstrate the infrastructure needed to support extended timestamps..... > This is how ext4 does it (I mean > the sizeof() trick, not the bit stuffing they do): .... > I guess if there is general agreement on introducing 'struct inode_time', > we can skip that intermediate step. Also, I don't like the concept of having filesystems that will work on 64 bit but not 32 bit machines. Over the past 10 years, we've managed to remove most of those differences from the VFS and XFS, so adding new distinctions between 32/64 bit machines is not the direction I want to head in. As it is, I'm expecting to do this only after the struct inode_time and the superblock "time range" infrastructure have been added to the kernel and VFS. If that change is not made, then we've still only got 32 bit time.... 
> > @@ -509,8 +509,11 @@ xfs_sb_has_ro_compat_feature(
> >  }
> >
> >  #define XFS_SB_FEAT_INCOMPAT_FTYPE	(1 << 0)	/* filetype in dirent */
> > +#define XFS_SB_FEAT_INCOMPAT_EPOCH	(1 << 1)	/* Time beyond 2038 */
> >  #define XFS_SB_FEAT_INCOMPAT_ALL \
> > -		(XFS_SB_FEAT_INCOMPAT_FTYPE)
> > +		(XFS_SB_FEAT_INCOMPAT_FTYPE | \
> > +		 XFS_SB_FEAT_INCOMPAT_EPOCH | \
> > +		 0)
> >
> >  #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
>
> How does this flag get set?

mkfs.xfs

> Do you have to manually change it in the
> superblock? Since most of the time I'd suspect you wouldn't actually
> use it for the foreseeable future, would it make sense to have a mount
> option that allows it to be set, but doesn't actually change the
> superblock until the first inode gets written with a nonzero epoch?

Yes, we could set the flag on the first timestamp that goes beyond
the current epoch, but that has two problems:

	1. filesystem silently becomes incompatible with older
	kernels so failed upgrade rollbacks become problematic; and

	2. It adds unnecessary complexity, as this will end up being
	the default behaviour for all new filesystems within a year.
	Then we end up with a mount option and conversion functions
	that never get used but we have to support for years....

> That way, you'd still be able to mount it with an older kernel but
> also be forward compatible with time moving on.

We've got plenty of time to roll this out so I don't see any need
for putting in place temporary support mechanisms that unnecessarily
complicate the code.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
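The rollback concern in point 1 follows from how an "incompat" feature
mask is checked. The sketch below is a simplified user-space model, not
the XFS code: the two bit values mirror the quoted patch, but
can_mount() and set_epoch_on_first_use() are hypothetical helpers that
only illustrate the idea of refusing any superblock with bits the
running kernel does not know.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirror the quoted patch; everything else is illustrative. */
#define FEAT_INCOMPAT_FTYPE	(UINT32_C(1) << 0)
#define FEAT_INCOMPAT_EPOCH	(UINT32_C(1) << 1)

/* Feature sets understood by a kernel without and with the epoch patch. */
#define OLD_KERNEL_KNOWN	(FEAT_INCOMPAT_FTYPE)
#define NEW_KERNEL_KNOWN	(FEAT_INCOMPAT_FTYPE | FEAT_INCOMPAT_EPOCH)

static_assert((OLD_KERNEL_KNOWN & FEAT_INCOMPAT_EPOCH) == 0,
	      "the old kernel must not know the epoch bit");

/* A kernel may mount only if every incompat bit on disk is known to it. */
static bool can_mount(uint32_t sb_incompat, uint32_t known)
{
	return (sb_incompat & ~known) == 0;
}

/*
 * Model of the "set the flag lazily" proposal: the bit is turned on the
 * first time a timestamp with a nonzero epoch is written.
 */
static uint32_t set_epoch_on_first_use(uint32_t sb_incompat, uint8_t epoch)
{
	return epoch ? (sb_incompat | FEAT_INCOMPAT_EPOCH) : sb_incompat;
}
```

In this model a filesystem stays mountable by the old kernel until the
first nonzero epoch is written, and from that write onwards the old
kernel rejects it. That transition happens without any administrator
action, which is the silent incompatibility described above.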
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-03 0:32 ` Dave Chinner @ 2014-06-03 7:33 ` Arnd Bergmann 2014-06-03 8:41 ` Dave Chinner 0 siblings, 1 reply; 71+ messages in thread From: Arnd Bergmann @ 2014-06-03 7:33 UTC (permalink / raw) To: Dave Chinner Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote: > On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote: > > On Monday 02 June 2014 10:28:22 Dave Chinner wrote: > > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote: > > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: > > > > > In my list at http://kernelnewbies.org/y2038, I found that almost > > > > > all file systems at least times until 2106, because they treat > > > > > the on-disk value as unsigned on 64-bit systems, or they use > > > > > a completely different representation. My guess is that somebody > > > > > earlier spent a lot of work on making that happen. > > > > > > > > > > The exceptions are: > > > > > > > > > > * exofs uses signed values, which can probably be changed to be > > > > > consistent with the others. > > > > > * isofs has a bug that limits it until 2027 on architectures with > > > > > a signed 'char' type (otherwise it's 2155). > > > > > * udf can represent times for many thousands of years through a > > > > > 16-bit year representation, but the code to convert to epoch > > > > > uses a const array that ends at 2038. > > > > > * afs uses signed seconds and can probably be fixed > > > > > * coda relies on user space time representation getting passed > > > > > through an ioctl. > > > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds, > > > > > where they really use signed. 
> > > > > > > > > > I was confused about XFS since I didn't noticed that there are > > > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected > > > > > XFS to also use the 1970-2106 time range on 64-bit systems today. > > > > > > > > You've missed an awful lot more than just the implications for the > > > > core kernel code. > > > > > > > > There's a good chance such changes propagate to APIs elsewhere in > > > > the filesystems, because something you haven't realised is that XFS > > > > effectively exposes the on-disk timestamp format directly to > > > > userspace via the bulkstat interface (see struct xfs_bstat). It also > > > > affects the XFS open-by-handle ioctl and the swap extent ioctl used > > > > by the online defragmenter. > > > > I really didn't look at them at all, as ioctl is very late on my > > mental list of things to change. I do realize that a lot of drivers > > and file systems do have ioctls that pass time values and we need to > > address them one by one. > > > > I just looked at the ioctls you mentioned but don't see how open-by-handle > > is affected by this. Can you point me to what you mean? > > Sorry, I misremembered how some of the XFS open-by-handle code works > in userspace (XFS has a pretty rich open-by-handle ioctl() interface > that predates the kernel syscalls by at least 10 years). Basically > there is code in userspace that uses the information returned from > bulkstat to construct file handles to pass to the open-by-handle > ioctls. xfs_fsr then uses the combination of open-by-handle from the > bulkstat output and the bulkstat output to feed into the swap extent > ioctls.... > > i.e. the filesystem's idea of what time is is passed to userspace as > an opaque cookie in this case, but it is not used directly by the > open-by-handle interfaces like I implied it was. Ok, I see. 
> > My patch set > > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave > > more like 64-bit kernels regarding inode time stamps, which does > > impact all the file systems that the a 64-bit time or the NFS > > unsigned epoch (1970-2106), while your patch extends the file > > system internal epoch (1901-2038 for XFS) so it can be used by > > anything that knows how to handle larger than 32-bit second values > > (either 64-bit kernel or 32-bit with inode_time patch). > > Right, but the issue is that 64 bit second counters are broken right > now because most filesystems can't support more than 32 bit values. > So it doesn't matter whether it's 32 bit or 64 bit machines, just > adding explicit support for >32 bit second counters without doing > anything else just extends that brokenness into the indefinite > future. Of course, "most filesystems" are obsolete, and most of the modern file systems already support >32 bit timestamps: ext4, btrfs, cifs, f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else except xfs, ext2/3 and exofs uses the nfsv3 interpretation on 64-bit systems, which interprets time stamps with the high bit set as years 2038-2106 rather than 1903-1969. > If we don't fix it now (i.e in the new user API and supporting > infrastructure), then we'll *never be able to fix it* and we'll be > stuck with timestamps that do really weird things when you pass > arbitrary future dates to the kernel. We already have that. I agree it's fixable and we should fix it, but I don't see how this is different from what we had 20 years ago when Linux on Alpha first introduced a 64-bit time_t. It's been this way on every 64-bit Linux system since. > > This is how ext4 does it (I mean > > the sizeof() trick, not the bit stuffing they do): > .... > > I guess if there is general agreement on introducing 'struct inode_time', > > we can skip that intermediate step. 
>
> Also, I don't like the concept of having filesystems that will work
> on 64 bit but not 32 bit machines. Over the past 10 years, we've
> managed to remove most of those differences from the VFS and XFS,
> so adding new distinctions between 32/64 bit machines is not the
> direction I want to head in.
>
> As it is, I'm expecting to do this only after the struct inode_time
> and the superblock "time range" infrastructure have been added to
> the kernel and VFS. If that change is not made, then we've still
> only got 32 bit time....

Ok.

> > Do you have to manually change it in the
> > superblock? Since most of the time I'd suspect you wouldn't actually
> > use it for the foreseeable future, would it make sense to have a mount
> > option that allows it to be set, but doesn't actually change the
> > superblock until the first inode gets written with a nonzero epoch?
>
> Yes, we could set the flag on the first timestamp that goes beyond
> the current epoch, but that has two problems:
>
> 	1. filesystem silently becomes incompatible with older
> 	kernels so failed upgrade rollbacks become problematic; and
>
> 	2. It adds unecessary complexity, as this will end up being
> 	the default behaviour for all new filesystems within a year.
> 	Then we end up with a mount option and conversion functions
> 	that never get used but we have to support for years....
>
> > That way, you'd still be able to mount it with an older kernel but
> > also be forward compatible with time moving on.
>
> We've got plenty of time to roll this out so I don't see any need
> for putting in place temporary support mechanisms that unnecessarily
> complicate the code.

Ok, fair enough.

	Arnd
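The two readings of a 32-bit on-disk second count discussed in this
message can be shown with a small user-space sketch. The helper names
are made up for illustration; only the arithmetic is the point. A raw
value with the high bit set lands between December 1901 and the end of
1969 when read as signed, and between January 2038 and February 2106
when read as unsigned.

```c
#include <assert.h>
#include <stdint.h>

#define TWO_POW_32	INT64_C(4294967296)

/* Signed reading: high bit set means a time before 1970. */
static int64_t secs_if_signed(uint32_t raw)
{
	return raw >= UINT32_C(0x80000000)
		? (int64_t)raw - TWO_POW_32
		: (int64_t)raw;
}

/* Unsigned reading: every value is a time at or after 1970. */
static int64_t secs_if_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}

/* The two readings are either identical or exactly 2^32 seconds apart. */
static void check_readings(uint32_t raw)
{
	int64_t diff = secs_if_unsigned(raw) - secs_if_signed(raw);

	assert(diff == 0 || diff == TWO_POW_32);
}
```

For raw = 0x80000000 the signed reading is -2147483648 and the unsigned
reading is 2147483648; for any raw value below 0x80000000 both readings
agree, which is why the difference stays invisible until 2038.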
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-03 7:33 ` Arnd Bergmann @ 2014-06-03 8:41 ` Dave Chinner 2014-06-03 9:16 ` Arnd Bergmann 0 siblings, 1 reply; 71+ messages in thread From: Dave Chinner @ 2014-06-03 8:41 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph On Tue, Jun 03, 2014 at 09:33:36AM +0200, Arnd Bergmann wrote: > On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote: > > On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote: > > > On Monday 02 June 2014 10:28:22 Dave Chinner wrote: > > > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote: > > > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: > > > My patch set > > > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave > > > more like 64-bit kernels regarding inode time stamps, which does > > > impact all the file systems that the a 64-bit time or the NFS > > > unsigned epoch (1970-2106), while your patch extends the file > > > system internal epoch (1901-2038 for XFS) so it can be used by > > > anything that knows how to handle larger than 32-bit second values > > > (either 64-bit kernel or 32-bit with inode_time patch). > > > > Right, but the issue is that 64 bit second counters are broken right > > now because most filesystems can't support more than 32 bit values. > > So it doesn't matter whether it's 32 bit or 64 bit machines, just > > adding explicit support for >32 bit second counters without doing > > anything else just extends that brokenness into the indefinite > > future. > > Of course, "most filesystems" are obsolete, and most of the modern > file systems already support >32 bit timestamps: ext4, btrfs, cifs, > f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. 
Everything else
> except xfs, ext2/3 and exofs uses the nfsv3 interpretation on
> 64-bit systems, which interprets time stamps with the high bit
> set as years 2038-2106 rather than 1903-1969.

I'm not sure that's an entirely correct representation - the
remainder of the 32 bit-only timestamp filesystems don't actively
interpret the time stamp at all - it's just an opaque 32 bit value.
Hence the interpretation of the value is dependent on whether the
kernel treats it as signed or unsigned....

> > infrastructure), then we'll *never be able to fix it* and we'll be
> > stuck with timestamps that do really weird things when you pass
> > arbitrary future dates to the kernel.
>
> We already have that. I agree it's fixable and we should fix it,
> but I don't see how this is different from what we had 20 years
> ago when Linux on Alpha first introduced a 64-bit time_t. It's
> been this way on every 64-bit Linux system since.

I see it differently: we've got 20 years more experience than when
the 64 bit time_t was introduced. That experience tells us that
best practices for API design are to range check every input to
prevent unintended side effects from occurring due to out-of-range
data....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 71+ messages in thread
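[Editorial sketch] The range check Dave argues for can be illustrated in
a few lines of C. This is only a sketch under stated assumptions: struct
fs_time_range, fs_time_in_range() and fs_time_clamp() are hypothetical
names invented for this note, not the superblock "time range"
infrastructure mentioned in the thread, and whether an out-of-range
value should be rejected or clamped is not settled in this part of the
discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-filesystem limits; not an existing kernel structure. */
struct fs_time_range {
	int64_t min_sec;	/* earliest second the on-disk format can hold */
	int64_t max_sec;	/* latest second the on-disk format can hold */
};

/* Signed 32-bit seconds: the 1901-2038 epoch described for XFS. */
static const struct fs_time_range range_s32 = { INT32_MIN, INT32_MAX };

/* Unsigned 32-bit seconds: the 1970-2106 NFS-style interpretation. */
static const struct fs_time_range range_u32 = { 0, UINT32_MAX };

/* Return true if sec can be stored without wrapping around. */
static bool fs_time_in_range(const struct fs_time_range *r, int64_t sec)
{
	return sec >= r->min_sec && sec <= r->max_sec;
}

/* Alternative policy: saturate at the nearest representable second. */
static int64_t fs_time_clamp(const struct fs_time_range *r, int64_t sec)
{
	if (sec < r->min_sec)
		return r->min_sec;
	if (sec > r->max_sec)
		return r->max_sec;
	return sec;
}
```

With a check like this at the point where a user-supplied time stamp
enters the filesystem, a date past 2038 passed to a signed 32-bit format
is either refused or pinned to the last valid second instead of silently
wrapping to a date in the past.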
* Re: [RFC 11/32] xfs: convert to struct inode_time 2014-06-03 8:41 ` Dave Chinner @ 2014-06-03 9:16 ` Arnd Bergmann 0 siblings, 0 replies; 71+ messages in thread From: Arnd Bergmann @ 2014-06-03 9:16 UTC (permalink / raw) To: Dave Chinner Cc: linux-arch, linux-kernel, lftan, hch, john.stultz, H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph On Tuesday 03 June 2014 18:41:30 Dave Chinner wrote: > On Tue, Jun 03, 2014 at 09:33:36AM +0200, Arnd Bergmann wrote: > > On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote: > > > On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote: > > > > On Monday 02 June 2014 10:28:22 Dave Chinner wrote: > > > > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote: > > > > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: > > > > My patch set > > > > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave > > > > more like 64-bit kernels regarding inode time stamps, which does > > > > impact all the file systems that the a 64-bit time or the NFS > > > > unsigned epoch (1970-2106), while your patch extends the file > > > > system internal epoch (1901-2038 for XFS) so it can be used by > > > > anything that knows how to handle larger than 32-bit second values > > > > (either 64-bit kernel or 32-bit with inode_time patch). > > > > > > Right, but the issue is that 64 bit second counters are broken right > > > now because most filesystems can't support more than 32 bit values. > > > So it doesn't matter whether it's 32 bit or 64 bit machines, just > > > adding explicit support for >32 bit second counters without doing > > > anything else just extends that brokenness into the indefinite > > > future. > > > > Of course, "most filesystems" are obsolete, and most of the modern > > file systems already support >32 bit timestamps: ext4, btrfs, cifs, > > f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. 
Everything else
> > except xfs, ext2/3 and exofs uses the nfsv3 interpretation on
> > 64-bit systems, which interprets time stamps with the high bit
> > set as years 2038-2106 rather than 1903-1969.
>
> I'm not sure that's an entirely correct representation - the
> remainder of the 32 bit-only timestamp filesystems don't actively
> interpret the time stamp at all - it's just an opaque 32 bit value.
> hence the interpretation of the value is dependent on whether the
> kernel treats it as signed or unsigned....

As I mentioned elsewhere in the thread, I don't think the way it's
handled is intentional, but it's definitely the file system code that
does the assignment to the timeval and decides on the interpretation,
doing either

    inode->i_mtime.tv_sec = (signed)le32_to_cpu(raw_inode.mtime);

or

    inode->i_mtime.tv_sec = le32_to_cpu(raw_inode.mtime);

  Arnd

^ permalink raw reply [flat|nested] 71+ messages in thread
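[Editorial sketch] The two assignments Arnd quotes differ only in
whether the 32-bit on-disk value is sign-extended or zero-extended when
it is widened to a 64-bit tv_sec. The stand-alone C below shows the
effect. It assumes the raw value has already been converted to CPU byte
order (the le32_to_cpu() step is left out), and decode_signed() and
decode_unsigned() are hypothetical names used only here.

```c
#include <stdint.h>

/*
 * Sign-extending decode, like the "(signed)" cast: values with the
 * high bit set become negative, i.e. dates before 1970.
 * Converting an out-of-range uint32_t to int32_t is
 * implementation-defined in older C standards; common compilers wrap.
 */
static int64_t decode_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/*
 * Zero-extending decode, like the plain assignment: values with the
 * high bit set stay positive, i.e. dates from 2038 up to 2106.
 */
static int64_t decode_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```

Both functions agree for every value up to 0x7fffffff, so the choice
only becomes visible for time stamps written after January 2038 or
before 1970, which is why the difference has gone unnoticed on 64-bit
systems.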
* Re: [RFC 00/32] making inode time stamps y2038 ready 2014-05-30 20:01 [RFC 00/32] making inode time stamps y2038 ready Arnd Bergmann 2014-05-30 20:01 ` [RFC 11/32] xfs: convert to struct inode_time Arnd Bergmann @ 2014-05-31 14:30 ` Vyacheslav Dubeyko 2014-06-03 12:21 ` Arnd Bergmann 2014-05-31 14:51 ` Richard Cochran 2014-06-02 13:52 ` Joseph S. Myers 3 siblings, 1 reply; 71+ messages in thread From: Vyacheslav Dubeyko @ 2014-05-31 14:30 UTC (permalink / raw) To: Arnd Bergmann Cc: hch, linux-mtd, hpa, logfs, linux-afs, joseph, linux-arch, linux-cifs, linux-scsi, ceph-devel, codalist, cluster-devel, coda, geert, linux-ext4, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs Hi Arnd, On Fri, 2014-05-30 at 22:01 +0200, Arnd Bergmann wrote: [snip] > > Arnd Bergmann (32): > fs: introduce new 'struct inode_time' > uapi: add struct __kernel_timespec{32,64} > fs: introduce sys_utimens64at > fs: introduce sys_newfstat64/sys_newfstatat64 > arch: hook up new stat and utimes syscalls > isofs: fix timestamps beyond 2027 > fs/nfs: convert to struct inode_time > fs/ceph: convert to 'struct inode_time' > fs/pstore: convert to struct inode_time > fs/coda: convert to struct inode_time > xfs: convert to struct inode_time > btrfs: convert to struct inode_time > ext3: convert to struct inode_time > ext4: convert to struct inode_time > cifs: convert to struct inode_time > ntfs: convert to struct inode_time > ubifs: convert to struct inode_time > ocfs2: convert to struct inode_time > fs/fat: convert to struct inode_time > afs: convert to struct inode_time > udf: convert to struct inode_time > fs: convert simple fs to inode_time > logfs: convert to struct inode_time > hfs, hfsplus: convert to struct inode_time > gfs2: convert to struct inode_time > reiserfs: convert to struct inode_time > jffs2: convert to struct inode_time > adfs: convert to struct inode_time > 
f2fs: convert to struct inode_time > fuse: convert to struct inode_time > scsi: fnic: use current_kernel_time() for timestamp > fs: use new inode_time definition unconditionally > By the way, what about NILFS2? Is NILFS2 ready for suggested approach without any changes? Thanks, Vyacheslav Dubeyko. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready 2014-05-31 14:30 ` [RFC 00/32] making inode time stamps y2038 ready Vyacheslav Dubeyko @ 2014-06-03 12:21 ` Arnd Bergmann 0 siblings, 0 replies; 71+ messages in thread From: Arnd Bergmann @ 2014-06-03 12:21 UTC (permalink / raw) To: Vyacheslav Dubeyko Cc: hch, linux-mtd, hpa, logfs, linux-afs, joseph, linux-arch, linux-cifs, linux-scsi, ceph-devel, codalist, cluster-devel, coda, geert, linux-ext4, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs On Saturday 31 May 2014 18:30:49 Vyacheslav Dubeyko wrote: > By the way, what about NILFS2? Is NILFS2 ready for suggested approach > without any changes? nilfs2 and a lot of other file systems don't need any changes for this, because they don't assign the inode time stamp fields to a 'struct timespec'. FWIW, nilfs2 uses a 64-bit seconds value, which is always safe and can represent the full range of user space timespec on all machines. Arnd _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready 2014-05-30 20:01 [RFC 00/32] making inode time stamps y2038 ready Arnd Bergmann 2014-05-30 20:01 ` [RFC 11/32] xfs: convert to struct inode_time Arnd Bergmann 2014-05-31 14:30 ` [RFC 00/32] making inode time stamps y2038 ready Vyacheslav Dubeyko @ 2014-05-31 14:51 ` Richard Cochran [not found] ` <6347520.8jMPlVsFjM@wuerfel> 2014-06-02 13:52 ` Joseph S. Myers 3 siblings, 1 reply; 71+ messages in thread From: Richard Cochran @ 2014-05-31 14:51 UTC (permalink / raw) To: Arnd Bergmann Cc: hch, linux-mtd, hpa, linux-f2fs-devel, ceph-devel, joseph, linux-arch, linux-cifs, linux-scsi, codalist, cluster-devel, coda, geert, linux-ext4, linux-afs, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel, logfs, linux-btrfs, linux-fsdevel, lftan, ocfs2-devel On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote: > > I picked this because it is a fairly isolated problem, as the > inode time stamps are rarely assigned to any other time values. > As a byproduct of this work, I documented for each of the file > systems we support how long the on-disk format can work[1]. Why are some of the time stamp expiration dates marked as "never"? Thanks, Richard _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
[parent not found: <6347520.8jMPlVsFjM@wuerfel>]
* Re: [RFC 00/32] making inode time stamps y2038 ready [not found] ` <6347520.8jMPlVsFjM@wuerfel> @ 2014-05-31 16:20 ` Geert Uytterhoeven 2014-05-31 18:22 ` Richard Cochran 2014-06-01 4:44 ` Richard Cochran 2 siblings, 0 replies; 71+ messages in thread From: Geert Uytterhoeven @ 2014-05-31 16:20 UTC (permalink / raw) To: Arnd Bergmann Cc: Christoph Hellwig, MTD Maling List, H. Peter Anvin, linux-f2fs-devel, ceph-devel, Joseph S. Myers, Linux-Arch, linux-cifs, scsi, codalist, cluster-devel, coda, linux-ext4@vger.kernel.org, linux-afs, fuse-devel, Richard Cochran, reiserfs-devel, xfs, John Stultz, Thomas Gleixner, open list:NFS, SUNRPC, AND..., linux-ntfs-dev, samba-technical, linux-kernel@vger.kernel.org, logfs, linux-btrfs, Linux FS Devel, Ley Foon Tan, ocfs2-devel On Sat, May 31, 2014 at 5:23 PM, Arnd Bergmann <arnd@arndb.de> wrote: > On Saturday 31 May 2014 16:51:15 Richard Cochran wrote: >> On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote: >> > I picked this because it is a fairly isolated problem, as the >> > inode time stamps are rarely assigned to any other time values. >> > As a byproduct of this work, I documented for each of the file >> > systems we support how long the on-disk format can work[1]. >> >> Why are some of the time stamp expiration dates marked as "never"? > > It's an approximation: > with 64-bit timestamps, you can represent close to 300 billion > years, which is way past the time that our planet can sustain > life of any form[1]. FWIW, the 48-bit second limit of befs marked never happens sooner than the 32-bit day limit of affs marked as Y11760870. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. 
-- Linus Torvalds _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
[not found] ` <6347520.8jMPlVsFjM@wuerfel>
2014-05-31 16:20 ` Geert Uytterhoeven
@ 2014-05-31 18:22 ` Richard Cochran
2014-05-31 19:34 ` H. Peter Anvin
2014-06-01 4:44 ` Richard Cochran
2 siblings, 1 reply; 71+ messages in thread
From: Richard Cochran @ 2014-05-31 18:22 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, hpa, linux-f2fs-devel, ceph-devel, joseph, linux-arch,
linux-cifs, linux-scsi, codalist, cluster-devel, coda, geert,
linux-ext4, linux-afs, fuse-devel, reiserfs-devel, xfs, john.stultz,
tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel, logfs,
linux-btrfs, linux-fsdevel, lftan, ocfs2-devel
On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
>
> It's an approximation:

(Approximately never ;)

> with 64-bit timestamps, you can represent close to 300 billion
> years, which is way past the time that our planet can sustain
> life of any form[1].

Did you mean 64 bits worth of seconds?

    2^64 / (3600*24*365) = 584,942,417,355

That is more than 300 billion years, and still, it is not quite the
same as "never".

In any case, that term is not too helpful in the comparison table,
IMHO. One could think that some sort of clever running count relative
to the last mount time was implied.

Thanks,
Richard

[1] You are forgetting the immortal robotic overlords.

^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-05-31 18:22 ` Richard Cochran
@ 2014-05-31 19:34 ` H. Peter Anvin
2014-06-01 4:46 ` Richard Cochran
0 siblings, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-05-31 19:34 UTC (permalink / raw)
To: Richard Cochran, Arnd Bergmann
Cc: hch, linux-mtd, linux-f2fs-devel, ceph-devel, joseph, linux-arch,
linux-cifs, linux-scsi, linux-afs, cluster-devel, coda, geert,
linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs, john.stultz,
tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel, logfs,
linux-btrfs, linux-fsdevel, lftan, ocfs2-devel
Typically they are using 64-bit signed seconds.

On May 31, 2014 11:22:37 AM PDT, Richard Cochran <richardcochran@gmail.com> wrote:
>On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
>>
>> It's an approximation:
>
>(Approximately never ;)
>
>> with 64-bit timestamps, you can represent close to 300 billion
>> years, which is way past the time that our planet can sustain
>> life of any form[1].
>
>Did you mean 64 bits worth of seconds?
>
>    2^64 / (3600*24*365) = 584,942,417,355
>
>That is more than 300 billion years, and still, it is not quite the
>same as "never".
>
>In any case, that term is not too helpful in the comparison table,
>IMHO. One could think that some sort of clever running count relative
>to the last mount time was implied.
>
>Thanks,
>Richard
>
>[1] You are forgetting the immortal robotic overlords.

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

^ permalink raw reply [flat|nested] 71+ messages in thread
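[Editorial sketch] The two figures in this exchange are consistent once
the sign bit is taken into account: an unsigned 64-bit second counter
gives Richard's 584 billion years, while the signed counter hpa mentions
gives about half of that in each direction, which matches Arnd's "close
to 300 billion". The C below reproduces the arithmetic with the same
365-day year used in the thread. The helper names years_unsigned() and
years_signed() are made up for this note.

```c
#include <stdint.h>

/* Same 365-day year as the calculation quoted in the thread. */
#define SECONDS_PER_YEAR (3600ULL * 24 * 365)

/* Whole years covered by an unsigned second counter of 1..64 bits. */
static uint64_t years_unsigned(unsigned bits)
{
	uint64_t max = (bits >= 64) ? UINT64_MAX : ((1ULL << bits) - 1);

	return max / SECONDS_PER_YEAR;
}

/* Whole years after the epoch for a signed counter: one bit is the sign. */
static uint64_t years_signed(unsigned bits)
{
	return years_unsigned(bits - 1);
}
```

For example, years_unsigned(64) is 584942417355 and years_signed(64) is
292471208677. The same helpers give 136 and 68 years for 32 bits, which
is where the 2106 and 2038 limits relative to 1970 come from.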
* Re: [RFC 00/32] making inode time stamps y2038 ready 2014-05-31 19:34 ` H. Peter Anvin @ 2014-06-01 4:46 ` Richard Cochran 0 siblings, 0 replies; 71+ messages in thread From: Richard Cochran @ 2014-06-01 4:46 UTC (permalink / raw) To: H. Peter Anvin Cc: hch, linux-mtd, linux-f2fs-devel, ceph-devel, joseph, linux-arch, linux-cifs, linux-scsi, linux-afs, cluster-devel, coda, geert, linux-ext4, codalist, Arnd Bergmann, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel, logfs, linux-btrfs, linux-fsdevel, lftan, ocfs2-devel On Sat, May 31, 2014 at 12:34:12PM -0700, H. Peter Anvin wrote: > Typically they are using 64-bit signed seconds. Okay, that is what I wanted to know. Thanks, Richard _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready [not found] ` <6347520.8jMPlVsFjM@wuerfel> 2014-05-31 16:20 ` Geert Uytterhoeven 2014-05-31 18:22 ` Richard Cochran @ 2014-06-01 4:44 ` Richard Cochran 2 siblings, 0 replies; 71+ messages in thread From: Richard Cochran @ 2014-06-01 4:44 UTC (permalink / raw) To: Arnd Bergmann Cc: hch, linux-mtd, hpa, linux-f2fs-devel, ceph-devel, joseph, linux-arch, linux-cifs, linux-scsi, codalist, cluster-devel, coda, geert, linux-ext4, linux-afs, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel, logfs, linux-btrfs, linux-fsdevel, lftan, ocfs2-devel On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote: > On Saturday 31 May 2014 16:51:15 Richard Cochran wrote: > > > > Why are some of the time stamp expiration dates marked as "never"? > > It's an approximation: Also, the term "never" might mean using arbitrarily long integers as in ASN.1. Thanks, Richard _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready 2014-05-30 20:01 [RFC 00/32] making inode time stamps y2038 ready Arnd Bergmann ` (2 preceding siblings ...) 2014-05-31 14:51 ` Richard Cochran @ 2014-06-02 13:52 ` Joseph S. Myers 2014-06-02 19:19 ` Arnd Bergmann 3 siblings, 1 reply; 71+ messages in thread From: Joseph S. Myers @ 2014-06-02 13:52 UTC (permalink / raw) To: Arnd Bergmann Cc: hch, linux-mtd, hpa, logfs, linux-afs, linux-arch, linux-cifs, linux-scsi, ceph-devel, codalist, cluster-devel, coda, geert, linux-ext4, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs On Fri, 30 May 2014, Arnd Bergmann wrote: > a) is this the right approach in general? The previous discussion > pointed this way, but there may be other opinions. The syscall changes seem like the sort of thing I'd expect, although patches adding new syscalls or otherwise affecting the kernel/userspace interface (as opposed to those relating to an individual filesystem) should go to linux-api as well as other relevant lists. -- Joseph S. Myers joseph@codesourcery.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready 2014-06-02 13:52 ` Joseph S. Myers @ 2014-06-02 19:19 ` Arnd Bergmann 2014-06-02 19:26 ` H. Peter Anvin 2014-06-02 21:02 ` Joseph S. Myers 0 siblings, 2 replies; 71+ messages in thread From: Arnd Bergmann @ 2014-06-02 19:19 UTC (permalink / raw) To: Joseph S. Myers Cc: hch, linux-mtd, hpa, logfs, linux-afs, linux-arch, linux-cifs, linux-scsi, ceph-devel, codalist, cluster-devel, coda, geert, linux-ext4, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote: > On Fri, 30 May 2014, Arnd Bergmann wrote: > > > a) is this the right approach in general? The previous discussion > > pointed this way, but there may be other opinions. > > The syscall changes seem like the sort of thing I'd expect, although > patches adding new syscalls or otherwise affecting the kernel/userspace > interface (as opposed to those relating to an individual filesystem) > should go to linux-api as well as other relevant lists. Ok. Sorry about missing linux-api, I confused it with linux-arch, which may not be as relevant here, except for the one question whether we actually want to have the new ABI on all 32-bit architectures or only as an opt-in for those that expect to stay around for another 24 years. Two more questions for you: - are you (and others) happy with adding this type of stat syscall (fstatat64/fstat64) as opposed to the more generic xstat that has been discussed in the past and that never made it through the bike- shedding discussion? 
- once we have enough buy-in from reviewers to merge this initial series, should we proceed to define rest of the syscall ABI (minus driver ioctls) so glibc and kernel can do the conversion on top of that, or should we better try to do things one syscall family at a time and actually get the kernel to handle them correctly internally? Arnd _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready 2014-06-02 19:19 ` Arnd Bergmann @ 2014-06-02 19:26 ` H. Peter Anvin 2014-06-02 19:55 ` Arnd Bergmann 2014-06-02 21:02 ` Joseph S. Myers 1 sibling, 1 reply; 71+ messages in thread From: H. Peter Anvin @ 2014-06-02 19:26 UTC (permalink / raw) To: Arnd Bergmann, Joseph S. Myers Cc: hch, linux-mtd, logfs, linux-afs, linux-arch, linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda, geert, linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs On 06/02/2014 12:19 PM, Arnd Bergmann wrote: > On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote: >> On Fri, 30 May 2014, Arnd Bergmann wrote: >> >>> a) is this the right approach in general? The previous discussion >>> pointed this way, but there may be other opinions. >> >> The syscall changes seem like the sort of thing I'd expect, although >> patches adding new syscalls or otherwise affecting the kernel/userspace >> interface (as opposed to those relating to an individual filesystem) >> should go to linux-api as well as other relevant lists. > > Ok. Sorry about missing linux-api, I confused it with linux-arch, which > may not be as relevant here, except for the one question whether we > actually want to have the new ABI on all 32-bit architectures or only > as an opt-in for those that expect to stay around for another 24 years. > > Two more questions for you: > > - are you (and others) happy with adding this type of stat syscall > (fstatat64/fstat64) as opposed to the more generic xstat that has > been discussed in the past and that never made it through the bike- > shedding discussion? 
> > - once we have enough buy-in from reviewers to merge this initial > series, should we proceed to define rest of the syscall ABI > (minus driver ioctls) so glibc and kernel can do the conversion > on top of that, or should we better try to do things one syscall > family at a time and actually get the kernel to handle them > correctly internally? > The bit that is really going to hurt is every single ioctl that uses a timespec. Honestly, though, I really don't understand the point with "struct inode_time". It seems like the zeroeth-order thing is to change the kernel internal version of struct timespec to have a 64-bit time... it isn't just about inodes. We then should be explicit about the external uses of time, and use accessors. -hpa _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready 2014-06-02 19:26 ` H. Peter Anvin @ 2014-06-02 19:55 ` Arnd Bergmann 2014-06-02 21:57 ` H. Peter Anvin 0 siblings, 1 reply; 71+ messages in thread From: Arnd Bergmann @ 2014-06-02 19:55 UTC (permalink / raw) To: H. Peter Anvin Cc: hch, linux-mtd, logfs, linux-afs, Joseph S. Myers, linux-arch, linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda, geert, linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs On Monday 02 June 2014 12:26:22 H. Peter Anvin wrote: > On 06/02/2014 12:19 PM, Arnd Bergmann wrote: > > On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote: > >> On Fri, 30 May 2014, Arnd Bergmann wrote: > >> > >>> a) is this the right approach in general? The previous discussion > >>> pointed this way, but there may be other opinions. > >> > >> The syscall changes seem like the sort of thing I'd expect, although > >> patches adding new syscalls or otherwise affecting the kernel/userspace > >> interface (as opposed to those relating to an individual filesystem) > >> should go to linux-api as well as other relevant lists. > > > > Ok. Sorry about missing linux-api, I confused it with linux-arch, which > > may not be as relevant here, except for the one question whether we > > actually want to have the new ABI on all 32-bit architectures or only > > as an opt-in for those that expect to stay around for another 24 years. > > > > Two more questions for you: > > > > - are you (and others) happy with adding this type of stat syscall > > (fstatat64/fstat64) as opposed to the more generic xstat that has > > been discussed in the past and that never made it through the bike- > > shedding discussion? 
> > > > - once we have enough buy-in from reviewers to merge this initial > > series, should we proceed to define rest of the syscall ABI > > (minus driver ioctls) so glibc and kernel can do the conversion > > on top of that, or should we better try to do things one syscall > > family at a time and actually get the kernel to handle them > > correctly internally? > > > > The bit that is really going to hurt is every single ioctl that uses a > timespec. > > Honestly, though, I really don't understand the point with "struct > inode_time". It seems like the zeroeth-order thing is to change the > kernel internal version of struct timespec to have a 64-bit time... it > isn't just about inodes. We then should be explicit about the external > uses of time, and use accessors. I picked these because they are fairly isolated from all other uses, in particular since inode times are the only things where we really care about times in the distant past or future (decades away as opposed to things that happened between boot and shutdown). For other kernel-internal uses, we may be better off migrating to a completely different representation, such as nanoseconds since boot or the architecture specific ktime_t, but this is really something to decide for each subsystem. I just tried building an arm32 kernel with a s64 time_t, and that failed horribly, I get linker errors for missing 64-bit divides and lots of warnings for code that expects time_t pointers to functions taking a 'long' or vice versa. I also think the only way to maintain ABI compatibility is to separate the internal uses from the interface, which means auditing all code in the end. Arnd _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready 2014-06-02 19:55 ` Arnd Bergmann @ 2014-06-02 21:57 ` H. Peter Anvin 2014-06-03 14:22 ` Arnd Bergmann 0 siblings, 1 reply; 71+ messages in thread From: H. Peter Anvin @ 2014-06-02 21:57 UTC (permalink / raw) To: Arnd Bergmann Cc: hch, linux-mtd, logfs, linux-afs, Joseph S. Myers, linux-arch, linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda, geert, linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs On 06/02/2014 12:55 PM, Arnd Bergmann wrote: >> >> The bit that is really going to hurt is every single ioctl that uses a >> timespec. >> >> Honestly, though, I really don't understand the point with "struct >> inode_time". It seems like the zeroeth-order thing is to change the >> kernel internal version of struct timespec to have a 64-bit time... it >> isn't just about inodes. We then should be explicit about the external >> uses of time, and use accessors. > > I picked these because they are fairly isolated from all other uses, > in particular since inode times are the only things where we really > care about times in the distant past or future (decades away as opposed > to things that happened between boot and shutdown). > If nothing else, I would expect to be able to set the system time to weird values for testing. So I'm not so sure I agree with that... > For other kernel-internal uses, we may be better off migrating to > a completely different representation, such as nanoseconds since > boot or the architecture specific ktime_t, but this is really something > to decide for each subsystem. Having a bunch of different time representations in the kernel seems like a real headache... -hpa _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 21:57 ` H. Peter Anvin
@ 2014-06-03 14:22 ` Arnd Bergmann
2014-06-03 14:33 ` Joseph S. Myers
2014-06-03 21:38 ` Dave Chinner
0 siblings, 2 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-03 14:22 UTC (permalink / raw)
To: H. Peter Anvin
Cc: hch, linux-mtd, logfs, linux-afs, Joseph S. Myers, linux-arch,
linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda, geert,
linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs, john.stultz,
tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs

On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> >>
> >> The bit that is really going to hurt is every single ioctl that uses a
> >> timespec.
> >>
> >> Honestly, though, I really don't understand the point with "struct
> >> inode_time". It seems like the zeroeth-order thing is to change the
> >> kernel internal version of struct timespec to have a 64-bit time... it
> >> isn't just about inodes. We then should be explicit about the external
> >> uses of time, and use accessors.
> >
> > I picked these because they are fairly isolated from all other uses,
> > in particular since inode times are the only things where we really
> > care about times in the distant past or future (decades away as opposed
> > to things that happened between boot and shutdown).
> >
>
> If nothing else, I would expect to be able to set the system time to
> weird values for testing. So I'm not so sure I agree with that...

I think John Stultz and Thomas Gleixner have already started looking
at how the timekeeping code can be updated. Once that is done, we should
be able to add a functional 64-bit gettimeofday/settimeofday syscall
pair. While I definitely agree this is one of the most basic things to
have, it's also not an area of the kernel that is easy to change.

> > For other kernel-internal uses, we may be better off migrating to
> > a completely different representation, such as nanoseconds since
> > boot or the architecture specific ktime_t, but this is really something
> > to decide for each subsystem.
>
> Having a bunch of different time representations in the kernel seems
> like a real headache...

We already have time_t, ktime_t, timeval, timespec, compat_timespec,
clock_t, cputime_t, cputime64_t, tm, nanoseconds, jiffies, jiffies64,
and lots of driver or file system specific representations. I'm all
for removing a bunch of these from the kernel, but my feeling is that
this is one of the cases where we first have to add new ones in order
to remove those that are already there.

To complicate things further, we also have various time bases
(realtime/utc, realtime/tai, monotonic, monotonic_raw, boottime, ...),
and at least for the timespec values we pass around, it's not always
obvious which one is used, or if that's the right one.

We probably don't want to add a lot of new representations, and it's
possible that we can change most of the internal code we have to
ktime_t and then convert that to whatever user space wants at the
interfaces.

The possible uses I can see for non-ktime_t types in the kernel are:

* inodes need 96 bit timestamps to represent the full range of values
  that can be stored in a file system, you made a convincing argument
  for that. Almost everything else can fit into 64 bit on a 32-bit
  kernel, in theory also on a 64-bit kernel if we want that.

* A number of interfaces pass relative timespecs: nanosleep(), poll(),
  select(), sigtimedwait(), alarm(), futex() and probably more. There
  is nothing wrong with the use of timespec here, and it may be good
  to annotate that by using a new type (e.g. struct timeout) that is
  defined as compatible with the current timespec.

* For new user interfaces, we need a new type such as the
  __kernel_timespec64 I introduced, so it doesn't clash with the
  normal user timespec that may be smaller, depending on the libc.

* A lot of drivers will need new ioctl commands, and for drivers that
  just need time stamps (audio, v4l, sockets, ...) it may be more
  efficient and more correct to use a new timestamp_t (e.g. boot time
  64-bit nanoseconds) than __kernel_timespec64, which is not normally
  monotonic and requires a normalization step. If we end up introducing
  such a type in the user interface, we can also start using it in
  the kernel.

Arnd

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 71+ messages in thread
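A minimal sketch of the last point in the message above, for illustration only. Every name here (`example_timespec64`, `example_timestamp_t`, `example_normalize`, `example_to_timestamp`) is hypothetical and is not taken from the patch series; the layout of `__kernel_timespec64` is not shown in this message, so two 64-bit fields are assumed. The sketch shows the normalization step that a seconds/nanoseconds pair needs and how a flat 64-bit nanosecond count avoids it.

```c
#include <stdint.h>

#define EXAMPLE_NSEC_PER_SEC INT64_C(1000000000)

/* Hypothetical seconds/nanoseconds pair (assumed layout, not the
 * real __kernel_timespec64). */
struct example_timespec64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Hypothetical flat time stamp: nanoseconds since boot. */
typedef uint64_t example_timestamp_t;

/* Bring tv_nsec back into [0, NSEC_PER_SEC), carrying into tv_sec. */
void example_normalize(struct example_timespec64 *ts)
{
	int64_t carry = ts->tv_nsec / EXAMPLE_NSEC_PER_SEC;

	ts->tv_sec += carry;
	ts->tv_nsec -= carry * EXAMPLE_NSEC_PER_SEC;
	if (ts->tv_nsec < 0) {
		/* C division truncates toward zero, so fix up negatives. */
		ts->tv_nsec += EXAMPLE_NSEC_PER_SEC;
		ts->tv_sec -= 1;
	}
}

/* Flatten a normalized, non-negative pair into one 64-bit count. */
example_timestamp_t example_to_timestamp(const struct example_timespec64 *ts)
{
	return (uint64_t)ts->tv_sec * (uint64_t)EXAMPLE_NSEC_PER_SEC
		+ (uint64_t)ts->tv_nsec;
}
```

Under these assumptions, a pair of 1 s and 1500000000 ns normalizes to 2 s and 500000000 ns, which flattens to 2500000000 ns. A flat counter needs no carry handling when two values are added or compared.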
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-03 14:22 ` Arnd Bergmann
@ 2014-06-03 14:33 ` Joseph S. Myers
2014-06-03 14:37 ` Arnd Bergmann
2014-06-03 21:38 ` Dave Chinner
1 sibling, 1 reply; 71+ messages in thread
From: Joseph S. Myers @ 2014-06-03 14:33 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, H. Peter Anvin, logfs, linux-afs, linux-arch,
linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda, geert,
linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs, john.stultz,
tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs

On Tue, 3 Jun 2014, Arnd Bergmann wrote:

> I think John Stultz and Thomas Gleixner have already started looking
> at how the timekeeping code can be updated. Once that is done, we should
> be able to add a functional 64-bit gettimeofday/settimeofday syscall
> pair. While I definitely agree this is one of the most basic things to
> have, it's also not an area of the kernel that is easy to change.

64-bit clock_gettime / clock_settime instead of gettimeofday /
settimeofday should avoid the need for the kernel to have a 64-bit version
of struct timeval. (Userspace 64-bit gettimeofday / settimeofday would
need to use a combination of the syscalls if the tz pointer is non-NULL.)

--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 71+ messages in thread
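A rough user-space sketch of the combination described in the message above, for illustration only. `example_timeval64` and `example_gettimeofday64` are hypothetical names, and the existing `clock_gettime()` stands in for a future 64-bit variant; this is not how any libc actually implements it. The time value comes from `clock_gettime(CLOCK_REALTIME)`, and the legacy call is used only when the caller asks for the timezone.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* for struct timezone under strict -std= modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical 64-bit timeval; not a real kernel or libc type. */
struct example_timeval64 {
	int64_t tv_sec;
	int64_t tv_usec;
};

int example_gettimeofday64(struct example_timeval64 *tv, struct timezone *tz)
{
	if (tv != NULL) {
		struct timespec ts;

		/* Stand-in for a 64-bit clock_gettime syscall. */
		if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
			return -1;
		tv->tv_sec = ts.tv_sec;
		tv->tv_usec = ts.tv_nsec / 1000;	/* ns -> us */
	}
	if (tz != NULL) {
		/* Second call, used only for the timezone; the time
		 * value it returns is discarded. */
		struct timeval unused;

		if (gettimeofday(&unused, tz) != 0)
			return -1;
	}
	return 0;
}
```

With this split, the kernel never has to produce a 64-bit `struct timeval`; the microsecond value is derived in user space.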
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-03 14:33 ` Joseph S. Myers
@ 2014-06-03 14:37 ` Arnd Bergmann
0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-03 14:37 UTC (permalink / raw)
To: Joseph S. Myers
Cc: hch, linux-mtd, H. Peter Anvin, logfs, linux-afs, linux-arch,
linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda, geert,
linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs, john.stultz,
tglx, linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs

On Tuesday 03 June 2014 14:33:10 Joseph S. Myers wrote:
> On Tue, 3 Jun 2014, Arnd Bergmann wrote:
>
> > I think John Stultz and Thomas Gleixner have already started looking
> > at how the timekeeping code can be updated. Once that is done, we should
> > be able to add a functional 64-bit gettimeofday/settimeofday syscall
> > pair. While I definitely agree this is one of the most basic things to
> > have, it's also not an area of the kernel that is easy to change.
>
> 64-bit clock_gettime / clock_settime instead of gettimeofday /
> settimeofday should avoid the need for the kernel to have a 64-bit version
> of struct timeval. (Userspace 64-bit gettimeofday / settimeofday would
> need to use a combination of the syscalls if the tz pointer is non-NULL.)

Yes, that's what I meant.

Arnd
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-03 14:22 ` Arnd Bergmann
2014-06-03 14:33 ` Joseph S. Myers
@ 2014-06-03 21:38 ` Dave Chinner
2014-06-04 15:03 ` Arnd Bergmann
1 sibling, 1 reply; 71+ messages in thread
From: Dave Chinner @ 2014-06-03 21:38 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, H. Peter Anvin, logfs, linux-afs, Joseph S. Myers,
linux-arch, linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda,
geert, linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan,
linux-btrfs

On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote:
> On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> > On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> The possible uses I can see for non-ktime_t types in the kernel are:
> * inodes need 96 bit timestamps to represent the full range of values
> that can be stored in a file system, you made a convincing argument
> for that. Almost everything else can fit into 64 bit on a 32-bit
> kernel, in theory also on a 64-bit kernel if we want that.

Just to be pedantic, inodes don't *need* 96 bit timestamps - some
filesystems can *support up to* 96 bit timestamps. If the kernel
only supports 64 bit timestamps and that's all the kernel can
represent, then the upper bits of the 96 bit on-disk inode
timestamps simply remain zero.

If you move the filesystem between kernels with different time
ranges, then the filesystem needs to be able to tell the kernel what
its supported range is. This is where having the VFS limit the
range of supported timestamps is important: the limit is the
min(kernel range, filesystem range). This allows the filesystems
to be independent of the kernel time representation, and the kernel
to be independent of the physical filesystem time encoding....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 71+ messages in thread
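A minimal sketch of the min(kernel range, filesystem range) rule from the message above, for illustration only. The type and function names are hypothetical, ranges are modelled as whole seconds, and clamping out-of-range values (rather than rejecting them) is an assumption; the thread does not settle that policy.

```c
#include <stdint.h>

/* Hypothetical inclusive range of representable seconds. */
struct example_time_range {
	int64_t min_sec;
	int64_t max_sec;
};

/* Effective range: the intersection of what the kernel can represent
 * and what the filesystem can store on disk. */
struct example_time_range
example_effective_range(struct example_time_range kernel,
			struct example_time_range fs)
{
	struct example_time_range r;

	r.min_sec = kernel.min_sec > fs.min_sec ? kernel.min_sec : fs.min_sec;
	r.max_sec = kernel.max_sec < fs.max_sec ? kernel.max_sec : fs.max_sec;
	return r;
}

/* Assumed policy: clamp a time stamp into the effective range. */
int64_t example_clamp_seconds(int64_t sec, struct example_time_range r)
{
	if (sec < r.min_sec)
		return r.min_sec;
	if (sec > r.max_sec)
		return r.max_sec;
	return sec;
}
```

For example, a kernel limited to signed 32-bit seconds combined with a filesystem storing unsigned 34-bit seconds would yield an effective range of 0 to INT32_MAX under this model. Neither side needs to know the other's encoding, only its limits.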
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-03 21:38 ` Dave Chinner
@ 2014-06-04 15:03 ` Arnd Bergmann
2014-06-04 17:30 ` Nicolas Pitre
0 siblings, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-04 15:03 UTC (permalink / raw)
To: Dave Chinner
Cc: hch, linux-mtd, H. Peter Anvin, logfs, linux-afs, Joseph S. Myers,
linux-arch, linux-cifs, linux-scsi, ceph-devel, cluster-devel, coda,
geert, linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-kernel, linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan,
linux-btrfs

On Tuesday 03 June 2014, Dave Chinner wrote:
> On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote:
> > On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> > > On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> > The possible uses I can see for non-ktime_t types in the kernel are:
> > * inodes need 96 bit timestamps to represent the full range of values
> > that can be stored in a file system, you made a convincing argument
> > for that. Almost everything else can fit into 64 bit on a 32-bit
> > kernel, in theory also on a 64-bit kernel if we want that.
>
> Just to be pedantic, inodes don't need 96 bit timestamps - some
> filesystems can *support up to* 96 bit timestamps. If the kernel
> only supports 64 bit timestamps and that's all the kernel can
> represent, then the upper bits of the 96 bit on-disk inode
> timestamps simply remain zero.

I meant the reverse: since we have file systems that can store
96-bit timestamps when using 64-bit kernels, we need to extend
32-bit kernels to have the same internal representation so we
can actually read those file systems correctly.

> If you move the filesystem between kernels with different time
> ranges, then the filesystem needs to be able to tell the kernel what
> its supported range is. This is where having the VFS limit the
> range of supported timestamps is important: the limit is the
> min(kernel range, filesystem range). This allows the filesystems
> to be independent of the kernel time representation, and the kernel
> to be independent of the physical filesystem time encoding....

I agree it makes sense to let the kernel know about the limits
of the file system it accesses, but for the reverse, we're probably
better off just making the kernel representation large enough (i.e.
96 bits) so it can work with any known file system. We need another
check at the user space boundary to turn that into a value that the
user can understand, but that's another problem.

Arnd
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-04 15:03 ` Arnd Bergmann
@ 2014-06-04 17:30 ` Nicolas Pitre
2014-06-04 19:24 ` Arnd Bergmann
0 siblings, 1 reply; 71+ messages in thread
From: Nicolas Pitre @ 2014-06-04 17:30 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, H. Peter Anvin, linux-f2fs-devel, ceph-devel,
Joseph S. Myers, linux-arch, linux-cifs, linux-scsi, linux-afs,
cluster-devel, coda, geert, linux-ext4, codalist, fuse-devel,
reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev,
samba-technical, linux-kernel, logfs, linux-btrfs, linux-fsdevel,
lftan, ocfs2-devel

On Wed, 4 Jun 2014, Arnd Bergmann wrote:

> On Tuesday 03 June 2014, Dave Chinner wrote:
> > Just to be pedantic, inodes don't need 96 bit timestamps - some
> > filesystems can *support up to* 96 bit timestamps. If the kernel
> > only supports 64 bit timestamps and that's all the kernel can
> > represent, then the upper bits of the 96 bit on-disk inode
> > timestamps simply remain zero.
>
> I meant the reverse: since we have file systems that can store
> 96-bit timestamps when using 64-bit kernels, we need to extend
> 32-bit kernels to have the same internal representation so we
> can actually read those file systems correctly.
>
> > If you move the filesystem between kernels with different time
> > ranges, then the filesystem needs to be able to tell the kernel what
> > its supported range is. This is where having the VFS limit the
> > range of supported timestamps is important: the limit is the
> > min(kernel range, filesystem range). This allows the filesystems
> > to be independent of the kernel time representation, and the kernel
> > to be independent of the physical filesystem time encoding....
>
> I agree it makes sense to let the kernel know about the limits
> of the file system it accesses, but for the reverse, we're probably
> better off just making the kernel representation large enough (i.e.
> 96 bits) so it can work with any known file system.

Depends... 96 bit handling may get prohibitive on 32-bit archs.

The important point here is for the kernel to be able to represent the
time _range_ used by any known filesystem, not necessarily the time
_precision_.

For example, a 64 bit representation can be made of 40 bits for seconds
spanning 34865 years, and 24 bits for fractional seconds providing
precision down to 60 nanosecs. That ought to be plenty good on 32 bit
systems while still being cheap to handle.

Nicolas
^ permalink raw reply [flat|nested] 71+ messages in thread
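A minimal sketch of the 40/24-bit split suggested in the message above, for illustration only. The function names are hypothetical, and treating the seconds field as unsigned is an assumption the message does not state. The arithmetic behind the quoted figures: 2^40 seconds divided by 365 * 86400 seconds per year is about 34865 years, and one unit of the 24-bit fraction is 10^9 / 2^24, about 59.6 ns.

```c
#include <stdint.h>

#define EXAMPLE_FRAC_BITS	24
#define EXAMPLE_FRAC_MASK	((UINT64_C(1) << EXAMPLE_FRAC_BITS) - 1)
#define EXAMPLE_NSEC_PER_SEC	UINT64_C(1000000000)

/* Pack seconds (must be < 2^40) and nanoseconds (must be < 10^9) into
 * one 64-bit word: upper 40 bits seconds, lower 24 bits binary
 * fraction of a second. */
uint64_t example_pack(uint64_t sec, uint32_t nsec)
{
	uint64_t frac = ((uint64_t)nsec << EXAMPLE_FRAC_BITS)
			/ EXAMPLE_NSEC_PER_SEC;

	return (sec << EXAMPLE_FRAC_BITS) | frac;
}

uint64_t example_unpack_sec(uint64_t t)
{
	return t >> EXAMPLE_FRAC_BITS;
}

/* Convert the 24-bit fraction back to nanoseconds (rounds down, so
 * the result can be up to about 60 ns below the original value). */
uint32_t example_unpack_nsec(uint64_t t)
{
	return (uint32_t)(((t & EXAMPLE_FRAC_MASK) * EXAMPLE_NSEC_PER_SEC)
			  >> EXAMPLE_FRAC_BITS);
}
```

Comparing two packed values is a single 64-bit comparison, which is the sense in which such a format would be cheap on 32-bit systems. The cost is the loss of exact nanosecond values on a round trip.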
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-04 17:30 ` Nicolas Pitre
@ 2014-06-04 19:24 ` Arnd Bergmann
2014-06-05 0:10 ` H. Peter Anvin
0 siblings, 1 reply; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-04 19:24 UTC (permalink / raw)
To: Nicolas Pitre
Cc: hch, linux-mtd, H. Peter Anvin, linux-f2fs-devel, ceph-devel,
Joseph S. Myers, linux-arch, linux-cifs, linux-scsi, linux-afs,
cluster-devel, coda, geert, linux-ext4, codalist, fuse-devel,
reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev,
samba-technical, linux-kernel, logfs, linux-btrfs, linux-fsdevel,
lftan, ocfs2-devel

On Wednesday 04 June 2014 13:30:32 Nicolas Pitre wrote:
> On Wed, 4 Jun 2014, Arnd Bergmann wrote:
>
> > On Tuesday 03 June 2014, Dave Chinner wrote:
> > > Just to be pedantic, inodes don't need 96 bit timestamps - some
> > > filesystems can *support up to* 96 bit timestamps. If the kernel
> > > only supports 64 bit timestamps and that's all the kernel can
> > > represent, then the upper bits of the 96 bit on-disk inode
> > > timestamps simply remain zero.
> >
> > I meant the reverse: since we have file systems that can store
> > 96-bit timestamps when using 64-bit kernels, we need to extend
> > 32-bit kernels to have the same internal representation so we
> > can actually read those file systems correctly.
> >
> > > If you move the filesystem between kernels with different time
> > > ranges, then the filesystem needs to be able to tell the kernel what
> > > its supported range is. This is where having the VFS limit the
> > > range of supported timestamps is important: the limit is the
> > > min(kernel range, filesystem range). This allows the filesystems
> > > to be independent of the kernel time representation, and the kernel
> > > to be independent of the physical filesystem time encoding....
> >
> > I agree it makes sense to let the kernel know about the limits
> > of the file system it accesses, but for the reverse, we're probably
> > better off just making the kernel representation large enough (i.e.
> > 96 bits) so it can work with any known file system.
>
> Depends... 96 bit handling may get prohibitive on 32-bit archs.
>
> The important point here is for the kernel to be able to represent the
> time _range_ used by any known filesystem, not necessarily the time
> _precision_.
>
> For example, a 64 bit representation can be made of 40 bits for seconds
> spanning 34865 years, and 24 bits for fractional seconds providing
> precision down to 60 nanosecs. That ought to be plenty good on 32 bit
> systems while still being cheap to handle.

I have checked earlier that we don't do any computation on inode
time stamps in common code, we just pass them around, so there is
very little runtime overhead. There is a small bit of space overhead
(12 bytes) per inode, but that structure is already on the order of
500 bytes.

For other timekeeping stuff in the kernel, I agree that using some
64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
...) has advantages, that's exactly the point I was making earlier
against simply extending the internal time_t/timespec to 64-bit
seconds for everything.

Arnd
^ permalink raw reply [flat|nested] 71+ messages in thread
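One plausible reading of the 12-byte figure in the message above, sketched for illustration only. The message does not spell out the arithmetic, so this is an assumption: each of the three inode time stamps (access, modification, change) grows from a 32-bit seconds plus 32-bit nanoseconds pair (8 bytes) to a 96-bit value (12 bytes) on a 32-bit kernel. Both struct layouts below are hypothetical and are not the definitions from the patch series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit timespec as seen on a 32-bit kernel: 8 bytes. */
struct example_timespec32 {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/* Hypothetical 96-bit inode time built from 32-bit words so that no
 * 64-bit alignment padding is added: 12 bytes. */
struct example_inode_time96 {
	uint32_t sec_lo;	/* low 32 bits of the seconds value */
	int32_t  sec_hi;	/* high 32 bits of the seconds value */
	uint32_t nsec;
};

/* Extra bytes per inode if atime, mtime and ctime all grow. */
size_t example_inode_overhead(void)
{
	return 3 * (sizeof(struct example_inode_time96)
		    - sizeof(struct example_timespec32));
}
```

Under these assumptions the function returns 12, which against an inode of roughly 500 bytes is an increase of about 2.4%.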
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-04 19:24 ` Arnd Bergmann
@ 2014-06-05 0:10 ` H. Peter Anvin
2014-06-10 9:54 ` Arnd Bergmann
0 siblings, 1 reply; 71+ messages in thread
From: H. Peter Anvin @ 2014-06-05 0:10 UTC (permalink / raw)
To: Arnd Bergmann, Nicolas Pitre
Cc: hch, linux-mtd, linux-f2fs-devel, ceph-devel, Joseph S. Myers,
linux-arch, linux-cifs, linux-scsi, linux-afs, cluster-devel, coda,
geert, linux-ext4, codalist, fuse-devel, reiserfs-devel, xfs,
john.stultz, tglx, linux-nfs, linux-ntfs-dev, samba-technical,
linux-kernel, logfs, linux-btrfs, linux-fsdevel, lftan, ocfs2-devel

On 06/04/2014 12:24 PM, Arnd Bergmann wrote:
>
> For other timekeeping stuff in the kernel, I agree that using some
> 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
> ...) has advantages, that's exactly the point I was making earlier
> against simply extending the internal time_t/timespec to 64-bit
> seconds for everything.
>

How much of a performance issue is it to make time_t 64 bits, and for
the bits there are, how hard are they to fix?

-hpa
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-05 0:10 ` H. Peter Anvin
@ 2014-06-10 9:54 ` Arnd Bergmann
0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-10 9:54 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Nicolas Pitre, hch, linux-mtd, linux-f2fs-devel, ceph-devel,
Joseph S. Myers, linux-arch, linux-cifs, linux-scsi, linux-afs,
cluster-devel, coda, geert, linux-ext4, codalist, fuse-devel,
reiserfs-devel, xfs, john.stultz, tglx, linux-nfs, linux-ntfs-dev,
samba-technical, linux-kernel, logfs, linux-btrfs, linux-fsdevel,
lftan, ocfs2-devel

On Wednesday 04 June 2014 17:10:24 H. Peter Anvin wrote:
> On 06/04/2014 12:24 PM, Arnd Bergmann wrote:
> >
> > For other timekeeping stuff in the kernel, I agree that using some
> > 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
> > ...) has advantages, that's exactly the point I was making earlier
> > against simply extending the internal time_t/timespec to 64-bit
> > seconds for everything.
> >
>
> How much of a performance issue is it to make time_t 64 bits, and for
> the bits there are, how hard are they to fix?

Probably very little overhead for most uses, it's more the regression
potential in the less common parts of the kernel I'm worried about.

There is a significant but not overwhelming number of uses of the
main problematic types in the kernel:

arnd@wuerfel:~/arm-soc$ git grep -wl time_t | wc
188 188 5566
arnd@wuerfel:~/arm-soc$ git grep -wl timeval | wc
320 320 10353
arnd@wuerfel:~/arm-soc$ git grep -wl timespec | wc
406 406 10886

I believe we have to audit all of them anyway if we want to change
the kernel to less problematic types and introduce new user interfaces.
IMHO this work is helped if we change the uses to a new type as we
find the problems. This lets us do the work one subsystem at a time
and avoid accidental ABI changes.

I don't care much what type that will be, and having a 96-bit type
will certainly work well in a lot of cases, but I don't see a strong
reason to use that over other types, especially when they can be
more efficient.

Arnd
^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 19:19 ` Arnd Bergmann
2014-06-02 19:26 ` H. Peter Anvin
@ 2014-06-02 21:02 ` Joseph S. Myers
2014-06-04 15:05 ` Arnd Bergmann
1 sibling, 1 reply; 71+ messages in thread
From: Joseph S. Myers @ 2014-06-02 21:02 UTC (permalink / raw)
To: Arnd Bergmann
Cc: hch, linux-mtd, hpa, logfs, linux-afs, linux-arch, linux-cifs,
linux-scsi, ceph-devel, codalist, cluster-devel, coda, geert,
linux-ext4, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx,
linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs

On Mon, 2 Jun 2014, Arnd Bergmann wrote:

> Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> may not be as relevant here, except for the one question whether we
> actually want to have the new ABI on all 32-bit architectures or only
> as an opt-in for those that expect to stay around for another 24 years.

For glibc I think it will make the most sense to add the support for
64-bit time_t across all architectures that currently have 32-bit time_t
(with the new interfaces having fallback support to implementation in
terms of the 32-bit kernel interfaces, if the 64-bit syscalls are
unavailable either at runtime or in the kernel headers against which glibc
is compiled - this fallback code will of course need to check for overflow
when passing a time value to the kernel, hopefully with error handling
consistent with whatever the kernel ends up doing when a filesystem can't
support a timestamp). If some architectures don't provide the new
interfaces in the kernel then that will mean the fallback code in glibc
can't be removed until glibc support for those architectures is removed
(as opposed to removing it when glibc no longer supports kernels predating
the kernel support).

> Two more questions for you:
>
> - are you (and others) happy with adding this type of stat syscall
> (fstatat64/fstat64) as opposed to the more generic xstat that has
> been discussed in the past and that never made it through the
> bike-shedding discussion?

I am.

> - once we have enough buy-in from reviewers to merge this initial
> series, should we proceed to define rest of the syscall ABI
> (minus driver ioctls) so glibc and kernel can do the conversion
> on top of that, or should we better try to do things one syscall
> family at a time and actually get the kernel to handle them
> correctly internally?

I don't have any comments on that ordering question.

--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 71+ messages in thread
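A minimal sketch of the overflow check that the fallback path described in the message above would need, for illustration only. The struct and function names are hypothetical, and reporting the failure as `EOVERFLOW` is an assumption; the message leaves the error handling open, pending whatever the kernel ends up doing.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical 64-bit time value used by the new libc interfaces. */
struct example_ts64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Hypothetical layout passed to the old 32-bit kernel interfaces. */
struct example_ts32 {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/* Narrow a 64-bit time value for a 32-bit syscall. Returns 0 on
 * success, or -1 with errno set if the value cannot be represented. */
int example_to_ts32(const struct example_ts64 *in, struct example_ts32 *out)
{
	if (in->tv_nsec < 0 || in->tv_nsec >= INT64_C(1000000000)) {
		errno = EINVAL;
		return -1;
	}
	if (in->tv_sec < INT32_MIN || in->tv_sec > INT32_MAX) {
		errno = EOVERFLOW;	/* assumed error code */
		return -1;
	}
	out->tv_sec = (int32_t)in->tv_sec;
	out->tv_nsec = (int32_t)in->tv_nsec;
	return 0;
}
```

A fallback wrapper would call a check like this before invoking the 32-bit syscall, so that a time stamp past January 2038 fails visibly instead of being silently truncated.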
* Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 21:02 ` Joseph S. Myers
@ 2014-06-04 15:05 ` Arnd Bergmann
0 siblings, 0 replies; 71+ messages in thread
From: Arnd Bergmann @ 2014-06-04 15:05 UTC (permalink / raw)
To: Joseph S. Myers
Cc: hch, linux-mtd, hpa, logfs, linux-afs, linux-arch, linux-cifs,
linux-scsi, ceph-devel, codalist, cluster-devel, coda, geert,
linux-ext4, fuse-devel, reiserfs-devel, xfs, john.stultz, tglx,
linux-nfs, linux-ntfs-dev, samba-technical, linux-kernel,
linux-f2fs-devel, ocfs2-devel, linux-fsdevel, lftan, linux-btrfs

On Monday 02 June 2014, Joseph S. Myers wrote:
> On Mon, 2 Jun 2014, Arnd Bergmann wrote:
>
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
>
> For glibc I think it will make the most sense to add the support for
> 64-bit time_t across all architectures that currently have 32-bit time_t
> (with the new interfaces having fallback support to implementation in
> terms of the 32-bit kernel interfaces, if the 64-bit syscalls are
> unavailable either at runtime or in the kernel headers against which glibc
> is compiled - this fallback code will of course need to check for overflow
> when passing a time value to the kernel, hopefully with error handling
> consistent with whatever the kernel ends up doing when a filesystem can't
> support a timestamp). If some architectures don't provide the new
> interfaces in the kernel then that will mean the fallback code in glibc
> can't be removed until glibc support for those architectures is removed
> (as opposed to removing it when glibc no longer supports kernels predating
> the kernel support).

Ok, that's a good reason to just provide the new interfaces on all
architectures right away. Thanks for the insight!

Arnd
^ permalink raw reply [flat|nested] 71+ messages in thread
end of thread, other threads:[~2014-06-10 9:57 UTC | newest]
Thread overview: 71+ messages
2014-05-30 20:01 [RFC 00/32] making inode time stamps y2038 ready Arnd Bergmann
2014-05-30 20:01 ` [RFC 11/32] xfs: convert to struct inode_time Arnd Bergmann
2014-05-31 0:37 ` Dave Chinner
2014-05-31 0:41 ` H. Peter Anvin
2014-05-31 1:14 ` Dave Chinner
2014-05-31 1:22 ` H. Peter Anvin
2014-05-31 5:54 ` Dave Chinner
2014-05-31 8:41 ` H. Peter Anvin
2014-05-31 15:46 ` Nicolas Pitre
2014-06-01 19:56 ` Arnd Bergmann
2014-06-01 20:26 ` H. Peter Anvin
2014-06-02 11:02 ` Arnd Bergmann
2014-06-02 1:36 ` Nicolas Pitre
2014-06-02 2:22 ` Dave Chinner
2014-06-02 7:09 ` Geert Uytterhoeven
2014-06-02 10:56 ` Arnd Bergmann
2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 12:38 ` Arnd Bergmann
2014-06-02 13:15 ` Theodore Ts'o
2014-06-02 12:52 ` Arnd Bergmann
2014-06-02 13:07 ` Theodore Ts'o
2014-06-02 15:01 ` Arnd Bergmann
2014-06-02 14:52 ` H. Peter Anvin
2014-06-02 15:04 ` Chuck Lever
2014-06-02 15:31 ` Theodore Ts'o
2014-06-02 17:12 ` H. Peter Anvin
2014-06-02 18:50 ` Arnd Bergmann
2014-06-02 22:29 ` Theodore Ts'o
2014-06-02 22:32 ` H. Peter Anvin
2014-06-02 23:32 ` Theodore Ts'o
2014-06-02 23:33 ` H. Peter Anvin
2014-06-03 13:09 ` Roger Willcocks
2014-06-02 18:52 ` Arnd Bergmann
2014-06-02 18:58 ` Roger Willcocks
2014-06-02 19:04 ` Chuck Lever
2014-06-02 19:10 ` Arnd Bergmann
2014-06-01 0:39 ` Dave Chinner
2014-06-02 14:00 ` Joseph S. Myers
2014-05-31 15:37 ` Arnd Bergmann
2014-06-01 0:24 ` Dave Chinner
2014-06-02 0:28 ` Dave Chinner
2014-06-02 11:35 ` Roger Willcocks
2014-06-02 11:43 ` Arnd Bergmann
2014-06-03 0:32 ` Dave Chinner
2014-06-03 7:33 ` Arnd Bergmann
2014-06-03 8:41 ` Dave Chinner
2014-06-03 9:16 ` Arnd Bergmann
2014-05-31 14:30 ` [RFC 00/32] making inode time stamps y2038 ready Vyacheslav Dubeyko
2014-06-03 12:21 ` Arnd Bergmann
2014-05-31 14:51 ` Richard Cochran
[not found] ` <6347520.8jMPlVsFjM@wuerfel>
2014-05-31 16:20 ` Geert Uytterhoeven
2014-05-31 18:22 ` Richard Cochran
2014-05-31 19:34 ` H. Peter Anvin
2014-06-01 4:46 ` Richard Cochran
2014-06-01 4:44 ` Richard Cochran
2014-06-02 13:52 ` Joseph S. Myers
2014-06-02 19:19 ` Arnd Bergmann
2014-06-02 19:26 ` H. Peter Anvin
2014-06-02 19:55 ` Arnd Bergmann
2014-06-02 21:57 ` H. Peter Anvin
2014-06-03 14:22 ` Arnd Bergmann
2014-06-03 14:33 ` Joseph S. Myers
2014-06-03 14:37 ` Arnd Bergmann
2014-06-03 21:38 ` Dave Chinner
2014-06-04 15:03 ` Arnd Bergmann
2014-06-04 17:30 ` Nicolas Pitre
2014-06-04 19:24 ` Arnd Bergmann
2014-06-05 0:10 ` H. Peter Anvin
2014-06-10 9:54 ` Arnd Bergmann
2014-06-02 21:02 ` Joseph S. Myers
2014-06-04 15:05 ` Arnd Bergmann