linux-fsdevel.vger.kernel.org archive mirror
From: Dave Kleikamp <shaggy@austin.ibm.com>
To: Takashi Sato <sho@bsd.tnes.nec.co.jp>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: stat64 for over 2TB file returned invalid st_blocks
Date: Thu, 01 Dec 2005 08:32:19 -0600
Message-ID: <1133447539.8557.14.camel@kleikamp.austin.ibm.com>
In-Reply-To: <01e901c5f66e$d4551b70$4168010a@bsd.tnes.nec.co.jp>

On Thu, 2005-12-01 at 21:00 +0900, Takashi Sato wrote:
> Hi all,
> 
> I found a problem with stat64 on 32-bit architectures.
> 
> When I called stat64 on a file larger than 2TB, it returned an
> invalid block count in st_blocks on a 32-bit architecture, although
> it returned a valid count on a 64-bit architecture (ia64).

For jfs, it's a bigger problem than just stat64.  When writing the inode
to disk, jfs calculates the number of blocks from the 32-bit value:
	dip->di_nblocks = cpu_to_le64(PBLK2LBLK(ip->i_sb, ip->i_blocks));

So it won't only report the wrong number of blocks, but it will actually
store the wrong number.  :-(
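
A minimal sketch of why widening the value afterwards does not help, assuming a 32-bit i_blocks as on i386. The helpers sectors_for() and store_via_32bit() are hypothetical names used only for illustration, uint32_t stands in for a 32-bit unsigned long, and the jfs block-size conversion done by PBLK2LBLK is left out:

```c
#include <stdint.h>

/* Hypothetical helper: number of 512-byte sectors needed for `bytes`. */
static uint64_t sectors_for(uint64_t bytes)
{
    return (bytes + 511) / 512;
}

/*
 * Models keeping the count in a 32-bit i_blocks and converting it to a
 * 64-bit on-disk field later, as in the di_nblocks assignment above.
 */
static uint64_t store_via_32bit(uint64_t sectors)
{
    uint32_t i_blocks = (uint32_t)sectors; /* high-order bits are lost here */
    return (uint64_t)i_blocks;             /* widening cannot bring them back */
}
```

For a 3TB file, sectors_for(3ULL << 40) is 6442450944, while store_via_32bit() of that count yields 2147483648, which is the value that would be written to disk in this model.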

> The cause of this issue is as follows:
> i_blocks in struct inode is 4 bytes wide on 32-bit architectures.
> If the block count reaches 2^32 or more, i_blocks overflows and an
> invalid value ends up in st_blocks.
> 
> The overflowed inode.i_blocks reaches st_blocks through stat64 in
> the following sequence.
> 
> 1. generic_fillattr(struct inode *inode, struct kstat *stat)
>   - Copy data from overflowed inode.i_blocks to kstat.blocks.
> 
> 2. vfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
>         struct kstat *stat)
>   - Return invalid kstat.blocks to sys_stat64().
> 
> 3. sys_stat64(char __user * filename, struct stat64 __user * statbuf)
>   - Copy data from invalid kstat.blocks to stat64.st_blocks.
> 
> I also found the following problem.
> 
> - ioctl with the FIOQSIZE command returns the size of the file's
>   data that has been written to disk.  That size is calculated as
>   follows in inode_get_bytes().
>    
>    (((loff_t)inode->i_blocks) << 9) + inode->i_bytes
> 
>    For a file larger than 2TB, the ioctl returns an invalid size
>    because i_blocks can't hold the correct number of blocks.
> 
> I think the following modifications are needed to fix these
> problems.
> 
> 1. Change the type of inode.i_blocks and kstat.blocks from unsigned
>    long to unsigned long long.

This would be okay.
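
As a rough illustration of what the wider type buys, the inode_get_bytes() formula quoted above can be modelled in user space with both widths. The function names are hypothetical, int64_t stands in for loff_t, and uint32_t stands in for the 32-bit unsigned long of i386:

```c
#include <stdint.h>

/* The quoted formula with i_blocks still 32 bits wide. */
static int64_t get_bytes_narrow(uint32_t i_blocks, unsigned short i_bytes)
{
    /* The cast comes too late: i_blocks was already truncated. */
    return ((int64_t)i_blocks << 9) + i_bytes;
}

/* The same formula once i_blocks is unsigned long long. */
static int64_t get_bytes_wide(uint64_t i_blocks, unsigned short i_bytes)
{
    return ((int64_t)i_blocks << 9) + i_bytes;
}
```

With the 6442450944 sectors of a 3TB file, the wide variant returns 3TB, while the narrow variant, fed the truncated count, returns 1TB.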

> 2. Change the type of the architecture-dependent stat64.st_blocks in
>    include/asm-*/stat.h from unsigned long to unsigned long long.
>    I tried modifying only the stat64 of a 32-bit architecture
>    (include/asm-i386/stat.h).

This changes the API, but the structure does suggest that the 4-byte pad
should be used for the high-order bytes of st_blocks, so that's not
really a problem.  A correct fix would replace __pad4 with
st_blocks_high (or something like that) and ensure that the high-order
word was stored there.  Your proposed fix would only be correct on
little-endian hardware, as Jörn pointed out.
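
One way the st_blocks_high idea might look, as a sketch only. The struct below is a hypothetical cut-down stand-in for the two affected stat64 fields, not the real layout, and set_blocks()/get_blocks() are illustrative names. Using explicit shifts rather than overlaying a 64-bit field keeps the result independent of byte order:

```c
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit fields in struct stat64. */
struct stat64_blocks {
    uint32_t st_blocks;      /* low-order 32 bits of the block count */
    uint32_t st_blocks_high; /* high-order 32 bits (the former __pad4) */
};

/* Split a 64-bit block count across the two fields. */
static void set_blocks(struct stat64_blocks *s, uint64_t blocks)
{
    s->st_blocks = (uint32_t)(blocks & 0xffffffffu);
    s->st_blocks_high = (uint32_t)(blocks >> 32);
}

/* Reassemble the 64-bit block count from the two fields. */
static uint64_t get_blocks(const struct stat64_blocks *s)
{
    return ((uint64_t)s->st_blocks_high << 32) | s->st_blocks;
}
```

Existing binaries that read only st_blocks would keep seeing the low-order word in the same place, whatever the byte order of the machine.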

> I have tested this with a 3TB file on a JFS filesystem.
> The following is the patch.
> 
> Signed-off-by: Takashi Sato <sho@bsd.tnes.nec.co.jp>
> 
> diff -uprN -X linux-2.6.14.org/Documentation/dontdiff linux-2.6.14.org/include/asm-i386/stat.h linux-2.6.14-blocks/include/asm-i386/stat.h
> --- linux-2.6.14.org/include/asm-i386/stat.h 2005-10-28 09:02:08.000000000 +0900
> +++ linux-2.6.14-blocks/include/asm-i386/stat.h 2005-11-18 22:42:37.000000000 +0900
> @@ -58,8 +58,7 @@ struct stat64 {
>   long long st_size;
>   unsigned long st_blksize;
>  
> - unsigned long st_blocks; /* Number 512-byte blocks allocated. */
> - unsigned long __pad4;  /* future possible st_blocks high bits */
> + unsigned long long st_blocks; /* Number 512-byte blocks allocated. */
>  
>   unsigned long st_atime;
>   unsigned long st_atime_nsec;
> diff -uprN -X linux-2.6.14.org/Documentation/dontdiff linux-2.6.14.org/include/linux/fs.h linux-2.6.14-blocks/include/linux/fs.h
> --- linux-2.6.14.org/include/linux/fs.h 2005-10-28 09:02:08.000000000 +0900
> +++ linux-2.6.14-blocks/include/linux/fs.h 2005-11-18 17:08:03.000000000 +0900
> @@ -438,7 +438,7 @@ struct inode {
>   unsigned int  i_blkbits;
>   unsigned long  i_blksize;
>   unsigned long  i_version;
> - unsigned long  i_blocks;
> + unsigned long long i_blocks;
>   unsigned short          i_bytes;
>   spinlock_t  i_lock; /* i_blocks, i_bytes, maybe i_size */
>   struct semaphore i_sem;
> diff -uprN -X linux-2.6.14.org/Documentation/dontdiff linux-2.6.14.org/include/linux/stat.h linux-2.6.14-blocks/include/linux/stat.h
> --- linux-2.6.14.org/include/linux/stat.h 2005-10-28 09:02:08.000000000 +0900
> +++ linux-2.6.14-blocks/include/linux/stat.h 2005-11-18 17:08:56.000000000 +0900
> @@ -69,7 +69,7 @@ struct kstat {
>   struct timespec mtime;
>   struct timespec ctime;
>   unsigned long blksize;
> - unsigned long blocks;
> + unsigned long long blocks;
>  };
>  
>  #endif
> 
> Any feedback and comments are welcome.
> 
> Best regards, Takashi Sato
-- 
David Kleikamp
IBM Linux Technology Center
