From: "majianpeng" <majianpeng@gmail.com>
To: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Re: the max size of block device on 32bit os, when using do_generic_file_read() proceed.
Date: Mon, 28 May 2012 14:26:27 +0800
Message-ID: <201205281426238284699@gmail.com>
In-Reply-To: 201205242138175936268@gmail.com

Sorry for the late reply. I reviewed the code again and found some problems.
I created a software RAID array whose size is larger than 16T.
The OS is Ubuntu 12.04, 32-bit x86.
udev creates the block device node in the /dev directory, which is a tmpfs.
So I read the tmpfs code:
in mm/shmem.c:shmem_fill_super()
>sb->s_maxbytes = MAX_LFS_FILESIZE;
On my machine, MAX_LFS_FILESIZE is equal to 8T - 1.
But the read path is:
generic_file_aio_read --> do_generic_file_read [when the direct I/O flag is not used]
In function do_generic_file_read():
>index = *ppos >> PAGE_CACHE_SHIFT;
index has type pgoff_t (unsigned long, so 32 bits here).
So if *ppos is 16T or larger, index overflows. As you said, it then reads data from a low position.

But I also tested the write operation:
blkdev_aio_write --> __generic_file_aio_write
In function __generic_file_aio_write(),
the position is checked by generic_write_checks().
But in that function:
>if (likely(!isblk)) {
>		if (unlikely(*pos >= inode->i_sb->s_maxbytes)) {
>			if (*count || *pos > inode->i_sb->s_maxbytes) {
>				return -EFBIG;
>			}
>			/* zero-length writes at ->s_maxbytes are OK */
>		}

>		if (unlikely(*pos + *count > inode->i_sb->s_maxbytes))
>			*count = inode->i_sb->s_maxbytes - *pos;
>	} else {
>#ifdef CONFIG_BLOCK
>		loff_t isize;
>		if (bdev_read_only(I_BDEV(inode)))
>			return -EPERM;
>		isize = i_size_read(inode);
>		if (*pos >= isize) {
>			if (*count || *pos > isize)
>				return -ENOSPC;
>		}

>		if (*pos + *count > isize)
>			*count = isize - *pos;
>#else
>		return -EPERM;
>#endif
For a regular file it checks against s_maxbytes (MAX_LFS_FILESIZE). But if the file is a block device, it does not do that check; it only checks against the real device size.
So there is also a bug here: if the block device is larger than 16T, no error is returned and execution continues.
It then executes generic_file_buffered_write() [no O_DIRECT] --> generic_perform_write --> write_begin [blkdev_write_begin]
--> block_write_begin
In function block_write_begin():
>pgoff_t index = pos >> PAGE_CACHE_SHIFT;
index will overflow.

I once thought of patching these bugs (it might make me well known, haha). But I can't decide how, because of this comment in generic_write_checks():
>/*
>	 * Are we about to exceed the fs block limit ?
>	 *
>	 * If we have written data it becomes a short write.  If we have
>	 * exceeded without writing data we send a signal and return EFBIG.
>	 * Linus frestrict idea will clean these up nicely..
>	 */
>	if (likely(!isblk)) {
How should a block device be dealt with? As a regular file or not?
						



------------------				 
majianpeng
2012-05-28

-------------------------------------------------------------
From: Hugh Dickins
Sent: 2012-05-27 05:24:13
To: majianpeng
Cc: Al Viro; Andrew Morton; linux-mm; linux-fsdevel
Subject: Re: the max size of block device on 32bit os, when using do_generic_file_read() proceed.

On Thu, 24 May 2012, majianpeng wrote:
>   Hi all:
> 		I read from a RAID5 array whose size is 30T. The OS is RHEL6 32-bit.
> 		I read the RAID5 device as a whole (not partitioned) and found that the data came from addresses I did not ask for.
> 		So I tested the newest kernel code, and the problem is still there.
> 		I reviewed the code in function do_generic_file_read():
> 
> 		index = *ppos >> PAGE_CACHE_SHIFT;
> 		index is u32 and *ppos is long long.
> 		So when *ppos is larger than 0xFFFFFFFF * PAGE_CACHE_SIZE (16T bytes), index is wrong.
> 
> 		I wonder about this. On a 32-bit OS, can a block device not be larger than 16T? In other words, if a block device is larger than 16T, must it be partitioned?

I am not surprised that the page cache limitation prevents you from
reading the whole device with a 32-bit kernel.  See MAX_LFS_FILESIZE in
include/linux/fs.h.  Our answer to that is just to use a 64-bit kernel.

#if BITS_PER_LONG==32
#define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) 
#elif BITS_PER_LONG==64
#define MAX_LFS_FILESIZE 0x7fffffffffffffffUL
#endif

But I am a little surprised that you get as far as 16TiB (with 4k page):
I would have expected you to be stopped just before 8TiB (although I
suspect that the limitation to 8TiB rather than 16TiB is unnecessary).

And if I understand you correctly, read() or pread() gave you no error
at those large offsets, but supplied data from the low offset instead?

That does surprise me - have we missed a check there?

Hugh
