From: Ni zhan Chen <nizhan.chen@gmail.com>
To: YingHang Zhu <casualfisher@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>,
akpm@linux-foundation.org, Fengguang Wu <fengguang.wu@intel.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state
Date: Thu, 25 Oct 2012 10:12:01 +0800
Message-ID: <50889FF1.9030107@gmail.com>
In-Reply-To: <CAA9v8mEULAEHn8qSsFokEue3c0hy8pK8bkYB+6xOtz_Tgbp0vw@mail.gmail.com>
On 10/25/2012 10:04 AM, YingHang Zhu wrote:
> On Thu, Oct 25, 2012 at 9:50 AM, Dave Chinner <david@fromorbit.com> wrote:
>> On Thu, Oct 25, 2012 at 08:17:05AM +0800, YingHang Zhu wrote:
>>> On Thu, Oct 25, 2012 at 4:19 AM, Dave Chinner <david@fromorbit.com> wrote:
>>>> On Wed, Oct 24, 2012 at 07:53:59AM +0800, YingHang Zhu wrote:
>>>>> Hi Dave,
>>>>> On Wed, Oct 24, 2012 at 6:47 AM, Dave Chinner <david@fromorbit.com> wrote:
>>>>>> On Tue, Oct 23, 2012 at 08:46:51PM +0800, Ying Zhu wrote:
>>>>>>> Hi,
>>>>>>> Recently we ran into a bug where an open file's ra_pages is not
>>>>>>> synchronized with its backing device's value when the latter is changed
>>>>>>> with blockdev --setra; the application needs to reopen the file
>>>>>>> to pick up the change,
>>>>>> or simply call fadvise(fd, POSIX_FADV_NORMAL) to reset the readahead
>>>>>> window to the (new) bdi default.
>>>>>>
>>>>>>> which is inappropriate under our circumstances.
>>>>>> Which are? We don't know your circumstances, so you need to tell us
>>>>>> why you need this and why existing methods of handling such changes
>>>>>> are insufficient...
>>>>>>
>>>>>> Optimal readahead windows tend to be a physical property of the
>>>>>> storage and that does not tend to change dynamically. Hence block
>>>>>> device readahead should only need to be set up once, and generally
>>>>>> that can be done before the filesystem is mounted and files are
>>>>>> opened (e.g. via udev rules). Hence you need to explain why you need
>>>>>> to change the default block device readahead on the fly, and why
>>>>>> fadvise(POSIX_FADV_NORMAL) is "inappropriate" to set readahead
>>>>>> windows to the new defaults.
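A minimal sketch of the fadvise approach suggested above, written in Python for brevity. It assumes a Linux system where os.posix_fadvise is available; reset_readahead and demo are hypothetical helper names used only for illustration, and the scratch file stands in for whatever file the application keeps open.

```python
import os
import tempfile


def reset_readahead(fd: int) -> None:
    """Ask the kernel to return fd to its default readahead behaviour.

    offset=0 and length=0 apply the advice to the whole file.
    """
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_NORMAL)


def demo() -> bool:
    # A scratch file stands in for a long-lived, already-open file.
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * 4096)
        f.flush()
        # Per the suggestion above, this would be called after the device
        # readahead has been changed (e.g. with blockdev --setra).
        reset_readahead(f.fileno())
    return True
```

The call needs only the open file descriptor, so the application does not have to close and reopen the file.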
>>>>> Our system is a fuse-based file system. fuse creates a pseudo backing
>>>>> device for user space file systems; its default readahead size is 128KB,
>>>>> which can't fully utilize the backing storage's read bandwidth, so we
>>>>> need to tune it.
>>>> Sure, but that doesn't tell me anything about why you can't do this
>>>> at mount time before the application opens any files. i.e. you've
>>>> simply stated the reason why readahead is tunable, not why you need
>>>> to be fully dynamic.....
>>> We store our file system's data on different disks, so we need to change
>>> ra_pages dynamically according to where the data resides. It can't be fixed
>>> at mount time or when we open files.
>> That doesn't make a whole lot of sense to me. Let me try to get this
>> straight.
>>
>> There is data that resides on two devices (A + B), and a fuse
>> filesystem to access that data. A single file in the fuse fs has
>> data on both devices. An app has the file open, and when the
>> data it is accessing is on device A you need to set the readahead to
>> what is best for device A? And when the app tries to access data for
>> that file that is on device B, you need to set the readahead to what
>> is best for device B? And you are changing the fuse BDI readahead
>> settings according to where the data in the back end lies?
>>
>> It seems to me that you should be setting the fuse readahead to the
>> maximum of the readahead windows the data devices have configured at
>> mount time and leaving it at that....
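The "maximum of the readahead windows" suggestion above can be sketched as follows. This assumes each data device exposes its window, in KiB, through the Linux sysfs attribute /sys/block/<dev>/queue/read_ahead_kb. The helper names are hypothetical, and how the result is then applied to the fuse mount is deliberately left out.

```python
from pathlib import Path
from typing import Iterable


def sysfs_read_ahead_path(dev: str) -> Path:
    """Build the sysfs path holding a block device's readahead size in KiB."""
    return Path("/sys/block") / dev / "queue" / "read_ahead_kb"


def read_ahead_kb(path: Path) -> int:
    """Parse a read_ahead_kb-style file containing a single integer."""
    return int(path.read_text().strip())


def max_read_ahead_kb(paths: Iterable[Path]) -> int:
    """Return the largest readahead window among the given files.

    Raises ValueError if no paths are given.
    """
    return max(read_ahead_kb(p) for p in paths)
```

For two backing disks with the placeholder names sda and sdb, the call would be max_read_ahead_kb(sysfs_read_ahead_path(d) for d in ("sda", "sdb")), evaluated once at mount time.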
> Then it may not fully utilize some device's read IO bandwidth and put too much
> burden on other devices.
>>> The abstract bdi of fuse and btrfs provides some dynamically changing
>>> bdi.ra_pages based on the real backing device. IMHO this should not be
>>> ignored.
>> btrfs simply takes into account the number of disks it has for a
>> given storage pool when setting up the default bdi ra_pages during
>> mount. This is basically doing what I suggested above. Same with
>> the generic fuse code - it's simply setting a sensible default value
>> for the given fuse configuration.
>>
>> Neither are dynamic in the sense you are talking about, though.
> Actually I've talked about it with Fengguang, he advised we should unify the
> ra_pages in struct bdi and file_ra_state and leave the issue of spreading
> data across disks as it is.

But how can a bdi-wide ra_pages reflect the readahead windows of different
files? Some of these files may be read sequentially, others randomly, and
so on.

> Fengguang, what's your opinion about this?
>
> Thanks,
> Ying Zhu
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@fromorbit.com