From: Raghavendra D Prabhu <raghu.prabhu13@gmail.com>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: linux-mm@kvack.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org
Subject: Re: Re: [PATCH 5/5] mm/readahead: Use find_get_pages instead of radix_tree_lookup.
Date: Wed, 26 Sep 2012 08:28:20 +0530 [thread overview]
Message-ID: <20120926025820.GA38848@Archie> (raw)
In-Reply-To: <20120922131507.GC15962@localhost>
[-- Attachment #1: Type: text/plain, Size: 6051 bytes --]
Hi,
* On Sat, Sep 22, 2012 at 09:15:07PM +0800, Fengguang Wu <fengguang.wu@intel.com> wrote:
>On Sat, Sep 22, 2012 at 04:03:14PM +0530, raghu.prabhu13@gmail.com wrote:
>> From: Raghavendra D Prabhu <rprabhu@wnohang.net>
>>
>> Instead of running radix_tree_lookup in a loop and lock/unlocking in the
>> process, find_get_pages is called once, which returns a page_list, some of which
>> are not NULL and are in core.
>>
>> Also, since find_get_pages returns number of pages, if all pages are already
>> cached, it can return early.
>>
>> This will be mostly helpful when a higher proportion of nr_to_read pages are
>> already in the cache, which will mean less locking for page cache hits.
>
>Do you mean the rcu_read_lock()? But it's a no-op for most archs. So
>the benefit of this patch is questionable. Will need real performance
>numbers to support it.
Aside from the rcu lock/unlock, isn't it better to not make
separate calls to radix_tree_lookup and merge them into one call?
Similar approach is used with pagevec_lookup which is usually
used when one needs to deal with a set of pages.
>
>> Signed-off-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
>> ---
>> mm/readahead.c | 31 +++++++++++++++++++++++--------
>> 1 file changed, 23 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/readahead.c b/mm/readahead.c
>> index 3977455..3a1798d 100644
>> --- a/mm/readahead.c
>> +++ b/mm/readahead.c
>> @@ -157,35 +157,42 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
>> {
>> struct inode *inode = mapping->host;
>> struct page *page;
>> + struct page **page_list = NULL;
>> unsigned long end_index; /* The last page we want to read */
>> LIST_HEAD(page_pool);
>> int page_idx;
>> int ret = 0;
>> int ret_read = 0;
>> + unsigned long num;
>> + pgoff_t page_offset;
>> loff_t isize = i_size_read(inode);
>>
>> if (isize == 0)
>> goto out;
>>
>> + page_list = kzalloc(nr_to_read * sizeof(struct page *), GFP_KERNEL);
>> + if (!page_list)
>> + goto out;
>
>That cost one more memory allocation and added code to maintain the
>page list. The original code also don't have the cost of grabbing the
>page count, which eliminate the trouble of page release.
>
>> end_index = ((isize - 1) >> PAGE_CACHE_SHIFT);
>> + num = find_get_pages(mapping, offset, nr_to_read, page_list);
>
>Assume we want to readahead pages for indexes [0, 100] and the cached
>pages are in [1000, 1100]. find_get_pages() will return the latter.
>Which is probably not the your expected results.
I thought if I ask for pages in the range [0,100] it will
return a sparse array [0,100] but with holes (NULL) for pages not in
cache and references to pages in cache. Isn't that the expected
behavior?
>
>> /*
>> * Preallocate as many pages as we will need.
>> */
>> for (page_idx = 0; page_idx < nr_to_read; page_idx++) {
>> - pgoff_t page_offset = offset + page_idx;
>> + if (page_list[page_idx]) {
>> + page_cache_release(page_list[page_idx]);
>> + continue;
>> + }
>> +
>> + page_offset = offset + page_idx;
>>
>> if (page_offset > end_index)
>> break;
>>
>> - rcu_read_lock();
>> - page = radix_tree_lookup(&mapping->page_tree, page_offset);
>> - rcu_read_unlock();
>> - if (page)
>> - continue;
>> -
>> page = page_cache_alloc_readahead(mapping);
>> - if (!page)
>> + if (unlikely(!page))
>> break;
>
>That break will leave the remaining pages' page_count lifted and lead
>to memory leak.
Thanks. Yes, I realized that now.
>
>> page->index = page_offset;
>> list_add(&page->lru, &page_pool);
>> @@ -194,6 +201,13 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
>> lookahead_size = 0;
>> }
>> ret++;
>> +
>> + /*
>> + * Since num pages are already returned, bail out after
>> + * nr_to_read - num pages are allocated and added.
>> + */
>> + if (ret == nr_to_read - num)
>> + break;
>
>Confused. That break seems unnecessary?
I fixed that:
- pgoff_t page_offset = offset + page_idx;
-
- if (page_offset > end_index)
- break;
-
- rcu_read_lock();
- page = radix_tree_lookup(&mapping->page_tree, page_offset);
- rcu_read_unlock();
- if (page)
+ if (page_list[page_idx]) {
+ page_cache_release(page_list[page_idx]);
+ num--;
continue;
+ }
+
+ page_offset = offset + page_idx;
+
+ /*
+ * Break only if all the previous
+ * references have been released
+ */
+ if (page_offset > end_index) {
+ if (!num)
+ break;
+ else
+ continue;
+ }
page = page_cache_alloc_readahead(mapping);
- if (!page)
- break;
+ if (unlikely(!page))
+ continue;
>
>Thanks,
>Fengguang
>
>> }
>>
>> /*
>> @@ -205,6 +219,7 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
>> ret_read = read_pages(mapping, filp, &page_pool, ret);
>> BUG_ON(!list_empty(&page_pool));
>> out:
>> + kfree(page_list);
>> return (ret_read < 0 ? ret_read : ret);
>> }
>>
>> --
>> 1.7.12.1
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org. For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
Regards,
--
Raghavendra Prabhu
GPG Id : 0xD72BE977
Fingerprint: B93F EBCB 8E05 7039 CD3C A4B8 A616 DCA1 D72B E977
www: wnohang.net
[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]
next prev parent reply other threads:[~2012-09-26 2:58 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-22 10:33 [PATCH 0/5] Readahead fixes / improvements raghu.prabhu13
[not found] ` <cover.1348309711.git.rprabhu@wnohang.net>
2012-09-22 10:33 ` [PATCH 1/5] mm/readahead: Check return value of read_pages raghu.prabhu13
2012-09-22 12:43 ` Fengguang Wu
2012-09-26 1:25 ` Raghavendra D Prabhu
2012-09-28 11:54 ` Fengguang Wu
2012-10-16 17:47 ` Raghavendra D Prabhu
2012-10-17 2:53 ` Fengguang Wu
2012-09-22 10:33 ` [PATCH 2/5] mm/readahead: Change the condition for SetPageReadahead raghu.prabhu13
2012-09-22 12:49 ` Fengguang Wu
2012-09-26 1:29 ` Raghavendra D Prabhu
2012-09-28 11:56 ` Fengguang Wu
2012-10-16 17:42 ` Raghavendra D Prabhu
2012-10-17 2:34 ` Fengguang Wu
2012-09-22 10:33 ` [PATCH 3/5] Remove file_ra_state from arguments of count_history_pages raghu.prabhu13
2012-09-22 12:40 ` Fengguang Wu
2012-10-16 18:21 ` Raghavendra D Prabhu
2012-10-17 3:15 ` Fengguang Wu
2012-09-22 10:33 ` [PATCH 4/5] Move the check for ra_pages after VM_SequentialReadHint() raghu.prabhu13
2012-09-22 12:42 ` Fengguang Wu
2012-09-26 1:39 ` Raghavendra D Prabhu
2012-10-16 18:15 ` Raghavendra D Prabhu
2012-10-17 3:13 ` Fengguang Wu
2012-09-22 10:33 ` [PATCH 5/5] mm/readahead: Use find_get_pages instead of radix_tree_lookup raghu.prabhu13
2012-09-22 13:15 ` Fengguang Wu
2012-09-26 2:58 ` Raghavendra D Prabhu [this message]
2012-09-28 12:18 ` Fengguang Wu
2012-10-16 16:59 ` Raghavendra D Prabhu
2012-10-17 2:12 ` Fengguang Wu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120926025820.GA38848@Archie \
--to=raghu.prabhu13@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=fengguang.wu@intel.com \
--cc=linux-mm@kvack.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).