From: Fengguang Wu <fengguang.wu@intel.com>
To: Raghavendra D Prabhu <raghu.prabhu13@gmail.com>
Cc: linux-mm@kvack.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org
Subject: Re: Re: Re: [PATCH 5/5] mm/readahead: Use find_get_pages instead of radix_tree_lookup.
Date: Wed, 17 Oct 2012 10:12:53 +0800 [thread overview]
Message-ID: <20121017021253.GB13769@localhost> (raw)
In-Reply-To: <20121016165714.GA2826@Archie>
On Tue, Oct 16, 2012 at 10:29:31PM +0530, Raghavendra D Prabhu wrote:
> Hi,
>
>
> * On Fri, Sep 28, 2012 at 08:18:50PM +0800, Fengguang Wu <fengguang.wu@intel.com> wrote:
> >On Wed, Sep 26, 2012 at 08:28:20AM +0530, Raghavendra D Prabhu wrote:
> >>Hi,
> >>
> >>
> >>* On Sat, Sep 22, 2012 at 09:15:07PM +0800, Fengguang Wu <fengguang.wu@intel.com> wrote:
> >>>On Sat, Sep 22, 2012 at 04:03:14PM +0530, raghu.prabhu13@gmail.com wrote:
> >>>>From: Raghavendra D Prabhu <rprabhu@wnohang.net>
> >>>>
> >>>>Instead of running radix_tree_lookup in a loop and lock/unlocking in the
> >>>>process, find_get_pages is called once, which returns a page_list, some of which
> >>>>are not NULL and are in core.
> >>>>
> >>>>Also, since find_get_pages returns number of pages, if all pages are already
> >>>>cached, it can return early.
> >>>>
> >>>>This will be mostly helpful when a higher proportion of nr_to_read pages are
> >>>>already in the cache, which will mean less locking for page cache hits.
> >>>
> >>>Do you mean the rcu_read_lock()? But it's a no-op for most archs. So
> >>>the benefit of this patch is questionable. Will need real performance
> >>>numbers to support it.
> >>
> >>Aside from the rcu lock/unlock, isn't it better to not make separate
> >>calls to radix_tree_lookup and merge them into one call? Similar
> >>approach is used with pagevec_lookup which is usually used when one
> >>needs to deal with a set of pages.
> >
> >Yeah, batching is generally good, however find_get_pages() is not the
> >right tool. It costs:
> >- get/release page counts
> >- likely a lot more searches in the address space, because it does not
> > limit the end index of the search.
> >
> >radix_tree_next_hole() will be the right tool, and I have a patch to
> >make it actually smarter than the current dumb loop.
>
> Good to know.
>
> >
> >>>>Signed-off-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
> >>>>---
> >>>> mm/readahead.c | 31 +++++++++++++++++++++++--------
> >>>> 1 file changed, 23 insertions(+), 8 deletions(-)
> >>>>
> >>>>diff --git a/mm/readahead.c b/mm/readahead.c
> >>>>index 3977455..3a1798d 100644
> >>>>--- a/mm/readahead.c
> >>>>+++ b/mm/readahead.c
> >>>>@@ -157,35 +157,42 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
> >>>> {
> >>>> struct inode *inode = mapping->host;
> >>>> struct page *page;
> >>>>+ struct page **page_list = NULL;
> >>>> unsigned long end_index; /* The last page we want to read */
> >>>> LIST_HEAD(page_pool);
> >>>> int page_idx;
> >>>> int ret = 0;
> >>>> int ret_read = 0;
> >>>>+ unsigned long num;
> >>>>+ pgoff_t page_offset;
> >>>> loff_t isize = i_size_read(inode);
> >>>>
> >>>> if (isize == 0)
> >>>> goto out;
> >>>>
> >>>>+ page_list = kzalloc(nr_to_read * sizeof(struct page *), GFP_KERNEL);
> >>>>+ if (!page_list)
> >>>>+ goto out;
> >>>
> >>>That cost one more memory allocation and added code to maintain the
> >>>page list. The original code also don't have the cost of grabbing the
> >>>page count, which eliminate the trouble of page release.
> >>>
> >>>> end_index = ((isize - 1) >> PAGE_CACHE_SHIFT);
> >>>>+ num = find_get_pages(mapping, offset, nr_to_read, page_list);
> >>>
> >>>Assume we want to readahead pages for indexes [0, 100] and the cached
> >>>pages are in [1000, 1100]. find_get_pages() will return the latter.
> >>>Which is probably not the your expected results.
> >>
> >>I thought if I ask for pages in the range [0,100] it will return a
> >>sparse array [0,100] but with holes (NULL) for pages not in cache
> >>and references to pages in cache. Isn't that the expected behavior?
> >
> >Nope. The comments above find_get_pages() made it clear, that it's
> >limited by the number of pages rather than the end page index.
>
> Yes, I noticed that, however since nr_to_read in this case is equal
> to nr_pages.
>
> However, I think I understand what you are saying -- ie. if offset +
You seem to still don't understand it after all the explanations..
> nr_pages exceeds the end_index then it will return pages not
> belonging to the file, is that right?
Nope.
> In that case, won't capping nr_pages do, ie. check if offset +
> nr_pages > end_index and if that is true, then reduce nr_to_read
> by end_index. Won't that work?
Either you or me are get lost in the discussion. Please research the
code and emails and show me working code (that's actually tested) that
can serve as the new base for our discussions.
Thanks,
Fengguang
> >>>> /*
> >>>> * Preallocate as many pages as we will need.
> >>>> */
> >>>> for (page_idx = 0; page_idx < nr_to_read; page_idx++) {
> >>>>- pgoff_t page_offset = offset + page_idx;
> >>>>+ if (page_list[page_idx]) {
> >>>>+ page_cache_release(page_list[page_idx]);
> >>>>+ continue;
> >>>>+ }
> >>>>+
> >>>>+ page_offset = offset + page_idx;
> >>>>
> >>>> if (page_offset > end_index)
> >>>> break;
> >>>>
> >>>>- rcu_read_lock();
> >>>>- page = radix_tree_lookup(&mapping->page_tree, page_offset);
> >>>>- rcu_read_unlock();
> >>>>- if (page)
> >>>>- continue;
> >>>>-
> >>>> page = page_cache_alloc_readahead(mapping);
> >>>>- if (!page)
> >>>>+ if (unlikely(!page))
> >>>> break;
> >>>
> >>>That break will leave the remaining pages' page_count lifted and lead
> >>>to memory leak.
> >>
> >>Thanks. Yes, I realized that now.
> >>>
> >>>> page->index = page_offset;
> >>>> list_add(&page->lru, &page_pool);
> >>>>@@ -194,6 +201,13 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
> >>>> lookahead_size = 0;
> >>>> }
> >>>> ret++;
> >>>>+
> >>>>+ /*
> >>>>+ * Since num pages are already returned, bail out after
> >>>>+ * nr_to_read - num pages are allocated and added.
> >>>>+ */
> >>>>+ if (ret == nr_to_read - num)
> >>>>+ break;
> >>>
> >>>Confused. That break seems unnecessary?
> >>
> >>I fixed that:
> >>
> >>
> >> - pgoff_t page_offset = offset + page_idx;
> >> -
> >> - if (page_offset > end_index)
> >> - break;
> >> -
> >> - rcu_read_lock();
> >> - page = radix_tree_lookup(&mapping->page_tree, page_offset);
> >> - rcu_read_unlock();
> >> - if (page)
> >
> >> + if (page_list[page_idx]) {
> >> + page_cache_release(page_list[page_idx]);
> >
> >No, you cannot expect:
> >
> > page_list[page_idx]->index == page_idx
> >
> >Thanks,
> >Fengguang
> >
> >
> >> + num--;
> >> continue;
> >> + }
> >> +
> >> + page_offset = offset + page_idx;
> >> +
> >> + /*
> >> + * Break only if all the previous
> >> + * references have been released
> >> + */
> >> + if (page_offset > end_index) {
> >> + if (!num)
> >> + break;
> >> + else
> >> + continue;
> >> + }
> >>
> >> page = page_cache_alloc_readahead(mapping);
> >> - if (!page)
> >> - break;
> >> + if (unlikely(!page))
> >> + continue;
> >>
> >>>
> >>>Thanks,
> >>>Fengguang
> >>>
> >>>> }
> >>>>
> >>>> /*
> >>>>@@ -205,6 +219,7 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
> >>>> ret_read = read_pages(mapping, filp, &page_pool, ret);
> >>>> BUG_ON(!list_empty(&page_pool));
> >>>> out:
> >>>>+ kfree(page_list);
> >>>> return (ret_read < 0 ? ret_read : ret);
> >>>> }
> >>>>
> >>>>--
> >>>>1.7.12.1
> >>>>
> >>>>--
> >>>>To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >>>>the body to majordomo@kvack.org. For more info on Linux MM,
> >>>>see: http://www.linux-mm.org/ .
> >>>>Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >>>
> >>
> >>
> >>
> >>
> >>Regards,
> >>--
> >>Raghavendra Prabhu
> >>GPG Id : 0xD72BE977
> >>Fingerprint: B93F EBCB 8E05 7039 CD3C A4B8 A616 DCA1 D72B E977
> >>www: wnohang.net
> >
> >
>
>
>
>
> Regards,
> --
> Raghavendra Prabhu
> GPG Id : 0xD72BE977
> Fingerprint: B93F EBCB 8E05 7039 CD3C A4B8 A616 DCA1 D72B E977
> www: wnohang.net
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
prev parent reply other threads:[~2012-10-17 2:12 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-22 10:33 [PATCH 0/5] Readahead fixes / improvements raghu.prabhu13
[not found] ` <cover.1348309711.git.rprabhu@wnohang.net>
2012-09-22 10:33 ` [PATCH 1/5] mm/readahead: Check return value of read_pages raghu.prabhu13
2012-09-22 12:43 ` Fengguang Wu
2012-09-26 1:25 ` Raghavendra D Prabhu
2012-09-28 11:54 ` Fengguang Wu
2012-10-16 17:47 ` Raghavendra D Prabhu
2012-10-17 2:53 ` Fengguang Wu
2012-09-22 10:33 ` [PATCH 2/5] mm/readahead: Change the condition for SetPageReadahead raghu.prabhu13
2012-09-22 12:49 ` Fengguang Wu
2012-09-26 1:29 ` Raghavendra D Prabhu
2012-09-28 11:56 ` Fengguang Wu
2012-10-16 17:42 ` Raghavendra D Prabhu
2012-10-17 2:34 ` Fengguang Wu
2012-09-22 10:33 ` [PATCH 3/5] Remove file_ra_state from arguments of count_history_pages raghu.prabhu13
2012-09-22 12:40 ` Fengguang Wu
2012-10-16 18:21 ` Raghavendra D Prabhu
2012-10-17 3:15 ` Fengguang Wu
2012-09-22 10:33 ` [PATCH 4/5] Move the check for ra_pages after VM_SequentialReadHint() raghu.prabhu13
2012-09-22 12:42 ` Fengguang Wu
2012-09-26 1:39 ` Raghavendra D Prabhu
2012-10-16 18:15 ` Raghavendra D Prabhu
2012-10-17 3:13 ` Fengguang Wu
2012-09-22 10:33 ` [PATCH 5/5] mm/readahead: Use find_get_pages instead of radix_tree_lookup raghu.prabhu13
2012-09-22 13:15 ` Fengguang Wu
2012-09-26 2:58 ` Raghavendra D Prabhu
2012-09-28 12:18 ` Fengguang Wu
2012-10-16 16:59 ` Raghavendra D Prabhu
2012-10-17 2:12 ` Fengguang Wu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121017021253.GB13769@localhost \
--to=fengguang.wu@intel.com \
--cc=akpm@linux-foundation.org \
--cc=linux-mm@kvack.org \
--cc=raghu.prabhu13@gmail.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).