From: Eric St-Laurent <ericstl34@sympatico.ca>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
Fengguang Wu <wfg@mail.ustc.edu.cn>, riel <riel@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Rusty Russell <rusty@rustcorp.com.au>,
Tim Pepper <lnxninja@us.ibm.com>, Chris Snook <csnook@redhat.com>,
Nick Piggin <nickpiggin@yahoo.com.au>
Subject: Re: [PATCH 1/3] readahead: drop behind
Date: Tue, 24 Jul 2007 23:55:17 -0400 [thread overview]
Message-ID: <1185335717.7105.6.camel@perkele> (raw)
In-Reply-To: <20070721210051.975382000@chello.nl>
[-- Attachment #1: Type: text/plain, Size: 3307 bytes --]
On Sat, 2007-21-07 at 23:00 +0200, Peter Zijlstra wrote:
> Use the read-ahead code to provide hints to page reclaim.
>
> This patch has the potential to solve the streaming-IO trashes my
> desktop problem.
>
> It tries to aggressively reclaim pages that were loaded in a strong
> sequential pattern and have been consumed. Thereby limiting the damage
> to the current resident set.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
(sorry for the delay)
Ok, I've done some tests with your patches,
I came up with a test program that should approximate my use case. It
simply mmap() and scan (read) a 375M file which represent the usual used
memory on my desktop system. This data is frequently used, and should
stay cached as much as possible in preference over the "used once" data
read in the page cache when copying large files. I don't claim that the
test program is perfect or even correct, I'm open for suggestions.
Test system:
- Linux x86_64 2.6.23-rc1
- 1G of RAM
- I use the basic drop behind and sysctl patches. The readahead size
patch is _not_ included.
Setting up:
dd if=/dev/zero of=/tmp/375M_file bs=1M count=375
dd if=/dev/zero of=/tmp/5G_file bs=1M count=5120
Tests with stock kernel (drop behind disabled):
echo 0 >/proc/sys/vm/drop_behind
Base test:
sync; echo 1 >/proc/sys/vm/drop_caches
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file
1st execution: 0m7.146s
2nd execution: 0m1.119s
3rd execution: 0m1.109s
4th execution: 0m1.105s
Reading a large file test:
sync; echo 1 >/proc/sys/vm/drop_caches
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file
cp /tmp/5G_file /dev/null
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file
1st execution: 0m7.224s
2nd execution: 0m1.114s
3rd execution: 0m7.178s <<< Much slower
4th execution: 0m1.115s
Copying (read+write) a large file test:
sync; echo 1 >/proc/sys/vm/drop_caches
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file
cp /tmp/5G_file /tmp/copy_of_5G_file
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file
rm /tmp/copy_of_5G_file
1st execution: 0m7.203s
2nd execution: 0m1.147s
3rd execution: 0m7.238s <<< Much slower
4th execution: 0m1.129s
Tests with drop behind enabled:
echo 1 >/proc/sys/vm/drop_behind
Base test:
[same tests as above]
1st execution: 0m7.206s
2nd execution: 0m1.110s
3rd execution: 0m1.102s
4th execution: 0m1.106s
Reading a large file test:
[same tests as above]
1st execution: 0m7.197s
2nd execution: 0m1.116s
3rd execution: 0m1.114s <<< Great!!!
4th execution: 0m1.111s
Copying (read+write) a large file test:
[same tests as above]
1st execution: 0m7.186s
2nd execution: 0m1.111s
3rd execution: 0m7.339s <<< Not fixed
4th execution: 0m1.121s
Conclusion:
- The drop-behind patch works and really prevents the page cache content
from being fulled with useless read-once data.
- It doesn't help the copy (read+write) case. This should also be fixed,
as it's a common workload.
Tested-By: Eric St-Laurent (ericstl34@xxxxxxxxx.xx)
Best regards,
- Eric
(*) Test program and batch file are attached.
[-- Attachment #2: drop-behind+sysctl.patch --]
[-- Type: text/x-patch, Size: 5642 bytes --]
diff -urN linux-2.6/include/linux/swap.h linux-2.6-drop-behind/include/linux/swap.h
--- linux-2.6/include/linux/swap.h 2007-07-21 18:26:00.000000000 -0400
+++ linux-2.6-drop-behind/include/linux/swap.h 2007-07-22 16:22:48.000000000 -0400
@@ -180,6 +180,7 @@
/* linux/mm/swap.c */
extern void FASTCALL(lru_cache_add(struct page *));
extern void FASTCALL(lru_cache_add_active(struct page *));
+extern void FASTCALL(lru_demote(struct page *));
extern void FASTCALL(activate_page(struct page *));
extern void FASTCALL(mark_page_accessed(struct page *));
extern void lru_add_drain(void);
diff -urN linux-2.6/kernel/sysctl.c linux-2.6-drop-behind/kernel/sysctl.c
--- linux-2.6/kernel/sysctl.c 2007-07-21 18:26:01.000000000 -0400
+++ linux-2.6-drop-behind/kernel/sysctl.c 2007-07-22 16:20:27.000000000 -0400
@@ -163,6 +163,7 @@
extern int prove_locking;
extern int lock_stat;
+extern int sysctl_dropbehind;
/* The default sysctl tables: */
@@ -1048,6 +1049,14 @@
.extra1 = &zero,
},
#endif
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "drop_behind",
+ .data = &sysctl_dropbehind,
+ .maxlen = sizeof(sysctl_dropbehind),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
/*
* NOTE: do not add new entries to this table unless you have read
* Documentation/sysctl/ctl_unnumbered.txt
diff -urN linux-2.6/mm/readahead.c linux-2.6-drop-behind/mm/readahead.c
--- linux-2.6/mm/readahead.c 2007-07-21 18:26:01.000000000 -0400
+++ linux-2.6-drop-behind/mm/readahead.c 2007-07-22 16:41:47.000000000 -0400
@@ -15,6 +15,7 @@
#include <linux/backing-dev.h>
#include <linux/task_io_accounting_ops.h>
#include <linux/pagevec.h>
+#include <linux/swap.h>
void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
{
@@ -429,6 +430,8 @@
}
EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
+int sysctl_dropbehind = 1;
+
/**
* page_cache_async_readahead - file readahead for marked pages
* @mapping: address_space which holds the pagecache and I/O vectors
@@ -449,6 +452,11 @@
struct page *page, pgoff_t offset,
unsigned long req_size)
{
+ pgoff_t demote_idx = offset - min_t(pgoff_t, offset, ra->size);
+ struct page *pages[16];
+ unsigned nr_pages;
+ unsigned i;
+
/* no read-ahead */
if (!ra->ra_pages)
return;
@@ -467,6 +475,36 @@
if (bdi_read_congested(mapping->backing_dev_info))
return;
+ /*
+ * Read-ahead use once: when the ra window is maximal this is a good
+ * hint that there is sequential IO, which implies that the pages that
+ * have been used thus far can be reclaimed
+ */
+ if (sysctl_dropbehind && ra->size == ra->ra_pages) {
+ nr_pages = find_get_pages(mapping,
+ demote_idx, ARRAY_SIZE(pages), pages);
+
+ for (i = 0; i < nr_pages; i++) {
+ page = pages[i];
+ demote_idx = page_index(page);
+
+ /*
+ * The page is active. This means there are other
+ * users. We should not take away somebody else's
+ * pages, so do not drop behind beyond this point.
+ */
+ if (demote_idx < offset && !PageActive(page)) {
+ lru_demote(page);
+ } else {
+ demote_idx = offset;
+ break;
+ }
+ }
+ demote_idx++;
+
+ release_pages(pages, nr_pages, 0);
+ } while (demote_idx < offset);
+
/* do read-ahead */
ondemand_readahead(mapping, ra, filp, true, offset, req_size);
}
diff -urN linux-2.6/mm/swap.c linux-2.6-drop-behind/mm/swap.c
--- linux-2.6/mm/swap.c 2007-07-21 18:26:01.000000000 -0400
+++ linux-2.6-drop-behind/mm/swap.c 2007-07-22 16:26:35.000000000 -0400
@@ -30,6 +30,7 @@
#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/init.h>
+#include <linux/rmap.h>
/* How many pages do we try to swap or page in/out together? */
int page_cluster;
@@ -176,6 +177,7 @@
*/
static DEFINE_PER_CPU(struct pagevec, lru_add_pvecs) = { 0, };
static DEFINE_PER_CPU(struct pagevec, lru_add_active_pvecs) = { 0, };
+static DEFINE_PER_CPU(struct pagevec, lru_demote_pvecs) = { 0, };
void fastcall lru_cache_add(struct page *page)
{
@@ -197,6 +199,37 @@
put_cpu_var(lru_add_active_pvecs);
}
+static void __pagevec_lru_demote(struct pagevec *pvec)
+{
+ int i;
+ struct zone *zone = NULL;
+
+ for (i = 0; i < pagevec_count(pvec); i++) {
+ struct page *page = pvec->pages[i];
+ struct zone *pagezone = page_zone(page);
+
+ if (pagezone != zone) {
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ zone = pagezone;
+ spin_lock_irq(&zone->lru_lock);
+ }
+ if (PageLRU(page)) {
+ page_referenced(page, 0);
+ if (PageActive(page)) {
+ ClearPageActive(page);
+ __dec_zone_state(zone, NR_ACTIVE);
+ __inc_zone_state(zone, NR_INACTIVE);
+ }
+ list_move_tail(&page->lru, &zone->inactive_list);
+ }
+ }
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ release_pages(pvec->pages, pvec->nr, pvec->cold);
+ pagevec_reinit(pvec);
+}
+
static void __lru_add_drain(int cpu)
{
struct pagevec *pvec = &per_cpu(lru_add_pvecs, cpu);
@@ -207,6 +240,9 @@
pvec = &per_cpu(lru_add_active_pvecs, cpu);
if (pagevec_count(pvec))
__pagevec_lru_add_active(pvec);
+ pvec = &per_cpu(lru_demote_pvecs, cpu);
+ if (pagevec_count(pvec))
+ __pagevec_lru_demote(pvec);
}
void lru_add_drain(void)
@@ -403,6 +439,21 @@
}
/*
+ * Function used to forcefully demote a page to the tail of the inactive
+ * list.
+ */
+void fastcall lru_demote(struct page *page)
+{
+ if (likely(get_page_unless_zero(page))) {
+ struct pagevec *pvec = &get_cpu_var(lru_demote_pvecs);
+
+ if (!pagevec_add(pvec, page))
+ __pagevec_lru_demote(pvec);
+ put_cpu_var(lru_demote_pvecs);
+ }
+}
+
+/*
* Try to drop buffers from the pages in a pagevec
*/
void pagevec_strip(struct pagevec *pvec)
[-- Attachment #3: large_app_load_simul.cpp --]
[-- Type: text/x-c++src, Size: 459 bytes --]
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
if (argc != 2)
return EXIT_FAILURE;
FILE *file = fopen(argv[1], "rb");
fseek(file, 0, SEEK_END);
long size = ftell(file);
char *buf = (char *)mmap(NULL, size, PROT_READ, MAP_PRIVATE, fileno(file), 0);
for (long i = 0; i < size; i++)
char temp = buf[i];
int result = munmap(buf, size);
fclose(file);
return EXIT_SUCCESS;
}
[-- Attachment #4: Makefile --]
[-- Type: text/x-makefile, Size: 69 bytes --]
all:
gcc large_app_load_simul.cpp -o large_app_load_simul -lstdc++
[-- Attachment #5: useonce-test.sh --]
[-- Type: application/x-shellscript, Size: 1737 bytes --]
next prev parent reply other threads:[~2007-07-25 3:55 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-07-21 21:00 [PATCH 0/3] readahead drop behind and size adjustment Peter Zijlstra
2007-07-21 21:00 ` [PATCH 1/3] readahead: drop behind Peter Zijlstra
2007-07-21 20:29 ` Eric St-Laurent
2007-07-21 20:37 ` Peter Zijlstra
2007-07-21 20:59 ` Eric St-Laurent
2007-07-21 21:06 ` Peter Zijlstra
2007-07-25 3:55 ` Eric St-Laurent [this message]
2007-07-21 21:00 ` [PATCH 2/3] readahead: fadvise drop behind controls Peter Zijlstra
2007-07-21 21:00 ` [PATCH 3/3] readahead: scale max readahead size depending on memory size Peter Zijlstra
2007-07-22 8:24 ` Jens Axboe
2007-07-22 8:36 ` Peter Zijlstra
2007-07-22 8:50 ` Jens Axboe
2007-07-22 9:17 ` Peter Zijlstra
2007-07-22 16:44 ` Jens Axboe
2007-07-23 10:04 ` Jörn Engel
2007-07-23 10:11 ` Jens Axboe
2007-07-23 22:44 ` Rusty Russell
2007-07-22 23:52 ` Rik van Riel
2007-07-23 5:22 ` Jens Axboe
[not found] ` <20070722084526.GB6317@mail.ustc.edu.cn>
2007-07-22 8:45 ` Fengguang Wu
2007-07-22 8:59 ` Peter Zijlstra
[not found] ` <20070722095313.GA8136@mail.ustc.edu.cn>
2007-07-22 9:53 ` Fengguang Wu
[not found] ` <20070722023923.GA6438@mail.ustc.edu.cn>
2007-07-22 2:39 ` [PATCH 0/3] readahead drop behind and size adjustment Fengguang Wu
2007-07-22 2:44 ` Dave Jones
[not found] ` <20070722081010.GA6317@mail.ustc.edu.cn>
2007-07-22 8:10 ` Fengguang Wu
2007-07-22 8:24 ` Peter Zijlstra
[not found] ` <20070722082923.GA7790@mail.ustc.edu.cn>
2007-07-22 8:29 ` Fengguang Wu
2007-07-22 8:33 ` Rusty Russell
2007-07-22 8:45 ` Peter Zijlstra
2007-07-23 9:00 ` Nick Piggin
[not found] ` <20070723142457.GA10130@mail.ustc.edu.cn>
2007-07-23 14:24 ` Fengguang Wu
2007-07-23 19:40 ` Andrew Morton
[not found] ` <20070724004728.GA8026@mail.ustc.edu.cn>
2007-07-24 0:47 ` Fengguang Wu
2007-07-24 1:17 ` Andrew Morton
2007-07-24 8:50 ` Andreas Dilger
2007-07-24 4:30 ` Nick Piggin
2007-07-25 4:35 ` Eric St-Laurent
2007-07-25 5:19 ` Nick Piggin
2007-07-25 6:18 ` Eric St-Laurent
2007-07-25 7:09 ` Nick Piggin
2007-07-25 7:48 ` Eric St-Laurent
2007-07-25 15:36 ` Rik van Riel
2007-07-25 15:33 ` Rik van Riel
2007-07-29 7:44 ` Eric St-Laurent
2007-07-25 15:28 ` Rik van Riel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1185335717.7105.6.camel@perkele \
--to=ericstl34@sympatico.ca \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=csnook@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lnxninja@us.ibm.com \
--cc=nickpiggin@yahoo.com.au \
--cc=riel@redhat.com \
--cc=rusty@rustcorp.com.au \
--cc=wfg@mail.ustc.edu.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox