From: Eric Rannaud <eric.rannaud@gmail.com>
To: linux-kernel@vger.kernel.org
Subject: madvise(2) MADV_SEQUENTIAL behavior
Date: Tue, 15 Jul 2008 23:03:42 +0000
Message-ID: <1216163022.3443.156.camel@zenigma>
mm/madvise.c and madvise(2) say:
* MADV_SEQUENTIAL - pages in the given range will probably be accessed
* once, so they can be aggressively read ahead, and
* can be freed soon after they are accessed.
But as the sample program at the end of this post shows, and as I
understand the code in mm/filemap.c, MADV_SEQUENTIAL only increases the
amount of readahead for the specified page range; it does not influence
how quickly the pages just read are freed from memory.
When I run the sample program on a large file (say 4GB, on a machine
with 3GB of RAM), the resident size of the program grows enough to evict
pretty much everything else (on 2.6.25.9-40.fc8).
Right before the program below is done reading the 4GB file,
/proc/$pid/smaps shows:
7f6c3e654000-7f6d3e654000 r--s 00000000 fd:02 98125 /tmp/bigfile
Size: 4194304 kB
Rss: 2472220 kB
Pss: 2472220 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 2472220 kB
Private_Dirty: 0 kB
Referenced: 718748 kB
I'm well aware that the kernel is free to ignore the advice given
through madvise(2) (posix_fadvise(2) seems to behave similarly, btw), so
I'm certainly not claiming this is a bug. However, I was wondering what
the rationale behind this behavior is, and whether the manpages should
be updated to be more accurate.
There is a very straightforward workaround: MADV_DONTNEED on the range
just read, every so often, will be very effective at controlling the
resident size of the mapping. (mm/madvise.c:madvise_dontneed() calls
zap_page_range())
Thanks.
---
# dd if=/dev/zero of=/tmp/bigfile bs=1024 count=$((4*1024*1024))
# gcc test.c
# Run:
file=/tmp/bigfile; ./a.out $file & pid=$! ; while kill -0 $pid 2>/dev/null; do grep -A 8 $file /proc/$pid/smaps; sleep 1; done
# cat test.c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

int main(int argc, char **argv)
{
	if (argc != 2)
		return -EINVAL;

	char *fn = argv[1];
	int fd = open(fn, O_RDONLY);
	if (fd < 0)
		return -errno;

	struct stat st;
	int ret = fstat(fd, &st);
	if (ret)
		return -errno;

	unsigned char *map = mmap(0, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return -errno;

	ret = madvise(map, st.st_size, MADV_SEQUENTIAL);
	if (ret) {
		fprintf(stderr, "madvise failed\n");
		return -errno;
	}

	const int pagesize = sysconf(_SC_PAGESIZE);
	unsigned char dummy = 0;
	off_t i;
	for (i = 0; i < st.st_size; i += pagesize) {
		dummy += map[i];
	}

	munmap(map, st.st_size);
	close(fd);
	return dummy;
}