Re: block cache replacement strategy?

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Johannes Stezenbach <js@sig21.net>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: block cache replacement strategy?
Date: Fri, 1 Oct 2010 15:05:28 +0200	[thread overview]
Message-ID: <20101001130528.GA28723@sig21.net> (raw)
In-Reply-To: <20100930232758.GI3573@quack.suse.cz>

Hi,

On Fri, Oct 01, 2010 at 01:27:59AM +0200, Jan Kara wrote:
> On Tue 07-09-10 15:34:29, Johannes Stezenbach wrote:
> > 
> > zzz:~# echo 3 >/proc/sys/vm/drop_caches 
> > zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 13.9454 s, 75.2 MB/s
> > zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 0.92799 s, 1.1 GB/s
> > 
> > OK, seems like the blocks are cached. But:
> > 
> > zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 13.8375 s, 75.8 MB/s
> > zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 13.8429 s, 75.7 MB/s
>   I took a look at this because it looked strange at the first sight to me.
> After some code reading the result is that everything is working as
> designed.
>   The first dd fills up memory with 1GB of data. Pages with data just freshly
> read from disk are in "Inactive" state. When these pages are read again by
> the second dd, they move into the "Active" state - caching has proved
> useful and thus we value the data more. When the third dd is run, it
> eventually needs to reclaim some pages to cache new data. System preferably
> reclaims "Inactive" pages and since it has plenty of them - all the data
> the third dd has read so far - it succeeds. Thus when a third dd finishes,
> only a small part of the whole 1 GB chunk is in memory since we continually
> reclaimed pages from it.
>   Active pages would start becoming inactive only when there would be too
> many of them (e.g. when there would be more active pages than inactive
> pages). But that does not happen with your workload... I guess this
> explains it.

Thank you for your comments, I see now how it works.

What you snipped from my post:

> > Even if I let 15min pass and repeat the dd command
> > several times, I cannot see any caching effects, it
> > stays at ~75 MB/s.
...
> > Active:           792720 kB
> > Inactive:         758832 kB

So with my new knowledge I tried to run dd with a smaller data set
to get new data on the Active pages list:

zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=680 skip=1000
680+0 records in
680+0 records out
713031680 bytes (713 MB) copied, 9.8105 s, 72.7 MB/s
zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=680 skip=1000
680+0 records in
680+0 records out
713031680 bytes (713 MB) copied, 0.676862 s, 1.1 GB/s

zzz:~# cat /proc/meminfo 
MemTotal:        1793272 kB
MemFree:           15788 kB
Buffers:         1379332 kB
Cached:            14084 kB
SwapCached:        19516 kB
Active:          1493748 kB
Inactive:          45928 kB
Active(anon):     106416 kB
Inactive(anon):    42456 kB
Active(file):    1387332 kB
Inactive(file):     3472 kB

zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 5.09198 s, 206 MB/s
zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 1.63369 s, 642 MB/s
zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 0.892916 s, 1.2 GB/s


Yippie!

BTW, it seems this has nothing to do with sequential read, and my
earlier testing with lmdd was flawed since lmdd uses 1M = 1000000
and 1m = 1048576, thus my test read overlapping blocks and the
resulting data set was smaller than the number of inactive pages.
A correct test with lmdd would use

  lmdd if=some_large_file_or_blockdev bs=1m count=1024 rand=5g norepeat=
  lmdd if=some_large_file_or_blockdev bs=1m count=1024 rand=5g norepeat= start=5g

and shows the same caching behaviour (on a machine with 2G RAM).


Thanks
Johannes

WARNING: multiple messages have this Message-ID (diff)

From: Johannes Stezenbach <js@sig21.net>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: block cache replacement strategy?
Date: Fri, 1 Oct 2010 15:05:28 +0200	[thread overview]
Message-ID: <20101001130528.GA28723@sig21.net> (raw)
In-Reply-To: <20100930232758.GI3573@quack.suse.cz>

Hi,

On Fri, Oct 01, 2010 at 01:27:59AM +0200, Jan Kara wrote:
> On Tue 07-09-10 15:34:29, Johannes Stezenbach wrote:
> > 
> > zzz:~# echo 3 >/proc/sys/vm/drop_caches 
> > zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 13.9454 s, 75.2 MB/s
> > zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 0.92799 s, 1.1 GB/s
> > 
> > OK, seems like the blocks are cached. But:
> > 
> > zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 13.8375 s, 75.8 MB/s
> > zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 13.8429 s, 75.7 MB/s
>   I took a look at this because it looked strange at the first sight to me.
> After some code reading the result is that everything is working as
> designed.
>   The first dd fills up memory with 1GB of data. Pages with data just freshly
> read from disk are in "Inactive" state. When these pages are read again by
> the second dd, they move into the "Active" state - caching has proved
> useful and thus we value the data more. When the third dd is run, it
> eventually needs to reclaim some pages to cache new data. System preferably
> reclaims "Inactive" pages and since it has plenty of them - all the data
> the third dd has read so far - it succeeds. Thus when a third dd finishes,
> only a small part of the whole 1 GB chunk is in memory since we continually
> reclaimed pages from it.
>   Active pages would start becoming inactive only when there would be too
> many of them (e.g. when there would be more active pages than inactive
> pages). But that does not happen with your workload... I guess this
> explains it.

Thank you for your comments, I see now how it works.

What you snipped from my post:

> > Even if I let 15min pass and repeat the dd command
> > several times, I cannot see any caching effects, it
> > stays at ~75 MB/s.
...
> > Active:           792720 kB
> > Inactive:         758832 kB

So with my new knowledge I tried to run dd with a smaller data set
to get new data on the Active pages list:

zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=680 skip=1000
680+0 records in
680+0 records out
713031680 bytes (713 MB) copied, 9.8105 s, 72.7 MB/s
zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=680 skip=1000
680+0 records in
680+0 records out
713031680 bytes (713 MB) copied, 0.676862 s, 1.1 GB/s

zzz:~# cat /proc/meminfo 
MemTotal:        1793272 kB
MemFree:           15788 kB
Buffers:         1379332 kB
Cached:            14084 kB
SwapCached:        19516 kB
Active:          1493748 kB
Inactive:          45928 kB
Active(anon):     106416 kB
Inactive(anon):    42456 kB
Active(file):    1387332 kB
Inactive(file):     3472 kB

zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 5.09198 s, 206 MB/s
zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 1.63369 s, 642 MB/s
zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 0.892916 s, 1.2 GB/s


Yippie!

BTW, it seems this has nothing to do with sequential read, and my
earlier testing with lmdd was flawed since lmdd uses 1M = 1000000
and 1m = 1048576, thus my test read overlapping blocks and the
resulting data set was smaller than the number of inactive pages.
A correct test with lmdd would use

  lmdd if=some_large_file_or_blockdev bs=1m count=1024 rand=5g norepeat=
  lmdd if=some_large_file_or_blockdev bs=1m count=1024 rand=5g norepeat= start=5g

and shows the same caching behaviour (on a machine with 2G RAM).


Thanks
Johannes

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2010-10-01 13:05 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-07 13:34 block cache replacement strategy? Johannes Stezenbach
2010-09-07 13:34 ` Johannes Stezenbach
2010-09-09 12:00 ` Johannes Stezenbach
2010-09-09 12:00   ` Johannes Stezenbach
2010-09-10 10:02   ` Florian Mickler
2010-09-10 10:02     ` Florian Mickler
2010-09-10 16:02     ` Johannes Stezenbach
2010-09-10 16:02       ` Johannes Stezenbach
2010-09-13 15:21       ` Johannes Stezenbach
2010-09-13 15:21         ` Johannes Stezenbach
2010-09-13 19:09         ` dave b
2010-09-13 19:09           ` dave b
2010-09-13 19:26           ` Johannes Stezenbach
2010-09-13 19:26             ` Johannes Stezenbach
2010-09-30 23:27 ` Jan Kara
2010-09-30 23:27   ` Jan Kara
2010-10-01 13:05   ` Johannes Stezenbach [this message]
2010-10-01 13:05     ` Johannes Stezenbach

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101001130528.GA28723@sig21.net \
    --to=js@sig21.net \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.