From: Martin Steigerwald <ms@teamix.de>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@vger.kernel.org, Mega Maddin <maddin@megamaddin.de>
Subject: Understanding buffers / buffer cache
Date: Thu, 14 Apr 2011 14:16:55 +0200
Message-ID: <201104141417.10748.ms@teamix.de>
Please keep either linux-kernel or my address as cc, as I am only subscribed
to linux-kernel, not linux-mm.
Hi!
In this week's Linux performance analysis and tuning course, which I am
teaching, there have been detailed questions about what the Linux kernel uses
the memory for that free displays under "buffers".
This much I know:
- it is for buffers that have to be written to disk at some time (as opposed to
caches, which are for reads)
- it is somewhat related to the pdflush / flush-major:minor threads; XFS doesn't
use these (but uses xfsbufd / xfssyncd instead)
- the observation is that it doesn't increase much on a simple dd, but does
increase much more on a tar -xf linux-x.y.tar.gz (after an echo 3 >
/proc/sys/vm/drop_caches)
- the data to be written via dd instead shows up under Dirty: and then
Writeback: in /proc/meminfo
Thus I thought buffers were mainly related to metadata.
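To repeat the dd / tar observation with numbers, the relevant fields can be read from /proc/meminfo before and after each run. What follows is only a minimal user-space sketch: the helper names meminfo_field_kb() and read_meminfo() are made up for this example, and it assumes the usual "Key:   value kB" line layout of /proc/meminfo.

```c
#include <stdio.h>
#include <string.h>

/* Return the value (in kB) of field `key`, e.g. "Buffers", from text in
 * /proc/meminfo format, or -1 if the field is missing or unparsable.
 * Hypothetical helper, not part of any library. */
long meminfo_field_kb(const char *text, const char *key)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* The key must start the line and be followed directly by ':',
         * so "Writeback" does not match "WritebackTmp". */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            long kb;
            if (sscanf(line + keylen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;             /* step past the newline */
    }
    return -1;
}

/* Read /proc/meminfo into buf as a NUL-terminated string.
 * Returns 0 on success, -1 on failure. */
int read_meminfo(char *buf, size_t size)
{
    FILE *f;
    size_t n;

    if (size == 0)
        return -1;
    f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return n > 0 ? 0 : -1;
}
```

Calling read_meminfo() into a buffer of a few kB and then meminfo_field_kb(buf, "Buffers"), meminfo_field_kb(buf, "Dirty") and meminfo_field_kb(buf, "Writeback") once before and once after the dd or tar run gives the deltas described in the list above.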
But one course member (on cc) dug into the kernel source and found this:
- fs/block_dev.c:
long nr_blockdev_pages(void)
{
        struct block_device *bdev;
        long ret = 0;
        spin_lock(&bdev_lock);
        list_for_each_entry(bdev, &all_bdevs, bd_list) {
                ret += bdev->bd_inode->i_mapping->nrpages;
        }
        spin_unlock(&bdev_lock);
        return ret;
}
- include/linux/fs.h:
struct block_device {
        dev_t bd_dev; /* not a kdev_t - it's a search key */
        struct inode * bd_inode; /* will die */
[...]
struct inode {
        /* RCU path lookup touches following: */
[...]
        struct address_space *i_mapping;
- And then this in lots of places:
martin@shambhala:~/Computer/Shambhala/Kernel/2.6.38/linux-2.6.38.y> find -name "*.c" -or -name "*.h" | xargs grep i_mapping
./include/linux/fs.h: struct address_space *i_mapping;
./include/linux/fs.h: invalidate_mapping_pages(inode->i_mapping, 0, -1);
./include/trace/events/ext4.h: __entry->writeback_index = inode->i_mapping->writeback_index;
./include/trace/events/ext4.h: __entry->writeback_index = inode->i_mapping->writeback_index;
./kernel/cgroup.c: inode->i_mapping->backing_dev_info = &cgroup_backing_dev_info;
./arch/powerpc/platforms/cell/spufs/file.c: ctx->local_store = inode->i_mapping;
./arch/powerpc/platforms/cell/spufs/file.c: ctx->cntl = inode->i_mapping;
[...]
./arch/tile/kernel/smp.c:static unsigned long __iomem *ipi_mappings[NR_CPUS];
./arch/tile/kernel/smp.c: ipi_mappings[cpu] = ioremap_prot(offset, PAGE_SIZE, pte);
./arch/tile/kernel/smp.c: ((unsigned long __force *)ipi_mappings[cpu])[IRQ_RESCHEDULE] = 0;
[...]
including various filesystems, where it seems to be used for metadata
*and* file I/O as well as "journal" / COW I/O. For example:
./fs/btrfs/inode.c: page = find_get_page(inode->i_mapping,
./fs/btrfs/inode.c: inode->i_mapping, start,
./fs/btrfs/inode.c: inode->i_mapping->a_ops = &btrfs_aops;
./fs/btrfs/inode.c: inode->i_mapping->backing_dev_info = &root->fs_info->bdi;
[...]
./fs/btrfs/ordered-data.c: !mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY)) {
./fs/btrfs/ordered-data.c: filemap_flush(inode->i_mapping);
./fs/btrfs/ordered-data.c: filemap_fdatawrite_range(inode->i_mapping, start, end);
./fs/btrfs/ordered-data.c: filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c: filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c: filemap_fdatawait_range(inode->i_mapping, start, orig_end);
[...]
./fs/btrfs/file.c: pages[i] = grab_cache_page(inode->i_mapping, index + i);
./fs/btrfs/file.c: current->backing_dev_info = inode->i_mapping->backing_dev_info;
./fs/btrfs/file.c: filemap_fdatawrite_range(inode->i_mapping, pos,
./fs/btrfs/file.c: inode->i_mapping,
./fs/btrfs/file.c: invalidate_mapping_pages(inode->i_mapping,
./fs/btrfs/file.c: filemap_flush(inode->i_mapping);
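To make the pointer chain in nr_blockdev_pages() above easier to follow, here is a stripped-down user-space model of it. This is not kernel code: the three structs keep only the members on the path bdev->bd_inode->i_mapping->nrpages, the all_bdevs list is replaced by a plain array, the locking is left out, and the names model_nr_blockdev_pages() and pages_to_kb() are invented for this sketch. It also assumes that this page count is what ends up as the "buffers" value, as the finding above suggests.

```c
#include <stddef.h>

/* User-space stand-ins for the kernel structures quoted above; every
 * member not needed for the summation has been dropped. */
struct address_space { unsigned long nrpages; };
struct inode { struct address_space *i_mapping; };
struct block_device { struct inode *bd_inode; };

/* Sum the cached pages of the given block devices. The kernel walks the
 * all_bdevs list under bdev_lock; this model takes an array instead. */
long model_nr_blockdev_pages(struct block_device **bdevs, size_t n)
{
    long ret = 0;
    size_t i;

    for (i = 0; i < n; i++)
        ret += (long)bdevs[i]->bd_inode->i_mapping->nrpages;
    return ret;
}

/* Convert a page count to kB, the unit used in /proc/meminfo.
 * page_size_bytes is assumed to be a multiple of 1024. */
long pages_to_kb(long pages, long page_size_bytes)
{
    return pages * (page_size_bytes / 1024);
}
```

Read this way, the counter covers every page cached in the address space of a block device inode, whatever put the page there.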
So what exactly are buffers used for? Is there any up-to-date and detailed
documentation, howto or explanation available? Most hits I found via search
engines are either quite short and vague or relate to really old kernel
versions.
Is there any detailed explanation available on how - as in which steps - the
Linux kernel writes certain kinds of data, like
- inode / metadata traffic
- dirty pages (ok, via pdflush / flush, as long as one process doesn't overuse
it)
- I/O from processes using system calls like write()
- direct I/O
Or do you have any hints on what source files to read in order to understand
more regarding these questions?
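For reference, this is the user-space side of the buffered write() case I have in mind. It is a minimal sketch with a made-up function name, buffered_write_then_fsync(): write() normally returns once the data has been copied into the page cache (where it shows up as Dirty), and fsync() then blocks until the kernel has written it out. It says nothing about the steps inside the kernel, which is what I am asking about.

```c
#include <fcntl.h>
#include <unistd.h>

/* Write len bytes to path with ordinary buffered I/O, then force them to
 * storage. Returns 0 on success, -1 on any error. Hypothetical helper. */
int buffered_write_then_fsync(const char *path, const void *data, size_t len)
{
    const char *p = data;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    while (len > 0) {
        /* Buffered write: the data is copied into the page cache. */
        ssize_t n = write(fd, p, len);
        if (n <= 0) {           /* treat a zero-length write as an error */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Block until the dirty pages of this file have been written out. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Watching Dirty: and Writeback: in /proc/meminfo while a program like this writes a large file shows the same pattern as the dd run.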
Thanks,
--
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90