Understanding buffers / buffer cache


From: Martin Steigerwald <ms@teamix.de>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@vger.kernel.org, Mega Maddin <maddin@megamaddin.de>
Subject: Understanding buffers / buffer cache
Date: Thu, 14 Apr 2011 14:16:55 +0200
Message-ID: <201104141417.10748.ms@teamix.de>


Please keep either linux-kernel or my address as cc, as I am only subscribed 
to linux-kernel, not linux-mm.


Hi!

In this week's Linux performance analysis and tuning course, which I teach,
there have been detailed questions about what the Linux kernel uses the
memory for that free displays under "buffers".

I know as much:

- it is for buffers that have to be written to disk at some time (as opposed
to caches, which are for reads)

- it is somewhat related to the pdflush / flush-major:minor threads; XFS
doesn't use these, but uses xfsbufd / xfssyncd instead

- the observation is that it doesn't increase much on a simple dd, but does
increase much more on a tar -xf linux-x.y.tar.gz (after an
echo 3 > /proc/sys/vm/drop_caches)

- the data to be written via dd instead shows up under Dirty: and then
Writeback: in /proc/meminfo
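
The fields named in the list above can be watched directly while running the
dd and tar tests. What follows is only a minimal user-space sketch, assuming
the usual "Key:   value kB" layout of /proc/meminfo; meminfo_kb() and
read_meminfo() are hypothetical helper names, not kernel or libc functions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one "Key:   value kB" line in a /proc/meminfo-style text.
 * Returns the value in kB, or -1 if the key is not found.
 * (Hypothetical helper, written for illustration only.)
 */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* The key must start the line and be followed directly by ':',
         * so "Cached" does not match "SwapCached" or "Writeback"
         * match "WritebackTmp". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long val;

            if (sscanf(line + klen + 1, "%ld", &val) == 1)
                return val;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;     /* step past the newline to the next line */
    }
    return -1;
}

/* Read /proc/meminfo into buf; returns 0 on success, -1 on error. */
int read_meminfo(char *buf, size_t size)
{
    FILE *f = fopen("/proc/meminfo", "r");
    size_t n;

    if (!f)
        return -1;
    n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

Calling read_meminfo() once before and once after a test run, and comparing
meminfo_kb(buf, "Buffers"), "Cached", "Dirty" and "Writeback" between the two
snapshots, gives the same numbers that free and /proc/meminfo show.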


Thus I thought buffers were mainly related to metadata stuff.


But one course member (on cc) dug into the kernel source and found this:

- fs/block_dev.c:

long nr_blockdev_pages(void)
{
        struct block_device *bdev;
        long ret = 0;
        spin_lock(&bdev_lock);
        list_for_each_entry(bdev, &all_bdevs, bd_list) {
                ret += bdev->bd_inode->i_mapping->nrpages;
        }
        spin_unlock(&bdev_lock);
        return ret;
}
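
To make the loop easier to discuss, here is a rough user-space model of it.
This is a sketch, not kernel code: the mock_* types are hypothetical
stand-ins that carry only the fields the loop touches, a plain singly linked
list replaces all_bdevs, and the bdev_lock spinlock is left out because the
model is single-threaded. pages_to_kb() is likewise an assumed helper for
turning the page count into the kB unit that free prints.

```c
#include <stddef.h>

/* Mock stand-ins for the kernel structures (illustration only). */
struct mock_address_space { unsigned long nrpages; };
struct mock_inode { struct mock_address_space *i_mapping; };
struct mock_bdev {
    struct mock_inode *bd_inode;
    struct mock_bdev *next;     /* simple list instead of list_head */
};

/* Sum the page cache pages attached to every block device inode. */
long mock_nr_blockdev_pages(const struct mock_bdev *all_bdevs)
{
    long ret = 0;
    const struct mock_bdev *bdev;

    /* The kernel version holds bdev_lock around this walk. */
    for (bdev = all_bdevs; bdev; bdev = bdev->next)
        ret += bdev->bd_inode->i_mapping->nrpages;
    return ret;
}

/* Convert a page count to kB for a given page size in bytes. */
long pages_to_kb(long pages, long page_size)
{
    return pages * (page_size / 1024);
}
```

Read this way, the value counts pages cached against the block device inodes
themselves, as opposed to pages cached against regular file inodes.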

- include/linux/fs.h:

struct block_device {
        dev_t                   bd_dev;  /* not a kdev_t - it's a search key */
        struct inode *          bd_inode;       /* will die */

[...]

struct inode {
        /* RCU path lookup touches following: */
[...]
        struct address_space    *i_mapping;


- And then this in lots of places:

martin@shambhala:~/Computer/Shambhala/Kernel/2.6.38/linux-2.6.38.y> find -name "*.c" -or -name "*.h" | xargs grep i_mapping
./include/linux/fs.h:   struct address_space    *i_mapping;
./include/linux/fs.h:           invalidate_mapping_pages(inode->i_mapping, 0, -1);
./include/trace/events/ext4.h:          __entry->writeback_index = inode->i_mapping->writeback_index;
./include/trace/events/ext4.h:          __entry->writeback_index = inode->i_mapping->writeback_index;
./kernel/cgroup.c:              inode->i_mapping->backing_dev_info = &cgroup_backing_dev_info;
./arch/powerpc/platforms/cell/spufs/file.c:             ctx->local_store = inode->i_mapping;
./arch/powerpc/platforms/cell/spufs/file.c:             ctx->cntl = inode->i_mapping;
[...]
./arch/tile/kernel/smp.c:static unsigned long __iomem *ipi_mappings[NR_CPUS];
./arch/tile/kernel/smp.c:               ipi_mappings[cpu] = ioremap_prot(offset, PAGE_SIZE, pte);
./arch/tile/kernel/smp.c:       ((unsigned long __force *)ipi_mappings[cpu])[IRQ_RESCHEDULE] = 0;
[...]

This includes various filesystems, where it seems to be used for metadata
*and* file I/O as well as "journal" / COW I/O. For example:

./fs/btrfs/inode.c:             page = find_get_page(inode->i_mapping,
./fs/btrfs/inode.c:                                        inode->i_mapping, start,
./fs/btrfs/inode.c:             inode->i_mapping->a_ops = &btrfs_aops;
./fs/btrfs/inode.c:             inode->i_mapping->backing_dev_info = &root->fs_info->bdi;
[...]
./fs/btrfs/ordered-data.c:          !mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY)) {
./fs/btrfs/ordered-data.c:                              filemap_flush(inode->i_mapping);
./fs/btrfs/ordered-data.c:              filemap_fdatawrite_range(inode->i_mapping, start, end);
./fs/btrfs/ordered-data.c:      filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c:      filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c:      filemap_fdatawait_range(inode->i_mapping, start, orig_end);
[...]
./fs/btrfs/file.c:              pages[i] = grab_cache_page(inode->i_mapping, index + i);
./fs/btrfs/file.c:      current->backing_dev_info = inode->i_mapping->backing_dev_info;
./fs/btrfs/file.c:                              filemap_fdatawrite_range(inode->i_mapping, pos,
./fs/btrfs/file.c:                                                      inode->i_mapping,
./fs/btrfs/file.c:                      invalidate_mapping_pages(inode->i_mapping,
./fs/btrfs/file.c:                      filemap_flush(inode->i_mapping);




So what exactly are buffers used for? Is there any up-to-date and detailed
documentation, howto or explanation available? Most hits I found via search
engines are either quite short and vague or relate to really old kernel
versions.

Is there any detailed explanation available of how - as in which steps - the
Linux kernel writes certain kinds of data, such as

- inode / metadata traffic
- dirty pages (ok, via pdflush / flush, as long as one process doesn't
dirty too many pages itself)
- I/O from processes by using system functions like write()
- direct I/O
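
For the write() case in the list above, the user-space side can at least be
pinned down. The following is a minimal sketch under the assumption of an
ordinary buffered file on a POSIX system: the data goes into the page cache
via write() and is then forced out with fsync(). write_and_sync() is a
hypothetical name; what the kernel does in between is exactly the open
question.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Write len bytes to path through the page cache, then force them to disk.
 * Returns 0 on success, -1 on error. (Hypothetical helper for illustration.)
 */
int write_and_sync(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    size_t done = 0;

    if (fd < 0)
        return -1;

    /* write() may accept fewer bytes than requested, so loop until done. */
    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);

        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* Up to here the data may exist only as dirty pages in memory;
     * fsync() returns once the file's data has been written back. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Watching Dirty: and Writeback: in /proc/meminfo between the write() loop and
the fsync() call is one way to observe the dirty-page stage from outside.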

Or do you have any hints on what source files to read in order to understand 
more regarding these questions?

Thanks,
-- 
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90

