From: Martin Steigerwald <ms@teamix.de>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@vger.kernel.org, Mega Maddin <maddin@megamaddin.de>
Subject: Understanding buffers / buffer cache
Date: Thu, 14 Apr 2011 14:16:55 +0200
Message-ID: <201104141417.10748.ms@teamix.de>

Please keep either linux-kernel or my address as cc, as I am only subscribed 
to linux-kernel, not linux-mm.


Hi!

In this week's Linux performance analysis and tuning course, which I teach,
there were detailed questions about what the Linux kernel uses the memory for
that free displays under "buffers".

This much I know:

- it is for buffers that have to be written to disk at some time (as opposed to
caches, which are for reads)

- it is somehow related to the pdflush / flush-major:minor threads; XFS doesn't
use these, but uses xfsbufd / xfssyncd instead

- my observation is that it doesn't increase much on a simple dd, but does
increase much more on a tar -xf linux-x.y.tar.gz (after an echo 3 >
/proc/sys/vm/drop_caches)

- the data to be written via dd shows up instead under Dirty: and then
Writeback: in /proc/meminfo
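
To repeat these observations without eyeballing the output, the relevant
fields can be sampled before and after the dd or tar run. The following is a
minimal sketch, not a definitive tool: the helper names meminfo_field_kb() and
read_meminfo() are hypothetical, and it assumes the usual "Name:   value kB"
line layout of /proc/meminfo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value (in kB) of one /proc/meminfo-style field, or -1 if the
 * field is not present. `text` is the whole file content, `key` is the field
 * name without the trailing colon, e.g. "Buffers".
 * (Hypothetical helper, not part of any library.)
 */
long meminfo_field_kb(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);
    size_t keylen = strlen(key);
    const char *line = text;

    while (*line != '\0') {
        /* A match needs the key at line start, followed directly by ':' */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            long kb;
            if (sscanf(line + keylen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        /* Advance to the start of the next line */
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return -1;
}

/* Read /proc/meminfo into buf (NUL-terminated); returns 0 on success. */
int read_meminfo(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

Calling read_meminfo() into a buffer of a few KiB and then
meminfo_field_kb(buf, "Buffers"), "Dirty" and "Writeback" once per second
during the test should show which counter the written data lands in.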


Thus I thought buffers were mainly related to metadata stuff.


But one course member (on Cc) dug into the kernel source and found this:

- fs/block_dev.c:

long nr_blockdev_pages(void)
{
        struct block_device *bdev;
        long ret = 0;
        spin_lock(&bdev_lock);
        list_for_each_entry(bdev, &all_bdevs, bd_list) {
                ret += bdev->bd_inode->i_mapping->nrpages;
        }
        spin_unlock(&bdev_lock);
        return ret;
}

- include/linux/fs.h:

struct block_device {
        dev_t                   bd_dev;  /* not a kdev_t - it's a search key */
        struct inode *          bd_inode;       /* will die */

[...]

struct inode {
        /* RCU path lookup touches following: */
[...]
        struct address_space    *i_mapping;
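
To make the chain bdev->bd_inode->i_mapping->nrpages concrete, here is a
small userspace model of what nr_blockdev_pages() appears to compute. It is
only a sketch under assumptions: the structs are stripped-down stand-ins for
the kernel ones quoted above, a plain linked list replaces list_head, locking
is omitted, and the names model_nr_blockdev_pages() and pages_to_kb() as well
as the 4 KiB page size are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures quoted above. */
struct address_space {
    unsigned long nrpages;           /* pages currently cached for this mapping */
};

struct inode {
    struct address_space *i_mapping;
};

struct block_device {
    struct inode        *bd_inode;
    struct block_device *next;       /* plain linked list instead of list_head */
};

/* Userspace model of nr_blockdev_pages(): sum nrpages over all block devices. */
long model_nr_blockdev_pages(const struct block_device *all_bdevs)
{
    long ret = 0;

    for (const struct block_device *b = all_bdevs; b != NULL; b = b->next) {
        assert(b->bd_inode != NULL && b->bd_inode->i_mapping != NULL);
        ret += (long)b->bd_inode->i_mapping->nrpages;
    }
    return ret;
}

/* Convert a page count to kB, assuming 4 KiB pages. */
long pages_to_kb(long pages)
{
    return pages * 4;
}
```

If the "buffers" figure is indeed derived from this function, as the code
above suggests, it would count pages cached for the block devices themselves,
not pages belonging to regular files.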


- And then this in lots of places:

martin@shambhala:~/Computer/Shambhala/Kernel/2.6.38/linux-2.6.38.y> find -name "*.c" -or -name "*.h" | xargs grep i_mapping
./include/linux/fs.h:   struct address_space    *i_mapping;
./include/linux/fs.h:           invalidate_mapping_pages(inode->i_mapping, 0, -1);
./include/trace/events/ext4.h:          __entry->writeback_index = inode->i_mapping->writeback_index;
./include/trace/events/ext4.h:          __entry->writeback_index = inode->i_mapping->writeback_index;
./kernel/cgroup.c:              inode->i_mapping->backing_dev_info = &cgroup_backing_dev_info;
./arch/powerpc/platforms/cell/spufs/file.c:             ctx->local_store = inode->i_mapping;
./arch/powerpc/platforms/cell/spufs/file.c:             ctx->cntl = inode->i_mapping;
[...]
./arch/tile/kernel/smp.c:static unsigned long __iomem *ipi_mappings[NR_CPUS];
./arch/tile/kernel/smp.c:               ipi_mappings[cpu] = ioremap_prot(offset, PAGE_SIZE, pte);
./arch/tile/kernel/smp.c:       ((unsigned long __force *)ipi_mappings[cpu])[IRQ_RESCHEDULE] = 0;
[...]

This includes various filesystems, where it seems to be used for metadata
*and* file I/O as well as "journal" / COW I/O. For example:

./fs/btrfs/inode.c:             page = find_get_page(inode->i_mapping,
./fs/btrfs/inode.c:                                        inode->i_mapping, start,
./fs/btrfs/inode.c:             inode->i_mapping->a_ops = &btrfs_aops;
./fs/btrfs/inode.c:             inode->i_mapping->backing_dev_info = &root->fs_info->bdi;
[...]
./fs/btrfs/ordered-data.c:          !mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY)) {
./fs/btrfs/ordered-data.c:                              filemap_flush(inode->i_mapping);
./fs/btrfs/ordered-data.c:              filemap_fdatawrite_range(inode->i_mapping, start, end);
./fs/btrfs/ordered-data.c:      filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c:      filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c:      filemap_fdatawait_range(inode->i_mapping, start, orig_end);
[...]
./fs/btrfs/file.c:              pages[i] = grab_cache_page(inode->i_mapping, index + i);
./fs/btrfs/file.c:      current->backing_dev_info = inode->i_mapping->backing_dev_info;
./fs/btrfs/file.c:                              filemap_fdatawrite_range(inode->i_mapping, pos,
./fs/btrfs/file.c:                                                      inode->i_mapping,
./fs/btrfs/file.c:                      invalidate_mapping_pages(inode->i_mapping,
./fs/btrfs/file.c:                      filemap_flush(inode->i_mapping);




So what exactly are buffers used for? Is there any up-to-date and detailed
documentation, howto or explanation available? Most hits I found via search
engines are either quite short and vague or relate to really old kernel
versions.

Is there any detailed explanation available of how - as in, in which steps -
the Linux kernel writes certain kinds of data, such as

- inode / metadata traffic
- dirty pages (OK, via pdflush / flush, as long as one process doesn't overuse
it)
- I/O from processes using system calls like write()
- direct I/O

Or do you have any hints on which source files to read in order to understand
more regarding these questions?
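
For reference, the userspace side of the write() case looks roughly like the
sketch below: write() returns once the data has been copied into the page
cache, and fsync() then waits until it has been written back. The helper name
buffered_write_and_sync() is hypothetical, and the sketch deliberately leaves
out O_DIRECT and retry handling for interrupted calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes to `path` through the page cache, then force them to
 * stable storage with fsync(). Returns 0 on success, -1 on any error.
 * (Hypothetical helper for illustration.)
 */
int buffered_write_and_sync(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        /* write() may be short; loop until everything is handed over */
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Up to here the pages may only be dirty in memory; fsync() waits
     * until they have been written back to the device. */
    int rc = fsync(fd);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Watching Dirty: and Writeback: in /proc/meminfo while such a helper writes a
large file is one way to see the first two stages from the outside.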

Thanks,
-- 
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90
