* Understanding buffers / buffer cache
@ 2011-04-14 12:16 Martin Steigerwald
From: Martin Steigerwald @ 2011-04-14 12:16 UTC
  To: linux-kernel; +Cc: linux-mm, Mega Maddin


Please keep either linux-kernel or my address as cc, as I am only subscribed 
to linux-kernel, not linux-mm.


Hi!

In this week's Linux performance analysis and tuning course, which I teach, 
there were detailed questions about what the Linux kernel uses the memory for 
that free displays under "buffers".

This much I know:

- it is for buffers that have to be written to disk at some point (as opposed 
to caches, which are for reads)

- it is somehow related to the pdflush / flush-major:minor threads; XFS doesn't 
use these but uses xfsbufd / xfssyncd instead

- my observation is that it doesn't increase much on a simple dd, but does 
increase much more on a tar -xf linux-x.y.tar.gz (after an echo 3 > 
/proc/sys/vm/drop_caches)

- the data written via dd instead shows up under Dirty: and then Writeback: 
in /proc/meminfo
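
The observations above can be reproduced by taking a snapshot of the relevant
/proc/meminfo fields before and after a dd or tar run. A minimal Python sketch,
assuming the usual "Name:   value kB" line format; the helper names
parse_meminfo and delta are made up for illustration:

```python
from pathlib import Path

# Fields discussed above; /proc/meminfo reports all of them in kB.
FIELDS = ("Buffers", "Cached", "Dirty", "Writeback")


def parse_meminfo(text, fields=FIELDS):
    """Return {field: kB} for the requested fields found in meminfo text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        # Exact name match, so SwapCached and WritebackTmp are skipped.
        if sep and name in fields:
            # Lines look like "Buffers:          123456 kB".
            values[name] = int(rest.split()[0])
    return values


def delta(before, after):
    """Per-field change in kB between two snapshots."""
    return {k: after[k] - before[k] for k in before if k in after}


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        print(parse_meminfo(meminfo.read_text()))
```

Taking one snapshot before the workload and one after, then passing both to
delta(), shows which of the four counters actually moved.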


Thus I thought buffers were mainly related to metadata stuff.


But one course member (on Cc) dug into the kernel source and found this:

- fs/block_dev.c:

long nr_blockdev_pages(void)
{
        struct block_device *bdev;
        long ret = 0;
        spin_lock(&bdev_lock);
        list_for_each_entry(bdev, &all_bdevs, bd_list) {
                ret += bdev->bd_inode->i_mapping->nrpages;
        }
        spin_unlock(&bdev_lock);
        return ret;
}

- include/linux/fs.h:

struct block_device {
        dev_t                   bd_dev;  /* not a kdev_t - it's a search key */
        struct inode *          bd_inode;       /* will die */

[...]

struct inode {
        /* RCU path lookup touches following: */
[...]
        struct address_space    *i_mapping;
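
To make the pointer chain bdev->bd_inode->i_mapping->nrpages concrete, here is
a toy Python model of what nr_blockdev_pages() sums. It is not kernel code: the
classes only mirror the struct fields quoted above, and the device labels, page
counts, and the 4 KiB page size are made-up assumptions. The conversion to kB
assumes the sum ends up as the Buffers: value in /proc/meminfo, which reports
kB.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


@dataclass
class AddressSpace:
    nrpages: int  # number of page cache pages in this mapping


@dataclass
class Inode:
    i_mapping: AddressSpace


@dataclass
class BlockDevice:
    bd_dev: str  # illustrative label such as "8:0"; the kernel uses a dev_t
    bd_inode: Inode


def nr_blockdev_pages(all_bdevs):
    """Sum the page cache pages of every block device inode (no locking here)."""
    return sum(bdev.bd_inode.i_mapping.nrpages for bdev in all_bdevs)


def pages_to_kb(pages, page_size=PAGE_SIZE):
    """Convert a page count to kB, the unit /proc/meminfo reports."""
    return pages * page_size // 1024


if __name__ == "__main__":
    bdevs = [
        BlockDevice("8:0", Inode(AddressSpace(nrpages=300))),
        BlockDevice("8:1", Inode(AddressSpace(nrpages=212))),
    ]
    print(pages_to_kb(nr_blockdev_pages(bdevs)), "kB")
```

The point of the model is that only pages cached against the block device
inodes themselves are counted, not pages cached against regular file inodes.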


- And then this in lots of places:

martin@shambhala:~/Computer/Shambhala/Kernel/2.6.38/linux-2.6.38.y> find -name "*.c" -or -name "*.h" | xargs grep i_mapping
./include/linux/fs.h:   struct address_space    *i_mapping;
./include/linux/fs.h:           invalidate_mapping_pages(inode->i_mapping, 0, -1);
./include/trace/events/ext4.h:          __entry->writeback_index = inode->i_mapping->writeback_index;
./include/trace/events/ext4.h:          __entry->writeback_index = inode->i_mapping->writeback_index;
./kernel/cgroup.c:              inode->i_mapping->backing_dev_info = &cgroup_backing_dev_info;
./arch/powerpc/platforms/cell/spufs/file.c:             ctx->local_store = inode->i_mapping;
./arch/powerpc/platforms/cell/spufs/file.c:             ctx->cntl = inode->i_mapping;
[...]
./arch/tile/kernel/smp.c:static unsigned long __iomem *ipi_mappings[NR_CPUS];
./arch/tile/kernel/smp.c:               ipi_mappings[cpu] = ioremap_prot(offset, PAGE_SIZE, pte);
./arch/tile/kernel/smp.c:       ((unsigned long __force *)ipi_mappings[cpu])[IRQ_RESCHEDULE] = 0;
[...]

This includes various filesystems, where it seems to be used for metadata 
*and* file I/O as well as "journal" / COW I/O. For example:

./fs/btrfs/inode.c:             page = find_get_page(inode->i_mapping,
./fs/btrfs/inode.c:                                        inode->i_mapping, start,
./fs/btrfs/inode.c:             inode->i_mapping->a_ops = &btrfs_aops;
./fs/btrfs/inode.c:             inode->i_mapping->backing_dev_info = &root->fs_info->bdi;
[...]
./fs/btrfs/ordered-data.c:          !mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY)) {
./fs/btrfs/ordered-data.c:                              filemap_flush(inode->i_mapping);
./fs/btrfs/ordered-data.c:              filemap_fdatawrite_range(inode->i_mapping, start, end);
./fs/btrfs/ordered-data.c:      filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c:      filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c:      filemap_fdatawait_range(inode->i_mapping, start, orig_end);
[...]
./fs/btrfs/file.c:              pages[i] = grab_cache_page(inode->i_mapping, index + i);
./fs/btrfs/file.c:      current->backing_dev_info = inode->i_mapping->backing_dev_info;
./fs/btrfs/file.c:                              filemap_fdatawrite_range(inode->i_mapping, pos,
./fs/btrfs/file.c:                                                      inode->i_mapping,
./fs/btrfs/file.c:                      invalidate_mapping_pages(inode->i_mapping,
./fs/btrfs/file.c:                      filemap_flush(inode->i_mapping);
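
Note that the plain "grep i_mapping" above also matched ipi_mappings in
arch/tile/kernel/smp.c, because grep matches substrings by default. A minimal
sketch of whole-identifier matching, using Python's re word boundaries as a
rough stand-in for grep -w; the function name is made up and the sample lines
are taken from the output above:

```python
import re

# \b marks a word boundary, so "i_mapping" must stand alone as an identifier.
WHOLE_WORD = re.compile(r"\bi_mapping\b")


def matches_identifier(line):
    """True if the line uses i_mapping as a whole identifier, not a substring."""
    return WHOLE_WORD.search(line) is not None


if __name__ == "__main__":
    lines = [
        "inode->i_mapping->backing_dev_info = &cgroup_backing_dev_info;",
        "static unsigned long __iomem *ipi_mappings[NR_CPUS];",
    ]
    for line in lines:
        print(matches_identifier(line), line)
```

Filtering this way would drop the arch/tile hits, which are unrelated to
struct inode.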




So what exactly are buffers used for? Is there any up-to-date and detailed 
documentation, howto, or explanation available? Most hits I found via search 
engines are either quite short and vague or relate to really old kernel 
versions.

Is there any detailed explanation available of how - as in which steps - the 
Linux kernel writes certain kinds of data, such as

- inode / metadata traffic
- dirty pages (ok, via pdflush / flush, as long as one process doesn't overuse 
it)
- I/O from processes using system calls like write()
- direct I/O

Or do you have any hints on what source files to read in order to understand 
more regarding these questions?
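
To make the write() case from the list above concrete, here is a minimal
userspace sketch, assuming an ordinary filesystem for the temporary file. A
buffered write() returns once the data sits in the page cache (where it would
be expected to show up under Dirty:), and fsync() asks the kernel to write it
back. The helper name and sizes are made up for illustration; direct I/O
(O_DIRECT) is left out because it needs aligned buffers and filesystem support.

```python
import os
import tempfile


def buffered_write_then_sync(path, payload):
    """Write payload with write(2), then force writeback with fsync(2)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < len(payload):
            # write() may be partial; the data lands in the page cache first.
            written += os.write(fd, payload[written:])
        # fsync() blocks until the kernel has written the dirty pages back.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "sample.bin")
        print(buffered_write_then_sync(target, b"\0" * (1 << 20)), "bytes written")
```

Watching Dirty: and Writeback: in /proc/meminfo between the write loop and the
fsync() call is one way to see the two phases separately.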

Thanks,
-- 
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90

