Please keep either linux-kernel or my address in Cc, as I am only
subscribed to linux-kernel, not linux-mm.

Hi!

In this week's Linux performance analysis and tuning course, which I am
holding, there have been detailed questions about what the Linux kernel
uses the memory for that free displays under "buffers".

This is what I know so far:

- it is for buffers that have to be written to disk at some time (as
  opposed to caches, which are for reads)

- it is somehow related to the pdflush / flush-major:minor threads; XFS
  doesn't use these, but uses xfsbufd / xfssyncd instead

- my observation is that it doesn't increase much on a simple dd, but
  does increase much more on a tar -xf linux-x.y.tar.gz (after an
  echo 3 > /proc/sys/vm/drop_caches)

- the data written via dd instead shows up under Dirty: and then under
  Writeback: in /proc/meminfo

Thus I thought buffers were mainly related to metadata stuff.

But one course member (on Cc) dug into the kernel source and found this:

- fs/block_dev.c:

    long nr_blockdev_pages(void)
    {
            struct block_device *bdev;
            long ret = 0;
            spin_lock(&bdev_lock);
            list_for_each_entry(bdev, &all_bdevs, bd_list) {
                    ret += bdev->bd_inode->i_mapping->nrpages;
            }
            spin_unlock(&bdev_lock);
            return ret;
    }

- include/linux/fs.h:

    struct block_device {
            dev_t                   bd_dev;  /* not a kdev_t - it's a search key */
            struct inode *          bd_inode;       /* will die */
    [...]

    struct inode {
            /* RCU path lookup touches following: */
    [...]
            struct address_space    *i_mapping;

- And then this in lots of places:

martin@shambhala:~/Computer/Shambhala/Kernel/2.6.38/linux-2.6.38.y> find -name "*.c" -or -name "*.h" | xargs grep i_mapping
./include/linux/fs.h: struct address_space *i_mapping;
./include/linux/fs.h: invalidate_mapping_pages(inode->i_mapping, 0, -1);
./include/trace/events/ext4.h: __entry->writeback_index = inode->i_mapping->writeback_index;
./include/trace/events/ext4.h: __entry->writeback_index = inode->i_mapping->writeback_index;
./kernel/cgroup.c: inode->i_mapping->backing_dev_info = &cgroup_backing_dev_info;
./arch/powerpc/platforms/cell/spufs/file.c: ctx->local_store = inode->i_mapping;
./arch/powerpc/platforms/cell/spufs/file.c: ctx->cntl = inode->i_mapping;
[...]
./arch/tile/kernel/smp.c:static unsigned long __iomem *ipi_mappings[NR_CPUS];
./arch/tile/kernel/smp.c: ipi_mappings[cpu] = ioremap_prot(offset, PAGE_SIZE, pte);
./arch/tile/kernel/smp.c: ((unsigned long __force *)ipi_mappings[cpu])[IRQ_RESCHEDULE] = 0;
[...]

including various filesystems, where it seems to be used for metadata
*and* file I/O as well as for "journal" / COW I/O. For example:

./fs/btrfs/inode.c: page = find_get_page(inode->i_mapping,
./fs/btrfs/inode.c: inode->i_mapping, start,
./fs/btrfs/inode.c: inode->i_mapping->a_ops = &btrfs_aops;
./fs/btrfs/inode.c: inode->i_mapping->backing_dev_info = &root->fs_info->bdi;
[...]
./fs/btrfs/ordered-data.c: !mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY)) {
./fs/btrfs/ordered-data.c: filemap_flush(inode->i_mapping);
./fs/btrfs/ordered-data.c: filemap_fdatawrite_range(inode->i_mapping, start, end);
./fs/btrfs/ordered-data.c: filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c: filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c: filemap_fdatawait_range(inode->i_mapping, start, orig_end);
[...]
./fs/btrfs/file.c: pages[i] = grab_cache_page(inode->i_mapping, index + i);
./fs/btrfs/file.c: current->backing_dev_info = inode->i_mapping->backing_dev_info;
./fs/btrfs/file.c: filemap_fdatawrite_range(inode->i_mapping, pos,
./fs/btrfs/file.c: inode->i_mapping,
./fs/btrfs/file.c: invalidate_mapping_pages(inode->i_mapping,
./fs/btrfs/file.c: filemap_flush(inode->i_mapping);

So what exactly are buffers used for?

Is there any up-to-date and detailed documentation, howto or explanation
available? Most hits I found via search engines are either quite short
and vague or relate to really old kernel versions.

Is there any detailed explanation available of how - as in which steps -
the Linux kernel writes certain kinds of data, such as

- inode / metadata traffic
- dirty pages (ok, via pdflush / flush, as long as one process doesn't
  overuse it)
- I/O from processes using system calls like write()
- direct I/O

Or do you have any hints on which source files to read in order to
understand more regarding these questions?

Thanks,
-- 
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC  A0CA 1E10 C593 0399 AE90
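
P.S. In case someone wants to reproduce the dd versus tar observation
from above, here is a minimal sketch that prints just the four
/proc/meminfo counters mentioned in this mail. The function name
show_writeback_counters is made up for illustration, and the optional
file argument is only an assumption of mine so that the function can
also be pointed at a saved copy of /proc/meminfo.

```shell
#!/bin/sh
# Print the Buffers:, Cached:, Dirty: and Writeback: lines of /proc/meminfo.
# show_writeback_counters is a hypothetical helper name, not an existing tool.
# $1: optional path to a meminfo-style file (defaults to /proc/meminfo).
show_writeback_counters() {
    # Anchoring at the line start and including the colon keeps
    # SwapCached: and WritebackTmp: out of the output.
    grep -E '^(Buffers|Cached|Dirty|Writeback):' "${1:-/proc/meminfo}"
}

# Example: sample once per second while dd or tar runs in another terminal.
#   while sleep 1; do show_writeback_counters; echo; done
```

Comparing the samples taken during the dd run with those taken during
the tar -xf run should show which of the counters is actually moving in
each case.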