From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [RFC, PATCH 0/12] xfs: compound buffers for directory blocks
Date: Wed, 7 Dec 2011 17:18:11 +1100
Message-ID: <1323238703-13198-1-git-send-email-david@fromorbit.com>
This series is an infrastructure change needed to allow CRCs to be
easily implemented on directory blocks. Directory blocks can be
larger than filesystem blocks and are mapped like data in a file via
the inode block map btree. Hence a given directory block can be made
up of discontiguous filesystem blocks.
The current way of handling this is via the struct xfs_dabuf - a
separate structure that tracks individual struct xfs_bufs for each
discontiguous region of a directory block. This abstracts the
discontiguity away from all the directory code by hiding it behind a
linear memory buffer and memcpy()ing to and from the underlying
xfs_bufs as the dabuf is created and destroyed for each directory
operation that operates on a given directory block. The struct
xfs_bufs are cached, but the dabuf is not, leading to significant
overhead in constructing, destroying and modifying large directory
buffers.
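To make the copy overhead concrete, here is a minimal userspace sketch of
the gather/scatter pattern described above. It is not code from the
series: struct piece, gather() and scatter() are hypothetical names
standing in for the cached xfs_bufs and the dabuf setup/teardown paths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one cached buffer backing part of a dir block. */
struct piece {
	char	*data;	/* memory of the underlying buffer */
	size_t	len;	/* bytes covered by this piece */
};

/* Gather: copy every piece into one linear buffer (dabuf creation). */
static size_t
gather(char *linear, size_t size, const struct piece *p, int n)
{
	size_t	off = 0;
	int	i;

	for (i = 0; i < n; i++) {
		assert(off + p[i].len <= size);
		memcpy(linear + off, p[i].data, p[i].len);
		off += p[i].len;
	}
	return off;
}

/* Scatter: copy the linear buffer back into the pieces (dabuf teardown). */
static size_t
scatter(const char *linear, size_t size, struct piece *p, int n)
{
	size_t	off = 0;
	int	i;

	for (i = 0; i < n; i++) {
		assert(off + p[i].len <= size);
		memcpy(p[i].data, linear + off, p[i].len);
		off += p[i].len;
	}
	return off;
}
```

Every modification pays for both copies over the whole directory block,
even if only a few bytes changed.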
Further, because CRCs require a single CRC for each directory
block, we need to keep the buffer in an aggregated state until we do
IO on it and can run a CRC calculation callback. With the xfs_dabuf
destroyed long before write IO occurs, there is no way to calculate
the CRC sanely.
To solve this problem we effectively need the functionality of a
xfs_dabuf in a struct xfs_buf. That is, an xfs_buf needs to be able
to map a discontiguous block range and aggregate all the IO needed
to read and write such a discontiguous buffer. Further, the buffer
logging needs to support discontiguous ranges as well, and translate
the new in-memory construct into the existing individual discontiguous
buffer log format.
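As a rough illustration of the logging translation, the sketch below
splits a dirty byte range, expressed as offsets into the whole compound
buffer, into one range per underlying segment. It is a userspace sketch
under assumed names (struct seg_range and split_range() are hypothetical),
not the buf_item code from the series.

```c
#include <assert.h>

/* Hypothetical: a dirty byte range local to one segment of the buffer. */
struct seg_range {
	int		seg;	/* index of the segment */
	unsigned int	first;	/* first dirty byte within the segment */
	unsigned int	last;	/* last dirty byte within the segment */
};

/*
 * Split the inclusive range [first, last] of the compound buffer into
 * per-segment ranges. seg_len[] holds the non-zero byte length of each
 * segment in buffer order. Returns the number of ranges written to out[].
 */
static int
split_range(const unsigned int *seg_len, int nsegs,
	    unsigned int first, unsigned int last,
	    struct seg_range *out, int max_out)
{
	unsigned int	start = 0;	/* buffer offset where segment i begins */
	int		n = 0;
	int		i;

	for (i = 0; i < nsegs; i++) {
		unsigned int	end = start + seg_len[i] - 1;

		if (first <= end && last >= start) {
			assert(n < max_out);
			out[n].seg = i;
			out[n].first = (first > start ? first : start) - start;
			out[n].last = (last < end ? last : end) - start;
			n++;
		}
		start += seg_len[i];
	}
	return n;
}
```

Each resulting range could then be recorded against its own segment,
which matches the existing one-record-per-discontiguous-buffer shape of
the log format.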
To do this, the xfs_buf has a block vector array added to it,
similar in concept to the page array. When IO is issued, it issues
separate IO for each vector in the block array, building the IO
appropriately from the page array. In this way, we avoid the need
for a separate memory buffer for the directory code to work on - it
can work directly on the vmapped buffer address. Hence we remove two
memcpy()s from each large directory block modification. Adding an IO
count for each vector means that the current method of dispatching,
completing and waiting for IO is unchanged.
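The per-vector IO count can be sketched in userspace as follows. This is
only an illustration under assumed names (struct map_vec, struct cbuf,
cbuf_submit() and cbuf_io_end() are hypothetical, and sectors are assumed
to be 512 bytes); real submission and completion are asynchronous and use
atomic counters.

```c
#include <assert.h>

#define SECTOR_SIZE	512u	/* assumption for this sketch */

/* Hypothetical block vector entry: one contiguous disk range. */
struct map_vec {
	unsigned long long	daddr;	/* start sector on disk */
	unsigned int		nsect;	/* length in sectors */
};

/* Hypothetical compound buffer: a vector of disk ranges, one memory range. */
struct cbuf {
	const struct map_vec	*maps;
	int			nmaps;
	int			io_remaining;	/* IOs still in flight */
	int			done;		/* set when all IO completes */
};

/* Called once per completed IO; the last completion finishes the buffer. */
static void
cbuf_io_end(struct cbuf *bp)
{
	assert(bp->io_remaining > 0);
	if (--bp->io_remaining == 0)
		bp->done = 1;
}

/*
 * "Issue" one IO per vector. byte_off[i] receives the offset into the
 * buffer's memory that vector i covers. Returns the number of IOs issued.
 */
static int
cbuf_submit(struct cbuf *bp, unsigned int *byte_off)
{
	unsigned int	off = 0;
	int		i;

	bp->done = 0;
	/* Submission reference: early completions cannot finish the buffer. */
	bp->io_remaining = 1;
	for (i = 0; i < bp->nmaps; i++) {
		byte_off[i] = off;
		bp->io_remaining++;
		off += bp->maps[i].nsect * SECTOR_SIZE;
	}
	cbuf_io_end(bp);	/* drop the submission reference */
	return bp->nmaps;
}
```

A waiter only needs to watch the single done state, so nothing about the
wait side has to know how many vectors the buffer was split into.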
Further, by modifying the buffer item formatting to deal with
discontiguous buffers, we remove the need for the xfs_dabuf to
interpose to select the correct xfs_buf to record the changes to.
This means that compound buffers can be used completely
transparently throughout the existing XFS codebase (not just the
directory code) without any modification.
To build compound buffers, we need some method of specifying the
block map. We already have a structure for this - the struct
xfs_bmbt_irec, which is what xfs_bmapi_*() uses and is the native
format for maps in the directory code. Hence it makes sense to pass
these into the buffer cache as a method of specifying discontiguous
block ranges.
It makes further sense to use struct xfs_bmbt_irec as the internal
representation of block ranges for all the buffer interfaces, but
this requires one extension. That is, the bmbt format currently only
supports filesystem block sized units (FSB) and metadata requires
sector (disk) addressing (DADDR) units. This is easily handled by
adding a new state value that is held in the xfs_bmbt_irec.br_state
field to indicate what unit the xfs_bmbt_irec map is encoded in.
With this, the irec format can be used throughout the buffer
interfaces to support discontiguous buffers everywhere.
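A minimal sketch of the unit tagging idea follows. The names here
(enum map_unit, struct map_rec, map_to_daddr()) are hypothetical and are
not the xfs_bmbt_irec definition or the state values added by the series.
The flat shift is also a simplification: the real FSB to DADDR conversion
has to account for allocation group layout.

```c
#include <assert.h>

/* Hypothetical unit tag, playing the role of the new br_state value. */
enum map_unit {
	MAP_UNIT_FSB,	/* start/count are in filesystem blocks */
	MAP_UNIT_DADDR	/* start/count are in disk sectors */
};

/* Hypothetical, simplified map record. */
struct map_rec {
	unsigned long long	start;
	unsigned long long	count;
	enum map_unit		unit;
};

/*
 * Normalise a record to sector units. blkbb_log is log2 of the number
 * of sectors per filesystem block (3 for 4k blocks on 512b sectors).
 * Records already in sector units are returned unchanged.
 */
static struct map_rec
map_to_daddr(struct map_rec m, unsigned int blkbb_log)
{
	if (m.unit == MAP_UNIT_FSB) {
		m.start <<= blkbb_log;
		m.count <<= blkbb_log;
		m.unit = MAP_UNIT_DADDR;
	}
	return m;
}
```

With the unit carried in the record, a single interface can accept maps
from both the directory code (FSB) and metadata callers (DADDR).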
Finally, with all these changes, the struct xfs_dabuf is no longer
necessary, so it can be removed.
The series passes xfstests on 4k/4k, 4k/512b, 64k/4k and 64k/512b
(dirblksz/fsblksz) configurations without any new regressions, and
survives 100 million inode fs_mark benchmarks on a 17TB filesystem
using 4k/4k, 64k/512b and 64k/512b configurations.
Some of the series is a bit verbose - code is rearranged a couple of
times to suit testing step by step (e.g. duplicate code in the
patch that introduces a new interface, factor the duplication back
out in a later patch), so it could probably be done more neatly. However,
I'd prefer not to have to redo the entire series to avoid this
if the end result is substantially identical code - it's time
consuming to make sure each patch doesn't break stuff and I'd like
to try to get this into 3.3 so I can focus on the real goal (CRC
support) ASAP.
Comments, flames and ridicule all welcome. :)
Cheers,
Dave.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs