* [PATCH 01/17] xfs: update mount options documentation
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 02/17] xfs: add pluging for bulkstat readahead Dave Chinner
` (15 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Because it's horribly out of date.
And mark various deprecated options as deprecated and give them a
removal date.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
Documentation/filesystems/xfs.txt | 282 +++++++++++++++++++++++++------------
1 file changed, 192 insertions(+), 90 deletions(-)
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 83577f0..28afbd1 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -25,6 +25,13 @@ When mounting an XFS filesystem, the following options are accepted.
Valid values for this option are page size (typically 4KiB)
through to 1GiB, inclusive, in power-of-2 increments.
+ The default behaviour is for dynamic end-of-file
+ preallocation size, which uses a set of heuristics to
+ optimise the preallocation size based on the current
+ allocation patterns within the file and the access patterns
+ to the file. Specifying a fixed allocsize value turns off
+ the dynamic behaviour.
+
attr2/noattr2
The options enable/disable (default is disabled for backward
compatibility on-disk) an "opportunistic" improvement to be
@@ -36,86 +43,108 @@ When mounting an XFS filesystem, the following options are accepted.
CRC enabled filesystems always use the attr2 format, and so
will reject the noattr2 mount option if it is set.
- barrier
- Enables the use of block layer write barriers for writes into
- the journal and unwritten extent conversion. This allows for
- drive level write caching to be enabled, for devices that
- support write barriers.
+ barrier/nobarrier
+ Enables/disables the use of block layer write barriers for
+ writes into the journal and for data integrity operations.
+ This allows for drive level write caching to be enabled, for
+ devices that support write barriers.
- discard
- Issue command to let the block device reclaim space freed by the
- filesystem. This is useful for SSD devices, thinly provisioned
- LUNs and virtual machine images, but may have a performance
- impact.
+ The default behaviour is to enable barriers.
- dmapi
- Enable the DMAPI (Data Management API) event callouts.
- Use with the "mtpt" option.
+ discard/nodiscard
- grpid/bsdgroups and nogrpid/sysvgroups
- These options define what group ID a newly created file gets.
- When grpid is set, it takes the group ID of the directory in
- which it is created; otherwise (the default) it takes the fsgid
- of the current process, unless the directory has the setgid bit
- set, in which case it takes the gid from the parent directory,
- and also gets the setgid bit set if it is a directory itself.
+ Enable/disable the issuing of commands to let the block
+ device reclaim space freed by the filesystem. This is
+ useful for SSD devices, thinly provisioned LUNs and virtual
+ machine images, but may have a performance impact.
- ihashsize=value
- In memory inode hashes have been removed, so this option has
- no function as of August 2007. Option is deprecated.
+ The default behaviour is disable discard commands.
+
+ Note: It is currently recommended that you use the fstrim
+ application to discard unused blocks rather than the discard
+ mount option because the performance impact of this option
+ is quite severe.
+
+ grpid/bsdgroups and nogrpid/sysvgroups
+ These options define what group ID a newly created file
+ gets. When grpid is set, it takes the group ID of the
+ directory in which it is created; otherwise (the default) it
+ takes the fsgid of the current process, unless the directory
+ has the setgid bit set, in which case it takes the gid from
+ the parent directory, and also gets the setgid bit set if it
+ is a directory itself.
+
+ filestreams
+ Make the data allocator use the filestreams allocation mode
+ across the entire filesystem rather than just on directories
+ configured to use it.
ikeep/noikeep
- When ikeep is specified, XFS does not delete empty inode clusters
- and keeps them around on disk. ikeep is the traditional XFS
- behaviour. When noikeep is specified, empty inode clusters
- are returned to the free space pool. The default is noikeep for
- non-DMAPI mounts, while ikeep is the default when DMAPI is in use.
+ When ikeep is specified, XFS does not delete empty inode
+ clusters and keeps them around on disk. When noikeep is
+ specified, empty inode clusters are returned to the free
+ space pool.
+
+ The default behaviour is delete inode clusters (noikeep).
inode64
- Indicates that XFS is allowed to create inodes at any location
- in the filesystem, including those which will result in inode
- numbers occupying more than 32 bits of significance. This is
- the default allocation option. Applications which do not handle
- inode numbers bigger than 32 bits, should use inode32 option.
+ When inode64 is specified, it indicates that XFS is allowed
+ to create inodes at any location in the filesystem,
+ including those which will result in inode numbers occupying
+ more than 32 bits of significance. Applications which do
+ not handle inode numbers bigger than 32 bits should use
+ inode32 option.
+
+ This is the default allocation behaviour, even on 32 bit
+ machines when neither inode64 or inode32 is specified.
inode32
- Indicates that XFS is limited to create inodes at locations which
- will not result in inode numbers with more than 32 bits of
- significance. This is provided for backwards compatibility, since
- 64 bits inode numbers might cause problems for some applications
- that cannot handle large inode numbers.
+ When inode32 is specified, it indicates that XFS limits
+ inode creation to locations which will not result in inode
+ numbers with more than 32 bits of significance. This is
+ provided for backwards compatibility with older systems and
+ applications, since 64 bits inode numbers might cause
+ problems for some applications that cannot handle large
+ inode numbers.
largeio/nolargeio
If "nolargeio" is specified, the optimal I/O reported in
- st_blksize by stat(2) will be as small as possible to allow user
- applications to avoid inefficient read/modify/write I/O.
- If "largeio" specified, a filesystem that has a "swidth" specified
- will return the "swidth" value (in bytes) in st_blksize. If the
- filesystem does not have a "swidth" specified but does specify
- an "allocsize" then "allocsize" (in bytes) will be returned
- instead.
+ st_blksize by stat(2) will be as small as possible to allow
+ user applications to avoid inefficient read/modify/write
+ I/O. This is typically the page size of the machine, as
+ this is the granularity of the page cache.
+
+ If "largeio" specified, a filesystem that was created with a
+ "swidth" specified will return the "swidth" value (in bytes)
+ in st_blksize. If the filesystem does not have a "swidth"
+ specified but does specify an "allocsize" then "allocsize"
+ (in bytes) will be returned instead. Otherwise the behaviour
+ is the same as if "nolargeio" was specified.
+
If neither of these two options are specified, then filesystem
will behave as if "nolargeio" was specified.
logbufs=value
- Set the number of in-memory log buffers. Valid numbers range
- from 2-8 inclusive.
- The default value is 8 buffers for filesystems with a
- blocksize of 64KiB, 4 buffers for filesystems with a blocksize
- of 32KiB, 3 buffers for filesystems with a blocksize of 16KiB
- and 2 buffers for all other configurations. Increasing the
- number of buffers may increase performance on some workloads
- at the cost of the memory used for the additional log buffers
- and their associated control structures.
+ Set the number of in-memory log buffers. Valid numbers
+ range from 2-8 inclusive.
+
+ The default value is 8 buffers.
+
+ If the memory cost of 8 log buffers is too high on small
+ systems, then it may be reduced at some cost to performance
+ on metadata intensive workloads.
logbsize=value
- Set the size of each in-memory log buffer.
- Size may be specified in bytes, or in kilobytes with a "k" suffix.
- Valid sizes for version 1 and version 2 logs are 16384 (16k) and
- 32768 (32k). Valid sizes for version 2 logs also include
- 65536 (64k), 131072 (128k) and 262144 (256k).
- The default value for machines with more than 32MiB of memory
- is 32768, machines with less memory use 16384 by default.
+ Set the size of each in-memory log buffer. The size may be
+ specified in bytes, or in kilobytes with a "k" suffix.
+ Valid sizes for version 1 and version 2 logs are 16384 (16k)
+ and 32768 (32k). Valid sizes for version 2 logs also
+ include 65536 (64k), 131072 (128k) and 262144 (256k). The
+ version 2 log size must be an integer multiple of the log
+ stripe unit configured at mkfs time.
+
+ The default value for for version 1 logs is 32768, while the
+ default value for version 2 logs is MAX(32768, log_sunit).
logdev=device and rtdev=device
Use an external log (metadata journal) and/or real-time device.
@@ -124,16 +153,12 @@ When mounting an XFS filesystem, the following options are accepted.
optional, and the log section can be separate from the data
section or contained within it.
- mtpt=mountpoint
- Use with the "dmapi" option. The value specified here will be
- included in the DMAPI mount event, and should be the path of
- the actual mountpoint that is used.
-
noalign
- Data allocations will not be aligned at stripe unit boundaries.
- noatime
- Access timestamps are not updated when a file is read.
+ Data allocations will not be aligned at stripe unit
+ boundaries. This is only relevant to filesystems created
+ with non-zero data alignment parameters (sunit, swidth) by
+ mkfs.
norecovery
The filesystem will be mounted without running log recovery.
@@ -144,8 +169,14 @@ When mounting an XFS filesystem, the following options are accepted.
the mount will fail.
nouuid
- Don't check for double mounted file systems using the file system uuid.
- This is useful to mount LVM snapshot volumes.
+ Don't check for double mounted file systems using the file
+ system uuid. This is useful to mount LVM snapshot volumes,
+ and often used in combination with "norecovery" for mounting
+ read-only snapshots.
+
+ noquota
+ Forcibly turns off all quota accounting and enforcement
+ within the filesystem.
uquota/usrquota/uqnoenforce/quota
User disk quota accounting enabled, and limits (optionally)
@@ -160,24 +191,68 @@ When mounting an XFS filesystem, the following options are accepted.
enforced. Refer to xfs_quota(8) for further details.
sunit=value and swidth=value
- Used to specify the stripe unit and width for a RAID device or
- a stripe volume. "value" must be specified in 512-byte block
- units.
- If this option is not specified and the filesystem was made on
- a stripe volume or the stripe width or unit were specified for
- the RAID device at mkfs time, then the mount system call will
- restore the value from the superblock. For filesystems that
- are made directly on RAID devices, these options can be used
- to override the information in the superblock if the underlying
- disk layout changes after the filesystem has been created.
- The "swidth" option is required if the "sunit" option has been
- specified, and must be a multiple of the "sunit" value.
+ Used to specify the stripe unit and width for a RAID device
+ or a stripe volume. "value" must be specified in 512-byte
+ block units. These options are only relevant to filesystems
+ that were created with non-zero data alignment parameters.
+
+ The sunit and swidth parameters specified must be compatible
+ with the existing filesystem alignment characteristics. If
+ the filesystem was not created with data alignment
+ constraints, then it may be impossible to set a valid sunit
+ (and hence swidth) value. In general, that means the only
+ valid changes to sunit are increasing it by a power-of-2
+ multiple. Valid swidth values are any integer multiple of a
+ valid sunit value.
+
+ For filesystems that have existing data alignment values on
+ disk (i.e. specified by mkfs), any new valid values passed
+ in as mount options will overwrite the values stored on
+ disk. Hence this mount option does not need to be specified
+ for every mount operation in this case.
swalloc
Data allocations will be rounded up to stripe width boundaries
when the current end of file is being extended and the file
size is larger than the stripe width size.
+ wsync
+ When specified, all filesystem namespace operations are
+ executed synchronously. This ensures that when the namespace
+ operation (create, unlink, etc) completes, the change to the
+ namespace is on stable storage. This is useful in HA setups
+ where failover must not result in clients seeing
+ inconsistent namespace presentation during or after a
+ failover event.
+
+
+Deprecated Mount Options
+========================
+
+ delaylog/nodelaylog
+ Delayed logging is the only logging method that XFS supports
+ now, so these mount options are now ignored.
+
+ Due for removal in 3.12.
+
+ ihashsize=value
+ In memory inode hashes have been removed, so this option has
+ no function as of August 2007. Option is deprecated.
+
+ Due for removal in 3.12.
+
+ irixsgid
+ This behaviour is now controlled by a sysctl, so the mount
+ option is ignored.
+
+ Due for removal in 3.12.
+
+ osyncisdsync
+ osyncisosync
+ O_SYNC and O_DSYNC are fully supported, so there is no need
+ for these options any more.
+
+ Due for removal in 3.12.
sysctls
=======
@@ -189,15 +264,20 @@ The following sysctls are available for the XFS filesystem:
in /proc/fs/xfs/stat. It then immediately resets to "0".
fs.xfs.xfssyncd_centisecs (Min: 100 Default: 3000 Max: 720000)
- The interval at which the xfssyncd thread flushes metadata
- out to disk. This thread will flush log activity out, and
- do some processing on unlinked inodes.
+ The interval at which the filesystem flushes metadata
+ out to disk and runs internal cache cleanup routines.
- fs.xfs.xfsbufd_centisecs (Min: 50 Default: 100 Max: 3000)
- The interval at which xfsbufd scans the dirty metadata buffers list.
+ fs.xfs.filestream_centisecs (Min: 1 Default: 3000 Max: 360000)
+ The interval at which the filesystem ages filestreams cache
+ references and returns timed-out AGs back to the free stream
+ pool.
- fs.xfs.age_buffer_centisecs (Min: 100 Default: 1500 Max: 720000)
- The age at which xfsbufd flushes dirty metadata buffers to disk.
+ fs.xfs.speculative_prealloc_lifetime
+ (Units: seconds Min: 1 Default: 300 Max: 86400)
+ The interval at which the background scanning for inodes
+ with unused speculative preallocation runs at. The scan
+ removes unused preallocation from clean inodes and releases
+ the unused space back to the free pool.
fs.xfs.error_level (Min: 0 Default: 3 Max: 11)
A volume knob for error reporting when internal errors occur.
@@ -254,9 +334,31 @@ The following sysctls are available for the XFS filesystem:
by the xfs_io(8) chattr command on a directory to be
inherited by files in that directory.
+ fs.xfs.inherit_nodefrag (Min: 0 Default: 1 Max: 1)
+ Setting this to "1" will cause the "nodefrag" flag set
+ by the xfs_io(8) chattr command on a directory to be
+ inherited by files in that directory.
+
fs.xfs.rotorstep (Min: 1 Default: 1 Max: 256)
In "inode32" allocation mode, this option determines how many
files the allocator attempts to allocate in the same allocation
group before moving to the next allocation group. The intent
is to control the rate at which the allocator moves between
allocation groups when allocating extents for new files.
+
+Deprecated Sysctls
+==================
+
+ fs.xfs.xfsbufd_centisecs (Min: 50 Default: 100 Max: 3000)
+ Dirty metadata is now tracked by the log subsystem and
+ flushing is driven by log space and idling demands. The
+ xfsbufd no longer exists, so this syctl does nothing.
+
+ Due for removal in 3.14.
+
+ fs.xfs.age_buffer_centisecs (Min: 100 Default: 1500 Max: 720000)
+ Dirty metadata is now tracked by the log subsystem and
+ flushing is driven by log space and idling demands. The
+ xfsbufd no longer exists, so this syctl does nothing.
+
+ Due for removal in 3.14.
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 02/17] xfs: add pluging for bulkstat readahead
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
2013-06-07 5:45 ` [PATCH 01/17] xfs: update mount options documentation Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 03/17] xfs: plug directory buffer readahead Dave Chinner
` (14 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
I was running some tests on bulkstat on CRC enabled filesystems when
I noticed that all the IO being issued was 8k in size, regardless of
the fact taht we are issuing sequential 8k buffers for inodes
clusters. The IO size shoul dbe 16k for 256 byte inodes, and 32k for
512 byte inodes, but this wasn't happening.
blktrace showed that there was an explict plug and unplug happening
around each readahead IO from _xfs_buf_ioapply, and the unplug was
causing the IO to be issued immediately. Hence no opportunity was
being given to the elevator to merge adjacent readahead requests and
dispatch them as a single IO.
Add plugging around the inode chunk readahead dispatch loop tin
bulkstat to ensure that we don't unplug the queue between adjacent
inode buffer readahead IOs and so we get fewer, larger IO requests
hitting the storage subsystemi for bulkstat.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_itable.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 2ea7d40..06d004d 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -383,11 +383,13 @@ xfs_bulkstat(
* Also start read-ahead now for this chunk.
*/
if (r.ir_freecount < XFS_INODES_PER_CHUNK) {
+ struct blk_plug plug;
/*
* Loop over all clusters in the next chunk.
* Do a readahead if there are any allocated
* inodes in that cluster.
*/
+ blk_start_plug(&plug);
agbno = XFS_AGINO_TO_AGBNO(mp, r.ir_startino);
for (chunkidx = 0;
chunkidx < XFS_INODES_PER_CHUNK;
@@ -399,6 +401,7 @@ xfs_bulkstat(
agbno, nbcluster,
&xfs_inode_buf_ops);
}
+ blk_finish_plug(&plug);
irbp->ir_startino = r.ir_startino;
irbp->ir_freecount = r.ir_freecount;
irbp->ir_free = r.ir_free;
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 03/17] xfs: plug directory buffer readahead
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
2013-06-07 5:45 ` [PATCH 01/17] xfs: update mount options documentation Dave Chinner
2013-06-07 5:45 ` [PATCH 02/17] xfs: add pluging for bulkstat readahead Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 04/17] xfs: don't use speculative prealloc for small files Dave Chinner
` (13 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Similar to bulkstat inode chunk readahead, we need to plug directory
data buffer readahead during getdents to ensure that we can merge
adjacent readahead requests and sort out of order requests optimally
before they are dispatched. This improves the readahead efficiency
and reduces the IO load it generates as the IO patterns are
significantly better for both contiguous and fragmented directories.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_dir2_leaf.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/xfs/xfs_dir2_leaf.c b/fs/xfs/xfs_dir2_leaf.c
index da71a18..bb22aac 100644
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -1108,6 +1108,7 @@ xfs_dir2_leaf_readbuf(
struct xfs_mount *mp = dp->i_mount;
struct xfs_buf *bp = *bpp;
struct xfs_bmbt_irec *map = mip->map;
+ struct blk_plug plug;
int error = 0;
int length;
int i;
@@ -1236,6 +1237,7 @@ xfs_dir2_leaf_readbuf(
/*
* Do we need more readahead?
*/
+ blk_start_plug(&plug);
for (mip->ra_index = mip->ra_offset = i = 0;
mip->ra_want > mip->ra_current && i < mip->map_blocks;
i += mp->m_dirblkfsbs) {
@@ -1287,6 +1289,7 @@ xfs_dir2_leaf_readbuf(
}
}
}
+ blk_finish_plug(&plug);
out:
*bpp = bp;
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 04/17] xfs: don't use speculative prealloc for small files
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (2 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 03/17] xfs: plug directory buffer readahead Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 05/17] xfs: don't do IO when creating an new inode Dave Chinner
` (12 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Dedicated small file workloads have been seeing significant free
space fragmentation causing premature inode allocation failure
when large inode sizes are in use. A particular test case showed
that a workload that runs to a real ENOSPC on 256 byte inodes would
fail inode allocation with ENOSPC about about 80% full with 512 byte
inodes, and at about 50% full with 1024 byte inodes.
The same workload, when run with -o allocsize=4096 on 1024 byte
inodes would run to being 100% full before giving ENOSPC. That is,
no freespace fragmentation at all.
The issue was caused by the specific IO pattern the application had
- the framework it was using did not support direct IO, and so it
was emulating it by using fadvise(DONT_NEED). The result was that
the data was getting written back before the speculative prealloc
had been trimmed from memory by the close(), and so small single
block files were being allocated with 2 blocks, and then having one
truncated away. The result was lots of small 4k free space extents,
and hence each new 8k allocation would take another 8k from
contiguous free space and turn it into 4k of allocated space and 4k
of free space.
Hence inode allocation, which requires contiguous, aligned
allocation of 16k (256 byte inodes), 32k (512 byte inodes) or 64k
(1024 byte inodes) can fail to find sufficiently large freespace and
hence fail while there is still lots of free space available.
There's a simple fix for this, and one that has precendence in the
allocator code already - don't do speculative allocation unless the
size of the file is larger than a certain size. In this case, that
size is the minimum default preallocation size:
mp->m_writeio_blocks. And to keep with the concept of being nice to
people when the files are still relatively small, cap the prealloc
to mp->m_writeio_blocks until the file goes over a stripe unit is
size, at which point we'll fall back to the current behaviour based
on the last extent size.
This will effectively turn off speculative prealloc for very small
files, keep preallocation low for small files, and behave as it
currently does for any file larger than a stripe unit. This
completely avoids the freespace fragmentation problem this
particular IO pattern was causing.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_iomap.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 8f8aaee..14be676 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -284,6 +284,15 @@ xfs_iomap_eof_want_preallocate(
return 0;
/*
+ * If the file is smaller than the minimum prealloc and we are using
+ * dynamic preallocation, don't do any preallocation at all as it is
+ * likely this is the only write to the file that is going to be done.
+ */
+ if (!(mp->m_flags & XFS_MOUNT_DFLT_IOSIZE) &&
+ XFS_ISIZE(ip) < mp->m_writeio_blocks)
+ return 0;
+
+ /*
* If there are any real blocks past eof, then don't
* do any speculative allocation.
*/
@@ -345,6 +354,10 @@ xfs_iomap_eof_prealloc_initial_size(
if (mp->m_flags & XFS_MOUNT_DFLT_IOSIZE)
return 0;
+ /* If the file is small, then use the minimum prealloc */
+ if (XFS_ISIZE(ip) < mp->m_dalign)
+ return 0;
+
/*
* As we write multiple pages, the offset will always align to the
* start of a page and hence point to a hole at EOF. i.e. if the size is
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 05/17] xfs: don't do IO when creating an new inode
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (3 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 04/17] xfs: don't use speculative prealloc for small files Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 06/17] xfs: xfs_ifree doesn't need to modify the inode buffer Dave Chinner
` (11 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
When we are allocating a new inode, we read the inode cluster off
disk to increment the generation number. We are already using a
random generation number for newly allocated inodes, so if we are not
using the ikeep mode, we can just generate a new generation number
when we initialise the newly allocated inode.
This avoids the need for reading the inode buffer during inode
creation. This will speed up allocation of inodes in cold, partially
allocated clusters as they will no longer need to be read from disk
during allocation. It will also reduce the CPU overhead of inode
allocation by not having the process the buffer read, even on cache
hits.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.c | 36 ++++++++++++++++++++++++++++--------
1 file changed, 28 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 7f7be5f..d1f76da 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1028,6 +1028,11 @@ xfs_dinode_calc_crc(
/*
* Read the disk inode attributes into the in-core inode structure.
+ *
+ * If we are initialising a new inode and we are not utilising the
+ * XFS_MOUNT_IKEEP inode cluster mode, we can simple build the new inode core
+ * with a random generation number. If we are keeping inodes around, we need to
+ * read the inode cluster to get the existing generation number off disk.
*/
int
xfs_iread(
@@ -1047,6 +1052,22 @@ xfs_iread(
if (error)
return error;
+ /* shortcut IO on inode allocation if possible */
+ if ((iget_flags & XFS_IGET_CREATE) &&
+ !(mp->m_flags & XFS_MOUNT_IKEEP)) {
+ /* initialise the on-disk inode core */
+ memset(&ip->i_d, 0, sizeof(ip->i_d));
+ ip->i_d.di_magic = XFS_DINODE_MAGIC;
+ ip->i_d.di_gen = prandom_u32();
+ if (xfs_sb_version_hascrc(&mp->m_sb)) {
+ ip->i_d.di_version = 3;
+ ip->i_d.di_ino = ip->i_ino;
+ uuid_copy(&ip->i_d.di_uuid, &mp->m_sb.sb_uuid);
+ } else
+ ip->i_d.di_version = 2;
+ return 0;
+ }
+
/*
* Get pointers to the on-disk inode and the buffer containing it.
*/
@@ -1133,17 +1154,16 @@ xfs_iread(
xfs_buf_set_ref(bp, XFS_INO_REF);
/*
- * Use xfs_trans_brelse() to release the buffer containing the
- * on-disk inode, because it was acquired with xfs_trans_read_buf()
- * in xfs_imap_to_bp() above. If tp is NULL, this is just a normal
+ * Use xfs_trans_brelse() to release the buffer containing the on-disk
+ * inode, because it was acquired with xfs_trans_read_buf() in
+ * xfs_imap_to_bp() above. If tp is NULL, this is just a normal
* brelse(). If we're within a transaction, then xfs_trans_brelse()
* will only release the buffer if it is not dirty within the
* transaction. It will be OK to release the buffer in this case,
- * because inodes on disk are never destroyed and we will be
- * locking the new in-core inode before putting it in the hash
- * table where other processes can find it. Thus we don't have
- * to worry about the inode being changed just because we released
- * the buffer.
+ * because inodes on disk are never destroyed and we will be locking the
+ * new in-core inode before putting it in the cache where other
+ * processes can find it. Thus we don't have to worry about the inode
+ * being changed just because we released the buffer.
*/
out_brelse:
xfs_trans_brelse(tp, bp);
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 06/17] xfs: xfs_ifree doesn't need to modify the inode buffer
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (4 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 05/17] xfs: don't do IO when creating an new inode Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 07/17] xfs: Introduce ordered log vector support Dave Chinner
` (10 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Long ago, bulkstat used to read inodes directly fromteh backing
buffer for speed. This had the unfortunate problem of being cache
incoherent with unlinks, and so xfs_ifree() had to mark the inode
as free directly in the backing buffer. bulkstat was changed some
time ago to use inode cache coherent lookups, and so will never see
unlinked inodes in it's lookups. Hence xfs_ifree() does not need to
touch the inode backing buffer anymore.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.c | 32 ++++----------------------------
1 file changed, 4 insertions(+), 28 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index d1f76da..9ecfe1e 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2048,8 +2048,6 @@ xfs_ifree(
int error;
int delete;
xfs_ino_t first_ino;
- xfs_dinode_t *dip;
- xfs_buf_t *ibp;
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
ASSERT(ip->i_d.di_nlink == 0);
@@ -2062,14 +2060,13 @@ xfs_ifree(
* Pull the on-disk inode from the AGI unlinked list.
*/
error = xfs_iunlink_remove(tp, ip);
- if (error != 0) {
+ if (error)
return error;
- }
error = xfs_difree(tp, ip->i_ino, flist, &delete, &first_ino);
- if (error != 0) {
+ if (error)
return error;
- }
+
ip->i_d.di_mode = 0; /* mark incore inode as free */
ip->i_d.di_flags = 0;
ip->i_d.di_dmevmask = 0;
@@ -2081,31 +2078,10 @@ xfs_ifree(
* by reincarnations of this inode.
*/
ip->i_d.di_gen++;
-
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
- error = xfs_imap_to_bp(ip->i_mount, tp, &ip->i_imap, &dip, &ibp,
- 0, 0);
- if (error)
- return error;
-
- /*
- * Clear the on-disk di_mode. This is to prevent xfs_bulkstat
- * from picking up this inode when it is reclaimed (its incore state
- * initialzed but not flushed to disk yet). The in-core di_mode is
- * already cleared and a corresponding transaction logged.
- * The hack here just synchronizes the in-core to on-disk
- * di_mode value in advance before the actual inode sync to disk.
- * This is OK because the inode is already unlinked and would never
- * change its di_mode again for this inode generation.
- * This is a temporary hack that would require a proper fix
- * in the future.
- */
- dip->di_mode = 0;
-
- if (delete) {
+ if (delete)
error = xfs_ifree_cluster(ip, tp, first_ino);
- }
return error;
}
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 07/17] xfs: Introduce ordered log vector support
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (5 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 06/17] xfs: xfs_ifree doesn't need to modify the inode buffer Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 08/17] xfs: Introduce an ordered buffer item Dave Chinner
` (9 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
And "ordered log vector" is a log vector that is used for
tracking a log item through the CIL and into the AIL as part of the
log checkpointing. These ordered log vectors are special in that
they are not written to to journal in any way, and are not accounted
to the checkpoint being written.
The reason for this behaviour is to allow operations to attach items
to transactions and have them follow the normal transactional
lifecycle without actually having to write them to the journal. This
allows logging of items that track high level logical changes and
writing them to the log, while the physical items being modified
pass through into the AIL and pin the tail of the log (and therefore
the logical item in the log) until all the modified items are
physically written to disk.
IOWs, it allows us to write metadata without physically logging
every individual change but still maintain the full transactional
integrity guarantees we currently have w.r.t. crash recovery.
This change modifies some of the CIL item insertion loops, as
ordered log vectors introduce some new constraints as they don't
track any data. One advantage of this change is that it combines
two log vector chain walks into a single pass, so there is less
overhead in the transaction commit pass as well. It also kills some
unused code in the log vector walk loop when committing the CIL.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log.c | 21 +++++++++++---
fs/xfs/xfs_log.h | 2 ++
fs/xfs/xfs_log_cil.c | 75 ++++++++++++++++++++++++++++++++++----------------
3 files changed, 70 insertions(+), 28 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index b345a7c..db08d34 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1963,6 +1963,10 @@ xlog_write_calc_vec_length(
headers++;
for (lv = log_vector; lv; lv = lv->lv_next) {
+ /* we don't write ordered log vectors */
+ if (lv->lv_buf_len == XFS_LOG_VEC_ORDERED)
+ continue;
+
headers += lv->lv_niovecs;
for (i = 0; i < lv->lv_niovecs; i++) {
@@ -2216,7 +2220,7 @@ xlog_write(
index = 0;
lv = log_vector;
vecp = lv->lv_iovecp;
- while (lv && index < lv->lv_niovecs) {
+ while (lv && (!lv->lv_niovecs || index < lv->lv_niovecs)) {
void *ptr;
int log_offset;
@@ -2236,13 +2240,21 @@ xlog_write(
* This loop writes out as many regions as can fit in the amount
* of space which was allocated by xlog_state_get_iclog_space().
*/
- while (lv && index < lv->lv_niovecs) {
- struct xfs_log_iovec *reg = &vecp[index];
+ while (lv && (!lv->lv_niovecs || index < lv->lv_niovecs)) {
+ struct xfs_log_iovec *reg;
struct xlog_op_header *ophdr;
int start_rec_copy;
int copy_len;
int copy_off;
+ bool ordered = false;
+
+ /* ordered log vectors have no regions to write */
+ if (lv->lv_buf_len == XFS_LOG_VEC_ORDERED) {
+ ordered = true;
+ goto next_lv;
+ }
+ reg = &vecp[index];
ASSERT(reg->i_len % sizeof(__int32_t) == 0);
ASSERT((unsigned long)ptr % sizeof(__int32_t) == 0);
@@ -2302,12 +2314,13 @@ xlog_write(
break;
if (++index == lv->lv_niovecs) {
+next_lv:
lv = lv->lv_next;
index = 0;
if (lv)
vecp = lv->lv_iovecp;
}
- if (record_cnt == 0) {
+ if (record_cnt == 0 && ordered == false) {
if (!lv)
return 0;
break;
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 5caee96..b20918c 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -105,6 +105,8 @@ struct xfs_log_vec {
int lv_buf_len; /* size of formatted buffer */
};
+#define XFS_LOG_VEC_ORDERED (-1)
+
/*
* Structure used to pass callback function and the function's argument
* to the log manager.
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index d0833b5..02b9cf3 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -127,6 +127,7 @@ xlog_cil_prepare_log_vecs(
int index;
int len = 0;
uint niovecs;
+ bool ordered = false;
/* Skip items which aren't dirty in this transaction. */
if (!(lidp->lid_flags & XFS_LID_DIRTY))
@@ -137,14 +138,30 @@ xlog_cil_prepare_log_vecs(
if (!niovecs)
continue;
+ /*
+ * Ordered items need to be tracked but we do not wish to write
+ * them. We need a logvec to track the object, but we do not
+ * need an iovec or buffer to be allocated for copying data.
+ */
+ if (niovecs == XFS_LOG_VEC_ORDERED) {
+ ordered = true;
+ niovecs = 0;
+ }
+
new_lv = kmem_zalloc(sizeof(*new_lv) +
niovecs * sizeof(struct xfs_log_iovec),
KM_SLEEP|KM_NOFS);
+ new_lv->lv_item = lidp->lid_item;
+ new_lv->lv_niovecs = niovecs;
+ if (ordered) {
+ /* track as an ordered logvec */
+ new_lv->lv_buf_len = XFS_LOG_VEC_ORDERED;
+ goto next;
+ }
+
/* The allocated iovec region lies beyond the log vector. */
new_lv->lv_iovecp = (struct xfs_log_iovec *)&new_lv[1];
- new_lv->lv_niovecs = niovecs;
- new_lv->lv_item = lidp->lid_item;
/* build the vector array and calculate it's length */
IOP_FORMAT(new_lv->lv_item, new_lv->lv_iovecp);
@@ -165,6 +182,7 @@ xlog_cil_prepare_log_vecs(
}
ASSERT(ptr == new_lv->lv_buf + new_lv->lv_buf_len);
+next:
if (!ret_lv)
ret_lv = new_lv;
else
@@ -191,8 +209,18 @@ xfs_cil_prepare_item(
if (old) {
/* existing lv on log item, space used is a delta */
- ASSERT(!list_empty(&lv->lv_item->li_cil));
- ASSERT(old->lv_buf && old->lv_buf_len && old->lv_niovecs);
+ ASSERT((old->lv_buf && old->lv_buf_len && old->lv_niovecs) ||
+ old->lv_buf_len == XFS_LOG_VEC_ORDERED);
+
+ /*
+ * If the new item is ordered, keep the old one that is already
+ * tracking dirty or ordered regions
+ */
+ if (lv->lv_buf_len == XFS_LOG_VEC_ORDERED) {
+ ASSERT(!lv->lv_buf);
+ kmem_free(lv);
+ return;
+ }
*len += lv->lv_buf_len - old->lv_buf_len;
*diff_iovecs += lv->lv_niovecs - old->lv_niovecs;
@@ -201,10 +229,11 @@ xfs_cil_prepare_item(
} else {
/* new lv, must pin the log item */
ASSERT(!lv->lv_item->li_lv);
- ASSERT(list_empty(&lv->lv_item->li_cil));
- *len += lv->lv_buf_len;
- *diff_iovecs += lv->lv_niovecs;
+ if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED) {
+ *len += lv->lv_buf_len;
+ *diff_iovecs += lv->lv_niovecs;
+ }
IOP_PIN(lv->lv_item);
}
@@ -259,18 +288,24 @@ xlog_cil_insert_items(
* We can do this safely because the context can't checkpoint until we
* are done so it doesn't matter exactly how we update the CIL.
*/
- for (lv = log_vector; lv; lv = lv->lv_next)
- xfs_cil_prepare_item(log, lv, &len, &diff_iovecs);
-
- /* account for space used by new iovec headers */
- len += diff_iovecs * sizeof(xlog_op_header_t);
-
spin_lock(&cil->xc_cil_lock);
+ for (lv = log_vector; lv; ) {
+ struct xfs_log_vec *next = lv->lv_next;
- /* move the items to the tail of the CIL */
- for (lv = log_vector; lv; lv = lv->lv_next)
+ ASSERT(lv->lv_item->li_lv || list_empty(&lv->lv_item->li_cil));
+ lv->lv_next = NULL;
+
+ /*
+ * xfs_cil_prepare_item() may free the lv, so move the item on
+ * the CIL first.
+ */
list_move_tail(&lv->lv_item->li_cil, &cil->xc_cil);
+ xfs_cil_prepare_item(log, lv, &len, &diff_iovecs);
+ lv = next;
+ }
+ /* account for space used by new iovec headers */
+ len += diff_iovecs * sizeof(xlog_op_header_t);
ctx->nvecs += diff_iovecs;
/*
@@ -381,9 +416,7 @@ xlog_cil_push(
struct xfs_cil_ctx *new_ctx;
struct xlog_in_core *commit_iclog;
struct xlog_ticket *tic;
- int num_lv;
int num_iovecs;
- int len;
int error = 0;
struct xfs_trans_header thdr;
struct xfs_log_iovec lhdr;
@@ -428,12 +461,9 @@ xlog_cil_push(
* side which is currently locked out by the flush lock.
*/
lv = NULL;
- num_lv = 0;
num_iovecs = 0;
- len = 0;
while (!list_empty(&cil->xc_cil)) {
struct xfs_log_item *item;
- int i;
item = list_first_entry(&cil->xc_cil,
struct xfs_log_item, li_cil);
@@ -444,11 +474,7 @@ xlog_cil_push(
lv->lv_next = item->li_lv;
lv = item->li_lv;
item->li_lv = NULL;
-
- num_lv++;
num_iovecs += lv->lv_niovecs;
- for (i = 0; i < lv->lv_niovecs; i++)
- len += lv->lv_iovecp[i].i_len;
}
/*
@@ -701,6 +727,7 @@ xfs_log_commit_cil(
if (commit_lsn)
*commit_lsn = log->l_cilp->xc_ctx->sequence;
+ /* xlog_cil_insert_items() destroys log_vector list */
xlog_cil_insert_items(log, log_vector, tp->t_ticket);
/* check we didn't blow the reservation */
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 08/17] xfs: Introduce an ordered buffer item
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (6 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 07/17] xfs: Introduce ordered log vector support Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 09/17] xfs: Inode create log items Dave Chinner
` (8 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
If we have a buffer that we have modified but we do not wish to
physically log in a transaction (e.g. we've logged a logical
change), we still need to ensure that transactional integrity is
maintained. Hence we must not move the tail of the log past the
transaction that the buffer is associated with before the buffer is
written to disk.
This means these special buffers still need to be included in the
transaction and added to the AIL just like a normal buffer, but we
do not want the modifications to the buffer written into the
transaction. IOWs, what we want is an "ordered buffer" that
maintains the same transactional life cycle as a physically logged
buffer, just without the transcribing of the modifications to the
log.
Hence we need to flag the buffer as an "ordered buffer" to avoid
including it in vector size calculations or formatting during the
transaction. Once the transaction is committed, the buffer appears
for all intents to be the same as a physically logged buffer as it
transitions through the log and AIL.
Relogging will also work just fine for such an ordered buffer - the
logical transaction will be replayed before the subsequent
modifications that relog the buffer, so everything will be
reconstructed correctly by recovery.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/xfs_buf_item.c | 75 +++++++++++++++++++++++++++++++-----------------
fs/xfs/xfs_buf_item.h | 4 ++-
fs/xfs/xfs_trace.h | 4 +++
fs/xfs/xfs_trans.h | 1 +
fs/xfs/xfs_trans_buf.c | 34 ++++++++++++++++++++--
5 files changed, 87 insertions(+), 31 deletions(-)
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 4ec4317..61f6876 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -140,6 +140,16 @@ xfs_buf_item_size(
ASSERT(bip->bli_flags & XFS_BLI_LOGGED);
+ if (bip->bli_flags & XFS_BLI_ORDERED) {
+ /*
+ * The buffer has been logged just to order it.
+ * It is not being included in the transaction
+ * commit, so no vectors are used at all.
+ */
+ trace_xfs_buf_item_size_ordered(bip);
+ return XFS_LOG_VEC_ORDERED;
+ }
+
/*
* the vector count is based on the number of buffer vectors we have
* dirty bits in. This will only be greater than one when we have a
@@ -212,6 +222,7 @@ xfs_buf_item_format_segment(
goto out;
}
+
/*
* Fill in an iovec for each set of contiguous chunks.
*/
@@ -311,6 +322,16 @@ xfs_buf_item_format(
bip->bli_flags &= ~XFS_BLI_INODE_BUF;
}
+ if ((bip->bli_flags & (XFS_BLI_ORDERED|XFS_BLI_STALE)) ==
+ XFS_BLI_ORDERED) {
+ /*
+ * The buffer has been logged just to order it. It is not being
+ * included in the transaction commit, so don't format it.
+ */
+ trace_xfs_buf_item_format_ordered(bip);
+ return;
+ }
+
for (i = 0; i < bip->bli_format_count; i++) {
vecp = xfs_buf_item_format_segment(bip, vecp, offset,
&bip->bli_formats[i]);
@@ -340,6 +361,7 @@ xfs_buf_item_pin(
ASSERT(atomic_read(&bip->bli_refcount) > 0);
ASSERT((bip->bli_flags & XFS_BLI_LOGGED) ||
+ (bip->bli_flags & XFS_BLI_ORDERED) ||
(bip->bli_flags & XFS_BLI_STALE));
trace_xfs_buf_item_pin(bip);
@@ -512,8 +534,9 @@ xfs_buf_item_unlock(
{
struct xfs_buf_log_item *bip = BUF_ITEM(lip);
struct xfs_buf *bp = bip->bli_buf;
- int aborted, clean, i;
- uint hold;
+ bool clean;
+ bool aborted;
+ int flags;
/* Clear the buffer's association with this transaction. */
bp->b_transp = NULL;
@@ -524,23 +547,21 @@ xfs_buf_item_unlock(
* (cancelled) buffers at unpin time, but we'll never go through the
* pin/unpin cycle if we abort inside commit.
*/
- aborted = (lip->li_flags & XFS_LI_ABORTED) != 0;
-
+ aborted = (lip->li_flags & XFS_LI_ABORTED) ? true : false;
/*
- * Before possibly freeing the buf item, determine if we should
- * release the buffer at the end of this routine.
+ * Before possibly freeing the buf item, copy the per-transaction state
+ * so we can reference it safely later after clearing it from the
+ * buffer log item.
*/
- hold = bip->bli_flags & XFS_BLI_HOLD;
-
- /* Clear the per transaction state. */
- bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD);
+ flags = bip->bli_flags;
+ bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
/*
* If the buf item is marked stale, then don't do anything. We'll
* unlock the buffer and free the buf item when the buffer is unpinned
* for the last time.
*/
- if (bip->bli_flags & XFS_BLI_STALE) {
+ if (flags & XFS_BLI_STALE) {
trace_xfs_buf_item_unlock_stale(bip);
ASSERT(bip->__bli_format.blf_flags & XFS_BLF_CANCEL);
if (!aborted) {
@@ -557,13 +578,19 @@ xfs_buf_item_unlock(
* be the only reference to the buf item, so we free it anyway
* regardless of whether it is dirty or not. A dirty abort implies a
* shutdown, anyway.
+ *
+ * Ordered buffers are dirty but may have no recorded changes, so ensure
+ * we only release clean items here.
*/
- clean = 1;
- for (i = 0; i < bip->bli_format_count; i++) {
- if (!xfs_bitmap_empty(bip->bli_formats[i].blf_data_map,
- bip->bli_formats[i].blf_map_size)) {
- clean = 0;
- break;
+ clean = (flags & XFS_BLI_DIRTY) ? false : true;
+ if (clean) {
+ int i;
+ for (i = 0; i < bip->bli_format_count; i++) {
+ if (!xfs_bitmap_empty(bip->bli_formats[i].blf_data_map,
+ bip->bli_formats[i].blf_map_size)) {
+ clean = false;
+ break;
+ }
}
}
if (clean)
@@ -576,7 +603,7 @@ xfs_buf_item_unlock(
} else
atomic_dec(&bip->bli_refcount);
- if (!hold)
+ if (!(flags & XFS_BLI_HOLD))
xfs_buf_relse(bp);
}
@@ -842,12 +869,6 @@ xfs_buf_item_log(
struct xfs_buf *bp = bip->bli_buf;
/*
- * Mark the item as having some dirty data for
- * quick reference in xfs_buf_item_dirty.
- */
- bip->bli_flags |= XFS_BLI_DIRTY;
-
- /*
* walk each buffer segment and mark them dirty appropriately.
*/
start = 0;
@@ -873,7 +894,7 @@ xfs_buf_item_log(
/*
- * Return 1 if the buffer has some data that has been logged (at any
+ * Return 1 if the buffer has been logged or ordered in a transaction (at any
* point, not just the current transaction) and 0 if not.
*/
uint
@@ -907,11 +928,11 @@ void
xfs_buf_item_relse(
xfs_buf_t *bp)
{
- xfs_buf_log_item_t *bip;
+ xfs_buf_log_item_t *bip = bp->b_fspriv;
trace_xfs_buf_item_relse(bp, _RET_IP_);
+ ASSERT(!(bip->bli_item.li_flags & XFS_LI_IN_AIL));
- bip = bp->b_fspriv;
bp->b_fspriv = bip->bli_item.li_bio_list;
if (bp->b_fspriv == NULL)
bp->b_iodone = NULL;
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index 2573d2a..0f1c247 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -120,6 +120,7 @@ xfs_blft_from_flags(struct xfs_buf_log_format *blf)
#define XFS_BLI_INODE_ALLOC_BUF 0x10
#define XFS_BLI_STALE_INODE 0x20
#define XFS_BLI_INODE_BUF 0x40
+#define XFS_BLI_ORDERED 0x80
#define XFS_BLI_FLAGS \
{ XFS_BLI_HOLD, "HOLD" }, \
@@ -128,7 +129,8 @@ xfs_blft_from_flags(struct xfs_buf_log_format *blf)
{ XFS_BLI_LOGGED, "LOGGED" }, \
{ XFS_BLI_INODE_ALLOC_BUF, "INODE_ALLOC" }, \
{ XFS_BLI_STALE_INODE, "STALE_INODE" }, \
- { XFS_BLI_INODE_BUF, "INODE_BUF" }
+ { XFS_BLI_INODE_BUF, "INODE_BUF" }, \
+ { XFS_BLI_ORDERED, "ORDERED" }
#ifdef __KERNEL__
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index aa4db33..34422b6 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -486,9 +486,12 @@ DEFINE_EVENT(xfs_buf_item_class, name, \
TP_PROTO(struct xfs_buf_log_item *bip), \
TP_ARGS(bip))
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size);
+DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_ordered);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_stale);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format);
+DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_ordered);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_stale);
+DEFINE_BUF_ITEM_EVENT(xfs_buf_item_ordered);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_pin);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_unpin);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_unpin_stale);
@@ -508,6 +511,7 @@ DEFINE_BUF_ITEM_EVENT(xfs_trans_bjoin);
DEFINE_BUF_ITEM_EVENT(xfs_trans_bhold);
DEFINE_BUF_ITEM_EVENT(xfs_trans_bhold_release);
DEFINE_BUF_ITEM_EVENT(xfs_trans_binval);
+DEFINE_BUF_ITEM_EVENT(xfs_trans_buf_ordered);
DECLARE_EVENT_CLASS(xfs_lock_class,
TP_PROTO(struct xfs_inode *ip, unsigned lock_flags,
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index a44dba5..850fe78 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -503,6 +503,7 @@ void xfs_trans_bhold_release(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_binval(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_inode_buf(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_stale_inode_buf(xfs_trans_t *, struct xfs_buf *);
+void xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint);
void xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 73a5fa4..aa5a04b 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -397,7 +397,6 @@ shutdown_abort:
return XFS_ERROR(EIO);
}
-
/*
* Release the buffer bp which was previously acquired with one of the
* xfs_trans_... buffer allocation routines if the buffer has not
@@ -603,8 +602,14 @@ xfs_trans_log_buf(xfs_trans_t *tp,
tp->t_flags |= XFS_TRANS_DIRTY;
bip->bli_item.li_desc->lid_flags |= XFS_LID_DIRTY;
- bip->bli_flags |= XFS_BLI_LOGGED;
- xfs_buf_item_log(bip, first, last);
+
+ /*
+ * If we have an ordered buffer we are not logging any dirty range but
+ * it still needs to be marked dirty and that it has been logged.
+ */
+ bip->bli_flags |= XFS_BLI_DIRTY | XFS_BLI_LOGGED;
+ if (!(bip->bli_flags & XFS_BLI_ORDERED))
+ xfs_buf_item_log(bip, first, last);
}
@@ -757,6 +762,29 @@ xfs_trans_inode_alloc_buf(
}
/*
+ * Mark the buffer as ordered for this transaction. This means
+ * that the contents of the buffer are not recorded in the transaction
+ * but it is tracked in the AIL as though it was. This allows us
+ * to record logical changes in transactions rather than the physical
+ * changes we make to the buffer without changing writeback ordering
+ * constraints of metadata buffers.
+ */
+void
+xfs_trans_ordered_buf(
+ struct xfs_trans *tp,
+ struct xfs_buf *bp)
+{
+ struct xfs_buf_log_item *bip = bp->b_fspriv;
+
+ ASSERT(bp->b_transp == tp);
+ ASSERT(bip != NULL);
+ ASSERT(atomic_read(&bip->bli_refcount) > 0);
+
+ bip->bli_flags |= XFS_BLI_ORDERED;
+ trace_xfs_buf_item_ordered(bip);
+}
+
+/*
* Set the type of the buffer for log recovery so that it can correctly identify
* and hence attach the correct buffer ops to the buffer after replay.
*/
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 09/17] xfs: Inode create log items
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (7 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 08/17] xfs: Introduce an ordered buffer item Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 10/17] xfs: Inode create transaction reservations Dave Chinner
` (7 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
Introduce the inode create log item type for logical inode create logging.
Instead of logging the changes in buffers, pass the range to be
initialised through the log by a new transaction type. This reduces
the amount of log space required to record initialisation during
allocation from about 128 bytes per inode to a small fixed amount
per inode extent to be initialised.
This requires a new log item type to track it through the log
and the AIL. This is a relatively simple item - most callbacks are
noops as this item has the same life cycle as the transaction.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_icreate_item.c | 195 +++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_icreate_item.h | 52 ++++++++++++
fs/xfs/xfs_log.h | 3 +-
fs/xfs/xfs_super.c | 8 ++
fs/xfs/xfs_trans.h | 4 +-
6 files changed, 261 insertions(+), 2 deletions(-)
create mode 100644 fs/xfs/xfs_icreate_item.c
create mode 100644 fs/xfs/xfs_icreate_item.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 6313b69..4a45080 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -71,6 +71,7 @@ xfs-y += xfs_alloc.o \
xfs_dir2_sf.o \
xfs_ialloc.o \
xfs_ialloc_btree.o \
+ xfs_icreate_item.o \
xfs_inode.o \
xfs_log_recover.o \
xfs_mount.o \
diff --git a/fs/xfs/xfs_icreate_item.c b/fs/xfs/xfs_icreate_item.c
new file mode 100644
index 0000000..7716a4e
--- /dev/null
+++ b/fs/xfs/xfs_icreate_item.c
@@ -0,0 +1,195 @@
+/*
+ * Copyright (c) 2008-2010, 2013 Dave Chinner
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_inum.h"
+#include "xfs_trans.h"
+#include "xfs_buf_item.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_dir2.h"
+#include "xfs_mount.h"
+#include "xfs_trans_priv.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_attr_sf.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_inode_item.h"
+#include "xfs_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_error.h"
+#include "xfs_icreate_item.h"
+
+kmem_zone_t *xfs_icreate_zone; /* inode create item zone */
+
+static inline struct xfs_icreate_item *ICR_ITEM(struct xfs_log_item *lip)
+{
+ return container_of(lip, struct xfs_icreate_item, ic_item);
+}
+
+/*
+ * This returns the number of iovecs needed to log the given inode item.
+ *
+ * We only need one iovec for the icreate log structure.
+ */
+STATIC uint
+xfs_icreate_item_size(
+ struct xfs_log_item *lip)
+{
+ return 1;
+}
+
+/*
+ * This is called to fill in the vector of log iovecs for the
+ * given inode create log item.
+ */
+STATIC void
+xfs_icreate_item_format(
+ struct xfs_log_item *lip,
+ struct xfs_log_iovec *log_vector)
+{
+ struct xfs_icreate_item *icp = ICR_ITEM(lip);
+
+ log_vector->i_addr = (xfs_caddr_t)&icp->ic_format;
+ log_vector->i_len = sizeof(struct xfs_icreate_log);
+ log_vector->i_type = XLOG_REG_TYPE_ICREATE;
+}
+
+
+/* Pinning has no meaning for the create item, so just return. */
+STATIC void
+xfs_icreate_item_pin(
+ struct xfs_log_item *lip)
+{
+}
+
+
+/* pinning has no meaning for the create item, so just return. */
+STATIC void
+xfs_icreate_item_unpin(
+ struct xfs_log_item *lip,
+ int remove)
+{
+}
+
+STATIC void
+xfs_icreate_item_unlock(
+ struct xfs_log_item *lip)
+{
+ struct xfs_icreate_item *icp = ICR_ITEM(lip);
+
+ if (icp->ic_item.li_flags & XFS_LI_ABORTED)
+ kmem_zone_free(xfs_icreate_zone, icp);
+ return;
+}
+
+/*
+ * Because we have ordered buffers being tracked in the AIL for the inode
+ * creation, we don't need the create item after this. Hence we can free
+ * the log item and return -1 to tell the caller we're done with the item.
+ */
+STATIC xfs_lsn_t
+xfs_icreate_item_committed(
+ struct xfs_log_item *lip,
+ xfs_lsn_t lsn)
+{
+ struct xfs_icreate_item *icp = ICR_ITEM(lip);
+
+ kmem_zone_free(xfs_icreate_zone, icp);
+ return (xfs_lsn_t)-1;
+}
+
+/* item can never get into the AIL */
+STATIC uint
+xfs_icreate_item_push(
+ struct xfs_log_item *lip,
+ struct list_head *buffer_list)
+{
+ ASSERT(0);
+ return XFS_ITEM_SUCCESS;
+}
+
+/* Ordered buffers do the dependency tracking here, so this does nothing. */
+STATIC void
+xfs_icreate_item_committing(
+ struct xfs_log_item *lip,
+ xfs_lsn_t lsn)
+{
+}
+
+/*
+ * This is the ops vector shared by all buf log items.
+ */
+static struct xfs_item_ops xfs_icreate_item_ops = {
+ .iop_size = xfs_icreate_item_size,
+ .iop_format = xfs_icreate_item_format,
+ .iop_pin = xfs_icreate_item_pin,
+ .iop_unpin = xfs_icreate_item_unpin,
+ .iop_push = xfs_icreate_item_push,
+ .iop_unlock = xfs_icreate_item_unlock,
+ .iop_committed = xfs_icreate_item_committed,
+ .iop_committing = xfs_icreate_item_committing,
+};
+
+
+/*
+ * Initialize the inode log item for a newly allocated (in-core) inode.
+ *
+ * Inode extents can only reside within an AG. Hence specify the starting
+ * block for the inode chunk by offset within an AG as well as the
+ * length of the allocated extent.
+ *
+ * This joins the item to the transaction and marks it dirty so
+ * that we don't need a separate call to do this, nor does the
+ * caller need to know anything about the icreate item.
+ */
+void
+xfs_icreate_log(
+ struct xfs_trans *tp,
+ xfs_agnumber_t agno,
+ xfs_agblock_t agbno,
+ unsigned int count,
+ unsigned int inode_size,
+ xfs_agblock_t length,
+ unsigned int generation)
+{
+ struct xfs_icreate_item *icp;
+
+ icp = kmem_zone_zalloc(xfs_icreate_zone, KM_SLEEP);
+
+ xfs_log_item_init(tp->t_mountp, &icp->ic_item, XFS_LI_ICREATE,
+ &xfs_icreate_item_ops);
+
+ icp->ic_format.icl_type = XFS_LI_ICREATE;
+ icp->ic_format.icl_size = 1; /* single vector */
+ icp->ic_format.icl_ag = cpu_to_be32(agno);
+ icp->ic_format.icl_agbno = cpu_to_be32(agbno);
+ icp->ic_format.icl_count = cpu_to_be32(count);
+ icp->ic_format.icl_isize = cpu_to_be32(inode_size);
+ icp->ic_format.icl_length = cpu_to_be32(length);
+ icp->ic_format.icl_gen = cpu_to_be32(generation);
+
+ xfs_trans_add_item(tp, &icp->ic_item);
+ tp->t_flags |= XFS_TRANS_DIRTY;
+ icp->ic_item.li_desc->lid_flags |= XFS_LID_DIRTY;
+}
diff --git a/fs/xfs/xfs_icreate_item.h b/fs/xfs/xfs_icreate_item.h
new file mode 100644
index 0000000..88ba8aa
--- /dev/null
+++ b/fs/xfs/xfs_icreate_item.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (c) 2008-2010, Dave Chinner
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef XFS_ICREATE_ITEM_H
+#define XFS_ICREATE_ITEM_H 1
+
+/*
+ * on disk log item structure
+ *
+ * Log recovery assumes the first two entries are the type and size and they fit
+ * in 32 bits. Also in host order (ugh) so they have to be 32 bit aligned so
+ * decoding can be done correctly.
+ */
+struct xfs_icreate_log {
+ __uint16_t icl_type; /* type of log format structure */
+ __uint16_t icl_size; /* size of log format structure */
+ __be32 icl_ag; /* ag being allocated in */
+ __be32 icl_agbno; /* start block of inode range */
+ __be32 icl_count; /* number of inodes to initialise */
+ __be32 icl_isize; /* size of inodes */
+ __be32 icl_length; /* length of extent to initialise */
+ __be32 icl_gen; /* inode generation number to use */
+};
+
+/* in memory log item structure */
+struct xfs_icreate_item {
+ struct xfs_log_item ic_item;
+ struct xfs_icreate_log ic_format;
+};
+
+extern kmem_zone_t *xfs_icreate_zone; /* inode create item zone */
+
+void xfs_icreate_log(struct xfs_trans *tp, xfs_agnumber_t agno,
+ xfs_agblock_t agbno, unsigned int count,
+ unsigned int inode_size, xfs_agblock_t length,
+ unsigned int generation);
+
+#endif /* XFS_ICREATE_ITEM_H */
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index b20918c..fb630e4 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -88,7 +88,8 @@ static inline xfs_lsn_t _lsn_cmp(xfs_lsn_t lsn1, xfs_lsn_t lsn2)
#define XLOG_REG_TYPE_UNMOUNT 17
#define XLOG_REG_TYPE_COMMIT 18
#define XLOG_REG_TYPE_TRANSHDR 19
-#define XLOG_REG_TYPE_MAX 19
+#define XLOG_REG_TYPE_ICREATE 20
+#define XLOG_REG_TYPE_MAX 20
typedef struct xfs_log_iovec {
void *i_addr; /* beginning address of region */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 3033ba5..f59e27f 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -51,6 +51,7 @@
#include "xfs_inode_item.h"
#include "xfs_icache.h"
#include "xfs_trace.h"
+#include "xfs_icreate_item.h"
#include <linux/namei.h>
#include <linux/init.h>
@@ -1655,9 +1656,15 @@ xfs_init_zones(void)
KM_ZONE_SPREAD, NULL);
if (!xfs_ili_zone)
goto out_destroy_inode_zone;
+ xfs_icreate_zone = kmem_zone_init(sizeof(struct xfs_icreate_item),
+ "xfs_icr");
+ if (!xfs_icreate_zone)
+ goto out_destroy_ili_zone;
return 0;
+ out_destroy_ili_zone:
+ kmem_zone_destroy(xfs_ili_zone);
out_destroy_inode_zone:
kmem_zone_destroy(xfs_inode_zone);
out_destroy_efi_zone:
@@ -1696,6 +1703,7 @@ xfs_destroy_zones(void)
* destroy caches.
*/
rcu_barrier();
+ kmem_zone_destroy(xfs_icreate_zone);
kmem_zone_destroy(xfs_ili_zone);
kmem_zone_destroy(xfs_inode_zone);
kmem_zone_destroy(xfs_efi_zone);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 850fe78..079d5a0 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -48,6 +48,7 @@ typedef struct xfs_trans_header {
#define XFS_LI_BUF 0x123c /* v2 bufs, variable sized inode bufs */
#define XFS_LI_DQUOT 0x123d
#define XFS_LI_QUOTAOFF 0x123e
+#define XFS_LI_ICREATE 0x123f
#define XFS_LI_TYPE_DESC \
{ XFS_LI_EFI, "XFS_LI_EFI" }, \
@@ -107,7 +108,8 @@ typedef struct xfs_trans_header {
#define XFS_TRANS_SWAPEXT 40
#define XFS_TRANS_SB_COUNT 41
#define XFS_TRANS_CHECKPOINT 42
-#define XFS_TRANS_TYPE_MAX 42
+#define XFS_TRANS_ICREATE 43
+#define XFS_TRANS_TYPE_MAX 43
/* new transaction types need to be reflected in xfs_logprint(8) */
#define XFS_TRANS_TYPES \
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 10/17] xfs: Inode create transaction reservations
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (8 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 09/17] xfs: Inode create log items Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 11/17] xfs: Inode create item recovery Dave Chinner
` (6 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
Define the log and space transaction sizes. Factor the current
create log reservation macro into the two logical halves and reuse
one half for the new icreate transactions. The icreate transaction
is transparent to all the high level create code - the
pre-calculated reservations will correctly set the reservations
dependent on whether the filesystem supports the icreate
transaction.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/xfs_trans.c | 118 ++++++++++++++++++++++++++++++++++------------------
1 file changed, 77 insertions(+), 41 deletions(-)
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 2fd7c1f..35a2299 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -234,71 +234,93 @@ xfs_calc_remove_reservation(
}
/*
- * For symlink we can modify:
+ * For create, break it in to the two cases that the transaction
+ * covers. We start with the modify case - allocation done by modification
+ * of the state of existing inodes - and the allocation case.
+ */
+
+/*
+ * For create we can modify:
* the parent directory inode: inode size
* the new inode: inode size
- * the inode btree entry: 1 block
+ * the inode btree entry: block size
+ * the superblock for the nlink flag: sector size
* the directory btree: (max depth + v2) * dir block size
* the directory inode's bmap btree: (max depth + v2) * block size
- * the blocks for the symlink: 1 kB
- * Or in the first xact we allocate some inodes giving:
+ */
+STATIC uint
+xfs_calc_create_resv_modify(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+ (uint)XFS_FSB_TO_B(mp, 1) +
+ xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1));
+}
+
+/*
+ * For create we can allocate some inodes giving:
* the agi and agf of the ag getting the new inodes: 2 * sectorsize
+ * the superblock for the nlink flag: sector size
* the inode blocks allocated: XFS_IALLOC_BLOCKS * blocksize
* the inode btree: max depth * blocksize
- * the allocation btrees: 2 trees * (2 * max depth - 1) * block size
+ * the allocation btrees: 2 trees * (max depth - 1) * block size
*/
STATIC uint
-xfs_calc_symlink_reservation(
+xfs_calc_create_resv_alloc(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
+ mp->m_sb.sb_sectsize +
+ xfs_calc_buf_res(XFS_IALLOC_BLOCKS(mp), XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ XFS_FSB_TO_B(mp, 1));
+}
+
+STATIC uint
+__xfs_calc_create_reservation(
struct xfs_mount *mp)
{
return XFS_DQUOT_LOGRES(mp) +
- MAX((xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(1, 1024)),
- (xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_IALLOC_BLOCKS(mp),
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(mp->m_in_maxlevels,
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
- XFS_FSB_TO_B(mp, 1))));
+ MAX(xfs_calc_create_resv_alloc(mp),
+ xfs_calc_create_resv_modify(mp));
}
/*
- * For create we can modify:
- * the parent directory inode: inode size
- * the new inode: inode size
- * the inode btree entry: block size
- * the superblock for the nlink flag: sector size
- * the directory btree: (max depth + v2) * dir block size
- * the directory inode's bmap btree: (max depth + v2) * block size
- * Or in the first xact we allocate some inodes giving:
+ * For icreate we can allocate some inodes giving:
* the agi and agf of the ag getting the new inodes: 2 * sectorsize
* the superblock for the nlink flag: sector size
- * the inode blocks allocated: XFS_IALLOC_BLOCKS * blocksize
* the inode btree: max depth * blocksize
* the allocation btrees: 2 trees * (max depth - 1) * block size
*/
STATIC uint
-xfs_calc_create_reservation(
+xfs_calc_icreate_resv_alloc(
struct xfs_mount *mp)
{
+ return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
+ mp->m_sb.sb_sectsize +
+ xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ XFS_FSB_TO_B(mp, 1));
+}
+
+STATIC uint
+xfs_calc_icreate_reservation(xfs_mount_t *mp)
+{
return XFS_DQUOT_LOGRES(mp) +
- MAX((xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
- (uint)XFS_FSB_TO_B(mp, 1) +
- xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
- XFS_FSB_TO_B(mp, 1))),
- (xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
- mp->m_sb.sb_sectsize +
- xfs_calc_buf_res(XFS_IALLOC_BLOCKS(mp),
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(mp->m_in_maxlevels,
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
- XFS_FSB_TO_B(mp, 1))));
+ MAX(xfs_calc_icreate_resv_alloc(mp),
+ xfs_calc_create_resv_modify(mp));
+}
+
+STATIC uint
+xfs_calc_create_reservation(
+ struct xfs_mount *mp)
+{
+ if (xfs_sb_version_hascrc(&mp->m_sb))
+ return xfs_calc_icreate_reservation(mp);
+ return __xfs_calc_create_reservation(mp);
+
}
/*
@@ -311,6 +333,20 @@ xfs_calc_mkdir_reservation(
return xfs_calc_create_reservation(mp);
}
+
+/*
+ * Making a new symplink is the same as creating a new file, but
+ * with the added blocks for remote symlink data which can be up to 1kB in
+ * length (MAXPATHLEN).
+ */
+STATIC uint
+xfs_calc_symlink_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_create_reservation(mp) +
+ xfs_calc_buf_res(1, MAXPATHLEN);
+}
+
/*
* In freeing an inode we can modify:
* the inode being freed: inode size
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 11/17] xfs: Inode create item recovery
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (9 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 10/17] xfs: Inode create transaction reservations Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 12/17] xfs: Use inode create transaction Dave Chinner
` (5 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
When we find a icreate transaction, we need to get and initialise
the buffers in the range that has been passed. Extract and verify
the information in the item record, then loop over the range
initialising and issuing the buffer writes delayed.
Support an arbitrary size range to initialise so that in
future when we allocate inodes in much larger chunks all kernels
that understand this transaction can still recover them.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/xfs_ialloc.c | 37 +++++++++++----
fs/xfs/xfs_ialloc.h | 8 ++++
fs/xfs/xfs_log_recover.c | 114 ++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 145 insertions(+), 14 deletions(-)
diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index c8f5ae1..d5ef81a 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -150,12 +150,16 @@ xfs_check_agi_freecount(
#endif
/*
- * Initialise a new set of inodes.
+ * Initialise a new set of inodes. When called without a transaction context
+ * (e.g. from recovery) we initiate a delayed write of the inode buffers rather
+ * than logging them (which in a transaction context puts them into the AIL
+ * for writeback rather than the xfsbufd queue).
*/
STATIC int
xfs_ialloc_inode_init(
struct xfs_mount *mp,
struct xfs_trans *tp,
+ struct list_head *buffer_list,
xfs_agnumber_t agno,
xfs_agblock_t agbno,
xfs_agblock_t length,
@@ -247,18 +251,33 @@ xfs_ialloc_inode_init(
ino++;
uuid_copy(&free->di_uuid, &mp->m_sb.sb_uuid);
xfs_dinode_calc_crc(mp, free);
- } else {
+ } else if (tp) {
/* just log the inode core */
xfs_trans_log_buf(tp, fbuf, ioffset,
ioffset + isize - 1);
}
}
- if (version == 3) {
- /* need to log the entire buffer */
- xfs_trans_log_buf(tp, fbuf, 0,
- BBTOB(fbuf->b_length) - 1);
+
+ if (tp) {
+ /*
+ * Mark the buffer as an inode allocation buffer so it
+ * sticks in AIL at the point of this allocation
+ * transaction. This ensures the they are on disk before
+ * the tail of the log can be moved past this
+ * transaction (i.e. by preventing relogging from moving
+ * it forward in the log).
+ */
+ xfs_trans_inode_alloc_buf(tp, fbuf);
+ if (version == 3) {
+ /* need to log the entire buffer */
+ xfs_trans_log_buf(tp, fbuf, 0,
+ BBTOB(fbuf->b_length) - 1);
+ }
+ } else {
+ fbuf->b_flags |= XBF_DONE;
+ xfs_buf_delwri_queue(fbuf, buffer_list);
+ xfs_buf_relse(fbuf);
}
- xfs_trans_inode_alloc_buf(tp, fbuf);
}
return 0;
}
@@ -303,7 +322,7 @@ xfs_ialloc_ag_alloc(
* First try to allocate inodes contiguous with the last-allocated
* chunk of inodes. If the filesystem is striped, this will fill
* an entire stripe unit with inodes.
- */
+ */
agi = XFS_BUF_TO_AGI(agbp);
newino = be32_to_cpu(agi->agi_newino);
agno = be32_to_cpu(agi->agi_seqno);
@@ -402,7 +421,7 @@ xfs_ialloc_ag_alloc(
* rather than a linear progression to prevent the next generation
* number from being easily guessable.
*/
- error = xfs_ialloc_inode_init(args.mp, tp, agno, args.agbno,
+ error = xfs_ialloc_inode_init(args.mp, tp, NULL, agno, args.agbno,
args.len, prandom_u32());
if (error)
diff --git a/fs/xfs/xfs_ialloc.h b/fs/xfs/xfs_ialloc.h
index c8da3df..68c0732 100644
--- a/fs/xfs/xfs_ialloc.h
+++ b/fs/xfs/xfs_ialloc.h
@@ -150,6 +150,14 @@ int xfs_inobt_lookup(struct xfs_btree_cur *cur, xfs_agino_t ino,
int xfs_inobt_get_rec(struct xfs_btree_cur *cur,
xfs_inobt_rec_incore_t *rec, int *stat);
+/*
+ * Inode chunk initialisation routine
+ */
+int xfs_ialloc_inode_init(struct xfs_mount *mp, struct xfs_trans *tp,
+ struct list_head *buffer_list,
+ xfs_agnumber_t agno, xfs_agblock_t agbno,
+ xfs_agblock_t length, unsigned int gen);
+
extern const struct xfs_buf_ops xfs_agi_buf_ops;
#endif /* __XFS_IALLOC_H__ */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 45a85ff..a8428b6 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -45,6 +45,7 @@
#include "xfs_cksum.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
+#include "xfs_icreate_item.h"
/* Need all the magic numbers and buffer ops structures from these headers */
#include "xfs_symlink.h"
@@ -1617,7 +1618,10 @@ xlog_recover_add_to_trans(
* form the cancelled buffer table. Hence they have tobe done last.
*
* 3. Inode allocation buffers must be replayed before inode items that
- * read the buffer and replay changes into it.
+ * read the buffer and replay changes into it. For filesystems using the
+ * ICREATE transactions, this means XFS_LI_ICREATE objects need to get
+ * treated the same as inode allocation buffers as they create and
+ * initialise the buffers directly.
*
* 4. Inode unlink buffers must be replayed after inode items are replayed.
* This ensures that inodes are completely flushed to the inode buffer
@@ -1632,10 +1636,17 @@ xlog_recover_add_to_trans(
* from all the other buffers and move them to last.
*
* Hence, 4 lists, in order from head to tail:
- * - buffer_list for all buffers except cancelled/inode unlink buffers
- * - item_list for all non-buffer items
- * - inode_buffer_list for inode unlink buffers
- * - cancel_list for the cancelled buffers
+ * - buffer_list for all buffers except cancelled/inode unlink buffers
+ * - item_list for all non-buffer items
+ * - inode_buffer_list for inode unlink buffers
+ * - cancel_list for the cancelled buffers
+ *
+ * Note that we add objects to the tail of the lists so that first-to-last
+ * ordering is preserved within the lists. Adding objects to the head of the
+ * list means when we traverse from the head we walk them in last-to-first
+ * order. For cancelled buffers and inode unlink buffers this doesn't matter,
+ * but for all other items there may be specific ordering that we need to
+ * preserve.
*/
STATIC int
xlog_recover_reorder_trans(
@@ -1655,6 +1666,9 @@ xlog_recover_reorder_trans(
xfs_buf_log_format_t *buf_f = item->ri_buf[0].i_addr;
switch (ITEM_TYPE(item)) {
+ case XFS_LI_ICREATE:
+ list_move_tail(&item->ri_list, &buffer_list);
+ break;
case XFS_LI_BUF:
if (buf_f->blf_flags & XFS_BLF_CANCEL) {
trace_xfs_log_recover_item_reorder_head(log,
@@ -2967,6 +2981,93 @@ xlog_recover_efd_pass2(
}
/*
+ * This routine is called when an inode create format structure is found in a
+ * committed transaction in the log. It's purpose is to initialise the inodes
+ * being allocated on disk. This requires us to get inode cluster buffers that
+ * match the range to be intialised, stamped with inode templates and written
+ * by delayed write so that subsequent modifications will hit the cached buffer
+ * and only need writing out at the end of recovery.
+ */
+STATIC int
+xlog_recover_do_icreate_pass2(
+ struct xlog *log,
+ struct list_head *buffer_list,
+ xlog_recover_item_t *item)
+{
+ struct xfs_mount *mp = log->l_mp;
+ struct xfs_icreate_log *icl;
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ unsigned int count;
+ unsigned int isize;
+ xfs_agblock_t length;
+
+ icl = (struct xfs_icreate_log *)item->ri_buf[0].i_addr;
+ if (icl->icl_type != XFS_LI_ICREATE) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad type");
+ return EINVAL;
+ }
+
+ if (icl->icl_size != 1) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad icl size");
+ return EINVAL;
+ }
+
+ agno = be32_to_cpu(icl->icl_ag);
+ if (agno >= mp->m_sb.sb_agcount) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad agno");
+ return EINVAL;
+ }
+ agbno = be32_to_cpu(icl->icl_agbno);
+ if (!agbno || agbno == NULLAGBLOCK || agbno >= mp->m_sb.sb_agblocks) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad agbno");
+ return EINVAL;
+ }
+ isize = be32_to_cpu(icl->icl_isize);
+ if (isize != mp->m_sb.sb_inodesize) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad isize");
+ return EINVAL;
+ }
+ count = be32_to_cpu(icl->icl_count);
+ if (!count) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad count");
+ return EINVAL;
+ }
+ length = be32_to_cpu(icl->icl_length);
+ if (!length || length >= mp->m_sb.sb_agblocks) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad length");
+ return EINVAL;
+ }
+
+ /* existing allocation is fixed value */
+ ASSERT(count == XFS_IALLOC_INODES(mp));
+ ASSERT(length == XFS_IALLOC_BLOCKS(mp));
+ if (count != XFS_IALLOC_INODES(mp) ||
+ length != XFS_IALLOC_BLOCKS(mp)) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad count 2");
+ return EINVAL;
+ }
+
+ /*
+ * Inode buffers can be freed. Do not replay the inode initialisation as
+ * we could be overwriting something written after this inode buffer was
+ * cancelled.
+ *
+ * XXX: we need to iterate all buffers and only init those that are not
+ * cancelled. I think that a more fine grained factoring of
+ * xfs_ialloc_inode_init may be appropriate here to enable this to be
+ * done easily.
+ */
+ if (xlog_check_buffer_cancelled(log,
+ XFS_AGB_TO_DADDR(mp, agno, agbno), length, 0))
+ return 0;
+
+ xfs_ialloc_inode_init(mp, NULL, buffer_list, agno, agbno, length,
+ be32_to_cpu(icl->icl_gen));
+ return 0;
+}
+
+/*
* Free up any resources allocated by the transaction
*
* Remember that EFIs, EFDs, and IUNLINKs are handled later.
@@ -3008,6 +3109,7 @@ xlog_recover_commit_pass1(
case XFS_LI_EFI:
case XFS_LI_EFD:
case XFS_LI_DQUOT:
+ case XFS_LI_ICREATE:
/* nothing to do in pass 1 */
return 0;
default:
@@ -3038,6 +3140,8 @@ xlog_recover_commit_pass2(
return xlog_recover_efd_pass2(log, item);
case XFS_LI_DQUOT:
return xlog_recover_dquot_pass2(log, buffer_list, item);
+ case XFS_LI_ICREATE:
+ return xlog_recover_do_icreate_pass2(log, buffer_list, item);
case XFS_LI_QUOTAOFF:
/* nothing to do in pass2 */
return 0;
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 12/17] xfs: Use inode create transaction
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (10 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 11/17] xfs: Inode create item recovery Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 13/17] xfs: remove local fork format handling from xfs_bmapi_write() Dave Chinner
` (4 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
Replace the use of buffer based logging of inode initialisation,
uses the new logical form to describe the range to be initialised
in recovery. We continue to "log" the inode buffers to push them
into the AIL and ensure that the inode create transaction is not
removed from the log before the inode buffers are written to disk.
Update the transaction identifier and reservations to match the
changed implementation.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/xfs_buf_item.c | 12 ++++++++++--
fs/xfs/xfs_ialloc.c | 32 +++++++++++++++++++++++---------
2 files changed, 33 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 61f6876..bfc4e0c 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -310,13 +310,21 @@ xfs_buf_item_format(
/*
* If it is an inode buffer, transfer the in-memory state to the
- * format flags and clear the in-memory state. We do not transfer
+ * format flags and clear the in-memory state.
+ *
+ * For buffer based inode allocation, we do not transfer
* this state if the inode buffer allocation has not yet been committed
* to the log as setting the XFS_BLI_INODE_BUF flag will prevent
* correct replay of the inode allocation.
+ *
+ * For icreate item based inode allocation, the buffers aren't written
+ * to the journal during allocation, and hence we should always tag the
+ * buffer as an inode buffer so that the correct unlinked list replay
+ * occurs during recovery.
*/
if (bip->bli_flags & XFS_BLI_INODE_BUF) {
- if (!((bip->bli_flags & XFS_BLI_INODE_ALLOC_BUF) &&
+ if (xfs_sb_version_hascrc(&lip->li_mountp->m_sb) ||
+ !((bip->bli_flags & XFS_BLI_INODE_ALLOC_BUF) &&
xfs_log_item_in_current_chkpt(lip)))
bip->__bli_format.blf_flags |= XFS_BLF_INODE_BUF;
bip->bli_flags &= ~XFS_BLI_INODE_BUF;
diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index d5ef81a..319d9e4 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -38,6 +38,7 @@
#include "xfs_bmap.h"
#include "xfs_cksum.h"
#include "xfs_buf_item.h"
+#include "xfs_icreate_item.h"
/*
@@ -155,7 +156,7 @@ xfs_check_agi_freecount(
* than logging them (which in a transaction context puts them into the AIL
* for writeback rather than the xfsbufd queue).
*/
-STATIC int
+int
xfs_ialloc_inode_init(
struct xfs_mount *mp,
struct xfs_trans *tp,
@@ -212,6 +213,18 @@ xfs_ialloc_inode_init(
version = 3;
ino = XFS_AGINO_TO_INO(mp, agno,
XFS_OFFBNO_TO_AGINO(mp, agbno, 0));
+
+ /*
+ * log the initialisation that is about to take place as an
+ * logical operation. This means the transaction does not
+ * need to log the physical changes to the inode buffers as log
+ * recovery will know what initialisation is actually needed.
+ * Hence we only need to log the buffers as "ordered" buffers so
+ * they track in the AIL as if they were physically logged.
+ */
+ if (tp)
+ xfs_icreate_log(tp, agno, agbno, XFS_IALLOC_INODES(mp),
+ mp->m_sb.sb_inodesize, length, gen);
} else if (xfs_sb_version_hasnlink(&mp->m_sb))
version = 2;
else
@@ -227,13 +240,8 @@ xfs_ialloc_inode_init(
XBF_UNMAPPED);
if (!fbuf)
return ENOMEM;
- /*
- * Initialize all inodes in this buffer and then log them.
- *
- * XXX: It would be much better if we had just one transaction
- * to log a whole cluster of inodes instead of all the
- * individual transactions causing a lot of log traffic.
- */
+
+ /* Initialize the inode buffers and log them appropriately. */
fbuf->b_ops = &xfs_inode_buf_ops;
xfs_buf_zero(fbuf, 0, BBTOB(fbuf->b_length));
for (i = 0; i < ninodes; i++) {
@@ -269,7 +277,13 @@ xfs_ialloc_inode_init(
*/
xfs_trans_inode_alloc_buf(tp, fbuf);
if (version == 3) {
- /* need to log the entire buffer */
+ /*
+ * Mark the buffer as ordered so that they are
+ * not physically logged in the transaction but
+ * still tracked in the AIL as part of the
+ * transaction and pin the log appropriately.
+ */
+ xfs_trans_ordered_buf(tp, fbuf);
xfs_trans_log_buf(tp, fbuf, 0,
BBTOB(fbuf->b_length) - 1);
}
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 13/17] xfs: remove local fork format handling from xfs_bmapi_write()
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (11 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 12/17] xfs: Use inode create transaction Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 14/17] xfs: move getdents code into it's own file Dave Chinner
` (3 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The conversion from local format to extent format requires
interpretation of the data in the fork being converted, so it cannot
be done in a generic way. It is up to the caller to convert the fork
format to extent format before calling into xfs_bmapi_write() so
format conversion can be done correctly.
The code in xfs_bmapi_write() to convert the format is used
implicitly by the attribute and directory code, but they
specifically zero the fork size so that the conversion does not do
any allocation or manipulation. Move this conversion into the
shortform to leaf functions for the dir/attr code so the conversions
are explicitly controlled by all callers.
Now we can remove the conversion code in xfs_bmapi_write.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_attr_leaf.c | 2 +
fs/xfs/xfs_bmap.c | 199 ++++++++++++++++++++---------------------------
fs/xfs/xfs_bmap.h | 1 +
fs/xfs/xfs_dir2_block.c | 20 +++--
4 files changed, 98 insertions(+), 124 deletions(-)
diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index 31d3cd1..b800fbc 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -690,6 +690,8 @@ xfs_attr_shortform_to_leaf(xfs_da_args_t *args)
sf = (xfs_attr_shortform_t *)tmpbuffer;
xfs_idata_realloc(dp, -size, XFS_ATTR_FORK);
+ xfs_bmap_local_to_extents_empty(dp, XFS_ATTR_FORK);
+
bp = NULL;
error = xfs_da_grow_inode(args, &blkno);
if (error) {
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 8904284..05c698c 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -1161,6 +1161,24 @@ xfs_bmap_extents_to_btree(
* since the file data needs to get logged so things will stay consistent.
* (The bmap-level manipulations are ok, though).
*/
+void
+xfs_bmap_local_to_extents_empty(
+ struct xfs_inode *ip,
+ int whichfork)
+{
+ struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
+
+ ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
+ ASSERT(ifp->if_bytes == 0);
+ ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) == 0);
+
+ xfs_bmap_forkoff_reset(ip->i_mount, ip, whichfork);
+ ifp->if_flags &= ~XFS_IFINLINE;
+ ifp->if_flags |= XFS_IFEXTENTS;
+ XFS_IFORK_FMT_SET(ip, whichfork, XFS_DINODE_FMT_EXTENTS);
+}
+
+
STATIC int /* error */
xfs_bmap_local_to_extents(
xfs_trans_t *tp, /* transaction pointer */
@@ -1174,9 +1192,12 @@ xfs_bmap_local_to_extents(
struct xfs_inode *ip,
struct xfs_ifork *ifp))
{
- int error; /* error return value */
+ int error = 0;
int flags; /* logging flags returned */
xfs_ifork_t *ifp; /* inode fork pointer */
+ xfs_alloc_arg_t args; /* allocation arguments */
+ xfs_buf_t *bp; /* buffer for extent block */
+ xfs_bmbt_rec_host_t *ep; /* extent record pointer */
/*
* We don't want to deal with the case of keeping inode data inline yet.
@@ -1185,68 +1206,65 @@ xfs_bmap_local_to_extents(
ASSERT(!(S_ISREG(ip->i_d.di_mode) && whichfork == XFS_DATA_FORK));
ifp = XFS_IFORK_PTR(ip, whichfork);
ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
+
+ if (!ifp->if_bytes) {
+ xfs_bmap_local_to_extents_empty(ip, whichfork);
+ flags = XFS_ILOG_CORE;
+ goto done;
+ }
+
flags = 0;
error = 0;
- if (ifp->if_bytes) {
- xfs_alloc_arg_t args; /* allocation arguments */
- xfs_buf_t *bp; /* buffer for extent block */
- xfs_bmbt_rec_host_t *ep;/* extent record pointer */
-
- ASSERT((ifp->if_flags &
- (XFS_IFINLINE|XFS_IFEXTENTS|XFS_IFEXTIREC)) == XFS_IFINLINE);
- memset(&args, 0, sizeof(args));
- args.tp = tp;
- args.mp = ip->i_mount;
- args.firstblock = *firstblock;
- /*
- * Allocate a block. We know we need only one, since the
- * file currently fits in an inode.
- */
- if (*firstblock == NULLFSBLOCK) {
- args.fsbno = XFS_INO_TO_FSB(args.mp, ip->i_ino);
- args.type = XFS_ALLOCTYPE_START_BNO;
- } else {
- args.fsbno = *firstblock;
- args.type = XFS_ALLOCTYPE_NEAR_BNO;
- }
- args.total = total;
- args.minlen = args.maxlen = args.prod = 1;
- error = xfs_alloc_vextent(&args);
- if (error)
- goto done;
-
- /* Can't fail, the space was reserved. */
- ASSERT(args.fsbno != NULLFSBLOCK);
- ASSERT(args.len == 1);
- *firstblock = args.fsbno;
- bp = xfs_btree_get_bufl(args.mp, tp, args.fsbno, 0);
-
- /* initialise the block and copy the data */
- init_fn(tp, bp, ip, ifp);
-
- /* account for the change in fork size and log everything */
- xfs_trans_log_buf(tp, bp, 0, ifp->if_bytes - 1);
- xfs_bmap_forkoff_reset(args.mp, ip, whichfork);
- xfs_idata_realloc(ip, -ifp->if_bytes, whichfork);
- xfs_iext_add(ifp, 0, 1);
- ep = xfs_iext_get_ext(ifp, 0);
- xfs_bmbt_set_allf(ep, 0, args.fsbno, 1, XFS_EXT_NORM);
- trace_xfs_bmap_post_update(ip, 0,
- whichfork == XFS_ATTR_FORK ? BMAP_ATTRFORK : 0,
- _THIS_IP_);
- XFS_IFORK_NEXT_SET(ip, whichfork, 1);
- ip->i_d.di_nblocks = 1;
- xfs_trans_mod_dquot_byino(tp, ip,
- XFS_TRANS_DQ_BCOUNT, 1L);
- flags |= xfs_ilog_fext(whichfork);
+ ASSERT((ifp->if_flags & (XFS_IFINLINE|XFS_IFEXTENTS|XFS_IFEXTIREC)) ==
+ XFS_IFINLINE);
+ memset(&args, 0, sizeof(args));
+ args.tp = tp;
+ args.mp = ip->i_mount;
+ args.firstblock = *firstblock;
+ /*
+ * Allocate a block. We know we need only one, since the
+ * file currently fits in an inode.
+ */
+ if (*firstblock == NULLFSBLOCK) {
+ args.fsbno = XFS_INO_TO_FSB(args.mp, ip->i_ino);
+ args.type = XFS_ALLOCTYPE_START_BNO;
} else {
- ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) == 0);
- xfs_bmap_forkoff_reset(ip->i_mount, ip, whichfork);
+ args.fsbno = *firstblock;
+ args.type = XFS_ALLOCTYPE_NEAR_BNO;
}
- ifp->if_flags &= ~XFS_IFINLINE;
- ifp->if_flags |= XFS_IFEXTENTS;
- XFS_IFORK_FMT_SET(ip, whichfork, XFS_DINODE_FMT_EXTENTS);
+ args.total = total;
+ args.minlen = args.maxlen = args.prod = 1;
+ error = xfs_alloc_vextent(&args);
+ if (error)
+ goto done;
+
+ /* Can't fail, the space was reserved. */
+ ASSERT(args.fsbno != NULLFSBLOCK);
+ ASSERT(args.len == 1);
+ *firstblock = args.fsbno;
+ bp = xfs_btree_get_bufl(args.mp, tp, args.fsbno, 0);
+
+ /* initialise the block and copy the data */
+ init_fn(tp, bp, ip, ifp);
+
+ /* account for the change in fork size and log everything */
+ xfs_trans_log_buf(tp, bp, 0, ifp->if_bytes - 1);
+ xfs_idata_realloc(ip, -ifp->if_bytes, whichfork);
+ xfs_bmap_local_to_extents_empty(ip, whichfork);
flags |= XFS_ILOG_CORE;
+
+ xfs_iext_add(ifp, 0, 1);
+ ep = xfs_iext_get_ext(ifp, 0);
+ xfs_bmbt_set_allf(ep, 0, args.fsbno, 1, XFS_EXT_NORM);
+ trace_xfs_bmap_post_update(ip, 0,
+ whichfork == XFS_ATTR_FORK ? BMAP_ATTRFORK : 0,
+ _THIS_IP_);
+ XFS_IFORK_NEXT_SET(ip, whichfork, 1);
+ ip->i_d.di_nblocks = 1;
+ xfs_trans_mod_dquot_byino(tp, ip,
+ XFS_TRANS_DQ_BCOUNT, 1L);
+ flags |= xfs_ilog_fext(whichfork);
+
done:
*logflagsp = flags;
return error;
@@ -1323,25 +1341,6 @@ xfs_bmap_add_attrfork_extents(
}
/*
- * Block initialisation function for local to extent format conversion.
- *
- * This shouldn't actually be called by anyone, so make sure debug kernels cause
- * a noticable failure.
- */
-STATIC void
-xfs_bmap_local_to_extents_init_fn(
- struct xfs_trans *tp,
- struct xfs_buf *bp,
- struct xfs_inode *ip,
- struct xfs_ifork *ifp)
-{
- ASSERT(0);
- bp->b_ops = &xfs_bmbt_buf_ops;
- memcpy(bp->b_addr, ifp->if_u1.if_data, ifp->if_bytes);
- xfs_trans_buf_set_type(tp, bp, XFS_BLFT_BTREE_BUF);
-}
-
-/*
* Called from xfs_bmap_add_attrfork to handle local format files. Each
* different data fork content type needs a different callout to do the
* conversion. Some are basic and only require special block initialisation
@@ -1381,9 +1380,9 @@ xfs_bmap_add_attrfork_local(
flags, XFS_DATA_FORK,
xfs_symlink_local_to_remote);
- return xfs_bmap_local_to_extents(tp, ip, firstblock, 1, flags,
- XFS_DATA_FORK,
- xfs_bmap_local_to_extents_init_fn);
+ /* should only be called for types that support local format data */
+ ASSERT(0);
+ return EFSCORRUPTED;
}
/*
@@ -4907,20 +4906,19 @@ xfs_bmapi_write(
orig_mval = mval;
orig_nmap = *nmap;
#endif
+ whichfork = (flags & XFS_BMAPI_ATTRFORK) ?
+ XFS_ATTR_FORK : XFS_DATA_FORK;
ASSERT(*nmap >= 1);
ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
ASSERT(!(flags & XFS_BMAPI_IGSTATE));
ASSERT(tp != NULL);
ASSERT(len > 0);
-
- whichfork = (flags & XFS_BMAPI_ATTRFORK) ?
- XFS_ATTR_FORK : XFS_DATA_FORK;
+ ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
if (unlikely(XFS_TEST_ERROR(
(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE &&
- XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL),
+ XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
mp, XFS_ERRTAG_BMAPIFORMAT, XFS_RANDOM_BMAPIFORMAT))) {
XFS_ERROR_REPORT("xfs_bmapi_write", XFS_ERRLEVEL_LOW, mp);
return XFS_ERROR(EFSCORRUPTED);
@@ -4933,37 +4931,6 @@ xfs_bmapi_write(
XFS_STATS_INC(xs_blk_mapw);
- if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
- /*
- * XXX (dgc): This assumes we are only called for inodes that
- * contain content neutral data in local format. Anything that
- * contains caller-specific data in local format that needs
- * transformation to move to a block format needs to do the
- * conversion to extent format itself.
- *
- * Directory data forks and attribute forks handle this
- * themselves, but with the addition of metadata verifiers every
- * data fork in local format now contains caller specific data
- * and as such conversion through this function is likely to be
- * broken.
- *
- * The only likely user of this branch is for remote symlinks,
- * but we cannot overwrite the data fork contents of the symlink
- * (EEXIST occurs higher up the stack) and so it will never go
- * from local format to extent format here. Hence I don't think
- * this branch is ever executed intentionally and we should
- * consider removing it and asserting that xfs_bmapi_write()
- * cannot be called directly on local format forks. i.e. callers
- * are completely responsible for local to extent format
- * conversion, not xfs_bmapi_write().
- */
- error = xfs_bmap_local_to_extents(tp, ip, firstblock, total,
- &bma.logflags, whichfork,
- xfs_bmap_local_to_extents_init_fn);
- if (error)
- goto error0;
- }
-
if (*firstblock == NULLFSBLOCK) {
if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE)
bma.minleft = be16_to_cpu(ifp->if_broot->bb_level) + 1;
diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
index 5f469c3..1cf1292 100644
--- a/fs/xfs/xfs_bmap.h
+++ b/fs/xfs/xfs_bmap.h
@@ -172,6 +172,7 @@ void xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
#endif
int xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
+void xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
void xfs_bmap_add_free(xfs_fsblock_t bno, xfs_filblks_t len,
struct xfs_bmap_free *flist, struct xfs_mount *mp);
void xfs_bmap_cancel(struct xfs_bmap_free *flist);
diff --git a/fs/xfs/xfs_dir2_block.c b/fs/xfs/xfs_dir2_block.c
index e59f5fc..53b9aa2 100644
--- a/fs/xfs/xfs_dir2_block.c
+++ b/fs/xfs/xfs_dir2_block.c
@@ -29,6 +29,7 @@
#include "xfs_dinode.h"
#include "xfs_inode.h"
#include "xfs_inode_item.h"
+#include "xfs_bmap.h"
#include "xfs_buf_item.h"
#include "xfs_dir2.h"
#include "xfs_dir2_format.h"
@@ -1167,13 +1168,15 @@ xfs_dir2_sf_to_block(
__be16 *tagp; /* end of data entry */
xfs_trans_t *tp; /* transaction pointer */
struct xfs_name name;
+ struct xfs_ifork *ifp;
trace_xfs_dir2_sf_to_block(args);
dp = args->dp;
tp = args->trans;
mp = dp->i_mount;
- ASSERT(dp->i_df.if_flags & XFS_IFINLINE);
+ ifp = XFS_IFORK_PTR(dp, XFS_DATA_FORK);
+ ASSERT(ifp->if_flags & XFS_IFINLINE);
/*
* Bomb out if the shortform directory is way too short.
*/
@@ -1182,22 +1185,23 @@ xfs_dir2_sf_to_block(
return XFS_ERROR(EIO);
}
- oldsfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+ oldsfp = (xfs_dir2_sf_hdr_t *)ifp->if_u1.if_data;
- ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
- ASSERT(dp->i_df.if_u1.if_data != NULL);
+ ASSERT(ifp->if_bytes == dp->i_d.di_size);
+ ASSERT(ifp->if_u1.if_data != NULL);
ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(oldsfp->i8count));
+ ASSERT(dp->i_d.di_nextents == 0);
/*
* Copy the directory into a temporary buffer.
* Then pitch the incore inode data so we can make extents.
*/
- sfp = kmem_alloc(dp->i_df.if_bytes, KM_SLEEP);
- memcpy(sfp, oldsfp, dp->i_df.if_bytes);
+ sfp = kmem_alloc(ifp->if_bytes, KM_SLEEP);
+ memcpy(sfp, oldsfp, ifp->if_bytes);
- xfs_idata_realloc(dp, -dp->i_df.if_bytes, XFS_DATA_FORK);
+ xfs_idata_realloc(dp, -ifp->if_bytes, XFS_DATA_FORK);
+ xfs_bmap_local_to_extents_empty(dp, XFS_DATA_FORK);
dp->i_d.di_size = 0;
- xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
/*
* Add block 0 to the inode.
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 14/17] xfs: move getdents code into it's own file
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (12 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 13/17] xfs: remove local fork format handling from xfs_bmapi_write() Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 15/17] xfs: reshuffle dir2 definitions around for userspace Dave Chinner
` (2 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The directory readdir code isno used by userspace, but it is
intermingled with files that are shared with userspace. This makes
it difficult to compare the differences between the userspac eand
kernel files are the userspace files don't have the getdents code in
them. Move all the kernel getdents code to a separate file to bring
the shared content between userspace and kernel files closer
together.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_dir2.c | 34 ---
fs/xfs/xfs_dir2_block.c | 100 +------
fs/xfs/xfs_dir2_leaf.c | 391 ---------------------------
fs/xfs/xfs_dir2_priv.h | 2 +
fs/xfs/xfs_dir2_readdir.c | 661 +++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_dir2_sf.c | 99 -------
7 files changed, 665 insertions(+), 623 deletions(-)
create mode 100644 fs/xfs/xfs_dir2_readdir.c
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 4a45080..f396fe9 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -30,6 +30,7 @@ xfs-y += xfs_aops.o \
xfs_bit.o \
xfs_buf.o \
xfs_dfrag.o \
+ xfs_dir2_readdir.o \
xfs_discard.o \
xfs_error.o \
xfs_export.o \
diff --git a/fs/xfs/xfs_dir2.c b/fs/xfs/xfs_dir2.c
index b26a50f..431be44 100644
--- a/fs/xfs/xfs_dir2.c
+++ b/fs/xfs/xfs_dir2.c
@@ -363,40 +363,6 @@ xfs_dir_removename(
}
/*
- * Read a directory.
- */
-int
-xfs_readdir(
- xfs_inode_t *dp,
- void *dirent,
- size_t bufsize,
- xfs_off_t *offset,
- filldir_t filldir)
-{
- int rval; /* return value */
- int v; /* type-checking value */
-
- trace_xfs_readdir(dp);
-
- if (XFS_FORCED_SHUTDOWN(dp->i_mount))
- return XFS_ERROR(EIO);
-
- ASSERT(S_ISDIR(dp->i_d.di_mode));
- XFS_STATS_INC(xs_dir_getdents);
-
- if (dp->i_d.di_format == XFS_DINODE_FMT_LOCAL)
- rval = xfs_dir2_sf_getdents(dp, dirent, offset, filldir);
- else if ((rval = xfs_dir2_isblock(NULL, dp, &v)))
- ;
- else if (v)
- rval = xfs_dir2_block_getdents(dp, dirent, offset, filldir);
- else
- rval = xfs_dir2_leaf_getdents(dp, dirent, bufsize, offset,
- filldir);
- return rval;
-}
-
-/*
* Replace the inode number of a directory entry.
*/
int
diff --git a/fs/xfs/xfs_dir2_block.c b/fs/xfs/xfs_dir2_block.c
index 53b9aa2..5e84000 100644
--- a/fs/xfs/xfs_dir2_block.c
+++ b/fs/xfs/xfs_dir2_block.c
@@ -126,7 +126,7 @@ const struct xfs_buf_ops xfs_dir3_block_buf_ops = {
.verify_write = xfs_dir3_block_write_verify,
};
-static int
+int
xfs_dir3_block_read(
struct xfs_trans *tp,
struct xfs_inode *dp,
@@ -565,104 +565,6 @@ xfs_dir2_block_addname(
}
/*
- * Readdir for block directories.
- */
-int /* error */
-xfs_dir2_block_getdents(
- xfs_inode_t *dp, /* incore inode */
- void *dirent,
- xfs_off_t *offset,
- filldir_t filldir)
-{
- xfs_dir2_data_hdr_t *hdr; /* block header */
- struct xfs_buf *bp; /* buffer for block */
- xfs_dir2_block_tail_t *btp; /* block tail */
- xfs_dir2_data_entry_t *dep; /* block data entry */
- xfs_dir2_data_unused_t *dup; /* block unused entry */
- char *endptr; /* end of the data entries */
- int error; /* error return value */
- xfs_mount_t *mp; /* filesystem mount point */
- char *ptr; /* current data entry */
- int wantoff; /* starting block offset */
- xfs_off_t cook;
-
- mp = dp->i_mount;
- /*
- * If the block number in the offset is out of range, we're done.
- */
- if (xfs_dir2_dataptr_to_db(mp, *offset) > mp->m_dirdatablk)
- return 0;
-
- error = xfs_dir3_block_read(NULL, dp, &bp);
- if (error)
- return error;
-
- /*
- * Extract the byte offset we start at from the seek pointer.
- * We'll skip entries before this.
- */
- wantoff = xfs_dir2_dataptr_to_off(mp, *offset);
- hdr = bp->b_addr;
- xfs_dir3_data_check(dp, bp);
- /*
- * Set up values for the loop.
- */
- btp = xfs_dir2_block_tail_p(mp, hdr);
- ptr = (char *)xfs_dir3_data_entry_p(hdr);
- endptr = (char *)xfs_dir2_block_leaf_p(btp);
-
- /*
- * Loop over the data portion of the block.
- * Each object is a real entry (dep) or an unused one (dup).
- */
- while (ptr < endptr) {
- dup = (xfs_dir2_data_unused_t *)ptr;
- /*
- * Unused, skip it.
- */
- if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
- ptr += be16_to_cpu(dup->length);
- continue;
- }
-
- dep = (xfs_dir2_data_entry_t *)ptr;
-
- /*
- * Bump pointer for the next iteration.
- */
- ptr += xfs_dir2_data_entsize(dep->namelen);
- /*
- * The entry is before the desired starting point, skip it.
- */
- if ((char *)dep - (char *)hdr < wantoff)
- continue;
-
- cook = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
- (char *)dep - (char *)hdr);
-
- /*
- * If it didn't fit, set the final offset to here & return.
- */
- if (filldir(dirent, (char *)dep->name, dep->namelen,
- cook & 0x7fffffff, be64_to_cpu(dep->inumber),
- DT_UNKNOWN)) {
- *offset = cook & 0x7fffffff;
- xfs_trans_brelse(NULL, bp);
- return 0;
- }
- }
-
- /*
- * Reached the end of the block.
- * Set the offset to a non-existent block 1 and return.
- */
- *offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk + 1, 0) &
- 0x7fffffff;
- xfs_trans_brelse(NULL, bp);
- return 0;
-}
-
-/*
* Log leaf entries from the block.
*/
static void
diff --git a/fs/xfs/xfs_dir2_leaf.c b/fs/xfs/xfs_dir2_leaf.c
index bb22aac..e1aed21 100644
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -1083,397 +1083,6 @@ xfs_dir3_leaf_compact_x1(
*highstalep = highstale;
}
-struct xfs_dir2_leaf_map_info {
- xfs_extlen_t map_blocks; /* number of fsbs in map */
- xfs_dablk_t map_off; /* last mapped file offset */
- int map_size; /* total entries in *map */
- int map_valid; /* valid entries in *map */
- int nmap; /* mappings to ask xfs_bmapi */
- xfs_dir2_db_t curdb; /* db for current block */
- int ra_current; /* number of read-ahead blks */
- int ra_index; /* *map index for read-ahead */
- int ra_offset; /* map entry offset for ra */
- int ra_want; /* readahead count wanted */
- struct xfs_bmbt_irec map[]; /* map vector for blocks */
-};
-
-STATIC int
-xfs_dir2_leaf_readbuf(
- struct xfs_inode *dp,
- size_t bufsize,
- struct xfs_dir2_leaf_map_info *mip,
- xfs_dir2_off_t *curoff,
- struct xfs_buf **bpp)
-{
- struct xfs_mount *mp = dp->i_mount;
- struct xfs_buf *bp = *bpp;
- struct xfs_bmbt_irec *map = mip->map;
- struct blk_plug plug;
- int error = 0;
- int length;
- int i;
- int j;
-
- /*
- * If we have a buffer, we need to release it and
- * take it out of the mapping.
- */
-
- if (bp) {
- xfs_trans_brelse(NULL, bp);
- bp = NULL;
- mip->map_blocks -= mp->m_dirblkfsbs;
- /*
- * Loop to get rid of the extents for the
- * directory block.
- */
- for (i = mp->m_dirblkfsbs; i > 0; ) {
- j = min_t(int, map->br_blockcount, i);
- map->br_blockcount -= j;
- map->br_startblock += j;
- map->br_startoff += j;
- /*
- * If mapping is done, pitch it from
- * the table.
- */
- if (!map->br_blockcount && --mip->map_valid)
- memmove(&map[0], &map[1],
- sizeof(map[0]) * mip->map_valid);
- i -= j;
- }
- }
-
- /*
- * Recalculate the readahead blocks wanted.
- */
- mip->ra_want = howmany(bufsize + mp->m_dirblksize,
- mp->m_sb.sb_blocksize) - 1;
- ASSERT(mip->ra_want >= 0);
-
- /*
- * If we don't have as many as we want, and we haven't
- * run out of data blocks, get some more mappings.
- */
- if (1 + mip->ra_want > mip->map_blocks &&
- mip->map_off < xfs_dir2_byte_to_da(mp, XFS_DIR2_LEAF_OFFSET)) {
- /*
- * Get more bmaps, fill in after the ones
- * we already have in the table.
- */
- mip->nmap = mip->map_size - mip->map_valid;
- error = xfs_bmapi_read(dp, mip->map_off,
- xfs_dir2_byte_to_da(mp, XFS_DIR2_LEAF_OFFSET) -
- mip->map_off,
- &map[mip->map_valid], &mip->nmap, 0);
-
- /*
- * Don't know if we should ignore this or try to return an
- * error. The trouble with returning errors is that readdir
- * will just stop without actually passing the error through.
- */
- if (error)
- goto out; /* XXX */
-
- /*
- * If we got all the mappings we asked for, set the final map
- * offset based on the last bmap value received. Otherwise,
- * we've reached the end.
- */
- if (mip->nmap == mip->map_size - mip->map_valid) {
- i = mip->map_valid + mip->nmap - 1;
- mip->map_off = map[i].br_startoff + map[i].br_blockcount;
- } else
- mip->map_off = xfs_dir2_byte_to_da(mp,
- XFS_DIR2_LEAF_OFFSET);
-
- /*
- * Look for holes in the mapping, and eliminate them. Count up
- * the valid blocks.
- */
- for (i = mip->map_valid; i < mip->map_valid + mip->nmap; ) {
- if (map[i].br_startblock == HOLESTARTBLOCK) {
- mip->nmap--;
- length = mip->map_valid + mip->nmap - i;
- if (length)
- memmove(&map[i], &map[i + 1],
- sizeof(map[i]) * length);
- } else {
- mip->map_blocks += map[i].br_blockcount;
- i++;
- }
- }
- mip->map_valid += mip->nmap;
- }
-
- /*
- * No valid mappings, so no more data blocks.
- */
- if (!mip->map_valid) {
- *curoff = xfs_dir2_da_to_byte(mp, mip->map_off);
- goto out;
- }
-
- /*
- * Read the directory block starting at the first mapping.
- */
- mip->curdb = xfs_dir2_da_to_db(mp, map->br_startoff);
- error = xfs_dir3_data_read(NULL, dp, map->br_startoff,
- map->br_blockcount >= mp->m_dirblkfsbs ?
- XFS_FSB_TO_DADDR(mp, map->br_startblock) : -1, &bp);
-
- /*
- * Should just skip over the data block instead of giving up.
- */
- if (error)
- goto out; /* XXX */
-
- /*
- * Adjust the current amount of read-ahead: we just read a block that
- * was previously ra.
- */
- if (mip->ra_current)
- mip->ra_current -= mp->m_dirblkfsbs;
-
- /*
- * Do we need more readahead?
- */
- blk_start_plug(&plug);
- for (mip->ra_index = mip->ra_offset = i = 0;
- mip->ra_want > mip->ra_current && i < mip->map_blocks;
- i += mp->m_dirblkfsbs) {
- ASSERT(mip->ra_index < mip->map_valid);
- /*
- * Read-ahead a contiguous directory block.
- */
- if (i > mip->ra_current &&
- map[mip->ra_index].br_blockcount >= mp->m_dirblkfsbs) {
- xfs_dir3_data_readahead(NULL, dp,
- map[mip->ra_index].br_startoff + mip->ra_offset,
- XFS_FSB_TO_DADDR(mp,
- map[mip->ra_index].br_startblock +
- mip->ra_offset));
- mip->ra_current = i;
- }
-
- /*
- * Read-ahead a non-contiguous directory block. This doesn't
- * use our mapping, but this is a very rare case.
- */
- else if (i > mip->ra_current) {
- xfs_dir3_data_readahead(NULL, dp,
- map[mip->ra_index].br_startoff +
- mip->ra_offset, -1);
- mip->ra_current = i;
- }
-
- /*
- * Advance offset through the mapping table.
- */
- for (j = 0; j < mp->m_dirblkfsbs; j++) {
- /*
- * The rest of this extent but not more than a dir
- * block.
- */
- length = min_t(int, mp->m_dirblkfsbs,
- map[mip->ra_index].br_blockcount -
- mip->ra_offset);
- j += length;
- mip->ra_offset += length;
-
- /*
- * Advance to the next mapping if this one is used up.
- */
- if (mip->ra_offset == map[mip->ra_index].br_blockcount) {
- mip->ra_offset = 0;
- mip->ra_index++;
- }
- }
- }
- blk_finish_plug(&plug);
-
-out:
- *bpp = bp;
- return error;
-}
-
-/*
- * Getdents (readdir) for leaf and node directories.
- * This reads the data blocks only, so is the same for both forms.
- */
-int /* error */
-xfs_dir2_leaf_getdents(
- xfs_inode_t *dp, /* incore directory inode */
- void *dirent,
- size_t bufsize,
- xfs_off_t *offset,
- filldir_t filldir)
-{
- struct xfs_buf *bp = NULL; /* data block buffer */
- xfs_dir2_data_hdr_t *hdr; /* data block header */
- xfs_dir2_data_entry_t *dep; /* data entry */
- xfs_dir2_data_unused_t *dup; /* unused entry */
- int error = 0; /* error return value */
- int length; /* temporary length value */
- xfs_mount_t *mp; /* filesystem mount point */
- int byteoff; /* offset in current block */
- xfs_dir2_off_t curoff; /* current overall offset */
- xfs_dir2_off_t newoff; /* new curoff after new blk */
- char *ptr = NULL; /* pointer to current data */
- struct xfs_dir2_leaf_map_info *map_info;
-
- /*
- * If the offset is at or past the largest allowed value,
- * give up right away.
- */
- if (*offset >= XFS_DIR2_MAX_DATAPTR)
- return 0;
-
- mp = dp->i_mount;
-
- /*
- * Set up to bmap a number of blocks based on the caller's
- * buffer size, the directory block size, and the filesystem
- * block size.
- */
- length = howmany(bufsize + mp->m_dirblksize,
- mp->m_sb.sb_blocksize);
- map_info = kmem_zalloc(offsetof(struct xfs_dir2_leaf_map_info, map) +
- (length * sizeof(struct xfs_bmbt_irec)),
- KM_SLEEP | KM_NOFS);
- map_info->map_size = length;
-
- /*
- * Inside the loop we keep the main offset value as a byte offset
- * in the directory file.
- */
- curoff = xfs_dir2_dataptr_to_byte(mp, *offset);
-
- /*
- * Force this conversion through db so we truncate the offset
- * down to get the start of the data block.
- */
- map_info->map_off = xfs_dir2_db_to_da(mp,
- xfs_dir2_byte_to_db(mp, curoff));
-
- /*
- * Loop over directory entries until we reach the end offset.
- * Get more blocks and readahead as necessary.
- */
- while (curoff < XFS_DIR2_LEAF_OFFSET) {
- /*
- * If we have no buffer, or we're off the end of the
- * current buffer, need to get another one.
- */
- if (!bp || ptr >= (char *)bp->b_addr + mp->m_dirblksize) {
-
- error = xfs_dir2_leaf_readbuf(dp, bufsize, map_info,
- &curoff, &bp);
- if (error || !map_info->map_valid)
- break;
-
- /*
- * Having done a read, we need to set a new offset.
- */
- newoff = xfs_dir2_db_off_to_byte(mp, map_info->curdb, 0);
- /*
- * Start of the current block.
- */
- if (curoff < newoff)
- curoff = newoff;
- /*
- * Make sure we're in the right block.
- */
- else if (curoff > newoff)
- ASSERT(xfs_dir2_byte_to_db(mp, curoff) ==
- map_info->curdb);
- hdr = bp->b_addr;
- xfs_dir3_data_check(dp, bp);
- /*
- * Find our position in the block.
- */
- ptr = (char *)xfs_dir3_data_entry_p(hdr);
- byteoff = xfs_dir2_byte_to_off(mp, curoff);
- /*
- * Skip past the header.
- */
- if (byteoff == 0)
- curoff += xfs_dir3_data_entry_offset(hdr);
- /*
- * Skip past entries until we reach our offset.
- */
- else {
- while ((char *)ptr - (char *)hdr < byteoff) {
- dup = (xfs_dir2_data_unused_t *)ptr;
-
- if (be16_to_cpu(dup->freetag)
- == XFS_DIR2_DATA_FREE_TAG) {
-
- length = be16_to_cpu(dup->length);
- ptr += length;
- continue;
- }
- dep = (xfs_dir2_data_entry_t *)ptr;
- length =
- xfs_dir2_data_entsize(dep->namelen);
- ptr += length;
- }
- /*
- * Now set our real offset.
- */
- curoff =
- xfs_dir2_db_off_to_byte(mp,
- xfs_dir2_byte_to_db(mp, curoff),
- (char *)ptr - (char *)hdr);
- if (ptr >= (char *)hdr + mp->m_dirblksize) {
- continue;
- }
- }
- }
- /*
- * We have a pointer to an entry.
- * Is it a live one?
- */
- dup = (xfs_dir2_data_unused_t *)ptr;
- /*
- * No, it's unused, skip over it.
- */
- if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
- length = be16_to_cpu(dup->length);
- ptr += length;
- curoff += length;
- continue;
- }
-
- dep = (xfs_dir2_data_entry_t *)ptr;
- length = xfs_dir2_data_entsize(dep->namelen);
-
- if (filldir(dirent, (char *)dep->name, dep->namelen,
- xfs_dir2_byte_to_dataptr(mp, curoff) & 0x7fffffff,
- be64_to_cpu(dep->inumber), DT_UNKNOWN))
- break;
-
- /*
- * Advance to next entry in the block.
- */
- ptr += length;
- curoff += length;
- /* bufsize may have just been a guess; don't go negative */
- bufsize = bufsize > length ? bufsize - length : 0;
- }
-
- /*
- * All done. Set output offset value to current offset.
- */
- if (curoff > xfs_dir2_dataptr_to_byte(mp, XFS_DIR2_MAX_DATAPTR))
- *offset = XFS_DIR2_MAX_DATAPTR & 0x7fffffff;
- else
- *offset = xfs_dir2_byte_to_dataptr(mp, curoff) & 0x7fffffff;
- kmem_free(map_info);
- if (bp)
- xfs_trans_brelse(NULL, bp);
- return error;
-}
-
/*
* Log the bests entries indicated from a leaf1 block.
diff --git a/fs/xfs/xfs_dir2_priv.h b/fs/xfs/xfs_dir2_priv.h
index 7cf573c..1f949df 100644
--- a/fs/xfs/xfs_dir2_priv.h
+++ b/fs/xfs/xfs_dir2_priv.h
@@ -32,6 +32,8 @@ extern int xfs_dir_cilookup_result(struct xfs_da_args *args,
/* xfs_dir2_block.c */
extern const struct xfs_buf_ops xfs_dir3_block_buf_ops;
+extern int xfs_dir3_block_read(struct xfs_trans *tp, struct xfs_inode *dp,
+ struct xfs_buf **bpp);
extern int xfs_dir2_block_addname(struct xfs_da_args *args);
extern int xfs_dir2_block_getdents(struct xfs_inode *dp, void *dirent,
xfs_off_t *offset, filldir_t filldir);
diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
new file mode 100644
index 0000000..dcf17cf
--- /dev/null
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -0,0 +1,661 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * Copyright (c) 2013 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_bmap.h"
+
+STATIC int
+xfs_dir2_sf_getdents(
+ xfs_inode_t *dp, /* incore directory inode */
+ void *dirent,
+ xfs_off_t *offset,
+ filldir_t filldir)
+{
+ int i; /* shortform entry number */
+ xfs_mount_t *mp; /* filesystem mount point */
+ xfs_dir2_dataptr_t off; /* current entry's offset */
+ xfs_dir2_sf_entry_t *sfep; /* shortform directory entry */
+ xfs_dir2_sf_hdr_t *sfp; /* shortform structure */
+ xfs_dir2_dataptr_t dot_offset;
+ xfs_dir2_dataptr_t dotdot_offset;
+ xfs_ino_t ino;
+
+ mp = dp->i_mount;
+
+ ASSERT(dp->i_df.if_flags & XFS_IFINLINE);
+ /*
+ * Give up if the directory is way too short.
+ */
+ if (dp->i_d.di_size < offsetof(xfs_dir2_sf_hdr_t, parent)) {
+ ASSERT(XFS_FORCED_SHUTDOWN(mp));
+ return XFS_ERROR(EIO);
+ }
+
+ ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
+ ASSERT(dp->i_df.if_u1.if_data != NULL);
+
+ sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+
+ ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->i8count));
+
+ /*
+ * If the block number in the offset is out of range, we're done.
+ */
+ if (xfs_dir2_dataptr_to_db(mp, *offset) > mp->m_dirdatablk)
+ return 0;
+
+ /*
+ * Precalculate offsets for . and .. as we will always need them.
+ *
+ * XXX(hch): the second argument is sometimes 0 and sometimes
+ * mp->m_dirdatablk.
+ */
+ dot_offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
+ XFS_DIR3_DATA_DOT_OFFSET(mp));
+ dotdot_offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
+ XFS_DIR3_DATA_DOTDOT_OFFSET(mp));
+
+ /*
+ * Put . entry unless we're starting past it.
+ */
+ if (*offset <= dot_offset) {
+ if (filldir(dirent, ".", 1, dot_offset & 0x7fffffff, dp->i_ino, DT_DIR)) {
+ *offset = dot_offset & 0x7fffffff;
+ return 0;
+ }
+ }
+
+ /*
+ * Put .. entry unless we're starting past it.
+ */
+ if (*offset <= dotdot_offset) {
+ ino = xfs_dir2_sf_get_parent_ino(sfp);
+ if (filldir(dirent, "..", 2, dotdot_offset & 0x7fffffff, ino, DT_DIR)) {
+ *offset = dotdot_offset & 0x7fffffff;
+ return 0;
+ }
+ }
+
+ /*
+ * Loop while there are more entries and put'ing works.
+ */
+ sfep = xfs_dir2_sf_firstentry(sfp);
+ for (i = 0; i < sfp->count; i++) {
+ off = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
+ xfs_dir2_sf_get_offset(sfep));
+
+ if (*offset > off) {
+ sfep = xfs_dir2_sf_nextentry(sfp, sfep);
+ continue;
+ }
+
+ ino = xfs_dir2_sfe_get_ino(sfp, sfep);
+ if (filldir(dirent, (char *)sfep->name, sfep->namelen,
+ off & 0x7fffffff, ino, DT_UNKNOWN)) {
+ *offset = off & 0x7fffffff;
+ return 0;
+ }
+ sfep = xfs_dir2_sf_nextentry(sfp, sfep);
+ }
+
+ *offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk + 1, 0) &
+ 0x7fffffff;
+ return 0;
+}
+
+
+/*
+ * Readdir for block directories.
+ */
+STATIC int
+xfs_dir2_block_getdents(
+ xfs_inode_t *dp, /* incore inode */
+ void *dirent,
+ xfs_off_t *offset,
+ filldir_t filldir)
+{
+ xfs_dir2_data_hdr_t *hdr; /* block header */
+ struct xfs_buf *bp; /* buffer for block */
+ xfs_dir2_block_tail_t *btp; /* block tail */
+ xfs_dir2_data_entry_t *dep; /* block data entry */
+ xfs_dir2_data_unused_t *dup; /* block unused entry */
+ char *endptr; /* end of the data entries */
+ int error; /* error return value */
+ xfs_mount_t *mp; /* filesystem mount point */
+ char *ptr; /* current data entry */
+ int wantoff; /* starting block offset */
+ xfs_off_t cook;
+
+ mp = dp->i_mount;
+ /*
+ * If the block number in the offset is out of range, we're done.
+ */
+ if (xfs_dir2_dataptr_to_db(mp, *offset) > mp->m_dirdatablk)
+ return 0;
+
+ error = xfs_dir3_block_read(NULL, dp, &bp);
+ if (error)
+ return error;
+
+ /*
+ * Extract the byte offset we start at from the seek pointer.
+ * We'll skip entries before this.
+ */
+ wantoff = xfs_dir2_dataptr_to_off(mp, *offset);
+ hdr = bp->b_addr;
+ xfs_dir3_data_check(dp, bp);
+ /*
+ * Set up values for the loop.
+ */
+ btp = xfs_dir2_block_tail_p(mp, hdr);
+ ptr = (char *)xfs_dir3_data_entry_p(hdr);
+ endptr = (char *)xfs_dir2_block_leaf_p(btp);
+
+ /*
+ * Loop over the data portion of the block.
+ * Each object is a real entry (dep) or an unused one (dup).
+ */
+ while (ptr < endptr) {
+ dup = (xfs_dir2_data_unused_t *)ptr;
+ /*
+ * Unused, skip it.
+ */
+ if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
+ ptr += be16_to_cpu(dup->length);
+ continue;
+ }
+
+ dep = (xfs_dir2_data_entry_t *)ptr;
+
+ /*
+ * Bump pointer for the next iteration.
+ */
+ ptr += xfs_dir2_data_entsize(dep->namelen);
+ /*
+ * The entry is before the desired starting point, skip it.
+ */
+ if ((char *)dep - (char *)hdr < wantoff)
+ continue;
+
+ cook = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
+ (char *)dep - (char *)hdr);
+
+ /*
+ * If it didn't fit, set the final offset to here & return.
+ */
+ if (filldir(dirent, (char *)dep->name, dep->namelen,
+ cook & 0x7fffffff, be64_to_cpu(dep->inumber),
+ DT_UNKNOWN)) {
+ *offset = cook & 0x7fffffff;
+ xfs_trans_brelse(NULL, bp);
+ return 0;
+ }
+ }
+
+ /*
+ * Reached the end of the block.
+ * Set the offset to a non-existent block 1 and return.
+ */
+ *offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk + 1, 0) &
+ 0x7fffffff;
+ xfs_trans_brelse(NULL, bp);
+ return 0;
+}
+
+struct xfs_dir2_leaf_map_info {
+ xfs_extlen_t map_blocks; /* number of fsbs in map */
+ xfs_dablk_t map_off; /* last mapped file offset */
+ int map_size; /* total entries in *map */
+ int map_valid; /* valid entries in *map */
+ int nmap; /* mappings to ask xfs_bmapi */
+ xfs_dir2_db_t curdb; /* db for current block */
+ int ra_current; /* number of read-ahead blks */
+ int ra_index; /* *map index for read-ahead */
+ int ra_offset; /* map entry offset for ra */
+ int ra_want; /* readahead count wanted */
+ struct xfs_bmbt_irec map[]; /* map vector for blocks */
+};
+
+STATIC int
+xfs_dir2_leaf_readbuf(
+ struct xfs_inode *dp,
+ size_t bufsize,
+ struct xfs_dir2_leaf_map_info *mip,
+ xfs_dir2_off_t *curoff,
+ struct xfs_buf **bpp)
+{
+ struct xfs_mount *mp = dp->i_mount;
+ struct xfs_buf *bp = *bpp;
+ struct xfs_bmbt_irec *map = mip->map;
+ struct blk_plug plug;
+ int error = 0;
+ int length;
+ int i;
+ int j;
+
+ /*
+ * If we have a buffer, we need to release it and
+ * take it out of the mapping.
+ */
+
+ if (bp) {
+ xfs_trans_brelse(NULL, bp);
+ bp = NULL;
+ mip->map_blocks -= mp->m_dirblkfsbs;
+ /*
+ * Loop to get rid of the extents for the
+ * directory block.
+ */
+ for (i = mp->m_dirblkfsbs; i > 0; ) {
+ j = min_t(int, map->br_blockcount, i);
+ map->br_blockcount -= j;
+ map->br_startblock += j;
+ map->br_startoff += j;
+ /*
+ * If mapping is done, pitch it from
+ * the table.
+ */
+ if (!map->br_blockcount && --mip->map_valid)
+ memmove(&map[0], &map[1],
+ sizeof(map[0]) * mip->map_valid);
+ i -= j;
+ }
+ }
+
+ /*
+ * Recalculate the readahead blocks wanted.
+ */
+ mip->ra_want = howmany(bufsize + mp->m_dirblksize,
+ mp->m_sb.sb_blocksize) - 1;
+ ASSERT(mip->ra_want >= 0);
+
+ /*
+ * If we don't have as many as we want, and we haven't
+ * run out of data blocks, get some more mappings.
+ */
+ if (1 + mip->ra_want > mip->map_blocks &&
+ mip->map_off < xfs_dir2_byte_to_da(mp, XFS_DIR2_LEAF_OFFSET)) {
+ /*
+ * Get more bmaps, fill in after the ones
+ * we already have in the table.
+ */
+ mip->nmap = mip->map_size - mip->map_valid;
+ error = xfs_bmapi_read(dp, mip->map_off,
+ xfs_dir2_byte_to_da(mp, XFS_DIR2_LEAF_OFFSET) -
+ mip->map_off,
+ &map[mip->map_valid], &mip->nmap, 0);
+
+ /*
+ * Don't know if we should ignore this or try to return an
+ * error. The trouble with returning errors is that readdir
+ * will just stop without actually passing the error through.
+ */
+ if (error)
+ goto out; /* XXX */
+
+ /*
+ * If we got all the mappings we asked for, set the final map
+ * offset based on the last bmap value received. Otherwise,
+ * we've reached the end.
+ */
+ if (mip->nmap == mip->map_size - mip->map_valid) {
+ i = mip->map_valid + mip->nmap - 1;
+ mip->map_off = map[i].br_startoff + map[i].br_blockcount;
+ } else
+ mip->map_off = xfs_dir2_byte_to_da(mp,
+ XFS_DIR2_LEAF_OFFSET);
+
+ /*
+ * Look for holes in the mapping, and eliminate them. Count up
+ * the valid blocks.
+ */
+ for (i = mip->map_valid; i < mip->map_valid + mip->nmap; ) {
+ if (map[i].br_startblock == HOLESTARTBLOCK) {
+ mip->nmap--;
+ length = mip->map_valid + mip->nmap - i;
+ if (length)
+ memmove(&map[i], &map[i + 1],
+ sizeof(map[i]) * length);
+ } else {
+ mip->map_blocks += map[i].br_blockcount;
+ i++;
+ }
+ }
+ mip->map_valid += mip->nmap;
+ }
+
+ /*
+ * No valid mappings, so no more data blocks.
+ */
+ if (!mip->map_valid) {
+ *curoff = xfs_dir2_da_to_byte(mp, mip->map_off);
+ goto out;
+ }
+
+ /*
+ * Read the directory block starting at the first mapping.
+ */
+ mip->curdb = xfs_dir2_da_to_db(mp, map->br_startoff);
+ error = xfs_dir3_data_read(NULL, dp, map->br_startoff,
+ map->br_blockcount >= mp->m_dirblkfsbs ?
+ XFS_FSB_TO_DADDR(mp, map->br_startblock) : -1, &bp);
+
+ /*
+ * Should just skip over the data block instead of giving up.
+ */
+ if (error)
+ goto out; /* XXX */
+
+ /*
+ * Adjust the current amount of read-ahead: we just read a block that
+ * was previously ra.
+ */
+ if (mip->ra_current)
+ mip->ra_current -= mp->m_dirblkfsbs;
+
+ /*
+ * Do we need more readahead?
+ */
+ blk_start_plug(&plug);
+ for (mip->ra_index = mip->ra_offset = i = 0;
+ mip->ra_want > mip->ra_current && i < mip->map_blocks;
+ i += mp->m_dirblkfsbs) {
+ ASSERT(mip->ra_index < mip->map_valid);
+ /*
+ * Read-ahead a contiguous directory block.
+ */
+ if (i > mip->ra_current &&
+ map[mip->ra_index].br_blockcount >= mp->m_dirblkfsbs) {
+ xfs_dir3_data_readahead(NULL, dp,
+ map[mip->ra_index].br_startoff + mip->ra_offset,
+ XFS_FSB_TO_DADDR(mp,
+ map[mip->ra_index].br_startblock +
+ mip->ra_offset));
+ mip->ra_current = i;
+ }
+
+ /*
+ * Read-ahead a non-contiguous directory block. This doesn't
+ * use our mapping, but this is a very rare case.
+ */
+ else if (i > mip->ra_current) {
+ xfs_dir3_data_readahead(NULL, dp,
+ map[mip->ra_index].br_startoff +
+ mip->ra_offset, -1);
+ mip->ra_current = i;
+ }
+
+ /*
+ * Advance offset through the mapping table.
+ */
+ for (j = 0; j < mp->m_dirblkfsbs; j++) {
+ /*
+ * The rest of this extent but not more than a dir
+ * block.
+ */
+ length = min_t(int, mp->m_dirblkfsbs,
+ map[mip->ra_index].br_blockcount -
+ mip->ra_offset);
+ j += length;
+ mip->ra_offset += length;
+
+ /*
+ * Advance to the next mapping if this one is used up.
+ */
+ if (mip->ra_offset == map[mip->ra_index].br_blockcount) {
+ mip->ra_offset = 0;
+ mip->ra_index++;
+ }
+ }
+ }
+ blk_finish_plug(&plug);
+
+out:
+ *bpp = bp;
+ return error;
+}
+
+/*
+ * Getdents (readdir) for leaf and node directories.
+ * This reads the data blocks only, so is the same for both forms.
+ */
+STATIC int
+xfs_dir2_leaf_getdents(
+ xfs_inode_t *dp, /* incore directory inode */
+ void *dirent,
+ size_t bufsize,
+ xfs_off_t *offset,
+ filldir_t filldir)
+{
+ struct xfs_buf *bp = NULL; /* data block buffer */
+ xfs_dir2_data_hdr_t *hdr; /* data block header */
+ xfs_dir2_data_entry_t *dep; /* data entry */
+ xfs_dir2_data_unused_t *dup; /* unused entry */
+ int error = 0; /* error return value */
+ int length; /* temporary length value */
+ xfs_mount_t *mp; /* filesystem mount point */
+ int byteoff; /* offset in current block */
+ xfs_dir2_off_t curoff; /* current overall offset */
+ xfs_dir2_off_t newoff; /* new curoff after new blk */
+ char *ptr = NULL; /* pointer to current data */
+ struct xfs_dir2_leaf_map_info *map_info;
+
+ /*
+ * If the offset is at or past the largest allowed value,
+ * give up right away.
+ */
+ if (*offset >= XFS_DIR2_MAX_DATAPTR)
+ return 0;
+
+ mp = dp->i_mount;
+
+ /*
+ * Set up to bmap a number of blocks based on the caller's
+ * buffer size, the directory block size, and the filesystem
+ * block size.
+ */
+ length = howmany(bufsize + mp->m_dirblksize,
+ mp->m_sb.sb_blocksize);
+ map_info = kmem_zalloc(offsetof(struct xfs_dir2_leaf_map_info, map) +
+ (length * sizeof(struct xfs_bmbt_irec)),
+ KM_SLEEP | KM_NOFS);
+ map_info->map_size = length;
+
+ /*
+ * Inside the loop we keep the main offset value as a byte offset
+ * in the directory file.
+ */
+ curoff = xfs_dir2_dataptr_to_byte(mp, *offset);
+
+ /*
+ * Force this conversion through db so we truncate the offset
+ * down to get the start of the data block.
+ */
+ map_info->map_off = xfs_dir2_db_to_da(mp,
+ xfs_dir2_byte_to_db(mp, curoff));
+
+ /*
+ * Loop over directory entries until we reach the end offset.
+ * Get more blocks and readahead as necessary.
+ */
+ while (curoff < XFS_DIR2_LEAF_OFFSET) {
+ /*
+ * If we have no buffer, or we're off the end of the
+ * current buffer, need to get another one.
+ */
+ if (!bp || ptr >= (char *)bp->b_addr + mp->m_dirblksize) {
+
+ error = xfs_dir2_leaf_readbuf(dp, bufsize, map_info,
+ &curoff, &bp);
+ if (error || !map_info->map_valid)
+ break;
+
+ /*
+ * Having done a read, we need to set a new offset.
+ */
+ newoff = xfs_dir2_db_off_to_byte(mp, map_info->curdb, 0);
+ /*
+ * Start of the current block.
+ */
+ if (curoff < newoff)
+ curoff = newoff;
+ /*
+ * Make sure we're in the right block.
+ */
+ else if (curoff > newoff)
+ ASSERT(xfs_dir2_byte_to_db(mp, curoff) ==
+ map_info->curdb);
+ hdr = bp->b_addr;
+ xfs_dir3_data_check(dp, bp);
+ /*
+ * Find our position in the block.
+ */
+ ptr = (char *)xfs_dir3_data_entry_p(hdr);
+ byteoff = xfs_dir2_byte_to_off(mp, curoff);
+ /*
+ * Skip past the header.
+ */
+ if (byteoff == 0)
+ curoff += xfs_dir3_data_entry_offset(hdr);
+ /*
+ * Skip past entries until we reach our offset.
+ */
+ else {
+ while ((char *)ptr - (char *)hdr < byteoff) {
+ dup = (xfs_dir2_data_unused_t *)ptr;
+
+ if (be16_to_cpu(dup->freetag)
+ == XFS_DIR2_DATA_FREE_TAG) {
+
+ length = be16_to_cpu(dup->length);
+ ptr += length;
+ continue;
+ }
+ dep = (xfs_dir2_data_entry_t *)ptr;
+ length =
+ xfs_dir2_data_entsize(dep->namelen);
+ ptr += length;
+ }
+ /*
+ * Now set our real offset.
+ */
+ curoff =
+ xfs_dir2_db_off_to_byte(mp,
+ xfs_dir2_byte_to_db(mp, curoff),
+ (char *)ptr - (char *)hdr);
+ if (ptr >= (char *)hdr + mp->m_dirblksize) {
+ continue;
+ }
+ }
+ }
+ /*
+ * We have a pointer to an entry.
+ * Is it a live one?
+ */
+ dup = (xfs_dir2_data_unused_t *)ptr;
+ /*
+ * No, it's unused, skip over it.
+ */
+ if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
+ length = be16_to_cpu(dup->length);
+ ptr += length;
+ curoff += length;
+ continue;
+ }
+
+ dep = (xfs_dir2_data_entry_t *)ptr;
+ length = xfs_dir2_data_entsize(dep->namelen);
+
+ if (filldir(dirent, (char *)dep->name, dep->namelen,
+ xfs_dir2_byte_to_dataptr(mp, curoff) & 0x7fffffff,
+ be64_to_cpu(dep->inumber), DT_UNKNOWN))
+ break;
+
+ /*
+ * Advance to next entry in the block.
+ */
+ ptr += length;
+ curoff += length;
+ /* bufsize may have just been a guess; don't go negative */
+ bufsize = bufsize > length ? bufsize - length : 0;
+ }
+
+ /*
+ * All done. Set output offset value to current offset.
+ */
+ if (curoff > xfs_dir2_dataptr_to_byte(mp, XFS_DIR2_MAX_DATAPTR))
+ *offset = XFS_DIR2_MAX_DATAPTR & 0x7fffffff;
+ else
+ *offset = xfs_dir2_byte_to_dataptr(mp, curoff) & 0x7fffffff;
+ kmem_free(map_info);
+ if (bp)
+ xfs_trans_brelse(NULL, bp);
+ return error;
+}
+
+/*
+ * Read a directory.
+ */
+int
+xfs_readdir(
+ xfs_inode_t *dp,
+ void *dirent,
+ size_t bufsize,
+ xfs_off_t *offset,
+ filldir_t filldir)
+{
+ int rval; /* return value */
+ int v; /* type-checking value */
+
+ trace_xfs_readdir(dp);
+
+ if (XFS_FORCED_SHUTDOWN(dp->i_mount))
+ return XFS_ERROR(EIO);
+
+ ASSERT(S_ISDIR(dp->i_d.di_mode));
+ XFS_STATS_INC(xs_dir_getdents);
+
+ if (dp->i_d.di_format == XFS_DINODE_FMT_LOCAL)
+ rval = xfs_dir2_sf_getdents(dp, dirent, offset, filldir);
+ else if ((rval = xfs_dir2_isblock(NULL, dp, &v)))
+ ;
+ else if (v)
+ rval = xfs_dir2_block_getdents(dp, dirent, offset, filldir);
+ else
+ rval = xfs_dir2_leaf_getdents(dp, dirent, bufsize, offset,
+ filldir);
+ return rval;
+}
+
diff --git a/fs/xfs/xfs_dir2_sf.c b/fs/xfs/xfs_dir2_sf.c
index 6157424..f24ce90 100644
--- a/fs/xfs/xfs_dir2_sf.c
+++ b/fs/xfs/xfs_dir2_sf.c
@@ -765,105 +765,6 @@ xfs_dir2_sf_create(
return 0;
}
-int /* error */
-xfs_dir2_sf_getdents(
- xfs_inode_t *dp, /* incore directory inode */
- void *dirent,
- xfs_off_t *offset,
- filldir_t filldir)
-{
- int i; /* shortform entry number */
- xfs_mount_t *mp; /* filesystem mount point */
- xfs_dir2_dataptr_t off; /* current entry's offset */
- xfs_dir2_sf_entry_t *sfep; /* shortform directory entry */
- xfs_dir2_sf_hdr_t *sfp; /* shortform structure */
- xfs_dir2_dataptr_t dot_offset;
- xfs_dir2_dataptr_t dotdot_offset;
- xfs_ino_t ino;
-
- mp = dp->i_mount;
-
- ASSERT(dp->i_df.if_flags & XFS_IFINLINE);
- /*
- * Give up if the directory is way too short.
- */
- if (dp->i_d.di_size < offsetof(xfs_dir2_sf_hdr_t, parent)) {
- ASSERT(XFS_FORCED_SHUTDOWN(mp));
- return XFS_ERROR(EIO);
- }
-
- ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
- ASSERT(dp->i_df.if_u1.if_data != NULL);
-
- sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
-
- ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->i8count));
-
- /*
- * If the block number in the offset is out of range, we're done.
- */
- if (xfs_dir2_dataptr_to_db(mp, *offset) > mp->m_dirdatablk)
- return 0;
-
- /*
- * Precalculate offsets for . and .. as we will always need them.
- *
- * XXX(hch): the second argument is sometimes 0 and sometimes
- * mp->m_dirdatablk.
- */
- dot_offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
- XFS_DIR3_DATA_DOT_OFFSET(mp));
- dotdot_offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
- XFS_DIR3_DATA_DOTDOT_OFFSET(mp));
-
- /*
- * Put . entry unless we're starting past it.
- */
- if (*offset <= dot_offset) {
- if (filldir(dirent, ".", 1, dot_offset & 0x7fffffff, dp->i_ino, DT_DIR)) {
- *offset = dot_offset & 0x7fffffff;
- return 0;
- }
- }
-
- /*
- * Put .. entry unless we're starting past it.
- */
- if (*offset <= dotdot_offset) {
- ino = xfs_dir2_sf_get_parent_ino(sfp);
- if (filldir(dirent, "..", 2, dotdot_offset & 0x7fffffff, ino, DT_DIR)) {
- *offset = dotdot_offset & 0x7fffffff;
- return 0;
- }
- }
-
- /*
- * Loop while there are more entries and put'ing works.
- */
- sfep = xfs_dir2_sf_firstentry(sfp);
- for (i = 0; i < sfp->count; i++) {
- off = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
- xfs_dir2_sf_get_offset(sfep));
-
- if (*offset > off) {
- sfep = xfs_dir2_sf_nextentry(sfp, sfep);
- continue;
- }
-
- ino = xfs_dir2_sfe_get_ino(sfp, sfep);
- if (filldir(dirent, (char *)sfep->name, sfep->namelen,
- off & 0x7fffffff, ino, DT_UNKNOWN)) {
- *offset = off & 0x7fffffff;
- return 0;
- }
- sfep = xfs_dir2_sf_nextentry(sfp, sfep);
- }
-
- *offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk + 1, 0) &
- 0x7fffffff;
- return 0;
-}
-
/*
* Lookup an entry in a shortform directory.
* Returns EEXIST if found, ENOENT if not found.
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 15/17] xfs: reshuffle dir2 definitions around for userspace
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (13 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 14/17] xfs: move getdents code into it's own file Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 16/17] xfs: split out attribute listing code into separate file Dave Chinner
2013-06-07 5:45 ` [PATCH 17/17] xfs: split out attribute fork truncation " Dave Chinner
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Many of the definitions within xfs_dir2_priv.h are needed in
userspace outside libxfs. Definitions within xfs_dir2_priv.h are
wholly contained within libxfs, so we need to shuffle some of the
definitions around to keep consistency across files shared between
user and kernel space.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_dir2.h | 71 ++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_dir2_data.c | 1 +
fs/xfs/xfs_dir2_format.h | 24 ----------------
fs/xfs/xfs_dir2_leaf.c | 1 +
fs/xfs/xfs_dir2_node.c | 1 +
fs/xfs/xfs_dir2_priv.h | 31 --------------------
fs/xfs/xfs_dir2_sf.c | 4 +--
fs/xfs/xfs_file.c | 3 +-
fs/xfs/xfs_log_recover.c | 2 +-
9 files changed, 78 insertions(+), 60 deletions(-)
diff --git a/fs/xfs/xfs_dir2.h b/fs/xfs/xfs_dir2.h
index e937d99..df3e20b 100644
--- a/fs/xfs/xfs_dir2.h
+++ b/fs/xfs/xfs_dir2.h
@@ -23,10 +23,35 @@ struct xfs_da_args;
struct xfs_inode;
struct xfs_mount;
struct xfs_trans;
+struct xfs_dir2_sf_hdr;
+struct xfs_dir2_sf_entry;
+struct xfs_dir2_data_hdr;
+struct xfs_dir2_data_entry;
+struct xfs_dir2_data_unused;
extern struct xfs_name xfs_name_dotdot;
/*
+ * Directory type definitions
+ */
+
+/* Byte offset in data block and shortform entry. */
+typedef __uint16_t xfs_dir2_data_off_t;
+#define NULLDATAOFF 0xffffU
+typedef uint xfs_dir2_data_aoff_t; /* argument form */
+
+/* Offset in data space of a data entry. */
+typedef __uint32_t xfs_dir2_dataptr_t;
+#define XFS_DIR2_MAX_DATAPTR ((xfs_dir2_dataptr_t)0xffffffff)
+#define XFS_DIR2_NULL_DATAPTR ((xfs_dir2_dataptr_t)0)
+
+/* Byte offset in a directory. */
+typedef xfs_off_t xfs_dir2_off_t;
+
+/* Directory block number (logical dirblk in file) */
+typedef __uint32_t xfs_dir2_db_t;
+
+/*
* Generic directory interface routines
*/
extern void xfs_dir_startup(void);
@@ -57,4 +82,50 @@ extern int xfs_dir_canenter(struct xfs_trans *tp, struct xfs_inode *dp,
*/
extern int xfs_dir2_sf_to_block(struct xfs_da_args *args);
+/*
+ * Direct call on directory open, before entering the readdir code.
+ */
+extern int xfs_dir3_data_readahead(struct xfs_trans *tp, struct xfs_inode *dp,
+ xfs_dablk_t bno, xfs_daddr_t mapped_bno);
+
+/*
+ * Interface routines used by userspace utilities
+ */
+extern xfs_ino_t xfs_dir2_sf_get_parent_ino(struct xfs_dir2_sf_hdr *sfp);
+extern void xfs_dir2_sf_put_parent_ino(struct xfs_dir2_sf_hdr *sfp,
+ xfs_ino_t ino);
+extern xfs_ino_t xfs_dir2_sfe_get_ino(struct xfs_dir2_sf_hdr *sfp,
+ struct xfs_dir2_sf_entry *sfep);
+extern void xfs_dir2_sfe_put_ino( struct xfs_dir2_sf_hdr *,
+ struct xfs_dir2_sf_entry *sfep, xfs_ino_t ino);
+
+extern int xfs_dir2_isblock(struct xfs_trans *tp, struct xfs_inode *dp, int *r);
+extern int xfs_dir2_isleaf(struct xfs_trans *tp, struct xfs_inode *dp, int *r);
+extern int xfs_dir2_shrink_inode(struct xfs_da_args *args, xfs_dir2_db_t db,
+ struct xfs_buf *bp);
+
+extern void xfs_dir2_data_freescan(struct xfs_mount *mp,
+ struct xfs_dir2_data_hdr *hdr, int *loghead);
+extern void xfs_dir2_data_log_entry(struct xfs_trans *tp, struct xfs_buf *bp,
+ struct xfs_dir2_data_entry *dep);
+extern void xfs_dir2_data_log_header(struct xfs_trans *tp,
+ struct xfs_buf *bp);
+extern void xfs_dir2_data_log_unused(struct xfs_trans *tp, struct xfs_buf *bp,
+ struct xfs_dir2_data_unused *dup);
+extern void xfs_dir2_data_make_free(struct xfs_trans *tp, struct xfs_buf *bp,
+ xfs_dir2_data_aoff_t offset, xfs_dir2_data_aoff_t len,
+ int *needlogp, int *needscanp);
+extern void xfs_dir2_data_use_free(struct xfs_trans *tp, struct xfs_buf *bp,
+ struct xfs_dir2_data_unused *dup, xfs_dir2_data_aoff_t offset,
+ xfs_dir2_data_aoff_t len, int *needlogp, int *needscanp);
+
+extern struct xfs_dir2_data_free *xfs_dir2_data_freefind(
+ struct xfs_dir2_data_hdr *hdr, struct xfs_dir2_data_unused *dup);
+
+extern const struct xfs_buf_ops xfs_dir3_block_buf_ops;
+extern const struct xfs_buf_ops xfs_dir3_leafn_buf_ops;
+extern const struct xfs_buf_ops xfs_dir3_leaf1_buf_ops;
+extern const struct xfs_buf_ops xfs_dir3_free_buf_ops;
+extern const struct xfs_buf_ops xfs_dir3_data_buf_ops;
+
#endif /* __XFS_DIR2_H__ */
diff --git a/fs/xfs/xfs_dir2_data.c b/fs/xfs/xfs_dir2_data.c
index c293023..465027b 100644
--- a/fs/xfs/xfs_dir2_data.c
+++ b/fs/xfs/xfs_dir2_data.c
@@ -28,6 +28,7 @@
#include "xfs_bmap_btree.h"
#include "xfs_dinode.h"
#include "xfs_inode.h"
+#include "xfs_dir2.h"
#include "xfs_dir2_format.h"
#include "xfs_dir2_priv.h"
#include "xfs_error.h"
diff --git a/fs/xfs/xfs_dir2_format.h b/fs/xfs/xfs_dir2_format.h
index 995f1f5..18ca87d 100644
--- a/fs/xfs/xfs_dir2_format.h
+++ b/fs/xfs/xfs_dir2_format.h
@@ -69,36 +69,12 @@
#define XFS_DIR3_FREE_MAGIC 0x58444633 /* XDF3: free index blocks */
/*
- * Byte offset in data block and shortform entry.
- */
-typedef __uint16_t xfs_dir2_data_off_t;
-#define NULLDATAOFF 0xffffU
-typedef uint xfs_dir2_data_aoff_t; /* argument form */
-
-/*
* Normalized offset (in a data block) of the entry, really xfs_dir2_data_off_t.
* Only need 16 bits, this is the byte offset into the single block form.
*/
typedef struct { __uint8_t i[2]; } __arch_pack xfs_dir2_sf_off_t;
/*
- * Offset in data space of a data entry.
- */
-typedef __uint32_t xfs_dir2_dataptr_t;
-#define XFS_DIR2_MAX_DATAPTR ((xfs_dir2_dataptr_t)0xffffffff)
-#define XFS_DIR2_NULL_DATAPTR ((xfs_dir2_dataptr_t)0)
-
-/*
- * Byte offset in a directory.
- */
-typedef xfs_off_t xfs_dir2_off_t;
-
-/*
- * Directory block number (logical dirblk in file)
- */
-typedef __uint32_t xfs_dir2_db_t;
-
-/*
* Inode number stored as 8 8-bit values.
*/
typedef struct { __uint8_t i[8]; } xfs_dir2_ino8_t;
diff --git a/fs/xfs/xfs_dir2_leaf.c b/fs/xfs/xfs_dir2_leaf.c
index e1aed21..c60e3a7 100644
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -30,6 +30,7 @@
#include "xfs_dinode.h"
#include "xfs_inode.h"
#include "xfs_bmap.h"
+#include "xfs_dir2.h"
#include "xfs_dir2_format.h"
#include "xfs_dir2_priv.h"
#include "xfs_error.h"
diff --git a/fs/xfs/xfs_dir2_node.c b/fs/xfs/xfs_dir2_node.c
index 2226a00..7b448f5 100644
--- a/fs/xfs/xfs_dir2_node.c
+++ b/fs/xfs/xfs_dir2_node.c
@@ -29,6 +29,7 @@
#include "xfs_dinode.h"
#include "xfs_inode.h"
#include "xfs_bmap.h"
+#include "xfs_dir2.h"
#include "xfs_dir2_format.h"
#include "xfs_dir2_priv.h"
#include "xfs_error.h"
diff --git a/fs/xfs/xfs_dir2_priv.h b/fs/xfs/xfs_dir2_priv.h
index 1f949df..134e19e 100644
--- a/fs/xfs/xfs_dir2_priv.h
+++ b/fs/xfs/xfs_dir2_priv.h
@@ -20,18 +20,12 @@
/* xfs_dir2.c */
extern int xfs_dir_ino_validate(struct xfs_mount *mp, xfs_ino_t ino);
-extern int xfs_dir2_isblock(struct xfs_trans *tp, struct xfs_inode *dp, int *r);
-extern int xfs_dir2_isleaf(struct xfs_trans *tp, struct xfs_inode *dp, int *r);
extern int xfs_dir2_grow_inode(struct xfs_da_args *args, int space,
xfs_dir2_db_t *dbp);
-extern int xfs_dir2_shrink_inode(struct xfs_da_args *args, xfs_dir2_db_t db,
- struct xfs_buf *bp);
extern int xfs_dir_cilookup_result(struct xfs_da_args *args,
const unsigned char *name, int len);
/* xfs_dir2_block.c */
-extern const struct xfs_buf_ops xfs_dir3_block_buf_ops;
-
extern int xfs_dir3_block_read(struct xfs_trans *tp, struct xfs_inode *dp,
struct xfs_buf **bpp);
extern int xfs_dir2_block_addname(struct xfs_da_args *args);
@@ -50,39 +44,17 @@ extern int xfs_dir2_leaf_to_block(struct xfs_da_args *args,
#define xfs_dir3_data_check(dp,bp)
#endif
-extern const struct xfs_buf_ops xfs_dir3_data_buf_ops;
-extern const struct xfs_buf_ops xfs_dir3_free_buf_ops;
-
extern int __xfs_dir3_data_check(struct xfs_inode *dp, struct xfs_buf *bp);
extern int xfs_dir3_data_read(struct xfs_trans *tp, struct xfs_inode *dp,
xfs_dablk_t bno, xfs_daddr_t mapped_bno, struct xfs_buf **bpp);
-extern int xfs_dir3_data_readahead(struct xfs_trans *tp, struct xfs_inode *dp,
- xfs_dablk_t bno, xfs_daddr_t mapped_bno);
extern struct xfs_dir2_data_free *
xfs_dir2_data_freeinsert(struct xfs_dir2_data_hdr *hdr,
struct xfs_dir2_data_unused *dup, int *loghead);
-extern void xfs_dir2_data_freescan(struct xfs_mount *mp,
- struct xfs_dir2_data_hdr *hdr, int *loghead);
extern int xfs_dir3_data_init(struct xfs_da_args *args, xfs_dir2_db_t blkno,
struct xfs_buf **bpp);
-extern void xfs_dir2_data_log_entry(struct xfs_trans *tp, struct xfs_buf *bp,
- struct xfs_dir2_data_entry *dep);
-extern void xfs_dir2_data_log_header(struct xfs_trans *tp,
- struct xfs_buf *bp);
-extern void xfs_dir2_data_log_unused(struct xfs_trans *tp, struct xfs_buf *bp,
- struct xfs_dir2_data_unused *dup);
-extern void xfs_dir2_data_make_free(struct xfs_trans *tp, struct xfs_buf *bp,
- xfs_dir2_data_aoff_t offset, xfs_dir2_data_aoff_t len,
- int *needlogp, int *needscanp);
-extern void xfs_dir2_data_use_free(struct xfs_trans *tp, struct xfs_buf *bp,
- struct xfs_dir2_data_unused *dup, xfs_dir2_data_aoff_t offset,
- xfs_dir2_data_aoff_t len, int *needlogp, int *needscanp);
/* xfs_dir2_leaf.c */
-extern const struct xfs_buf_ops xfs_dir3_leaf1_buf_ops;
-extern const struct xfs_buf_ops xfs_dir3_leafn_buf_ops;
-
extern int xfs_dir3_leafn_read(struct xfs_trans *tp, struct xfs_inode *dp,
xfs_dablk_t fbno, xfs_daddr_t mappedbno, struct xfs_buf **bpp);
extern int xfs_dir2_block_to_leaf(struct xfs_da_args *args,
@@ -146,9 +118,6 @@ extern int xfs_dir2_free_read(struct xfs_trans *tp, struct xfs_inode *dp,
xfs_dablk_t fbno, struct xfs_buf **bpp);
/* xfs_dir2_sf.c */
-extern xfs_ino_t xfs_dir2_sf_get_parent_ino(struct xfs_dir2_sf_hdr *sfp);
-extern xfs_ino_t xfs_dir2_sfe_get_ino(struct xfs_dir2_sf_hdr *sfp,
- struct xfs_dir2_sf_entry *sfep);
extern int xfs_dir2_block_sfsize(struct xfs_inode *dp,
struct xfs_dir2_data_hdr *block, struct xfs_dir2_sf_hdr *sfhp);
extern int xfs_dir2_block_to_sf(struct xfs_da_args *args, struct xfs_buf *bp,
diff --git a/fs/xfs/xfs_dir2_sf.c b/fs/xfs/xfs_dir2_sf.c
index f24ce90..40bf17e 100644
--- a/fs/xfs/xfs_dir2_sf.c
+++ b/fs/xfs/xfs_dir2_sf.c
@@ -95,7 +95,7 @@ xfs_dir2_sf_get_parent_ino(
return xfs_dir2_sf_get_ino(hdr, &hdr->parent);
}
-static void
+void
xfs_dir2_sf_put_parent_ino(
struct xfs_dir2_sf_hdr *hdr,
xfs_ino_t ino)
@@ -123,7 +123,7 @@ xfs_dir2_sfe_get_ino(
return xfs_dir2_sf_get_ino(hdr, xfs_dir2_sfe_inop(sfep));
}
-static void
+void
xfs_dir2_sfe_put_ino(
struct xfs_dir2_sf_hdr *hdr,
struct xfs_dir2_sf_entry *sfep,
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a5f2042..5190689 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -31,8 +31,7 @@
#include "xfs_error.h"
#include "xfs_vnodeops.h"
#include "xfs_da_btree.h"
-#include "xfs_dir2_format.h"
-#include "xfs_dir2_priv.h"
+#include "xfs_dir2.h"
#include "xfs_ioctl.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index a8428b6..8c01546 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -50,8 +50,8 @@
/* Need all the magic numbers and buffer ops structures from these headers */
#include "xfs_symlink.h"
#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
#include "xfs_dir2_format.h"
-#include "xfs_dir2_priv.h"
#include "xfs_attr_leaf.h"
#include "xfs_attr_remote.h"
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 16/17] xfs: split out attribute listing code into separate file
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (14 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 15/17] xfs: reshuffle dir2 definitions around for userspace Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
2013-06-07 5:45 ` [PATCH 17/17] xfs: split out attribute fork truncation " Dave Chinner
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The attribute listing code is not used by userspace, so like the
directory readdir code, split it out into a separate file to
minimise the differences between the filesystem shared with libxfs
in userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_attr.c | 317 +----------------------
fs/xfs/xfs_attr.h | 1 +
fs/xfs/xfs_attr_leaf.c | 300 ----------------------
fs/xfs/xfs_attr_list.c | 655 ++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 658 insertions(+), 616 deletions(-)
create mode 100644 fs/xfs/xfs_attr_list.c
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index f396fe9..4670207 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -27,6 +27,7 @@ xfs-y += xfs_trace.o
# highlevel code
xfs-y += xfs_aops.o \
+ xfs_attr_list.o \
xfs_bit.o \
xfs_buf.o \
xfs_dfrag.o \
diff --git a/fs/xfs/xfs_attr.c b/fs/xfs/xfs_attr.c
index 20fe3fe..75a168c 100644
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -62,7 +62,6 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
-STATIC int xfs_attr_leaf_list(xfs_attr_list_context_t *context);
/*
* Internal routines when attribute list is more than one block.
@@ -70,7 +69,6 @@ STATIC int xfs_attr_leaf_list(xfs_attr_list_context_t *context);
STATIC int xfs_attr_node_get(xfs_da_args_t *args);
STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
-STATIC int xfs_attr_node_list(xfs_attr_list_context_t *context);
STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
@@ -90,7 +88,7 @@ xfs_attr_name_to_xname(
return 0;
}
-STATIC int
+int
xfs_inode_hasattr(
struct xfs_inode *ip)
{
@@ -611,157 +609,6 @@ xfs_attr_remove(
return xfs_attr_remove_int(dp, &xname, flags);
}
-int
-xfs_attr_list_int(xfs_attr_list_context_t *context)
-{
- int error;
- xfs_inode_t *dp = context->dp;
-
- XFS_STATS_INC(xs_attr_list);
-
- if (XFS_FORCED_SHUTDOWN(dp->i_mount))
- return EIO;
-
- xfs_ilock(dp, XFS_ILOCK_SHARED);
-
- /*
- * Decide on what work routines to call based on the inode size.
- */
- if (!xfs_inode_hasattr(dp)) {
- error = 0;
- } else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
- error = xfs_attr_shortform_list(context);
- } else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
- error = xfs_attr_leaf_list(context);
- } else {
- error = xfs_attr_node_list(context);
- }
-
- xfs_iunlock(dp, XFS_ILOCK_SHARED);
-
- return error;
-}
-
-#define ATTR_ENTBASESIZE /* minimum bytes used by an attr */ \
- (((struct attrlist_ent *) 0)->a_name - (char *) 0)
-#define ATTR_ENTSIZE(namelen) /* actual bytes used by an attr */ \
- ((ATTR_ENTBASESIZE + (namelen) + 1 + sizeof(u_int32_t)-1) \
- & ~(sizeof(u_int32_t)-1))
-
-/*
- * Format an attribute and copy it out to the user's buffer.
- * Take care to check values and protect against them changing later,
- * we may be reading them directly out of a user buffer.
- */
-/*ARGSUSED*/
-STATIC int
-xfs_attr_put_listent(
- xfs_attr_list_context_t *context,
- int flags,
- unsigned char *name,
- int namelen,
- int valuelen,
- unsigned char *value)
-{
- struct attrlist *alist = (struct attrlist *)context->alist;
- attrlist_ent_t *aep;
- int arraytop;
-
- ASSERT(!(context->flags & ATTR_KERNOVAL));
- ASSERT(context->count >= 0);
- ASSERT(context->count < (ATTR_MAX_VALUELEN/8));
- ASSERT(context->firstu >= sizeof(*alist));
- ASSERT(context->firstu <= context->bufsize);
-
- /*
- * Only list entries in the right namespace.
- */
- if (((context->flags & ATTR_SECURE) == 0) !=
- ((flags & XFS_ATTR_SECURE) == 0))
- return 0;
- if (((context->flags & ATTR_ROOT) == 0) !=
- ((flags & XFS_ATTR_ROOT) == 0))
- return 0;
-
- arraytop = sizeof(*alist) +
- context->count * sizeof(alist->al_offset[0]);
- context->firstu -= ATTR_ENTSIZE(namelen);
- if (context->firstu < arraytop) {
- trace_xfs_attr_list_full(context);
- alist->al_more = 1;
- context->seen_enough = 1;
- return 1;
- }
-
- aep = (attrlist_ent_t *)&context->alist[context->firstu];
- aep->a_valuelen = valuelen;
- memcpy(aep->a_name, name, namelen);
- aep->a_name[namelen] = 0;
- alist->al_offset[context->count++] = context->firstu;
- alist->al_count = context->count;
- trace_xfs_attr_list_add(context);
- return 0;
-}
-
-/*
- * Generate a list of extended attribute names and optionally
- * also value lengths. Positive return value follows the XFS
- * convention of being an error, zero or negative return code
- * is the length of the buffer returned (negated), indicating
- * success.
- */
-int
-xfs_attr_list(
- xfs_inode_t *dp,
- char *buffer,
- int bufsize,
- int flags,
- attrlist_cursor_kern_t *cursor)
-{
- xfs_attr_list_context_t context;
- struct attrlist *alist;
- int error;
-
- /*
- * Validate the cursor.
- */
- if (cursor->pad1 || cursor->pad2)
- return(XFS_ERROR(EINVAL));
- if ((cursor->initted == 0) &&
- (cursor->hashval || cursor->blkno || cursor->offset))
- return XFS_ERROR(EINVAL);
-
- /*
- * Check for a properly aligned buffer.
- */
- if (((long)buffer) & (sizeof(int)-1))
- return XFS_ERROR(EFAULT);
- if (flags & ATTR_KERNOVAL)
- bufsize = 0;
-
- /*
- * Initialize the output buffer.
- */
- memset(&context, 0, sizeof(context));
- context.dp = dp;
- context.cursor = cursor;
- context.resynch = 1;
- context.flags = flags;
- context.alist = buffer;
- context.bufsize = (bufsize & ~(sizeof(int)-1)); /* align */
- context.firstu = context.bufsize;
- context.put_listent = xfs_attr_put_listent;
-
- alist = (struct attrlist *)context.alist;
- alist->al_count = 0;
- alist->al_more = 0;
- alist->al_offset[0] = context.bufsize;
-
- error = xfs_attr_list_int(&context);
- ASSERT(error >= 0);
- return error;
-}
-
int /* error */
xfs_attr_inactive(xfs_inode_t *dp)
{
@@ -1166,28 +1013,6 @@ xfs_attr_leaf_get(xfs_da_args_t *args)
return error;
}
-/*
- * Copy out attribute entries for attr_list(), for leaf attribute lists.
- */
-STATIC int
-xfs_attr_leaf_list(xfs_attr_list_context_t *context)
-{
- int error;
- struct xfs_buf *bp;
-
- trace_xfs_attr_leaf_list(context);
-
- context->cursor->blkno = 0;
- error = xfs_attr3_leaf_read(NULL, context->dp, 0, -1, &bp);
- if (error)
- return XFS_ERROR(error);
-
- error = xfs_attr3_leaf_list_int(bp, context);
- xfs_trans_brelse(NULL, bp);
- return XFS_ERROR(error);
-}
-
-
/*========================================================================
* External routines when attribute list size > XFS_LBSIZE(mp).
*========================================================================*/
@@ -1780,143 +1605,3 @@ xfs_attr_node_get(xfs_da_args_t *args)
xfs_da_state_free(state);
return(retval);
}
-
-STATIC int /* error */
-xfs_attr_node_list(xfs_attr_list_context_t *context)
-{
- attrlist_cursor_kern_t *cursor;
- xfs_attr_leafblock_t *leaf;
- xfs_da_intnode_t *node;
- struct xfs_attr3_icleaf_hdr leafhdr;
- struct xfs_da3_icnode_hdr nodehdr;
- struct xfs_da_node_entry *btree;
- int error, i;
- struct xfs_buf *bp;
-
- trace_xfs_attr_node_list(context);
-
- cursor = context->cursor;
- cursor->initted = 1;
-
- /*
- * Do all sorts of validation on the passed-in cursor structure.
- * If anything is amiss, ignore the cursor and look up the hashval
- * starting from the btree root.
- */
- bp = NULL;
- if (cursor->blkno > 0) {
- error = xfs_da3_node_read(NULL, context->dp, cursor->blkno, -1,
- &bp, XFS_ATTR_FORK);
- if ((error != 0) && (error != EFSCORRUPTED))
- return(error);
- if (bp) {
- struct xfs_attr_leaf_entry *entries;
-
- node = bp->b_addr;
- switch (be16_to_cpu(node->hdr.info.magic)) {
- case XFS_DA_NODE_MAGIC:
- case XFS_DA3_NODE_MAGIC:
- trace_xfs_attr_list_wrong_blk(context);
- xfs_trans_brelse(NULL, bp);
- bp = NULL;
- break;
- case XFS_ATTR_LEAF_MAGIC:
- case XFS_ATTR3_LEAF_MAGIC:
- leaf = bp->b_addr;
- xfs_attr3_leaf_hdr_from_disk(&leafhdr, leaf);
- entries = xfs_attr3_leaf_entryp(leaf);
- if (cursor->hashval > be32_to_cpu(
- entries[leafhdr.count - 1].hashval)) {
- trace_xfs_attr_list_wrong_blk(context);
- xfs_trans_brelse(NULL, bp);
- bp = NULL;
- } else if (cursor->hashval <= be32_to_cpu(
- entries[0].hashval)) {
- trace_xfs_attr_list_wrong_blk(context);
- xfs_trans_brelse(NULL, bp);
- bp = NULL;
- }
- break;
- default:
- trace_xfs_attr_list_wrong_blk(context);
- xfs_trans_brelse(NULL, bp);
- bp = NULL;
- }
- }
- }
-
- /*
- * We did not find what we expected given the cursor's contents,
- * so we start from the top and work down based on the hash value.
- * Note that start of node block is same as start of leaf block.
- */
- if (bp == NULL) {
- cursor->blkno = 0;
- for (;;) {
- __uint16_t magic;
-
- error = xfs_da3_node_read(NULL, context->dp,
- cursor->blkno, -1, &bp,
- XFS_ATTR_FORK);
- if (error)
- return(error);
- node = bp->b_addr;
- magic = be16_to_cpu(node->hdr.info.magic);
- if (magic == XFS_ATTR_LEAF_MAGIC ||
- magic == XFS_ATTR3_LEAF_MAGIC)
- break;
- if (magic != XFS_DA_NODE_MAGIC &&
- magic != XFS_DA3_NODE_MAGIC) {
- XFS_CORRUPTION_ERROR("xfs_attr_node_list(3)",
- XFS_ERRLEVEL_LOW,
- context->dp->i_mount,
- node);
- xfs_trans_brelse(NULL, bp);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- xfs_da3_node_hdr_from_disk(&nodehdr, node);
- btree = xfs_da3_node_tree_p(node);
- for (i = 0; i < nodehdr.count; btree++, i++) {
- if (cursor->hashval
- <= be32_to_cpu(btree->hashval)) {
- cursor->blkno = be32_to_cpu(btree->before);
- trace_xfs_attr_list_node_descend(context,
- btree);
- break;
- }
- }
- if (i == nodehdr.count) {
- xfs_trans_brelse(NULL, bp);
- return 0;
- }
- xfs_trans_brelse(NULL, bp);
- }
- }
- ASSERT(bp != NULL);
-
- /*
- * Roll upward through the blocks, processing each leaf block in
- * order. As long as there is space in the result buffer, keep
- * adding the information.
- */
- for (;;) {
- leaf = bp->b_addr;
- error = xfs_attr3_leaf_list_int(bp, context);
- if (error) {
- xfs_trans_brelse(NULL, bp);
- return error;
- }
- xfs_attr3_leaf_hdr_from_disk(&leafhdr, leaf);
- if (context->seen_enough || leafhdr.forw == 0)
- break;
- cursor->blkno = leafhdr.forw;
- xfs_trans_brelse(NULL, bp);
- error = xfs_attr3_leaf_read(NULL, context->dp, cursor->blkno, -1,
- &bp);
- if (error)
- return error;
- }
- xfs_trans_brelse(NULL, bp);
- return 0;
-}
diff --git a/fs/xfs/xfs_attr.h b/fs/xfs/xfs_attr.h
index de8dd58..cb604b5 100644
--- a/fs/xfs/xfs_attr.h
+++ b/fs/xfs/xfs_attr.h
@@ -141,5 +141,6 @@ typedef struct xfs_attr_list_context {
*/
int xfs_attr_inactive(struct xfs_inode *dp);
int xfs_attr_list_int(struct xfs_attr_list_context *);
+int xfs_inode_hasattr(struct xfs_inode *ip);
#endif /* __XFS_ATTR_H__ */
diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index b800fbc..ba13857 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -751,182 +751,6 @@ out:
return(error);
}
-STATIC int
-xfs_attr_shortform_compare(const void *a, const void *b)
-{
- xfs_attr_sf_sort_t *sa, *sb;
-
- sa = (xfs_attr_sf_sort_t *)a;
- sb = (xfs_attr_sf_sort_t *)b;
- if (sa->hash < sb->hash) {
- return(-1);
- } else if (sa->hash > sb->hash) {
- return(1);
- } else {
- return(sa->entno - sb->entno);
- }
-}
-
-
-#define XFS_ISRESET_CURSOR(cursor) \
- (!((cursor)->initted) && !((cursor)->hashval) && \
- !((cursor)->blkno) && !((cursor)->offset))
-/*
- * Copy out entries of shortform attribute lists for attr_list().
- * Shortform attribute lists are not stored in hashval sorted order.
- * If the output buffer is not large enough to hold them all, then we
- * we have to calculate each entries' hashvalue and sort them before
- * we can begin returning them to the user.
- */
-/*ARGSUSED*/
-int
-xfs_attr_shortform_list(xfs_attr_list_context_t *context)
-{
- attrlist_cursor_kern_t *cursor;
- xfs_attr_sf_sort_t *sbuf, *sbp;
- xfs_attr_shortform_t *sf;
- xfs_attr_sf_entry_t *sfe;
- xfs_inode_t *dp;
- int sbsize, nsbuf, count, i;
- int error;
-
- ASSERT(context != NULL);
- dp = context->dp;
- ASSERT(dp != NULL);
- ASSERT(dp->i_afp != NULL);
- sf = (xfs_attr_shortform_t *)dp->i_afp->if_u1.if_data;
- ASSERT(sf != NULL);
- if (!sf->hdr.count)
- return(0);
- cursor = context->cursor;
- ASSERT(cursor != NULL);
-
- trace_xfs_attr_list_sf(context);
-
- /*
- * If the buffer is large enough and the cursor is at the start,
- * do not bother with sorting since we will return everything in
- * one buffer and another call using the cursor won't need to be
- * made.
- * Note the generous fudge factor of 16 overhead bytes per entry.
- * If bufsize is zero then put_listent must be a search function
- * and can just scan through what we have.
- */
- if (context->bufsize == 0 ||
- (XFS_ISRESET_CURSOR(cursor) &&
- (dp->i_afp->if_bytes + sf->hdr.count * 16) < context->bufsize)) {
- for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
- error = context->put_listent(context,
- sfe->flags,
- sfe->nameval,
- (int)sfe->namelen,
- (int)sfe->valuelen,
- &sfe->nameval[sfe->namelen]);
-
- /*
- * Either search callback finished early or
- * didn't fit it all in the buffer after all.
- */
- if (context->seen_enough)
- break;
-
- if (error)
- return error;
- sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
- }
- trace_xfs_attr_list_sf_all(context);
- return(0);
- }
-
- /* do no more for a search callback */
- if (context->bufsize == 0)
- return 0;
-
- /*
- * It didn't all fit, so we have to sort everything on hashval.
- */
- sbsize = sf->hdr.count * sizeof(*sbuf);
- sbp = sbuf = kmem_alloc(sbsize, KM_SLEEP | KM_NOFS);
-
- /*
- * Scan the attribute list for the rest of the entries, storing
- * the relevant info from only those that match into a buffer.
- */
- nsbuf = 0;
- for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
- if (unlikely(
- ((char *)sfe < (char *)sf) ||
- ((char *)sfe >= ((char *)sf + dp->i_afp->if_bytes)))) {
- XFS_CORRUPTION_ERROR("xfs_attr_shortform_list",
- XFS_ERRLEVEL_LOW,
- context->dp->i_mount, sfe);
- kmem_free(sbuf);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- sbp->entno = i;
- sbp->hash = xfs_da_hashname(sfe->nameval, sfe->namelen);
- sbp->name = sfe->nameval;
- sbp->namelen = sfe->namelen;
- /* These are bytes, and both on-disk, don't endian-flip */
- sbp->valuelen = sfe->valuelen;
- sbp->flags = sfe->flags;
- sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
- sbp++;
- nsbuf++;
- }
-
- /*
- * Sort the entries on hash then entno.
- */
- xfs_sort(sbuf, nsbuf, sizeof(*sbuf), xfs_attr_shortform_compare);
-
- /*
- * Re-find our place IN THE SORTED LIST.
- */
- count = 0;
- cursor->initted = 1;
- cursor->blkno = 0;
- for (sbp = sbuf, i = 0; i < nsbuf; i++, sbp++) {
- if (sbp->hash == cursor->hashval) {
- if (cursor->offset == count) {
- break;
- }
- count++;
- } else if (sbp->hash > cursor->hashval) {
- break;
- }
- }
- if (i == nsbuf) {
- kmem_free(sbuf);
- return(0);
- }
-
- /*
- * Loop putting entries into the user buffer.
- */
- for ( ; i < nsbuf; i++, sbp++) {
- if (cursor->hashval != sbp->hash) {
- cursor->hashval = sbp->hash;
- cursor->offset = 0;
- }
- error = context->put_listent(context,
- sbp->flags,
- sbp->name,
- sbp->namelen,
- sbp->valuelen,
- &sbp->name[sbp->namelen]);
- if (error)
- return error;
- if (context->seen_enough)
- break;
- cursor->offset++;
- }
-
- kmem_free(sbuf);
- return(0);
-}
-
/*
* Check a leaf attribute block to see if all the entries would fit into
* a shortform attribute list.
@@ -2643,130 +2467,6 @@ xfs_attr_leaf_newentsize(int namelen, int valuelen, int blocksize, int *local)
return size;
}
-/*
- * Copy out attribute list entries for attr_list(), for leaf attribute lists.
- */
-int
-xfs_attr3_leaf_list_int(
- struct xfs_buf *bp,
- struct xfs_attr_list_context *context)
-{
- struct attrlist_cursor_kern *cursor;
- struct xfs_attr_leafblock *leaf;
- struct xfs_attr3_icleaf_hdr ichdr;
- struct xfs_attr_leaf_entry *entries;
- struct xfs_attr_leaf_entry *entry;
- int retval;
- int i;
-
- trace_xfs_attr_list_leaf(context);
-
- leaf = bp->b_addr;
- xfs_attr3_leaf_hdr_from_disk(&ichdr, leaf);
- entries = xfs_attr3_leaf_entryp(leaf);
-
- cursor = context->cursor;
- cursor->initted = 1;
-
- /*
- * Re-find our place in the leaf block if this is a new syscall.
- */
- if (context->resynch) {
- entry = &entries[0];
- for (i = 0; i < ichdr.count; entry++, i++) {
- if (be32_to_cpu(entry->hashval) == cursor->hashval) {
- if (cursor->offset == context->dupcnt) {
- context->dupcnt = 0;
- break;
- }
- context->dupcnt++;
- } else if (be32_to_cpu(entry->hashval) >
- cursor->hashval) {
- context->dupcnt = 0;
- break;
- }
- }
- if (i == ichdr.count) {
- trace_xfs_attr_list_notfound(context);
- return 0;
- }
- } else {
- entry = &entries[0];
- i = 0;
- }
- context->resynch = 0;
-
- /*
- * We have found our place, start copying out the new attributes.
- */
- retval = 0;
- for (; i < ichdr.count; entry++, i++) {
- if (be32_to_cpu(entry->hashval) != cursor->hashval) {
- cursor->hashval = be32_to_cpu(entry->hashval);
- cursor->offset = 0;
- }
-
- if (entry->flags & XFS_ATTR_INCOMPLETE)
- continue; /* skip incomplete entries */
-
- if (entry->flags & XFS_ATTR_LOCAL) {
- xfs_attr_leaf_name_local_t *name_loc =
- xfs_attr3_leaf_name_local(leaf, i);
-
- retval = context->put_listent(context,
- entry->flags,
- name_loc->nameval,
- (int)name_loc->namelen,
- be16_to_cpu(name_loc->valuelen),
- &name_loc->nameval[name_loc->namelen]);
- if (retval)
- return retval;
- } else {
- xfs_attr_leaf_name_remote_t *name_rmt =
- xfs_attr3_leaf_name_remote(leaf, i);
-
- int valuelen = be32_to_cpu(name_rmt->valuelen);
-
- if (context->put_value) {
- xfs_da_args_t args;
-
- memset((char *)&args, 0, sizeof(args));
- args.dp = context->dp;
- args.whichfork = XFS_ATTR_FORK;
- args.valuelen = valuelen;
- args.value = kmem_alloc(valuelen, KM_SLEEP | KM_NOFS);
- args.rmtblkno = be32_to_cpu(name_rmt->valueblk);
- args.rmtblkcnt = xfs_attr3_rmt_blocks(
- args.dp->i_mount, valuelen);
- retval = xfs_attr_rmtval_get(&args);
- if (retval)
- return retval;
- retval = context->put_listent(context,
- entry->flags,
- name_rmt->name,
- (int)name_rmt->namelen,
- valuelen,
- args.value);
- kmem_free(args.value);
- } else {
- retval = context->put_listent(context,
- entry->flags,
- name_rmt->name,
- (int)name_rmt->namelen,
- valuelen,
- NULL);
- }
- if (retval)
- return retval;
- }
- if (context->seen_enough)
- break;
- cursor->offset++;
- }
- trace_xfs_attr_list_leaf_end(context);
- return retval;
-}
-
/*========================================================================
* Manage the INCOMPLETE flag in a leaf entry
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
new file mode 100644
index 0000000..cbc80d4
--- /dev/null
+++ b/fs/xfs/xfs_attr_list.c
@@ -0,0 +1,655 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * Copyright (c) 2013 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_btree.h"
+#include "xfs_attr_sf.h"
+#include "xfs_attr_remote.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_inode_item.h"
+#include "xfs_bmap.h"
+#include "xfs_attr.h"
+#include "xfs_attr_leaf.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_buf_item.h"
+#include "xfs_cksum.h"
+
+STATIC int
+xfs_attr_shortform_compare(const void *a, const void *b)
+{
+ xfs_attr_sf_sort_t *sa, *sb;
+
+ sa = (xfs_attr_sf_sort_t *)a;
+ sb = (xfs_attr_sf_sort_t *)b;
+ if (sa->hash < sb->hash) {
+ return(-1);
+ } else if (sa->hash > sb->hash) {
+ return(1);
+ } else {
+ return(sa->entno - sb->entno);
+ }
+}
+
+#define XFS_ISRESET_CURSOR(cursor) \
+ (!((cursor)->initted) && !((cursor)->hashval) && \
+ !((cursor)->blkno) && !((cursor)->offset))
+/*
+ * Copy out entries of shortform attribute lists for attr_list().
+ * Shortform attribute lists are not stored in hashval sorted order.
+ * If the output buffer is not large enough to hold them all, then we
+ * we have to calculate each entries' hashvalue and sort them before
+ * we can begin returning them to the user.
+ */
+int
+xfs_attr_shortform_list(xfs_attr_list_context_t *context)
+{
+ attrlist_cursor_kern_t *cursor;
+ xfs_attr_sf_sort_t *sbuf, *sbp;
+ xfs_attr_shortform_t *sf;
+ xfs_attr_sf_entry_t *sfe;
+ xfs_inode_t *dp;
+ int sbsize, nsbuf, count, i;
+ int error;
+
+ ASSERT(context != NULL);
+ dp = context->dp;
+ ASSERT(dp != NULL);
+ ASSERT(dp->i_afp != NULL);
+ sf = (xfs_attr_shortform_t *)dp->i_afp->if_u1.if_data;
+ ASSERT(sf != NULL);
+ if (!sf->hdr.count)
+ return(0);
+ cursor = context->cursor;
+ ASSERT(cursor != NULL);
+
+ trace_xfs_attr_list_sf(context);
+
+ /*
+ * If the buffer is large enough and the cursor is at the start,
+ * do not bother with sorting since we will return everything in
+ * one buffer and another call using the cursor won't need to be
+ * made.
+ * Note the generous fudge factor of 16 overhead bytes per entry.
+ * If bufsize is zero then put_listent must be a search function
+ * and can just scan through what we have.
+ */
+ if (context->bufsize == 0 ||
+ (XFS_ISRESET_CURSOR(cursor) &&
+ (dp->i_afp->if_bytes + sf->hdr.count * 16) < context->bufsize)) {
+ for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
+ error = context->put_listent(context,
+ sfe->flags,
+ sfe->nameval,
+ (int)sfe->namelen,
+ (int)sfe->valuelen,
+ &sfe->nameval[sfe->namelen]);
+
+ /*
+ * Either search callback finished early or
+ * didn't fit it all in the buffer after all.
+ */
+ if (context->seen_enough)
+ break;
+
+ if (error)
+ return error;
+ sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
+ }
+ trace_xfs_attr_list_sf_all(context);
+ return(0);
+ }
+
+ /* do no more for a search callback */
+ if (context->bufsize == 0)
+ return 0;
+
+ /*
+ * It didn't all fit, so we have to sort everything on hashval.
+ */
+ sbsize = sf->hdr.count * sizeof(*sbuf);
+ sbp = sbuf = kmem_alloc(sbsize, KM_SLEEP | KM_NOFS);
+
+ /*
+ * Scan the attribute list for the rest of the entries, storing
+ * the relevant info from only those that match into a buffer.
+ */
+ nsbuf = 0;
+ for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
+ if (unlikely(
+ ((char *)sfe < (char *)sf) ||
+ ((char *)sfe >= ((char *)sf + dp->i_afp->if_bytes)))) {
+ XFS_CORRUPTION_ERROR("xfs_attr_shortform_list",
+ XFS_ERRLEVEL_LOW,
+ context->dp->i_mount, sfe);
+ kmem_free(sbuf);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ sbp->entno = i;
+ sbp->hash = xfs_da_hashname(sfe->nameval, sfe->namelen);
+ sbp->name = sfe->nameval;
+ sbp->namelen = sfe->namelen;
+ /* These are bytes, and both on-disk, don't endian-flip */
+ sbp->valuelen = sfe->valuelen;
+ sbp->flags = sfe->flags;
+ sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
+ sbp++;
+ nsbuf++;
+ }
+
+ /*
+ * Sort the entries on hash then entno.
+ */
+ xfs_sort(sbuf, nsbuf, sizeof(*sbuf), xfs_attr_shortform_compare);
+
+ /*
+ * Re-find our place IN THE SORTED LIST.
+ */
+ count = 0;
+ cursor->initted = 1;
+ cursor->blkno = 0;
+ for (sbp = sbuf, i = 0; i < nsbuf; i++, sbp++) {
+ if (sbp->hash == cursor->hashval) {
+ if (cursor->offset == count) {
+ break;
+ }
+ count++;
+ } else if (sbp->hash > cursor->hashval) {
+ break;
+ }
+ }
+ if (i == nsbuf) {
+ kmem_free(sbuf);
+ return(0);
+ }
+
+ /*
+ * Loop putting entries into the user buffer.
+ */
+ for ( ; i < nsbuf; i++, sbp++) {
+ if (cursor->hashval != sbp->hash) {
+ cursor->hashval = sbp->hash;
+ cursor->offset = 0;
+ }
+ error = context->put_listent(context,
+ sbp->flags,
+ sbp->name,
+ sbp->namelen,
+ sbp->valuelen,
+ &sbp->name[sbp->namelen]);
+ if (error)
+ return error;
+ if (context->seen_enough)
+ break;
+ cursor->offset++;
+ }
+
+ kmem_free(sbuf);
+ return(0);
+}
+
+STATIC int
+xfs_attr_node_list(xfs_attr_list_context_t *context)
+{
+ attrlist_cursor_kern_t *cursor;
+ xfs_attr_leafblock_t *leaf;
+ xfs_da_intnode_t *node;
+ struct xfs_attr3_icleaf_hdr leafhdr;
+ struct xfs_da3_icnode_hdr nodehdr;
+ struct xfs_da_node_entry *btree;
+ int error, i;
+ struct xfs_buf *bp;
+
+ trace_xfs_attr_node_list(context);
+
+ cursor = context->cursor;
+ cursor->initted = 1;
+
+ /*
+ * Do all sorts of validation on the passed-in cursor structure.
+ * If anything is amiss, ignore the cursor and look up the hashval
+ * starting from the btree root.
+ */
+ bp = NULL;
+ if (cursor->blkno > 0) {
+ error = xfs_da3_node_read(NULL, context->dp, cursor->blkno, -1,
+ &bp, XFS_ATTR_FORK);
+ if ((error != 0) && (error != EFSCORRUPTED))
+ return(error);
+ if (bp) {
+ struct xfs_attr_leaf_entry *entries;
+
+ node = bp->b_addr;
+ switch (be16_to_cpu(node->hdr.info.magic)) {
+ case XFS_DA_NODE_MAGIC:
+ case XFS_DA3_NODE_MAGIC:
+ trace_xfs_attr_list_wrong_blk(context);
+ xfs_trans_brelse(NULL, bp);
+ bp = NULL;
+ break;
+ case XFS_ATTR_LEAF_MAGIC:
+ case XFS_ATTR3_LEAF_MAGIC:
+ leaf = bp->b_addr;
+ xfs_attr3_leaf_hdr_from_disk(&leafhdr, leaf);
+ entries = xfs_attr3_leaf_entryp(leaf);
+ if (cursor->hashval > be32_to_cpu(
+ entries[leafhdr.count - 1].hashval)) {
+ trace_xfs_attr_list_wrong_blk(context);
+ xfs_trans_brelse(NULL, bp);
+ bp = NULL;
+ } else if (cursor->hashval <= be32_to_cpu(
+ entries[0].hashval)) {
+ trace_xfs_attr_list_wrong_blk(context);
+ xfs_trans_brelse(NULL, bp);
+ bp = NULL;
+ }
+ break;
+ default:
+ trace_xfs_attr_list_wrong_blk(context);
+ xfs_trans_brelse(NULL, bp);
+ bp = NULL;
+ }
+ }
+ }
+
+ /*
+ * We did not find what we expected given the cursor's contents,
+ * so we start from the top and work down based on the hash value.
+ * Note that start of node block is same as start of leaf block.
+ */
+ if (bp == NULL) {
+ cursor->blkno = 0;
+ for (;;) {
+ __uint16_t magic;
+
+ error = xfs_da3_node_read(NULL, context->dp,
+ cursor->blkno, -1, &bp,
+ XFS_ATTR_FORK);
+ if (error)
+ return(error);
+ node = bp->b_addr;
+ magic = be16_to_cpu(node->hdr.info.magic);
+ if (magic == XFS_ATTR_LEAF_MAGIC ||
+ magic == XFS_ATTR3_LEAF_MAGIC)
+ break;
+ if (magic != XFS_DA_NODE_MAGIC &&
+ magic != XFS_DA3_NODE_MAGIC) {
+ XFS_CORRUPTION_ERROR("xfs_attr_node_list(3)",
+ XFS_ERRLEVEL_LOW,
+ context->dp->i_mount,
+ node);
+ xfs_trans_brelse(NULL, bp);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ xfs_da3_node_hdr_from_disk(&nodehdr, node);
+ btree = xfs_da3_node_tree_p(node);
+ for (i = 0; i < nodehdr.count; btree++, i++) {
+ if (cursor->hashval
+ <= be32_to_cpu(btree->hashval)) {
+ cursor->blkno = be32_to_cpu(btree->before);
+ trace_xfs_attr_list_node_descend(context,
+ btree);
+ break;
+ }
+ }
+ if (i == nodehdr.count) {
+ xfs_trans_brelse(NULL, bp);
+ return 0;
+ }
+ xfs_trans_brelse(NULL, bp);
+ }
+ }
+ ASSERT(bp != NULL);
+
+ /*
+ * Roll upward through the blocks, processing each leaf block in
+ * order. As long as there is space in the result buffer, keep
+ * adding the information.
+ */
+ for (;;) {
+ leaf = bp->b_addr;
+ error = xfs_attr3_leaf_list_int(bp, context);
+ if (error) {
+ xfs_trans_brelse(NULL, bp);
+ return error;
+ }
+ xfs_attr3_leaf_hdr_from_disk(&leafhdr, leaf);
+ if (context->seen_enough || leafhdr.forw == 0)
+ break;
+ cursor->blkno = leafhdr.forw;
+ xfs_trans_brelse(NULL, bp);
+ error = xfs_attr3_leaf_read(NULL, context->dp, cursor->blkno, -1,
+ &bp);
+ if (error)
+ return error;
+ }
+ xfs_trans_brelse(NULL, bp);
+ return 0;
+}
+
+/*
+ * Copy out attribute list entries for attr_list(), for leaf attribute lists.
+ */
+int
+xfs_attr3_leaf_list_int(
+ struct xfs_buf *bp,
+ struct xfs_attr_list_context *context)
+{
+ struct attrlist_cursor_kern *cursor;
+ struct xfs_attr_leafblock *leaf;
+ struct xfs_attr3_icleaf_hdr ichdr;
+ struct xfs_attr_leaf_entry *entries;
+ struct xfs_attr_leaf_entry *entry;
+ int retval;
+ int i;
+
+ trace_xfs_attr_list_leaf(context);
+
+ leaf = bp->b_addr;
+ xfs_attr3_leaf_hdr_from_disk(&ichdr, leaf);
+ entries = xfs_attr3_leaf_entryp(leaf);
+
+ cursor = context->cursor;
+ cursor->initted = 1;
+
+ /*
+ * Re-find our place in the leaf block if this is a new syscall.
+ */
+ if (context->resynch) {
+ entry = &entries[0];
+ for (i = 0; i < ichdr.count; entry++, i++) {
+ if (be32_to_cpu(entry->hashval) == cursor->hashval) {
+ if (cursor->offset == context->dupcnt) {
+ context->dupcnt = 0;
+ break;
+ }
+ context->dupcnt++;
+ } else if (be32_to_cpu(entry->hashval) >
+ cursor->hashval) {
+ context->dupcnt = 0;
+ break;
+ }
+ }
+ if (i == ichdr.count) {
+ trace_xfs_attr_list_notfound(context);
+ return 0;
+ }
+ } else {
+ entry = &entries[0];
+ i = 0;
+ }
+ context->resynch = 0;
+
+ /*
+ * We have found our place, start copying out the new attributes.
+ */
+ retval = 0;
+ for (; i < ichdr.count; entry++, i++) {
+ if (be32_to_cpu(entry->hashval) != cursor->hashval) {
+ cursor->hashval = be32_to_cpu(entry->hashval);
+ cursor->offset = 0;
+ }
+
+ if (entry->flags & XFS_ATTR_INCOMPLETE)
+ continue; /* skip incomplete entries */
+
+ if (entry->flags & XFS_ATTR_LOCAL) {
+ xfs_attr_leaf_name_local_t *name_loc =
+ xfs_attr3_leaf_name_local(leaf, i);
+
+ retval = context->put_listent(context,
+ entry->flags,
+ name_loc->nameval,
+ (int)name_loc->namelen,
+ be16_to_cpu(name_loc->valuelen),
+ &name_loc->nameval[name_loc->namelen]);
+ if (retval)
+ return retval;
+ } else {
+ xfs_attr_leaf_name_remote_t *name_rmt =
+ xfs_attr3_leaf_name_remote(leaf, i);
+
+ int valuelen = be32_to_cpu(name_rmt->valuelen);
+
+ if (context->put_value) {
+ xfs_da_args_t args;
+
+ memset((char *)&args, 0, sizeof(args));
+ args.dp = context->dp;
+ args.whichfork = XFS_ATTR_FORK;
+ args.valuelen = valuelen;
+ args.value = kmem_alloc(valuelen, KM_SLEEP | KM_NOFS);
+ args.rmtblkno = be32_to_cpu(name_rmt->valueblk);
+ args.rmtblkcnt = xfs_attr3_rmt_blocks(
+ args.dp->i_mount, valuelen);
+ retval = xfs_attr_rmtval_get(&args);
+ if (retval)
+ return retval;
+ retval = context->put_listent(context,
+ entry->flags,
+ name_rmt->name,
+ (int)name_rmt->namelen,
+ valuelen,
+ args.value);
+ kmem_free(args.value);
+ } else {
+ retval = context->put_listent(context,
+ entry->flags,
+ name_rmt->name,
+ (int)name_rmt->namelen,
+ valuelen,
+ NULL);
+ }
+ if (retval)
+ return retval;
+ }
+ if (context->seen_enough)
+ break;
+ cursor->offset++;
+ }
+ trace_xfs_attr_list_leaf_end(context);
+ return retval;
+}
+
+/*
+ * Copy out attribute entries for attr_list(), for leaf attribute lists.
+ */
+STATIC int
+xfs_attr_leaf_list(xfs_attr_list_context_t *context)
+{
+ int error;
+ struct xfs_buf *bp;
+
+ trace_xfs_attr_leaf_list(context);
+
+ context->cursor->blkno = 0;
+ error = xfs_attr3_leaf_read(NULL, context->dp, 0, -1, &bp);
+ if (error)
+ return XFS_ERROR(error);
+
+ error = xfs_attr3_leaf_list_int(bp, context);
+ xfs_trans_brelse(NULL, bp);
+ return XFS_ERROR(error);
+}
+
+int
+xfs_attr_list_int(
+ xfs_attr_list_context_t *context)
+{
+ int error;
+ xfs_inode_t *dp = context->dp;
+
+ XFS_STATS_INC(xs_attr_list);
+
+ if (XFS_FORCED_SHUTDOWN(dp->i_mount))
+ return EIO;
+
+ xfs_ilock(dp, XFS_ILOCK_SHARED);
+
+ /*
+ * Decide on what work routines to call based on the inode size.
+ */
+ if (!xfs_inode_hasattr(dp)) {
+ error = 0;
+ } else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
+ error = xfs_attr_shortform_list(context);
+ } else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
+ error = xfs_attr_leaf_list(context);
+ } else {
+ error = xfs_attr_node_list(context);
+ }
+
+ xfs_iunlock(dp, XFS_ILOCK_SHARED);
+
+ return error;
+}
+
+#define ATTR_ENTBASESIZE /* minimum bytes used by an attr */ \
+ (((struct attrlist_ent *) 0)->a_name - (char *) 0)
+#define ATTR_ENTSIZE(namelen) /* actual bytes used by an attr */ \
+ ((ATTR_ENTBASESIZE + (namelen) + 1 + sizeof(u_int32_t)-1) \
+ & ~(sizeof(u_int32_t)-1))
+
+/*
+ * Format an attribute and copy it out to the user's buffer.
+ * Take care to check values and protect against them changing later,
+ * we may be reading them directly out of a user buffer.
+ */
+STATIC int
+xfs_attr_put_listent(
+ xfs_attr_list_context_t *context,
+ int flags,
+ unsigned char *name,
+ int namelen,
+ int valuelen,
+ unsigned char *value)
+{
+ struct attrlist *alist = (struct attrlist *)context->alist;
+ attrlist_ent_t *aep;
+ int arraytop;
+
+ ASSERT(!(context->flags & ATTR_KERNOVAL));
+ ASSERT(context->count >= 0);
+ ASSERT(context->count < (ATTR_MAX_VALUELEN/8));
+ ASSERT(context->firstu >= sizeof(*alist));
+ ASSERT(context->firstu <= context->bufsize);
+
+ /*
+ * Only list entries in the right namespace.
+ */
+ if (((context->flags & ATTR_SECURE) == 0) !=
+ ((flags & XFS_ATTR_SECURE) == 0))
+ return 0;
+ if (((context->flags & ATTR_ROOT) == 0) !=
+ ((flags & XFS_ATTR_ROOT) == 0))
+ return 0;
+
+ arraytop = sizeof(*alist) +
+ context->count * sizeof(alist->al_offset[0]);
+ context->firstu -= ATTR_ENTSIZE(namelen);
+ if (context->firstu < arraytop) {
+ trace_xfs_attr_list_full(context);
+ alist->al_more = 1;
+ context->seen_enough = 1;
+ return 1;
+ }
+
+ aep = (attrlist_ent_t *)&context->alist[context->firstu];
+ aep->a_valuelen = valuelen;
+ memcpy(aep->a_name, name, namelen);
+ aep->a_name[namelen] = 0;
+ alist->al_offset[context->count++] = context->firstu;
+ alist->al_count = context->count;
+ trace_xfs_attr_list_add(context);
+ return 0;
+}
+
+/*
+ * Generate a list of extended attribute names and optionally
+ * also value lengths. Positive return value follows the XFS
+ * convention of being an error, zero or negative return code
+ * is the length of the buffer returned (negated), indicating
+ * success.
+ */
+int
+xfs_attr_list(
+ xfs_inode_t *dp,
+ char *buffer,
+ int bufsize,
+ int flags,
+ attrlist_cursor_kern_t *cursor)
+{
+ xfs_attr_list_context_t context;
+ struct attrlist *alist;
+ int error;
+
+ /*
+ * Validate the cursor.
+ */
+ if (cursor->pad1 || cursor->pad2)
+ return(XFS_ERROR(EINVAL));
+ if ((cursor->initted == 0) &&
+ (cursor->hashval || cursor->blkno || cursor->offset))
+ return XFS_ERROR(EINVAL);
+
+ /*
+ * Check for a properly aligned buffer.
+ */
+ if (((long)buffer) & (sizeof(int)-1))
+ return XFS_ERROR(EFAULT);
+ if (flags & ATTR_KERNOVAL)
+ bufsize = 0;
+
+ /*
+ * Initialize the output buffer.
+ */
+ memset(&context, 0, sizeof(context));
+ context.dp = dp;
+ context.cursor = cursor;
+ context.resynch = 1;
+ context.flags = flags;
+ context.alist = buffer;
+ context.bufsize = (bufsize & ~(sizeof(int)-1)); /* align */
+ context.firstu = context.bufsize;
+ context.put_listent = xfs_attr_put_listent;
+
+ alist = (struct attrlist *)context.alist;
+ alist->al_count = 0;
+ alist->al_more = 0;
+ alist->al_offset[0] = context.bufsize;
+
+ error = xfs_attr_list_int(&context);
+ ASSERT(error >= 0);
+ return error;
+}
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 17/17] xfs: split out attribute fork truncation code into separate file
2013-06-07 5:45 [PATCH 00/17] xfs: current patch queue for 3.11 Dave Chinner
` (15 preceding siblings ...)
2013-06-07 5:45 ` [PATCH 16/17] xfs: split out attribute listing code into separate file Dave Chinner
@ 2013-06-07 5:45 ` Dave Chinner
16 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2013-06-07 5:45 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The attribute inactivation code is not used by userspace, so like
the attribute listing, split it out into a separate file to minimise
the differences between the filesystem shared with libxfs in
userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_attr.c | 71 -------
fs/xfs/xfs_attr_inactive.c | 453 ++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_attr_leaf.c | 341 ---------------------------------
4 files changed, 454 insertions(+), 412 deletions(-)
create mode 100644 fs/xfs/xfs_attr_inactive.c
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 4670207..6f33414 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -27,6 +27,7 @@ xfs-y += xfs_trace.o
# highlevel code
xfs-y += xfs_aops.o \
+ xfs_attr_inactive.o \
xfs_attr_list.o \
xfs_bit.o \
xfs_buf.o \
diff --git a/fs/xfs/xfs_attr.c b/fs/xfs/xfs_attr.c
index 75a168c..10c18ff 100644
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -609,77 +609,6 @@ xfs_attr_remove(
return xfs_attr_remove_int(dp, &xname, flags);
}
-int /* error */
-xfs_attr_inactive(xfs_inode_t *dp)
-{
- xfs_trans_t *trans;
- xfs_mount_t *mp;
- int error;
-
- mp = dp->i_mount;
- ASSERT(! XFS_NOT_DQATTACHED(mp, dp));
-
- xfs_ilock(dp, XFS_ILOCK_SHARED);
- if (!xfs_inode_hasattr(dp) ||
- dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
- xfs_iunlock(dp, XFS_ILOCK_SHARED);
- return 0;
- }
- xfs_iunlock(dp, XFS_ILOCK_SHARED);
-
- /*
- * Start our first transaction of the day.
- *
- * All future transactions during this code must be "chained" off
- * this one via the trans_dup() call. All transactions will contain
- * the inode, and the inode will always be marked with trans_ihold().
- * Since the inode will be locked in all transactions, we must log
- * the inode in every transaction to let it float upward through
- * the log.
- */
- trans = xfs_trans_alloc(mp, XFS_TRANS_ATTRINVAL);
- if ((error = xfs_trans_reserve(trans, 0, XFS_ATTRINVAL_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_ATTRINVAL_LOG_COUNT))) {
- xfs_trans_cancel(trans, 0);
- return(error);
- }
- xfs_ilock(dp, XFS_ILOCK_EXCL);
-
- /*
- * No need to make quota reservations here. We expect to release some
- * blocks, not allocate, in the common case.
- */
- xfs_trans_ijoin(trans, dp, 0);
-
- /*
- * Decide on what work routines to call based on the inode size.
- */
- if (!xfs_inode_hasattr(dp) ||
- dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
- error = 0;
- goto out;
- }
- error = xfs_attr3_root_inactive(&trans, dp);
- if (error)
- goto out;
-
- error = xfs_itruncate_extents(&trans, dp, XFS_ATTR_FORK, 0);
- if (error)
- goto out;
-
- error = xfs_trans_commit(trans, XFS_TRANS_RELEASE_LOG_RES);
- xfs_iunlock(dp, XFS_ILOCK_EXCL);
-
- return(error);
-
-out:
- xfs_trans_cancel(trans, XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_ABORT);
- xfs_iunlock(dp, XFS_ILOCK_EXCL);
- return(error);
-}
-
-
/*========================================================================
* External routines when attribute list is inside the inode
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
new file mode 100644
index 0000000..0d344ed
--- /dev/null
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -0,0 +1,453 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * Copyright (c) 2013 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_btree.h"
+#include "xfs_attr_remote.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_inode_item.h"
+#include "xfs_bmap.h"
+#include "xfs_attr.h"
+#include "xfs_attr_leaf.h"
+#include "xfs_error.h"
+#include "xfs_quota.h"
+#include "xfs_trace.h"
+
+/*
+ * Look at all the extents for this logical region,
+ * invalidate any buffers that are incore/in transactions.
+ */
+STATIC int
+xfs_attr3_leaf_freextent(
+ struct xfs_trans **trans,
+ struct xfs_inode *dp,
+ xfs_dablk_t blkno,
+ int blkcnt)
+{
+ struct xfs_bmbt_irec map;
+ struct xfs_buf *bp;
+ xfs_dablk_t tblkno;
+ xfs_daddr_t dblkno;
+ int tblkcnt;
+ int dblkcnt;
+ int nmap;
+ int error;
+
+ /*
+ * Roll through the "value", invalidating the attribute value's
+ * blocks.
+ */
+ tblkno = blkno;
+ tblkcnt = blkcnt;
+ while (tblkcnt > 0) {
+ /*
+ * Try to remember where we decided to put the value.
+ */
+ nmap = 1;
+ error = xfs_bmapi_read(dp, (xfs_fileoff_t)tblkno, tblkcnt,
+ &map, &nmap, XFS_BMAPI_ATTRFORK);
+ if (error) {
+ return(error);
+ }
+ ASSERT(nmap == 1);
+ ASSERT(map.br_startblock != DELAYSTARTBLOCK);
+
+ /*
+ * If it's a hole, these are already unmapped
+ * so there's nothing to invalidate.
+ */
+ if (map.br_startblock != HOLESTARTBLOCK) {
+
+ dblkno = XFS_FSB_TO_DADDR(dp->i_mount,
+ map.br_startblock);
+ dblkcnt = XFS_FSB_TO_BB(dp->i_mount,
+ map.br_blockcount);
+ bp = xfs_trans_get_buf(*trans,
+ dp->i_mount->m_ddev_targp,
+ dblkno, dblkcnt, 0);
+ if (!bp)
+ return ENOMEM;
+ xfs_trans_binval(*trans, bp);
+ /*
+ * Roll to next transaction.
+ */
+ error = xfs_trans_roll(trans, dp);
+ if (error)
+ return (error);
+ }
+
+ tblkno += map.br_blockcount;
+ tblkcnt -= map.br_blockcount;
+ }
+
+ return(0);
+}
+
+/*
+ * Invalidate all of the "remote" value regions pointed to by a particular
+ * leaf block.
+ * Note that we must release the lock on the buffer so that we are not
+ * caught holding something that the logging code wants to flush to disk.
+ */
+STATIC int
+xfs_attr3_leaf_inactive(
+ struct xfs_trans **trans,
+ struct xfs_inode *dp,
+ struct xfs_buf *bp)
+{
+ struct xfs_attr_leafblock *leaf;
+ struct xfs_attr3_icleaf_hdr ichdr;
+ struct xfs_attr_leaf_entry *entry;
+ struct xfs_attr_leaf_name_remote *name_rmt;
+ struct xfs_attr_inactive_list *list;
+ struct xfs_attr_inactive_list *lp;
+ int error;
+ int count;
+ int size;
+ int tmp;
+ int i;
+
+ leaf = bp->b_addr;
+ xfs_attr3_leaf_hdr_from_disk(&ichdr, leaf);
+
+ /*
+ * Count the number of "remote" value extents.
+ */
+ count = 0;
+ entry = xfs_attr3_leaf_entryp(leaf);
+ for (i = 0; i < ichdr.count; entry++, i++) {
+ if (be16_to_cpu(entry->nameidx) &&
+ ((entry->flags & XFS_ATTR_LOCAL) == 0)) {
+ name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
+ if (name_rmt->valueblk)
+ count++;
+ }
+ }
+
+ /*
+ * If there are no "remote" values, we're done.
+ */
+ if (count == 0) {
+ xfs_trans_brelse(*trans, bp);
+ return 0;
+ }
+
+ /*
+ * Allocate storage for a list of all the "remote" value extents.
+ */
+ size = count * sizeof(xfs_attr_inactive_list_t);
+ list = kmem_alloc(size, KM_SLEEP);
+
+ /*
+ * Identify each of the "remote" value extents.
+ */
+ lp = list;
+ entry = xfs_attr3_leaf_entryp(leaf);
+ for (i = 0; i < ichdr.count; entry++, i++) {
+ if (be16_to_cpu(entry->nameidx) &&
+ ((entry->flags & XFS_ATTR_LOCAL) == 0)) {
+ name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
+ if (name_rmt->valueblk) {
+ lp->valueblk = be32_to_cpu(name_rmt->valueblk);
+ lp->valuelen = xfs_attr3_rmt_blocks(dp->i_mount,
+ be32_to_cpu(name_rmt->valuelen));
+ lp++;
+ }
+ }
+ }
+ xfs_trans_brelse(*trans, bp); /* unlock for trans. in freextent() */
+
+ /*
+ * Invalidate each of the "remote" value extents.
+ */
+ error = 0;
+ for (lp = list, i = 0; i < count; i++, lp++) {
+ tmp = xfs_attr3_leaf_freextent(trans, dp,
+ lp->valueblk, lp->valuelen);
+
+ if (error == 0)
+ error = tmp; /* save only the 1st errno */
+ }
+
+ kmem_free(list);
+ return error;
+}
+
+/*
+ * Recurse (gasp!) through the attribute nodes until we find leaves.
+ * We're doing a depth-first traversal in order to invalidate everything.
+ */
+STATIC int
+xfs_attr3_node_inactive(
+ struct xfs_trans **trans,
+ struct xfs_inode *dp,
+ struct xfs_buf *bp,
+ int level)
+{
+ xfs_da_blkinfo_t *info;
+ xfs_da_intnode_t *node;
+ xfs_dablk_t child_fsb;
+ xfs_daddr_t parent_blkno, child_blkno;
+ int error, i;
+ struct xfs_buf *child_bp;
+ struct xfs_da_node_entry *btree;
+ struct xfs_da3_icnode_hdr ichdr;
+
+ /*
+ * Since this code is recursive (gasp!) we must protect ourselves.
+ */
+ if (level > XFS_DA_NODE_MAXDEPTH) {
+ xfs_trans_brelse(*trans, bp); /* no locks for later trans */
+ return XFS_ERROR(EIO);
+ }
+
+ node = bp->b_addr;
+ xfs_da3_node_hdr_from_disk(&ichdr, node);
+ parent_blkno = bp->b_bn;
+ if (!ichdr.count) {
+ xfs_trans_brelse(*trans, bp);
+ return 0;
+ }
+ btree = xfs_da3_node_tree_p(node);
+ child_fsb = be32_to_cpu(btree[0].before);
+ xfs_trans_brelse(*trans, bp); /* no locks for later trans */
+
+ /*
+ * If this is the node level just above the leaves, simply loop
+ * over the leaves removing all of them. If this is higher up
+ * in the tree, recurse downward.
+ */
+ for (i = 0; i < ichdr.count; i++) {
+ /*
+ * Read the subsidiary block to see what we have to work with.
+ * Don't do this in a transaction. This is a depth-first
+ * traversal of the tree so we may deal with many blocks
+ * before we come back to this one.
+ */
+ error = xfs_da3_node_read(*trans, dp, child_fsb, -2, &child_bp,
+ XFS_ATTR_FORK);
+ if (error)
+ return(error);
+ if (child_bp) {
+ /* save for re-read later */
+ child_blkno = XFS_BUF_ADDR(child_bp);
+
+ /*
+ * Invalidate the subtree, however we have to.
+ */
+ info = child_bp->b_addr;
+ switch (info->magic) {
+ case cpu_to_be16(XFS_DA_NODE_MAGIC):
+ case cpu_to_be16(XFS_DA3_NODE_MAGIC):
+ error = xfs_attr3_node_inactive(trans, dp,
+ child_bp, level + 1);
+ break;
+ case cpu_to_be16(XFS_ATTR_LEAF_MAGIC):
+ case cpu_to_be16(XFS_ATTR3_LEAF_MAGIC):
+ error = xfs_attr3_leaf_inactive(trans, dp,
+ child_bp);
+ break;
+ default:
+ error = XFS_ERROR(EIO);
+ xfs_trans_brelse(*trans, child_bp);
+ break;
+ }
+ if (error)
+ return error;
+
+ /*
+ * Remove the subsidiary block from the cache
+ * and from the log.
+ */
+ error = xfs_da_get_buf(*trans, dp, 0, child_blkno,
+ &child_bp, XFS_ATTR_FORK);
+ if (error)
+ return error;
+ xfs_trans_binval(*trans, child_bp);
+ }
+
+ /*
+ * If we're not done, re-read the parent to get the next
+ * child block number.
+ */
+ if (i + 1 < ichdr.count) {
+ error = xfs_da3_node_read(*trans, dp, 0, parent_blkno,
+ &bp, XFS_ATTR_FORK);
+ if (error)
+ return error;
+ child_fsb = be32_to_cpu(btree[i + 1].before);
+ xfs_trans_brelse(*trans, bp);
+ }
+ /*
+ * Atomically commit the whole invalidate stuff.
+ */
+ error = xfs_trans_roll(trans, dp);
+ if (error)
+ return error;
+ }
+
+ return 0;
+}
+
+/*
+ * Indiscriminately delete the entire attribute fork
+ *
+ * Recurse (gasp!) through the attribute nodes until we find leaves.
+ * We're doing a depth-first traversal in order to invalidate everything.
+ */
+int
+xfs_attr3_root_inactive(
+ struct xfs_trans **trans,
+ struct xfs_inode *dp)
+{
+ struct xfs_da_blkinfo *info;
+ struct xfs_buf *bp;
+ xfs_daddr_t blkno;
+ int error;
+
+ /*
+ * Read block 0 to see what we have to work with.
+ * We only get here if we have extents, since we remove
+ * the extents in reverse order the extent containing
+ * block 0 must still be there.
+ */
+ error = xfs_da3_node_read(*trans, dp, 0, -1, &bp, XFS_ATTR_FORK);
+ if (error)
+ return error;
+ blkno = bp->b_bn;
+
+ /*
+ * Invalidate the tree, even if the "tree" is only a single leaf block.
+ * This is a depth-first traversal!
+ */
+ info = bp->b_addr;
+ switch (info->magic) {
+ case cpu_to_be16(XFS_DA_NODE_MAGIC):
+ case cpu_to_be16(XFS_DA3_NODE_MAGIC):
+ error = xfs_attr3_node_inactive(trans, dp, bp, 1);
+ break;
+ case cpu_to_be16(XFS_ATTR_LEAF_MAGIC):
+ case cpu_to_be16(XFS_ATTR3_LEAF_MAGIC):
+ error = xfs_attr3_leaf_inactive(trans, dp, bp);
+ break;
+ default:
+ error = XFS_ERROR(EIO);
+ xfs_trans_brelse(*trans, bp);
+ break;
+ }
+ if (error)
+ return error;
+
+ /*
+ * Invalidate the incore copy of the root block.
+ */
+ error = xfs_da_get_buf(*trans, dp, 0, blkno, &bp, XFS_ATTR_FORK);
+ if (error)
+ return error;
+ xfs_trans_binval(*trans, bp); /* remove from cache */
+ /*
+ * Commit the invalidate and start the next transaction.
+ */
+ error = xfs_trans_roll(trans, dp);
+
+ return error;
+}
+
+int
+xfs_attr_inactive(xfs_inode_t *dp)
+{
+ xfs_trans_t *trans;
+ xfs_mount_t *mp;
+ int error;
+
+ mp = dp->i_mount;
+ ASSERT(! XFS_NOT_DQATTACHED(mp, dp));
+
+ xfs_ilock(dp, XFS_ILOCK_SHARED);
+ if (!xfs_inode_hasattr(dp) ||
+ dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
+ xfs_iunlock(dp, XFS_ILOCK_SHARED);
+ return 0;
+ }
+ xfs_iunlock(dp, XFS_ILOCK_SHARED);
+
+ /*
+ * Start our first transaction of the day.
+ *
+ * All future transactions during this code must be "chained" off
+ * this one via the trans_dup() call. All transactions will contain
+ * the inode, and the inode will always be marked with trans_ihold().
+ * Since the inode will be locked in all transactions, we must log
+ * the inode in every transaction to let it float upward through
+ * the log.
+ */
+ trans = xfs_trans_alloc(mp, XFS_TRANS_ATTRINVAL);
+ if ((error = xfs_trans_reserve(trans, 0, XFS_ATTRINVAL_LOG_RES(mp), 0,
+ XFS_TRANS_PERM_LOG_RES,
+ XFS_ATTRINVAL_LOG_COUNT))) {
+ xfs_trans_cancel(trans, 0);
+ return(error);
+ }
+ xfs_ilock(dp, XFS_ILOCK_EXCL);
+
+ /*
+ * No need to make quota reservations here. We expect to release some
+ * blocks, not allocate, in the common case.
+ */
+ xfs_trans_ijoin(trans, dp, 0);
+
+ /*
+ * Decide on what work routines to call based on the inode size.
+ */
+ if (!xfs_inode_hasattr(dp) ||
+ dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
+ error = 0;
+ goto out;
+ }
+ error = xfs_attr3_root_inactive(&trans, dp);
+ if (error)
+ goto out;
+
+ error = xfs_itruncate_extents(&trans, dp, XFS_ATTR_FORK, 0);
+ if (error)
+ goto out;
+
+ error = xfs_trans_commit(trans, XFS_TRANS_RELEASE_LOG_RES);
+ xfs_iunlock(dp, XFS_ILOCK_EXCL);
+
+ return(error);
+
+out:
+ xfs_trans_cancel(trans, XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_ABORT);
+ xfs_iunlock(dp, XFS_ILOCK_EXCL);
+ return(error);
+}
diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index ba13857..791ade2 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -2712,344 +2712,3 @@ xfs_attr3_leaf_flipflags(
return error;
}
-/*========================================================================
- * Indiscriminately delete the entire attribute fork
- *========================================================================*/
-
-/*
- * Recurse (gasp!) through the attribute nodes until we find leaves.
- * We're doing a depth-first traversal in order to invalidate everything.
- */
-int
-xfs_attr3_root_inactive(
- struct xfs_trans **trans,
- struct xfs_inode *dp)
-{
- struct xfs_da_blkinfo *info;
- struct xfs_buf *bp;
- xfs_daddr_t blkno;
- int error;
-
- /*
- * Read block 0 to see what we have to work with.
- * We only get here if we have extents, since we remove
- * the extents in reverse order the extent containing
- * block 0 must still be there.
- */
- error = xfs_da3_node_read(*trans, dp, 0, -1, &bp, XFS_ATTR_FORK);
- if (error)
- return error;
- blkno = bp->b_bn;
-
- /*
- * Invalidate the tree, even if the "tree" is only a single leaf block.
- * This is a depth-first traversal!
- */
- info = bp->b_addr;
- switch (info->magic) {
- case cpu_to_be16(XFS_DA_NODE_MAGIC):
- case cpu_to_be16(XFS_DA3_NODE_MAGIC):
- error = xfs_attr3_node_inactive(trans, dp, bp, 1);
- break;
- case cpu_to_be16(XFS_ATTR_LEAF_MAGIC):
- case cpu_to_be16(XFS_ATTR3_LEAF_MAGIC):
- error = xfs_attr3_leaf_inactive(trans, dp, bp);
- break;
- default:
- error = XFS_ERROR(EIO);
- xfs_trans_brelse(*trans, bp);
- break;
- }
- if (error)
- return error;
-
- /*
- * Invalidate the incore copy of the root block.
- */
- error = xfs_da_get_buf(*trans, dp, 0, blkno, &bp, XFS_ATTR_FORK);
- if (error)
- return error;
- xfs_trans_binval(*trans, bp); /* remove from cache */
- /*
- * Commit the invalidate and start the next transaction.
- */
- error = xfs_trans_roll(trans, dp);
-
- return error;
-}
-
-/*
- * Recurse (gasp!) through the attribute nodes until we find leaves.
- * We're doing a depth-first traversal in order to invalidate everything.
- */
-STATIC int
-xfs_attr3_node_inactive(
- struct xfs_trans **trans,
- struct xfs_inode *dp,
- struct xfs_buf *bp,
- int level)
-{
- xfs_da_blkinfo_t *info;
- xfs_da_intnode_t *node;
- xfs_dablk_t child_fsb;
- xfs_daddr_t parent_blkno, child_blkno;
- int error, i;
- struct xfs_buf *child_bp;
- struct xfs_da_node_entry *btree;
- struct xfs_da3_icnode_hdr ichdr;
-
- /*
- * Since this code is recursive (gasp!) we must protect ourselves.
- */
- if (level > XFS_DA_NODE_MAXDEPTH) {
- xfs_trans_brelse(*trans, bp); /* no locks for later trans */
- return XFS_ERROR(EIO);
- }
-
- node = bp->b_addr;
- xfs_da3_node_hdr_from_disk(&ichdr, node);
- parent_blkno = bp->b_bn;
- if (!ichdr.count) {
- xfs_trans_brelse(*trans, bp);
- return 0;
- }
- btree = xfs_da3_node_tree_p(node);
- child_fsb = be32_to_cpu(btree[0].before);
- xfs_trans_brelse(*trans, bp); /* no locks for later trans */
-
- /*
- * If this is the node level just above the leaves, simply loop
- * over the leaves removing all of them. If this is higher up
- * in the tree, recurse downward.
- */
- for (i = 0; i < ichdr.count; i++) {
- /*
- * Read the subsidiary block to see what we have to work with.
- * Don't do this in a transaction. This is a depth-first
- * traversal of the tree so we may deal with many blocks
- * before we come back to this one.
- */
- error = xfs_da3_node_read(*trans, dp, child_fsb, -2, &child_bp,
- XFS_ATTR_FORK);
- if (error)
- return(error);
- if (child_bp) {
- /* save for re-read later */
- child_blkno = XFS_BUF_ADDR(child_bp);
-
- /*
- * Invalidate the subtree, however we have to.
- */
- info = child_bp->b_addr;
- switch (info->magic) {
- case cpu_to_be16(XFS_DA_NODE_MAGIC):
- case cpu_to_be16(XFS_DA3_NODE_MAGIC):
- error = xfs_attr3_node_inactive(trans, dp,
- child_bp, level + 1);
- break;
- case cpu_to_be16(XFS_ATTR_LEAF_MAGIC):
- case cpu_to_be16(XFS_ATTR3_LEAF_MAGIC):
- error = xfs_attr3_leaf_inactive(trans, dp,
- child_bp);
- break;
- default:
- error = XFS_ERROR(EIO);
- xfs_trans_brelse(*trans, child_bp);
- break;
- }
- if (error)
- return error;
-
- /*
- * Remove the subsidiary block from the cache
- * and from the log.
- */
- error = xfs_da_get_buf(*trans, dp, 0, child_blkno,
- &child_bp, XFS_ATTR_FORK);
- if (error)
- return error;
- xfs_trans_binval(*trans, child_bp);
- }
-
- /*
- * If we're not done, re-read the parent to get the next
- * child block number.
- */
- if (i + 1 < ichdr.count) {
- error = xfs_da3_node_read(*trans, dp, 0, parent_blkno,
- &bp, XFS_ATTR_FORK);
- if (error)
- return error;
- child_fsb = be32_to_cpu(btree[i + 1].before);
- xfs_trans_brelse(*trans, bp);
- }
- /*
- * Atomically commit the whole invalidate stuff.
- */
- error = xfs_trans_roll(trans, dp);
- if (error)
- return error;
- }
-
- return 0;
-}
-
-/*
- * Invalidate all of the "remote" value regions pointed to by a particular
- * leaf block.
- * Note that we must release the lock on the buffer so that we are not
- * caught holding something that the logging code wants to flush to disk.
- */
-STATIC int
-xfs_attr3_leaf_inactive(
- struct xfs_trans **trans,
- struct xfs_inode *dp,
- struct xfs_buf *bp)
-{
- struct xfs_attr_leafblock *leaf;
- struct xfs_attr3_icleaf_hdr ichdr;
- struct xfs_attr_leaf_entry *entry;
- struct xfs_attr_leaf_name_remote *name_rmt;
- struct xfs_attr_inactive_list *list;
- struct xfs_attr_inactive_list *lp;
- int error;
- int count;
- int size;
- int tmp;
- int i;
-
- leaf = bp->b_addr;
- xfs_attr3_leaf_hdr_from_disk(&ichdr, leaf);
-
- /*
- * Count the number of "remote" value extents.
- */
- count = 0;
- entry = xfs_attr3_leaf_entryp(leaf);
- for (i = 0; i < ichdr.count; entry++, i++) {
- if (be16_to_cpu(entry->nameidx) &&
- ((entry->flags & XFS_ATTR_LOCAL) == 0)) {
- name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
- if (name_rmt->valueblk)
- count++;
- }
- }
-
- /*
- * If there are no "remote" values, we're done.
- */
- if (count == 0) {
- xfs_trans_brelse(*trans, bp);
- return 0;
- }
-
- /*
- * Allocate storage for a list of all the "remote" value extents.
- */
- size = count * sizeof(xfs_attr_inactive_list_t);
- list = kmem_alloc(size, KM_SLEEP);
-
- /*
- * Identify each of the "remote" value extents.
- */
- lp = list;
- entry = xfs_attr3_leaf_entryp(leaf);
- for (i = 0; i < ichdr.count; entry++, i++) {
- if (be16_to_cpu(entry->nameidx) &&
- ((entry->flags & XFS_ATTR_LOCAL) == 0)) {
- name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
- if (name_rmt->valueblk) {
- lp->valueblk = be32_to_cpu(name_rmt->valueblk);
- lp->valuelen = xfs_attr3_rmt_blocks(dp->i_mount,
- be32_to_cpu(name_rmt->valuelen));
- lp++;
- }
- }
- }
- xfs_trans_brelse(*trans, bp); /* unlock for trans. in freextent() */
-
- /*
- * Invalidate each of the "remote" value extents.
- */
- error = 0;
- for (lp = list, i = 0; i < count; i++, lp++) {
- tmp = xfs_attr3_leaf_freextent(trans, dp,
- lp->valueblk, lp->valuelen);
-
- if (error == 0)
- error = tmp; /* save only the 1st errno */
- }
-
- kmem_free(list);
- return error;
-}
-
-/*
- * Look at all the extents for this logical region,
- * invalidate any buffers that are incore/in transactions.
- */
-STATIC int
-xfs_attr3_leaf_freextent(
- struct xfs_trans **trans,
- struct xfs_inode *dp,
- xfs_dablk_t blkno,
- int blkcnt)
-{
- struct xfs_bmbt_irec map;
- struct xfs_buf *bp;
- xfs_dablk_t tblkno;
- xfs_daddr_t dblkno;
- int tblkcnt;
- int dblkcnt;
- int nmap;
- int error;
-
- /*
- * Roll through the "value", invalidating the attribute value's
- * blocks.
- */
- tblkno = blkno;
- tblkcnt = blkcnt;
- while (tblkcnt > 0) {
- /*
- * Try to remember where we decided to put the value.
- */
- nmap = 1;
- error = xfs_bmapi_read(dp, (xfs_fileoff_t)tblkno, tblkcnt,
- &map, &nmap, XFS_BMAPI_ATTRFORK);
- if (error) {
- return(error);
- }
- ASSERT(nmap == 1);
- ASSERT(map.br_startblock != DELAYSTARTBLOCK);
-
- /*
- * If it's a hole, these are already unmapped
- * so there's nothing to invalidate.
- */
- if (map.br_startblock != HOLESTARTBLOCK) {
-
- dblkno = XFS_FSB_TO_DADDR(dp->i_mount,
- map.br_startblock);
- dblkcnt = XFS_FSB_TO_BB(dp->i_mount,
- map.br_blockcount);
- bp = xfs_trans_get_buf(*trans,
- dp->i_mount->m_ddev_targp,
- dblkno, dblkcnt, 0);
- if (!bp)
- return ENOMEM;
- xfs_trans_binval(*trans, bp);
- /*
- * Roll to next transaction.
- */
- error = xfs_trans_roll(trans, dp);
- if (error)
- return (error);
- }
-
- tblkno += map.br_blockcount;
- tblkcnt -= map.br_blockcount;
- }
-
- return(0);
-}
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 18+ messages in thread