* [PATCH 01/60] xfs: update mount options documentation
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-20 15:35 ` Mark Tinguely
2013-06-19 4:50 ` [PATCH 02/60] xfs: add pluging for bulkstat readahead Dave Chinner
` (60 subsequent siblings)
61 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Because it's horribly out of date.
And mark various deprecated options as deprecated and give them a
removal date.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
Documentation/filesystems/xfs.txt | 317 ++++++++++++++++++++++++-------------
1 file changed, 209 insertions(+), 108 deletions(-)
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 83577f0..12525b1 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -18,6 +18,8 @@ Mount Options
=============
When mounting an XFS filesystem, the following options are accepted.
+For boolean mount options, the name with the (*) suffix is the
+default behaviour.
allocsize=size
Sets the buffered I/O end-of-file preallocation size when
@@ -25,97 +27,128 @@ When mounting an XFS filesystem, the following options are accepted.
Valid values for this option are page size (typically 4KiB)
through to 1GiB, inclusive, in power-of-2 increments.
- attr2/noattr2
- The options enable/disable (default is disabled for backward
- compatibility on-disk) an "opportunistic" improvement to be
- made in the way inline extended attributes are stored on-disk.
- When the new form is used for the first time (by setting or
- removing extended attributes) the on-disk superblock feature
- bit field will be updated to reflect this format being in use.
+ The default behaviour is for dynamic end-of-file
+ preallocation size, which uses a set of heuristics to
+ optimise the preallocation size based on the current
+ allocation patterns within the file and the access patterns
+ to the file. Specifying a fixed allocsize value turns off
+ the dynamic behaviour.
+
+ attr2
+ noattr2
+ The options enable/disable an "opportunistic" improvement to
+ be made in the way inline extended attributes are stored
+ on-disk. When the new form is used for the first time when
+ attr2 is selected (either when setting or removing extended
+ attributes) the on-disk superblock feature bit field will be
+ updated to reflect this format being in use.
+
+ The default behaviour is determined by the on-disk feature
+ bit indicating that attr2 behaviour is active. If either
+ mount option is set, then that becomes the new default used
+ by the filesystem.
CRC enabled filesystems always use the attr2 format, and so
will reject the noattr2 mount option if it is set.
- barrier
- Enables the use of block layer write barriers for writes into
- the journal and unwritten extent conversion. This allows for
- drive level write caching to be enabled, for devices that
- support write barriers.
+ barrier (*)
+ nobarrier
+ Enables/disables the use of block layer write barriers for
+ writes into the journal and for data integrity operations.
+ This allows for drive level write caching to be enabled, for
+ devices that support write barriers.
discard
- Issue command to let the block device reclaim space freed by the
- filesystem. This is useful for SSD devices, thinly provisioned
- LUNs and virtual machine images, but may have a performance
- impact.
-
- dmapi
- Enable the DMAPI (Data Management API) event callouts.
- Use with the "mtpt" option.
-
- grpid/bsdgroups and nogrpid/sysvgroups
- These options define what group ID a newly created file gets.
- When grpid is set, it takes the group ID of the directory in
- which it is created; otherwise (the default) it takes the fsgid
- of the current process, unless the directory has the setgid bit
- set, in which case it takes the gid from the parent directory,
- and also gets the setgid bit set if it is a directory itself.
-
- ihashsize=value
- In memory inode hashes have been removed, so this option has
- no function as of August 2007. Option is deprecated.
-
- ikeep/noikeep
- When ikeep is specified, XFS does not delete empty inode clusters
- and keeps them around on disk. ikeep is the traditional XFS
- behaviour. When noikeep is specified, empty inode clusters
- are returned to the free space pool. The default is noikeep for
- non-DMAPI mounts, while ikeep is the default when DMAPI is in use.
-
- inode64
- Indicates that XFS is allowed to create inodes at any location
- in the filesystem, including those which will result in inode
- numbers occupying more than 32 bits of significance. This is
- the default allocation option. Applications which do not handle
- inode numbers bigger than 32 bits, should use inode32 option.
+ nodiscard (*)
+ Enable/disable the issuing of commands to let the block
+ device reclaim space freed by the filesystem. This is
+ useful for SSD devices, thinly provisioned LUNs and virtual
+ machine images, but may have a performance impact.
+
+ Note: It is currently recommended that you use the fstrim
+ application to discard unused blocks rather than the discard
+ mount option because the performance impact of this option
+ is quite severe.
+
+ grpid/bsdgroups
+ nogrpid/sysvgroups (*)
+ These options define what group ID a newly created file
+ gets. When grpid is set, it takes the group ID of the
+ directory in which it is created; otherwise it takes the
+ fsgid of the current process, unless the directory has the
+ setgid bit set, in which case it takes the gid from the
+ parent directory, and also gets the setgid bit set if it is
+ a directory itself.
+
+ filestreams
+ Make the data allocator use the filestreams allocation mode
+ across the entire filesystem rather than just on directories
+ configured to use it.
+
+ ikeep
+ noikeep (*)
+ When ikeep is specified, XFS does not delete empty inode
+ clusters and keeps them around on disk. When noikeep is
+ specified, empty inode clusters are returned to the free
+ space pool.
inode32
- Indicates that XFS is limited to create inodes at locations which
- will not result in inode numbers with more than 32 bits of
- significance. This is provided for backwards compatibility, since
- 64 bits inode numbers might cause problems for some applications
- that cannot handle large inode numbers.
-
- largeio/nolargeio
+ inode64 (*)
+ When inode32 is specified, it indicates that XFS limits
+ inode creation to locations which will not result in inode
+ numbers with more than 32 bits of significance.
+
+ When inode64 is specified, it indicates that XFS is allowed
+ to create inodes at any location in the filesystem,
+ including those which will result in inode numbers occupying
+ more than 32 bits of significance.
+
+ inode32 is provided for backwards compatibility with older
+ systems and applications, since 64 bits inode numbers might
+ cause problems for some applications that cannot handle
+ large inode numbers. If applications are in use which do
+ not handle inode numbers bigger than 32 bits, the inode32
+ option should be specified.
+
+
+ largeio
+ nolargeio (*)
If "nolargeio" is specified, the optimal I/O reported in
- st_blksize by stat(2) will be as small as possible to allow user
- applications to avoid inefficient read/modify/write I/O.
- If "largeio" specified, a filesystem that has a "swidth" specified
- will return the "swidth" value (in bytes) in st_blksize. If the
- filesystem does not have a "swidth" specified but does specify
- an "allocsize" then "allocsize" (in bytes) will be returned
- instead.
- If neither of these two options are specified, then filesystem
- will behave as if "nolargeio" was specified.
+ st_blksize by stat(2) will be as small as possible to allow
+ user applications to avoid inefficient read/modify/write
+ I/O. This is typically the page size of the machine, as
+ this is the granularity of the page cache.
+
+ If "largeio" is specified, a filesystem that was created with a
+ "swidth" specified will return the "swidth" value (in bytes)
+ in st_blksize. If the filesystem does not have a "swidth"
+ specified but does specify an "allocsize" then "allocsize"
+ (in bytes) will be returned instead. Otherwise the behaviour
+ is the same as if "nolargeio" was specified.
logbufs=value
- Set the number of in-memory log buffers. Valid numbers range
- from 2-8 inclusive.
- The default value is 8 buffers for filesystems with a
- blocksize of 64KiB, 4 buffers for filesystems with a blocksize
- of 32KiB, 3 buffers for filesystems with a blocksize of 16KiB
- and 2 buffers for all other configurations. Increasing the
- number of buffers may increase performance on some workloads
- at the cost of the memory used for the additional log buffers
- and their associated control structures.
+ Set the number of in-memory log buffers. Valid numbers
+ range from 2-8 inclusive.
+
+ The default value is 8 buffers.
+
+ If the memory cost of 8 log buffers is too high on small
+ systems, then it may be reduced at some cost to performance
+ on metadata intensive workloads. The logbsize option below
+ controls the size of each buffer and so is also relevant to
+ this case.
logbsize=value
- Set the size of each in-memory log buffer.
- Size may be specified in bytes, or in kilobytes with a "k" suffix.
- Valid sizes for version 1 and version 2 logs are 16384 (16k) and
- 32768 (32k). Valid sizes for version 2 logs also include
- 65536 (64k), 131072 (128k) and 262144 (256k).
- The default value for machines with more than 32MiB of memory
- is 32768, machines with less memory use 16384 by default.
+ Set the size of each in-memory log buffer. The size may be
+ specified in bytes, or in kilobytes with a "k" suffix.
+ Valid sizes for version 1 and version 2 logs are 16384 (16k)
+ and 32768 (32k). Valid sizes for version 2 logs also
+ include 65536 (64k), 131072 (128k) and 262144 (256k). The
+ logbsize must be an integer multiple of the log
+ stripe unit configured at mkfs time.
+
+ The default value for version 1 logs is 32768, while the
+ default value for version 2 logs is MAX(32768, log_sunit).
logdev=device and rtdev=device
Use an external log (metadata journal) and/or real-time device.
@@ -124,16 +157,11 @@ When mounting an XFS filesystem, the following options are accepted.
optional, and the log section can be separate from the data
section or contained within it.
- mtpt=mountpoint
- Use with the "dmapi" option. The value specified here will be
- included in the DMAPI mount event, and should be the path of
- the actual mountpoint that is used.
-
noalign
- Data allocations will not be aligned at stripe unit boundaries.
-
- noatime
- Access timestamps are not updated when a file is read.
+ Data allocations will not be aligned at stripe unit
+ boundaries. This is only relevant to filesystems created
+ with non-zero data alignment parameters (sunit, swidth) by
+ mkfs.
norecovery
The filesystem will be mounted without running log recovery.
@@ -144,8 +172,14 @@ When mounting an XFS filesystem, the following options are accepted.
the mount will fail.
nouuid
- Don't check for double mounted file systems using the file system uuid.
- This is useful to mount LVM snapshot volumes.
+ Don't check for double mounted file systems using the file
+ system uuid. This is useful to mount LVM snapshot volumes,
+ and often used in combination with "norecovery" for mounting
+ read-only snapshots.
+
+ noquota
+ Forcibly turns off all quota accounting and enforcement
+ within the filesystem.
uquota/usrquota/uqnoenforce/quota
User disk quota accounting enabled, and limits (optionally)
@@ -160,24 +194,64 @@ When mounting an XFS filesystem, the following options are accepted.
enforced. Refer to xfs_quota(8) for further details.
sunit=value and swidth=value
- Used to specify the stripe unit and width for a RAID device or
- a stripe volume. "value" must be specified in 512-byte block
- units.
- If this option is not specified and the filesystem was made on
- a stripe volume or the stripe width or unit were specified for
- the RAID device at mkfs time, then the mount system call will
- restore the value from the superblock. For filesystems that
- are made directly on RAID devices, these options can be used
- to override the information in the superblock if the underlying
- disk layout changes after the filesystem has been created.
- The "swidth" option is required if the "sunit" option has been
- specified, and must be a multiple of the "sunit" value.
+ Used to specify the stripe unit and width for a RAID device
+ or a stripe volume. "value" must be specified in 512-byte
+ block units. These options are only relevant to filesystems
+ that were created with non-zero data alignment parameters.
+
+ The sunit and swidth parameters specified must be compatible
+ with the existing filesystem alignment characteristics. In
+ general, that means the only valid changes to sunit are
+ increasing it by a power-of-2 multiple. Valid swidth values
+ are any integer multiple of a valid sunit value.
+
+ Typically the only time these mount options are necessary is
+ after an underlying RAID device has had its geometry
+ modified, such as adding a new disk to a RAID5 lun and
+ reshaping it.
swalloc
Data allocations will be rounded up to stripe width boundaries
when the current end of file is being extended and the file
size is larger than the stripe width size.
+ wsync
+ When specified, all filesystem namespace operations are
+ executed synchronously. This ensures that when the namespace
+ operation (create, unlink, etc) completes, the change to the
+ namespace is on stable storage. This is useful in HA setups
+ where failover must not result in clients seeing
+ inconsistent namespace presentation during or after a
+ failover event.
+
+
+Deprecated Mount Options
+========================
+
+ delaylog/nodelaylog
+ Delayed logging is the only logging method that XFS supports
+ now, so these mount options are now ignored.
+
+ Due for removal in 3.12.
+
+ ihashsize=value
+ In memory inode hashes have been removed, so this option has
+ no function as of August 2007. Option is deprecated.
+
+ Due for removal in 3.12.
+
+ irixsgid
+ This behaviour is now controlled by a sysctl, so the mount
+ option is ignored.
+
+ Due for removal in 3.12.
+
+ osyncisdsync
+ osyncisosync
+ O_SYNC and O_DSYNC are fully supported, so there is no need
+ for these options any more.
+
+ Due for removal in 3.12.
sysctls
=======
@@ -189,15 +263,20 @@ The following sysctls are available for the XFS filesystem:
in /proc/fs/xfs/stat. It then immediately resets to "0".
fs.xfs.xfssyncd_centisecs (Min: 100 Default: 3000 Max: 720000)
- The interval at which the xfssyncd thread flushes metadata
- out to disk. This thread will flush log activity out, and
- do some processing on unlinked inodes.
+ The interval at which the filesystem flushes metadata
+ out to disk and runs internal cache cleanup routines.
- fs.xfs.xfsbufd_centisecs (Min: 50 Default: 100 Max: 3000)
- The interval at which xfsbufd scans the dirty metadata buffers list.
+ fs.xfs.filestream_centisecs (Min: 1 Default: 3000 Max: 360000)
+ The interval at which the filesystem ages filestreams cache
+ references and returns timed-out AGs back to the free stream
+ pool.
- fs.xfs.age_buffer_centisecs (Min: 100 Default: 1500 Max: 720000)
- The age at which xfsbufd flushes dirty metadata buffers to disk.
+ fs.xfs.speculative_prealloc_lifetime
+ (Units: seconds Min: 1 Default: 300 Max: 86400)
+ The interval at which the background scanning for inodes
+ with unused speculative preallocation runs. The scan
+ removes unused preallocation from clean inodes and releases
+ the unused space back to the free pool.
fs.xfs.error_level (Min: 0 Default: 3 Max: 11)
A volume knob for error reporting when internal errors occur.
@@ -254,9 +333,31 @@ The following sysctls are available for the XFS filesystem:
by the xfs_io(8) chattr command on a directory to be
inherited by files in that directory.
+ fs.xfs.inherit_nodefrag (Min: 0 Default: 1 Max: 1)
+ Setting this to "1" will cause the "nodefrag" flag set
+ by the xfs_io(8) chattr command on a directory to be
+ inherited by files in that directory.
+
fs.xfs.rotorstep (Min: 1 Default: 1 Max: 256)
In "inode32" allocation mode, this option determines how many
files the allocator attempts to allocate in the same allocation
group before moving to the next allocation group. The intent
is to control the rate at which the allocator moves between
allocation groups when allocating extents for new files.
+
+Deprecated Sysctls
+==================
+
+ fs.xfs.xfsbufd_centisecs (Min: 50 Default: 100 Max: 3000)
+ Dirty metadata is now tracked by the log subsystem and
+ flushing is driven by log space and idling demands. The
+ xfsbufd no longer exists, so this sysctl does nothing.
+
+ Due for removal in 3.14.
+
+ fs.xfs.age_buffer_centisecs (Min: 100 Default: 1500 Max: 720000)
+ Dirty metadata is now tracked by the log subsystem and
+ flushing is driven by log space and idling demands. The
+ xfsbufd no longer exists, so this sysctl does nothing.
+
+ Due for removal in 3.14.
--
1.7.10.4
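For anyone checking a logbsize value by hand, the rules in the logbsize text of this patch can be sketched in user-space C. This only illustrates the documented rules and is not the kernel's mount option parser: the names logbsize_is_valid() and logbsize_default() are hypothetical, and the sketch assumes log_sunit is given in bytes with 0 meaning that no log stripe unit was configured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sizes accepted for both log versions first, then the v2-only sizes. */
static const unsigned int logbsize_table[] = {
	16384, 32768,		/* version 1 and version 2 logs */
	65536, 131072, 262144,	/* version 2 logs only */
};

/*
 * Check a logbsize value against the documented rules.
 * log_version is 1 or 2. log_sunit is the log stripe unit in bytes,
 * with 0 meaning "none configured" (an assumption of this sketch).
 */
bool logbsize_is_valid(unsigned int logbsize, int log_version,
		       unsigned int log_sunit)
{
	size_t nvalid = (log_version == 2) ? 5 : 2;
	size_t i;

	assert(log_version == 1 || log_version == 2);

	for (i = 0; i < nvalid; i++) {
		if (logbsize_table[i] == logbsize)
			break;
	}
	if (i == nvalid)
		return false;	/* not one of the listed sizes */

	/* Must be an integer multiple of the log stripe unit, if any. */
	if (log_sunit != 0 && logbsize % log_sunit != 0)
		return false;
	return true;
}

/* Default per the text: 32768 for v1, MAX(32768, log_sunit) for v2. */
unsigned int logbsize_default(int log_version, unsigned int log_sunit)
{
	assert(log_version == 1 || log_version == 2);

	if (log_version == 2 && log_sunit > 32768)
		return log_sunit;
	return 32768;
}
```

Under these assumptions, a version 2 log with a 64k log stripe unit would reject logbsize=32k, because 32768 is not a multiple of 65536, and would accept logbsize=256k.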
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: [PATCH 01/60] xfs: update mount options documentation
2013-06-19 4:50 ` [PATCH 01/60] xfs: update mount options documentation Dave Chinner
@ 2013-06-20 15:35 ` Mark Tinguely
0 siblings, 0 replies; 95+ messages in thread
From: Mark Tinguely @ 2013-06-20 15:35 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:50, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> Because it's horribly out of date.
>
> And mark various deprecated options as deprecated and give them a
> removal date.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---
> Documentation/filesystems/xfs.txt | 317 ++++++++++++++++++++++++-------------
> 1 file changed, 209 insertions(+), 108 deletions(-)
>
> diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
> index 83577f0..12525b1 100644
> --- a/Documentation/filesystems/xfs.txt
> +++ b/Documentation/filesystems/xfs.txt
> @@ -18,6 +18,8 @@ Mount Options
> =============
>
> When mounting an XFS filesystem, the following options are accepted.
> +For boolean mount options, the name with the (*) suffix is the
> +default behaviour.
>
> allocsize=size
> Sets the buffered I/O end-of-file preallocation size when
> @@ -25,97 +27,128 @@ When mounting an XFS filesystem, the following options are accepted.
> Valid values for this option are page size (typically 4KiB)
> through to 1GiB, inclusive, in power-of-2 increments.
>
> - attr2/noattr2
> - The options enable/disable (default is disabled for backward
> - compatibility on-disk) an "opportunistic" improvement to be
> - made in the way inline extended attributes are stored on-disk.
> - When the new form is used for the first time (by setting or
> - removing extended attributes) the on-disk superblock feature
> - bit field will be updated to reflect this format being in use.
> + The default behaviour is for dynamic end-of-file
> + preallocation size, which uses a set of heuristics to
> + optimise the preallocation size based on the current
> + allocation patterns within the file and the access patterns
> + to the file. Specifying a fixed allocsize value turns off
> + the dynamic behaviour.
> +
> + attr2
> + noattr2
> + The options enable/disable an "opportunistic" improvement to
> + be made in the way inline extended attributes are stored
> + on-disk. When the new form is used for the first time when
> + attr2 is selected (either when setting or removing extended
> + attributes) the on-disk superblock feature bit field will be
> + updated to reflect this format being in use.
> +
> + The default behaviour is determined by the on-disk feature
> + bit indicating that attr2 behaviour is active. If either
> + mount option is set, then that becomes the new default used
> + by the filesystem.
>
> CRC enabled filesystems always use the attr2 format, and so
> will reject the noattr2 mount option if it is set.
>
Thanks for doing this.
allocsize and attr/noattr seem a bit too run-on for me, *but* we all
have our writing style; mine is terrible so you can ignore the below if
you want. The "(?)" are added if the noattr2 command is also added.
Rest looks good to me and consider it:
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
allocsize=size
Sets the buffered I/O end-of-file preallocation size when
doing delayed allocation writeout. The default size is 64KiB
and valid values for this option are page size (typically 4KiB)
through to 1GiB, inclusive, in power-of-2 increments.
The default behaviour is to use the dynamic end-of-file
preallocation size, which uses a set of heuristics to
optimise the preallocation size based on the current
allocation patterns within the file and the access patterns
to the file. Specifying a fixed allocsize value turns off
the dynamic behaviour.
attr2
noattr2
The options enable/disable the "opportunistic" improvement
in the way inline extended attributes are stored on-disk.
When the new attribute storage form is used for the first time
(that usage may be either when setting or removing extended
attributes) after selecting (?one of ?) the attr2 (? or noattr2?)
option(?s), the on-disk superblock feature bit field will be
updated to reflect that this is the format to be used.
The default behaviour is determined by the on-disk feature
bit indicating that attr2 behaviour is active. If either
mount option is set, then that becomes the new default used
by the filesystem.
CRC enabled filesystems always use the attr2 format, and so
will reject the noattr2 mount option if it is set.
--Mark.
* [PATCH 02/60] xfs: add pluging for bulkstat readahead
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
2013-06-19 4:50 ` [PATCH 01/60] xfs: update mount options documentation Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-20 16:59 ` Mark Tinguely
2013-06-19 4:50 ` [PATCH 03/60] xfs: plug directory buffer readahead Dave Chinner
` (59 subsequent siblings)
61 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
I was running some tests on bulkstat on CRC enabled filesystems when
I noticed that all the IO being issued was 8k in size, regardless of
the fact that we are issuing sequential 8k buffers for inode
clusters. The IO size should be 16k for 256 byte inodes, and 32k for
512 byte inodes, but this wasn't happening.
blktrace showed that there was an explicit plug and unplug happening
around each readahead IO from _xfs_buf_ioapply, and the unplug was
causing the IO to be issued immediately. Hence no opportunity was
being given to the elevator to merge adjacent readahead requests and
dispatch them as a single IO.
Add plugging around the inode chunk readahead dispatch loop in
bulkstat to ensure that we don't unplug the queue between adjacent
inode buffer readahead IOs and so we get fewer, larger IO requests
hitting the storage subsystem for bulkstat.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_itable.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 2ea7d40..06d004d 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -383,11 +383,13 @@ xfs_bulkstat(
* Also start read-ahead now for this chunk.
*/
if (r.ir_freecount < XFS_INODES_PER_CHUNK) {
+ struct blk_plug plug;
/*
* Loop over all clusters in the next chunk.
* Do a readahead if there are any allocated
* inodes in that cluster.
*/
+ blk_start_plug(&plug);
agbno = XFS_AGINO_TO_AGBNO(mp, r.ir_startino);
for (chunkidx = 0;
chunkidx < XFS_INODES_PER_CHUNK;
@@ -399,6 +401,7 @@ xfs_bulkstat(
agbno, nbcluster,
&xfs_inode_buf_ops);
}
+ blk_finish_plug(&plug);
irbp->ir_startino = r.ir_startino;
irbp->ir_freecount = r.ir_freecount;
irbp->ir_free = r.ir_free;
--
1.7.10.4
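To illustrate why holding the queue plugged matters here, a toy user-space model follows. It is not the block layer's merging code: struct toy_req and toy_merge_adjacent() are hypothetical names. It only shows that requests kept in one batch can be coalesced when each one starts where the previous one ends, which is what turns sequential 8k reads into 16k or 32k IOs.

```c
#include <assert.h>
#include <stddef.h>

/* A pending read request in the toy model (hypothetical type). */
struct toy_req {
	unsigned long long start;	/* byte offset on the device */
	unsigned int len;		/* length in bytes */
};

/*
 * Coalesce, in place, every request that begins exactly where the
 * previously kept request ends. Returns the number of requests left.
 * If each request is dispatched on its own (an unplug after every
 * submission), this step never gets a chance to run.
 */
size_t toy_merge_adjacent(struct toy_req *reqs, size_t nr)
{
	size_t out = 0;
	size_t i;

	assert(reqs != NULL || nr == 0);
	if (nr == 0)
		return 0;

	for (i = 1; i < nr; i++) {
		if (reqs[out].start + reqs[out].len == reqs[i].start)
			reqs[out].len += reqs[i].len;	/* extend the kept request */
		else
			reqs[++out] = reqs[i];		/* gap: start a new request */
	}
	return out + 1;
}
```

With four sequential 8k requests the model leaves a single 32k request, matching the 512 byte inode case described in the commit message.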
* Re: [PATCH 02/60] xfs: add pluging for bulkstat readahead
2013-06-19 4:50 ` [PATCH 02/60] xfs: add pluging for bulkstat readahead Dave Chinner
@ 2013-06-20 16:59 ` Mark Tinguely
0 siblings, 0 replies; 95+ messages in thread
From: Mark Tinguely @ 2013-06-20 16:59 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:50, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> I was running some tests on bulkstat on CRC enabled filesystems when
> I noticed that all the IO being issued was 8k in size, regardless of
> the fact that we are issuing sequential 8k buffers for inode
> clusters. The IO size should be 16k for 256 byte inodes, and 32k for
> 512 byte inodes, but this wasn't happening.
>
> blktrace showed that there was an explicit plug and unplug happening
> around each readahead IO from _xfs_buf_ioapply, and the unplug was
> causing the IO to be issued immediately. Hence no opportunity was
> being given to the elevator to merge adjacent readahead requests and
> dispatch them as a single IO.
>
> Add plugging around the inode chunk readahead dispatch loop in
> bulkstat to ensure that we don't unplug the queue between adjacent
> inode buffer readahead IOs and so we get fewer, larger IO requests
> hitting the storage subsystem for bulkstat.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---
Looks good.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
* [PATCH 03/60] xfs: plug directory buffer readahead
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
2013-06-19 4:50 ` [PATCH 01/60] xfs: update mount options documentation Dave Chinner
2013-06-19 4:50 ` [PATCH 02/60] xfs: add pluging for bulkstat readahead Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-20 18:45 ` Mark Tinguely
2013-06-19 4:50 ` [PATCH 04/60] xfs: don't use speculative prealloc for small files Dave Chinner
` (58 subsequent siblings)
61 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Similar to bulkstat inode chunk readahead, we need to plug directory
data buffer readahead during getdents to ensure that we can merge
adjacent readahead requests and sort out of order requests optimally
before they are dispatched. This improves the readahead efficiency
and reduces the IO load it generates as the IO patterns are
significantly better for both contiguous and fragmented directories.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_dir2_leaf.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/xfs/xfs_dir2_leaf.c b/fs/xfs/xfs_dir2_leaf.c
index da71a18..bb22aac 100644
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -1108,6 +1108,7 @@ xfs_dir2_leaf_readbuf(
struct xfs_mount *mp = dp->i_mount;
struct xfs_buf *bp = *bpp;
struct xfs_bmbt_irec *map = mip->map;
+ struct blk_plug plug;
int error = 0;
int length;
int i;
@@ -1236,6 +1237,7 @@ xfs_dir2_leaf_readbuf(
/*
* Do we need more readahead?
*/
+ blk_start_plug(&plug);
for (mip->ra_index = mip->ra_offset = i = 0;
mip->ra_want > mip->ra_current && i < mip->map_blocks;
i += mp->m_dirblkfsbs) {
@@ -1287,6 +1289,7 @@ xfs_dir2_leaf_readbuf(
}
}
}
+ blk_finish_plug(&plug);
out:
*bpp = bp;
--
1.7.10.4
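The benefit for fragmented directories comes from being able to sort a batch before merging it. A rough user-space sketch of that idea follows, assuming a hypothetical struct ra_req that describes each readahead by start block and block count. It is not the elevator's algorithm, only a way to count how many IOs a batch collapses to once it is sorted and contiguous neighbours are merged.

```c
#include <assert.h>
#include <stdlib.h>

/* One readahead request in the toy model (hypothetical type). */
struct ra_req {
	unsigned long long blkno;	/* first block of the request */
	unsigned int nblks;		/* number of blocks */
};

/* qsort() comparator: order requests by starting block. */
static int ra_req_cmp(const void *a, const void *b)
{
	const struct ra_req *ra = a;
	const struct ra_req *rb = b;

	if (ra->blkno < rb->blkno)
		return -1;
	if (ra->blkno > rb->blkno)
		return 1;
	return 0;
}

/*
 * Sort the batch by start block, then count the IOs that remain once
 * every request that directly follows its predecessor is merged into it.
 */
size_t ra_batch_io_count(struct ra_req *reqs, size_t nr)
{
	size_t ios;
	size_t i;

	assert(reqs != NULL || nr == 0);
	if (nr == 0)
		return 0;

	qsort(reqs, nr, sizeof(*reqs), ra_req_cmp);

	ios = 1;
	for (i = 1; i < nr; i++) {
		if (reqs[i - 1].blkno + reqs[i - 1].nblks != reqs[i].blkno)
			ios++;	/* not contiguous: needs its own IO */
	}
	return ios;
}
```

In this model a batch submitted in the order 24, 0, 16, 8 (eight blocks each) collapses to one IO after sorting, whereas dispatching each request as it is issued would produce four.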
* Re: [PATCH 03/60] xfs: plug directory buffer readahead
2013-06-19 4:50 ` [PATCH 03/60] xfs: plug directory buffer readahead Dave Chinner
@ 2013-06-20 18:45 ` Mark Tinguely
0 siblings, 0 replies; 95+ messages in thread
From: Mark Tinguely @ 2013-06-20 18:45 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:50, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> Similar to bulkstat inode chunk readahead, we need to plug directory
> data buffer readahead during getdents to ensure that we can merge
> adjacent readahead requests and sort out of order requests optimally
> before they are dispatched. This improves the readahead efficiency
> and reduces the IO load it generates as the IO patterns are
> significantly better for both contiguous and fragmented directories.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---
Looks good.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
* [PATCH 04/60] xfs: don't use speculative prealloc for small files
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (2 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 03/60] xfs: plug directory buffer readahead Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 12:59 ` Brian Foster
2013-06-20 19:31 ` Mark Tinguely
2013-06-19 4:50 ` [PATCH 05/60] xfs: don't do IO when creating an new inode Dave Chinner
` (57 subsequent siblings)
61 siblings, 2 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Dedicated small file workloads have been seeing significant free
space fragmentation causing premature inode allocation failure
when large inode sizes are in use. A particular test case showed
that a workload that runs to a real ENOSPC on 256 byte inodes would
fail inode allocation with ENOSPC at about 80% full with 512 byte
inodes, and at about 50% full with 1024 byte inodes.
The same workload, when run with -o allocsize=4096 on 1024 byte
inodes would run to being 100% full before giving ENOSPC. That is,
no freespace fragmentation at all.
The issue was caused by the specific IO pattern the application had
- the framework it was using did not support direct IO, and so it
was emulating it by using fadvise(DONT_NEED). The result was that
the data was getting written back before the speculative prealloc
had been trimmed from memory by the close(), and so small single
block files were being allocated with 2 blocks, and then having one
truncated away. The result was lots of small 4k free space extents,
and hence each new 8k allocation would take another 8k from
contiguous free space and turn it into 4k of allocated space and 4k
of free space.
Hence inode allocation, which requires contiguous, aligned
allocation of 16k (256 byte inodes), 32k (512 byte inodes) or 64k
(1024 byte inodes) can fail to find sufficiently large freespace and
hence fail while there is still lots of free space available.
There's a simple fix for this, and one that has precedent in the
allocator code already - don't do speculative allocation unless the
size of the file is larger than a certain size. In this case, that
size is the minimum default preallocation size:
mp->m_writeio_blocks. And to keep with the concept of being nice to
people when the files are still relatively small, cap the prealloc
to mp->m_writeio_blocks until the file goes over a stripe unit in
size, at which point we'll fall back to the current behaviour based
on the last extent size.
This will effectively turn off speculative prealloc for very small
files, keep preallocation low for small files, and behave as it
currently does for any file larger than a stripe unit. This
completely avoids the freespace fragmentation problem this
particular IO pattern was causing.
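The three size bands described above can be sketched as a standalone
userspace model. This is not the kernel code: the enum, the function
name and its parameters are hypothetical, and it only models the
dynamic preallocation case (no allocsize mount option):

```c
#include <stdint.h>

/* Hypothetical labels for the three behaviours described above. */
enum prealloc_policy {
	PREALLOC_NONE,		/* file smaller than the minimum prealloc size */
	PREALLOC_MINIMUM,	/* small file: capped at the minimum prealloc size */
	PREALLOC_DYNAMIC,	/* existing behaviour, based on the last extent size */
};

/*
 * isize:              current file size in bytes
 * min_prealloc_bytes: stands in for XFS_FSB_TO_B(mp, mp->m_writeio_blocks)
 * stripe_unit_bytes:  stands in for XFS_FSB_TO_B(mp, mp->m_dalign)
 */
static enum prealloc_policy
prealloc_policy_for(uint64_t isize, uint64_t min_prealloc_bytes,
		    uint64_t stripe_unit_bytes)
{
	if (isize < min_prealloc_bytes)
		return PREALLOC_NONE;
	if (isize < stripe_unit_bytes)
		return PREALLOC_MINIMUM;
	return PREALLOC_DYNAMIC;
}
```

Note that in this model a filesystem without a stripe unit (a stripe
unit of zero) never lands in the middle band, so any file at or above
the minimum prealloc size gets the existing dynamic behaviour.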
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_iomap.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 8f8aaee..6a70964 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -284,6 +284,15 @@ xfs_iomap_eof_want_preallocate(
return 0;
/*
+ * If the file is smaller than the minimum prealloc and we are using
+ * dynamic preallocation, don't do any preallocation at all as it is
+ * likely this is the only write to the file that is going to be done.
+ */
+ if (!(mp->m_flags & XFS_MOUNT_DFLT_IOSIZE) &&
+ XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_writeio_blocks))
+ return 0;
+
+ /*
* If there are any real blocks past eof, then don't
* do any speculative allocation.
*/
@@ -345,6 +354,10 @@ xfs_iomap_eof_prealloc_initial_size(
if (mp->m_flags & XFS_MOUNT_DFLT_IOSIZE)
return 0;
+ /* If the file is small, then use the minimum prealloc */
+ if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign))
+ return 0;
+
/*
* As we write multiple pages, the offset will always align to the
* start of a page and hence point to a hole at EOF. i.e. if the size is
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: [PATCH 04/60] xfs: don't use speculative prealloc for small files
2013-06-19 4:50 ` [PATCH 04/60] xfs: don't use speculative prealloc for small files Dave Chinner
@ 2013-06-19 12:59 ` Brian Foster
2013-06-20 19:31 ` Mark Tinguely
1 sibling, 0 replies; 95+ messages in thread
From: Brian Foster @ 2013-06-19 12:59 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/19/2013 12:50 AM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Dedicated small file workloads have been seeing significant free
> space fragmentation causing premature inode allocation failure
> when large inode sizes are in use. A particular test case showed
> that a workload that runs to a real ENOSPC on 256 byte inodes would
> fail inode allocation with ENOSPC at about 80% full with 512 byte
> inodes, and at about 50% full with 1024 byte inodes.
>
> The same workload, when run with -o allocsize=4096 on 1024 byte
> inodes would run to being 100% full before giving ENOSPC. That is,
> no freespace fragmentation at all.
>
> The issue was caused by the specific IO pattern the application had
> - the framework it was using did not support direct IO, and so it
> was emulating it by using fadvise(DONT_NEED). The result was that
> the data was getting written back before the speculative prealloc
> had been trimmed from memory by the close(), and so small single
> block files were being allocated with 2 blocks, and then having one
> truncated away. The result was lots of small 4k free space extents,
> and hence each new 8k allocation would take another 8k from
> contiguous free space and turn it into 4k of allocated space and 4k
> of free space.
>
> Hence inode allocation, which requires contiguous, aligned
> allocation of 16k (256 byte inodes), 32k (512 byte inodes) or 64k
> (1024 byte inodes) can fail to find sufficiently large freespace and
> hence fail while there is still lots of free space available.
>
> There's a simple fix for this, and one that has precedent in the
> allocator code already - don't do speculative allocation unless the
> size of the file is larger than a certain size. In this case, that
> size is the minimum default preallocation size:
> mp->m_writeio_blocks. And to keep with the concept of being nice to
> people when the files are still relatively small, cap the prealloc
> to mp->m_writeio_blocks until the file goes over a stripe unit in
> size, at which point we'll fall back to the current behaviour based
> on the last extent size.
>
> This will effectively turn off speculative prealloc for very small
> files, keep preallocation low for small files, and behave as it
> currently does for any file larger than a stripe unit. This
> completely avoids the freespace fragmentation problem this
> particular IO pattern was causing.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
Looks good.
Reviewed-by: Brian Foster <bfoster@redhat.com>
> fs/xfs/xfs_iomap.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 8f8aaee..6a70964 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -284,6 +284,15 @@ xfs_iomap_eof_want_preallocate(
> return 0;
>
> /*
> + * If the file is smaller than the minimum prealloc and we are using
> + * dynamic preallocation, don't do any preallocation at all as it is
> + * likely this is the only write to the file that is going to be done.
> + */
> + if (!(mp->m_flags & XFS_MOUNT_DFLT_IOSIZE) &&
> + XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_writeio_blocks))
> + return 0;
> +
> + /*
> * If there are any real blocks past eof, then don't
> * do any speculative allocation.
> */
> @@ -345,6 +354,10 @@ xfs_iomap_eof_prealloc_initial_size(
> if (mp->m_flags & XFS_MOUNT_DFLT_IOSIZE)
> return 0;
>
> + /* If the file is small, then use the minimum prealloc */
> + if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign))
> + return 0;
> +
> /*
> * As we write multiple pages, the offset will always align to the
> * start of a page and hence point to a hole at EOF. i.e. if the size is
>
* Re: [PATCH 04/60] xfs: don't use speculative prealloc for small files
2013-06-19 4:50 ` [PATCH 04/60] xfs: don't use speculative prealloc for small files Dave Chinner
2013-06-19 12:59 ` Brian Foster
@ 2013-06-20 19:31 ` Mark Tinguely
1 sibling, 0 replies; 95+ messages in thread
From: Mark Tinguely @ 2013-06-20 19:31 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:50, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Dedicated small file workloads have been seeing significant free
> space fragmentation causing premature inode allocation failure
> when large inode sizes are in use. A particular test case showed
> that a workload that runs to a real ENOSPC on 256 byte inodes would
> fail inode allocation with ENOSPC at about 80% full with 512 byte
> inodes, and at about 50% full with 1024 byte inodes.
>
> The same workload, when run with -o allocsize=4096 on 1024 byte
> inodes would run to being 100% full before giving ENOSPC. That is,
> no freespace fragmentation at all.
>
> The issue was caused by the specific IO pattern the application had
> - the framework it was using did not support direct IO, and so it
> was emulating it by using fadvise(DONT_NEED). The result was that
> the data was getting written back before the speculative prealloc
> had been trimmed from memory by the close(), and so small single
> block files were being allocated with 2 blocks, and then having one
> truncated away. The result was lots of small 4k free space extents,
> and hence each new 8k allocation would take another 8k from
> contiguous free space and turn it into 4k of allocated space and 4k
> of free space.
>
> Hence inode allocation, which requires contiguous, aligned
> allocation of 16k (256 byte inodes), 32k (512 byte inodes) or 64k
> (1024 byte inodes) can fail to find sufficiently large freespace and
> hence fail while there is still lots of free space available.
>
> There's a simple fix for this, and one that has precedent in the
> allocator code already - don't do speculative allocation unless the
> size of the file is larger than a certain size. In this case, that
> size is the minimum default preallocation size:
> mp->m_writeio_blocks. And to keep with the concept of being nice to
> people when the files are still relatively small, cap the prealloc
> to mp->m_writeio_blocks until the file goes over a stripe unit in
> size, at which point we'll fall back to the current behaviour based
> on the last extent size.
>
> This will effectively turn off speculative prealloc for very small
> files, keep preallocation low for small files, and behave as it
> currently does for any file larger than a stripe unit. This
> completely avoids the freespace fragmentation problem this
> particular IO pattern was causing.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
I agree with Brian, it looks good.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
* [PATCH 05/60] xfs: don't do IO when creating an new inode
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (3 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 04/60] xfs: don't use speculative prealloc for small files Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-21 13:57 ` Mark Tinguely
2013-06-19 4:50 ` [PATCH 06/60] xfs: xfs_ifree doesn't need to modify the inode buffer Dave Chinner
` (56 subsequent siblings)
61 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
When we are allocating a new inode, we read the inode cluster off
disk to increment the generation number. We are already using a
random generation number for newly allocated inodes, so if we are not
using the ikeep mode, we can just generate a new generation number
when we initialise the newly allocated inode.
This avoids the need for reading the inode buffer during inode
creation. This will speed up allocation of inodes in cold, partially
allocated clusters as they will no longer need to be read from disk
during allocation. It will also reduce the CPU overhead of inode
allocation by not having to process the buffer read, even on cache
hits.
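The shortcut described above can be modelled in userspace roughly as
follows. This is only a sketch: struct icore and
init_new_inode_core() are hypothetical stand-ins for the in-core
inode and the new branch in xfs_iread(), and rand() stands in for the
kernel's prandom_u32():

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DINODE_MAGIC	0x494e	/* "IN", the XFS on-disk inode magic */

/* Hypothetical, heavily reduced stand-in for the in-core inode core. */
struct icore {
	uint16_t	magic;
	uint8_t		version;
	uint32_t	gen;
	uint64_t	ino;
};

/*
 * Returns true if the core was initialised without any IO, false if
 * the caller still has to read the inode cluster from disk (an
 * existing inode, or a filesystem that keeps inode clusters around).
 */
static bool
init_new_inode_core(struct icore *core, uint64_t ino, bool creating,
		    bool ikeep, bool has_crc)
{
	if (!creating || ikeep)
		return false;

	memset(core, 0, sizeof(*core));
	core->magic = DINODE_MAGIC;
	core->gen = (uint32_t)rand();	/* random generation number */
	if (has_crc) {
		/* CRC-enabled filesystems use v3 inodes that record their own number */
		core->version = 3;
		core->ino = ino;
	} else {
		core->version = 2;
	}
	return true;
}
```

The point of the early return is that the ikeep case still needs the
on-disk generation number, so only that case pays for the buffer read.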
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.c | 36 ++++++++++++++++++++++++++++--------
1 file changed, 28 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 7f7be5f..d1f76da 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1028,6 +1028,11 @@ xfs_dinode_calc_crc(
/*
* Read the disk inode attributes into the in-core inode structure.
+ *
+ * If we are initialising a new inode and we are not utilising the
+ * XFS_MOUNT_IKEEP inode cluster mode, we can simply build the new inode core
+ * with a random generation number. If we are keeping inodes around, we need to
+ * read the inode cluster to get the existing generation number off disk.
*/
int
xfs_iread(
@@ -1047,6 +1052,22 @@ xfs_iread(
if (error)
return error;
+ /* shortcut IO on inode allocation if possible */
+ if ((iget_flags & XFS_IGET_CREATE) &&
+ !(mp->m_flags & XFS_MOUNT_IKEEP)) {
+ /* initialise the on-disk inode core */
+ memset(&ip->i_d, 0, sizeof(ip->i_d));
+ ip->i_d.di_magic = XFS_DINODE_MAGIC;
+ ip->i_d.di_gen = prandom_u32();
+ if (xfs_sb_version_hascrc(&mp->m_sb)) {
+ ip->i_d.di_version = 3;
+ ip->i_d.di_ino = ip->i_ino;
+ uuid_copy(&ip->i_d.di_uuid, &mp->m_sb.sb_uuid);
+ } else
+ ip->i_d.di_version = 2;
+ return 0;
+ }
+
/*
* Get pointers to the on-disk inode and the buffer containing it.
*/
@@ -1133,17 +1154,16 @@ xfs_iread(
xfs_buf_set_ref(bp, XFS_INO_REF);
/*
- * Use xfs_trans_brelse() to release the buffer containing the
- * on-disk inode, because it was acquired with xfs_trans_read_buf()
- * in xfs_imap_to_bp() above. If tp is NULL, this is just a normal
+ * Use xfs_trans_brelse() to release the buffer containing the on-disk
+ * inode, because it was acquired with xfs_trans_read_buf() in
+ * xfs_imap_to_bp() above. If tp is NULL, this is just a normal
* brelse(). If we're within a transaction, then xfs_trans_brelse()
* will only release the buffer if it is not dirty within the
* transaction. It will be OK to release the buffer in this case,
- * because inodes on disk are never destroyed and we will be
- * locking the new in-core inode before putting it in the hash
- * table where other processes can find it. Thus we don't have
- * to worry about the inode being changed just because we released
- * the buffer.
+ * because inodes on disk are never destroyed and we will be locking the
+ * new in-core inode before putting it in the cache where other
+ * processes can find it. Thus we don't have to worry about the inode
+ * being changed just because we released the buffer.
*/
out_brelse:
xfs_trans_brelse(tp, bp);
--
1.7.10.4
* Re: [PATCH 05/60] xfs: don't do IO when creating an new inode
2013-06-19 4:50 ` [PATCH 05/60] xfs: don't do IO when creating an new inode Dave Chinner
@ 2013-06-21 13:57 ` Mark Tinguely
0 siblings, 0 replies; 95+ messages in thread
From: Mark Tinguely @ 2013-06-21 13:57 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:50, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When we are allocating a new inode, we read the inode cluster off
> disk to increment the generation number. We are already using a
> random generation number for newly allocated inodes, so if we are not
> using the ikeep mode, we can just generate a new generation number
> when we initialise the newly allocated inode.
>
> This avoids the need for reading the inode buffer during inode
> creation. This will speed up allocation of inodes in cold, partially
> allocated clusters as they will no longer need to be read from disk
> during allocation. It will also reduce the CPU overhead of inode
> allocation by not having to process the buffer read, even on cache
> hits.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
Looks good.
As a side note, the comment in xfs_inode.h for the i_d:
xfs_icdinode_t i_d; /* most of ondisk inode */
In Linux 3.10, it is no longer just most.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
* [PATCH 06/60] xfs: xfs_ifree doesn't need to modify the inode buffer
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (4 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 05/60] xfs: don't do IO when creating an new inode Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-21 21:24 ` Mark Tinguely
2013-06-19 4:50 ` [PATCH 07/60] xfs: Introduce ordered log vector support Dave Chinner
` (55 subsequent siblings)
61 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Long ago, bulkstat used to read inodes directly from the backing
buffer for speed. This had the unfortunate problem of being cache
incoherent with unlinks, and so xfs_ifree() had to mark the inode
as free directly in the backing buffer. bulkstat was changed some
time ago to use inode cache coherent lookups, and so will never see
unlinked inodes in its lookups. Hence xfs_ifree() does not need to
touch the inode backing buffer anymore.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.c | 32 ++++----------------------------
1 file changed, 4 insertions(+), 28 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index d1f76da..9ecfe1e 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2048,8 +2048,6 @@ xfs_ifree(
int error;
int delete;
xfs_ino_t first_ino;
- xfs_dinode_t *dip;
- xfs_buf_t *ibp;
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
ASSERT(ip->i_d.di_nlink == 0);
@@ -2062,14 +2060,13 @@ xfs_ifree(
* Pull the on-disk inode from the AGI unlinked list.
*/
error = xfs_iunlink_remove(tp, ip);
- if (error != 0) {
+ if (error)
return error;
- }
error = xfs_difree(tp, ip->i_ino, flist, &delete, &first_ino);
- if (error != 0) {
+ if (error)
return error;
- }
+
ip->i_d.di_mode = 0; /* mark incore inode as free */
ip->i_d.di_flags = 0;
ip->i_d.di_dmevmask = 0;
@@ -2081,31 +2078,10 @@ xfs_ifree(
* by reincarnations of this inode.
*/
ip->i_d.di_gen++;
-
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
- error = xfs_imap_to_bp(ip->i_mount, tp, &ip->i_imap, &dip, &ibp,
- 0, 0);
- if (error)
- return error;
-
- /*
- * Clear the on-disk di_mode. This is to prevent xfs_bulkstat
- * from picking up this inode when it is reclaimed (its incore state
- * initialzed but not flushed to disk yet). The in-core di_mode is
- * already cleared and a corresponding transaction logged.
- * The hack here just synchronizes the in-core to on-disk
- * di_mode value in advance before the actual inode sync to disk.
- * This is OK because the inode is already unlinked and would never
- * change its di_mode again for this inode generation.
- * This is a temporary hack that would require a proper fix
- * in the future.
- */
- dip->di_mode = 0;
-
- if (delete) {
+ if (delete)
error = xfs_ifree_cluster(ip, tp, first_ino);
- }
return error;
}
--
1.7.10.4
* Re: [PATCH 06/60] xfs: xfs_ifree doesn't need to modify the inode buffer
2013-06-19 4:50 ` [PATCH 06/60] xfs: xfs_ifree doesn't need to modify the inode buffer Dave Chinner
@ 2013-06-21 21:24 ` Mark Tinguely
0 siblings, 0 replies; 95+ messages in thread
From: Mark Tinguely @ 2013-06-21 21:24 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:50, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Long ago, bulkstat used to read inodes directly from the backing
> buffer for speed. This had the unfortunate problem of being cache
> incoherent with unlinks, and so xfs_ifree() had to mark the inode
> as free directly in the backing buffer. bulkstat was changed some
> time ago to use inode cache coherent lookups, and so will never see
> unlinked inodes in its lookups. Hence xfs_ifree() does not need to
> touch the inode backing buffer anymore.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_inode.c | 32 ++++----------------------------
looks good.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
* [PATCH 07/60] xfs: Introduce ordered log vector support
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (5 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 06/60] xfs: xfs_ifree doesn't need to modify the inode buffer Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-22 17:26 ` Mark Tinguely
2013-06-19 4:50 ` [PATCH 08/60] xfs: Introduce an ordered buffer item Dave Chinner
` (54 subsequent siblings)
61 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
An "ordered log vector" is a log vector that is used for
tracking a log item through the CIL and into the AIL as part of the
log checkpointing. These ordered log vectors are special in that
they are not written to the journal in any way, and are not accounted
to the checkpoint being written.
The reason for this behaviour is to allow operations to attach items
to transactions and have them follow the normal transactional
lifecycle without actually having to write them to the journal. This
allows logging of items that track high level logical changes and
writing them to the log, while the physical items being modified
pass through into the AIL and pin the tail of the log (and therefore
the logical item in the log) until all the modified items are
physically written to disk.
IOWs, it allows us to write metadata without physically logging
every individual change but still maintain the full transactional
integrity guarantees we currently have w.r.t. crash recovery.
This change modifies some of the CIL item insertion loops, as
ordered log vectors introduce some new constraints as they don't
track any data. One advantage of this change is that it combines
two log vector chain walks into a single pass, so there is less
overhead in the transaction commit pass as well. It also kills some
unused code in the log vector walk loop when committing the CIL.
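As an illustration of the "not written, not accounted" rule, here is a
minimal userspace sketch of a chain walk that skips ordered vectors
when sizing a write. The struct layout and chain_write_length() are
simplified, hypothetical stand-ins for struct xfs_log_vec and
xlog_write_calc_vec_length(); the real function also accounts for
transaction headers, which this sketch leaves out:

```c
#include <stddef.h>

#define LOG_VEC_ORDERED	(-1)	/* mirrors XFS_LOG_VEC_ORDERED */

struct log_iovec {
	int			len;	/* bytes in this region */
};

struct log_vec {
	struct log_vec		*next;
	int			niovecs;	/* zero for ordered vectors */
	struct log_iovec	*iovecs;
	int			buf_len;	/* LOG_VEC_ORDERED if ordered */
};

/*
 * Bytes needed to write the chain: one op header per region plus the
 * region payloads. Ordered vectors contribute nothing.
 */
static int chain_write_length(const struct log_vec *lv, int header_size)
{
	int headers = 0;
	int len = 0;

	for (; lv; lv = lv->next) {
		/* we don't write ordered log vectors */
		if (lv->buf_len == LOG_VEC_ORDERED)
			continue;

		headers += lv->niovecs;
		for (int i = 0; i < lv->niovecs; i++)
			len += lv->iovecs[i].len;
	}
	return len + headers * header_size;
}
```

An ordered vector anywhere in the chain leaves the result unchanged,
which is what keeps it out of the checkpoint's space accounting.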
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log.c | 21 +++++++++++---
fs/xfs/xfs_log.h | 2 ++
fs/xfs/xfs_log_cil.c | 75 ++++++++++++++++++++++++++++++++++----------------
3 files changed, 70 insertions(+), 28 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index b345a7c..db08d34 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1963,6 +1963,10 @@ xlog_write_calc_vec_length(
headers++;
for (lv = log_vector; lv; lv = lv->lv_next) {
+ /* we don't write ordered log vectors */
+ if (lv->lv_buf_len == XFS_LOG_VEC_ORDERED)
+ continue;
+
headers += lv->lv_niovecs;
for (i = 0; i < lv->lv_niovecs; i++) {
@@ -2216,7 +2220,7 @@ xlog_write(
index = 0;
lv = log_vector;
vecp = lv->lv_iovecp;
- while (lv && index < lv->lv_niovecs) {
+ while (lv && (!lv->lv_niovecs || index < lv->lv_niovecs)) {
void *ptr;
int log_offset;
@@ -2236,13 +2240,21 @@ xlog_write(
* This loop writes out as many regions as can fit in the amount
* of space which was allocated by xlog_state_get_iclog_space().
*/
- while (lv && index < lv->lv_niovecs) {
- struct xfs_log_iovec *reg = &vecp[index];
+ while (lv && (!lv->lv_niovecs || index < lv->lv_niovecs)) {
+ struct xfs_log_iovec *reg;
struct xlog_op_header *ophdr;
int start_rec_copy;
int copy_len;
int copy_off;
+ bool ordered = false;
+
+ /* ordered log vectors have no regions to write */
+ if (lv->lv_buf_len == XFS_LOG_VEC_ORDERED) {
+ ordered = true;
+ goto next_lv;
+ }
+ reg = &vecp[index];
ASSERT(reg->i_len % sizeof(__int32_t) == 0);
ASSERT((unsigned long)ptr % sizeof(__int32_t) == 0);
@@ -2302,12 +2314,13 @@ xlog_write(
break;
if (++index == lv->lv_niovecs) {
+next_lv:
lv = lv->lv_next;
index = 0;
if (lv)
vecp = lv->lv_iovecp;
}
- if (record_cnt == 0) {
+ if (record_cnt == 0 && ordered == false) {
if (!lv)
return 0;
break;
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 5caee96..b20918c 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -105,6 +105,8 @@ struct xfs_log_vec {
int lv_buf_len; /* size of formatted buffer */
};
+#define XFS_LOG_VEC_ORDERED (-1)
+
/*
* Structure used to pass callback function and the function's argument
* to the log manager.
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index d0833b5..02b9cf3 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -127,6 +127,7 @@ xlog_cil_prepare_log_vecs(
int index;
int len = 0;
uint niovecs;
+ bool ordered = false;
/* Skip items which aren't dirty in this transaction. */
if (!(lidp->lid_flags & XFS_LID_DIRTY))
@@ -137,14 +138,30 @@ xlog_cil_prepare_log_vecs(
if (!niovecs)
continue;
+ /*
+ * Ordered items need to be tracked but we do not wish to write
+ * them. We need a logvec to track the object, but we do not
+ * need an iovec or buffer to be allocated for copying data.
+ */
+ if (niovecs == XFS_LOG_VEC_ORDERED) {
+ ordered = true;
+ niovecs = 0;
+ }
+
new_lv = kmem_zalloc(sizeof(*new_lv) +
niovecs * sizeof(struct xfs_log_iovec),
KM_SLEEP|KM_NOFS);
+ new_lv->lv_item = lidp->lid_item;
+ new_lv->lv_niovecs = niovecs;
+ if (ordered) {
+ /* track as an ordered logvec */
+ new_lv->lv_buf_len = XFS_LOG_VEC_ORDERED;
+ goto next;
+ }
+
/* The allocated iovec region lies beyond the log vector. */
new_lv->lv_iovecp = (struct xfs_log_iovec *)&new_lv[1];
- new_lv->lv_niovecs = niovecs;
- new_lv->lv_item = lidp->lid_item;
/* build the vector array and calculate it's length */
IOP_FORMAT(new_lv->lv_item, new_lv->lv_iovecp);
@@ -165,6 +182,7 @@ xlog_cil_prepare_log_vecs(
}
ASSERT(ptr == new_lv->lv_buf + new_lv->lv_buf_len);
+next:
if (!ret_lv)
ret_lv = new_lv;
else
@@ -191,8 +209,18 @@ xfs_cil_prepare_item(
if (old) {
/* existing lv on log item, space used is a delta */
- ASSERT(!list_empty(&lv->lv_item->li_cil));
- ASSERT(old->lv_buf && old->lv_buf_len && old->lv_niovecs);
+ ASSERT((old->lv_buf && old->lv_buf_len && old->lv_niovecs) ||
+ old->lv_buf_len == XFS_LOG_VEC_ORDERED);
+
+ /*
+ * If the new item is ordered, keep the old one that is already
+ * tracking dirty or ordered regions
+ */
+ if (lv->lv_buf_len == XFS_LOG_VEC_ORDERED) {
+ ASSERT(!lv->lv_buf);
+ kmem_free(lv);
+ return;
+ }
*len += lv->lv_buf_len - old->lv_buf_len;
*diff_iovecs += lv->lv_niovecs - old->lv_niovecs;
@@ -201,10 +229,11 @@ xfs_cil_prepare_item(
} else {
/* new lv, must pin the log item */
ASSERT(!lv->lv_item->li_lv);
- ASSERT(list_empty(&lv->lv_item->li_cil));
- *len += lv->lv_buf_len;
- *diff_iovecs += lv->lv_niovecs;
+ if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED) {
+ *len += lv->lv_buf_len;
+ *diff_iovecs += lv->lv_niovecs;
+ }
IOP_PIN(lv->lv_item);
}
@@ -259,18 +288,24 @@ xlog_cil_insert_items(
* We can do this safely because the context can't checkpoint until we
* are done so it doesn't matter exactly how we update the CIL.
*/
- for (lv = log_vector; lv; lv = lv->lv_next)
- xfs_cil_prepare_item(log, lv, &len, &diff_iovecs);
-
- /* account for space used by new iovec headers */
- len += diff_iovecs * sizeof(xlog_op_header_t);
-
spin_lock(&cil->xc_cil_lock);
+ for (lv = log_vector; lv; ) {
+ struct xfs_log_vec *next = lv->lv_next;
- /* move the items to the tail of the CIL */
- for (lv = log_vector; lv; lv = lv->lv_next)
+ ASSERT(lv->lv_item->li_lv || list_empty(&lv->lv_item->li_cil));
+ lv->lv_next = NULL;
+
+ /*
+ * xfs_cil_prepare_item() may free the lv, so move the item on
+ * the CIL first.
+ */
list_move_tail(&lv->lv_item->li_cil, &cil->xc_cil);
+ xfs_cil_prepare_item(log, lv, &len, &diff_iovecs);
+ lv = next;
+ }
+ /* account for space used by new iovec headers */
+ len += diff_iovecs * sizeof(xlog_op_header_t);
ctx->nvecs += diff_iovecs;
/*
@@ -381,9 +416,7 @@ xlog_cil_push(
struct xfs_cil_ctx *new_ctx;
struct xlog_in_core *commit_iclog;
struct xlog_ticket *tic;
- int num_lv;
int num_iovecs;
- int len;
int error = 0;
struct xfs_trans_header thdr;
struct xfs_log_iovec lhdr;
@@ -428,12 +461,9 @@ xlog_cil_push(
* side which is currently locked out by the flush lock.
*/
lv = NULL;
- num_lv = 0;
num_iovecs = 0;
- len = 0;
while (!list_empty(&cil->xc_cil)) {
struct xfs_log_item *item;
- int i;
item = list_first_entry(&cil->xc_cil,
struct xfs_log_item, li_cil);
@@ -444,11 +474,7 @@ xlog_cil_push(
lv->lv_next = item->li_lv;
lv = item->li_lv;
item->li_lv = NULL;
-
- num_lv++;
num_iovecs += lv->lv_niovecs;
- for (i = 0; i < lv->lv_niovecs; i++)
- len += lv->lv_iovecp[i].i_len;
}
/*
@@ -701,6 +727,7 @@ xfs_log_commit_cil(
if (commit_lsn)
*commit_lsn = log->l_cilp->xc_ctx->sequence;
+ /* xlog_cil_insert_items() destroys log_vector list */
xlog_cil_insert_items(log, log_vector, tp->t_ticket);
/* check we didn't blow the reservation */
--
1.7.10.4
* Re: [PATCH 07/60] xfs: Introduce ordered log vector support
2013-06-19 4:50 ` [PATCH 07/60] xfs: Introduce ordered log vector support Dave Chinner
@ 2013-06-22 17:26 ` Mark Tinguely
0 siblings, 0 replies; 95+ messages in thread
From: Mark Tinguely @ 2013-06-22 17:26 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:50, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> An "ordered log vector" is a log vector that is used for
> tracking a log item through the CIL and into the AIL as part of the
> log checkpointing. These ordered log vectors are special in that
> they are not written to the journal in any way, and are not accounted
> to the checkpoint being written.
>
> The reason for this behaviour is to allow operations to attach items
> to transactions and have them follow the normal transactional
> lifecycle without actually having to write them to the journal. This
> allows logging of items that track high level logical changes and
> writing them to the log, while the physical items being modified
> pass through into the AIL and pin the tail of the log (and therefore
> the logical item in the log) until all the modified items are
> physically written to disk.
>
> IOWs, it allows us to write metadata without physically logging
> every individual change but still maintain the full transactional
> integrity guarantees we currently have w.r.t. crash recovery.
>
> This change modifies some of the CIL item insertion loops, as
> ordered log vectors introduce some new constraints as they don't
> track any data. One advantage of this change is that it combines
> two log vector chain walks into a single pass, so there is less
> overhead in the transaction commit pass as well. It also kills some
> unused code in the log vector walk loop when committing the CIL.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_log.c | 21 +++++++++++---
> fs/xfs/xfs_log.h | 2 ++
> fs/xfs/xfs_log_cil.c | 75 ++++++++++++++++++++++++++++++++++----------------
> 3 files changed, 70 insertions(+), 28 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
...
> @@ -2236,13 +2240,21 @@ xlog_write(
> * This loop writes out as many regions as can fit in the amount
> * of space which was allocated by xlog_state_get_iclog_space().
> */
> - while (lv && index < lv->lv_niovecs) {
> - struct xfs_log_iovec *reg = &vecp[index];
> + while (lv && (!lv->lv_niovecs || index < lv->lv_niovecs)) {
Be paranoid and make sure that a log vector with no iovecs really is
an "ordered" entry?
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
* [PATCH 08/60] xfs: Introduce an ordered buffer item
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (6 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 07/60] xfs: Introduce ordered log vector support Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-23 17:27 ` Mark Tinguely
2013-06-19 4:50 ` [PATCH 09/60] xfs: Inode create log items Dave Chinner
` (53 subsequent siblings)
61 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
If we have a buffer that we have modified but we do not wish to
physically log in a transaction (e.g. we've logged a logical
change), we still need to ensure that transactional integrity is
maintained. Hence we must not move the tail of the log past the
transaction that the buffer is associated with before the buffer is
written to disk.
This means these special buffers still need to be included in the
transaction and added to the AIL just like a normal buffer, but we
do not want the modifications to the buffer written into the
transaction. IOWs, what we want is an "ordered buffer" that
maintains the same transactional life cycle as a physically logged
buffer, just without the transcribing of the modifications to the
log.
Hence we need to flag the buffer as an "ordered buffer" to avoid
including it in vector size calculations or formatting during the
transaction. Once the transaction is committed, the buffer appears
for all intents to be the same as a physically logged buffer as it
transitions through the log and AIL.
Relogging will also work just fine for such an ordered buffer - the
logical transaction will be replayed before the subsequent
modifications that relog the buffer, so everything will be
reconstructed correctly by recovery.
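A rough userspace sketch of how the ordered flag short-circuits the
size calculation, and how the CIL side can turn the sentinel into "no
iovecs to allocate". The flag value, struct buf_item and both
function names are hypothetical, and the vector count for the normal
case is simplified to one format header plus one vector per dirty
region:

```c
#define LOG_VEC_ORDERED	(-1)		/* mirrors XFS_LOG_VEC_ORDERED */
#define BLI_ORDERED	(1u << 0)	/* hypothetical flag value */

/* Hypothetical, reduced stand-in for struct xfs_buf_log_item. */
struct buf_item {
	unsigned int	flags;
	int		dirty_regions;	/* contiguous dirty chunks */
};

/* Number of log vectors the item needs, or the ordered sentinel. */
static int buf_item_size(const struct buf_item *bip)
{
	/* logged only to order it: no vectors are used at all */
	if (bip->flags & BLI_ORDERED)
		return LOG_VEC_ORDERED;

	/* simplified: format header plus one vector per dirty region */
	return 1 + bip->dirty_regions;
}

/* The CIL still needs a log vector to track the item, but no iovecs. */
static int logvec_iovecs_to_allocate(int niovecs)
{
	return niovecs == LOG_VEC_ORDERED ? 0 : niovecs;
}
```

The item is therefore still tracked, pinned and inserted into the AIL
like any other buffer; only the formatting and space accounting steps
are skipped.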
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/xfs_buf_item.c | 75 +++++++++++++++++++++++++++++++-----------------
fs/xfs/xfs_buf_item.h | 4 ++-
fs/xfs/xfs_trace.h | 4 +++
fs/xfs/xfs_trans.h | 1 +
fs/xfs/xfs_trans_buf.c | 34 ++++++++++++++++++++--
5 files changed, 87 insertions(+), 31 deletions(-)
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 4ec4317..61f6876 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -140,6 +140,16 @@ xfs_buf_item_size(
ASSERT(bip->bli_flags & XFS_BLI_LOGGED);
+ if (bip->bli_flags & XFS_BLI_ORDERED) {
+ /*
+ * The buffer has been logged just to order it.
+ * It is not being included in the transaction
+ * commit, so no vectors are used at all.
+ */
+ trace_xfs_buf_item_size_ordered(bip);
+ return XFS_LOG_VEC_ORDERED;
+ }
+
/*
* the vector count is based on the number of buffer vectors we have
* dirty bits in. This will only be greater than one when we have a
@@ -212,6 +222,7 @@ xfs_buf_item_format_segment(
goto out;
}
+
/*
* Fill in an iovec for each set of contiguous chunks.
*/
@@ -311,6 +322,16 @@ xfs_buf_item_format(
bip->bli_flags &= ~XFS_BLI_INODE_BUF;
}
+ if ((bip->bli_flags & (XFS_BLI_ORDERED|XFS_BLI_STALE)) ==
+ XFS_BLI_ORDERED) {
+ /*
+ * The buffer has been logged just to order it. It is not being
+ * included in the transaction commit, so don't format it.
+ */
+ trace_xfs_buf_item_format_ordered(bip);
+ return;
+ }
+
for (i = 0; i < bip->bli_format_count; i++) {
vecp = xfs_buf_item_format_segment(bip, vecp, offset,
&bip->bli_formats[i]);
@@ -340,6 +361,7 @@ xfs_buf_item_pin(
ASSERT(atomic_read(&bip->bli_refcount) > 0);
ASSERT((bip->bli_flags & XFS_BLI_LOGGED) ||
+ (bip->bli_flags & XFS_BLI_ORDERED) ||
(bip->bli_flags & XFS_BLI_STALE));
trace_xfs_buf_item_pin(bip);
@@ -512,8 +534,9 @@ xfs_buf_item_unlock(
{
struct xfs_buf_log_item *bip = BUF_ITEM(lip);
struct xfs_buf *bp = bip->bli_buf;
- int aborted, clean, i;
- uint hold;
+ bool clean;
+ bool aborted;
+ int flags;
/* Clear the buffer's association with this transaction. */
bp->b_transp = NULL;
@@ -524,23 +547,21 @@ xfs_buf_item_unlock(
* (cancelled) buffers at unpin time, but we'll never go through the
* pin/unpin cycle if we abort inside commit.
*/
- aborted = (lip->li_flags & XFS_LI_ABORTED) != 0;
-
+ aborted = (lip->li_flags & XFS_LI_ABORTED) ? true : false;
/*
- * Before possibly freeing the buf item, determine if we should
- * release the buffer at the end of this routine.
+ * Before possibly freeing the buf item, copy the per-transaction state
+ * so we can reference it safely later after clearing it from the
+ * buffer log item.
*/
- hold = bip->bli_flags & XFS_BLI_HOLD;
-
- /* Clear the per transaction state. */
- bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD);
+ flags = bip->bli_flags;
+ bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
/*
* If the buf item is marked stale, then don't do anything. We'll
* unlock the buffer and free the buf item when the buffer is unpinned
* for the last time.
*/
- if (bip->bli_flags & XFS_BLI_STALE) {
+ if (flags & XFS_BLI_STALE) {
trace_xfs_buf_item_unlock_stale(bip);
ASSERT(bip->__bli_format.blf_flags & XFS_BLF_CANCEL);
if (!aborted) {
@@ -557,13 +578,19 @@ xfs_buf_item_unlock(
* be the only reference to the buf item, so we free it anyway
* regardless of whether it is dirty or not. A dirty abort implies a
* shutdown, anyway.
+ *
+ * Ordered buffers are dirty but may have no recorded changes, so ensure
+ * we only release clean items here.
*/
- clean = 1;
- for (i = 0; i < bip->bli_format_count; i++) {
- if (!xfs_bitmap_empty(bip->bli_formats[i].blf_data_map,
- bip->bli_formats[i].blf_map_size)) {
- clean = 0;
- break;
+ clean = (flags & XFS_BLI_DIRTY) ? false : true;
+ if (clean) {
+ int i;
+ for (i = 0; i < bip->bli_format_count; i++) {
+ if (!xfs_bitmap_empty(bip->bli_formats[i].blf_data_map,
+ bip->bli_formats[i].blf_map_size)) {
+ clean = false;
+ break;
+ }
}
}
if (clean)
@@ -576,7 +603,7 @@ xfs_buf_item_unlock(
} else
atomic_dec(&bip->bli_refcount);
- if (!hold)
+ if (!(flags & XFS_BLI_HOLD))
xfs_buf_relse(bp);
}
@@ -842,12 +869,6 @@ xfs_buf_item_log(
struct xfs_buf *bp = bip->bli_buf;
/*
- * Mark the item as having some dirty data for
- * quick reference in xfs_buf_item_dirty.
- */
- bip->bli_flags |= XFS_BLI_DIRTY;
-
- /*
* walk each buffer segment and mark them dirty appropriately.
*/
start = 0;
@@ -873,7 +894,7 @@ xfs_buf_item_log(
/*
- * Return 1 if the buffer has some data that has been logged (at any
+ * Return 1 if the buffer has been logged or ordered in a transaction (at any
* point, not just the current transaction) and 0 if not.
*/
uint
@@ -907,11 +928,11 @@ void
xfs_buf_item_relse(
xfs_buf_t *bp)
{
- xfs_buf_log_item_t *bip;
+ xfs_buf_log_item_t *bip = bp->b_fspriv;
trace_xfs_buf_item_relse(bp, _RET_IP_);
+ ASSERT(!(bip->bli_item.li_flags & XFS_LI_IN_AIL));
- bip = bp->b_fspriv;
bp->b_fspriv = bip->bli_item.li_bio_list;
if (bp->b_fspriv == NULL)
bp->b_iodone = NULL;
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index 2573d2a..0f1c247 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -120,6 +120,7 @@ xfs_blft_from_flags(struct xfs_buf_log_format *blf)
#define XFS_BLI_INODE_ALLOC_BUF 0x10
#define XFS_BLI_STALE_INODE 0x20
#define XFS_BLI_INODE_BUF 0x40
+#define XFS_BLI_ORDERED 0x80
#define XFS_BLI_FLAGS \
{ XFS_BLI_HOLD, "HOLD" }, \
@@ -128,7 +129,8 @@ xfs_blft_from_flags(struct xfs_buf_log_format *blf)
{ XFS_BLI_LOGGED, "LOGGED" }, \
{ XFS_BLI_INODE_ALLOC_BUF, "INODE_ALLOC" }, \
{ XFS_BLI_STALE_INODE, "STALE_INODE" }, \
- { XFS_BLI_INODE_BUF, "INODE_BUF" }
+ { XFS_BLI_INODE_BUF, "INODE_BUF" }, \
+ { XFS_BLI_ORDERED, "ORDERED" }
#ifdef __KERNEL__
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index aa4db33..34422b6 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -486,9 +486,12 @@ DEFINE_EVENT(xfs_buf_item_class, name, \
TP_PROTO(struct xfs_buf_log_item *bip), \
TP_ARGS(bip))
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size);
+DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_ordered);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_stale);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format);
+DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_ordered);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_stale);
+DEFINE_BUF_ITEM_EVENT(xfs_buf_item_ordered);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_pin);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_unpin);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_unpin_stale);
@@ -508,6 +511,7 @@ DEFINE_BUF_ITEM_EVENT(xfs_trans_bjoin);
DEFINE_BUF_ITEM_EVENT(xfs_trans_bhold);
DEFINE_BUF_ITEM_EVENT(xfs_trans_bhold_release);
DEFINE_BUF_ITEM_EVENT(xfs_trans_binval);
+DEFINE_BUF_ITEM_EVENT(xfs_trans_buf_ordered);
DECLARE_EVENT_CLASS(xfs_lock_class,
TP_PROTO(struct xfs_inode *ip, unsigned lock_flags,
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index a44dba5..850fe78 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -503,6 +503,7 @@ void xfs_trans_bhold_release(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_binval(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_inode_buf(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_stale_inode_buf(xfs_trans_t *, struct xfs_buf *);
+void xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint);
void xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 73a5fa4..aa5a04b 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -397,7 +397,6 @@ shutdown_abort:
return XFS_ERROR(EIO);
}
-
/*
* Release the buffer bp which was previously acquired with one of the
* xfs_trans_... buffer allocation routines if the buffer has not
@@ -603,8 +602,14 @@ xfs_trans_log_buf(xfs_trans_t *tp,
tp->t_flags |= XFS_TRANS_DIRTY;
bip->bli_item.li_desc->lid_flags |= XFS_LID_DIRTY;
- bip->bli_flags |= XFS_BLI_LOGGED;
- xfs_buf_item_log(bip, first, last);
+
+ /*
+ * If we have an ordered buffer we are not logging any dirty range, but
+ * it still needs to be marked as both dirty and logged.
+ */
+ bip->bli_flags |= XFS_BLI_DIRTY | XFS_BLI_LOGGED;
+ if (!(bip->bli_flags & XFS_BLI_ORDERED))
+ xfs_buf_item_log(bip, first, last);
}
@@ -757,6 +762,29 @@ xfs_trans_inode_alloc_buf(
}
/*
+ * Mark the buffer as ordered for this transaction. This means
+ * that the contents of the buffer are not recorded in the transaction
+ * but it is tracked in the AIL as though it was. This allows us
+ * to record logical changes in transactions rather than the physical
+ * changes we make to the buffer without changing writeback ordering
+ * constraints of metadata buffers.
+ */
+void
+xfs_trans_ordered_buf(
+ struct xfs_trans *tp,
+ struct xfs_buf *bp)
+{
+ struct xfs_buf_log_item *bip = bp->b_fspriv;
+
+ ASSERT(bp->b_transp == tp);
+ ASSERT(bip != NULL);
+ ASSERT(atomic_read(&bip->bli_refcount) > 0);
+
+ bip->bli_flags |= XFS_BLI_ORDERED;
+ trace_xfs_buf_item_ordered(bip);
+}
+
+/*
* Set the type of the buffer for log recovery so that it can correctly identify
* and hence attach the correct buffer ops to the buffer after replay.
*/
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: [PATCH 08/60] xfs: Introduce an ordered buffer item
2013-06-19 4:50 ` [PATCH 08/60] xfs: Introduce an ordered buffer item Dave Chinner
@ 2013-06-23 17:27 ` Mark Tinguely
0 siblings, 0 replies; 95+ messages in thread
From: Mark Tinguely @ 2013-06-23 17:27 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:50, Dave Chinner wrote:
> If we have a buffer that we have modified but we do not wish to
> physically log in a transaction (e.g. we've logged a logical
> change), we still need to ensure that transactional integrity is
> maintained. Hence we must not move the tail of the log past the
> transaction that the buffer is associated with before the buffer is
> written to disk.
>
> This means these special buffers still need to be included in the
> transaction and added to the AIL just like a normal buffer, but we
> do not want the modifications to the buffer written into the
> transaction. IOWs, what we want is an "ordered buffer" that
> maintains the same transactional life cycle as a physically logged
> buffer, just without the transcribing of the modifications to the
> log.
>
> Hence we need to flag the buffer as an "ordered buffer" to avoid
> including it in vector size calculations or formatting during the
> transaction. Once the transaction is committed, the buffer appears
> for all intents to be the same as a physically logged buffer as it
> transitions through the log and AIL.
>
> Relogging will also work just fine for such an ordered buffer - the
> logical transaction will be replayed before the subsequent
> modifications that relog the buffer, so everything will be
> reconstructed correctly by recovery.
>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> ---
Looks good.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
* [PATCH 09/60] xfs: Inode create log items
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (7 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 08/60] xfs: Introduce an ordered buffer item Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-22 15:49 ` Mark Tinguely
2013-06-19 4:50 ` [PATCH 10/60] xfs: Inode create transaction reservations Dave Chinner
` (52 subsequent siblings)
61 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
Introduce the inode create log item type for logical inode create logging.
Instead of logging the changes in buffers, pass the range to be
initialised through the log by a new transaction type. This reduces
the amount of log space required to record initialisation during
allocation from about 128 bytes per inode to a small fixed amount
per inode extent to be initialised.
This requires a new log item type to track it through the log
and the AIL. This is a relatively simple item - most callbacks are
noops as this item has the same life cycle as the transaction.
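
To put rough numbers on the space saving, the following userspace sketch compares the two approaches. It assumes the "about 128 bytes per inode" figure quoted above and uses a struct that mirrors the field layout of the on-disk record in this patch; the struct name, the helper names and the 64-inode example are illustrative only, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the field layout of the icreate log record; names are illustrative. */
struct icreate_log {
	uint16_t	type;	/* host order */
	uint16_t	size;	/* host order */
	uint32_t	ag;	/* remaining fields are big-endian on disk */
	uint32_t	agbno;
	uint32_t	count;
	uint32_t	isize;
	uint32_t	length;
	uint32_t	gen;
};

/* Portable host-to-big-endian conversion (stand-in for cpu_to_be32()). */
static uint32_t to_be32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8), (uint8_t)v
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Big-endian-to-host conversion (stand-in for be32_to_cpu()). */
static uint32_t from_be32(uint32_t v)
{
	uint8_t b[4];

	memcpy(b, &v, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Log payload when each inode core is physically logged (~128 bytes each). */
static size_t physical_log_bytes(unsigned int inodes)
{
	return (size_t)inodes * 128;
}

/* Log payload for one logical icreate record, whatever the inode count. */
static size_t logical_log_bytes(void)
{
	return sizeof(struct icreate_log);
}
```

For an extent of 64 inodes this works out to roughly 8KiB of physically logged inode cores against a single 28 byte record, ignoring per-vector and header overhead in both cases.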
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_icreate_item.c | 195 +++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_icreate_item.h | 52 ++++++++++++
fs/xfs/xfs_log.h | 3 +-
fs/xfs/xfs_super.c | 8 ++
fs/xfs/xfs_trans.h | 4 +-
6 files changed, 261 insertions(+), 2 deletions(-)
create mode 100644 fs/xfs/xfs_icreate_item.c
create mode 100644 fs/xfs/xfs_icreate_item.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 6313b69..4a45080 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -71,6 +71,7 @@ xfs-y += xfs_alloc.o \
xfs_dir2_sf.o \
xfs_ialloc.o \
xfs_ialloc_btree.o \
+ xfs_icreate_item.o \
xfs_inode.o \
xfs_log_recover.o \
xfs_mount.o \
diff --git a/fs/xfs/xfs_icreate_item.c b/fs/xfs/xfs_icreate_item.c
new file mode 100644
index 0000000..7716a4e
--- /dev/null
+++ b/fs/xfs/xfs_icreate_item.c
@@ -0,0 +1,195 @@
+/*
+ * Copyright (c) 2008-2010, 2013 Dave Chinner
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_inum.h"
+#include "xfs_trans.h"
+#include "xfs_buf_item.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_dir2.h"
+#include "xfs_mount.h"
+#include "xfs_trans_priv.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_attr_sf.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_inode_item.h"
+#include "xfs_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_error.h"
+#include "xfs_icreate_item.h"
+
+kmem_zone_t *xfs_icreate_zone; /* inode create item zone */
+
+static inline struct xfs_icreate_item *ICR_ITEM(struct xfs_log_item *lip)
+{
+ return container_of(lip, struct xfs_icreate_item, ic_item);
+}
+
+/*
+ * This returns the number of iovecs needed to log the given inode create item.
+ *
+ * We only need one iovec for the icreate log structure.
+ */
+STATIC uint
+xfs_icreate_item_size(
+ struct xfs_log_item *lip)
+{
+ return 1;
+}
+
+/*
+ * This is called to fill in the vector of log iovecs for the
+ * given inode create log item.
+ */
+STATIC void
+xfs_icreate_item_format(
+ struct xfs_log_item *lip,
+ struct xfs_log_iovec *log_vector)
+{
+ struct xfs_icreate_item *icp = ICR_ITEM(lip);
+
+ log_vector->i_addr = (xfs_caddr_t)&icp->ic_format;
+ log_vector->i_len = sizeof(struct xfs_icreate_log);
+ log_vector->i_type = XLOG_REG_TYPE_ICREATE;
+}
+
+
+/* Pinning has no meaning for the create item, so just return. */
+STATIC void
+xfs_icreate_item_pin(
+ struct xfs_log_item *lip)
+{
+}
+
+
+/* Unpinning has no meaning for the create item, so just return. */
+STATIC void
+xfs_icreate_item_unpin(
+ struct xfs_log_item *lip,
+ int remove)
+{
+}
+
+STATIC void
+xfs_icreate_item_unlock(
+ struct xfs_log_item *lip)
+{
+ struct xfs_icreate_item *icp = ICR_ITEM(lip);
+
+ if (icp->ic_item.li_flags & XFS_LI_ABORTED)
+ kmem_zone_free(xfs_icreate_zone, icp);
+ return;
+}
+
+/*
+ * Because we have ordered buffers being tracked in the AIL for the inode
+ * creation, we don't need the create item after this. Hence we can free
+ * the log item and return -1 to tell the caller we're done with the item.
+ */
+STATIC xfs_lsn_t
+xfs_icreate_item_committed(
+ struct xfs_log_item *lip,
+ xfs_lsn_t lsn)
+{
+ struct xfs_icreate_item *icp = ICR_ITEM(lip);
+
+ kmem_zone_free(xfs_icreate_zone, icp);
+ return (xfs_lsn_t)-1;
+}
+
+/* item can never get into the AIL */
+STATIC uint
+xfs_icreate_item_push(
+ struct xfs_log_item *lip,
+ struct list_head *buffer_list)
+{
+ ASSERT(0);
+ return XFS_ITEM_SUCCESS;
+}
+
+/* Ordered buffers do the dependency tracking here, so this does nothing. */
+STATIC void
+xfs_icreate_item_committing(
+ struct xfs_log_item *lip,
+ xfs_lsn_t lsn)
+{
+}
+
+/*
+ * This is the ops vector shared by all inode create log items.
+ */
+static struct xfs_item_ops xfs_icreate_item_ops = {
+ .iop_size = xfs_icreate_item_size,
+ .iop_format = xfs_icreate_item_format,
+ .iop_pin = xfs_icreate_item_pin,
+ .iop_unpin = xfs_icreate_item_unpin,
+ .iop_push = xfs_icreate_item_push,
+ .iop_unlock = xfs_icreate_item_unlock,
+ .iop_committed = xfs_icreate_item_committed,
+ .iop_committing = xfs_icreate_item_committing,
+};
+
+
+/*
+ * Initialize the inode create log item for a newly allocated inode extent.
+ *
+ * Inode extents can only reside within an AG. Hence specify the starting
+ * block for the inode chunk by offset within an AG as well as the
+ * length of the allocated extent.
+ *
+ * This joins the item to the transaction and marks it dirty so
+ * that we don't need a separate call to do this, nor does the
+ * caller need to know anything about the icreate item.
+ */
+void
+xfs_icreate_log(
+ struct xfs_trans *tp,
+ xfs_agnumber_t agno,
+ xfs_agblock_t agbno,
+ unsigned int count,
+ unsigned int inode_size,
+ xfs_agblock_t length,
+ unsigned int generation)
+{
+ struct xfs_icreate_item *icp;
+
+ icp = kmem_zone_zalloc(xfs_icreate_zone, KM_SLEEP);
+
+ xfs_log_item_init(tp->t_mountp, &icp->ic_item, XFS_LI_ICREATE,
+ &xfs_icreate_item_ops);
+
+ icp->ic_format.icl_type = XFS_LI_ICREATE;
+ icp->ic_format.icl_size = 1; /* single vector */
+ icp->ic_format.icl_ag = cpu_to_be32(agno);
+ icp->ic_format.icl_agbno = cpu_to_be32(agbno);
+ icp->ic_format.icl_count = cpu_to_be32(count);
+ icp->ic_format.icl_isize = cpu_to_be32(inode_size);
+ icp->ic_format.icl_length = cpu_to_be32(length);
+ icp->ic_format.icl_gen = cpu_to_be32(generation);
+
+ xfs_trans_add_item(tp, &icp->ic_item);
+ tp->t_flags |= XFS_TRANS_DIRTY;
+ icp->ic_item.li_desc->lid_flags |= XFS_LID_DIRTY;
+}
diff --git a/fs/xfs/xfs_icreate_item.h b/fs/xfs/xfs_icreate_item.h
new file mode 100644
index 0000000..88ba8aa
--- /dev/null
+++ b/fs/xfs/xfs_icreate_item.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (c) 2008-2010, Dave Chinner
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef XFS_ICREATE_ITEM_H
+#define XFS_ICREATE_ITEM_H 1
+
+/*
+ * on disk log item structure
+ *
+ * Log recovery assumes the first two entries are the type and size and they fit
+ * in 32 bits. Also in host order (ugh) so they have to be 32 bit aligned so
+ * decoding can be done correctly.
+ */
+struct xfs_icreate_log {
+ __uint16_t icl_type; /* type of log format structure */
+ __uint16_t icl_size; /* size of log format structure */
+ __be32 icl_ag; /* ag being allocated in */
+ __be32 icl_agbno; /* start block of inode range */
+ __be32 icl_count; /* number of inodes to initialise */
+ __be32 icl_isize; /* size of inodes */
+ __be32 icl_length; /* length of extent to initialise */
+ __be32 icl_gen; /* inode generation number to use */
+};
+
+/* in memory log item structure */
+struct xfs_icreate_item {
+ struct xfs_log_item ic_item;
+ struct xfs_icreate_log ic_format;
+};
+
+extern kmem_zone_t *xfs_icreate_zone; /* inode create item zone */
+
+void xfs_icreate_log(struct xfs_trans *tp, xfs_agnumber_t agno,
+ xfs_agblock_t agbno, unsigned int count,
+ unsigned int inode_size, xfs_agblock_t length,
+ unsigned int generation);
+
+#endif /* XFS_ICREATE_ITEM_H */
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index b20918c..fb630e4 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -88,7 +88,8 @@ static inline xfs_lsn_t _lsn_cmp(xfs_lsn_t lsn1, xfs_lsn_t lsn2)
#define XLOG_REG_TYPE_UNMOUNT 17
#define XLOG_REG_TYPE_COMMIT 18
#define XLOG_REG_TYPE_TRANSHDR 19
-#define XLOG_REG_TYPE_MAX 19
+#define XLOG_REG_TYPE_ICREATE 20
+#define XLOG_REG_TYPE_MAX 20
typedef struct xfs_log_iovec {
void *i_addr; /* beginning address of region */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 3033ba5..f59e27f 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -51,6 +51,7 @@
#include "xfs_inode_item.h"
#include "xfs_icache.h"
#include "xfs_trace.h"
+#include "xfs_icreate_item.h"
#include <linux/namei.h>
#include <linux/init.h>
@@ -1655,9 +1656,15 @@ xfs_init_zones(void)
KM_ZONE_SPREAD, NULL);
if (!xfs_ili_zone)
goto out_destroy_inode_zone;
+ xfs_icreate_zone = kmem_zone_init(sizeof(struct xfs_icreate_item),
+ "xfs_icr");
+ if (!xfs_icreate_zone)
+ goto out_destroy_ili_zone;
return 0;
+ out_destroy_ili_zone:
+ kmem_zone_destroy(xfs_ili_zone);
out_destroy_inode_zone:
kmem_zone_destroy(xfs_inode_zone);
out_destroy_efi_zone:
@@ -1696,6 +1703,7 @@ xfs_destroy_zones(void)
* destroy caches.
*/
rcu_barrier();
+ kmem_zone_destroy(xfs_icreate_zone);
kmem_zone_destroy(xfs_ili_zone);
kmem_zone_destroy(xfs_inode_zone);
kmem_zone_destroy(xfs_efi_zone);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 850fe78..079d5a0 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -48,6 +48,7 @@ typedef struct xfs_trans_header {
#define XFS_LI_BUF 0x123c /* v2 bufs, variable sized inode bufs */
#define XFS_LI_DQUOT 0x123d
#define XFS_LI_QUOTAOFF 0x123e
+#define XFS_LI_ICREATE 0x123f
#define XFS_LI_TYPE_DESC \
{ XFS_LI_EFI, "XFS_LI_EFI" }, \
@@ -107,7 +108,8 @@ typedef struct xfs_trans_header {
#define XFS_TRANS_SWAPEXT 40
#define XFS_TRANS_SB_COUNT 41
#define XFS_TRANS_CHECKPOINT 42
-#define XFS_TRANS_TYPE_MAX 42
+#define XFS_TRANS_ICREATE 43
+#define XFS_TRANS_TYPE_MAX 43
/* new transaction types need to be reflected in xfs_logprint(8) */
#define XFS_TRANS_TYPES \
--
1.7.10.4
* Re: [PATCH 09/60] xfs: Inode create log items
2013-06-19 4:50 ` [PATCH 09/60] xfs: Inode create log items Dave Chinner
@ 2013-06-22 15:49 ` Mark Tinguely
0 siblings, 0 replies; 95+ messages in thread
From: Mark Tinguely @ 2013-06-22 15:49 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:50, Dave Chinner wrote:
> +/*
> + * on disk log item structure
> + *
> + * Log recovery assumes the first two entries are the type and size and they fit
> + * in 32 bits. Also in host order (ugh) so they have to be 32 bit aligned so
> + * decoding can be done correctly.
> + */
> +struct xfs_icreate_log {
> + __uint16_t icl_type; /* type of log format structure */
> + __uint16_t icl_size; /* size of log format structure */
> + __be32 icl_ag; /* ag being allocated in */
> + __be32 icl_agbno; /* start block of inode range */
> + __be32 icl_count; /* number of inodes to initialise */
> + __be32 icl_isize; /* size of inodes */
> + __be32 icl_length; /* length of extent to initialise */
> + __be32 icl_gen; /* inode generation number to use */
> +};
> +
> +/* in memory log item structure */
> +struct xfs_icreate_item {
> + struct xfs_log_item ic_item;
> + struct xfs_icreate_log ic_format;
> +};
> +
Just a nit: I assume the name was getting a bit long, but having _format
in the name tells the reader the structure's purpose. I know, I know, the
xfs_icreate_item uses format in the "ic_format" variable name.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
* [PATCH 10/60] xfs: Inode create transaction reservations
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (8 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 09/60] xfs: Inode create log items Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-23 17:29 ` Mark Tinguely
2013-06-19 4:50 ` [PATCH 11/60] xfs: Inode create item recovery Dave Chinner
` (51 subsequent siblings)
61 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
Define the log and space transaction sizes. Factor the current
create log reservation macro into the two logical halves and reuse
one half for the new icreate transactions. The icreate transaction
is transparent to all the high level create code - the
pre-calculated reservations will correctly set the reservations
dependent on whether the filesystem supports the icreate
transaction.
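
The shape of the calculation can be sketched in userspace as below. This is only a model under stated assumptions: the struct, its field names and the byte counts used with it are made up for illustration, and the dquot component of the real reservation is left out. It shows the reservation as the larger of the "modify" and "allocate" halves, with the icreate variant dropping the cost of physically logging the new inode blocks from the allocate half.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs; in the kernel these come from the mount geometry. */
struct resv_inputs {
	unsigned int	modify_bytes;		/* the "modify" half */
	unsigned int	alloc_bytes;		/* "allocate" half minus inode blocks */
	unsigned int	inode_block_bytes;	/* physically logged inode blocks */
	bool		has_icreate;		/* filesystem supports icreate */
};

static unsigned int max_uint(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Create reservation: the larger of the two cases the transaction covers. */
static unsigned int create_reservation(const struct resv_inputs *in)
{
	unsigned int alloc = in->alloc_bytes;

	/* Without icreate, the new inode blocks are logged in full. */
	if (!in->has_icreate)
		alloc += in->inode_block_bytes;
	return max_uint(alloc, in->modify_bytes);
}

/* Symlink is create plus room for the remote symlink data. */
static unsigned int symlink_reservation(const struct resv_inputs *in,
					unsigned int symlink_bytes)
{
	return create_reservation(in) + symlink_bytes;
}
```

With made-up inputs of 10000 bytes for the modify half, 6000 for the allocate half and 16384 for the inode blocks, the model reserves 22384 bytes without icreate and 10000 with it, which is why callers never need to know which variant is in use.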
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/xfs_trans.c | 118 ++++++++++++++++++++++++++++++++++------------------
1 file changed, 77 insertions(+), 41 deletions(-)
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 2fd7c1f..35a2299 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -234,71 +234,93 @@ xfs_calc_remove_reservation(
}
/*
- * For symlink we can modify:
+ * For create, break it into the two cases that the transaction
+ * covers. We start with the modify case - allocation done by modification
+ * of the state of existing inodes - followed by the allocation case.
+ */
+
+/*
+ * For create we can modify:
* the parent directory inode: inode size
* the new inode: inode size
- * the inode btree entry: 1 block
+ * the inode btree entry: block size
+ * the superblock for the nlink flag: sector size
* the directory btree: (max depth + v2) * dir block size
* the directory inode's bmap btree: (max depth + v2) * block size
- * the blocks for the symlink: 1 kB
- * Or in the first xact we allocate some inodes giving:
+ */
+STATIC uint
+xfs_calc_create_resv_modify(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+ (uint)XFS_FSB_TO_B(mp, 1) +
+ xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1));
+}
+
+/*
+ * For create we can allocate some inodes giving:
* the agi and agf of the ag getting the new inodes: 2 * sectorsize
+ * the superblock for the nlink flag: sector size
* the inode blocks allocated: XFS_IALLOC_BLOCKS * blocksize
* the inode btree: max depth * blocksize
- * the allocation btrees: 2 trees * (2 * max depth - 1) * block size
+ * the allocation btrees: 2 trees * (max depth - 1) * block size
*/
STATIC uint
-xfs_calc_symlink_reservation(
+xfs_calc_create_resv_alloc(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
+ mp->m_sb.sb_sectsize +
+ xfs_calc_buf_res(XFS_IALLOC_BLOCKS(mp), XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ XFS_FSB_TO_B(mp, 1));
+}
+
+STATIC uint
+__xfs_calc_create_reservation(
struct xfs_mount *mp)
{
return XFS_DQUOT_LOGRES(mp) +
- MAX((xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(1, 1024)),
- (xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_IALLOC_BLOCKS(mp),
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(mp->m_in_maxlevels,
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
- XFS_FSB_TO_B(mp, 1))));
+ MAX(xfs_calc_create_resv_alloc(mp),
+ xfs_calc_create_resv_modify(mp));
}
/*
- * For create we can modify:
- * the parent directory inode: inode size
- * the new inode: inode size
- * the inode btree entry: block size
- * the superblock for the nlink flag: sector size
- * the directory btree: (max depth + v2) * dir block size
- * the directory inode's bmap btree: (max depth + v2) * block size
- * Or in the first xact we allocate some inodes giving:
+ * For icreate we can allocate some inodes giving:
* the agi and agf of the ag getting the new inodes: 2 * sectorsize
* the superblock for the nlink flag: sector size
- * the inode blocks allocated: XFS_IALLOC_BLOCKS * blocksize
* the inode btree: max depth * blocksize
* the allocation btrees: 2 trees * (max depth - 1) * block size
*/
STATIC uint
-xfs_calc_create_reservation(
+xfs_calc_icreate_resv_alloc(
struct xfs_mount *mp)
{
+ return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
+ mp->m_sb.sb_sectsize +
+ xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ XFS_FSB_TO_B(mp, 1));
+}
+
+STATIC uint
+xfs_calc_icreate_reservation(xfs_mount_t *mp)
+{
return XFS_DQUOT_LOGRES(mp) +
- MAX((xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
- (uint)XFS_FSB_TO_B(mp, 1) +
- xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
- XFS_FSB_TO_B(mp, 1))),
- (xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
- mp->m_sb.sb_sectsize +
- xfs_calc_buf_res(XFS_IALLOC_BLOCKS(mp),
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(mp->m_in_maxlevels,
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
- XFS_FSB_TO_B(mp, 1))));
+ MAX(xfs_calc_icreate_resv_alloc(mp),
+ xfs_calc_create_resv_modify(mp));
+}
+
+STATIC uint
+xfs_calc_create_reservation(
+ struct xfs_mount *mp)
+{
+ if (xfs_sb_version_hascrc(&mp->m_sb))
+ return xfs_calc_icreate_reservation(mp);
+ return __xfs_calc_create_reservation(mp);
+
}
/*
@@ -311,6 +333,20 @@ xfs_calc_mkdir_reservation(
return xfs_calc_create_reservation(mp);
}
+
+/*
+ * Making a new symlink is the same as creating a new file, but
+ * with the added blocks for remote symlink data which can be up to 1kB in
+ * length (MAXPATHLEN).
+ */
+STATIC uint
+xfs_calc_symlink_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_create_reservation(mp) +
+ xfs_calc_buf_res(1, MAXPATHLEN);
+}
+
/*
* In freeing an inode we can modify:
* the inode being freed: inode size
--
1.7.10.4
* Re: [PATCH 10/60] xfs: Inode create transaction reservations
2013-06-19 4:50 ` [PATCH 10/60] xfs: Inode create transaction reservations Dave Chinner
@ 2013-06-23 17:29 ` Mark Tinguely
0 siblings, 0 replies; 95+ messages in thread
From: Mark Tinguely @ 2013-06-23 17:29 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:50, Dave Chinner wrote:
> Define the log and space transaction sizes. Factor the current
> create log reservation macro into the two logical halves and reuse
> one half for the new icreate transactions. The icreate transaction
> is transparent to all the high level create code - the
> pre-calculated reservations will correctly set the reservations
> dependent on whether the filesystem supports the icreate
> transaction.
>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> ---
Looks good.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
* [PATCH 11/60] xfs: Inode create item recovery
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (9 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 10/60] xfs: Inode create transaction reservations Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-24 14:37 ` Mark Tinguely
2013-06-19 4:50 ` [PATCH 12/60] xfs: Use inode create transaction Dave Chinner
` (50 subsequent siblings)
61 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
When we find an icreate transaction, we need to get and initialise
the buffers in the range that has been passed. Extract and verify
the information in the item record, then loop over the range,
initialising the buffers and issuing them as delayed writes.
Support an arbitrary size range to initialise so that in
future when we allocate inodes in much larger chunks all kernels
that understand this transaction can still recover them.
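As a rough illustration of the "extract and verify" step, the range
checks can be sketched in isolation. This is a simplified stand-alone
model, not the kernel code: the sketch_* names and both structs are
hypothetical, and the record fields are assumed to have already been
converted from their on-disk big-endian form.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_NULLAGBLOCK ((uint32_t)-1)

/* Hypothetical, already byte-swapped view of an icreate log record. */
struct sketch_icreate {
	uint32_t ag;      /* allocation group number */
	uint32_t agbno;   /* first block of the range within the AG */
	uint32_t count;   /* number of inodes to initialise */
	uint32_t isize;   /* size of each inode in bytes */
	uint32_t length;  /* length of the range in blocks */
};

/* Hypothetical subset of the superblock geometry the checks need. */
struct sketch_geom {
	uint32_t agcount;    /* number of allocation groups */
	uint32_t agblocks;   /* blocks per allocation group */
	uint32_t inodesize;  /* inode size in bytes */
};

/* Return 0 if the record describes a plausible range, EINVAL otherwise. */
int sketch_icreate_verify(const struct sketch_icreate *r,
			  const struct sketch_geom *g)
{
	if (r->ag >= g->agcount)
		return EINVAL;
	if (r->agbno == 0 || r->agbno == SKETCH_NULLAGBLOCK ||
	    r->agbno >= g->agblocks)
		return EINVAL;
	if (r->isize != g->inodesize)
		return EINVAL;
	if (r->count == 0)
		return EINVAL;
	if (r->length == 0 || r->length >= g->agblocks)
		return EINVAL;
	return 0;
}
```

Only a record that passes every check would go on to have its buffers
initialised and queued for delayed write.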
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/xfs_ialloc.c | 37 +++++++++++----
fs/xfs/xfs_ialloc.h | 8 ++++
fs/xfs/xfs_log_recover.c | 114 ++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 145 insertions(+), 14 deletions(-)
diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index c8f5ae1..d5ef81a 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -150,12 +150,16 @@ xfs_check_agi_freecount(
#endif
/*
- * Initialise a new set of inodes.
+ * Initialise a new set of inodes. When called without a transaction context
+ * (e.g. from recovery) we initiate a delayed write of the inode buffers rather
+ * than logging them (which in a transaction context puts them into the AIL
+ * for writeback rather than the xfsbufd queue).
*/
STATIC int
xfs_ialloc_inode_init(
struct xfs_mount *mp,
struct xfs_trans *tp,
+ struct list_head *buffer_list,
xfs_agnumber_t agno,
xfs_agblock_t agbno,
xfs_agblock_t length,
@@ -247,18 +251,33 @@ xfs_ialloc_inode_init(
ino++;
uuid_copy(&free->di_uuid, &mp->m_sb.sb_uuid);
xfs_dinode_calc_crc(mp, free);
- } else {
+ } else if (tp) {
/* just log the inode core */
xfs_trans_log_buf(tp, fbuf, ioffset,
ioffset + isize - 1);
}
}
- if (version == 3) {
- /* need to log the entire buffer */
- xfs_trans_log_buf(tp, fbuf, 0,
- BBTOB(fbuf->b_length) - 1);
+
+ if (tp) {
+ /*
+ * Mark the buffer as an inode allocation buffer so it
+ * sticks in AIL at the point of this allocation
+ * transaction. This ensures that they are on disk before
+ * the tail of the log can be moved past this
+ * transaction (i.e. by preventing relogging from moving
+ * it forward in the log).
+ */
+ xfs_trans_inode_alloc_buf(tp, fbuf);
+ if (version == 3) {
+ /* need to log the entire buffer */
+ xfs_trans_log_buf(tp, fbuf, 0,
+ BBTOB(fbuf->b_length) - 1);
+ }
+ } else {
+ fbuf->b_flags |= XBF_DONE;
+ xfs_buf_delwri_queue(fbuf, buffer_list);
+ xfs_buf_relse(fbuf);
}
- xfs_trans_inode_alloc_buf(tp, fbuf);
}
return 0;
}
@@ -303,7 +322,7 @@ xfs_ialloc_ag_alloc(
* First try to allocate inodes contiguous with the last-allocated
* chunk of inodes. If the filesystem is striped, this will fill
* an entire stripe unit with inodes.
- */
+ */
agi = XFS_BUF_TO_AGI(agbp);
newino = be32_to_cpu(agi->agi_newino);
agno = be32_to_cpu(agi->agi_seqno);
@@ -402,7 +421,7 @@ xfs_ialloc_ag_alloc(
* rather than a linear progression to prevent the next generation
* number from being easily guessable.
*/
- error = xfs_ialloc_inode_init(args.mp, tp, agno, args.agbno,
+ error = xfs_ialloc_inode_init(args.mp, tp, NULL, agno, args.agbno,
args.len, prandom_u32());
if (error)
diff --git a/fs/xfs/xfs_ialloc.h b/fs/xfs/xfs_ialloc.h
index c8da3df..68c0732 100644
--- a/fs/xfs/xfs_ialloc.h
+++ b/fs/xfs/xfs_ialloc.h
@@ -150,6 +150,14 @@ int xfs_inobt_lookup(struct xfs_btree_cur *cur, xfs_agino_t ino,
int xfs_inobt_get_rec(struct xfs_btree_cur *cur,
xfs_inobt_rec_incore_t *rec, int *stat);
+/*
+ * Inode chunk initialisation routine
+ */
+int xfs_ialloc_inode_init(struct xfs_mount *mp, struct xfs_trans *tp,
+ struct list_head *buffer_list,
+ xfs_agnumber_t agno, xfs_agblock_t agbno,
+ xfs_agblock_t length, unsigned int gen);
+
extern const struct xfs_buf_ops xfs_agi_buf_ops;
#endif /* __XFS_IALLOC_H__ */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 7cf5e4e..6fcc910a 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -45,6 +45,7 @@
#include "xfs_cksum.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
+#include "xfs_icreate_item.h"
/* Need all the magic numbers and buffer ops structures from these headers */
#include "xfs_symlink.h"
@@ -1617,7 +1618,10 @@ xlog_recover_add_to_trans(
* form the cancelled buffer table. Hence they have tobe done last.
*
* 3. Inode allocation buffers must be replayed before inode items that
- * read the buffer and replay changes into it.
+ * read the buffer and replay changes into it. For filesystems using the
+ * ICREATE transactions, this means XFS_LI_ICREATE objects need to get
+ * treated the same as inode allocation buffers as they create and
+ * initialise the buffers directly.
*
* 4. Inode unlink buffers must be replayed after inode items are replayed.
* This ensures that inodes are completely flushed to the inode buffer
@@ -1632,10 +1636,17 @@ xlog_recover_add_to_trans(
* from all the other buffers and move them to last.
*
* Hence, 4 lists, in order from head to tail:
- * - buffer_list for all buffers except cancelled/inode unlink buffers
- * - item_list for all non-buffer items
- * - inode_buffer_list for inode unlink buffers
- * - cancel_list for the cancelled buffers
+ * - buffer_list for all buffers except cancelled/inode unlink buffers
+ * - item_list for all non-buffer items
+ * - inode_buffer_list for inode unlink buffers
+ * - cancel_list for the cancelled buffers
+ *
+ * Note that we add objects to the tail of the lists so that first-to-last
+ * ordering is preserved within the lists. Adding objects to the head of the
+ * list means when we traverse from the head we walk them in last-to-first
+ * order. For cancelled buffers and inode unlink buffers this doesn't matter,
+ * but for all other items there may be specific ordering that we need to
+ * preserve.
*/
STATIC int
xlog_recover_reorder_trans(
@@ -1655,6 +1666,9 @@ xlog_recover_reorder_trans(
xfs_buf_log_format_t *buf_f = item->ri_buf[0].i_addr;
switch (ITEM_TYPE(item)) {
+ case XFS_LI_ICREATE:
+ list_move_tail(&item->ri_list, &buffer_list);
+ break;
case XFS_LI_BUF:
if (buf_f->blf_flags & XFS_BLF_CANCEL) {
trace_xfs_log_recover_item_reorder_head(log,
@@ -2982,6 +2996,93 @@ xlog_recover_efd_pass2(
}
/*
+ * This routine is called when an inode create format structure is found in a
+ * committed transaction in the log. Its purpose is to initialise the inodes
+ * being allocated on disk. This requires us to get inode cluster buffers that
+ * match the range to be initialised, stamped with inode templates and written
+ * by delayed write so that subsequent modifications will hit the cached buffer
+ * and only need writing out at the end of recovery.
+ */
+STATIC int
+xlog_recover_do_icreate_pass2(
+ struct xlog *log,
+ struct list_head *buffer_list,
+ xlog_recover_item_t *item)
+{
+ struct xfs_mount *mp = log->l_mp;
+ struct xfs_icreate_log *icl;
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ unsigned int count;
+ unsigned int isize;
+ xfs_agblock_t length;
+
+ icl = (struct xfs_icreate_log *)item->ri_buf[0].i_addr;
+ if (icl->icl_type != XFS_LI_ICREATE) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad type");
+ return EINVAL;
+ }
+
+ if (icl->icl_size != 1) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad icl size");
+ return EINVAL;
+ }
+
+ agno = be32_to_cpu(icl->icl_ag);
+ if (agno >= mp->m_sb.sb_agcount) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad agno");
+ return EINVAL;
+ }
+ agbno = be32_to_cpu(icl->icl_agbno);
+ if (!agbno || agbno == NULLAGBLOCK || agbno >= mp->m_sb.sb_agblocks) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad agbno");
+ return EINVAL;
+ }
+ isize = be32_to_cpu(icl->icl_isize);
+ if (isize != mp->m_sb.sb_inodesize) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad isize");
+ return EINVAL;
+ }
+ count = be32_to_cpu(icl->icl_count);
+ if (!count) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad count");
+ return EINVAL;
+ }
+ length = be32_to_cpu(icl->icl_length);
+ if (!length || length >= mp->m_sb.sb_agblocks) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad length");
+ return EINVAL;
+ }
+
+ /* existing allocation is fixed value */
+ ASSERT(count == XFS_IALLOC_INODES(mp));
+ ASSERT(length == XFS_IALLOC_BLOCKS(mp));
+ if (count != XFS_IALLOC_INODES(mp) ||
+ length != XFS_IALLOC_BLOCKS(mp)) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad count 2");
+ return EINVAL;
+ }
+
+ /*
+ * Inode buffers can be freed. Do not replay the inode initialisation as
+ * we could be overwriting something written after this inode buffer was
+ * cancelled.
+ *
+ * XXX: we need to iterate all buffers and only init those that are not
+ * cancelled. I think that a more fine grained factoring of
+ * xfs_ialloc_inode_init may be appropriate here to enable this to be
+ * done easily.
+ */
+ if (xlog_check_buffer_cancelled(log,
+ XFS_AGB_TO_DADDR(mp, agno, agbno), length, 0))
+ return 0;
+
+ xfs_ialloc_inode_init(mp, NULL, buffer_list, agno, agbno, length,
+ be32_to_cpu(icl->icl_gen));
+ return 0;
+}
+
+/*
* Free up any resources allocated by the transaction
*
* Remember that EFIs, EFDs, and IUNLINKs are handled later.
@@ -3023,6 +3124,7 @@ xlog_recover_commit_pass1(
case XFS_LI_EFI:
case XFS_LI_EFD:
case XFS_LI_DQUOT:
+ case XFS_LI_ICREATE:
/* nothing to do in pass 1 */
return 0;
default:
@@ -3053,6 +3155,8 @@ xlog_recover_commit_pass2(
return xlog_recover_efd_pass2(log, item);
case XFS_LI_DQUOT:
return xlog_recover_dquot_pass2(log, buffer_list, item);
+ case XFS_LI_ICREATE:
+ return xlog_recover_do_icreate_pass2(log, buffer_list, item);
case XFS_LI_QUOTAOFF:
/* nothing to do in pass2 */
return 0;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [PATCH 11/60] xfs: Inode create item recovery
2013-06-19 4:50 ` [PATCH 11/60] xfs: Inode create item recovery Dave Chinner
@ 2013-06-24 14:37 ` Mark Tinguely
0 siblings, 0 replies; 95+ messages in thread
From: Mark Tinguely @ 2013-06-24 14:37 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:50, Dave Chinner wrote:
> When we find a icreate transaction, we need to get and initialise
> the buffers in the range that has been passed. Extract and verify
> the information in the item record, then loop over the range
> initialising and issuing the buffer writes delayed.
>
> Support an arbitrary size range to initialise so that in
> future when we allocate inodes in much larger chunks all kernels
> that understand this transaction can still recover them.
>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> ---
Looks good.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
^ permalink raw reply [flat|nested] 95+ messages in thread
* [PATCH 12/60] xfs: Use inode create transaction
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (10 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 11/60] xfs: Inode create item recovery Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-24 18:55 ` Mark Tinguely
2013-06-19 4:50 ` [PATCH 13/60] xfs: remove local fork format handling from xfs_bmapi_write() Dave Chinner
` (49 subsequent siblings)
61 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
Replace the buffer based logging of inode initialisation with the
new logical form, which describes the range to be initialised
in recovery. We continue to "log" the inode buffers to push them
into the AIL and ensure that the inode create transaction is not
removed from the log before the inode buffers are written to disk.
Update the transaction identifier and reservations to match the
changed implementation.
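The reason for keeping the buffers in the AIL can be illustrated with
a toy model of log tail pinning. This is only a sketch under stated
assumptions: the sketch_* names are hypothetical, and the real AIL is
considerably more involved. The idea shown is that the log tail cannot
advance past the lowest LSN of any tracked item not yet written back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracked item: the LSN it was logged at and its state. */
struct sketch_ail_item {
	uint64_t lsn;      /* log sequence number of the transaction */
	int      on_disk;  /* non-zero once the item has been written back */
};

/*
 * Return the furthest point the log tail may move to: the lowest LSN
 * still held by an unwritten item, or head_lsn if nothing is pending.
 */
uint64_t sketch_log_tail(const struct sketch_ail_item *items, size_t n,
			 uint64_t head_lsn)
{
	uint64_t tail = head_lsn;

	for (size_t i = 0; i < n; i++) {
		if (!items[i].on_disk && items[i].lsn < tail)
			tail = items[i].lsn;
	}
	return tail;
}
```

In this model an inode buffer tracked at the LSN of its create
transaction holds the tail at that LSN until the buffer is on disk, so
the create record stays available to recovery.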
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/xfs_buf_item.c | 12 ++++++++++--
fs/xfs/xfs_ialloc.c | 32 +++++++++++++++++++++++---------
2 files changed, 33 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 61f6876..bfc4e0c 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -310,13 +310,21 @@ xfs_buf_item_format(
/*
* If it is an inode buffer, transfer the in-memory state to the
- * format flags and clear the in-memory state. We do not transfer
+ * format flags and clear the in-memory state.
+ *
+ * For buffer based inode allocation, we do not transfer
* this state if the inode buffer allocation has not yet been committed
* to the log as setting the XFS_BLI_INODE_BUF flag will prevent
* correct replay of the inode allocation.
+ *
+ * For icreate item based inode allocation, the buffers aren't written
+ * to the journal during allocation, and hence we should always tag the
+ * buffer as an inode buffer so that the correct unlinked list replay
+ * occurs during recovery.
*/
if (bip->bli_flags & XFS_BLI_INODE_BUF) {
- if (!((bip->bli_flags & XFS_BLI_INODE_ALLOC_BUF) &&
+ if (xfs_sb_version_hascrc(&lip->li_mountp->m_sb) ||
+ !((bip->bli_flags & XFS_BLI_INODE_ALLOC_BUF) &&
xfs_log_item_in_current_chkpt(lip)))
bip->__bli_format.blf_flags |= XFS_BLF_INODE_BUF;
bip->bli_flags &= ~XFS_BLI_INODE_BUF;
diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index d5ef81a..319d9e4 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -38,6 +38,7 @@
#include "xfs_bmap.h"
#include "xfs_cksum.h"
#include "xfs_buf_item.h"
+#include "xfs_icreate_item.h"
/*
@@ -155,7 +156,7 @@ xfs_check_agi_freecount(
* than logging them (which in a transaction context puts them into the AIL
* for writeback rather than the xfsbufd queue).
*/
-STATIC int
+int
xfs_ialloc_inode_init(
struct xfs_mount *mp,
struct xfs_trans *tp,
@@ -212,6 +213,18 @@ xfs_ialloc_inode_init(
version = 3;
ino = XFS_AGINO_TO_INO(mp, agno,
XFS_OFFBNO_TO_AGINO(mp, agbno, 0));
+
+ /*
+ * log the initialisation that is about to take place as a
+ * logical operation. This means the transaction does not
+ * need to log the physical changes to the inode buffers as log
+ * recovery will know what initialisation is actually needed.
+ * Hence we only need to log the buffers as "ordered" buffers so
+ * they track in the AIL as if they were physically logged.
+ */
+ if (tp)
+ xfs_icreate_log(tp, agno, agbno, XFS_IALLOC_INODES(mp),
+ mp->m_sb.sb_inodesize, length, gen);
} else if (xfs_sb_version_hasnlink(&mp->m_sb))
version = 2;
else
@@ -227,13 +240,8 @@ xfs_ialloc_inode_init(
XBF_UNMAPPED);
if (!fbuf)
return ENOMEM;
- /*
- * Initialize all inodes in this buffer and then log them.
- *
- * XXX: It would be much better if we had just one transaction
- * to log a whole cluster of inodes instead of all the
- * individual transactions causing a lot of log traffic.
- */
+
+ /* Initialize the inode buffers and log them appropriately. */
fbuf->b_ops = &xfs_inode_buf_ops;
xfs_buf_zero(fbuf, 0, BBTOB(fbuf->b_length));
for (i = 0; i < ninodes; i++) {
@@ -269,7 +277,13 @@ xfs_ialloc_inode_init(
*/
xfs_trans_inode_alloc_buf(tp, fbuf);
if (version == 3) {
- /* need to log the entire buffer */
+ /*
+ * Mark the buffer as ordered so that they are
+ * not physically logged in the transaction but
+ * still tracked in the AIL as part of the
+ * transaction and pin the log appropriately.
+ */
+ xfs_trans_ordered_buf(tp, fbuf);
xfs_trans_log_buf(tp, fbuf, 0,
BBTOB(fbuf->b_length) - 1);
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [PATCH 12/60] xfs: Use inode create transaction
2013-06-19 4:50 ` [PATCH 12/60] xfs: Use inode create transaction Dave Chinner
@ 2013-06-24 18:55 ` Mark Tinguely
0 siblings, 0 replies; 95+ messages in thread
From: Mark Tinguely @ 2013-06-24 18:55 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:50, Dave Chinner wrote:
> Replace the use of buffer based logging of inode initialisation,
> uses the new logical form to describe the range to be initialised
> in recovery. We continue to "log" the inode buffers to push them
> into the AIL and ensure that the inode create transaction is not
> removed from the log before the inode buffers are written to disk.
>
> Update the transaction identifier and reservations to match the
> changed implementation.
>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> ---
Looks good. Nice feature.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
^ permalink raw reply [flat|nested] 95+ messages in thread
* [PATCH 13/60] xfs: remove local fork format handling from xfs_bmapi_write()
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (11 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 12/60] xfs: Use inode create transaction Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 14/60] xfs: move getdents code into it's own file Dave Chinner
` (48 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The conversion from local format to extent format requires
interpretation of the data in the fork being converted, so it cannot
be done in a generic way. It is up to the caller to convert the fork
format to extent format before calling into xfs_bmapi_write() so
format conversion can be done correctly.
The code in xfs_bmapi_write() to convert the format is used
implicitly by the attribute and directory code, but they
specifically zero the fork size so that the conversion does not do
any allocation or manipulation. Move this conversion into the
shortform to leaf functions for the dir/attr code so the conversions
are explicitly controlled by all callers.
Now we can remove the conversion code in xfs_bmapi_write.
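The caller-side contract described above can be sketched with a
reduced model of a fork. Everything here is hypothetical (the sketch_*
names, the struct and the error returns); the patch itself uses ASSERTs
and a void helper. The sketch only shows the ordering: the caller
empties the local fork, flips it to extent format, and only then is a
mapping write acceptable.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_fmt { SKETCH_FMT_LOCAL, SKETCH_FMT_EXTENTS, SKETCH_FMT_BTREE };

#define SKETCH_IFINLINE  0x1u  /* data is held inline in the inode */
#define SKETCH_IFEXTENTS 0x2u  /* extent list is valid */

/* Hypothetical reduced model of an inode fork. */
struct sketch_fork {
	enum sketch_fmt format;
	unsigned int    flags;
	size_t          bytes;     /* inline data bytes still held */
	unsigned int    nextents;  /* number of mapped extents */
};

/* Convert an already-emptied local format fork to extent format. */
int sketch_local_to_extents_empty(struct sketch_fork *f)
{
	if (f->format != SKETCH_FMT_LOCAL || f->bytes != 0 ||
	    f->nextents != 0)
		return -1;
	f->flags &= ~SKETCH_IFINLINE;
	f->flags |= SKETCH_IFEXTENTS;
	f->format = SKETCH_FMT_EXTENTS;
	return 0;
}

/* A mapping write refuses local format forks outright. */
int sketch_map_write(const struct sketch_fork *f)
{
	return f->format == SKETCH_FMT_LOCAL ? -1 : 0;
}
```

A caller that skips the conversion step fails immediately at the
mapping write, which is the behaviour the new assertion in
xfs_bmapi_write() is meant to enforce on debug kernels.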
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_attr_leaf.c | 2 +
fs/xfs/xfs_bmap.c | 199 ++++++++++++++++++++---------------------------
fs/xfs/xfs_bmap.h | 1 +
fs/xfs/xfs_dir2_block.c | 20 +++--
4 files changed, 98 insertions(+), 124 deletions(-)
diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index 31d3cd1..b800fbc 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -690,6 +690,8 @@ xfs_attr_shortform_to_leaf(xfs_da_args_t *args)
sf = (xfs_attr_shortform_t *)tmpbuffer;
xfs_idata_realloc(dp, -size, XFS_ATTR_FORK);
+ xfs_bmap_local_to_extents_empty(dp, XFS_ATTR_FORK);
+
bp = NULL;
error = xfs_da_grow_inode(args, &blkno);
if (error) {
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 8904284..05c698c 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -1161,6 +1161,24 @@ xfs_bmap_extents_to_btree(
* since the file data needs to get logged so things will stay consistent.
* (The bmap-level manipulations are ok, though).
*/
+void
+xfs_bmap_local_to_extents_empty(
+ struct xfs_inode *ip,
+ int whichfork)
+{
+ struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
+
+ ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
+ ASSERT(ifp->if_bytes == 0);
+ ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) == 0);
+
+ xfs_bmap_forkoff_reset(ip->i_mount, ip, whichfork);
+ ifp->if_flags &= ~XFS_IFINLINE;
+ ifp->if_flags |= XFS_IFEXTENTS;
+ XFS_IFORK_FMT_SET(ip, whichfork, XFS_DINODE_FMT_EXTENTS);
+}
+
+
STATIC int /* error */
xfs_bmap_local_to_extents(
xfs_trans_t *tp, /* transaction pointer */
@@ -1174,9 +1192,12 @@ xfs_bmap_local_to_extents(
struct xfs_inode *ip,
struct xfs_ifork *ifp))
{
- int error; /* error return value */
+ int error = 0;
int flags; /* logging flags returned */
xfs_ifork_t *ifp; /* inode fork pointer */
+ xfs_alloc_arg_t args; /* allocation arguments */
+ xfs_buf_t *bp; /* buffer for extent block */
+ xfs_bmbt_rec_host_t *ep; /* extent record pointer */
/*
* We don't want to deal with the case of keeping inode data inline yet.
@@ -1185,68 +1206,65 @@ xfs_bmap_local_to_extents(
ASSERT(!(S_ISREG(ip->i_d.di_mode) && whichfork == XFS_DATA_FORK));
ifp = XFS_IFORK_PTR(ip, whichfork);
ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
+
+ if (!ifp->if_bytes) {
+ xfs_bmap_local_to_extents_empty(ip, whichfork);
+ flags = XFS_ILOG_CORE;
+ goto done;
+ }
+
flags = 0;
error = 0;
- if (ifp->if_bytes) {
- xfs_alloc_arg_t args; /* allocation arguments */
- xfs_buf_t *bp; /* buffer for extent block */
- xfs_bmbt_rec_host_t *ep;/* extent record pointer */
-
- ASSERT((ifp->if_flags &
- (XFS_IFINLINE|XFS_IFEXTENTS|XFS_IFEXTIREC)) == XFS_IFINLINE);
- memset(&args, 0, sizeof(args));
- args.tp = tp;
- args.mp = ip->i_mount;
- args.firstblock = *firstblock;
- /*
- * Allocate a block. We know we need only one, since the
- * file currently fits in an inode.
- */
- if (*firstblock == NULLFSBLOCK) {
- args.fsbno = XFS_INO_TO_FSB(args.mp, ip->i_ino);
- args.type = XFS_ALLOCTYPE_START_BNO;
- } else {
- args.fsbno = *firstblock;
- args.type = XFS_ALLOCTYPE_NEAR_BNO;
- }
- args.total = total;
- args.minlen = args.maxlen = args.prod = 1;
- error = xfs_alloc_vextent(&args);
- if (error)
- goto done;
-
- /* Can't fail, the space was reserved. */
- ASSERT(args.fsbno != NULLFSBLOCK);
- ASSERT(args.len == 1);
- *firstblock = args.fsbno;
- bp = xfs_btree_get_bufl(args.mp, tp, args.fsbno, 0);
-
- /* initialise the block and copy the data */
- init_fn(tp, bp, ip, ifp);
-
- /* account for the change in fork size and log everything */
- xfs_trans_log_buf(tp, bp, 0, ifp->if_bytes - 1);
- xfs_bmap_forkoff_reset(args.mp, ip, whichfork);
- xfs_idata_realloc(ip, -ifp->if_bytes, whichfork);
- xfs_iext_add(ifp, 0, 1);
- ep = xfs_iext_get_ext(ifp, 0);
- xfs_bmbt_set_allf(ep, 0, args.fsbno, 1, XFS_EXT_NORM);
- trace_xfs_bmap_post_update(ip, 0,
- whichfork == XFS_ATTR_FORK ? BMAP_ATTRFORK : 0,
- _THIS_IP_);
- XFS_IFORK_NEXT_SET(ip, whichfork, 1);
- ip->i_d.di_nblocks = 1;
- xfs_trans_mod_dquot_byino(tp, ip,
- XFS_TRANS_DQ_BCOUNT, 1L);
- flags |= xfs_ilog_fext(whichfork);
+ ASSERT((ifp->if_flags & (XFS_IFINLINE|XFS_IFEXTENTS|XFS_IFEXTIREC)) ==
+ XFS_IFINLINE);
+ memset(&args, 0, sizeof(args));
+ args.tp = tp;
+ args.mp = ip->i_mount;
+ args.firstblock = *firstblock;
+ /*
+ * Allocate a block. We know we need only one, since the
+ * file currently fits in an inode.
+ */
+ if (*firstblock == NULLFSBLOCK) {
+ args.fsbno = XFS_INO_TO_FSB(args.mp, ip->i_ino);
+ args.type = XFS_ALLOCTYPE_START_BNO;
} else {
- ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) == 0);
- xfs_bmap_forkoff_reset(ip->i_mount, ip, whichfork);
+ args.fsbno = *firstblock;
+ args.type = XFS_ALLOCTYPE_NEAR_BNO;
}
- ifp->if_flags &= ~XFS_IFINLINE;
- ifp->if_flags |= XFS_IFEXTENTS;
- XFS_IFORK_FMT_SET(ip, whichfork, XFS_DINODE_FMT_EXTENTS);
+ args.total = total;
+ args.minlen = args.maxlen = args.prod = 1;
+ error = xfs_alloc_vextent(&args);
+ if (error)
+ goto done;
+
+ /* Can't fail, the space was reserved. */
+ ASSERT(args.fsbno != NULLFSBLOCK);
+ ASSERT(args.len == 1);
+ *firstblock = args.fsbno;
+ bp = xfs_btree_get_bufl(args.mp, tp, args.fsbno, 0);
+
+ /* initialise the block and copy the data */
+ init_fn(tp, bp, ip, ifp);
+
+ /* account for the change in fork size and log everything */
+ xfs_trans_log_buf(tp, bp, 0, ifp->if_bytes - 1);
+ xfs_idata_realloc(ip, -ifp->if_bytes, whichfork);
+ xfs_bmap_local_to_extents_empty(ip, whichfork);
flags |= XFS_ILOG_CORE;
+
+ xfs_iext_add(ifp, 0, 1);
+ ep = xfs_iext_get_ext(ifp, 0);
+ xfs_bmbt_set_allf(ep, 0, args.fsbno, 1, XFS_EXT_NORM);
+ trace_xfs_bmap_post_update(ip, 0,
+ whichfork == XFS_ATTR_FORK ? BMAP_ATTRFORK : 0,
+ _THIS_IP_);
+ XFS_IFORK_NEXT_SET(ip, whichfork, 1);
+ ip->i_d.di_nblocks = 1;
+ xfs_trans_mod_dquot_byino(tp, ip,
+ XFS_TRANS_DQ_BCOUNT, 1L);
+ flags |= xfs_ilog_fext(whichfork);
+
done:
*logflagsp = flags;
return error;
@@ -1323,25 +1341,6 @@ xfs_bmap_add_attrfork_extents(
}
/*
- * Block initialisation function for local to extent format conversion.
- *
- * This shouldn't actually be called by anyone, so make sure debug kernels cause
- * a noticable failure.
- */
-STATIC void
-xfs_bmap_local_to_extents_init_fn(
- struct xfs_trans *tp,
- struct xfs_buf *bp,
- struct xfs_inode *ip,
- struct xfs_ifork *ifp)
-{
- ASSERT(0);
- bp->b_ops = &xfs_bmbt_buf_ops;
- memcpy(bp->b_addr, ifp->if_u1.if_data, ifp->if_bytes);
- xfs_trans_buf_set_type(tp, bp, XFS_BLFT_BTREE_BUF);
-}
-
-/*
* Called from xfs_bmap_add_attrfork to handle local format files. Each
* different data fork content type needs a different callout to do the
* conversion. Some are basic and only require special block initialisation
@@ -1381,9 +1380,9 @@ xfs_bmap_add_attrfork_local(
flags, XFS_DATA_FORK,
xfs_symlink_local_to_remote);
- return xfs_bmap_local_to_extents(tp, ip, firstblock, 1, flags,
- XFS_DATA_FORK,
- xfs_bmap_local_to_extents_init_fn);
+ /* should only be called for types that support local format data */
+ ASSERT(0);
+ return EFSCORRUPTED;
}
/*
@@ -4907,20 +4906,19 @@ xfs_bmapi_write(
orig_mval = mval;
orig_nmap = *nmap;
#endif
+ whichfork = (flags & XFS_BMAPI_ATTRFORK) ?
+ XFS_ATTR_FORK : XFS_DATA_FORK;
ASSERT(*nmap >= 1);
ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
ASSERT(!(flags & XFS_BMAPI_IGSTATE));
ASSERT(tp != NULL);
ASSERT(len > 0);
-
- whichfork = (flags & XFS_BMAPI_ATTRFORK) ?
- XFS_ATTR_FORK : XFS_DATA_FORK;
+ ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
if (unlikely(XFS_TEST_ERROR(
(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE &&
- XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL),
+ XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
mp, XFS_ERRTAG_BMAPIFORMAT, XFS_RANDOM_BMAPIFORMAT))) {
XFS_ERROR_REPORT("xfs_bmapi_write", XFS_ERRLEVEL_LOW, mp);
return XFS_ERROR(EFSCORRUPTED);
@@ -4933,37 +4931,6 @@ xfs_bmapi_write(
XFS_STATS_INC(xs_blk_mapw);
- if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
- /*
- * XXX (dgc): This assumes we are only called for inodes that
- * contain content neutral data in local format. Anything that
- * contains caller-specific data in local format that needs
- * transformation to move to a block format needs to do the
- * conversion to extent format itself.
- *
- * Directory data forks and attribute forks handle this
- * themselves, but with the addition of metadata verifiers every
- * data fork in local format now contains caller specific data
- * and as such conversion through this function is likely to be
- * broken.
- *
- * The only likely user of this branch is for remote symlinks,
- * but we cannot overwrite the data fork contents of the symlink
- * (EEXIST occurs higher up the stack) and so it will never go
- * from local format to extent format here. Hence I don't think
- * this branch is ever executed intentionally and we should
- * consider removing it and asserting that xfs_bmapi_write()
- * cannot be called directly on local format forks. i.e. callers
- * are completely responsible for local to extent format
- * conversion, not xfs_bmapi_write().
- */
- error = xfs_bmap_local_to_extents(tp, ip, firstblock, total,
- &bma.logflags, whichfork,
- xfs_bmap_local_to_extents_init_fn);
- if (error)
- goto error0;
- }
-
if (*firstblock == NULLFSBLOCK) {
if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE)
bma.minleft = be16_to_cpu(ifp->if_broot->bb_level) + 1;
diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
index 5f469c3..1cf1292 100644
--- a/fs/xfs/xfs_bmap.h
+++ b/fs/xfs/xfs_bmap.h
@@ -172,6 +172,7 @@ void xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
#endif
int xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
+void xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
void xfs_bmap_add_free(xfs_fsblock_t bno, xfs_filblks_t len,
struct xfs_bmap_free *flist, struct xfs_mount *mp);
void xfs_bmap_cancel(struct xfs_bmap_free *flist);
diff --git a/fs/xfs/xfs_dir2_block.c b/fs/xfs/xfs_dir2_block.c
index e59f5fc..53b9aa2 100644
--- a/fs/xfs/xfs_dir2_block.c
+++ b/fs/xfs/xfs_dir2_block.c
@@ -29,6 +29,7 @@
#include "xfs_dinode.h"
#include "xfs_inode.h"
#include "xfs_inode_item.h"
+#include "xfs_bmap.h"
#include "xfs_buf_item.h"
#include "xfs_dir2.h"
#include "xfs_dir2_format.h"
@@ -1167,13 +1168,15 @@ xfs_dir2_sf_to_block(
__be16 *tagp; /* end of data entry */
xfs_trans_t *tp; /* transaction pointer */
struct xfs_name name;
+ struct xfs_ifork *ifp;
trace_xfs_dir2_sf_to_block(args);
dp = args->dp;
tp = args->trans;
mp = dp->i_mount;
- ASSERT(dp->i_df.if_flags & XFS_IFINLINE);
+ ifp = XFS_IFORK_PTR(dp, XFS_DATA_FORK);
+ ASSERT(ifp->if_flags & XFS_IFINLINE);
/*
* Bomb out if the shortform directory is way too short.
*/
@@ -1182,22 +1185,23 @@ xfs_dir2_sf_to_block(
return XFS_ERROR(EIO);
}
- oldsfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+ oldsfp = (xfs_dir2_sf_hdr_t *)ifp->if_u1.if_data;
- ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
- ASSERT(dp->i_df.if_u1.if_data != NULL);
+ ASSERT(ifp->if_bytes == dp->i_d.di_size);
+ ASSERT(ifp->if_u1.if_data != NULL);
ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(oldsfp->i8count));
+ ASSERT(dp->i_d.di_nextents == 0);
/*
* Copy the directory into a temporary buffer.
* Then pitch the incore inode data so we can make extents.
*/
- sfp = kmem_alloc(dp->i_df.if_bytes, KM_SLEEP);
- memcpy(sfp, oldsfp, dp->i_df.if_bytes);
+ sfp = kmem_alloc(ifp->if_bytes, KM_SLEEP);
+ memcpy(sfp, oldsfp, ifp->if_bytes);
- xfs_idata_realloc(dp, -dp->i_df.if_bytes, XFS_DATA_FORK);
+ xfs_idata_realloc(dp, -ifp->if_bytes, XFS_DATA_FORK);
+ xfs_bmap_local_to_extents_empty(dp, XFS_DATA_FORK);
dp->i_d.di_size = 0;
- xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
/*
* Add block 0 to the inode.
--
1.7.10.4
^ permalink raw reply related [flat|nested] 95+ messages in thread
* [PATCH 14/60] xfs: move getdents code into it's own file
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (12 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 13/60] xfs: remove local fork format handling from xfs_bmapi_write() Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 15/60] xfs: reshuffle dir2 definitions around for userspace Dave Chinner
` (47 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The directory readdir code is not used by userspace, but it is
intermingled with files that are shared with userspace. This makes
it difficult to compare the differences between the userspace and
kernel files, as the userspace files don't have the getdents code in
them. Move all the kernel getdents code to a separate file to bring
the shared content between userspace and kernel files closer
together.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_dir2.c | 34 ---
fs/xfs/xfs_dir2_block.c | 100 +------
fs/xfs/xfs_dir2_leaf.c | 391 ---------------------------
fs/xfs/xfs_dir2_priv.h | 8 +-
fs/xfs/xfs_dir2_readdir.c | 660 +++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_dir2_sf.c | 99 -------
7 files changed, 664 insertions(+), 629 deletions(-)
create mode 100644 fs/xfs/xfs_dir2_readdir.c
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 4a45080..f396fe9 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -30,6 +30,7 @@ xfs-y += xfs_aops.o \
xfs_bit.o \
xfs_buf.o \
xfs_dfrag.o \
+ xfs_dir2_readdir.o \
xfs_discard.o \
xfs_error.o \
xfs_export.o \
diff --git a/fs/xfs/xfs_dir2.c b/fs/xfs/xfs_dir2.c
index b26a50f..431be44 100644
--- a/fs/xfs/xfs_dir2.c
+++ b/fs/xfs/xfs_dir2.c
@@ -363,40 +363,6 @@ xfs_dir_removename(
}
/*
- * Read a directory.
- */
-int
-xfs_readdir(
- xfs_inode_t *dp,
- void *dirent,
- size_t bufsize,
- xfs_off_t *offset,
- filldir_t filldir)
-{
- int rval; /* return value */
- int v; /* type-checking value */
-
- trace_xfs_readdir(dp);
-
- if (XFS_FORCED_SHUTDOWN(dp->i_mount))
- return XFS_ERROR(EIO);
-
- ASSERT(S_ISDIR(dp->i_d.di_mode));
- XFS_STATS_INC(xs_dir_getdents);
-
- if (dp->i_d.di_format == XFS_DINODE_FMT_LOCAL)
- rval = xfs_dir2_sf_getdents(dp, dirent, offset, filldir);
- else if ((rval = xfs_dir2_isblock(NULL, dp, &v)))
- ;
- else if (v)
- rval = xfs_dir2_block_getdents(dp, dirent, offset, filldir);
- else
- rval = xfs_dir2_leaf_getdents(dp, dirent, bufsize, offset,
- filldir);
- return rval;
-}
-
-/*
* Replace the inode number of a directory entry.
*/
int
diff --git a/fs/xfs/xfs_dir2_block.c b/fs/xfs/xfs_dir2_block.c
index 53b9aa2..5e84000 100644
--- a/fs/xfs/xfs_dir2_block.c
+++ b/fs/xfs/xfs_dir2_block.c
@@ -126,7 +126,7 @@ const struct xfs_buf_ops xfs_dir3_block_buf_ops = {
.verify_write = xfs_dir3_block_write_verify,
};
-static int
+int
xfs_dir3_block_read(
struct xfs_trans *tp,
struct xfs_inode *dp,
@@ -565,104 +565,6 @@ xfs_dir2_block_addname(
}
/*
- * Readdir for block directories.
- */
-int /* error */
-xfs_dir2_block_getdents(
- xfs_inode_t *dp, /* incore inode */
- void *dirent,
- xfs_off_t *offset,
- filldir_t filldir)
-{
- xfs_dir2_data_hdr_t *hdr; /* block header */
- struct xfs_buf *bp; /* buffer for block */
- xfs_dir2_block_tail_t *btp; /* block tail */
- xfs_dir2_data_entry_t *dep; /* block data entry */
- xfs_dir2_data_unused_t *dup; /* block unused entry */
- char *endptr; /* end of the data entries */
- int error; /* error return value */
- xfs_mount_t *mp; /* filesystem mount point */
- char *ptr; /* current data entry */
- int wantoff; /* starting block offset */
- xfs_off_t cook;
-
- mp = dp->i_mount;
- /*
- * If the block number in the offset is out of range, we're done.
- */
- if (xfs_dir2_dataptr_to_db(mp, *offset) > mp->m_dirdatablk)
- return 0;
-
- error = xfs_dir3_block_read(NULL, dp, &bp);
- if (error)
- return error;
-
- /*
- * Extract the byte offset we start at from the seek pointer.
- * We'll skip entries before this.
- */
- wantoff = xfs_dir2_dataptr_to_off(mp, *offset);
- hdr = bp->b_addr;
- xfs_dir3_data_check(dp, bp);
- /*
- * Set up values for the loop.
- */
- btp = xfs_dir2_block_tail_p(mp, hdr);
- ptr = (char *)xfs_dir3_data_entry_p(hdr);
- endptr = (char *)xfs_dir2_block_leaf_p(btp);
-
- /*
- * Loop over the data portion of the block.
- * Each object is a real entry (dep) or an unused one (dup).
- */
- while (ptr < endptr) {
- dup = (xfs_dir2_data_unused_t *)ptr;
- /*
- * Unused, skip it.
- */
- if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
- ptr += be16_to_cpu(dup->length);
- continue;
- }
-
- dep = (xfs_dir2_data_entry_t *)ptr;
-
- /*
- * Bump pointer for the next iteration.
- */
- ptr += xfs_dir2_data_entsize(dep->namelen);
- /*
- * The entry is before the desired starting point, skip it.
- */
- if ((char *)dep - (char *)hdr < wantoff)
- continue;
-
- cook = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
- (char *)dep - (char *)hdr);
-
- /*
- * If it didn't fit, set the final offset to here & return.
- */
- if (filldir(dirent, (char *)dep->name, dep->namelen,
- cook & 0x7fffffff, be64_to_cpu(dep->inumber),
- DT_UNKNOWN)) {
- *offset = cook & 0x7fffffff;
- xfs_trans_brelse(NULL, bp);
- return 0;
- }
- }
-
- /*
- * Reached the end of the block.
- * Set the offset to a non-existent block 1 and return.
- */
- *offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk + 1, 0) &
- 0x7fffffff;
- xfs_trans_brelse(NULL, bp);
- return 0;
-}
-
-/*
* Log leaf entries from the block.
*/
static void
diff --git a/fs/xfs/xfs_dir2_leaf.c b/fs/xfs/xfs_dir2_leaf.c
index bb22aac..e1aed21 100644
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -1083,397 +1083,6 @@ xfs_dir3_leaf_compact_x1(
*highstalep = highstale;
}
-struct xfs_dir2_leaf_map_info {
- xfs_extlen_t map_blocks; /* number of fsbs in map */
- xfs_dablk_t map_off; /* last mapped file offset */
- int map_size; /* total entries in *map */
- int map_valid; /* valid entries in *map */
- int nmap; /* mappings to ask xfs_bmapi */
- xfs_dir2_db_t curdb; /* db for current block */
- int ra_current; /* number of read-ahead blks */
- int ra_index; /* *map index for read-ahead */
- int ra_offset; /* map entry offset for ra */
- int ra_want; /* readahead count wanted */
- struct xfs_bmbt_irec map[]; /* map vector for blocks */
-};
-
-STATIC int
-xfs_dir2_leaf_readbuf(
- struct xfs_inode *dp,
- size_t bufsize,
- struct xfs_dir2_leaf_map_info *mip,
- xfs_dir2_off_t *curoff,
- struct xfs_buf **bpp)
-{
- struct xfs_mount *mp = dp->i_mount;
- struct xfs_buf *bp = *bpp;
- struct xfs_bmbt_irec *map = mip->map;
- struct blk_plug plug;
- int error = 0;
- int length;
- int i;
- int j;
-
- /*
- * If we have a buffer, we need to release it and
- * take it out of the mapping.
- */
-
- if (bp) {
- xfs_trans_brelse(NULL, bp);
- bp = NULL;
- mip->map_blocks -= mp->m_dirblkfsbs;
- /*
- * Loop to get rid of the extents for the
- * directory block.
- */
- for (i = mp->m_dirblkfsbs; i > 0; ) {
- j = min_t(int, map->br_blockcount, i);
- map->br_blockcount -= j;
- map->br_startblock += j;
- map->br_startoff += j;
- /*
- * If mapping is done, pitch it from
- * the table.
- */
- if (!map->br_blockcount && --mip->map_valid)
- memmove(&map[0], &map[1],
- sizeof(map[0]) * mip->map_valid);
- i -= j;
- }
- }
-
- /*
- * Recalculate the readahead blocks wanted.
- */
- mip->ra_want = howmany(bufsize + mp->m_dirblksize,
- mp->m_sb.sb_blocksize) - 1;
- ASSERT(mip->ra_want >= 0);
-
- /*
- * If we don't have as many as we want, and we haven't
- * run out of data blocks, get some more mappings.
- */
- if (1 + mip->ra_want > mip->map_blocks &&
- mip->map_off < xfs_dir2_byte_to_da(mp, XFS_DIR2_LEAF_OFFSET)) {
- /*
- * Get more bmaps, fill in after the ones
- * we already have in the table.
- */
- mip->nmap = mip->map_size - mip->map_valid;
- error = xfs_bmapi_read(dp, mip->map_off,
- xfs_dir2_byte_to_da(mp, XFS_DIR2_LEAF_OFFSET) -
- mip->map_off,
- &map[mip->map_valid], &mip->nmap, 0);
-
- /*
- * Don't know if we should ignore this or try to return an
- * error. The trouble with returning errors is that readdir
- * will just stop without actually passing the error through.
- */
- if (error)
- goto out; /* XXX */
-
- /*
- * If we got all the mappings we asked for, set the final map
- * offset based on the last bmap value received. Otherwise,
- * we've reached the end.
- */
- if (mip->nmap == mip->map_size - mip->map_valid) {
- i = mip->map_valid + mip->nmap - 1;
- mip->map_off = map[i].br_startoff + map[i].br_blockcount;
- } else
- mip->map_off = xfs_dir2_byte_to_da(mp,
- XFS_DIR2_LEAF_OFFSET);
-
- /*
- * Look for holes in the mapping, and eliminate them. Count up
- * the valid blocks.
- */
- for (i = mip->map_valid; i < mip->map_valid + mip->nmap; ) {
- if (map[i].br_startblock == HOLESTARTBLOCK) {
- mip->nmap--;
- length = mip->map_valid + mip->nmap - i;
- if (length)
- memmove(&map[i], &map[i + 1],
- sizeof(map[i]) * length);
- } else {
- mip->map_blocks += map[i].br_blockcount;
- i++;
- }
- }
- mip->map_valid += mip->nmap;
- }
-
- /*
- * No valid mappings, so no more data blocks.
- */
- if (!mip->map_valid) {
- *curoff = xfs_dir2_da_to_byte(mp, mip->map_off);
- goto out;
- }
-
- /*
- * Read the directory block starting at the first mapping.
- */
- mip->curdb = xfs_dir2_da_to_db(mp, map->br_startoff);
- error = xfs_dir3_data_read(NULL, dp, map->br_startoff,
- map->br_blockcount >= mp->m_dirblkfsbs ?
- XFS_FSB_TO_DADDR(mp, map->br_startblock) : -1, &bp);
-
- /*
- * Should just skip over the data block instead of giving up.
- */
- if (error)
- goto out; /* XXX */
-
- /*
- * Adjust the current amount of read-ahead: we just read a block that
- * was previously ra.
- */
- if (mip->ra_current)
- mip->ra_current -= mp->m_dirblkfsbs;
-
- /*
- * Do we need more readahead?
- */
- blk_start_plug(&plug);
- for (mip->ra_index = mip->ra_offset = i = 0;
- mip->ra_want > mip->ra_current && i < mip->map_blocks;
- i += mp->m_dirblkfsbs) {
- ASSERT(mip->ra_index < mip->map_valid);
- /*
- * Read-ahead a contiguous directory block.
- */
- if (i > mip->ra_current &&
- map[mip->ra_index].br_blockcount >= mp->m_dirblkfsbs) {
- xfs_dir3_data_readahead(NULL, dp,
- map[mip->ra_index].br_startoff + mip->ra_offset,
- XFS_FSB_TO_DADDR(mp,
- map[mip->ra_index].br_startblock +
- mip->ra_offset));
- mip->ra_current = i;
- }
-
- /*
- * Read-ahead a non-contiguous directory block. This doesn't
- * use our mapping, but this is a very rare case.
- */
- else if (i > mip->ra_current) {
- xfs_dir3_data_readahead(NULL, dp,
- map[mip->ra_index].br_startoff +
- mip->ra_offset, -1);
- mip->ra_current = i;
- }
-
- /*
- * Advance offset through the mapping table.
- */
- for (j = 0; j < mp->m_dirblkfsbs; j++) {
- /*
- * The rest of this extent but not more than a dir
- * block.
- */
- length = min_t(int, mp->m_dirblkfsbs,
- map[mip->ra_index].br_blockcount -
- mip->ra_offset);
- j += length;
- mip->ra_offset += length;
-
- /*
- * Advance to the next mapping if this one is used up.
- */
- if (mip->ra_offset == map[mip->ra_index].br_blockcount) {
- mip->ra_offset = 0;
- mip->ra_index++;
- }
- }
- }
- blk_finish_plug(&plug);
-
-out:
- *bpp = bp;
- return error;
-}
-
-/*
- * Getdents (readdir) for leaf and node directories.
- * This reads the data blocks only, so is the same for both forms.
- */
-int /* error */
-xfs_dir2_leaf_getdents(
- xfs_inode_t *dp, /* incore directory inode */
- void *dirent,
- size_t bufsize,
- xfs_off_t *offset,
- filldir_t filldir)
-{
- struct xfs_buf *bp = NULL; /* data block buffer */
- xfs_dir2_data_hdr_t *hdr; /* data block header */
- xfs_dir2_data_entry_t *dep; /* data entry */
- xfs_dir2_data_unused_t *dup; /* unused entry */
- int error = 0; /* error return value */
- int length; /* temporary length value */
- xfs_mount_t *mp; /* filesystem mount point */
- int byteoff; /* offset in current block */
- xfs_dir2_off_t curoff; /* current overall offset */
- xfs_dir2_off_t newoff; /* new curoff after new blk */
- char *ptr = NULL; /* pointer to current data */
- struct xfs_dir2_leaf_map_info *map_info;
-
- /*
- * If the offset is at or past the largest allowed value,
- * give up right away.
- */
- if (*offset >= XFS_DIR2_MAX_DATAPTR)
- return 0;
-
- mp = dp->i_mount;
-
- /*
- * Set up to bmap a number of blocks based on the caller's
- * buffer size, the directory block size, and the filesystem
- * block size.
- */
- length = howmany(bufsize + mp->m_dirblksize,
- mp->m_sb.sb_blocksize);
- map_info = kmem_zalloc(offsetof(struct xfs_dir2_leaf_map_info, map) +
- (length * sizeof(struct xfs_bmbt_irec)),
- KM_SLEEP | KM_NOFS);
- map_info->map_size = length;
-
- /*
- * Inside the loop we keep the main offset value as a byte offset
- * in the directory file.
- */
- curoff = xfs_dir2_dataptr_to_byte(mp, *offset);
-
- /*
- * Force this conversion through db so we truncate the offset
- * down to get the start of the data block.
- */
- map_info->map_off = xfs_dir2_db_to_da(mp,
- xfs_dir2_byte_to_db(mp, curoff));
-
- /*
- * Loop over directory entries until we reach the end offset.
- * Get more blocks and readahead as necessary.
- */
- while (curoff < XFS_DIR2_LEAF_OFFSET) {
- /*
- * If we have no buffer, or we're off the end of the
- * current buffer, need to get another one.
- */
- if (!bp || ptr >= (char *)bp->b_addr + mp->m_dirblksize) {
-
- error = xfs_dir2_leaf_readbuf(dp, bufsize, map_info,
- &curoff, &bp);
- if (error || !map_info->map_valid)
- break;
-
- /*
- * Having done a read, we need to set a new offset.
- */
- newoff = xfs_dir2_db_off_to_byte(mp, map_info->curdb, 0);
- /*
- * Start of the current block.
- */
- if (curoff < newoff)
- curoff = newoff;
- /*
- * Make sure we're in the right block.
- */
- else if (curoff > newoff)
- ASSERT(xfs_dir2_byte_to_db(mp, curoff) ==
- map_info->curdb);
- hdr = bp->b_addr;
- xfs_dir3_data_check(dp, bp);
- /*
- * Find our position in the block.
- */
- ptr = (char *)xfs_dir3_data_entry_p(hdr);
- byteoff = xfs_dir2_byte_to_off(mp, curoff);
- /*
- * Skip past the header.
- */
- if (byteoff == 0)
- curoff += xfs_dir3_data_entry_offset(hdr);
- /*
- * Skip past entries until we reach our offset.
- */
- else {
- while ((char *)ptr - (char *)hdr < byteoff) {
- dup = (xfs_dir2_data_unused_t *)ptr;
-
- if (be16_to_cpu(dup->freetag)
- == XFS_DIR2_DATA_FREE_TAG) {
-
- length = be16_to_cpu(dup->length);
- ptr += length;
- continue;
- }
- dep = (xfs_dir2_data_entry_t *)ptr;
- length =
- xfs_dir2_data_entsize(dep->namelen);
- ptr += length;
- }
- /*
- * Now set our real offset.
- */
- curoff =
- xfs_dir2_db_off_to_byte(mp,
- xfs_dir2_byte_to_db(mp, curoff),
- (char *)ptr - (char *)hdr);
- if (ptr >= (char *)hdr + mp->m_dirblksize) {
- continue;
- }
- }
- }
- /*
- * We have a pointer to an entry.
- * Is it a live one?
- */
- dup = (xfs_dir2_data_unused_t *)ptr;
- /*
- * No, it's unused, skip over it.
- */
- if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
- length = be16_to_cpu(dup->length);
- ptr += length;
- curoff += length;
- continue;
- }
-
- dep = (xfs_dir2_data_entry_t *)ptr;
- length = xfs_dir2_data_entsize(dep->namelen);
-
- if (filldir(dirent, (char *)dep->name, dep->namelen,
- xfs_dir2_byte_to_dataptr(mp, curoff) & 0x7fffffff,
- be64_to_cpu(dep->inumber), DT_UNKNOWN))
- break;
-
- /*
- * Advance to next entry in the block.
- */
- ptr += length;
- curoff += length;
- /* bufsize may have just been a guess; don't go negative */
- bufsize = bufsize > length ? bufsize - length : 0;
- }
-
- /*
- * All done. Set output offset value to current offset.
- */
- if (curoff > xfs_dir2_dataptr_to_byte(mp, XFS_DIR2_MAX_DATAPTR))
- *offset = XFS_DIR2_MAX_DATAPTR & 0x7fffffff;
- else
- *offset = xfs_dir2_byte_to_dataptr(mp, curoff) & 0x7fffffff;
- kmem_free(map_info);
- if (bp)
- xfs_trans_brelse(NULL, bp);
- return error;
-}
-
/*
* Log the bests entries indicated from a leaf1 block.
diff --git a/fs/xfs/xfs_dir2_priv.h b/fs/xfs/xfs_dir2_priv.h
index 7cf573c..72ff8d7 100644
--- a/fs/xfs/xfs_dir2_priv.h
+++ b/fs/xfs/xfs_dir2_priv.h
@@ -32,9 +32,9 @@ extern int xfs_dir_cilookup_result(struct xfs_da_args *args,
/* xfs_dir2_block.c */
extern const struct xfs_buf_ops xfs_dir3_block_buf_ops;
+extern int xfs_dir3_block_read(struct xfs_trans *tp, struct xfs_inode *dp,
+ struct xfs_buf **bpp);
extern int xfs_dir2_block_addname(struct xfs_da_args *args);
-extern int xfs_dir2_block_getdents(struct xfs_inode *dp, void *dirent,
- xfs_off_t *offset, filldir_t filldir);
extern int xfs_dir2_block_lookup(struct xfs_da_args *args);
extern int xfs_dir2_block_removename(struct xfs_da_args *args);
extern int xfs_dir2_block_replace(struct xfs_da_args *args);
@@ -91,8 +91,6 @@ extern void xfs_dir3_leaf_compact(struct xfs_da_args *args,
extern void xfs_dir3_leaf_compact_x1(struct xfs_dir3_icleaf_hdr *leafhdr,
struct xfs_dir2_leaf_entry *ents, int *indexp,
int *lowstalep, int *highstalep, int *lowlogp, int *highlogp);
-extern int xfs_dir2_leaf_getdents(struct xfs_inode *dp, void *dirent,
- size_t bufsize, xfs_off_t *offset, filldir_t filldir);
extern int xfs_dir3_leaf_get_buf(struct xfs_da_args *args, xfs_dir2_db_t bno,
struct xfs_buf **bpp, __uint16_t magic);
extern void xfs_dir3_leaf_log_ents(struct xfs_trans *tp, struct xfs_buf *bp,
@@ -153,8 +151,6 @@ extern int xfs_dir2_block_to_sf(struct xfs_da_args *args, struct xfs_buf *bp,
int size, xfs_dir2_sf_hdr_t *sfhp);
extern int xfs_dir2_sf_addname(struct xfs_da_args *args);
extern int xfs_dir2_sf_create(struct xfs_da_args *args, xfs_ino_t pino);
-extern int xfs_dir2_sf_getdents(struct xfs_inode *dp, void *dirent,
- xfs_off_t *offset, filldir_t filldir);
extern int xfs_dir2_sf_lookup(struct xfs_da_args *args);
extern int xfs_dir2_sf_removename(struct xfs_da_args *args);
extern int xfs_dir2_sf_replace(struct xfs_da_args *args);
diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
new file mode 100644
index 0000000..af17c00
--- /dev/null
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -0,0 +1,660 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * Copyright (c) 2013 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_bmap.h"
+
+STATIC int
+xfs_dir2_sf_getdents(
+ xfs_inode_t *dp, /* incore directory inode */
+ void *dirent,
+ xfs_off_t *offset,
+ filldir_t filldir)
+{
+ int i; /* shortform entry number */
+ xfs_mount_t *mp; /* filesystem mount point */
+ xfs_dir2_dataptr_t off; /* current entry's offset */
+ xfs_dir2_sf_entry_t *sfep; /* shortform directory entry */
+ xfs_dir2_sf_hdr_t *sfp; /* shortform structure */
+ xfs_dir2_dataptr_t dot_offset;
+ xfs_dir2_dataptr_t dotdot_offset;
+ xfs_ino_t ino;
+
+ mp = dp->i_mount;
+
+ ASSERT(dp->i_df.if_flags & XFS_IFINLINE);
+ /*
+ * Give up if the directory is way too short.
+ */
+ if (dp->i_d.di_size < offsetof(xfs_dir2_sf_hdr_t, parent)) {
+ ASSERT(XFS_FORCED_SHUTDOWN(mp));
+ return XFS_ERROR(EIO);
+ }
+
+ ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
+ ASSERT(dp->i_df.if_u1.if_data != NULL);
+
+ sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+
+ ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->i8count));
+
+ /*
+ * If the block number in the offset is out of range, we're done.
+ */
+ if (xfs_dir2_dataptr_to_db(mp, *offset) > mp->m_dirdatablk)
+ return 0;
+
+ /*
+ * Precalculate offsets for . and .. as we will always need them.
+ *
+ * XXX(hch): the second argument is sometimes 0 and sometimes
+ * mp->m_dirdatablk.
+ */
+ dot_offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
+ XFS_DIR3_DATA_DOT_OFFSET(mp));
+ dotdot_offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
+ XFS_DIR3_DATA_DOTDOT_OFFSET(mp));
+
+ /*
+ * Put . entry unless we're starting past it.
+ */
+ if (*offset <= dot_offset) {
+ if (filldir(dirent, ".", 1, dot_offset & 0x7fffffff, dp->i_ino, DT_DIR)) {
+ *offset = dot_offset & 0x7fffffff;
+ return 0;
+ }
+ }
+
+ /*
+ * Put .. entry unless we're starting past it.
+ */
+ if (*offset <= dotdot_offset) {
+ ino = xfs_dir2_sf_get_parent_ino(sfp);
+ if (filldir(dirent, "..", 2, dotdot_offset & 0x7fffffff, ino, DT_DIR)) {
+ *offset = dotdot_offset & 0x7fffffff;
+ return 0;
+ }
+ }
+
+ /*
+ * Loop while there are more entries and put'ing works.
+ */
+ sfep = xfs_dir2_sf_firstentry(sfp);
+ for (i = 0; i < sfp->count; i++) {
+ off = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
+ xfs_dir2_sf_get_offset(sfep));
+
+ if (*offset > off) {
+ sfep = xfs_dir2_sf_nextentry(sfp, sfep);
+ continue;
+ }
+
+ ino = xfs_dir2_sfe_get_ino(sfp, sfep);
+ if (filldir(dirent, (char *)sfep->name, sfep->namelen,
+ off & 0x7fffffff, ino, DT_UNKNOWN)) {
+ *offset = off & 0x7fffffff;
+ return 0;
+ }
+ sfep = xfs_dir2_sf_nextentry(sfp, sfep);
+ }
+
+ *offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk + 1, 0) &
+ 0x7fffffff;
+ return 0;
+}
+
+
+/*
+ * Readdir for block directories.
+ */
+STATIC int
+xfs_dir2_block_getdents(
+ xfs_inode_t *dp, /* incore inode */
+ void *dirent,
+ xfs_off_t *offset,
+ filldir_t filldir)
+{
+ xfs_dir2_data_hdr_t *hdr; /* block header */
+ struct xfs_buf *bp; /* buffer for block */
+ xfs_dir2_block_tail_t *btp; /* block tail */
+ xfs_dir2_data_entry_t *dep; /* block data entry */
+ xfs_dir2_data_unused_t *dup; /* block unused entry */
+ char *endptr; /* end of the data entries */
+ int error; /* error return value */
+ xfs_mount_t *mp; /* filesystem mount point */
+ char *ptr; /* current data entry */
+ int wantoff; /* starting block offset */
+ xfs_off_t cook;
+
+ mp = dp->i_mount;
+ /*
+ * If the block number in the offset is out of range, we're done.
+ */
+ if (xfs_dir2_dataptr_to_db(mp, *offset) > mp->m_dirdatablk)
+ return 0;
+
+ error = xfs_dir3_block_read(NULL, dp, &bp);
+ if (error)
+ return error;
+
+ /*
+ * Extract the byte offset we start at from the seek pointer.
+ * We'll skip entries before this.
+ */
+ wantoff = xfs_dir2_dataptr_to_off(mp, *offset);
+ hdr = bp->b_addr;
+ xfs_dir3_data_check(dp, bp);
+ /*
+ * Set up values for the loop.
+ */
+ btp = xfs_dir2_block_tail_p(mp, hdr);
+ ptr = (char *)xfs_dir3_data_entry_p(hdr);
+ endptr = (char *)xfs_dir2_block_leaf_p(btp);
+
+ /*
+ * Loop over the data portion of the block.
+ * Each object is a real entry (dep) or an unused one (dup).
+ */
+ while (ptr < endptr) {
+ dup = (xfs_dir2_data_unused_t *)ptr;
+ /*
+ * Unused, skip it.
+ */
+ if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
+ ptr += be16_to_cpu(dup->length);
+ continue;
+ }
+
+ dep = (xfs_dir2_data_entry_t *)ptr;
+
+ /*
+ * Bump pointer for the next iteration.
+ */
+ ptr += xfs_dir2_data_entsize(dep->namelen);
+ /*
+ * The entry is before the desired starting point, skip it.
+ */
+ if ((char *)dep - (char *)hdr < wantoff)
+ continue;
+
+ cook = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
+ (char *)dep - (char *)hdr);
+
+ /*
+ * If it didn't fit, set the final offset to here & return.
+ */
+ if (filldir(dirent, (char *)dep->name, dep->namelen,
+ cook & 0x7fffffff, be64_to_cpu(dep->inumber),
+ DT_UNKNOWN)) {
+ *offset = cook & 0x7fffffff;
+ xfs_trans_brelse(NULL, bp);
+ return 0;
+ }
+ }
+
+ /*
+ * Reached the end of the block.
+ * Set the offset to a non-existent block 1 and return.
+ */
+ *offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk + 1, 0) &
+ 0x7fffffff;
+ xfs_trans_brelse(NULL, bp);
+ return 0;
+}
+
+struct xfs_dir2_leaf_map_info {
+ xfs_extlen_t map_blocks; /* number of fsbs in map */
+ xfs_dablk_t map_off; /* last mapped file offset */
+ int map_size; /* total entries in *map */
+ int map_valid; /* valid entries in *map */
+ int nmap; /* mappings to ask xfs_bmapi */
+ xfs_dir2_db_t curdb; /* db for current block */
+ int ra_current; /* number of read-ahead blks */
+ int ra_index; /* *map index for read-ahead */
+ int ra_offset; /* map entry offset for ra */
+ int ra_want; /* readahead count wanted */
+ struct xfs_bmbt_irec map[]; /* map vector for blocks */
+};
+
+STATIC int
+xfs_dir2_leaf_readbuf(
+ struct xfs_inode *dp,
+ size_t bufsize,
+ struct xfs_dir2_leaf_map_info *mip,
+ xfs_dir2_off_t *curoff,
+ struct xfs_buf **bpp)
+{
+ struct xfs_mount *mp = dp->i_mount;
+ struct xfs_buf *bp = *bpp;
+ struct xfs_bmbt_irec *map = mip->map;
+ struct blk_plug plug;
+ int error = 0;
+ int length;
+ int i;
+ int j;
+
+ /*
+ * If we have a buffer, we need to release it and
+ * take it out of the mapping.
+ */
+
+ if (bp) {
+ xfs_trans_brelse(NULL, bp);
+ bp = NULL;
+ mip->map_blocks -= mp->m_dirblkfsbs;
+ /*
+ * Loop to get rid of the extents for the
+ * directory block.
+ */
+ for (i = mp->m_dirblkfsbs; i > 0; ) {
+ j = min_t(int, map->br_blockcount, i);
+ map->br_blockcount -= j;
+ map->br_startblock += j;
+ map->br_startoff += j;
+ /*
+ * If mapping is done, pitch it from
+ * the table.
+ */
+ if (!map->br_blockcount && --mip->map_valid)
+ memmove(&map[0], &map[1],
+ sizeof(map[0]) * mip->map_valid);
+ i -= j;
+ }
+ }
+
+ /*
+ * Recalculate the readahead blocks wanted.
+ */
+ mip->ra_want = howmany(bufsize + mp->m_dirblksize,
+ mp->m_sb.sb_blocksize) - 1;
+ ASSERT(mip->ra_want >= 0);
+
+ /*
+ * If we don't have as many as we want, and we haven't
+ * run out of data blocks, get some more mappings.
+ */
+ if (1 + mip->ra_want > mip->map_blocks &&
+ mip->map_off < xfs_dir2_byte_to_da(mp, XFS_DIR2_LEAF_OFFSET)) {
+ /*
+ * Get more bmaps, fill in after the ones
+ * we already have in the table.
+ */
+ mip->nmap = mip->map_size - mip->map_valid;
+ error = xfs_bmapi_read(dp, mip->map_off,
+ xfs_dir2_byte_to_da(mp, XFS_DIR2_LEAF_OFFSET) -
+ mip->map_off,
+ &map[mip->map_valid], &mip->nmap, 0);
+
+ /*
+ * Don't know if we should ignore this or try to return an
+ * error. The trouble with returning errors is that readdir
+ * will just stop without actually passing the error through.
+ */
+ if (error)
+ goto out; /* XXX */
+
+ /*
+ * If we got all the mappings we asked for, set the final map
+ * offset based on the last bmap value received. Otherwise,
+ * we've reached the end.
+ */
+ if (mip->nmap == mip->map_size - mip->map_valid) {
+ i = mip->map_valid + mip->nmap - 1;
+ mip->map_off = map[i].br_startoff + map[i].br_blockcount;
+ } else
+ mip->map_off = xfs_dir2_byte_to_da(mp,
+ XFS_DIR2_LEAF_OFFSET);
+
+ /*
+ * Look for holes in the mapping, and eliminate them. Count up
+ * the valid blocks.
+ */
+ for (i = mip->map_valid; i < mip->map_valid + mip->nmap; ) {
+ if (map[i].br_startblock == HOLESTARTBLOCK) {
+ mip->nmap--;
+ length = mip->map_valid + mip->nmap - i;
+ if (length)
+ memmove(&map[i], &map[i + 1],
+ sizeof(map[i]) * length);
+ } else {
+ mip->map_blocks += map[i].br_blockcount;
+ i++;
+ }
+ }
+ mip->map_valid += mip->nmap;
+ }
+
+ /*
+ * No valid mappings, so no more data blocks.
+ */
+ if (!mip->map_valid) {
+ *curoff = xfs_dir2_da_to_byte(mp, mip->map_off);
+ goto out;
+ }
+
+ /*
+ * Read the directory block starting at the first mapping.
+ */
+ mip->curdb = xfs_dir2_da_to_db(mp, map->br_startoff);
+ error = xfs_dir3_data_read(NULL, dp, map->br_startoff,
+ map->br_blockcount >= mp->m_dirblkfsbs ?
+ XFS_FSB_TO_DADDR(mp, map->br_startblock) : -1, &bp);
+
+ /*
+ * Should just skip over the data block instead of giving up.
+ */
+ if (error)
+ goto out; /* XXX */
+
+ /*
+ * Adjust the current amount of read-ahead: we just read a block that
+ * was previously ra.
+ */
+ if (mip->ra_current)
+ mip->ra_current -= mp->m_dirblkfsbs;
+
+ /*
+ * Do we need more readahead?
+ */
+ blk_start_plug(&plug);
+ for (mip->ra_index = mip->ra_offset = i = 0;
+ mip->ra_want > mip->ra_current && i < mip->map_blocks;
+ i += mp->m_dirblkfsbs) {
+ ASSERT(mip->ra_index < mip->map_valid);
+ /*
+ * Read-ahead a contiguous directory block.
+ */
+ if (i > mip->ra_current &&
+ map[mip->ra_index].br_blockcount >= mp->m_dirblkfsbs) {
+ xfs_dir3_data_readahead(NULL, dp,
+ map[mip->ra_index].br_startoff + mip->ra_offset,
+ XFS_FSB_TO_DADDR(mp,
+ map[mip->ra_index].br_startblock +
+ mip->ra_offset));
+ mip->ra_current = i;
+ }
+
+ /*
+ * Read-ahead a non-contiguous directory block. This doesn't
+ * use our mapping, but this is a very rare case.
+ */
+ else if (i > mip->ra_current) {
+ xfs_dir3_data_readahead(NULL, dp,
+ map[mip->ra_index].br_startoff +
+ mip->ra_offset, -1);
+ mip->ra_current = i;
+ }
+
+ /*
+ * Advance offset through the mapping table.
+ */
+ for (j = 0; j < mp->m_dirblkfsbs; j++) {
+ /*
+ * The rest of this extent but not more than a dir
+ * block.
+ */
+ length = min_t(int, mp->m_dirblkfsbs,
+ map[mip->ra_index].br_blockcount -
+ mip->ra_offset);
+ j += length;
+ mip->ra_offset += length;
+
+ /*
+ * Advance to the next mapping if this one is used up.
+ */
+ if (mip->ra_offset == map[mip->ra_index].br_blockcount) {
+ mip->ra_offset = 0;
+ mip->ra_index++;
+ }
+ }
+ }
+ blk_finish_plug(&plug);
+
+out:
+ *bpp = bp;
+ return error;
+}
+
+/*
+ * Getdents (readdir) for leaf and node directories.
+ * This reads the data blocks only, so is the same for both forms.
+ */
+STATIC int
+xfs_dir2_leaf_getdents(
+ xfs_inode_t *dp, /* incore directory inode */
+ void *dirent,
+ size_t bufsize,
+ xfs_off_t *offset,
+ filldir_t filldir)
+{
+ struct xfs_buf *bp = NULL; /* data block buffer */
+ xfs_dir2_data_hdr_t *hdr; /* data block header */
+ xfs_dir2_data_entry_t *dep; /* data entry */
+ xfs_dir2_data_unused_t *dup; /* unused entry */
+ int error = 0; /* error return value */
+ int length; /* temporary length value */
+ xfs_mount_t *mp; /* filesystem mount point */
+ int byteoff; /* offset in current block */
+ xfs_dir2_off_t curoff; /* current overall offset */
+ xfs_dir2_off_t newoff; /* new curoff after new blk */
+ char *ptr = NULL; /* pointer to current data */
+ struct xfs_dir2_leaf_map_info *map_info;
+
+ /*
+ * If the offset is at or past the largest allowed value,
+ * give up right away.
+ */
+ if (*offset >= XFS_DIR2_MAX_DATAPTR)
+ return 0;
+
+ mp = dp->i_mount;
+
+ /*
+ * Set up to bmap a number of blocks based on the caller's
+ * buffer size, the directory block size, and the filesystem
+ * block size.
+ */
+ length = howmany(bufsize + mp->m_dirblksize,
+ mp->m_sb.sb_blocksize);
+ map_info = kmem_zalloc(offsetof(struct xfs_dir2_leaf_map_info, map) +
+ (length * sizeof(struct xfs_bmbt_irec)),
+ KM_SLEEP | KM_NOFS);
+ map_info->map_size = length;
+
+ /*
+ * Inside the loop we keep the main offset value as a byte offset
+ * in the directory file.
+ */
+ curoff = xfs_dir2_dataptr_to_byte(mp, *offset);
+
+ /*
+ * Force this conversion through db so we truncate the offset
+ * down to get the start of the data block.
+ */
+ map_info->map_off = xfs_dir2_db_to_da(mp,
+ xfs_dir2_byte_to_db(mp, curoff));
+
+ /*
+ * Loop over directory entries until we reach the end offset.
+ * Get more blocks and readahead as necessary.
+ */
+ while (curoff < XFS_DIR2_LEAF_OFFSET) {
+ /*
+ * If we have no buffer, or we're off the end of the
+ * current buffer, need to get another one.
+ */
+ if (!bp || ptr >= (char *)bp->b_addr + mp->m_dirblksize) {
+
+ error = xfs_dir2_leaf_readbuf(dp, bufsize, map_info,
+ &curoff, &bp);
+ if (error || !map_info->map_valid)
+ break;
+
+ /*
+ * Having done a read, we need to set a new offset.
+ */
+ newoff = xfs_dir2_db_off_to_byte(mp, map_info->curdb, 0);
+ /*
+ * Start of the current block.
+ */
+ if (curoff < newoff)
+ curoff = newoff;
+ /*
+ * Make sure we're in the right block.
+ */
+ else if (curoff > newoff)
+ ASSERT(xfs_dir2_byte_to_db(mp, curoff) ==
+ map_info->curdb);
+ hdr = bp->b_addr;
+ xfs_dir3_data_check(dp, bp);
+ /*
+ * Find our position in the block.
+ */
+ ptr = (char *)xfs_dir3_data_entry_p(hdr);
+ byteoff = xfs_dir2_byte_to_off(mp, curoff);
+ /*
+ * Skip past the header.
+ */
+ if (byteoff == 0)
+ curoff += xfs_dir3_data_entry_offset(hdr);
+ /*
+ * Skip past entries until we reach our offset.
+ */
+ else {
+ while ((char *)ptr - (char *)hdr < byteoff) {
+ dup = (xfs_dir2_data_unused_t *)ptr;
+
+ if (be16_to_cpu(dup->freetag)
+ == XFS_DIR2_DATA_FREE_TAG) {
+
+ length = be16_to_cpu(dup->length);
+ ptr += length;
+ continue;
+ }
+ dep = (xfs_dir2_data_entry_t *)ptr;
+ length =
+ xfs_dir2_data_entsize(dep->namelen);
+ ptr += length;
+ }
+ /*
+ * Now set our real offset.
+ */
+ curoff =
+ xfs_dir2_db_off_to_byte(mp,
+ xfs_dir2_byte_to_db(mp, curoff),
+ (char *)ptr - (char *)hdr);
+ if (ptr >= (char *)hdr + mp->m_dirblksize) {
+ continue;
+ }
+ }
+ }
+ /*
+ * We have a pointer to an entry.
+ * Is it a live one?
+ */
+ dup = (xfs_dir2_data_unused_t *)ptr;
+ /*
+ * No, it's unused, skip over it.
+ */
+ if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
+ length = be16_to_cpu(dup->length);
+ ptr += length;
+ curoff += length;
+ continue;
+ }
+
+ dep = (xfs_dir2_data_entry_t *)ptr;
+ length = xfs_dir2_data_entsize(dep->namelen);
+
+ if (filldir(dirent, (char *)dep->name, dep->namelen,
+ xfs_dir2_byte_to_dataptr(mp, curoff) & 0x7fffffff,
+ be64_to_cpu(dep->inumber), DT_UNKNOWN))
+ break;
+
+ /*
+ * Advance to next entry in the block.
+ */
+ ptr += length;
+ curoff += length;
+ /* bufsize may have just been a guess; don't go negative */
+ bufsize = bufsize > length ? bufsize - length : 0;
+ }
+
+ /*
+ * All done. Set output offset value to current offset.
+ */
+ if (curoff > xfs_dir2_dataptr_to_byte(mp, XFS_DIR2_MAX_DATAPTR))
+ *offset = XFS_DIR2_MAX_DATAPTR & 0x7fffffff;
+ else
+ *offset = xfs_dir2_byte_to_dataptr(mp, curoff) & 0x7fffffff;
+ kmem_free(map_info);
+ if (bp)
+ xfs_trans_brelse(NULL, bp);
+ return error;
+}
+
+/*
+ * Read a directory.
+ */
+int
+xfs_readdir(
+ xfs_inode_t *dp,
+ void *dirent,
+ size_t bufsize,
+ xfs_off_t *offset,
+ filldir_t filldir)
+{
+ int rval; /* return value */
+ int v; /* type-checking value */
+
+ trace_xfs_readdir(dp);
+
+ if (XFS_FORCED_SHUTDOWN(dp->i_mount))
+ return XFS_ERROR(EIO);
+
+ ASSERT(S_ISDIR(dp->i_d.di_mode));
+ XFS_STATS_INC(xs_dir_getdents);
+
+ if (dp->i_d.di_format == XFS_DINODE_FMT_LOCAL)
+ rval = xfs_dir2_sf_getdents(dp, dirent, offset, filldir);
+ else if ((rval = xfs_dir2_isblock(NULL, dp, &v)))
+ ;
+ else if (v)
+ rval = xfs_dir2_block_getdents(dp, dirent, offset, filldir);
+ else
+ rval = xfs_dir2_leaf_getdents(dp, dirent, bufsize, offset,
+ filldir);
+ return rval;
+}
+
diff --git a/fs/xfs/xfs_dir2_sf.c b/fs/xfs/xfs_dir2_sf.c
index 6157424..f24ce90 100644
--- a/fs/xfs/xfs_dir2_sf.c
+++ b/fs/xfs/xfs_dir2_sf.c
@@ -765,105 +765,6 @@ xfs_dir2_sf_create(
return 0;
}
-int /* error */
-xfs_dir2_sf_getdents(
- xfs_inode_t *dp, /* incore directory inode */
- void *dirent,
- xfs_off_t *offset,
- filldir_t filldir)
-{
- int i; /* shortform entry number */
- xfs_mount_t *mp; /* filesystem mount point */
- xfs_dir2_dataptr_t off; /* current entry's offset */
- xfs_dir2_sf_entry_t *sfep; /* shortform directory entry */
- xfs_dir2_sf_hdr_t *sfp; /* shortform structure */
- xfs_dir2_dataptr_t dot_offset;
- xfs_dir2_dataptr_t dotdot_offset;
- xfs_ino_t ino;
-
- mp = dp->i_mount;
-
- ASSERT(dp->i_df.if_flags & XFS_IFINLINE);
- /*
- * Give up if the directory is way too short.
- */
- if (dp->i_d.di_size < offsetof(xfs_dir2_sf_hdr_t, parent)) {
- ASSERT(XFS_FORCED_SHUTDOWN(mp));
- return XFS_ERROR(EIO);
- }
-
- ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
- ASSERT(dp->i_df.if_u1.if_data != NULL);
-
- sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
-
- ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->i8count));
-
- /*
- * If the block number in the offset is out of range, we're done.
- */
- if (xfs_dir2_dataptr_to_db(mp, *offset) > mp->m_dirdatablk)
- return 0;
-
- /*
- * Precalculate offsets for . and .. as we will always need them.
- *
- * XXX(hch): the second argument is sometimes 0 and sometimes
- * mp->m_dirdatablk.
- */
- dot_offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
- XFS_DIR3_DATA_DOT_OFFSET(mp));
- dotdot_offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
- XFS_DIR3_DATA_DOTDOT_OFFSET(mp));
-
- /*
- * Put . entry unless we're starting past it.
- */
- if (*offset <= dot_offset) {
- if (filldir(dirent, ".", 1, dot_offset & 0x7fffffff, dp->i_ino, DT_DIR)) {
- *offset = dot_offset & 0x7fffffff;
- return 0;
- }
- }
-
- /*
- * Put .. entry unless we're starting past it.
- */
- if (*offset <= dotdot_offset) {
- ino = xfs_dir2_sf_get_parent_ino(sfp);
- if (filldir(dirent, "..", 2, dotdot_offset & 0x7fffffff, ino, DT_DIR)) {
- *offset = dotdot_offset & 0x7fffffff;
- return 0;
- }
- }
-
- /*
- * Loop while there are more entries and put'ing works.
- */
- sfep = xfs_dir2_sf_firstentry(sfp);
- for (i = 0; i < sfp->count; i++) {
- off = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
- xfs_dir2_sf_get_offset(sfep));
-
- if (*offset > off) {
- sfep = xfs_dir2_sf_nextentry(sfp, sfep);
- continue;
- }
-
- ino = xfs_dir2_sfe_get_ino(sfp, sfep);
- if (filldir(dirent, (char *)sfep->name, sfep->namelen,
- off & 0x7fffffff, ino, DT_UNKNOWN)) {
- *offset = off & 0x7fffffff;
- return 0;
- }
- sfep = xfs_dir2_sf_nextentry(sfp, sfep);
- }
-
- *offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk + 1, 0) &
- 0x7fffffff;
- return 0;
-}
-
/*
* Lookup an entry in a shortform directory.
* Returns EEXIST if found, ENOENT if not found.
--
1.7.10.4
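For readers following the moved code: xfs_readdir() picks one of three readers depending on the directory format, and an error from the block check short-circuits the dispatch. A minimal standalone sketch of that control flow, assuming hypothetical demo_* names and a made-up error value in place of the real inode, mount and getdents routines:

```c
#include <assert.h>

/* Hypothetical directory formats; the kernel derives these from the inode. */
enum demo_dir_format { DEMO_DIR_SHORTFORM, DEMO_DIR_BLOCK, DEMO_DIR_LEAF };

/* Records which reader was chosen, so the dispatch can be observed. */
enum demo_reader { DEMO_SF_GETDENTS, DEMO_BLOCK_GETDENTS, DEMO_LEAF_GETDENTS };

/*
 * Hypothetical stand-in for xfs_dir2_isblock(): returns an error code and
 * reports through *is_block whether the directory is a single block.
 */
int demo_isblock(enum demo_dir_format fmt, int fail, int *is_block)
{
	if (fail)
		return 5;	/* illustrative EIO-like value, an assumption */
	*is_block = (fmt == DEMO_DIR_BLOCK);
	return 0;
}

/* Same decision order as xfs_readdir(): shortform, then block, then leaf. */
int demo_readdir(enum demo_dir_format fmt, int fail, enum demo_reader *chosen)
{
	int error;
	int is_block;

	if (fmt == DEMO_DIR_SHORTFORM) {
		*chosen = DEMO_SF_GETDENTS;
		return 0;
	}

	/* A failed format check is returned as-is; no reader is called. */
	error = demo_isblock(fmt, fail, &is_block);
	if (error)
		return error;

	*chosen = is_block ? DEMO_BLOCK_GETDENTS : DEMO_LEAF_GETDENTS;
	return 0;
}
```

The explicit early return replaces the empty-statement "else if ((rval = ...)) ;" idiom in the original; the behaviour sketched here is intended to be the same.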
* [PATCH 15/60] xfs: reshuffle dir2 definitions around for userspace
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (13 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 14/60] xfs: move getdents code into it's own file Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 16/60] xfs: split out attribute listing code into separate file Dave Chinner
` (46 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Many of the definitions within xfs_dir2_priv.h are needed in
userspace outside libxfs. However, xfs_dir2_priv.h is private to
libxfs there and is not visible outside it, so we need to move
those definitions into xfs_dir2.h to keep the files shared between
user and kernel space consistent.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
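A note on the include reordering that makes up most of this patch: xfs_dir2.h now carries prototypes that take xfs_dir2_db_t and xfs_dir2_data_aoff_t by value, so the header that defines those typedefs (xfs_dir2_format.h, going by the ordering used throughout the diff) has to be included first. Structures passed by pointer only need the forward declarations added at the top of xfs_dir2.h. A minimal standalone sketch of that distinction, using hypothetical demo_* names rather than the real headers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for xfs_dir2_format.h: value typedefs (hypothetical names). */
typedef uint32_t demo_db_t;		/* plays the role of xfs_dir2_db_t */
typedef unsigned int demo_aoff_t;	/* plays the role of xfs_dir2_data_aoff_t */

/* Stand-in for xfs_dir2.h: a forward declaration is enough for pointers... */
struct demo_data_hdr;

/* ...but by-value parameters need the typedefs above to be visible already. */
int demo_make_free(struct demo_data_hdr *hdr, demo_aoff_t offset,
		   demo_aoff_t len);

/* Stand-in for a .c file that included both headers in that order. */
struct demo_data_hdr {
	demo_db_t	blkno;		/* illustrative field only */
	demo_aoff_t	free_bytes;	/* illustrative field only */
};

int demo_make_free(struct demo_data_hdr *hdr, demo_aoff_t offset,
		   demo_aoff_t len)
{
	assert(hdr != NULL);
	(void)offset;			/* unused in this sketch */
	hdr->free_bytes += len;
	return 0;
}
```

Swapping the typedef block and the prototype in this sketch fails to compile, which is the same failure a .c file would hit if it included xfs_dir2.h ahead of the header providing the typedefs.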
fs/xfs/xfs_bmap.c | 3 ++-
fs/xfs/xfs_da_btree.c | 2 +-
fs/xfs/xfs_dir2.c | 2 +-
fs/xfs/xfs_dir2.h | 51 +++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_dir2_block.c | 2 +-
fs/xfs/xfs_dir2_data.c | 6 ++----
fs/xfs/xfs_dir2_leaf.c | 1 +
fs/xfs/xfs_dir2_node.c | 1 +
fs/xfs/xfs_dir2_priv.h | 31 ---------------------------
fs/xfs/xfs_dir2_readdir.c | 1 +
fs/xfs/xfs_dir2_sf.c | 6 +++---
fs/xfs/xfs_export.c | 4 +++-
fs/xfs/xfs_file.c | 2 +-
fs/xfs/xfs_icreate_item.c | 12 -----------
fs/xfs/xfs_log_recover.c | 2 +-
fs/xfs/xfs_mount.c | 4 +++-
fs/xfs/xfs_rename.c | 3 ++-
fs/xfs/xfs_rtalloc.c | 1 -
fs/xfs/xfs_super.c | 3 ++-
fs/xfs/xfs_symlink.c | 3 ++-
fs/xfs/xfs_utils.c | 4 +++-
fs/xfs/xfs_vnodeops.c | 3 ++-
22 files changed, 83 insertions(+), 64 deletions(-)
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 05c698c..2ddeb43 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -24,9 +24,10 @@
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
-#include "xfs_dir2.h"
#include "xfs_mount.h"
#include "xfs_da_btree.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_bmap_btree.h"
#include "xfs_alloc_btree.h"
#include "xfs_ialloc_btree.h"
diff --git a/fs/xfs/xfs_da_btree.c b/fs/xfs/xfs_da_btree.c
index 0b8b2a1..8bbd7ef 100644
--- a/fs/xfs/xfs_da_btree.c
+++ b/fs/xfs/xfs_da_btree.c
@@ -27,8 +27,8 @@
#include "xfs_mount.h"
#include "xfs_da_btree.h"
#include "xfs_bmap_btree.h"
-#include "xfs_dir2.h"
#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_dir2_priv.h"
#include "xfs_dinode.h"
#include "xfs_inode.h"
diff --git a/fs/xfs/xfs_dir2.c b/fs/xfs/xfs_dir2.c
index 431be44..c3263a5 100644
--- a/fs/xfs/xfs_dir2.c
+++ b/fs/xfs/xfs_dir2.c
@@ -31,8 +31,8 @@
#include "xfs_inode.h"
#include "xfs_inode_item.h"
#include "xfs_bmap.h"
-#include "xfs_dir2.h"
#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_dir2_priv.h"
#include "xfs_error.h"
#include "xfs_vnodeops.h"
diff --git a/fs/xfs/xfs_dir2.h b/fs/xfs/xfs_dir2.h
index e937d99..7ef6b0f 100644
--- a/fs/xfs/xfs_dir2.h
+++ b/fs/xfs/xfs_dir2.h
@@ -23,6 +23,11 @@ struct xfs_da_args;
struct xfs_inode;
struct xfs_mount;
struct xfs_trans;
+struct xfs_dir2_sf_hdr;
+struct xfs_dir2_sf_entry;
+struct xfs_dir2_data_hdr;
+struct xfs_dir2_data_entry;
+struct xfs_dir2_data_unused;
extern struct xfs_name xfs_name_dotdot;
@@ -57,4 +62,50 @@ extern int xfs_dir_canenter(struct xfs_trans *tp, struct xfs_inode *dp,
*/
extern int xfs_dir2_sf_to_block(struct xfs_da_args *args);
+/*
+ * Direct call on directory open, before entering the readdir code.
+ */
+extern int xfs_dir3_data_readahead(struct xfs_trans *tp, struct xfs_inode *dp,
+ xfs_dablk_t bno, xfs_daddr_t mapped_bno);
+
+/*
+ * Interface routines used by userspace utilities
+ */
+extern xfs_ino_t xfs_dir2_sf_get_parent_ino(struct xfs_dir2_sf_hdr *sfp);
+extern void xfs_dir2_sf_put_parent_ino(struct xfs_dir2_sf_hdr *sfp,
+ xfs_ino_t ino);
+extern xfs_ino_t xfs_dir2_sfe_get_ino(struct xfs_dir2_sf_hdr *sfp,
+ struct xfs_dir2_sf_entry *sfep);
+extern void xfs_dir2_sfe_put_ino( struct xfs_dir2_sf_hdr *,
+ struct xfs_dir2_sf_entry *sfep, xfs_ino_t ino);
+
+extern int xfs_dir2_isblock(struct xfs_trans *tp, struct xfs_inode *dp, int *r);
+extern int xfs_dir2_isleaf(struct xfs_trans *tp, struct xfs_inode *dp, int *r);
+extern int xfs_dir2_shrink_inode(struct xfs_da_args *args, xfs_dir2_db_t db,
+ struct xfs_buf *bp);
+
+extern void xfs_dir2_data_freescan(struct xfs_mount *mp,
+ struct xfs_dir2_data_hdr *hdr, int *loghead);
+extern void xfs_dir2_data_log_entry(struct xfs_trans *tp, struct xfs_buf *bp,
+ struct xfs_dir2_data_entry *dep);
+extern void xfs_dir2_data_log_header(struct xfs_trans *tp,
+ struct xfs_buf *bp);
+extern void xfs_dir2_data_log_unused(struct xfs_trans *tp, struct xfs_buf *bp,
+ struct xfs_dir2_data_unused *dup);
+extern void xfs_dir2_data_make_free(struct xfs_trans *tp, struct xfs_buf *bp,
+ xfs_dir2_data_aoff_t offset, xfs_dir2_data_aoff_t len,
+ int *needlogp, int *needscanp);
+extern void xfs_dir2_data_use_free(struct xfs_trans *tp, struct xfs_buf *bp,
+ struct xfs_dir2_data_unused *dup, xfs_dir2_data_aoff_t offset,
+ xfs_dir2_data_aoff_t len, int *needlogp, int *needscanp);
+
+extern struct xfs_dir2_data_free *xfs_dir2_data_freefind(
+ struct xfs_dir2_data_hdr *hdr, struct xfs_dir2_data_unused *dup);
+
+extern const struct xfs_buf_ops xfs_dir3_block_buf_ops;
+extern const struct xfs_buf_ops xfs_dir3_leafn_buf_ops;
+extern const struct xfs_buf_ops xfs_dir3_leaf1_buf_ops;
+extern const struct xfs_buf_ops xfs_dir3_free_buf_ops;
+extern const struct xfs_buf_ops xfs_dir3_data_buf_ops;
+
#endif /* __XFS_DIR2_H__ */
diff --git a/fs/xfs/xfs_dir2_block.c b/fs/xfs/xfs_dir2_block.c
index 5e84000..becd69f 100644
--- a/fs/xfs/xfs_dir2_block.c
+++ b/fs/xfs/xfs_dir2_block.c
@@ -31,8 +31,8 @@
#include "xfs_inode_item.h"
#include "xfs_bmap.h"
#include "xfs_buf_item.h"
-#include "xfs_dir2.h"
#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_dir2_priv.h"
#include "xfs_error.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_dir2_data.c b/fs/xfs/xfs_dir2_data.c
index c293023..4e1917d 100644
--- a/fs/xfs/xfs_dir2_data.c
+++ b/fs/xfs/xfs_dir2_data.c
@@ -29,14 +29,12 @@
#include "xfs_dinode.h"
#include "xfs_inode.h"
#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_dir2_priv.h"
#include "xfs_error.h"
#include "xfs_buf_item.h"
#include "xfs_cksum.h"
-STATIC xfs_dir2_data_free_t *
-xfs_dir2_data_freefind(xfs_dir2_data_hdr_t *hdr, xfs_dir2_data_unused_t *dup);
-
/*
* Check the consistency of the data block.
* The input can also be a block-format directory.
@@ -325,7 +323,7 @@ xfs_dir3_data_readahead(
* Given a data block and an unused entry from that block,
* return the bestfree entry if any that corresponds to it.
*/
-STATIC xfs_dir2_data_free_t *
+xfs_dir2_data_free_t *
xfs_dir2_data_freefind(
xfs_dir2_data_hdr_t *hdr, /* data block */
xfs_dir2_data_unused_t *dup) /* data unused entry */
diff --git a/fs/xfs/xfs_dir2_leaf.c b/fs/xfs/xfs_dir2_leaf.c
index e1aed21..005a6b3 100644
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -31,6 +31,7 @@
#include "xfs_inode.h"
#include "xfs_bmap.h"
#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_dir2_priv.h"
#include "xfs_error.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_dir2_node.c b/fs/xfs/xfs_dir2_node.c
index 2226a00..b4bd9b6 100644
--- a/fs/xfs/xfs_dir2_node.c
+++ b/fs/xfs/xfs_dir2_node.c
@@ -30,6 +30,7 @@
#include "xfs_inode.h"
#include "xfs_bmap.h"
#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_dir2_priv.h"
#include "xfs_error.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_dir2_priv.h b/fs/xfs/xfs_dir2_priv.h
index 72ff8d7..807eb65 100644
--- a/fs/xfs/xfs_dir2_priv.h
+++ b/fs/xfs/xfs_dir2_priv.h
@@ -20,18 +20,12 @@
/* xfs_dir2.c */
extern int xfs_dir_ino_validate(struct xfs_mount *mp, xfs_ino_t ino);
-extern int xfs_dir2_isblock(struct xfs_trans *tp, struct xfs_inode *dp, int *r);
-extern int xfs_dir2_isleaf(struct xfs_trans *tp, struct xfs_inode *dp, int *r);
extern int xfs_dir2_grow_inode(struct xfs_da_args *args, int space,
xfs_dir2_db_t *dbp);
-extern int xfs_dir2_shrink_inode(struct xfs_da_args *args, xfs_dir2_db_t db,
- struct xfs_buf *bp);
extern int xfs_dir_cilookup_result(struct xfs_da_args *args,
const unsigned char *name, int len);
/* xfs_dir2_block.c */
-extern const struct xfs_buf_ops xfs_dir3_block_buf_ops;
-
extern int xfs_dir3_block_read(struct xfs_trans *tp, struct xfs_inode *dp,
struct xfs_buf **bpp);
extern int xfs_dir2_block_addname(struct xfs_da_args *args);
@@ -48,39 +42,17 @@ extern int xfs_dir2_leaf_to_block(struct xfs_da_args *args,
#define xfs_dir3_data_check(dp,bp)
#endif
-extern const struct xfs_buf_ops xfs_dir3_data_buf_ops;
-extern const struct xfs_buf_ops xfs_dir3_free_buf_ops;
-
extern int __xfs_dir3_data_check(struct xfs_inode *dp, struct xfs_buf *bp);
extern int xfs_dir3_data_read(struct xfs_trans *tp, struct xfs_inode *dp,
xfs_dablk_t bno, xfs_daddr_t mapped_bno, struct xfs_buf **bpp);
-extern int xfs_dir3_data_readahead(struct xfs_trans *tp, struct xfs_inode *dp,
- xfs_dablk_t bno, xfs_daddr_t mapped_bno);
extern struct xfs_dir2_data_free *
xfs_dir2_data_freeinsert(struct xfs_dir2_data_hdr *hdr,
struct xfs_dir2_data_unused *dup, int *loghead);
-extern void xfs_dir2_data_freescan(struct xfs_mount *mp,
- struct xfs_dir2_data_hdr *hdr, int *loghead);
extern int xfs_dir3_data_init(struct xfs_da_args *args, xfs_dir2_db_t blkno,
struct xfs_buf **bpp);
-extern void xfs_dir2_data_log_entry(struct xfs_trans *tp, struct xfs_buf *bp,
- struct xfs_dir2_data_entry *dep);
-extern void xfs_dir2_data_log_header(struct xfs_trans *tp,
- struct xfs_buf *bp);
-extern void xfs_dir2_data_log_unused(struct xfs_trans *tp, struct xfs_buf *bp,
- struct xfs_dir2_data_unused *dup);
-extern void xfs_dir2_data_make_free(struct xfs_trans *tp, struct xfs_buf *bp,
- xfs_dir2_data_aoff_t offset, xfs_dir2_data_aoff_t len,
- int *needlogp, int *needscanp);
-extern void xfs_dir2_data_use_free(struct xfs_trans *tp, struct xfs_buf *bp,
- struct xfs_dir2_data_unused *dup, xfs_dir2_data_aoff_t offset,
- xfs_dir2_data_aoff_t len, int *needlogp, int *needscanp);
/* xfs_dir2_leaf.c */
-extern const struct xfs_buf_ops xfs_dir3_leaf1_buf_ops;
-extern const struct xfs_buf_ops xfs_dir3_leafn_buf_ops;
-
extern int xfs_dir3_leafn_read(struct xfs_trans *tp, struct xfs_inode *dp,
xfs_dablk_t fbno, xfs_daddr_t mappedbno, struct xfs_buf **bpp);
extern int xfs_dir2_block_to_leaf(struct xfs_da_args *args,
@@ -142,9 +114,6 @@ extern int xfs_dir2_free_read(struct xfs_trans *tp, struct xfs_inode *dp,
xfs_dablk_t fbno, struct xfs_buf **bpp);
/* xfs_dir2_sf.c */
-extern xfs_ino_t xfs_dir2_sf_get_parent_ino(struct xfs_dir2_sf_hdr *sfp);
-extern xfs_ino_t xfs_dir2_sfe_get_ino(struct xfs_dir2_sf_hdr *sfp,
- struct xfs_dir2_sf_entry *sfep);
extern int xfs_dir2_block_sfsize(struct xfs_inode *dp,
struct xfs_dir2_data_hdr *block, struct xfs_dir2_sf_hdr *sfhp);
extern int xfs_dir2_block_to_sf(struct xfs_da_args *args, struct xfs_buf *bp,
diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
index af17c00..88da6b8 100644
--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -30,6 +30,7 @@
#include "xfs_dinode.h"
#include "xfs_inode.h"
#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_dir2_priv.h"
#include "xfs_error.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_dir2_sf.c b/fs/xfs/xfs_dir2_sf.c
index f24ce90..65b65c5 100644
--- a/fs/xfs/xfs_dir2_sf.c
+++ b/fs/xfs/xfs_dir2_sf.c
@@ -29,8 +29,8 @@
#include "xfs_inode.h"
#include "xfs_inode_item.h"
#include "xfs_error.h"
-#include "xfs_dir2.h"
#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_dir2_priv.h"
#include "xfs_trace.h"
@@ -95,7 +95,7 @@ xfs_dir2_sf_get_parent_ino(
return xfs_dir2_sf_get_ino(hdr, &hdr->parent);
}
-static void
+void
xfs_dir2_sf_put_parent_ino(
struct xfs_dir2_sf_hdr *hdr,
xfs_ino_t ino)
@@ -123,7 +123,7 @@ xfs_dir2_sfe_get_ino(
return xfs_dir2_sf_get_ino(hdr, xfs_dir2_sfe_inop(sfep));
}
-static void
+void
xfs_dir2_sfe_put_ino(
struct xfs_dir2_sf_hdr *hdr,
struct xfs_dir2_sf_entry *sfep,
diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
index c585bc6..29c880f 100644
--- a/fs/xfs/xfs_export.c
+++ b/fs/xfs/xfs_export.c
@@ -21,8 +21,10 @@
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
-#include "xfs_dir2.h"
#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_export.h"
#include "xfs_vnodeops.h"
#include "xfs_bmap_btree.h"
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a5f2042..16521f1 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -32,7 +32,7 @@
#include "xfs_vnodeops.h"
#include "xfs_da_btree.h"
#include "xfs_dir2_format.h"
-#include "xfs_dir2_priv.h"
+#include "xfs_dir2.h"
#include "xfs_ioctl.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_icreate_item.c b/fs/xfs/xfs_icreate_item.c
index 7716a4e..441a78a 100644
--- a/fs/xfs/xfs_icreate_item.c
+++ b/fs/xfs/xfs_icreate_item.c
@@ -20,23 +20,11 @@
#include "xfs_types.h"
#include "xfs_bit.h"
#include "xfs_log.h"
-#include "xfs_inum.h"
#include "xfs_trans.h"
-#include "xfs_buf_item.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
-#include "xfs_dir2.h"
#include "xfs_mount.h"
#include "xfs_trans_priv.h"
-#include "xfs_bmap_btree.h"
-#include "xfs_alloc_btree.h"
-#include "xfs_ialloc_btree.h"
-#include "xfs_attr_sf.h"
-#include "xfs_dinode.h"
-#include "xfs_inode.h"
-#include "xfs_inode_item.h"
-#include "xfs_btree.h"
-#include "xfs_ialloc.h"
#include "xfs_error.h"
#include "xfs_icreate_item.h"
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 6fcc910a..ffa519d 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -51,7 +51,7 @@
#include "xfs_symlink.h"
#include "xfs_da_btree.h"
#include "xfs_dir2_format.h"
-#include "xfs_dir2_priv.h"
+#include "xfs_dir2.h"
#include "xfs_attr_leaf.h"
#include "xfs_attr_remote.h"
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index e8e310c..e7253e2 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -25,8 +25,10 @@
#include "xfs_trans_priv.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
-#include "xfs_dir2.h"
#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_bmap_btree.h"
#include "xfs_alloc_btree.h"
#include "xfs_ialloc_btree.h"
diff --git a/fs/xfs/xfs_rename.c b/fs/xfs/xfs_rename.c
index 30ff5f4..b0ac54f 100644
--- a/fs/xfs/xfs_rename.c
+++ b/fs/xfs/xfs_rename.c
@@ -22,9 +22,10 @@
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
-#include "xfs_dir2.h"
#include "xfs_mount.h"
#include "xfs_da_btree.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_bmap_btree.h"
#include "xfs_dinode.h"
#include "xfs_inode.h"
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 98dc670..7fed9c2 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -23,7 +23,6 @@
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
-#include "xfs_dir2.h"
#include "xfs_mount.h"
#include "xfs_bmap_btree.h"
#include "xfs_dinode.h"
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f59e27f..1496094 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -22,7 +22,6 @@
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
-#include "xfs_dir2.h"
#include "xfs_alloc.h"
#include "xfs_quota.h"
#include "xfs_mount.h"
@@ -46,6 +45,8 @@
#include "xfs_trans_priv.h"
#include "xfs_filestream.h"
#include "xfs_da_btree.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_extfree_item.h"
#include "xfs_mru_cache.h"
#include "xfs_inode_item.h"
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 195a403..996925b 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -24,9 +24,10 @@
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
-#include "xfs_dir2.h"
#include "xfs_mount.h"
#include "xfs_da_btree.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_bmap_btree.h"
#include "xfs_ialloc_btree.h"
#include "xfs_dinode.h"
diff --git a/fs/xfs/xfs_utils.c b/fs/xfs/xfs_utils.c
index 0025c78..7bf8f13 100644
--- a/fs/xfs/xfs_utils.c
+++ b/fs/xfs/xfs_utils.c
@@ -22,8 +22,10 @@
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
-#include "xfs_dir2.h"
#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_bmap_btree.h"
#include "xfs_dinode.h"
#include "xfs_inode.h"
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 0176bb2..acec328 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -25,9 +25,10 @@
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
-#include "xfs_dir2.h"
#include "xfs_mount.h"
#include "xfs_da_btree.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_bmap_btree.h"
#include "xfs_ialloc_btree.h"
#include "xfs_dinode.h"
--
1.7.10.4
* [PATCH 16/60] xfs: split out attribute listing code into separate file
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (14 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 15/60] xfs: reshuffle dir2 definitions around for userspace Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 17/60] xfs: split out attribute fork truncation " Dave Chinner
` (45 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The attribute listing code is not used by userspace, so like the
directory readdir code, split it out into a separate file to
minimise the differences between the kernel code and the code
shared with libxfs in userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
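For reference, one piece of the listing code being moved is the buffer accounting in xfs_attr_put_listent(): entries are packed from the end of the user buffer, each rounded up to a 4-byte multiple by ATTR_ENTSIZE(), while the offset array grows from the front, and listing stops when the two would meet. A minimal standalone sketch of that arithmetic, assuming a hypothetical demo_attrlist_ent layout (a 32-bit value length followed by the name) and using unsigned sizes with an explicit underflow check rather than the signed int arithmetic of the kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct attrlist_ent: value length, then name. */
struct demo_attrlist_ent {
	uint32_t	a_valuelen;
	char		a_name[1];
};

/* Minimum bytes used by an entry: everything before the name. */
#define DEMO_ENTBASESIZE	offsetof(struct demo_attrlist_ent, a_name)

/* Bytes used by an entry: base + name + NUL, rounded up to 4 bytes. */
size_t demo_entsize(size_t namelen)
{
	size_t align = sizeof(uint32_t);

	return (DEMO_ENTBASESIZE + namelen + 1 + align - 1) & ~(align - 1);
}

/*
 * Reserve room for one entry at the tail of the buffer.
 * firstu is the offset of the lowest entry placed so far, arraytop is the
 * end of the offset array at the front. Returns the new firstu, or -1 if
 * the entry would collide with the offset array.
 */
long demo_reserve(size_t firstu, size_t arraytop, size_t namelen)
{
	size_t need = demo_entsize(namelen);

	if (need > firstu || firstu - need < arraytop)
		return -1;
	return (long)(firstu - need);
}
```

With this layout a 3-byte name needs 8 bytes and a 4-byte name needs 12, so a 64-byte buffer with an 8-byte offset array would place the first 3-byte name at offset 56.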
fs/xfs/Makefile | 1 +
fs/xfs/xfs_attr.c | 317 +----------------------
fs/xfs/xfs_attr.h | 1 +
fs/xfs/xfs_attr_leaf.c | 300 ----------------------
fs/xfs/xfs_attr_list.c | 655 ++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 658 insertions(+), 616 deletions(-)
create mode 100644 fs/xfs/xfs_attr_list.c
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index f396fe9..4670207 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -27,6 +27,7 @@ xfs-y += xfs_trace.o
# highlevel code
xfs-y += xfs_aops.o \
+ xfs_attr_list.o \
xfs_bit.o \
xfs_buf.o \
xfs_dfrag.o \
diff --git a/fs/xfs/xfs_attr.c b/fs/xfs/xfs_attr.c
index 20fe3fe..75a168c 100644
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -62,7 +62,6 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
-STATIC int xfs_attr_leaf_list(xfs_attr_list_context_t *context);
/*
* Internal routines when attribute list is more than one block.
@@ -70,7 +69,6 @@ STATIC int xfs_attr_leaf_list(xfs_attr_list_context_t *context);
STATIC int xfs_attr_node_get(xfs_da_args_t *args);
STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
-STATIC int xfs_attr_node_list(xfs_attr_list_context_t *context);
STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
@@ -90,7 +88,7 @@ xfs_attr_name_to_xname(
return 0;
}
-STATIC int
+int
xfs_inode_hasattr(
struct xfs_inode *ip)
{
@@ -611,157 +609,6 @@ xfs_attr_remove(
return xfs_attr_remove_int(dp, &xname, flags);
}
-int
-xfs_attr_list_int(xfs_attr_list_context_t *context)
-{
- int error;
- xfs_inode_t *dp = context->dp;
-
- XFS_STATS_INC(xs_attr_list);
-
- if (XFS_FORCED_SHUTDOWN(dp->i_mount))
- return EIO;
-
- xfs_ilock(dp, XFS_ILOCK_SHARED);
-
- /*
- * Decide on what work routines to call based on the inode size.
- */
- if (!xfs_inode_hasattr(dp)) {
- error = 0;
- } else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
- error = xfs_attr_shortform_list(context);
- } else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
- error = xfs_attr_leaf_list(context);
- } else {
- error = xfs_attr_node_list(context);
- }
-
- xfs_iunlock(dp, XFS_ILOCK_SHARED);
-
- return error;
-}
-
-#define ATTR_ENTBASESIZE /* minimum bytes used by an attr */ \
- (((struct attrlist_ent *) 0)->a_name - (char *) 0)
-#define ATTR_ENTSIZE(namelen) /* actual bytes used by an attr */ \
- ((ATTR_ENTBASESIZE + (namelen) + 1 + sizeof(u_int32_t)-1) \
- & ~(sizeof(u_int32_t)-1))
-
-/*
- * Format an attribute and copy it out to the user's buffer.
- * Take care to check values and protect against them changing later,
- * we may be reading them directly out of a user buffer.
- */
-/*ARGSUSED*/
-STATIC int
-xfs_attr_put_listent(
- xfs_attr_list_context_t *context,
- int flags,
- unsigned char *name,
- int namelen,
- int valuelen,
- unsigned char *value)
-{
- struct attrlist *alist = (struct attrlist *)context->alist;
- attrlist_ent_t *aep;
- int arraytop;
-
- ASSERT(!(context->flags & ATTR_KERNOVAL));
- ASSERT(context->count >= 0);
- ASSERT(context->count < (ATTR_MAX_VALUELEN/8));
- ASSERT(context->firstu >= sizeof(*alist));
- ASSERT(context->firstu <= context->bufsize);
-
- /*
- * Only list entries in the right namespace.
- */
- if (((context->flags & ATTR_SECURE) == 0) !=
- ((flags & XFS_ATTR_SECURE) == 0))
- return 0;
- if (((context->flags & ATTR_ROOT) == 0) !=
- ((flags & XFS_ATTR_ROOT) == 0))
- return 0;
-
- arraytop = sizeof(*alist) +
- context->count * sizeof(alist->al_offset[0]);
- context->firstu -= ATTR_ENTSIZE(namelen);
- if (context->firstu < arraytop) {
- trace_xfs_attr_list_full(context);
- alist->al_more = 1;
- context->seen_enough = 1;
- return 1;
- }
-
- aep = (attrlist_ent_t *)&context->alist[context->firstu];
- aep->a_valuelen = valuelen;
- memcpy(aep->a_name, name, namelen);
- aep->a_name[namelen] = 0;
- alist->al_offset[context->count++] = context->firstu;
- alist->al_count = context->count;
- trace_xfs_attr_list_add(context);
- return 0;
-}
-
-/*
- * Generate a list of extended attribute names and optionally
- * also value lengths. Positive return value follows the XFS
- * convention of being an error, zero or negative return code
- * is the length of the buffer returned (negated), indicating
- * success.
- */
-int
-xfs_attr_list(
- xfs_inode_t *dp,
- char *buffer,
- int bufsize,
- int flags,
- attrlist_cursor_kern_t *cursor)
-{
- xfs_attr_list_context_t context;
- struct attrlist *alist;
- int error;
-
- /*
- * Validate the cursor.
- */
- if (cursor->pad1 || cursor->pad2)
- return(XFS_ERROR(EINVAL));
- if ((cursor->initted == 0) &&
- (cursor->hashval || cursor->blkno || cursor->offset))
- return XFS_ERROR(EINVAL);
-
- /*
- * Check for a properly aligned buffer.
- */
- if (((long)buffer) & (sizeof(int)-1))
- return XFS_ERROR(EFAULT);
- if (flags & ATTR_KERNOVAL)
- bufsize = 0;
-
- /*
- * Initialize the output buffer.
- */
- memset(&context, 0, sizeof(context));
- context.dp = dp;
- context.cursor = cursor;
- context.resynch = 1;
- context.flags = flags;
- context.alist = buffer;
- context.bufsize = (bufsize & ~(sizeof(int)-1)); /* align */
- context.firstu = context.bufsize;
- context.put_listent = xfs_attr_put_listent;
-
- alist = (struct attrlist *)context.alist;
- alist->al_count = 0;
- alist->al_more = 0;
- alist->al_offset[0] = context.bufsize;
-
- error = xfs_attr_list_int(&context);
- ASSERT(error >= 0);
- return error;
-}
-
int /* error */
xfs_attr_inactive(xfs_inode_t *dp)
{
@@ -1166,28 +1013,6 @@ xfs_attr_leaf_get(xfs_da_args_t *args)
return error;
}
-/*
- * Copy out attribute entries for attr_list(), for leaf attribute lists.
- */
-STATIC int
-xfs_attr_leaf_list(xfs_attr_list_context_t *context)
-{
- int error;
- struct xfs_buf *bp;
-
- trace_xfs_attr_leaf_list(context);
-
- context->cursor->blkno = 0;
- error = xfs_attr3_leaf_read(NULL, context->dp, 0, -1, &bp);
- if (error)
- return XFS_ERROR(error);
-
- error = xfs_attr3_leaf_list_int(bp, context);
- xfs_trans_brelse(NULL, bp);
- return XFS_ERROR(error);
-}
-
-
/*========================================================================
* External routines when attribute list size > XFS_LBSIZE(mp).
*========================================================================*/
@@ -1780,143 +1605,3 @@ xfs_attr_node_get(xfs_da_args_t *args)
xfs_da_state_free(state);
return(retval);
}
-
-STATIC int /* error */
-xfs_attr_node_list(xfs_attr_list_context_t *context)
-{
- attrlist_cursor_kern_t *cursor;
- xfs_attr_leafblock_t *leaf;
- xfs_da_intnode_t *node;
- struct xfs_attr3_icleaf_hdr leafhdr;
- struct xfs_da3_icnode_hdr nodehdr;
- struct xfs_da_node_entry *btree;
- int error, i;
- struct xfs_buf *bp;
-
- trace_xfs_attr_node_list(context);
-
- cursor = context->cursor;
- cursor->initted = 1;
-
- /*
- * Do all sorts of validation on the passed-in cursor structure.
- * If anything is amiss, ignore the cursor and look up the hashval
- * starting from the btree root.
- */
- bp = NULL;
- if (cursor->blkno > 0) {
- error = xfs_da3_node_read(NULL, context->dp, cursor->blkno, -1,
- &bp, XFS_ATTR_FORK);
- if ((error != 0) && (error != EFSCORRUPTED))
- return(error);
- if (bp) {
- struct xfs_attr_leaf_entry *entries;
-
- node = bp->b_addr;
- switch (be16_to_cpu(node->hdr.info.magic)) {
- case XFS_DA_NODE_MAGIC:
- case XFS_DA3_NODE_MAGIC:
- trace_xfs_attr_list_wrong_blk(context);
- xfs_trans_brelse(NULL, bp);
- bp = NULL;
- break;
- case XFS_ATTR_LEAF_MAGIC:
- case XFS_ATTR3_LEAF_MAGIC:
- leaf = bp->b_addr;
- xfs_attr3_leaf_hdr_from_disk(&leafhdr, leaf);
- entries = xfs_attr3_leaf_entryp(leaf);
- if (cursor->hashval > be32_to_cpu(
- entries[leafhdr.count - 1].hashval)) {
- trace_xfs_attr_list_wrong_blk(context);
- xfs_trans_brelse(NULL, bp);
- bp = NULL;
- } else if (cursor->hashval <= be32_to_cpu(
- entries[0].hashval)) {
- trace_xfs_attr_list_wrong_blk(context);
- xfs_trans_brelse(NULL, bp);
- bp = NULL;
- }
- break;
- default:
- trace_xfs_attr_list_wrong_blk(context);
- xfs_trans_brelse(NULL, bp);
- bp = NULL;
- }
- }
- }
-
- /*
- * We did not find what we expected given the cursor's contents,
- * so we start from the top and work down based on the hash value.
- * Note that start of node block is same as start of leaf block.
- */
- if (bp == NULL) {
- cursor->blkno = 0;
- for (;;) {
- __uint16_t magic;
-
- error = xfs_da3_node_read(NULL, context->dp,
- cursor->blkno, -1, &bp,
- XFS_ATTR_FORK);
- if (error)
- return(error);
- node = bp->b_addr;
- magic = be16_to_cpu(node->hdr.info.magic);
- if (magic == XFS_ATTR_LEAF_MAGIC ||
- magic == XFS_ATTR3_LEAF_MAGIC)
- break;
- if (magic != XFS_DA_NODE_MAGIC &&
- magic != XFS_DA3_NODE_MAGIC) {
- XFS_CORRUPTION_ERROR("xfs_attr_node_list(3)",
- XFS_ERRLEVEL_LOW,
- context->dp->i_mount,
- node);
- xfs_trans_brelse(NULL, bp);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- xfs_da3_node_hdr_from_disk(&nodehdr, node);
- btree = xfs_da3_node_tree_p(node);
- for (i = 0; i < nodehdr.count; btree++, i++) {
- if (cursor->hashval
- <= be32_to_cpu(btree->hashval)) {
- cursor->blkno = be32_to_cpu(btree->before);
- trace_xfs_attr_list_node_descend(context,
- btree);
- break;
- }
- }
- if (i == nodehdr.count) {
- xfs_trans_brelse(NULL, bp);
- return 0;
- }
- xfs_trans_brelse(NULL, bp);
- }
- }
- ASSERT(bp != NULL);
-
- /*
- * Roll upward through the blocks, processing each leaf block in
- * order. As long as there is space in the result buffer, keep
- * adding the information.
- */
- for (;;) {
- leaf = bp->b_addr;
- error = xfs_attr3_leaf_list_int(bp, context);
- if (error) {
- xfs_trans_brelse(NULL, bp);
- return error;
- }
- xfs_attr3_leaf_hdr_from_disk(&leafhdr, leaf);
- if (context->seen_enough || leafhdr.forw == 0)
- break;
- cursor->blkno = leafhdr.forw;
- xfs_trans_brelse(NULL, bp);
- error = xfs_attr3_leaf_read(NULL, context->dp, cursor->blkno, -1,
- &bp);
- if (error)
- return error;
- }
- xfs_trans_brelse(NULL, bp);
- return 0;
-}
diff --git a/fs/xfs/xfs_attr.h b/fs/xfs/xfs_attr.h
index de8dd58..cb604b5 100644
--- a/fs/xfs/xfs_attr.h
+++ b/fs/xfs/xfs_attr.h
@@ -141,5 +141,6 @@ typedef struct xfs_attr_list_context {
*/
int xfs_attr_inactive(struct xfs_inode *dp);
int xfs_attr_list_int(struct xfs_attr_list_context *);
+int xfs_inode_hasattr(struct xfs_inode *ip);
#endif /* __XFS_ATTR_H__ */
diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index b800fbc..ba13857 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -751,182 +751,6 @@ out:
return(error);
}
-STATIC int
-xfs_attr_shortform_compare(const void *a, const void *b)
-{
- xfs_attr_sf_sort_t *sa, *sb;
-
- sa = (xfs_attr_sf_sort_t *)a;
- sb = (xfs_attr_sf_sort_t *)b;
- if (sa->hash < sb->hash) {
- return(-1);
- } else if (sa->hash > sb->hash) {
- return(1);
- } else {
- return(sa->entno - sb->entno);
- }
-}
-
-
-#define XFS_ISRESET_CURSOR(cursor) \
- (!((cursor)->initted) && !((cursor)->hashval) && \
- !((cursor)->blkno) && !((cursor)->offset))
-/*
- * Copy out entries of shortform attribute lists for attr_list().
- * Shortform attribute lists are not stored in hashval sorted order.
- * If the output buffer is not large enough to hold them all, then we
- * we have to calculate each entries' hashvalue and sort them before
- * we can begin returning them to the user.
- */
-/*ARGSUSED*/
-int
-xfs_attr_shortform_list(xfs_attr_list_context_t *context)
-{
- attrlist_cursor_kern_t *cursor;
- xfs_attr_sf_sort_t *sbuf, *sbp;
- xfs_attr_shortform_t *sf;
- xfs_attr_sf_entry_t *sfe;
- xfs_inode_t *dp;
- int sbsize, nsbuf, count, i;
- int error;
-
- ASSERT(context != NULL);
- dp = context->dp;
- ASSERT(dp != NULL);
- ASSERT(dp->i_afp != NULL);
- sf = (xfs_attr_shortform_t *)dp->i_afp->if_u1.if_data;
- ASSERT(sf != NULL);
- if (!sf->hdr.count)
- return(0);
- cursor = context->cursor;
- ASSERT(cursor != NULL);
-
- trace_xfs_attr_list_sf(context);
-
- /*
- * If the buffer is large enough and the cursor is at the start,
- * do not bother with sorting since we will return everything in
- * one buffer and another call using the cursor won't need to be
- * made.
- * Note the generous fudge factor of 16 overhead bytes per entry.
- * If bufsize is zero then put_listent must be a search function
- * and can just scan through what we have.
- */
- if (context->bufsize == 0 ||
- (XFS_ISRESET_CURSOR(cursor) &&
- (dp->i_afp->if_bytes + sf->hdr.count * 16) < context->bufsize)) {
- for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
- error = context->put_listent(context,
- sfe->flags,
- sfe->nameval,
- (int)sfe->namelen,
- (int)sfe->valuelen,
- &sfe->nameval[sfe->namelen]);
-
- /*
- * Either search callback finished early or
- * didn't fit it all in the buffer after all.
- */
- if (context->seen_enough)
- break;
-
- if (error)
- return error;
- sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
- }
- trace_xfs_attr_list_sf_all(context);
- return(0);
- }
-
- /* do no more for a search callback */
- if (context->bufsize == 0)
- return 0;
-
- /*
- * It didn't all fit, so we have to sort everything on hashval.
- */
- sbsize = sf->hdr.count * sizeof(*sbuf);
- sbp = sbuf = kmem_alloc(sbsize, KM_SLEEP | KM_NOFS);
-
- /*
- * Scan the attribute list for the rest of the entries, storing
- * the relevant info from only those that match into a buffer.
- */
- nsbuf = 0;
- for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
- if (unlikely(
- ((char *)sfe < (char *)sf) ||
- ((char *)sfe >= ((char *)sf + dp->i_afp->if_bytes)))) {
- XFS_CORRUPTION_ERROR("xfs_attr_shortform_list",
- XFS_ERRLEVEL_LOW,
- context->dp->i_mount, sfe);
- kmem_free(sbuf);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- sbp->entno = i;
- sbp->hash = xfs_da_hashname(sfe->nameval, sfe->namelen);
- sbp->name = sfe->nameval;
- sbp->namelen = sfe->namelen;
- /* These are bytes, and both on-disk, don't endian-flip */
- sbp->valuelen = sfe->valuelen;
- sbp->flags = sfe->flags;
- sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
- sbp++;
- nsbuf++;
- }
-
- /*
- * Sort the entries on hash then entno.
- */
- xfs_sort(sbuf, nsbuf, sizeof(*sbuf), xfs_attr_shortform_compare);
-
- /*
- * Re-find our place IN THE SORTED LIST.
- */
- count = 0;
- cursor->initted = 1;
- cursor->blkno = 0;
- for (sbp = sbuf, i = 0; i < nsbuf; i++, sbp++) {
- if (sbp->hash == cursor->hashval) {
- if (cursor->offset == count) {
- break;
- }
- count++;
- } else if (sbp->hash > cursor->hashval) {
- break;
- }
- }
- if (i == nsbuf) {
- kmem_free(sbuf);
- return(0);
- }
-
- /*
- * Loop putting entries into the user buffer.
- */
- for ( ; i < nsbuf; i++, sbp++) {
- if (cursor->hashval != sbp->hash) {
- cursor->hashval = sbp->hash;
- cursor->offset = 0;
- }
- error = context->put_listent(context,
- sbp->flags,
- sbp->name,
- sbp->namelen,
- sbp->valuelen,
- &sbp->name[sbp->namelen]);
- if (error)
- return error;
- if (context->seen_enough)
- break;
- cursor->offset++;
- }
-
- kmem_free(sbuf);
- return(0);
-}
-
/*
* Check a leaf attribute block to see if all the entries would fit into
* a shortform attribute list.
@@ -2643,130 +2467,6 @@ xfs_attr_leaf_newentsize(int namelen, int valuelen, int blocksize, int *local)
return size;
}
-/*
- * Copy out attribute list entries for attr_list(), for leaf attribute lists.
- */
-int
-xfs_attr3_leaf_list_int(
- struct xfs_buf *bp,
- struct xfs_attr_list_context *context)
-{
- struct attrlist_cursor_kern *cursor;
- struct xfs_attr_leafblock *leaf;
- struct xfs_attr3_icleaf_hdr ichdr;
- struct xfs_attr_leaf_entry *entries;
- struct xfs_attr_leaf_entry *entry;
- int retval;
- int i;
-
- trace_xfs_attr_list_leaf(context);
-
- leaf = bp->b_addr;
- xfs_attr3_leaf_hdr_from_disk(&ichdr, leaf);
- entries = xfs_attr3_leaf_entryp(leaf);
-
- cursor = context->cursor;
- cursor->initted = 1;
-
- /*
- * Re-find our place in the leaf block if this is a new syscall.
- */
- if (context->resynch) {
- entry = &entries[0];
- for (i = 0; i < ichdr.count; entry++, i++) {
- if (be32_to_cpu(entry->hashval) == cursor->hashval) {
- if (cursor->offset == context->dupcnt) {
- context->dupcnt = 0;
- break;
- }
- context->dupcnt++;
- } else if (be32_to_cpu(entry->hashval) >
- cursor->hashval) {
- context->dupcnt = 0;
- break;
- }
- }
- if (i == ichdr.count) {
- trace_xfs_attr_list_notfound(context);
- return 0;
- }
- } else {
- entry = &entries[0];
- i = 0;
- }
- context->resynch = 0;
-
- /*
- * We have found our place, start copying out the new attributes.
- */
- retval = 0;
- for (; i < ichdr.count; entry++, i++) {
- if (be32_to_cpu(entry->hashval) != cursor->hashval) {
- cursor->hashval = be32_to_cpu(entry->hashval);
- cursor->offset = 0;
- }
-
- if (entry->flags & XFS_ATTR_INCOMPLETE)
- continue; /* skip incomplete entries */
-
- if (entry->flags & XFS_ATTR_LOCAL) {
- xfs_attr_leaf_name_local_t *name_loc =
- xfs_attr3_leaf_name_local(leaf, i);
-
- retval = context->put_listent(context,
- entry->flags,
- name_loc->nameval,
- (int)name_loc->namelen,
- be16_to_cpu(name_loc->valuelen),
- &name_loc->nameval[name_loc->namelen]);
- if (retval)
- return retval;
- } else {
- xfs_attr_leaf_name_remote_t *name_rmt =
- xfs_attr3_leaf_name_remote(leaf, i);
-
- int valuelen = be32_to_cpu(name_rmt->valuelen);
-
- if (context->put_value) {
- xfs_da_args_t args;
-
- memset((char *)&args, 0, sizeof(args));
- args.dp = context->dp;
- args.whichfork = XFS_ATTR_FORK;
- args.valuelen = valuelen;
- args.value = kmem_alloc(valuelen, KM_SLEEP | KM_NOFS);
- args.rmtblkno = be32_to_cpu(name_rmt->valueblk);
- args.rmtblkcnt = xfs_attr3_rmt_blocks(
- args.dp->i_mount, valuelen);
- retval = xfs_attr_rmtval_get(&args);
- if (retval)
- return retval;
- retval = context->put_listent(context,
- entry->flags,
- name_rmt->name,
- (int)name_rmt->namelen,
- valuelen,
- args.value);
- kmem_free(args.value);
- } else {
- retval = context->put_listent(context,
- entry->flags,
- name_rmt->name,
- (int)name_rmt->namelen,
- valuelen,
- NULL);
- }
- if (retval)
- return retval;
- }
- if (context->seen_enough)
- break;
- cursor->offset++;
- }
- trace_xfs_attr_list_leaf_end(context);
- return retval;
-}
-
/*========================================================================
* Manage the INCOMPLETE flag in a leaf entry
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
new file mode 100644
index 0000000..cbc80d4
--- /dev/null
+++ b/fs/xfs/xfs_attr_list.c
@@ -0,0 +1,655 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * Copyright (c) 2013 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_btree.h"
+#include "xfs_attr_sf.h"
+#include "xfs_attr_remote.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_inode_item.h"
+#include "xfs_bmap.h"
+#include "xfs_attr.h"
+#include "xfs_attr_leaf.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_buf_item.h"
+#include "xfs_cksum.h"
+
+STATIC int
+xfs_attr_shortform_compare(const void *a, const void *b)
+{
+ xfs_attr_sf_sort_t *sa, *sb;
+
+ sa = (xfs_attr_sf_sort_t *)a;
+ sb = (xfs_attr_sf_sort_t *)b;
+ if (sa->hash < sb->hash) {
+ return(-1);
+ } else if (sa->hash > sb->hash) {
+ return(1);
+ } else {
+ return(sa->entno - sb->entno);
+ }
+}
+
+#define XFS_ISRESET_CURSOR(cursor) \
+ (!((cursor)->initted) && !((cursor)->hashval) && \
+ !((cursor)->blkno) && !((cursor)->offset))
+/*
+ * Copy out entries of shortform attribute lists for attr_list().
+ * Shortform attribute lists are not stored in hashval sorted order.
+ * If the output buffer is not large enough to hold them all, then we
+ * have to calculate each entry's hashvalue and sort them before
+ * we can begin returning them to the user.
+ */
+int
+xfs_attr_shortform_list(xfs_attr_list_context_t *context)
+{
+ attrlist_cursor_kern_t *cursor;
+ xfs_attr_sf_sort_t *sbuf, *sbp;
+ xfs_attr_shortform_t *sf;
+ xfs_attr_sf_entry_t *sfe;
+ xfs_inode_t *dp;
+ int sbsize, nsbuf, count, i;
+ int error;
+
+ ASSERT(context != NULL);
+ dp = context->dp;
+ ASSERT(dp != NULL);
+ ASSERT(dp->i_afp != NULL);
+ sf = (xfs_attr_shortform_t *)dp->i_afp->if_u1.if_data;
+ ASSERT(sf != NULL);
+ if (!sf->hdr.count)
+ return(0);
+ cursor = context->cursor;
+ ASSERT(cursor != NULL);
+
+ trace_xfs_attr_list_sf(context);
+
+ /*
+ * If the buffer is large enough and the cursor is at the start,
+ * do not bother with sorting since we will return everything in
+ * one buffer and another call using the cursor won't need to be
+ * made.
+ * Note the generous fudge factor of 16 overhead bytes per entry.
+ * If bufsize is zero then put_listent must be a search function
+ * and can just scan through what we have.
+ */
+ if (context->bufsize == 0 ||
+ (XFS_ISRESET_CURSOR(cursor) &&
+ (dp->i_afp->if_bytes + sf->hdr.count * 16) < context->bufsize)) {
+ for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
+ error = context->put_listent(context,
+ sfe->flags,
+ sfe->nameval,
+ (int)sfe->namelen,
+ (int)sfe->valuelen,
+ &sfe->nameval[sfe->namelen]);
+
+ /*
+ * Either search callback finished early or
+ * didn't fit it all in the buffer after all.
+ */
+ if (context->seen_enough)
+ break;
+
+ if (error)
+ return error;
+ sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
+ }
+ trace_xfs_attr_list_sf_all(context);
+ return(0);
+ }
+
+ /* do no more for a search callback */
+ if (context->bufsize == 0)
+ return 0;
+
+ /*
+ * It didn't all fit, so we have to sort everything on hashval.
+ */
+ sbsize = sf->hdr.count * sizeof(*sbuf);
+ sbp = sbuf = kmem_alloc(sbsize, KM_SLEEP | KM_NOFS);
+
+ /*
+ * Scan the attribute list for the rest of the entries, storing
+ * the relevant info from only those that match into a buffer.
+ */
+ nsbuf = 0;
+ for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
+ if (unlikely(
+ ((char *)sfe < (char *)sf) ||
+ ((char *)sfe >= ((char *)sf + dp->i_afp->if_bytes)))) {
+ XFS_CORRUPTION_ERROR("xfs_attr_shortform_list",
+ XFS_ERRLEVEL_LOW,
+ context->dp->i_mount, sfe);
+ kmem_free(sbuf);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ sbp->entno = i;
+ sbp->hash = xfs_da_hashname(sfe->nameval, sfe->namelen);
+ sbp->name = sfe->nameval;
+ sbp->namelen = sfe->namelen;
+ /* These are bytes, and both on-disk, don't endian-flip */
+ sbp->valuelen = sfe->valuelen;
+ sbp->flags = sfe->flags;
+ sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
+ sbp++;
+ nsbuf++;
+ }
+
+ /*
+ * Sort the entries on hash then entno.
+ */
+ xfs_sort(sbuf, nsbuf, sizeof(*sbuf), xfs_attr_shortform_compare);
+
+ /*
+ * Re-find our place IN THE SORTED LIST.
+ */
+ count = 0;
+ cursor->initted = 1;
+ cursor->blkno = 0;
+ for (sbp = sbuf, i = 0; i < nsbuf; i++, sbp++) {
+ if (sbp->hash == cursor->hashval) {
+ if (cursor->offset == count) {
+ break;
+ }
+ count++;
+ } else if (sbp->hash > cursor->hashval) {
+ break;
+ }
+ }
+ if (i == nsbuf) {
+ kmem_free(sbuf);
+ return(0);
+ }
+
+ /*
+ * Loop putting entries into the user buffer.
+ */
+ for ( ; i < nsbuf; i++, sbp++) {
+ if (cursor->hashval != sbp->hash) {
+ cursor->hashval = sbp->hash;
+ cursor->offset = 0;
+ }
+ error = context->put_listent(context,
+ sbp->flags,
+ sbp->name,
+ sbp->namelen,
+ sbp->valuelen,
+ &sbp->name[sbp->namelen]);
+ if (error)
+ return error;
+ if (context->seen_enough)
+ break;
+ cursor->offset++;
+ }
+
+ kmem_free(sbuf);
+ return(0);
+}
+
+STATIC int
+xfs_attr_node_list(xfs_attr_list_context_t *context)
+{
+ attrlist_cursor_kern_t *cursor;
+ xfs_attr_leafblock_t *leaf;
+ xfs_da_intnode_t *node;
+ struct xfs_attr3_icleaf_hdr leafhdr;
+ struct xfs_da3_icnode_hdr nodehdr;
+ struct xfs_da_node_entry *btree;
+ int error, i;
+ struct xfs_buf *bp;
+
+ trace_xfs_attr_node_list(context);
+
+ cursor = context->cursor;
+ cursor->initted = 1;
+
+ /*
+ * Do all sorts of validation on the passed-in cursor structure.
+ * If anything is amiss, ignore the cursor and look up the hashval
+ * starting from the btree root.
+ */
+ bp = NULL;
+ if (cursor->blkno > 0) {
+ error = xfs_da3_node_read(NULL, context->dp, cursor->blkno, -1,
+ &bp, XFS_ATTR_FORK);
+ if ((error != 0) && (error != EFSCORRUPTED))
+ return(error);
+ if (bp) {
+ struct xfs_attr_leaf_entry *entries;
+
+ node = bp->b_addr;
+ switch (be16_to_cpu(node->hdr.info.magic)) {
+ case XFS_DA_NODE_MAGIC:
+ case XFS_DA3_NODE_MAGIC:
+ trace_xfs_attr_list_wrong_blk(context);
+ xfs_trans_brelse(NULL, bp);
+ bp = NULL;
+ break;
+ case XFS_ATTR_LEAF_MAGIC:
+ case XFS_ATTR3_LEAF_MAGIC:
+ leaf = bp->b_addr;
+ xfs_attr3_leaf_hdr_from_disk(&leafhdr, leaf);
+ entries = xfs_attr3_leaf_entryp(leaf);
+ if (cursor->hashval > be32_to_cpu(
+ entries[leafhdr.count - 1].hashval)) {
+ trace_xfs_attr_list_wrong_blk(context);
+ xfs_trans_brelse(NULL, bp);
+ bp = NULL;
+ } else if (cursor->hashval <= be32_to_cpu(
+ entries[0].hashval)) {
+ trace_xfs_attr_list_wrong_blk(context);
+ xfs_trans_brelse(NULL, bp);
+ bp = NULL;
+ }
+ break;
+ default:
+ trace_xfs_attr_list_wrong_blk(context);
+ xfs_trans_brelse(NULL, bp);
+ bp = NULL;
+ }
+ }
+ }
+
+ /*
+ * We did not find what we expected given the cursor's contents,
+ * so we start from the top and work down based on the hash value.
+ * Note that start of node block is same as start of leaf block.
+ */
+ if (bp == NULL) {
+ cursor->blkno = 0;
+ for (;;) {
+ __uint16_t magic;
+
+ error = xfs_da3_node_read(NULL, context->dp,
+ cursor->blkno, -1, &bp,
+ XFS_ATTR_FORK);
+ if (error)
+ return(error);
+ node = bp->b_addr;
+ magic = be16_to_cpu(node->hdr.info.magic);
+ if (magic == XFS_ATTR_LEAF_MAGIC ||
+ magic == XFS_ATTR3_LEAF_MAGIC)
+ break;
+ if (magic != XFS_DA_NODE_MAGIC &&
+ magic != XFS_DA3_NODE_MAGIC) {
+ XFS_CORRUPTION_ERROR("xfs_attr_node_list(3)",
+ XFS_ERRLEVEL_LOW,
+ context->dp->i_mount,
+ node);
+ xfs_trans_brelse(NULL, bp);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ xfs_da3_node_hdr_from_disk(&nodehdr, node);
+ btree = xfs_da3_node_tree_p(node);
+ for (i = 0; i < nodehdr.count; btree++, i++) {
+ if (cursor->hashval
+ <= be32_to_cpu(btree->hashval)) {
+ cursor->blkno = be32_to_cpu(btree->before);
+ trace_xfs_attr_list_node_descend(context,
+ btree);
+ break;
+ }
+ }
+ if (i == nodehdr.count) {
+ xfs_trans_brelse(NULL, bp);
+ return 0;
+ }
+ xfs_trans_brelse(NULL, bp);
+ }
+ }
+ ASSERT(bp != NULL);
+
+ /*
+ * Roll upward through the blocks, processing each leaf block in
+ * order. As long as there is space in the result buffer, keep
+ * adding the information.
+ */
+ for (;;) {
+ leaf = bp->b_addr;
+ error = xfs_attr3_leaf_list_int(bp, context);
+ if (error) {
+ xfs_trans_brelse(NULL, bp);
+ return error;
+ }
+ xfs_attr3_leaf_hdr_from_disk(&leafhdr, leaf);
+ if (context->seen_enough || leafhdr.forw == 0)
+ break;
+ cursor->blkno = leafhdr.forw;
+ xfs_trans_brelse(NULL, bp);
+ error = xfs_attr3_leaf_read(NULL, context->dp, cursor->blkno, -1,
+ &bp);
+ if (error)
+ return error;
+ }
+ xfs_trans_brelse(NULL, bp);
+ return 0;
+}
+
+/*
+ * Copy out attribute list entries for attr_list(), for leaf attribute lists.
+ */
+int
+xfs_attr3_leaf_list_int(
+ struct xfs_buf *bp,
+ struct xfs_attr_list_context *context)
+{
+ struct attrlist_cursor_kern *cursor;
+ struct xfs_attr_leafblock *leaf;
+ struct xfs_attr3_icleaf_hdr ichdr;
+ struct xfs_attr_leaf_entry *entries;
+ struct xfs_attr_leaf_entry *entry;
+ int retval;
+ int i;
+
+ trace_xfs_attr_list_leaf(context);
+
+ leaf = bp->b_addr;
+ xfs_attr3_leaf_hdr_from_disk(&ichdr, leaf);
+ entries = xfs_attr3_leaf_entryp(leaf);
+
+ cursor = context->cursor;
+ cursor->initted = 1;
+
+ /*
+ * Re-find our place in the leaf block if this is a new syscall.
+ */
+ if (context->resynch) {
+ entry = &entries[0];
+ for (i = 0; i < ichdr.count; entry++, i++) {
+ if (be32_to_cpu(entry->hashval) == cursor->hashval) {
+ if (cursor->offset == context->dupcnt) {
+ context->dupcnt = 0;
+ break;
+ }
+ context->dupcnt++;
+ } else if (be32_to_cpu(entry->hashval) >
+ cursor->hashval) {
+ context->dupcnt = 0;
+ break;
+ }
+ }
+ if (i == ichdr.count) {
+ trace_xfs_attr_list_notfound(context);
+ return 0;
+ }
+ } else {
+ entry = &entries[0];
+ i = 0;
+ }
+ context->resynch = 0;
+
+ /*
+ * We have found our place, start copying out the new attributes.
+ */
+ retval = 0;
+ for (; i < ichdr.count; entry++, i++) {
+ if (be32_to_cpu(entry->hashval) != cursor->hashval) {
+ cursor->hashval = be32_to_cpu(entry->hashval);
+ cursor->offset = 0;
+ }
+
+ if (entry->flags & XFS_ATTR_INCOMPLETE)
+ continue; /* skip incomplete entries */
+
+ if (entry->flags & XFS_ATTR_LOCAL) {
+ xfs_attr_leaf_name_local_t *name_loc =
+ xfs_attr3_leaf_name_local(leaf, i);
+
+ retval = context->put_listent(context,
+ entry->flags,
+ name_loc->nameval,
+ (int)name_loc->namelen,
+ be16_to_cpu(name_loc->valuelen),
+ &name_loc->nameval[name_loc->namelen]);
+ if (retval)
+ return retval;
+ } else {
+ xfs_attr_leaf_name_remote_t *name_rmt =
+ xfs_attr3_leaf_name_remote(leaf, i);
+
+ int valuelen = be32_to_cpu(name_rmt->valuelen);
+
+ if (context->put_value) {
+ xfs_da_args_t args;
+
+ memset((char *)&args, 0, sizeof(args));
+ args.dp = context->dp;
+ args.whichfork = XFS_ATTR_FORK;
+ args.valuelen = valuelen;
+ args.value = kmem_alloc(valuelen, KM_SLEEP | KM_NOFS);
+ args.rmtblkno = be32_to_cpu(name_rmt->valueblk);
+ args.rmtblkcnt = xfs_attr3_rmt_blocks(
+ args.dp->i_mount, valuelen);
+ retval = xfs_attr_rmtval_get(&args);
+ if (retval)
+ return retval;
+ retval = context->put_listent(context,
+ entry->flags,
+ name_rmt->name,
+ (int)name_rmt->namelen,
+ valuelen,
+ args.value);
+ kmem_free(args.value);
+ } else {
+ retval = context->put_listent(context,
+ entry->flags,
+ name_rmt->name,
+ (int)name_rmt->namelen,
+ valuelen,
+ NULL);
+ }
+ if (retval)
+ return retval;
+ }
+ if (context->seen_enough)
+ break;
+ cursor->offset++;
+ }
+ trace_xfs_attr_list_leaf_end(context);
+ return retval;
+}
+
+/*
+ * Copy out attribute entries for attr_list(), for leaf attribute lists.
+ */
+STATIC int
+xfs_attr_leaf_list(xfs_attr_list_context_t *context)
+{
+ int error;
+ struct xfs_buf *bp;
+
+ trace_xfs_attr_leaf_list(context);
+
+ context->cursor->blkno = 0;
+ error = xfs_attr3_leaf_read(NULL, context->dp, 0, -1, &bp);
+ if (error)
+ return XFS_ERROR(error);
+
+ error = xfs_attr3_leaf_list_int(bp, context);
+ xfs_trans_brelse(NULL, bp);
+ return XFS_ERROR(error);
+}
+
+int
+xfs_attr_list_int(
+ xfs_attr_list_context_t *context)
+{
+ int error;
+ xfs_inode_t *dp = context->dp;
+
+ XFS_STATS_INC(xs_attr_list);
+
+ if (XFS_FORCED_SHUTDOWN(dp->i_mount))
+ return EIO;
+
+ xfs_ilock(dp, XFS_ILOCK_SHARED);
+
+ /*
+ * Decide on what work routines to call based on the inode size.
+ */
+ if (!xfs_inode_hasattr(dp)) {
+ error = 0;
+ } else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
+ error = xfs_attr_shortform_list(context);
+ } else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
+ error = xfs_attr_leaf_list(context);
+ } else {
+ error = xfs_attr_node_list(context);
+ }
+
+ xfs_iunlock(dp, XFS_ILOCK_SHARED);
+
+ return error;
+}
+
+#define ATTR_ENTBASESIZE /* minimum bytes used by an attr */ \
+ (((struct attrlist_ent *) 0)->a_name - (char *) 0)
+#define ATTR_ENTSIZE(namelen) /* actual bytes used by an attr */ \
+ ((ATTR_ENTBASESIZE + (namelen) + 1 + sizeof(u_int32_t)-1) \
+ & ~(sizeof(u_int32_t)-1))
+
+/*
+ * Format an attribute and copy it out to the user's buffer.
+ * Take care to check values and protect against them changing later,
+ * we may be reading them directly out of a user buffer.
+ */
+STATIC int
+xfs_attr_put_listent(
+ xfs_attr_list_context_t *context,
+ int flags,
+ unsigned char *name,
+ int namelen,
+ int valuelen,
+ unsigned char *value)
+{
+ struct attrlist *alist = (struct attrlist *)context->alist;
+ attrlist_ent_t *aep;
+ int arraytop;
+
+ ASSERT(!(context->flags & ATTR_KERNOVAL));
+ ASSERT(context->count >= 0);
+ ASSERT(context->count < (ATTR_MAX_VALUELEN/8));
+ ASSERT(context->firstu >= sizeof(*alist));
+ ASSERT(context->firstu <= context->bufsize);
+
+ /*
+ * Only list entries in the right namespace.
+ */
+ if (((context->flags & ATTR_SECURE) == 0) !=
+ ((flags & XFS_ATTR_SECURE) == 0))
+ return 0;
+ if (((context->flags & ATTR_ROOT) == 0) !=
+ ((flags & XFS_ATTR_ROOT) == 0))
+ return 0;
+
+ arraytop = sizeof(*alist) +
+ context->count * sizeof(alist->al_offset[0]);
+ context->firstu -= ATTR_ENTSIZE(namelen);
+ if (context->firstu < arraytop) {
+ trace_xfs_attr_list_full(context);
+ alist->al_more = 1;
+ context->seen_enough = 1;
+ return 1;
+ }
+
+ aep = (attrlist_ent_t *)&context->alist[context->firstu];
+ aep->a_valuelen = valuelen;
+ memcpy(aep->a_name, name, namelen);
+ aep->a_name[namelen] = 0;
+ alist->al_offset[context->count++] = context->firstu;
+ alist->al_count = context->count;
+ trace_xfs_attr_list_add(context);
+ return 0;
+}
+
+/*
+ * Generate a list of extended attribute names and optionally
+ * also value lengths. Positive return value follows the XFS
+ * convention of being an error, zero or negative return code
+ * is the length of the buffer returned (negated), indicating
+ * success.
+ */
+int
+xfs_attr_list(
+ xfs_inode_t *dp,
+ char *buffer,
+ int bufsize,
+ int flags,
+ attrlist_cursor_kern_t *cursor)
+{
+ xfs_attr_list_context_t context;
+ struct attrlist *alist;
+ int error;
+
+ /*
+ * Validate the cursor.
+ */
+ if (cursor->pad1 || cursor->pad2)
+ return(XFS_ERROR(EINVAL));
+ if ((cursor->initted == 0) &&
+ (cursor->hashval || cursor->blkno || cursor->offset))
+ return XFS_ERROR(EINVAL);
+
+ /*
+ * Check for a properly aligned buffer.
+ */
+ if (((long)buffer) & (sizeof(int)-1))
+ return XFS_ERROR(EFAULT);
+ if (flags & ATTR_KERNOVAL)
+ bufsize = 0;
+
+ /*
+ * Initialize the output buffer.
+ */
+ memset(&context, 0, sizeof(context));
+ context.dp = dp;
+ context.cursor = cursor;
+ context.resynch = 1;
+ context.flags = flags;
+ context.alist = buffer;
+ context.bufsize = (bufsize & ~(sizeof(int)-1)); /* align */
+ context.firstu = context.bufsize;
+ context.put_listent = xfs_attr_put_listent;
+
+ alist = (struct attrlist *)context.alist;
+ alist->al_count = 0;
+ alist->al_more = 0;
+ alist->al_offset[0] = context.bufsize;
+
+ error = xfs_attr_list_int(&context);
+ ASSERT(error >= 0);
+ return error;
+}
--
1.7.10.4
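
Note on the buffer layout used by xfs_attr_put_listent() in the patch
above: entries are packed from the end of the user buffer towards the
front, while the offset array grows from the front, and the buffer is
full when the two would meet. The sketch below only illustrates that
arithmetic. It is not kernel code: struct demo_ent, demo_entsize() and
demo_reserve() are hypothetical names, and arraytop is passed in as a
fixed value, whereas the real function recomputes it from the current
entry count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct attrlist_ent: value length, then name. */
struct demo_ent {
	uint32_t a_valuelen;
	char     a_name[1];
};

/* Bytes that precede the name, mirroring ATTR_ENTBASESIZE. */
#define DEMO_ENTBASESIZE	offsetof(struct demo_ent, a_name)

/* Header + name + NUL, rounded up to a 4-byte multiple (cf. ATTR_ENTSIZE). */
static size_t demo_entsize(size_t namelen)
{
	size_t align = sizeof(uint32_t);

	return (DEMO_ENTBASESIZE + namelen + 1 + align - 1) & ~(align - 1);
}

/*
 * Reserve room for one entry at the tail of the buffer.
 * firstu is the offset of the lowest byte already used by entries,
 * arraytop is the end of the header plus offset array at the front.
 * Returns the new firstu, or -1 if the entry would overlap the array.
 */
static long demo_reserve(long firstu, long arraytop, size_t namelen)
{
	long next;

	assert(firstu >= arraytop);
	next = firstu - (long)demo_entsize(namelen);
	return next < arraytop ? -1 : next;
}
```

With a 4-byte value length field, a 5-character name needs 12 bytes
(4 + 5 + 1, rounded up to a multiple of 4), so demo_reserve(64, 16, 5)
moves the first-used offset from 64 down to 52.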
* [PATCH 17/60] xfs: split out attribute fork truncation code into separate file
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (15 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 16/60] xfs: split out attribute listing code into separate file Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 18/60] xfs: split out xfs inode operations " Dave Chinner
` (44 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The attribute inactivation code is not used by userspace, so, like
the attribute listing code, split it out into a separate file to
minimise the differences between the kernel code and the code shared
with libxfs in userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_attr.c | 71 -------
fs/xfs/xfs_attr_inactive.c | 453 ++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_attr_leaf.c | 341 ---------------------------------
4 files changed, 454 insertions(+), 412 deletions(-)
create mode 100644 fs/xfs/xfs_attr_inactive.c
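
Note on the pattern in xfs_attr3_leaf_inactive(), which this patch
moves: it makes two passes over the leaf entries, first counting the
remote values and then copying their block ranges into a private list,
so that the leaf buffer can be released before the transaction is
rolled for each extent. The sketch below shows only that count-then-
collect step in plain userspace C. All demo_* names are hypothetical,
and the block count is a simple round-up, which is an assumption; the
kernel uses xfs_attr3_rmt_blocks() for that calculation.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_ATTR_LOCAL	0x1u	/* hypothetical "value stored inline" flag */

struct demo_entry {
	unsigned int flags;
	unsigned int valueblk;	/* first block of a remote value, 0 if none */
	unsigned int valuelen;	/* value length in bytes */
};

struct demo_extent {
	unsigned int valueblk;
	unsigned int blkcnt;
};

static int demo_is_remote(const struct demo_entry *e)
{
	return (e->flags & DEMO_ATTR_LOCAL) == 0 && e->valueblk != 0;
}

/*
 * Snapshot the remote value extents of a set of entries.
 * Returns the number of extents (0 with *out == NULL if there are none),
 * or -1 if the list could not be allocated. The caller frees *out.
 */
static int demo_collect_remote(const struct demo_entry *ents, int nents,
			       unsigned int blksize, struct demo_extent **out)
{
	struct demo_extent *list, *lp;
	int count = 0;
	int i;

	assert(blksize > 0);
	*out = NULL;

	/* Pass 1: count, so the list can be sized exactly. */
	for (i = 0; i < nents; i++)
		if (demo_is_remote(&ents[i]))
			count++;
	if (count == 0)
		return 0;

	list = malloc(count * sizeof(*list));
	if (!list)
		return -1;

	/* Pass 2: copy out what is needed once the source is released. */
	lp = list;
	for (i = 0; i < nents; i++) {
		if (!demo_is_remote(&ents[i]))
			continue;
		lp->valueblk = ents[i].valueblk;
		lp->blkcnt = (ents[i].valuelen + blksize - 1) / blksize;
		lp++;
	}
	*out = list;
	return count;
}
```

Once the list is built, the caller no longer needs the source entries
and can process each extent independently, which is the property the
kernel code relies on when it releases the buffer before calling
xfs_attr3_leaf_freextent().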
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 4670207..6f33414 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -27,6 +27,7 @@ xfs-y += xfs_trace.o
# highlevel code
xfs-y += xfs_aops.o \
+ xfs_attr_inactive.o \
xfs_attr_list.o \
xfs_bit.o \
xfs_buf.o \
diff --git a/fs/xfs/xfs_attr.c b/fs/xfs/xfs_attr.c
index 75a168c..10c18ff 100644
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -609,77 +609,6 @@ xfs_attr_remove(
return xfs_attr_remove_int(dp, &xname, flags);
}
-int /* error */
-xfs_attr_inactive(xfs_inode_t *dp)
-{
- xfs_trans_t *trans;
- xfs_mount_t *mp;
- int error;
-
- mp = dp->i_mount;
- ASSERT(! XFS_NOT_DQATTACHED(mp, dp));
-
- xfs_ilock(dp, XFS_ILOCK_SHARED);
- if (!xfs_inode_hasattr(dp) ||
- dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
- xfs_iunlock(dp, XFS_ILOCK_SHARED);
- return 0;
- }
- xfs_iunlock(dp, XFS_ILOCK_SHARED);
-
- /*
- * Start our first transaction of the day.
- *
- * All future transactions during this code must be "chained" off
- * this one via the trans_dup() call. All transactions will contain
- * the inode, and the inode will always be marked with trans_ihold().
- * Since the inode will be locked in all transactions, we must log
- * the inode in every transaction to let it float upward through
- * the log.
- */
- trans = xfs_trans_alloc(mp, XFS_TRANS_ATTRINVAL);
- if ((error = xfs_trans_reserve(trans, 0, XFS_ATTRINVAL_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_ATTRINVAL_LOG_COUNT))) {
- xfs_trans_cancel(trans, 0);
- return(error);
- }
- xfs_ilock(dp, XFS_ILOCK_EXCL);
-
- /*
- * No need to make quota reservations here. We expect to release some
- * blocks, not allocate, in the common case.
- */
- xfs_trans_ijoin(trans, dp, 0);
-
- /*
- * Decide on what work routines to call based on the inode size.
- */
- if (!xfs_inode_hasattr(dp) ||
- dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
- error = 0;
- goto out;
- }
- error = xfs_attr3_root_inactive(&trans, dp);
- if (error)
- goto out;
-
- error = xfs_itruncate_extents(&trans, dp, XFS_ATTR_FORK, 0);
- if (error)
- goto out;
-
- error = xfs_trans_commit(trans, XFS_TRANS_RELEASE_LOG_RES);
- xfs_iunlock(dp, XFS_ILOCK_EXCL);
-
- return(error);
-
-out:
- xfs_trans_cancel(trans, XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_ABORT);
- xfs_iunlock(dp, XFS_ILOCK_EXCL);
- return(error);
-}
-
-
/*========================================================================
* External routines when attribute list is inside the inode
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
new file mode 100644
index 0000000..0d344ed
--- /dev/null
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -0,0 +1,453 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * Copyright (c) 2013 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_btree.h"
+#include "xfs_attr_remote.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_inode_item.h"
+#include "xfs_bmap.h"
+#include "xfs_attr.h"
+#include "xfs_attr_leaf.h"
+#include "xfs_error.h"
+#include "xfs_quota.h"
+#include "xfs_trace.h"
+
+/*
+ * Look at all the extents for this logical region,
+ * invalidate any buffers that are incore/in transactions.
+ */
+STATIC int
+xfs_attr3_leaf_freextent(
+ struct xfs_trans **trans,
+ struct xfs_inode *dp,
+ xfs_dablk_t blkno,
+ int blkcnt)
+{
+ struct xfs_bmbt_irec map;
+ struct xfs_buf *bp;
+ xfs_dablk_t tblkno;
+ xfs_daddr_t dblkno;
+ int tblkcnt;
+ int dblkcnt;
+ int nmap;
+ int error;
+
+ /*
+ * Roll through the "value", invalidating the attribute value's
+ * blocks.
+ */
+ tblkno = blkno;
+ tblkcnt = blkcnt;
+ while (tblkcnt > 0) {
+ /*
+ * Try to remember where we decided to put the value.
+ */
+ nmap = 1;
+ error = xfs_bmapi_read(dp, (xfs_fileoff_t)tblkno, tblkcnt,
+ &map, &nmap, XFS_BMAPI_ATTRFORK);
+ if (error) {
+ return(error);
+ }
+ ASSERT(nmap == 1);
+ ASSERT(map.br_startblock != DELAYSTARTBLOCK);
+
+ /*
+ * If it's a hole, these are already unmapped
+ * so there's nothing to invalidate.
+ */
+ if (map.br_startblock != HOLESTARTBLOCK) {
+
+ dblkno = XFS_FSB_TO_DADDR(dp->i_mount,
+ map.br_startblock);
+ dblkcnt = XFS_FSB_TO_BB(dp->i_mount,
+ map.br_blockcount);
+ bp = xfs_trans_get_buf(*trans,
+ dp->i_mount->m_ddev_targp,
+ dblkno, dblkcnt, 0);
+ if (!bp)
+ return ENOMEM;
+ xfs_trans_binval(*trans, bp);
+ /*
+ * Roll to next transaction.
+ */
+ error = xfs_trans_roll(trans, dp);
+ if (error)
+ return (error);
+ }
+
+ tblkno += map.br_blockcount;
+ tblkcnt -= map.br_blockcount;
+ }
+
+ return(0);
+}
+
+/*
+ * Invalidate all of the "remote" value regions pointed to by a particular
+ * leaf block.
+ * Note that we must release the lock on the buffer so that we are not
+ * caught holding something that the logging code wants to flush to disk.
+ */
+STATIC int
+xfs_attr3_leaf_inactive(
+ struct xfs_trans **trans,
+ struct xfs_inode *dp,
+ struct xfs_buf *bp)
+{
+ struct xfs_attr_leafblock *leaf;
+ struct xfs_attr3_icleaf_hdr ichdr;
+ struct xfs_attr_leaf_entry *entry;
+ struct xfs_attr_leaf_name_remote *name_rmt;
+ struct xfs_attr_inactive_list *list;
+ struct xfs_attr_inactive_list *lp;
+ int error;
+ int count;
+ int size;
+ int tmp;
+ int i;
+
+ leaf = bp->b_addr;
+ xfs_attr3_leaf_hdr_from_disk(&ichdr, leaf);
+
+ /*
+ * Count the number of "remote" value extents.
+ */
+ count = 0;
+ entry = xfs_attr3_leaf_entryp(leaf);
+ for (i = 0; i < ichdr.count; entry++, i++) {
+ if (be16_to_cpu(entry->nameidx) &&
+ ((entry->flags & XFS_ATTR_LOCAL) == 0)) {
+ name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
+ if (name_rmt->valueblk)
+ count++;
+ }
+ }
+
+ /*
+ * If there are no "remote" values, we're done.
+ */
+ if (count == 0) {
+ xfs_trans_brelse(*trans, bp);
+ return 0;
+ }
+
+ /*
+ * Allocate storage for a list of all the "remote" value extents.
+ */
+ size = count * sizeof(xfs_attr_inactive_list_t);
+ list = kmem_alloc(size, KM_SLEEP);
+
+ /*
+ * Identify each of the "remote" value extents.
+ */
+ lp = list;
+ entry = xfs_attr3_leaf_entryp(leaf);
+ for (i = 0; i < ichdr.count; entry++, i++) {
+ if (be16_to_cpu(entry->nameidx) &&
+ ((entry->flags & XFS_ATTR_LOCAL) == 0)) {
+ name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
+ if (name_rmt->valueblk) {
+ lp->valueblk = be32_to_cpu(name_rmt->valueblk);
+ lp->valuelen = xfs_attr3_rmt_blocks(dp->i_mount,
+ be32_to_cpu(name_rmt->valuelen));
+ lp++;
+ }
+ }
+ }
+ xfs_trans_brelse(*trans, bp); /* unlock for trans. in freextent() */
+
+ /*
+ * Invalidate each of the "remote" value extents.
+ */
+ error = 0;
+ for (lp = list, i = 0; i < count; i++, lp++) {
+ tmp = xfs_attr3_leaf_freextent(trans, dp,
+ lp->valueblk, lp->valuelen);
+
+ if (error == 0)
+ error = tmp; /* save only the 1st errno */
+ }
+
+ kmem_free(list);
+ return error;
+}
+
+/*
+ * Recurse (gasp!) through the attribute nodes until we find leaves.
+ * We're doing a depth-first traversal in order to invalidate everything.
+ */
+STATIC int
+xfs_attr3_node_inactive(
+ struct xfs_trans **trans,
+ struct xfs_inode *dp,
+ struct xfs_buf *bp,
+ int level)
+{
+ xfs_da_blkinfo_t *info;
+ xfs_da_intnode_t *node;
+ xfs_dablk_t child_fsb;
+ xfs_daddr_t parent_blkno, child_blkno;
+ int error, i;
+ struct xfs_buf *child_bp;
+ struct xfs_da_node_entry *btree;
+ struct xfs_da3_icnode_hdr ichdr;
+
+ /*
+ * Since this code is recursive (gasp!) we must protect ourselves.
+ */
+ if (level > XFS_DA_NODE_MAXDEPTH) {
+ xfs_trans_brelse(*trans, bp); /* no locks for later trans */
+ return XFS_ERROR(EIO);
+ }
+
+ node = bp->b_addr;
+ xfs_da3_node_hdr_from_disk(&ichdr, node);
+ parent_blkno = bp->b_bn;
+ if (!ichdr.count) {
+ xfs_trans_brelse(*trans, bp);
+ return 0;
+ }
+ btree = xfs_da3_node_tree_p(node);
+ child_fsb = be32_to_cpu(btree[0].before);
+ xfs_trans_brelse(*trans, bp); /* no locks for later trans */
+
+ /*
+ * If this is the node level just above the leaves, simply loop
+ * over the leaves removing all of them. If this is higher up
+ * in the tree, recurse downward.
+ */
+ for (i = 0; i < ichdr.count; i++) {
+ /*
+ * Read the subsidiary block to see what we have to work with.
+ * Don't do this in a transaction. This is a depth-first
+ * traversal of the tree so we may deal with many blocks
+ * before we come back to this one.
+ */
+ error = xfs_da3_node_read(*trans, dp, child_fsb, -2, &child_bp,
+ XFS_ATTR_FORK);
+ if (error)
+ return(error);
+ if (child_bp) {
+ /* save for re-read later */
+ child_blkno = XFS_BUF_ADDR(child_bp);
+
+ /*
+ * Invalidate the subtree, however we have to.
+ */
+ info = child_bp->b_addr;
+ switch (info->magic) {
+ case cpu_to_be16(XFS_DA_NODE_MAGIC):
+ case cpu_to_be16(XFS_DA3_NODE_MAGIC):
+ error = xfs_attr3_node_inactive(trans, dp,
+ child_bp, level + 1);
+ break;
+ case cpu_to_be16(XFS_ATTR_LEAF_MAGIC):
+ case cpu_to_be16(XFS_ATTR3_LEAF_MAGIC):
+ error = xfs_attr3_leaf_inactive(trans, dp,
+ child_bp);
+ break;
+ default:
+ error = XFS_ERROR(EIO);
+ xfs_trans_brelse(*trans, child_bp);
+ break;
+ }
+ if (error)
+ return error;
+
+ /*
+ * Remove the subsidiary block from the cache
+ * and from the log.
+ */
+ error = xfs_da_get_buf(*trans, dp, 0, child_blkno,
+ &child_bp, XFS_ATTR_FORK);
+ if (error)
+ return error;
+ xfs_trans_binval(*trans, child_bp);
+ }
+
+ /*
+ * If we're not done, re-read the parent to get the next
+ * child block number.
+ */
+ if (i + 1 < ichdr.count) {
+ error = xfs_da3_node_read(*trans, dp, 0, parent_blkno,
+ &bp, XFS_ATTR_FORK);
+ if (error)
+ return error;
+ child_fsb = be32_to_cpu(btree[i + 1].before);
+ xfs_trans_brelse(*trans, bp);
+ }
+ /*
+ * Atomically commit the whole invalidate stuff.
+ */
+ error = xfs_trans_roll(trans, dp);
+ if (error)
+ return error;
+ }
+
+ return 0;
+}
+
+/*
+ * Indiscriminately delete the entire attribute fork
+ *
+ * Recurse (gasp!) through the attribute nodes until we find leaves.
+ * We're doing a depth-first traversal in order to invalidate everything.
+ */
+int
+xfs_attr3_root_inactive(
+ struct xfs_trans **trans,
+ struct xfs_inode *dp)
+{
+ struct xfs_da_blkinfo *info;
+ struct xfs_buf *bp;
+ xfs_daddr_t blkno;
+ int error;
+
+ /*
+ * Read block 0 to see what we have to work with.
+ * We only get here if we have extents, since we remove
+ * the extents in reverse order the extent containing
+ * block 0 must still be there.
+ */
+ error = xfs_da3_node_read(*trans, dp, 0, -1, &bp, XFS_ATTR_FORK);
+ if (error)
+ return error;
+ blkno = bp->b_bn;
+
+ /*
+ * Invalidate the tree, even if the "tree" is only a single leaf block.
+ * This is a depth-first traversal!
+ */
+ info = bp->b_addr;
+ switch (info->magic) {
+ case cpu_to_be16(XFS_DA_NODE_MAGIC):
+ case cpu_to_be16(XFS_DA3_NODE_MAGIC):
+ error = xfs_attr3_node_inactive(trans, dp, bp, 1);
+ break;
+ case cpu_to_be16(XFS_ATTR_LEAF_MAGIC):
+ case cpu_to_be16(XFS_ATTR3_LEAF_MAGIC):
+ error = xfs_attr3_leaf_inactive(trans, dp, bp);
+ break;
+ default:
+ error = XFS_ERROR(EIO);
+ xfs_trans_brelse(*trans, bp);
+ break;
+ }
+ if (error)
+ return error;
+
+ /*
+ * Invalidate the incore copy of the root block.
+ */
+ error = xfs_da_get_buf(*trans, dp, 0, blkno, &bp, XFS_ATTR_FORK);
+ if (error)
+ return error;
+ xfs_trans_binval(*trans, bp); /* remove from cache */
+ /*
+ * Commit the invalidate and start the next transaction.
+ */
+ error = xfs_trans_roll(trans, dp);
+
+ return error;
+}
+
+int
+xfs_attr_inactive(xfs_inode_t *dp)
+{
+ xfs_trans_t *trans;
+ xfs_mount_t *mp;
+ int error;
+
+ mp = dp->i_mount;
+ ASSERT(! XFS_NOT_DQATTACHED(mp, dp));
+
+ xfs_ilock(dp, XFS_ILOCK_SHARED);
+ if (!xfs_inode_hasattr(dp) ||
+ dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
+ xfs_iunlock(dp, XFS_ILOCK_SHARED);
+ return 0;
+ }
+ xfs_iunlock(dp, XFS_ILOCK_SHARED);
+
+ /*
+ * Start our first transaction of the day.
+ *
+ * All future transactions during this code must be "chained" off
+ * this one via the trans_dup() call. All transactions will contain
+ * the inode, and the inode will always be marked with trans_ihold().
+ * Since the inode will be locked in all transactions, we must log
+ * the inode in every transaction to let it float upward through
+ * the log.
+ */
+ trans = xfs_trans_alloc(mp, XFS_TRANS_ATTRINVAL);
+ if ((error = xfs_trans_reserve(trans, 0, XFS_ATTRINVAL_LOG_RES(mp), 0,
+ XFS_TRANS_PERM_LOG_RES,
+ XFS_ATTRINVAL_LOG_COUNT))) {
+ xfs_trans_cancel(trans, 0);
+ return(error);
+ }
+ xfs_ilock(dp, XFS_ILOCK_EXCL);
+
+ /*
+ * No need to make quota reservations here. We expect to release some
+ * blocks, not allocate, in the common case.
+ */
+ xfs_trans_ijoin(trans, dp, 0);
+
+ /*
+ * Decide on what work routines to call based on the inode size.
+ */
+ if (!xfs_inode_hasattr(dp) ||
+ dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
+ error = 0;
+ goto out;
+ }
+ error = xfs_attr3_root_inactive(&trans, dp);
+ if (error)
+ goto out;
+
+ error = xfs_itruncate_extents(&trans, dp, XFS_ATTR_FORK, 0);
+ if (error)
+ goto out;
+
+ error = xfs_trans_commit(trans, XFS_TRANS_RELEASE_LOG_RES);
+ xfs_iunlock(dp, XFS_ILOCK_EXCL);
+
+ return(error);
+
+out:
+ xfs_trans_cancel(trans, XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_ABORT);
+ xfs_iunlock(dp, XFS_ILOCK_EXCL);
+ return(error);
+}
diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index ba13857..791ade2 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -2712,344 +2712,3 @@ xfs_attr3_leaf_flipflags(
return error;
}
-/*========================================================================
- * Indiscriminately delete the entire attribute fork
- *========================================================================*/
-
-/*
- * Recurse (gasp!) through the attribute nodes until we find leaves.
- * We're doing a depth-first traversal in order to invalidate everything.
- */
-int
-xfs_attr3_root_inactive(
- struct xfs_trans **trans,
- struct xfs_inode *dp)
-{
- struct xfs_da_blkinfo *info;
- struct xfs_buf *bp;
- xfs_daddr_t blkno;
- int error;
-
- /*
- * Read block 0 to see what we have to work with.
- * We only get here if we have extents, since we remove
- * the extents in reverse order the extent containing
- * block 0 must still be there.
- */
- error = xfs_da3_node_read(*trans, dp, 0, -1, &bp, XFS_ATTR_FORK);
- if (error)
- return error;
- blkno = bp->b_bn;
-
- /*
- * Invalidate the tree, even if the "tree" is only a single leaf block.
- * This is a depth-first traversal!
- */
- info = bp->b_addr;
- switch (info->magic) {
- case cpu_to_be16(XFS_DA_NODE_MAGIC):
- case cpu_to_be16(XFS_DA3_NODE_MAGIC):
- error = xfs_attr3_node_inactive(trans, dp, bp, 1);
- break;
- case cpu_to_be16(XFS_ATTR_LEAF_MAGIC):
- case cpu_to_be16(XFS_ATTR3_LEAF_MAGIC):
- error = xfs_attr3_leaf_inactive(trans, dp, bp);
- break;
- default:
- error = XFS_ERROR(EIO);
- xfs_trans_brelse(*trans, bp);
- break;
- }
- if (error)
- return error;
-
- /*
- * Invalidate the incore copy of the root block.
- */
- error = xfs_da_get_buf(*trans, dp, 0, blkno, &bp, XFS_ATTR_FORK);
- if (error)
- return error;
- xfs_trans_binval(*trans, bp); /* remove from cache */
- /*
- * Commit the invalidate and start the next transaction.
- */
- error = xfs_trans_roll(trans, dp);
-
- return error;
-}
-
-/*
- * Recurse (gasp!) through the attribute nodes until we find leaves.
- * We're doing a depth-first traversal in order to invalidate everything.
- */
-STATIC int
-xfs_attr3_node_inactive(
- struct xfs_trans **trans,
- struct xfs_inode *dp,
- struct xfs_buf *bp,
- int level)
-{
- xfs_da_blkinfo_t *info;
- xfs_da_intnode_t *node;
- xfs_dablk_t child_fsb;
- xfs_daddr_t parent_blkno, child_blkno;
- int error, i;
- struct xfs_buf *child_bp;
- struct xfs_da_node_entry *btree;
- struct xfs_da3_icnode_hdr ichdr;
-
- /*
- * Since this code is recursive (gasp!) we must protect ourselves.
- */
- if (level > XFS_DA_NODE_MAXDEPTH) {
- xfs_trans_brelse(*trans, bp); /* no locks for later trans */
- return XFS_ERROR(EIO);
- }
-
- node = bp->b_addr;
- xfs_da3_node_hdr_from_disk(&ichdr, node);
- parent_blkno = bp->b_bn;
- if (!ichdr.count) {
- xfs_trans_brelse(*trans, bp);
- return 0;
- }
- btree = xfs_da3_node_tree_p(node);
- child_fsb = be32_to_cpu(btree[0].before);
- xfs_trans_brelse(*trans, bp); /* no locks for later trans */
-
- /*
- * If this is the node level just above the leaves, simply loop
- * over the leaves removing all of them. If this is higher up
- * in the tree, recurse downward.
- */
- for (i = 0; i < ichdr.count; i++) {
- /*
- * Read the subsidiary block to see what we have to work with.
- * Don't do this in a transaction. This is a depth-first
- * traversal of the tree so we may deal with many blocks
- * before we come back to this one.
- */
- error = xfs_da3_node_read(*trans, dp, child_fsb, -2, &child_bp,
- XFS_ATTR_FORK);
- if (error)
- return(error);
- if (child_bp) {
- /* save for re-read later */
- child_blkno = XFS_BUF_ADDR(child_bp);
-
- /*
- * Invalidate the subtree, however we have to.
- */
- info = child_bp->b_addr;
- switch (info->magic) {
- case cpu_to_be16(XFS_DA_NODE_MAGIC):
- case cpu_to_be16(XFS_DA3_NODE_MAGIC):
- error = xfs_attr3_node_inactive(trans, dp,
- child_bp, level + 1);
- break;
- case cpu_to_be16(XFS_ATTR_LEAF_MAGIC):
- case cpu_to_be16(XFS_ATTR3_LEAF_MAGIC):
- error = xfs_attr3_leaf_inactive(trans, dp,
- child_bp);
- break;
- default:
- error = XFS_ERROR(EIO);
- xfs_trans_brelse(*trans, child_bp);
- break;
- }
- if (error)
- return error;
-
- /*
- * Remove the subsidiary block from the cache
- * and from the log.
- */
- error = xfs_da_get_buf(*trans, dp, 0, child_blkno,
- &child_bp, XFS_ATTR_FORK);
- if (error)
- return error;
- xfs_trans_binval(*trans, child_bp);
- }
-
- /*
- * If we're not done, re-read the parent to get the next
- * child block number.
- */
- if (i + 1 < ichdr.count) {
- error = xfs_da3_node_read(*trans, dp, 0, parent_blkno,
- &bp, XFS_ATTR_FORK);
- if (error)
- return error;
- child_fsb = be32_to_cpu(btree[i + 1].before);
- xfs_trans_brelse(*trans, bp);
- }
- /*
- * Atomically commit the whole invalidate stuff.
- */
- error = xfs_trans_roll(trans, dp);
- if (error)
- return error;
- }
-
- return 0;
-}
-
-/*
- * Invalidate all of the "remote" value regions pointed to by a particular
- * leaf block.
- * Note that we must release the lock on the buffer so that we are not
- * caught holding something that the logging code wants to flush to disk.
- */
-STATIC int
-xfs_attr3_leaf_inactive(
- struct xfs_trans **trans,
- struct xfs_inode *dp,
- struct xfs_buf *bp)
-{
- struct xfs_attr_leafblock *leaf;
- struct xfs_attr3_icleaf_hdr ichdr;
- struct xfs_attr_leaf_entry *entry;
- struct xfs_attr_leaf_name_remote *name_rmt;
- struct xfs_attr_inactive_list *list;
- struct xfs_attr_inactive_list *lp;
- int error;
- int count;
- int size;
- int tmp;
- int i;
-
- leaf = bp->b_addr;
- xfs_attr3_leaf_hdr_from_disk(&ichdr, leaf);
-
- /*
- * Count the number of "remote" value extents.
- */
- count = 0;
- entry = xfs_attr3_leaf_entryp(leaf);
- for (i = 0; i < ichdr.count; entry++, i++) {
- if (be16_to_cpu(entry->nameidx) &&
- ((entry->flags & XFS_ATTR_LOCAL) == 0)) {
- name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
- if (name_rmt->valueblk)
- count++;
- }
- }
-
- /*
- * If there are no "remote" values, we're done.
- */
- if (count == 0) {
- xfs_trans_brelse(*trans, bp);
- return 0;
- }
-
- /*
- * Allocate storage for a list of all the "remote" value extents.
- */
- size = count * sizeof(xfs_attr_inactive_list_t);
- list = kmem_alloc(size, KM_SLEEP);
-
- /*
- * Identify each of the "remote" value extents.
- */
- lp = list;
- entry = xfs_attr3_leaf_entryp(leaf);
- for (i = 0; i < ichdr.count; entry++, i++) {
- if (be16_to_cpu(entry->nameidx) &&
- ((entry->flags & XFS_ATTR_LOCAL) == 0)) {
- name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
- if (name_rmt->valueblk) {
- lp->valueblk = be32_to_cpu(name_rmt->valueblk);
- lp->valuelen = xfs_attr3_rmt_blocks(dp->i_mount,
- be32_to_cpu(name_rmt->valuelen));
- lp++;
- }
- }
- }
- xfs_trans_brelse(*trans, bp); /* unlock for trans. in freextent() */
-
- /*
- * Invalidate each of the "remote" value extents.
- */
- error = 0;
- for (lp = list, i = 0; i < count; i++, lp++) {
- tmp = xfs_attr3_leaf_freextent(trans, dp,
- lp->valueblk, lp->valuelen);
-
- if (error == 0)
- error = tmp; /* save only the 1st errno */
- }
-
- kmem_free(list);
- return error;
-}
-
-/*
- * Look at all the extents for this logical region,
- * invalidate any buffers that are incore/in transactions.
- */
-STATIC int
-xfs_attr3_leaf_freextent(
- struct xfs_trans **trans,
- struct xfs_inode *dp,
- xfs_dablk_t blkno,
- int blkcnt)
-{
- struct xfs_bmbt_irec map;
- struct xfs_buf *bp;
- xfs_dablk_t tblkno;
- xfs_daddr_t dblkno;
- int tblkcnt;
- int dblkcnt;
- int nmap;
- int error;
-
- /*
- * Roll through the "value", invalidating the attribute value's
- * blocks.
- */
- tblkno = blkno;
- tblkcnt = blkcnt;
- while (tblkcnt > 0) {
- /*
- * Try to remember where we decided to put the value.
- */
- nmap = 1;
- error = xfs_bmapi_read(dp, (xfs_fileoff_t)tblkno, tblkcnt,
- &map, &nmap, XFS_BMAPI_ATTRFORK);
- if (error) {
- return(error);
- }
- ASSERT(nmap == 1);
- ASSERT(map.br_startblock != DELAYSTARTBLOCK);
-
- /*
- * If it's a hole, these are already unmapped
- * so there's nothing to invalidate.
- */
- if (map.br_startblock != HOLESTARTBLOCK) {
-
- dblkno = XFS_FSB_TO_DADDR(dp->i_mount,
- map.br_startblock);
- dblkcnt = XFS_FSB_TO_BB(dp->i_mount,
- map.br_blockcount);
- bp = xfs_trans_get_buf(*trans,
- dp->i_mount->m_ddev_targp,
- dblkno, dblkcnt, 0);
- if (!bp)
- return ENOMEM;
- xfs_trans_binval(*trans, bp);
- /*
- * Roll to next transaction.
- */
- error = xfs_trans_roll(trans, dp);
- if (error)
- return (error);
- }
-
- tblkno += map.br_blockcount;
- tblkcnt -= map.br_blockcount;
- }
-
- return(0);
-}
--
1.7.10.4
* [PATCH 18/60] xfs: split out xfs inode operations into separate file
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (16 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 17/60] xfs: split out attribute fork truncation " Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 19/60] xfs: consolidate xfs_vnodeops.c into xfs_inode_ops.c Dave Chinner
` (43 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The core xfs inode operations such as locking, allocation, reading
and writing are not shared with userspace. However, much of the
remaining internal inode code, such as the incore extent tracking and
formatting, is. Split the XFS inode operations out into a new
xfs_inode_ops.c file to minimise the differences between the files
shared with userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_inode.c | 2295 ++++++------------------------------------------
fs/xfs/xfs_inode.h | 341 +------
fs/xfs/xfs_inode_ops.c | 1772 +++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_inode_ops.h | 353 ++++++++
5 files changed, 2418 insertions(+), 2344 deletions(-)
create mode 100644 fs/xfs/xfs_inode_ops.c
create mode 100644 fs/xfs/xfs_inode_ops.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 6f33414..f372ed9 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -42,6 +42,7 @@ xfs-y += xfs_aops.o \
xfs_fsops.o \
xfs_globals.o \
xfs_icache.o \
+ xfs_inode_ops.o \
xfs_ioctl.o \
xfs_iomap.o \
xfs_iops.o \
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 9ecfe1e..07fb510 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -51,281 +51,10 @@
kmem_zone_t *xfs_ifork_zone;
kmem_zone_t *xfs_inode_zone;
-/*
- * Used in xfs_itruncate_extents(). This is the maximum number of extents
- * freed from a file in a single transaction.
- */
-#define XFS_ITRUNC_MAX_EXTENTS 2
-
-STATIC int xfs_iflush_int(xfs_inode_t *, xfs_buf_t *);
STATIC int xfs_iformat_local(xfs_inode_t *, xfs_dinode_t *, int, int);
STATIC int xfs_iformat_extents(xfs_inode_t *, xfs_dinode_t *, int);
STATIC int xfs_iformat_btree(xfs_inode_t *, xfs_dinode_t *, int);
-/*
- * helper function to extract extent size hint from inode
- */
-xfs_extlen_t
-xfs_get_extsz_hint(
- struct xfs_inode *ip)
-{
- if ((ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE) && ip->i_d.di_extsize)
- return ip->i_d.di_extsize;
- if (XFS_IS_REALTIME_INODE(ip))
- return ip->i_mount->m_sb.sb_rextsize;
- return 0;
-}
-
-/*
- * This is a wrapper routine around the xfs_ilock() routine used to centralize
- * some grungy code. It is used in places that wish to lock the inode solely
- * for reading the extents. The reason these places can't just call
- * xfs_ilock(SHARED) is that the inode lock also guards to bringing in of the
- * extents from disk for a file in b-tree format. If the inode is in b-tree
- * format, then we need to lock the inode exclusively until the extents are read
- * in. Locking it exclusively all the time would limit our parallelism
- * unnecessarily, though. What we do instead is check to see if the extents
- * have been read in yet, and only lock the inode exclusively if they have not.
- *
- * The function returns a value which should be given to the corresponding
- * xfs_iunlock_map_shared(). This value is the mode in which the lock was
- * actually taken.
- */
-uint
-xfs_ilock_map_shared(
- xfs_inode_t *ip)
-{
- uint lock_mode;
-
- if ((ip->i_d.di_format == XFS_DINODE_FMT_BTREE) &&
- ((ip->i_df.if_flags & XFS_IFEXTENTS) == 0)) {
- lock_mode = XFS_ILOCK_EXCL;
- } else {
- lock_mode = XFS_ILOCK_SHARED;
- }
-
- xfs_ilock(ip, lock_mode);
-
- return lock_mode;
-}
-
-/*
- * This is simply the unlock routine to go with xfs_ilock_map_shared().
- * All it does is call xfs_iunlock() with the given lock_mode.
- */
-void
-xfs_iunlock_map_shared(
- xfs_inode_t *ip,
- unsigned int lock_mode)
-{
- xfs_iunlock(ip, lock_mode);
-}
-
-/*
- * The xfs inode contains 2 locks: a multi-reader lock called the
- * i_iolock and a multi-reader lock called the i_lock. This routine
- * allows either or both of the locks to be obtained.
- *
- * The 2 locks should always be ordered so that the IO lock is
- * obtained first in order to prevent deadlock.
- *
- * ip -- the inode being locked
- * lock_flags -- this parameter indicates the inode's locks
- * to be locked. It can be:
- * XFS_IOLOCK_SHARED,
- * XFS_IOLOCK_EXCL,
- * XFS_ILOCK_SHARED,
- * XFS_ILOCK_EXCL,
- * XFS_IOLOCK_SHARED | XFS_ILOCK_SHARED,
- * XFS_IOLOCK_SHARED | XFS_ILOCK_EXCL,
- * XFS_IOLOCK_EXCL | XFS_ILOCK_SHARED,
- * XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL
- */
-void
-xfs_ilock(
- xfs_inode_t *ip,
- uint lock_flags)
-{
- trace_xfs_ilock(ip, lock_flags, _RET_IP_);
-
- /*
- * You can't set both SHARED and EXCL for the same lock,
- * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED,
- * and XFS_ILOCK_EXCL are valid values to set in lock_flags.
- */
- ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) !=
- (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
- ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) !=
- (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
- ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0);
-
- if (lock_flags & XFS_IOLOCK_EXCL)
- mrupdate_nested(&ip->i_iolock, XFS_IOLOCK_DEP(lock_flags));
- else if (lock_flags & XFS_IOLOCK_SHARED)
- mraccess_nested(&ip->i_iolock, XFS_IOLOCK_DEP(lock_flags));
-
- if (lock_flags & XFS_ILOCK_EXCL)
- mrupdate_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
- else if (lock_flags & XFS_ILOCK_SHARED)
- mraccess_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
-}
-
-/*
- * This is just like xfs_ilock(), except that the caller
- * is guaranteed not to sleep. It returns 1 if it gets
- * the requested locks and 0 otherwise. If the IO lock is
- * obtained but the inode lock cannot be, then the IO lock
- * is dropped before returning.
- *
- * ip -- the inode being locked
- * lock_flags -- this parameter indicates the inode's locks to be
- * to be locked. See the comment for xfs_ilock() for a list
- * of valid values.
- */
-int
-xfs_ilock_nowait(
- xfs_inode_t *ip,
- uint lock_flags)
-{
- trace_xfs_ilock_nowait(ip, lock_flags, _RET_IP_);
-
- /*
- * You can't set both SHARED and EXCL for the same lock,
- * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED,
- * and XFS_ILOCK_EXCL are valid values to set in lock_flags.
- */
- ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) !=
- (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
- ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) !=
- (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
- ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0);
-
- if (lock_flags & XFS_IOLOCK_EXCL) {
- if (!mrtryupdate(&ip->i_iolock))
- goto out;
- } else if (lock_flags & XFS_IOLOCK_SHARED) {
- if (!mrtryaccess(&ip->i_iolock))
- goto out;
- }
- if (lock_flags & XFS_ILOCK_EXCL) {
- if (!mrtryupdate(&ip->i_lock))
- goto out_undo_iolock;
- } else if (lock_flags & XFS_ILOCK_SHARED) {
- if (!mrtryaccess(&ip->i_lock))
- goto out_undo_iolock;
- }
- return 1;
-
- out_undo_iolock:
- if (lock_flags & XFS_IOLOCK_EXCL)
- mrunlock_excl(&ip->i_iolock);
- else if (lock_flags & XFS_IOLOCK_SHARED)
- mrunlock_shared(&ip->i_iolock);
- out:
- return 0;
-}
-
-/*
- * xfs_iunlock() is used to drop the inode locks acquired with
- * xfs_ilock() and xfs_ilock_nowait(). The caller must pass
- * in the flags given to xfs_ilock() or xfs_ilock_nowait() so
- * that we know which locks to drop.
- *
- * ip -- the inode being unlocked
- * lock_flags -- this parameter indicates the inode's locks to be
- * to be unlocked. See the comment for xfs_ilock() for a list
- * of valid values for this parameter.
- *
- */
-void
-xfs_iunlock(
- xfs_inode_t *ip,
- uint lock_flags)
-{
- /*
- * You can't set both SHARED and EXCL for the same lock,
- * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED,
- * and XFS_ILOCK_EXCL are valid values to set in lock_flags.
- */
- ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) !=
- (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
- ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) !=
- (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
- ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0);
- ASSERT(lock_flags != 0);
-
- if (lock_flags & XFS_IOLOCK_EXCL)
- mrunlock_excl(&ip->i_iolock);
- else if (lock_flags & XFS_IOLOCK_SHARED)
- mrunlock_shared(&ip->i_iolock);
-
- if (lock_flags & XFS_ILOCK_EXCL)
- mrunlock_excl(&ip->i_lock);
- else if (lock_flags & XFS_ILOCK_SHARED)
- mrunlock_shared(&ip->i_lock);
-
- trace_xfs_iunlock(ip, lock_flags, _RET_IP_);
-}
-
-/*
- * give up write locks. the i/o lock cannot be held nested
- * if it is being demoted.
- */
-void
-xfs_ilock_demote(
- xfs_inode_t *ip,
- uint lock_flags)
-{
- ASSERT(lock_flags & (XFS_IOLOCK_EXCL|XFS_ILOCK_EXCL));
- ASSERT((lock_flags & ~(XFS_IOLOCK_EXCL|XFS_ILOCK_EXCL)) == 0);
-
- if (lock_flags & XFS_ILOCK_EXCL)
- mrdemote(&ip->i_lock);
- if (lock_flags & XFS_IOLOCK_EXCL)
- mrdemote(&ip->i_iolock);
-
- trace_xfs_ilock_demote(ip, lock_flags, _RET_IP_);
-}
-
-#if defined(DEBUG) || defined(XFS_WARN)
-int
-xfs_isilocked(
- xfs_inode_t *ip,
- uint lock_flags)
-{
- if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
- if (!(lock_flags & XFS_ILOCK_SHARED))
- return !!ip->i_lock.mr_writer;
- return rwsem_is_locked(&ip->i_lock.mr_lock);
- }
-
- if (lock_flags & (XFS_IOLOCK_EXCL|XFS_IOLOCK_SHARED)) {
- if (!(lock_flags & XFS_IOLOCK_SHARED))
- return !!ip->i_iolock.mr_writer;
- return rwsem_is_locked(&ip->i_iolock.mr_lock);
- }
-
- ASSERT(0);
- return 0;
-}
-#endif
-
-void
-__xfs_iflock(
- struct xfs_inode *ip)
-{
- wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_IFLOCK_BIT);
- DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_IFLOCK_BIT);
-
- do {
- prepare_to_wait_exclusive(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
- if (xfs_isiflocked(ip))
- io_schedule();
- } while (!xfs_iflock_nowait(ip));
-
- finish_wait(wq, &wait.wait);
-}
-
#ifdef DEBUG
/*
* Make sure that the extents in the given memory buffer
@@ -927,64 +656,6 @@ xfs_dinode_to_disk(
}
}
-STATIC uint
-_xfs_dic2xflags(
- __uint16_t di_flags)
-{
- uint flags = 0;
-
- if (di_flags & XFS_DIFLAG_ANY) {
- if (di_flags & XFS_DIFLAG_REALTIME)
- flags |= XFS_XFLAG_REALTIME;
- if (di_flags & XFS_DIFLAG_PREALLOC)
- flags |= XFS_XFLAG_PREALLOC;
- if (di_flags & XFS_DIFLAG_IMMUTABLE)
- flags |= XFS_XFLAG_IMMUTABLE;
- if (di_flags & XFS_DIFLAG_APPEND)
- flags |= XFS_XFLAG_APPEND;
- if (di_flags & XFS_DIFLAG_SYNC)
- flags |= XFS_XFLAG_SYNC;
- if (di_flags & XFS_DIFLAG_NOATIME)
- flags |= XFS_XFLAG_NOATIME;
- if (di_flags & XFS_DIFLAG_NODUMP)
- flags |= XFS_XFLAG_NODUMP;
- if (di_flags & XFS_DIFLAG_RTINHERIT)
- flags |= XFS_XFLAG_RTINHERIT;
- if (di_flags & XFS_DIFLAG_PROJINHERIT)
- flags |= XFS_XFLAG_PROJINHERIT;
- if (di_flags & XFS_DIFLAG_NOSYMLINKS)
- flags |= XFS_XFLAG_NOSYMLINKS;
- if (di_flags & XFS_DIFLAG_EXTSIZE)
- flags |= XFS_XFLAG_EXTSIZE;
- if (di_flags & XFS_DIFLAG_EXTSZINHERIT)
- flags |= XFS_XFLAG_EXTSZINHERIT;
- if (di_flags & XFS_DIFLAG_NODEFRAG)
- flags |= XFS_XFLAG_NODEFRAG;
- if (di_flags & XFS_DIFLAG_FILESTREAM)
- flags |= XFS_XFLAG_FILESTREAM;
- }
-
- return flags;
-}
-
-uint
-xfs_ip2xflags(
- xfs_inode_t *ip)
-{
- xfs_icdinode_t *dic = &ip->i_d;
-
- return _xfs_dic2xflags(dic->di_flags) |
- (XFS_IFORK_Q(ip) ? XFS_XFLAG_HASATTR : 0);
-}
-
-uint
-xfs_dic2xflags(
- xfs_dinode_t *dip)
-{
- return _xfs_dic2xflags(be16_to_cpu(dip->di_flags)) |
- (XFS_DFORK_Q(dip) ? XFS_XFLAG_HASATTR : 0);
-}
-
static bool
xfs_dinode_verify(
struct xfs_mount *mp,
@@ -1209,1018 +880,140 @@ xfs_iread_extents(
}
/*
- * Allocate an inode on disk and return a copy of its in-core version.
- * The in-core inode is locked exclusively. Set mode, nlink, and rdev
- * appropriately within the inode. The uid and gid for the inode are
- * set according to the contents of the given cred structure.
- *
- * Use xfs_dialloc() to allocate the on-disk inode. If xfs_dialloc()
- * has a free inode available, call xfs_iget() to obtain the in-core
- * version of the allocated inode. Finally, fill in the inode and
- * log its initial contents. In this case, ialloc_context would be
- * set to NULL.
- *
- * If xfs_dialloc() does not have an available inode, it will replenish
- * its supply by doing an allocation. Since we can only do one
- * allocation within a transaction without deadlocks, we must commit
- * the current transaction before returning the inode itself.
- * In this case, therefore, we will set ialloc_context and return.
- * The caller should then commit the current transaction, start a new
- * transaction, and call xfs_ialloc() again to actually get the inode.
+ * Reallocate the space for if_broot based on the number of records
+ * being added or deleted as indicated in rec_diff. Move the records
+ * and pointers in if_broot to fit the new size. When shrinking this
+ * will eliminate holes between the records and pointers created by
+ * the caller. When growing this will create holes to be filled in
+ * by the caller.
*
- * To ensure that some other process does not grab the inode that
- * was allocated during the first call to xfs_ialloc(), this routine
- * also returns the [locked] bp pointing to the head of the freelist
- * as ialloc_context. The caller should hold this buffer across
- * the commit and pass it back into this routine on the second call.
+ * The caller must not request to add more records than would fit in
+ * the on-disk inode root. If the if_broot is currently NULL, then
+ * if we adding records one will be allocated. The caller must also
+ * not request that the number of records go below zero, although
+ * it can go to zero.
*
- * If we are allocating quota inodes, we do not have a parent inode
- * to attach to or associate with (i.e. pip == NULL) because they
- * are not linked into the directory structure - they are attached
- * directly to the superblock - and so have no parent.
+ * ip -- the inode whose if_broot area is changing
+ * ext_diff -- the change in the number of records, positive or negative,
+ * requested for the if_broot array.
*/
-int
-xfs_ialloc(
- xfs_trans_t *tp,
- xfs_inode_t *pip,
- umode_t mode,
- xfs_nlink_t nlink,
- xfs_dev_t rdev,
- prid_t prid,
- int okalloc,
- xfs_buf_t **ialloc_context,
- xfs_inode_t **ipp)
+void
+xfs_iroot_realloc(
+ xfs_inode_t *ip,
+ int rec_diff,
+ int whichfork)
{
- struct xfs_mount *mp = tp->t_mountp;
- xfs_ino_t ino;
- xfs_inode_t *ip;
- uint flags;
- int error;
- timespec_t tv;
- int filestreams = 0;
+ struct xfs_mount *mp = ip->i_mount;
+ int cur_max;
+ xfs_ifork_t *ifp;
+ struct xfs_btree_block *new_broot;
+ int new_max;
+ size_t new_size;
+ char *np;
+ char *op;
/*
- * Call the space management code to pick
- * the on-disk inode to be allocated.
+ * Handle the degenerate case quietly.
*/
- error = xfs_dialloc(tp, pip ? pip->i_ino : 0, mode, okalloc,
- ialloc_context, &ino);
- if (error)
- return error;
- if (*ialloc_context || ino == NULLFSINO) {
- *ipp = NULL;
- return 0;
+ if (rec_diff == 0) {
+ return;
}
- ASSERT(*ialloc_context == NULL);
-
- /*
- * Get the in-core inode with the lock held exclusively.
- * This is because we're setting fields here we need
- * to prevent others from looking at until we're done.
- */
- error = xfs_iget(mp, tp, ino, XFS_IGET_CREATE,
- XFS_ILOCK_EXCL, &ip);
- if (error)
- return error;
- ASSERT(ip != NULL);
- ip->i_d.di_mode = mode;
- ip->i_d.di_onlink = 0;
- ip->i_d.di_nlink = nlink;
- ASSERT(ip->i_d.di_nlink == nlink);
- ip->i_d.di_uid = current_fsuid();
- ip->i_d.di_gid = current_fsgid();
- xfs_set_projid(ip, prid);
- memset(&(ip->i_d.di_pad[0]), 0, sizeof(ip->i_d.di_pad));
-
- /*
- * If the superblock version is up to where we support new format
- * inodes and this is currently an old format inode, then change
- * the inode version number now. This way we only do the conversion
- * here rather than here and in the flush/logging code.
- */
- if (xfs_sb_version_hasnlink(&mp->m_sb) &&
- ip->i_d.di_version == 1) {
- ip->i_d.di_version = 2;
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+ if (rec_diff > 0) {
/*
- * We've already zeroed the old link count, the projid field,
- * and the pad field.
+ * If there wasn't any memory allocated before, just
+ * allocate it now and get out.
*/
- }
-
- /*
- * Project ids won't be stored on disk if we are using a version 1 inode.
- */
- if ((prid != 0) && (ip->i_d.di_version == 1))
- xfs_bump_ino_vers2(tp, ip);
-
- if (pip && XFS_INHERIT_GID(pip)) {
- ip->i_d.di_gid = pip->i_d.di_gid;
- if ((pip->i_d.di_mode & S_ISGID) && S_ISDIR(mode)) {
- ip->i_d.di_mode |= S_ISGID;
+ if (ifp->if_broot_bytes == 0) {
+ new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, rec_diff);
+ ifp->if_broot = kmem_alloc(new_size, KM_SLEEP | KM_NOFS);
+ ifp->if_broot_bytes = (int)new_size;
+ return;
}
- }
- /*
- * If the group ID of the new file does not match the effective group
- * ID or one of the supplementary group IDs, the S_ISGID bit is cleared
- * (and only if the irix_sgid_inherit compatibility variable is set).
- */
- if ((irix_sgid_inherit) &&
- (ip->i_d.di_mode & S_ISGID) &&
- (!in_group_p((gid_t)ip->i_d.di_gid))) {
- ip->i_d.di_mode &= ~S_ISGID;
+ /*
+ * If there is already an existing if_broot, then we need
+ * to realloc() it and shift the pointers to their new
+ * location. The records don't change location because
+ * they are kept butted up against the btree block header.
+ */
+ cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0);
+ new_max = cur_max + rec_diff;
+ new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
+ ifp->if_broot = kmem_realloc(ifp->if_broot, new_size,
+ XFS_BMAP_BROOT_SPACE_CALC(mp, cur_max),
+ KM_SLEEP | KM_NOFS);
+ op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
+ ifp->if_broot_bytes);
+ np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
+ (int)new_size);
+ ifp->if_broot_bytes = (int)new_size;
+ ASSERT(ifp->if_broot_bytes <=
+ XFS_IFORK_SIZE(ip, whichfork) + XFS_BROOT_SIZE_ADJ(ip));
+ memmove(np, op, cur_max * (uint)sizeof(xfs_dfsbno_t));
+ return;
}
- ip->i_d.di_size = 0;
- ip->i_d.di_nextents = 0;
- ASSERT(ip->i_d.di_nblocks == 0);
-
- nanotime(&tv);
- ip->i_d.di_mtime.t_sec = (__int32_t)tv.tv_sec;
- ip->i_d.di_mtime.t_nsec = (__int32_t)tv.tv_nsec;
- ip->i_d.di_atime = ip->i_d.di_mtime;
- ip->i_d.di_ctime = ip->i_d.di_mtime;
-
/*
- * di_gen will have been taken care of in xfs_iread.
+ * rec_diff is less than 0. In this case, we are shrinking the
+ * if_broot buffer. It must already exist. If we go to zero
+ * records, just get rid of the root and clear the status bit.
*/
- ip->i_d.di_extsize = 0;
- ip->i_d.di_dmevmask = 0;
- ip->i_d.di_dmstate = 0;
- ip->i_d.di_flags = 0;
-
- if (ip->i_d.di_version == 3) {
- ASSERT(ip->i_d.di_ino == ino);
- ASSERT(uuid_equal(&ip->i_d.di_uuid, &mp->m_sb.sb_uuid));
- ip->i_d.di_crc = 0;
- ip->i_d.di_changecount = 1;
- ip->i_d.di_lsn = 0;
- ip->i_d.di_flags2 = 0;
- memset(&(ip->i_d.di_pad2[0]), 0, sizeof(ip->i_d.di_pad2));
- ip->i_d.di_crtime = ip->i_d.di_mtime;
- }
-
-
- flags = XFS_ILOG_CORE;
- switch (mode & S_IFMT) {
- case S_IFIFO:
- case S_IFCHR:
- case S_IFBLK:
- case S_IFSOCK:
- ip->i_d.di_format = XFS_DINODE_FMT_DEV;
- ip->i_df.if_u2.if_rdev = rdev;
- ip->i_df.if_flags = 0;
- flags |= XFS_ILOG_DEV;
- break;
- case S_IFREG:
+ ASSERT((ifp->if_broot != NULL) && (ifp->if_broot_bytes > 0));
+ cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0);
+ new_max = cur_max + rec_diff;
+ ASSERT(new_max >= 0);
+ if (new_max > 0)
+ new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
+ else
+ new_size = 0;
+ if (new_size > 0) {
+ new_broot = kmem_alloc(new_size, KM_SLEEP | KM_NOFS);
/*
- * we can't set up filestreams until after the VFS inode
- * is set up properly.
+ * First copy over the btree block header.
*/
- if (pip && xfs_inode_is_filestream(pip))
- filestreams = 1;
- /* fall through */
- case S_IFDIR:
- if (pip && (pip->i_d.di_flags & XFS_DIFLAG_ANY)) {
- uint di_flags = 0;
-
- if (S_ISDIR(mode)) {
- if (pip->i_d.di_flags & XFS_DIFLAG_RTINHERIT)
- di_flags |= XFS_DIFLAG_RTINHERIT;
- if (pip->i_d.di_flags & XFS_DIFLAG_EXTSZINHERIT) {
- di_flags |= XFS_DIFLAG_EXTSZINHERIT;
- ip->i_d.di_extsize = pip->i_d.di_extsize;
- }
- } else if (S_ISREG(mode)) {
- if (pip->i_d.di_flags & XFS_DIFLAG_RTINHERIT)
- di_flags |= XFS_DIFLAG_REALTIME;
- if (pip->i_d.di_flags & XFS_DIFLAG_EXTSZINHERIT) {
- di_flags |= XFS_DIFLAG_EXTSIZE;
- ip->i_d.di_extsize = pip->i_d.di_extsize;
- }
- }
- if ((pip->i_d.di_flags & XFS_DIFLAG_NOATIME) &&
- xfs_inherit_noatime)
- di_flags |= XFS_DIFLAG_NOATIME;
- if ((pip->i_d.di_flags & XFS_DIFLAG_NODUMP) &&
- xfs_inherit_nodump)
- di_flags |= XFS_DIFLAG_NODUMP;
- if ((pip->i_d.di_flags & XFS_DIFLAG_SYNC) &&
- xfs_inherit_sync)
- di_flags |= XFS_DIFLAG_SYNC;
- if ((pip->i_d.di_flags & XFS_DIFLAG_NOSYMLINKS) &&
- xfs_inherit_nosymlinks)
- di_flags |= XFS_DIFLAG_NOSYMLINKS;
- if (pip->i_d.di_flags & XFS_DIFLAG_PROJINHERIT)
- di_flags |= XFS_DIFLAG_PROJINHERIT;
- if ((pip->i_d.di_flags & XFS_DIFLAG_NODEFRAG) &&
- xfs_inherit_nodefrag)
- di_flags |= XFS_DIFLAG_NODEFRAG;
- if (pip->i_d.di_flags & XFS_DIFLAG_FILESTREAM)
- di_flags |= XFS_DIFLAG_FILESTREAM;
- ip->i_d.di_flags |= di_flags;
- }
- /* FALLTHROUGH */
- case S_IFLNK:
- ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
- ip->i_df.if_flags = XFS_IFEXTENTS;
- ip->i_df.if_bytes = ip->i_df.if_real_bytes = 0;
- ip->i_df.if_u1.if_extents = NULL;
- break;
- default:
- ASSERT(0);
+ memcpy(new_broot, ifp->if_broot,
+ XFS_BMBT_BLOCK_LEN(ip->i_mount));
+ } else {
+ new_broot = NULL;
+ ifp->if_flags &= ~XFS_IFBROOT;
}
- /*
- * Attribute fork settings for new inode.
- */
- ip->i_d.di_aformat = XFS_DINODE_FMT_EXTENTS;
- ip->i_d.di_anextents = 0;
/*
- * Log the new values stuffed into the inode.
+ * Only copy the records and pointers if there are any.
*/
- xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
- xfs_trans_log_inode(tp, ip, flags);
-
- /* now that we have an i_mode we can setup inode ops and unlock */
- xfs_setup_inode(ip);
+ if (new_max > 0) {
+ /*
+ * First copy the records.
+ */
+ op = (char *)XFS_BMBT_REC_ADDR(mp, ifp->if_broot, 1);
+ np = (char *)XFS_BMBT_REC_ADDR(mp, new_broot, 1);
+ memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_rec_t));
- /* now we have set up the vfs inode we can associate the filestream */
- if (filestreams) {
- error = xfs_filestream_associate(pip, ip);
- if (error < 0)
- return -error;
- if (!error)
- xfs_iflags_set(ip, XFS_IFILESTREAM);
+ /*
+ * Then copy the pointers.
+ */
+ op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
+ ifp->if_broot_bytes);
+ np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, new_broot, 1,
+ (int)new_size);
+ memcpy(np, op, new_max * (uint)sizeof(xfs_dfsbno_t));
}
-
- *ipp = ip;
- return 0;
+ kmem_free(ifp->if_broot);
+ ifp->if_broot = new_broot;
+ ifp->if_broot_bytes = (int)new_size;
+ ASSERT(ifp->if_broot_bytes <=
+ XFS_IFORK_SIZE(ip, whichfork) + XFS_BROOT_SIZE_ADJ(ip));
+ return;
}
+
/*
- * Free up the underlying blocks past new_size. The new size must be smaller
- * than the current size. This routine can be used both for the attribute and
- * data fork, and does not modify the inode size, which is left to the caller.
- *
- * The transaction passed to this routine must have made a permanent log
- * reservation of at least XFS_ITRUNCATE_LOG_RES. This routine may commit the
- * given transaction and start new ones, so make sure everything involved in
- * the transaction is tidy before calling here. Some transaction will be
- * returned to the caller to be committed. The incoming transaction must
- * already include the inode, and both inode locks must be held exclusively.
- * The inode must also be "held" within the transaction. On return the inode
- * will be "held" within the returned transaction. This routine does NOT
- * require any disk space to be reserved for it within the transaction.
- *
- * If we get an error, we must return with the inode locked and linked into the
- * current transaction. This keeps things simple for the higher level code,
- * because it always knows that the inode is locked and held in the transaction
- * that returns to it whether errors occur or not. We don't mark the inode
- * dirty on error so that transactions can be easily aborted if possible.
- */
-int
-xfs_itruncate_extents(
- struct xfs_trans **tpp,
- struct xfs_inode *ip,
- int whichfork,
- xfs_fsize_t new_size)
-{
- struct xfs_mount *mp = ip->i_mount;
- struct xfs_trans *tp = *tpp;
- struct xfs_trans *ntp;
- xfs_bmap_free_t free_list;
- xfs_fsblock_t first_block;
- xfs_fileoff_t first_unmap_block;
- xfs_fileoff_t last_block;
- xfs_filblks_t unmap_len;
- int committed;
- int error = 0;
- int done = 0;
-
- ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
- ASSERT(!atomic_read(&VFS_I(ip)->i_count) ||
- xfs_isilocked(ip, XFS_IOLOCK_EXCL));
- ASSERT(new_size <= XFS_ISIZE(ip));
- ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
- ASSERT(ip->i_itemp != NULL);
- ASSERT(ip->i_itemp->ili_lock_flags == 0);
- ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
-
- trace_xfs_itruncate_extents_start(ip, new_size);
-
- /*
- * Since it is possible for space to become allocated beyond
- * the end of the file (in a crash where the space is allocated
- * but the inode size is not yet updated), simply remove any
- * blocks which show up between the new EOF and the maximum
- * possible file size. If the first block to be removed is
- * beyond the maximum file size (ie it is the same as last_block),
- * then there is nothing to do.
- */
- first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size);
- last_block = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
- if (first_unmap_block == last_block)
- return 0;
-
- ASSERT(first_unmap_block < last_block);
- unmap_len = last_block - first_unmap_block + 1;
- while (!done) {
- xfs_bmap_init(&free_list, &first_block);
- error = xfs_bunmapi(tp, ip,
- first_unmap_block, unmap_len,
- xfs_bmapi_aflag(whichfork),
- XFS_ITRUNC_MAX_EXTENTS,
- &first_block, &free_list,
- &done);
- if (error)
- goto out_bmap_cancel;
-
- /*
- * Duplicate the transaction that has the permanent
- * reservation and commit the old transaction.
- */
- error = xfs_bmap_finish(&tp, &free_list, &committed);
- if (committed)
- xfs_trans_ijoin(tp, ip, 0);
- if (error)
- goto out_bmap_cancel;
-
- if (committed) {
- /*
- * Mark the inode dirty so it will be logged and
- * moved forward in the log as part of every commit.
- */
- xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
- }
-
- ntp = xfs_trans_dup(tp);
- error = xfs_trans_commit(tp, 0);
- tp = ntp;
-
- xfs_trans_ijoin(tp, ip, 0);
-
- if (error)
- goto out;
-
- /*
- * Transaction commit worked ok so we can drop the extra ticket
- * reference that we gained in xfs_trans_dup()
- */
- xfs_log_ticket_put(tp->t_ticket);
- error = xfs_trans_reserve(tp, 0,
- XFS_ITRUNCATE_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_ITRUNCATE_LOG_COUNT);
- if (error)
- goto out;
- }
-
- /*
- * Always re-log the inode so that our permanent transaction can keep
- * on rolling it forward in the log.
- */
- xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-
- trace_xfs_itruncate_extents_end(ip, new_size);
-
-out:
- *tpp = tp;
- return error;
-out_bmap_cancel:
- /*
- * If the bunmapi call encounters an error, return to the caller where
- * the transaction can be properly aborted. We just need to make sure
- * we're not holding any resources that we were not when we came in.
- */
- xfs_bmap_cancel(&free_list);
- goto out;
-}
-
-/*
- * This is called when the inode's link count goes to 0.
- * We place the on-disk inode on a list in the AGI. It
- * will be pulled from this list when the inode is freed.
- */
-int
-xfs_iunlink(
- xfs_trans_t *tp,
- xfs_inode_t *ip)
-{
- xfs_mount_t *mp;
- xfs_agi_t *agi;
- xfs_dinode_t *dip;
- xfs_buf_t *agibp;
- xfs_buf_t *ibp;
- xfs_agino_t agino;
- short bucket_index;
- int offset;
- int error;
-
- ASSERT(ip->i_d.di_nlink == 0);
- ASSERT(ip->i_d.di_mode != 0);
-
- mp = tp->t_mountp;
-
- /*
- * Get the agi buffer first. It ensures lock ordering
- * on the list.
- */
- error = xfs_read_agi(mp, tp, XFS_INO_TO_AGNO(mp, ip->i_ino), &agibp);
- if (error)
- return error;
- agi = XFS_BUF_TO_AGI(agibp);
-
- /*
- * Get the index into the agi hash table for the
- * list this inode will go on.
- */
- agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
- ASSERT(agino != 0);
- bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS;
- ASSERT(agi->agi_unlinked[bucket_index]);
- ASSERT(be32_to_cpu(agi->agi_unlinked[bucket_index]) != agino);
-
- if (agi->agi_unlinked[bucket_index] != cpu_to_be32(NULLAGINO)) {
- /*
- * There is already another inode in the bucket we need
- * to add ourselves to. Add us at the front of the list.
- * Here we put the head pointer into our next pointer,
- * and then we fall through to point the head at us.
- */
- error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &dip, &ibp,
- 0, 0);
- if (error)
- return error;
-
- ASSERT(dip->di_next_unlinked == cpu_to_be32(NULLAGINO));
- dip->di_next_unlinked = agi->agi_unlinked[bucket_index];
- offset = ip->i_imap.im_boffset +
- offsetof(xfs_dinode_t, di_next_unlinked);
-
- /* need to recalc the inode CRC if appropriate */
- xfs_dinode_calc_crc(mp, dip);
-
- xfs_trans_inode_buf(tp, ibp);
- xfs_trans_log_buf(tp, ibp, offset,
- (offset + sizeof(xfs_agino_t) - 1));
- xfs_inobp_check(mp, ibp);
- }
-
- /*
- * Point the bucket head pointer at the inode being inserted.
- */
- ASSERT(agino != 0);
- agi->agi_unlinked[bucket_index] = cpu_to_be32(agino);
- offset = offsetof(xfs_agi_t, agi_unlinked) +
- (sizeof(xfs_agino_t) * bucket_index);
- xfs_trans_log_buf(tp, agibp, offset,
- (offset + sizeof(xfs_agino_t) - 1));
- return 0;
-}
-
-/*
- * Pull the on-disk inode from the AGI unlinked list.
- */
-STATIC int
-xfs_iunlink_remove(
- xfs_trans_t *tp,
- xfs_inode_t *ip)
-{
- xfs_ino_t next_ino;
- xfs_mount_t *mp;
- xfs_agi_t *agi;
- xfs_dinode_t *dip;
- xfs_buf_t *agibp;
- xfs_buf_t *ibp;
- xfs_agnumber_t agno;
- xfs_agino_t agino;
- xfs_agino_t next_agino;
- xfs_buf_t *last_ibp;
- xfs_dinode_t *last_dip = NULL;
- short bucket_index;
- int offset, last_offset = 0;
- int error;
-
- mp = tp->t_mountp;
- agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
-
- /*
- * Get the agi buffer first. It ensures lock ordering
- * on the list.
- */
- error = xfs_read_agi(mp, tp, agno, &agibp);
- if (error)
- return error;
-
- agi = XFS_BUF_TO_AGI(agibp);
-
- /*
- * Get the index into the agi hash table for the
- * list this inode will go on.
- */
- agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
- ASSERT(agino != 0);
- bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS;
- ASSERT(agi->agi_unlinked[bucket_index] != cpu_to_be32(NULLAGINO));
- ASSERT(agi->agi_unlinked[bucket_index]);
-
- if (be32_to_cpu(agi->agi_unlinked[bucket_index]) == agino) {
- /*
- * We're at the head of the list. Get the inode's on-disk
- * buffer to see if there is anyone after us on the list.
- * Only modify our next pointer if it is not already NULLAGINO.
- * This saves us the overhead of dealing with the buffer when
- * there is no need to change it.
- */
- error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &dip, &ibp,
- 0, 0);
- if (error) {
- xfs_warn(mp, "%s: xfs_imap_to_bp returned error %d.",
- __func__, error);
- return error;
- }
- next_agino = be32_to_cpu(dip->di_next_unlinked);
- ASSERT(next_agino != 0);
- if (next_agino != NULLAGINO) {
- dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
- offset = ip->i_imap.im_boffset +
- offsetof(xfs_dinode_t, di_next_unlinked);
-
- /* need to recalc the inode CRC if appropriate */
- xfs_dinode_calc_crc(mp, dip);
-
- xfs_trans_inode_buf(tp, ibp);
- xfs_trans_log_buf(tp, ibp, offset,
- (offset + sizeof(xfs_agino_t) - 1));
- xfs_inobp_check(mp, ibp);
- } else {
- xfs_trans_brelse(tp, ibp);
- }
- /*
- * Point the bucket head pointer at the next inode.
- */
- ASSERT(next_agino != 0);
- ASSERT(next_agino != agino);
- agi->agi_unlinked[bucket_index] = cpu_to_be32(next_agino);
- offset = offsetof(xfs_agi_t, agi_unlinked) +
- (sizeof(xfs_agino_t) * bucket_index);
- xfs_trans_log_buf(tp, agibp, offset,
- (offset + sizeof(xfs_agino_t) - 1));
- } else {
- /*
- * We need to search the list for the inode being freed.
- */
- next_agino = be32_to_cpu(agi->agi_unlinked[bucket_index]);
- last_ibp = NULL;
- while (next_agino != agino) {
- struct xfs_imap imap;
-
- if (last_ibp)
- xfs_trans_brelse(tp, last_ibp);
-
- imap.im_blkno = 0;
- next_ino = XFS_AGINO_TO_INO(mp, agno, next_agino);
-
- error = xfs_imap(mp, tp, next_ino, &imap, 0);
- if (error) {
- xfs_warn(mp,
- "%s: xfs_imap returned error %d.",
- __func__, error);
- return error;
- }
-
- error = xfs_imap_to_bp(mp, tp, &imap, &last_dip,
- &last_ibp, 0, 0);
- if (error) {
- xfs_warn(mp,
- "%s: xfs_imap_to_bp returned error %d.",
- __func__, error);
- return error;
- }
-
- last_offset = imap.im_boffset;
- next_agino = be32_to_cpu(last_dip->di_next_unlinked);
- ASSERT(next_agino != NULLAGINO);
- ASSERT(next_agino != 0);
- }
-
- /*
- * Now last_ibp points to the buffer previous to us on the
- * unlinked list. Pull us from the list.
- */
- error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &dip, &ibp,
- 0, 0);
- if (error) {
- xfs_warn(mp, "%s: xfs_imap_to_bp(2) returned error %d.",
- __func__, error);
- return error;
- }
- next_agino = be32_to_cpu(dip->di_next_unlinked);
- ASSERT(next_agino != 0);
- ASSERT(next_agino != agino);
- if (next_agino != NULLAGINO) {
- dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
- offset = ip->i_imap.im_boffset +
- offsetof(xfs_dinode_t, di_next_unlinked);
-
- /* need to recalc the inode CRC if appropriate */
- xfs_dinode_calc_crc(mp, dip);
-
- xfs_trans_inode_buf(tp, ibp);
- xfs_trans_log_buf(tp, ibp, offset,
- (offset + sizeof(xfs_agino_t) - 1));
- xfs_inobp_check(mp, ibp);
- } else {
- xfs_trans_brelse(tp, ibp);
- }
- /*
- * Point the previous inode on the list to the next inode.
- */
- last_dip->di_next_unlinked = cpu_to_be32(next_agino);
- ASSERT(next_agino != 0);
- offset = last_offset + offsetof(xfs_dinode_t, di_next_unlinked);
-
- /* need to recalc the inode CRC if appropriate */
- xfs_dinode_calc_crc(mp, last_dip);
-
- xfs_trans_inode_buf(tp, last_ibp);
- xfs_trans_log_buf(tp, last_ibp, offset,
- (offset + sizeof(xfs_agino_t) - 1));
- xfs_inobp_check(mp, last_ibp);
- }
- return 0;
-}
-
-/*
- * A big issue when freeing the inode cluster is is that we _cannot_ skip any
- * inodes that are in memory - they all must be marked stale and attached to
- * the cluster buffer.
- */
-STATIC int
-xfs_ifree_cluster(
- xfs_inode_t *free_ip,
- xfs_trans_t *tp,
- xfs_ino_t inum)
-{
- xfs_mount_t *mp = free_ip->i_mount;
- int blks_per_cluster;
- int nbufs;
- int ninodes;
- int i, j;
- xfs_daddr_t blkno;
- xfs_buf_t *bp;
- xfs_inode_t *ip;
- xfs_inode_log_item_t *iip;
- xfs_log_item_t *lip;
- struct xfs_perag *pag;
-
- pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, inum));
- if (mp->m_sb.sb_blocksize >= XFS_INODE_CLUSTER_SIZE(mp)) {
- blks_per_cluster = 1;
- ninodes = mp->m_sb.sb_inopblock;
- nbufs = XFS_IALLOC_BLOCKS(mp);
- } else {
- blks_per_cluster = XFS_INODE_CLUSTER_SIZE(mp) /
- mp->m_sb.sb_blocksize;
- ninodes = blks_per_cluster * mp->m_sb.sb_inopblock;
- nbufs = XFS_IALLOC_BLOCKS(mp) / blks_per_cluster;
- }
-
- for (j = 0; j < nbufs; j++, inum += ninodes) {
- blkno = XFS_AGB_TO_DADDR(mp, XFS_INO_TO_AGNO(mp, inum),
- XFS_INO_TO_AGBNO(mp, inum));
-
- /*
- * We obtain and lock the backing buffer first in the process
- * here, as we have to ensure that any dirty inode that we
- * can't get the flush lock on is attached to the buffer.
- * If we scan the in-memory inodes first, then buffer IO can
- * complete before we get a lock on it, and hence we may fail
- * to mark all the active inodes on the buffer stale.
- */
- bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, blkno,
- mp->m_bsize * blks_per_cluster,
- XBF_UNMAPPED);
-
- if (!bp)
- return ENOMEM;
-
- /*
- * This buffer may not have been correctly initialised as we
- * didn't read it from disk. That's not important because we are
- * only using to mark the buffer as stale in the log, and to
- * attach stale cached inodes on it. That means it will never be
- * dispatched for IO. If it is, we want to know about it, and we
- * want it to fail. We can acheive this by adding a write
- * verifier to the buffer.
- */
- bp->b_ops = &xfs_inode_buf_ops;
-
- /*
- * Walk the inodes already attached to the buffer and mark them
- * stale. These will all have the flush locks held, so an
- * in-memory inode walk can't lock them. By marking them all
- * stale first, we will not attempt to lock them in the loop
- * below as the XFS_ISTALE flag will be set.
- */
- lip = bp->b_fspriv;
- while (lip) {
- if (lip->li_type == XFS_LI_INODE) {
- iip = (xfs_inode_log_item_t *)lip;
- ASSERT(iip->ili_logged == 1);
- lip->li_cb = xfs_istale_done;
- xfs_trans_ail_copy_lsn(mp->m_ail,
- &iip->ili_flush_lsn,
- &iip->ili_item.li_lsn);
- xfs_iflags_set(iip->ili_inode, XFS_ISTALE);
- }
- lip = lip->li_bio_list;
- }
-
-
- /*
- * For each inode in memory attempt to add it to the inode
- * buffer and set it up for being staled on buffer IO
- * completion. This is safe as we've locked out tail pushing
- * and flushing by locking the buffer.
- *
- * We have already marked every inode that was part of a
- * transaction stale above, which means there is no point in
- * even trying to lock them.
- */
- for (i = 0; i < ninodes; i++) {
-retry:
- rcu_read_lock();
- ip = radix_tree_lookup(&pag->pag_ici_root,
- XFS_INO_TO_AGINO(mp, (inum + i)));
-
- /* Inode not in memory, nothing to do */
- if (!ip) {
- rcu_read_unlock();
- continue;
- }
-
- /*
- * because this is an RCU protected lookup, we could
- * find a recently freed or even reallocated inode
- * during the lookup. We need to check under the
- * i_flags_lock for a valid inode here. Skip it if it
- * is not valid, the wrong inode or stale.
- */
- spin_lock(&ip->i_flags_lock);
- if (ip->i_ino != inum + i ||
- __xfs_iflags_test(ip, XFS_ISTALE)) {
- spin_unlock(&ip->i_flags_lock);
- rcu_read_unlock();
- continue;
- }
- spin_unlock(&ip->i_flags_lock);
-
- /*
- * Don't try to lock/unlock the current inode, but we
- * _cannot_ skip the other inodes that we did not find
- * in the list attached to the buffer and are not
- * already marked stale. If we can't lock it, back off
- * and retry.
- */
- if (ip != free_ip &&
- !xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
- rcu_read_unlock();
- delay(1);
- goto retry;
- }
- rcu_read_unlock();
-
- xfs_iflock(ip);
- xfs_iflags_set(ip, XFS_ISTALE);
-
- /*
- * we don't need to attach clean inodes or those only
- * with unlogged changes (which we throw away, anyway).
- */
- iip = ip->i_itemp;
- if (!iip || xfs_inode_clean(ip)) {
- ASSERT(ip != free_ip);
- xfs_ifunlock(ip);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- continue;
- }
-
- iip->ili_last_fields = iip->ili_fields;
- iip->ili_fields = 0;
- iip->ili_logged = 1;
- xfs_trans_ail_copy_lsn(mp->m_ail, &iip->ili_flush_lsn,
- &iip->ili_item.li_lsn);
-
- xfs_buf_attach_iodone(bp, xfs_istale_done,
- &iip->ili_item);
-
- if (ip != free_ip)
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- }
-
- xfs_trans_stale_inode_buf(tp, bp);
- xfs_trans_binval(tp, bp);
- }
-
- xfs_perag_put(pag);
- return 0;
-}
-
-/*
- * This is called to return an inode to the inode free list.
- * The inode should already be truncated to 0 length and have
- * no pages associated with it. This routine also assumes that
- * the inode is already a part of the transaction.
- *
- * The on-disk copy of the inode will have been added to the list
- * of unlinked inodes in the AGI. We need to remove the inode from
- * that list atomically with respect to freeing it here.
- */
-int
-xfs_ifree(
- xfs_trans_t *tp,
- xfs_inode_t *ip,
- xfs_bmap_free_t *flist)
-{
- int error;
- int delete;
- xfs_ino_t first_ino;
-
- ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
- ASSERT(ip->i_d.di_nlink == 0);
- ASSERT(ip->i_d.di_nextents == 0);
- ASSERT(ip->i_d.di_anextents == 0);
- ASSERT(ip->i_d.di_size == 0 || !S_ISREG(ip->i_d.di_mode));
- ASSERT(ip->i_d.di_nblocks == 0);
-
- /*
- * Pull the on-disk inode from the AGI unlinked list.
- */
- error = xfs_iunlink_remove(tp, ip);
- if (error)
- return error;
-
- error = xfs_difree(tp, ip->i_ino, flist, &delete, &first_ino);
- if (error)
- return error;
-
- ip->i_d.di_mode = 0; /* mark incore inode as free */
- ip->i_d.di_flags = 0;
- ip->i_d.di_dmevmask = 0;
- ip->i_d.di_forkoff = 0; /* mark the attr fork not in use */
- ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
- ip->i_d.di_aformat = XFS_DINODE_FMT_EXTENTS;
- /*
- * Bump the generation count so no one will be confused
- * by reincarnations of this inode.
- */
- ip->i_d.di_gen++;
- xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-
- if (delete)
- error = xfs_ifree_cluster(ip, tp, first_ino);
-
- return error;
-}
-
-/*
- * Reallocate the space for if_broot based on the number of records
- * being added or deleted as indicated in rec_diff. Move the records
- * and pointers in if_broot to fit the new size. When shrinking this
- * will eliminate holes between the records and pointers created by
- * the caller. When growing this will create holes to be filled in
- * by the caller.
- *
- * The caller must not request to add more records than would fit in
- * the on-disk inode root. If the if_broot is currently NULL, then
- * if we adding records one will be allocated. The caller must also
- * not request that the number of records go below zero, although
- * it can go to zero.
- *
- * ip -- the inode whose if_broot area is changing
- * ext_diff -- the change in the number of records, positive or negative,
- * requested for the if_broot array.
- */
-void
-xfs_iroot_realloc(
- xfs_inode_t *ip,
- int rec_diff,
- int whichfork)
-{
- struct xfs_mount *mp = ip->i_mount;
- int cur_max;
- xfs_ifork_t *ifp;
- struct xfs_btree_block *new_broot;
- int new_max;
- size_t new_size;
- char *np;
- char *op;
-
- /*
- * Handle the degenerate case quietly.
- */
- if (rec_diff == 0) {
- return;
- }
-
- ifp = XFS_IFORK_PTR(ip, whichfork);
- if (rec_diff > 0) {
- /*
- * If there wasn't any memory allocated before, just
- * allocate it now and get out.
- */
- if (ifp->if_broot_bytes == 0) {
- new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, rec_diff);
- ifp->if_broot = kmem_alloc(new_size, KM_SLEEP | KM_NOFS);
- ifp->if_broot_bytes = (int)new_size;
- return;
- }
-
- /*
- * If there is already an existing if_broot, then we need
- * to realloc() it and shift the pointers to their new
- * location. The records don't change location because
- * they are kept butted up against the btree block header.
- */
- cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0);
- new_max = cur_max + rec_diff;
- new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
- ifp->if_broot = kmem_realloc(ifp->if_broot, new_size,
- XFS_BMAP_BROOT_SPACE_CALC(mp, cur_max),
- KM_SLEEP | KM_NOFS);
- op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
- ifp->if_broot_bytes);
- np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
- (int)new_size);
- ifp->if_broot_bytes = (int)new_size;
- ASSERT(ifp->if_broot_bytes <=
- XFS_IFORK_SIZE(ip, whichfork) + XFS_BROOT_SIZE_ADJ(ip));
- memmove(np, op, cur_max * (uint)sizeof(xfs_dfsbno_t));
- return;
- }
-
- /*
- * rec_diff is less than 0. In this case, we are shrinking the
- * if_broot buffer. It must already exist. If we go to zero
- * records, just get rid of the root and clear the status bit.
- */
- ASSERT((ifp->if_broot != NULL) && (ifp->if_broot_bytes > 0));
- cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0);
- new_max = cur_max + rec_diff;
- ASSERT(new_max >= 0);
- if (new_max > 0)
- new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
- else
- new_size = 0;
- if (new_size > 0) {
- new_broot = kmem_alloc(new_size, KM_SLEEP | KM_NOFS);
- /*
- * First copy over the btree block header.
- */
- memcpy(new_broot, ifp->if_broot,
- XFS_BMBT_BLOCK_LEN(ip->i_mount));
- } else {
- new_broot = NULL;
- ifp->if_flags &= ~XFS_IFBROOT;
- }
-
- /*
- * Only copy the records and pointers if there are any.
- */
- if (new_max > 0) {
- /*
- * First copy the records.
- */
- op = (char *)XFS_BMBT_REC_ADDR(mp, ifp->if_broot, 1);
- np = (char *)XFS_BMBT_REC_ADDR(mp, new_broot, 1);
- memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_rec_t));
-
- /*
- * Then copy the pointers.
- */
- op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
- ifp->if_broot_bytes);
- np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, new_broot, 1,
- (int)new_size);
- memcpy(np, op, new_max * (uint)sizeof(xfs_dfsbno_t));
- }
- kmem_free(ifp->if_broot);
- ifp->if_broot = new_broot;
- ifp->if_broot_bytes = (int)new_size;
- ASSERT(ifp->if_broot_bytes <=
- XFS_IFORK_SIZE(ip, whichfork) + XFS_BROOT_SIZE_ADJ(ip));
- return;
-}
-
-
-/*
- * This is called when the amount of space needed for if_data
- * is increased or decreased. The change in size is indicated by
- * the number of bytes that need to be added or deleted in the
- * byte_diff parameter.
+ * This is called when the amount of space needed for if_data
+ * is increased or decreased. The change in size is indicated by
+ * the number of bytes that need to be added or deleted in the
+ * byte_diff parameter.
*
* If the amount of space needed has decreased below the size of the
* inline buffer, then switch to using the inline buffer. Otherwise,
@@ -2339,59 +1132,16 @@ xfs_idestroy_fork(
((ifp->if_flags & XFS_IFEXTIREC) ||
((ifp->if_u1.if_extents != NULL) &&
(ifp->if_u1.if_extents != ifp->if_u2.if_inline_ext)))) {
- ASSERT(ifp->if_real_bytes != 0);
- xfs_iext_destroy(ifp);
- }
- ASSERT(ifp->if_u1.if_extents == NULL ||
- ifp->if_u1.if_extents == ifp->if_u2.if_inline_ext);
- ASSERT(ifp->if_real_bytes == 0);
- if (whichfork == XFS_ATTR_FORK) {
- kmem_zone_free(xfs_ifork_zone, ip->i_afp);
- ip->i_afp = NULL;
- }
-}
-
-/*
- * This is called to unpin an inode. The caller must have the inode locked
- * in at least shared mode so that the buffer cannot be subsequently pinned
- * once someone is waiting for it to be unpinned.
- */
-static void
-xfs_iunpin(
- struct xfs_inode *ip)
-{
- ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
-
- trace_xfs_inode_unpin_nowait(ip, _RET_IP_);
-
- /* Give the log a push to start the unpinning I/O */
- xfs_log_force_lsn(ip->i_mount, ip->i_itemp->ili_last_lsn, 0);
-
-}
-
-static void
-__xfs_iunpin_wait(
- struct xfs_inode *ip)
-{
- wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_IPINNED_BIT);
- DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_IPINNED_BIT);
-
- xfs_iunpin(ip);
-
- do {
- prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
- if (xfs_ipincount(ip))
- io_schedule();
- } while (xfs_ipincount(ip));
- finish_wait(wq, &wait.wait);
-}
-
-void
-xfs_iunpin_wait(
- struct xfs_inode *ip)
-{
- if (xfs_ipincount(ip))
- __xfs_iunpin_wait(ip);
+ ASSERT(ifp->if_real_bytes != 0);
+ xfs_iext_destroy(ifp);
+ }
+ ASSERT(ifp->if_u1.if_extents == NULL ||
+ ifp->if_u1.if_extents == ifp->if_u2.if_inline_ext);
+ ASSERT(ifp->if_real_bytes == 0);
+ if (whichfork == XFS_ATTR_FORK) {
+ kmem_zone_free(xfs_ifork_zone, ip->i_afp);
+ ip->i_afp = NULL;
+ }
}
/*
@@ -2428,568 +1178,128 @@ xfs_iextents_copy(
/*
* There are some delayed allocation extents in the
* inode, so copy the extents one at a time and skip
- * the delayed ones. There must be at least one
- * non-delayed extent.
- */
- copied = 0;
- for (i = 0; i < nrecs; i++) {
- xfs_bmbt_rec_host_t *ep = xfs_iext_get_ext(ifp, i);
- start_block = xfs_bmbt_get_startblock(ep);
- if (isnullstartblock(start_block)) {
- /*
- * It's a delayed allocation extent, so skip it.
- */
- continue;
- }
-
- /* Translate to on disk format */
- put_unaligned(cpu_to_be64(ep->l0), &dp->l0);
- put_unaligned(cpu_to_be64(ep->l1), &dp->l1);
- dp++;
- copied++;
- }
- ASSERT(copied != 0);
- xfs_validate_extents(ifp, copied, XFS_EXTFMT_INODE(ip));
-
- return (copied * (uint)sizeof(xfs_bmbt_rec_t));
-}
-
-/*
- * Each of the following cases stores data into the same region
- * of the on-disk inode, so only one of them can be valid at
- * any given time. While it is possible to have conflicting formats
- * and log flags, e.g. having XFS_ILOG_?DATA set when the fork is
- * in EXTENTS format, this can only happen when the fork has
- * changed formats after being modified but before being flushed.
- * In these cases, the format always takes precedence, because the
- * format indicates the current state of the fork.
- */
-/*ARGSUSED*/
-STATIC void
-xfs_iflush_fork(
- xfs_inode_t *ip,
- xfs_dinode_t *dip,
- xfs_inode_log_item_t *iip,
- int whichfork,
- xfs_buf_t *bp)
-{
- char *cp;
- xfs_ifork_t *ifp;
- xfs_mount_t *mp;
- static const short brootflag[2] =
- { XFS_ILOG_DBROOT, XFS_ILOG_ABROOT };
- static const short dataflag[2] =
- { XFS_ILOG_DDATA, XFS_ILOG_ADATA };
- static const short extflag[2] =
- { XFS_ILOG_DEXT, XFS_ILOG_AEXT };
-
- if (!iip)
- return;
- ifp = XFS_IFORK_PTR(ip, whichfork);
- /*
- * This can happen if we gave up in iformat in an error path,
- * for the attribute fork.
- */
- if (!ifp) {
- ASSERT(whichfork == XFS_ATTR_FORK);
- return;
- }
- cp = XFS_DFORK_PTR(dip, whichfork);
- mp = ip->i_mount;
- switch (XFS_IFORK_FORMAT(ip, whichfork)) {
- case XFS_DINODE_FMT_LOCAL:
- if ((iip->ili_fields & dataflag[whichfork]) &&
- (ifp->if_bytes > 0)) {
- ASSERT(ifp->if_u1.if_data != NULL);
- ASSERT(ifp->if_bytes <= XFS_IFORK_SIZE(ip, whichfork));
- memcpy(cp, ifp->if_u1.if_data, ifp->if_bytes);
- }
- break;
-
- case XFS_DINODE_FMT_EXTENTS:
- ASSERT((ifp->if_flags & XFS_IFEXTENTS) ||
- !(iip->ili_fields & extflag[whichfork]));
- if ((iip->ili_fields & extflag[whichfork]) &&
- (ifp->if_bytes > 0)) {
- ASSERT(xfs_iext_get_ext(ifp, 0));
- ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) > 0);
- (void)xfs_iextents_copy(ip, (xfs_bmbt_rec_t *)cp,
- whichfork);
- }
- break;
-
- case XFS_DINODE_FMT_BTREE:
- if ((iip->ili_fields & brootflag[whichfork]) &&
- (ifp->if_broot_bytes > 0)) {
- ASSERT(ifp->if_broot != NULL);
- ASSERT(ifp->if_broot_bytes <=
- (XFS_IFORK_SIZE(ip, whichfork) +
- XFS_BROOT_SIZE_ADJ(ip)));
- xfs_bmbt_to_bmdr(mp, ifp->if_broot, ifp->if_broot_bytes,
- (xfs_bmdr_block_t *)cp,
- XFS_DFORK_SIZE(dip, mp, whichfork));
- }
- break;
-
- case XFS_DINODE_FMT_DEV:
- if (iip->ili_fields & XFS_ILOG_DEV) {
- ASSERT(whichfork == XFS_DATA_FORK);
- xfs_dinode_put_rdev(dip, ip->i_df.if_u2.if_rdev);
- }
- break;
-
- case XFS_DINODE_FMT_UUID:
- if (iip->ili_fields & XFS_ILOG_UUID) {
- ASSERT(whichfork == XFS_DATA_FORK);
- memcpy(XFS_DFORK_DPTR(dip),
- &ip->i_df.if_u2.if_uuid,
- sizeof(uuid_t));
- }
- break;
-
- default:
- ASSERT(0);
- break;
- }
-}
-
-STATIC int
-xfs_iflush_cluster(
- xfs_inode_t *ip,
- xfs_buf_t *bp)
-{
- xfs_mount_t *mp = ip->i_mount;
- struct xfs_perag *pag;
- unsigned long first_index, mask;
- unsigned long inodes_per_cluster;
- int ilist_size;
- xfs_inode_t **ilist;
- xfs_inode_t *iq;
- int nr_found;
- int clcount = 0;
- int bufwasdelwri;
- int i;
-
- pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-
- inodes_per_cluster = XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog;
- ilist_size = inodes_per_cluster * sizeof(xfs_inode_t *);
- ilist = kmem_alloc(ilist_size, KM_MAYFAIL|KM_NOFS);
- if (!ilist)
- goto out_put;
-
- mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) - 1);
- first_index = XFS_INO_TO_AGINO(mp, ip->i_ino) & mask;
- rcu_read_lock();
- /* really need a gang lookup range call here */
- nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)ilist,
- first_index, inodes_per_cluster);
- if (nr_found == 0)
- goto out_free;
-
- for (i = 0; i < nr_found; i++) {
- iq = ilist[i];
- if (iq == ip)
- continue;
-
- /*
- * because this is an RCU protected lookup, we could find a
- * recently freed or even reallocated inode during the lookup.
- * We need to check under the i_flags_lock for a valid inode
- * here. Skip it if it is not valid or the wrong inode.
- */
- spin_lock(&ip->i_flags_lock);
- if (!ip->i_ino ||
- (XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index) {
- spin_unlock(&ip->i_flags_lock);
- continue;
- }
- spin_unlock(&ip->i_flags_lock);
-
- /*
- * Do an un-protected check to see if the inode is dirty and
- * is a candidate for flushing. These checks will be repeated
- * later after the appropriate locks are acquired.
- */
- if (xfs_inode_clean(iq) && xfs_ipincount(iq) == 0)
- continue;
-
- /*
- * Try to get locks. If any are unavailable or it is pinned,
- * then this inode cannot be flushed and is skipped.
- */
-
- if (!xfs_ilock_nowait(iq, XFS_ILOCK_SHARED))
- continue;
- if (!xfs_iflock_nowait(iq)) {
- xfs_iunlock(iq, XFS_ILOCK_SHARED);
- continue;
- }
- if (xfs_ipincount(iq)) {
- xfs_ifunlock(iq);
- xfs_iunlock(iq, XFS_ILOCK_SHARED);
- continue;
- }
-
- /*
- * arriving here means that this inode can be flushed. First
- * re-check that it's dirty before flushing.
- */
- if (!xfs_inode_clean(iq)) {
- int error;
- error = xfs_iflush_int(iq, bp);
- if (error) {
- xfs_iunlock(iq, XFS_ILOCK_SHARED);
- goto cluster_corrupt_out;
- }
- clcount++;
- } else {
- xfs_ifunlock(iq);
- }
- xfs_iunlock(iq, XFS_ILOCK_SHARED);
- }
-
- if (clcount) {
- XFS_STATS_INC(xs_icluster_flushcnt);
- XFS_STATS_ADD(xs_icluster_flushinode, clcount);
- }
-
-out_free:
- rcu_read_unlock();
- kmem_free(ilist);
-out_put:
- xfs_perag_put(pag);
- return 0;
-
-
-cluster_corrupt_out:
- /*
- * Corruption detected in the clustering loop. Invalidate the
- * inode buffer and shut down the filesystem.
- */
- rcu_read_unlock();
- /*
- * Clean up the buffer. If it was delwri, just release it --
- * brelse can handle it with no problems. If not, shut down the
- * filesystem before releasing the buffer.
- */
- bufwasdelwri = (bp->b_flags & _XBF_DELWRI_Q);
- if (bufwasdelwri)
- xfs_buf_relse(bp);
-
- xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
-
- if (!bufwasdelwri) {
- /*
- * Just like incore_relse: if we have b_iodone functions,
- * mark the buffer as an error and call them. Otherwise
- * mark it as stale and brelse.
- */
- if (bp->b_iodone) {
- XFS_BUF_UNDONE(bp);
- xfs_buf_stale(bp);
- xfs_buf_ioerror(bp, EIO);
- xfs_buf_ioend(bp, 0);
- } else {
- xfs_buf_stale(bp);
- xfs_buf_relse(bp);
- }
- }
-
- /*
- * Unlocks the flush lock
- */
- xfs_iflush_abort(iq, false);
- kmem_free(ilist);
- xfs_perag_put(pag);
- return XFS_ERROR(EFSCORRUPTED);
-}
-
-/*
- * Flush dirty inode metadata into the backing buffer.
- *
- * The caller must have the inode lock and the inode flush lock held. The
- * inode lock will still be held upon return to the caller, and the inode
- * flush lock will be released after the inode has reached the disk.
- *
- * The caller must write out the buffer returned in *bpp and release it.
- */
-int
-xfs_iflush(
- struct xfs_inode *ip,
- struct xfs_buf **bpp)
-{
- struct xfs_mount *mp = ip->i_mount;
- struct xfs_buf *bp;
- struct xfs_dinode *dip;
- int error;
-
- XFS_STATS_INC(xs_iflush_count);
-
- ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
- ASSERT(xfs_isiflocked(ip));
- ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
- ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
-
- *bpp = NULL;
-
- xfs_iunpin_wait(ip);
-
- /*
- * For stale inodes we cannot rely on the backing buffer remaining
- * stale in cache for the remaining life of the stale inode and so
- * xfs_imap_to_bp() below may give us a buffer that no longer contains
- * inodes below. We have to check this after ensuring the inode is
- * unpinned so that it is safe to reclaim the stale inode after the
- * flush call.
- */
- if (xfs_iflags_test(ip, XFS_ISTALE)) {
- xfs_ifunlock(ip);
- return 0;
- }
-
- /*
- * This may have been unpinned because the filesystem is shutting
- * down forcibly. If that's the case we must not write this inode
- * to disk, because the log record didn't make it to disk.
- *
- * We also have to remove the log item from the AIL in this case,
- * as we wait for an empty AIL as part of the unmount process.
- */
- if (XFS_FORCED_SHUTDOWN(mp)) {
- error = XFS_ERROR(EIO);
- goto abort_out;
- }
-
- /*
- * Get the buffer containing the on-disk inode.
- */
- error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK,
- 0);
- if (error || !bp) {
- xfs_ifunlock(ip);
- return error;
- }
-
- /*
- * First flush out the inode that xfs_iflush was called with.
- */
- error = xfs_iflush_int(ip, bp);
- if (error)
- goto corrupt_out;
-
- /*
- * If the buffer is pinned then push on the log now so we won't
- * get stuck waiting in the write for too long.
- */
- if (xfs_buf_ispinned(bp))
- xfs_log_force(mp, 0);
-
- /*
- * inode clustering:
- * see if other inodes can be gathered into this write
- */
- error = xfs_iflush_cluster(ip, bp);
- if (error)
- goto cluster_corrupt_out;
-
- *bpp = bp;
- return 0;
-
-corrupt_out:
- xfs_buf_relse(bp);
- xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
-cluster_corrupt_out:
- error = XFS_ERROR(EFSCORRUPTED);
-abort_out:
- /*
- * Unlocks the flush lock
- */
- xfs_iflush_abort(ip, false);
- return error;
-}
-
-
-STATIC int
-xfs_iflush_int(
- struct xfs_inode *ip,
- struct xfs_buf *bp)
-{
- struct xfs_inode_log_item *iip = ip->i_itemp;
- struct xfs_dinode *dip;
- struct xfs_mount *mp = ip->i_mount;
-
- ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
- ASSERT(xfs_isiflocked(ip));
- ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
- ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
- ASSERT(iip != NULL && iip->ili_fields != 0);
-
- /* set *dip = inode's place in the buffer */
- dip = (xfs_dinode_t *)xfs_buf_offset(bp, ip->i_imap.im_boffset);
-
- if (XFS_TEST_ERROR(dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC),
- mp, XFS_ERRTAG_IFLUSH_1, XFS_RANDOM_IFLUSH_1)) {
- xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
- "%s: Bad inode %Lu magic number 0x%x, ptr 0x%p",
- __func__, ip->i_ino, be16_to_cpu(dip->di_magic), dip);
- goto corrupt_out;
- }
- if (XFS_TEST_ERROR(ip->i_d.di_magic != XFS_DINODE_MAGIC,
- mp, XFS_ERRTAG_IFLUSH_2, XFS_RANDOM_IFLUSH_2)) {
- xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
- "%s: Bad inode %Lu, ptr 0x%p, magic number 0x%x",
- __func__, ip->i_ino, ip, ip->i_d.di_magic);
- goto corrupt_out;
- }
- if (S_ISREG(ip->i_d.di_mode)) {
- if (XFS_TEST_ERROR(
- (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS) &&
- (ip->i_d.di_format != XFS_DINODE_FMT_BTREE),
- mp, XFS_ERRTAG_IFLUSH_3, XFS_RANDOM_IFLUSH_3)) {
- xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
- "%s: Bad regular inode %Lu, ptr 0x%p",
- __func__, ip->i_ino, ip);
- goto corrupt_out;
- }
- } else if (S_ISDIR(ip->i_d.di_mode)) {
- if (XFS_TEST_ERROR(
- (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS) &&
- (ip->i_d.di_format != XFS_DINODE_FMT_BTREE) &&
- (ip->i_d.di_format != XFS_DINODE_FMT_LOCAL),
- mp, XFS_ERRTAG_IFLUSH_4, XFS_RANDOM_IFLUSH_4)) {
- xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
- "%s: Bad directory inode %Lu, ptr 0x%p",
- __func__, ip->i_ino, ip);
- goto corrupt_out;
- }
- }
- if (XFS_TEST_ERROR(ip->i_d.di_nextents + ip->i_d.di_anextents >
- ip->i_d.di_nblocks, mp, XFS_ERRTAG_IFLUSH_5,
- XFS_RANDOM_IFLUSH_5)) {
- xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
- "%s: detected corrupt incore inode %Lu, "
- "total extents = %d, nblocks = %Ld, ptr 0x%p",
- __func__, ip->i_ino,
- ip->i_d.di_nextents + ip->i_d.di_anextents,
- ip->i_d.di_nblocks, ip);
- goto corrupt_out;
- }
- if (XFS_TEST_ERROR(ip->i_d.di_forkoff > mp->m_sb.sb_inodesize,
- mp, XFS_ERRTAG_IFLUSH_6, XFS_RANDOM_IFLUSH_6)) {
- xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
- "%s: bad inode %Lu, forkoff 0x%x, ptr 0x%p",
- __func__, ip->i_ino, ip->i_d.di_forkoff, ip);
- goto corrupt_out;
- }
- /*
- * bump the flush iteration count, used to detect flushes which
- * postdate a log record during recovery. This is redundant as we now
- * log every change and hence this can't happen. Still, it doesn't hurt.
- */
- ip->i_d.di_flushiter++;
-
- /*
- * Copy the dirty parts of the inode into the on-disk
- * inode. We always copy out the core of the inode,
- * because if the inode is dirty at all the core must
- * be.
- */
- xfs_dinode_to_disk(dip, &ip->i_d);
-
- /* Wrap, we never let the log put out DI_MAX_FLUSH */
- if (ip->i_d.di_flushiter == DI_MAX_FLUSH)
- ip->i_d.di_flushiter = 0;
-
- /*
- * If this is really an old format inode and the superblock version
- * has not been updated to support only new format inodes, then
- * convert back to the old inode format. If the superblock version
- * has been updated, then make the conversion permanent.
+ * the delayed ones. There must be at least one
+ * non-delayed extent.
*/
- ASSERT(ip->i_d.di_version == 1 || xfs_sb_version_hasnlink(&mp->m_sb));
- if (ip->i_d.di_version == 1) {
- if (!xfs_sb_version_hasnlink(&mp->m_sb)) {
- /*
- * Convert it back.
- */
- ASSERT(ip->i_d.di_nlink <= XFS_MAXLINK_1);
- dip->di_onlink = cpu_to_be16(ip->i_d.di_nlink);
- } else {
+ copied = 0;
+ for (i = 0; i < nrecs; i++) {
+ xfs_bmbt_rec_host_t *ep = xfs_iext_get_ext(ifp, i);
+ start_block = xfs_bmbt_get_startblock(ep);
+ if (isnullstartblock(start_block)) {
/*
- * The superblock version has already been bumped,
- * so just make the conversion to the new inode
- * format permanent.
+ * It's a delayed allocation extent, so skip it.
*/
- ip->i_d.di_version = 2;
- dip->di_version = 2;
- ip->i_d.di_onlink = 0;
- dip->di_onlink = 0;
- memset(&(ip->i_d.di_pad[0]), 0, sizeof(ip->i_d.di_pad));
- memset(&(dip->di_pad[0]), 0,
- sizeof(dip->di_pad));
- ASSERT(xfs_get_projid(ip) == 0);
+ continue;
}
- }
- xfs_iflush_fork(ip, dip, iip, XFS_DATA_FORK, bp);
- if (XFS_IFORK_Q(ip))
- xfs_iflush_fork(ip, dip, iip, XFS_ATTR_FORK, bp);
- xfs_inobp_check(mp, bp);
+ /* Translate to on disk format */
+ put_unaligned(cpu_to_be64(ep->l0), &dp->l0);
+ put_unaligned(cpu_to_be64(ep->l1), &dp->l1);
+ dp++;
+ copied++;
+ }
+ ASSERT(copied != 0);
+ xfs_validate_extents(ifp, copied, XFS_EXTFMT_INODE(ip));
- /*
- * We've recorded everything logged in the inode, so we'd like to clear
- * the ili_fields bits so we don't log and flush things unnecessarily.
- * However, we can't stop logging all this information until the data
- * we've copied into the disk buffer is written to disk. If we did we
- * might overwrite the copy of the inode in the log with all the data
- * after re-logging only part of it, and in the face of a crash we
- * wouldn't have all the data we need to recover.
- *
- * What we do is move the bits to the ili_last_fields field. When
- * logging the inode, these bits are moved back to the ili_fields field.
- * In the xfs_iflush_done() routine we clear ili_last_fields, since we
- * know that the information those bits represent is permanently on
- * disk. As long as the flush completes before the inode is logged
- * again, then both ili_fields and ili_last_fields will be cleared.
- *
- * We can play with the ili_fields bits here, because the inode lock
- * must be held exclusively in order to set bits there and the flush
- * lock protects the ili_last_fields bits. Set ili_logged so the flush
- * done routine can tell whether or not to look in the AIL. Also, store
- * the current LSN of the inode so that we can tell whether the item has
- * moved in the AIL from xfs_iflush_done(). In order to read the lsn we
- * need the AIL lock, because it is a 64 bit value that cannot be read
- * atomically.
- */
- iip->ili_last_fields = iip->ili_fields;
- iip->ili_fields = 0;
- iip->ili_logged = 1;
+ return (copied * (uint)sizeof(xfs_bmbt_rec_t));
+}
- xfs_trans_ail_copy_lsn(mp->m_ail, &iip->ili_flush_lsn,
- &iip->ili_item.li_lsn);
+/*
+ * Each of the following cases stores data into the same region
+ * of the on-disk inode, so only one of them can be valid at
+ * any given time. While it is possible to have conflicting formats
+ * and log flags, e.g. having XFS_ILOG_?DATA set when the fork is
+ * in EXTENTS format, this can only happen when the fork has
+ * changed formats after being modified but before being flushed.
+ * In these cases, the format always takes precedence, because the
+ * format indicates the current state of the fork.
+ */
+void
+xfs_iflush_fork(
+ xfs_inode_t *ip,
+ xfs_dinode_t *dip,
+ xfs_inode_log_item_t *iip,
+ int whichfork,
+ xfs_buf_t *bp)
+{
+ char *cp;
+ xfs_ifork_t *ifp;
+ xfs_mount_t *mp;
+ static const short brootflag[2] =
+ { XFS_ILOG_DBROOT, XFS_ILOG_ABROOT };
+ static const short dataflag[2] =
+ { XFS_ILOG_DDATA, XFS_ILOG_ADATA };
+ static const short extflag[2] =
+ { XFS_ILOG_DEXT, XFS_ILOG_AEXT };
+ if (!iip)
+ return;
+ ifp = XFS_IFORK_PTR(ip, whichfork);
/*
- * Attach the function xfs_iflush_done to the inode's
- * buffer. This will remove the inode from the AIL
- * and unlock the inode's flush lock when the inode is
- * completely written to disk.
+ * This can happen if we gave up in iformat in an error path,
+ * for the attribute fork.
*/
- xfs_buf_attach_iodone(bp, xfs_iflush_done, &iip->ili_item);
+ if (!ifp) {
+ ASSERT(whichfork == XFS_ATTR_FORK);
+ return;
+ }
+ cp = XFS_DFORK_PTR(dip, whichfork);
+ mp = ip->i_mount;
+ switch (XFS_IFORK_FORMAT(ip, whichfork)) {
+ case XFS_DINODE_FMT_LOCAL:
+ if ((iip->ili_fields & dataflag[whichfork]) &&
+ (ifp->if_bytes > 0)) {
+ ASSERT(ifp->if_u1.if_data != NULL);
+ ASSERT(ifp->if_bytes <= XFS_IFORK_SIZE(ip, whichfork));
+ memcpy(cp, ifp->if_u1.if_data, ifp->if_bytes);
+ }
+ break;
+
+ case XFS_DINODE_FMT_EXTENTS:
+ ASSERT((ifp->if_flags & XFS_IFEXTENTS) ||
+ !(iip->ili_fields & extflag[whichfork]));
+ if ((iip->ili_fields & extflag[whichfork]) &&
+ (ifp->if_bytes > 0)) {
+ ASSERT(xfs_iext_get_ext(ifp, 0));
+ ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) > 0);
+ (void)xfs_iextents_copy(ip, (xfs_bmbt_rec_t *)cp,
+ whichfork);
+ }
+ break;
- /* update the lsn in the on disk inode if required */
- if (ip->i_d.di_version == 3)
- dip->di_lsn = cpu_to_be64(iip->ili_item.li_lsn);
+ case XFS_DINODE_FMT_BTREE:
+ if ((iip->ili_fields & brootflag[whichfork]) &&
+ (ifp->if_broot_bytes > 0)) {
+ ASSERT(ifp->if_broot != NULL);
+ ASSERT(ifp->if_broot_bytes <=
+ (XFS_IFORK_SIZE(ip, whichfork) +
+ XFS_BROOT_SIZE_ADJ(ip)));
+ xfs_bmbt_to_bmdr(mp, ifp->if_broot, ifp->if_broot_bytes,
+ (xfs_bmdr_block_t *)cp,
+ XFS_DFORK_SIZE(dip, mp, whichfork));
+ }
+ break;
- /* generate the checksum. */
- xfs_dinode_calc_crc(mp, dip);
+ case XFS_DINODE_FMT_DEV:
+ if (iip->ili_fields & XFS_ILOG_DEV) {
+ ASSERT(whichfork == XFS_DATA_FORK);
+ xfs_dinode_put_rdev(dip, ip->i_df.if_u2.if_rdev);
+ }
+ break;
- ASSERT(bp->b_fspriv != NULL);
- ASSERT(bp->b_iodone != NULL);
- return 0;
+ case XFS_DINODE_FMT_UUID:
+ if (iip->ili_fields & XFS_ILOG_UUID) {
+ ASSERT(whichfork == XFS_DATA_FORK);
+ memcpy(XFS_DFORK_DPTR(dip),
+ &ip->i_df.if_u2.if_uuid,
+ sizeof(uuid_t));
+ }
+ break;
-corrupt_out:
- return XFS_ERROR(EFSCORRUPTED);
+ default:
+ ASSERT(0);
+ break;
+ }
}
/*
@@ -3020,6 +1330,66 @@ xfs_iext_get_ext(
}
/*
+ * Create, destroy, or resize a linear (direct) block of extents.
+ */
+STATIC void
+xfs_iext_realloc_direct(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ int new_size) /* new size of extents */
+{
+ int rnew_size; /* real new size of extents */
+
+ rnew_size = new_size;
+
+ ASSERT(!(ifp->if_flags & XFS_IFEXTIREC) ||
+ ((new_size >= 0) && (new_size <= XFS_IEXT_BUFSZ) &&
+ (new_size != ifp->if_real_bytes)));
+
+ /* Free extent records */
+ if (new_size == 0) {
+ xfs_iext_destroy(ifp);
+ }
+ /* Resize direct extent list and zero any new bytes */
+ else if (ifp->if_real_bytes) {
+ /* Check if extents will fit inside the inode */
+ if (new_size <= XFS_INLINE_EXTS * sizeof(xfs_bmbt_rec_t)) {
+ xfs_iext_direct_to_inline(ifp, new_size /
+ (uint)sizeof(xfs_bmbt_rec_t));
+ ifp->if_bytes = new_size;
+ return;
+ }
+ if (!is_power_of_2(new_size)){
+ rnew_size = roundup_pow_of_two(new_size);
+ }
+ if (rnew_size != ifp->if_real_bytes) {
+ ifp->if_u1.if_extents =
+ kmem_realloc(ifp->if_u1.if_extents,
+ rnew_size,
+ ifp->if_real_bytes, KM_NOFS);
+ }
+ if (rnew_size > ifp->if_real_bytes) {
+ memset(&ifp->if_u1.if_extents[ifp->if_bytes /
+ (uint)sizeof(xfs_bmbt_rec_t)], 0,
+ rnew_size - ifp->if_real_bytes);
+ }
+ }
+ /*
+ * Switch from the inline extent buffer to a direct
+ * extent list. Be sure to include the inline extent
+ * bytes in new_size.
+ */
+ else {
+ new_size += ifp->if_bytes;
+ if (!is_power_of_2(new_size)) {
+ rnew_size = roundup_pow_of_two(new_size);
+ }
+ xfs_iext_inline_to_direct(ifp, rnew_size);
+ }
+ ifp->if_real_bytes = rnew_size;
+ ifp->if_bytes = new_size;
+}
+
+/*
* Insert new item(s) into the extent records for incore inode
* fork 'ifp'. 'count' new items are inserted at index 'idx'.
*/
@@ -3459,66 +1829,6 @@ xfs_iext_remove_indirect(
}
/*
- * Create, destroy, or resize a linear (direct) block of extents.
- */
-void
-xfs_iext_realloc_direct(
- xfs_ifork_t *ifp, /* inode fork pointer */
- int new_size) /* new size of extents */
-{
- int rnew_size; /* real new size of extents */
-
- rnew_size = new_size;
-
- ASSERT(!(ifp->if_flags & XFS_IFEXTIREC) ||
- ((new_size >= 0) && (new_size <= XFS_IEXT_BUFSZ) &&
- (new_size != ifp->if_real_bytes)));
-
- /* Free extent records */
- if (new_size == 0) {
- xfs_iext_destroy(ifp);
- }
- /* Resize direct extent list and zero any new bytes */
- else if (ifp->if_real_bytes) {
- /* Check if extents will fit inside the inode */
- if (new_size <= XFS_INLINE_EXTS * sizeof(xfs_bmbt_rec_t)) {
- xfs_iext_direct_to_inline(ifp, new_size /
- (uint)sizeof(xfs_bmbt_rec_t));
- ifp->if_bytes = new_size;
- return;
- }
- if (!is_power_of_2(new_size)){
- rnew_size = roundup_pow_of_two(new_size);
- }
- if (rnew_size != ifp->if_real_bytes) {
- ifp->if_u1.if_extents =
- kmem_realloc(ifp->if_u1.if_extents,
- rnew_size,
- ifp->if_real_bytes, KM_NOFS);
- }
- if (rnew_size > ifp->if_real_bytes) {
- memset(&ifp->if_u1.if_extents[ifp->if_bytes /
- (uint)sizeof(xfs_bmbt_rec_t)], 0,
- rnew_size - ifp->if_real_bytes);
- }
- }
- /*
- * Switch from the inline extent buffer to a direct
- * extent list. Be sure to include the inline extent
- * bytes in new_size.
- */
- else {
- new_size += ifp->if_bytes;
- if (!is_power_of_2(new_size)) {
- rnew_size = roundup_pow_of_two(new_size);
- }
- xfs_iext_inline_to_direct(ifp, rnew_size);
- }
- ifp->if_real_bytes = rnew_size;
- ifp->if_bytes = new_size;
-}
-
-/*
* Switch from linear (direct) extent records to inline buffer.
*/
void
@@ -4023,40 +2333,3 @@ xfs_iext_irec_update_extoffs(
ifp->if_u1.if_ext_irec[i].er_extoff += ext_diff;
}
}
-
-/*
- * Test whether it is appropriate to check an inode for and free post EOF
- * blocks. The 'force' parameter determines whether we should also consider
- * regular files that are marked preallocated or append-only.
- */
-bool
-xfs_can_free_eofblocks(struct xfs_inode *ip, bool force)
-{
- /* prealloc/delalloc exists only on regular files */
- if (!S_ISREG(ip->i_d.di_mode))
- return false;
-
- /*
- * Zero sized files with no cached pages and delalloc blocks will not
- * have speculative prealloc/delalloc blocks to remove.
- */
- if (VFS_I(ip)->i_size == 0 &&
- VN_CACHED(VFS_I(ip)) == 0 &&
- ip->i_delayed_blks == 0)
- return false;
-
- /* If we haven't read in the extent list, then don't do it now. */
- if (!(ip->i_df.if_flags & XFS_IFEXTENTS))
- return false;
-
- /*
- * Do not free real preallocated or append-only files unless the file
- * has delalloc blocks and we are forced to remove them.
- */
- if (ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
- if (!force || ip->i_delayed_blks == 0)
- return false;
-
- return true;
-}
-
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 9112979..739a975 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -21,6 +21,7 @@
struct posix_acl;
struct xfs_dinode;
struct xfs_inode;
+struct xfs_inode_log_item;
/*
* Fork identifiers.
@@ -237,335 +238,7 @@ static inline uint xfs_icdinode_size(int version)
#ifdef __KERNEL__
-struct xfs_buf;
-struct xfs_bmap_free;
-struct xfs_bmbt_irec;
-struct xfs_inode_log_item;
-struct xfs_mount;
-struct xfs_trans;
-struct xfs_dquot;
-
-typedef struct xfs_inode {
- /* Inode linking and identification information. */
- struct xfs_mount *i_mount; /* fs mount struct ptr */
- struct xfs_dquot *i_udquot; /* user dquot */
- struct xfs_dquot *i_gdquot; /* group dquot */
-
- /* Inode location stuff */
- xfs_ino_t i_ino; /* inode number (agno/agino)*/
- struct xfs_imap i_imap; /* location for xfs_imap() */
-
- /* Extent information. */
- xfs_ifork_t *i_afp; /* attribute fork pointer */
- xfs_ifork_t i_df; /* data fork */
-
- /* Transaction and locking information. */
- struct xfs_inode_log_item *i_itemp; /* logging information */
- mrlock_t i_lock; /* inode lock */
- mrlock_t i_iolock; /* inode IO lock */
- atomic_t i_pincount; /* inode pin count */
- spinlock_t i_flags_lock; /* inode i_flags lock */
- /* Miscellaneous state. */
- unsigned long i_flags; /* see defined flags below */
- unsigned int i_delayed_blks; /* count of delay alloc blks */
-
- xfs_icdinode_t i_d; /* most of ondisk inode */
-
- /* VFS inode */
- struct inode i_vnode; /* embedded VFS inode */
-} xfs_inode_t;
-
-/* Convert from vfs inode to xfs inode */
-static inline struct xfs_inode *XFS_I(struct inode *inode)
-{
- return container_of(inode, struct xfs_inode, i_vnode);
-}
-
-/* convert from xfs inode to vfs inode */
-static inline struct inode *VFS_I(struct xfs_inode *ip)
-{
- return &ip->i_vnode;
-}
-
-/*
- * For regular files we only update the on-disk filesize when actually
- * writing data back to disk. Until then only the copy in the VFS inode
- * is uptodate.
- */
-static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
-{
- if (S_ISREG(ip->i_d.di_mode))
- return i_size_read(VFS_I(ip));
- return ip->i_d.di_size;
-}
-
-/*
- * If this I/O goes past the on-disk inode size update it unless it would
- * be past the current in-core inode size.
- */
-static inline xfs_fsize_t
-xfs_new_eof(struct xfs_inode *ip, xfs_fsize_t new_size)
-{
- xfs_fsize_t i_size = i_size_read(VFS_I(ip));
-
- if (new_size > i_size)
- new_size = i_size;
- return new_size > ip->i_d.di_size ? new_size : 0;
-}
-
-/*
- * i_flags helper functions
- */
-static inline void
-__xfs_iflags_set(xfs_inode_t *ip, unsigned short flags)
-{
- ip->i_flags |= flags;
-}
-
-static inline void
-xfs_iflags_set(xfs_inode_t *ip, unsigned short flags)
-{
- spin_lock(&ip->i_flags_lock);
- __xfs_iflags_set(ip, flags);
- spin_unlock(&ip->i_flags_lock);
-}
-
-static inline void
-xfs_iflags_clear(xfs_inode_t *ip, unsigned short flags)
-{
- spin_lock(&ip->i_flags_lock);
- ip->i_flags &= ~flags;
- spin_unlock(&ip->i_flags_lock);
-}
-
-static inline int
-__xfs_iflags_test(xfs_inode_t *ip, unsigned short flags)
-{
- return (ip->i_flags & flags);
-}
-
-static inline int
-xfs_iflags_test(xfs_inode_t *ip, unsigned short flags)
-{
- int ret;
- spin_lock(&ip->i_flags_lock);
- ret = __xfs_iflags_test(ip, flags);
- spin_unlock(&ip->i_flags_lock);
- return ret;
-}
-
-static inline int
-xfs_iflags_test_and_clear(xfs_inode_t *ip, unsigned short flags)
-{
- int ret;
-
- spin_lock(&ip->i_flags_lock);
- ret = ip->i_flags & flags;
- if (ret)
- ip->i_flags &= ~flags;
- spin_unlock(&ip->i_flags_lock);
- return ret;
-}
-
-static inline int
-xfs_iflags_test_and_set(xfs_inode_t *ip, unsigned short flags)
-{
- int ret;
-
- spin_lock(&ip->i_flags_lock);
- ret = ip->i_flags & flags;
- if (!ret)
- ip->i_flags |= flags;
- spin_unlock(&ip->i_flags_lock);
- return ret;
-}
-
-/*
- * Project quota id helpers (previously projid was 16bit only
- * and using two 16bit values to hold new 32bit projid was chosen
- * to retain compatibility with "old" filesystems).
- */
-static inline prid_t
-xfs_get_projid(struct xfs_inode *ip)
-{
- return (prid_t)ip->i_d.di_projid_hi << 16 | ip->i_d.di_projid_lo;
-}
-
-static inline void
-xfs_set_projid(struct xfs_inode *ip,
- prid_t projid)
-{
- ip->i_d.di_projid_hi = (__uint16_t) (projid >> 16);
- ip->i_d.di_projid_lo = (__uint16_t) (projid & 0xffff);
-}
-
-/*
- * In-core inode flags.
- */
-#define XFS_IRECLAIM (1 << 0) /* started reclaiming this inode */
-#define XFS_ISTALE (1 << 1) /* inode has been staled */
-#define XFS_IRECLAIMABLE (1 << 2) /* inode can be reclaimed */
-#define XFS_INEW (1 << 3) /* inode has just been allocated */
-#define XFS_IFILESTREAM (1 << 4) /* inode is in a filestream dir. */
-#define XFS_ITRUNCATED (1 << 5) /* truncated down so flush-on-close */
-#define XFS_IDIRTY_RELEASE (1 << 6) /* dirty release already seen */
-#define __XFS_IFLOCK_BIT 7 /* inode is being flushed right now */
-#define XFS_IFLOCK (1 << __XFS_IFLOCK_BIT)
-#define __XFS_IPINNED_BIT 8 /* wakeup key for zero pin count */
-#define XFS_IPINNED (1 << __XFS_IPINNED_BIT)
-#define XFS_IDONTCACHE (1 << 9) /* don't cache the inode long term */
-
-/*
- * Per-lifetime flags need to be reset when re-using a reclaimable inode during
- * inode lookup. This prevents unintended behaviour on the new inode from
- * ocurring.
- */
-#define XFS_IRECLAIM_RESET_FLAGS \
- (XFS_IRECLAIMABLE | XFS_IRECLAIM | \
- XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | \
- XFS_IFILESTREAM);
-
-/*
- * Synchronize processes attempting to flush the in-core inode back to disk.
- */
-
-extern void __xfs_iflock(struct xfs_inode *ip);
-
-static inline int xfs_iflock_nowait(struct xfs_inode *ip)
-{
- return !xfs_iflags_test_and_set(ip, XFS_IFLOCK);
-}
-
-static inline void xfs_iflock(struct xfs_inode *ip)
-{
- if (!xfs_iflock_nowait(ip))
- __xfs_iflock(ip);
-}
-
-static inline void xfs_ifunlock(struct xfs_inode *ip)
-{
- xfs_iflags_clear(ip, XFS_IFLOCK);
- smp_mb();
- wake_up_bit(&ip->i_flags, __XFS_IFLOCK_BIT);
-}
-
-static inline int xfs_isiflocked(struct xfs_inode *ip)
-{
- return xfs_iflags_test(ip, XFS_IFLOCK);
-}
-
-/*
- * Flags for inode locking.
- * Bit ranges: 1<<1 - 1<<16-1 -- iolock/ilock modes (bitfield)
- * 1<<16 - 1<<32-1 -- lockdep annotation (integers)
- */
-#define XFS_IOLOCK_EXCL (1<<0)
-#define XFS_IOLOCK_SHARED (1<<1)
-#define XFS_ILOCK_EXCL (1<<2)
-#define XFS_ILOCK_SHARED (1<<3)
-
-#define XFS_LOCK_MASK (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED \
- | XFS_ILOCK_EXCL | XFS_ILOCK_SHARED)
-
-#define XFS_LOCK_FLAGS \
- { XFS_IOLOCK_EXCL, "IOLOCK_EXCL" }, \
- { XFS_IOLOCK_SHARED, "IOLOCK_SHARED" }, \
- { XFS_ILOCK_EXCL, "ILOCK_EXCL" }, \
- { XFS_ILOCK_SHARED, "ILOCK_SHARED" }
-
-
-/*
- * Flags for lockdep annotations.
- *
- * XFS_LOCK_PARENT - for directory operations that require locking a
- * parent directory inode and a child entry inode. The parent gets locked
- * with this flag so it gets a lockdep subclass of 1 and the child entry
- * lock will have a lockdep subclass of 0.
- *
- * XFS_LOCK_RTBITMAP/XFS_LOCK_RTSUM - the realtime device bitmap and summary
- * inodes do not participate in the normal lock order, and thus have their
- * own subclasses.
- *
- * XFS_LOCK_INUMORDER - for locking several inodes at the some time
- * with xfs_lock_inodes(). This flag is used as the starting subclass
- * and each subsequent lock acquired will increment the subclass by one.
- * So the first lock acquired will have a lockdep subclass of 4, the
- * second lock will have a lockdep subclass of 5, and so on. It is
- * the responsibility of the class builder to shift this to the correct
- * portion of the lock_mode lockdep mask.
- */
-#define XFS_LOCK_PARENT 1
-#define XFS_LOCK_RTBITMAP 2
-#define XFS_LOCK_RTSUM 3
-#define XFS_LOCK_INUMORDER 4
-
-#define XFS_IOLOCK_SHIFT 16
-#define XFS_IOLOCK_PARENT (XFS_LOCK_PARENT << XFS_IOLOCK_SHIFT)
-
-#define XFS_ILOCK_SHIFT 24
-#define XFS_ILOCK_PARENT (XFS_LOCK_PARENT << XFS_ILOCK_SHIFT)
-#define XFS_ILOCK_RTBITMAP (XFS_LOCK_RTBITMAP << XFS_ILOCK_SHIFT)
-#define XFS_ILOCK_RTSUM (XFS_LOCK_RTSUM << XFS_ILOCK_SHIFT)
-
-#define XFS_IOLOCK_DEP_MASK 0x00ff0000
-#define XFS_ILOCK_DEP_MASK 0xff000000
-#define XFS_LOCK_DEP_MASK (XFS_IOLOCK_DEP_MASK | XFS_ILOCK_DEP_MASK)
-
-#define XFS_IOLOCK_DEP(flags) (((flags) & XFS_IOLOCK_DEP_MASK) >> XFS_IOLOCK_SHIFT)
-#define XFS_ILOCK_DEP(flags) (((flags) & XFS_ILOCK_DEP_MASK) >> XFS_ILOCK_SHIFT)
-
-/*
- * For multiple groups support: if S_ISGID bit is set in the parent
- * directory, group of new file is set to that of the parent, and
- * new subdirectory gets S_ISGID bit from parent.
- */
-#define XFS_INHERIT_GID(pip) \
- (((pip)->i_mount->m_flags & XFS_MOUNT_GRPID) || \
- ((pip)->i_d.di_mode & S_ISGID))
-
-
-/*
- * xfs_inode.c prototypes.
- */
-void xfs_ilock(xfs_inode_t *, uint);
-int xfs_ilock_nowait(xfs_inode_t *, uint);
-void xfs_iunlock(xfs_inode_t *, uint);
-void xfs_ilock_demote(xfs_inode_t *, uint);
-int xfs_isilocked(xfs_inode_t *, uint);
-uint xfs_ilock_map_shared(xfs_inode_t *);
-void xfs_iunlock_map_shared(xfs_inode_t *, uint);
-int xfs_ialloc(struct xfs_trans *, xfs_inode_t *, umode_t,
- xfs_nlink_t, xfs_dev_t, prid_t, int,
- struct xfs_buf **, xfs_inode_t **);
-
-uint xfs_ip2xflags(struct xfs_inode *);
-uint xfs_dic2xflags(struct xfs_dinode *);
-int xfs_ifree(struct xfs_trans *, xfs_inode_t *,
- struct xfs_bmap_free *);
-int xfs_itruncate_extents(struct xfs_trans **, struct xfs_inode *,
- int, xfs_fsize_t);
-int xfs_iunlink(struct xfs_trans *, xfs_inode_t *);
-
-void xfs_iext_realloc(xfs_inode_t *, int, int);
-void xfs_iunpin_wait(xfs_inode_t *);
-int xfs_iflush(struct xfs_inode *, struct xfs_buf **);
-void xfs_lock_inodes(xfs_inode_t **, int, uint);
-void xfs_lock_two_inodes(xfs_inode_t *, xfs_inode_t *, uint);
-
-xfs_extlen_t xfs_get_extsz_hint(struct xfs_inode *ip);
-
-#define IHOLD(ip) \
-do { \
- ASSERT(atomic_read(&VFS_I(ip)->i_count) > 0) ; \
- ihold(VFS_I(ip)); \
- trace_xfs_ihold(ip, _THIS_IP_); \
-} while (0)
-
-#define IRELE(ip) \
-do { \
- trace_xfs_irele(ip, _THIS_IP_); \
- iput(VFS_I(ip)); \
-} while (0)
+#include "xfs_inode_ops.h"
#endif /* __KERNEL__ */
@@ -584,6 +257,9 @@ int xfs_iread(struct xfs_mount *, struct xfs_trans *,
void xfs_dinode_calc_crc(struct xfs_mount *, struct xfs_dinode *);
void xfs_dinode_to_disk(struct xfs_dinode *,
struct xfs_icdinode *);
+void xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
+ struct xfs_inode_log_item *, int,
+ struct xfs_buf *);
void xfs_idestroy_fork(struct xfs_inode *, int);
void xfs_idata_realloc(struct xfs_inode *, int, int);
void xfs_iroot_realloc(struct xfs_inode *, int, int);
@@ -591,15 +267,14 @@ int xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int);
int xfs_iextents_copy(struct xfs_inode *, xfs_bmbt_rec_t *, int);
xfs_bmbt_rec_host_t *xfs_iext_get_ext(xfs_ifork_t *, xfs_extnum_t);
-void xfs_iext_insert(xfs_inode_t *, xfs_extnum_t, xfs_extnum_t,
- xfs_bmbt_irec_t *, int);
+void xfs_iext_insert(struct xfs_inode *, xfs_extnum_t, xfs_extnum_t,
+ struct xfs_bmbt_irec *, int);
void xfs_iext_add(xfs_ifork_t *, xfs_extnum_t, int);
void xfs_iext_add_indirect_multi(xfs_ifork_t *, int, xfs_extnum_t, int);
-void xfs_iext_remove(xfs_inode_t *, xfs_extnum_t, int, int);
+void xfs_iext_remove(struct xfs_inode *, xfs_extnum_t, int, int);
void xfs_iext_remove_inline(xfs_ifork_t *, xfs_extnum_t, int);
void xfs_iext_remove_direct(xfs_ifork_t *, xfs_extnum_t, int);
void xfs_iext_remove_indirect(xfs_ifork_t *, xfs_extnum_t, int);
-void xfs_iext_realloc_direct(xfs_ifork_t *, int);
void xfs_iext_direct_to_inline(xfs_ifork_t *, xfs_extnum_t);
void xfs_iext_inline_to_direct(xfs_ifork_t *, int);
void xfs_iext_destroy(xfs_ifork_t *);
diff --git a/fs/xfs/xfs_inode_ops.c b/fs/xfs/xfs_inode_ops.c
new file mode 100644
index 0000000..295a393
--- /dev/null
+++ b/fs/xfs/xfs_inode_ops.c
@@ -0,0 +1,1772 @@
+/*
+ * Copyright (c) 2000-2006 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include <linux/log2.h>
+
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_log.h"
+#include "xfs_inum.h"
+#include "xfs_trans.h"
+#include "xfs_trans_priv.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_attr_sf.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_buf_item.h"
+#include "xfs_inode_item.h"
+#include "xfs_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_ialloc.h"
+#include "xfs_bmap.h"
+#include "xfs_error.h"
+#include "xfs_utils.h"
+#include "xfs_quota.h"
+#include "xfs_filestream.h"
+#include "xfs_vnodeops.h"
+#include "xfs_cksum.h"
+#include "xfs_trace.h"
+#include "xfs_icache.h"
+
+/*
+ * Used in xfs_itruncate_extents(). This is the maximum number of extents
+ * freed from a file in a single transaction.
+ */
+#define XFS_ITRUNC_MAX_EXTENTS 2
+
+/*
+ * Helper function to extract the extent size hint from an inode.
+ */
+xfs_extlen_t
+xfs_get_extsz_hint(
+ struct xfs_inode *ip)
+{
+ if ((ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE) && ip->i_d.di_extsize)
+ return ip->i_d.di_extsize;
+ if (XFS_IS_REALTIME_INODE(ip))
+ return ip->i_mount->m_sb.sb_rextsize;
+ return 0;
+}
+
+/*
+ * Test whether it is appropriate to check an inode for, and free, post-EOF
+ * blocks. The 'force' parameter determines whether we should also consider
+ * regular files that are marked preallocated or append-only.
+ */
+bool
+xfs_can_free_eofblocks(struct xfs_inode *ip, bool force)
+{
+ /* prealloc/delalloc exists only on regular files */
+ if (!S_ISREG(ip->i_d.di_mode))
+ return false;
+
+ /*
+ * Zero-sized files with no cached pages and no delalloc blocks will not
+ * have speculative prealloc/delalloc blocks to remove.
+ */
+ if (VFS_I(ip)->i_size == 0 &&
+ VN_CACHED(VFS_I(ip)) == 0 &&
+ ip->i_delayed_blks == 0)
+ return false;
+
+ /* If we haven't read in the extent list, then don't do it now. */
+ if (!(ip->i_df.if_flags & XFS_IFEXTENTS))
+ return false;
+
+ /*
+ * Do not free real preallocated or append-only files unless the file
+ * has delalloc blocks and we are forced to remove them.
+ */
+ if (ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
+ if (!force || ip->i_delayed_blks == 0)
+ return false;
+
+ return true;
+}
+
+/*
+ * This is a wrapper routine around the xfs_ilock() routine used to centralize
+ * some grungy code. It is used in places that wish to lock the inode solely
+ * for reading the extents. The reason these places can't just call
+ * xfs_ilock(SHARED) is that the inode lock also guards the bringing in of the
+ * extents from disk for a file in b-tree format. If the inode is in b-tree
+ * format, then we need to lock the inode exclusively until the extents are read
+ * in. Locking it exclusively all the time would limit our parallelism
+ * unnecessarily, though. What we do instead is check to see if the extents
+ * have been read in yet, and only lock the inode exclusively if they have not.
+ *
+ * The function returns a value which should be given to the corresponding
+ * xfs_iunlock_map_shared(). This value is the mode in which the lock was
+ * actually taken.
+ */
+uint
+xfs_ilock_map_shared(
+ xfs_inode_t *ip)
+{
+ uint lock_mode;
+
+ if ((ip->i_d.di_format == XFS_DINODE_FMT_BTREE) &&
+ ((ip->i_df.if_flags & XFS_IFEXTENTS) == 0)) {
+ lock_mode = XFS_ILOCK_EXCL;
+ } else {
+ lock_mode = XFS_ILOCK_SHARED;
+ }
+
+ xfs_ilock(ip, lock_mode);
+
+ return lock_mode;
+}
+
+/*
+ * This is simply the unlock routine to go with xfs_ilock_map_shared().
+ * All it does is call xfs_iunlock() with the given lock_mode.
+ */
+void
+xfs_iunlock_map_shared(
+ xfs_inode_t *ip,
+ unsigned int lock_mode)
+{
+ xfs_iunlock(ip, lock_mode);
+}
+
+/*
+ * The xfs inode contains 2 locks: a multi-reader lock called the
+ * i_iolock and a multi-reader lock called the i_lock. This routine
+ * allows either or both of the locks to be obtained.
+ *
+ * The 2 locks should always be ordered so that the IO lock is
+ * obtained first in order to prevent deadlock.
+ *
+ * ip -- the inode being locked
+ * lock_flags -- this parameter indicates the inode's locks
+ * to be locked. It can be:
+ * XFS_IOLOCK_SHARED,
+ * XFS_IOLOCK_EXCL,
+ * XFS_ILOCK_SHARED,
+ * XFS_ILOCK_EXCL,
+ * XFS_IOLOCK_SHARED | XFS_ILOCK_SHARED,
+ * XFS_IOLOCK_SHARED | XFS_ILOCK_EXCL,
+ * XFS_IOLOCK_EXCL | XFS_ILOCK_SHARED,
+ * XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL
+ */
+void
+xfs_ilock(
+ xfs_inode_t *ip,
+ uint lock_flags)
+{
+ trace_xfs_ilock(ip, lock_flags, _RET_IP_);
+
+ /*
+ * You can't set both SHARED and EXCL for the same lock,
+ * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED,
+ * and XFS_ILOCK_EXCL are valid values to set in lock_flags.
+ */
+ ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) !=
+ (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
+ ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) !=
+ (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
+ ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0);
+
+ if (lock_flags & XFS_IOLOCK_EXCL)
+ mrupdate_nested(&ip->i_iolock, XFS_IOLOCK_DEP(lock_flags));
+ else if (lock_flags & XFS_IOLOCK_SHARED)
+ mraccess_nested(&ip->i_iolock, XFS_IOLOCK_DEP(lock_flags));
+
+ if (lock_flags & XFS_ILOCK_EXCL)
+ mrupdate_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
+ else if (lock_flags & XFS_ILOCK_SHARED)
+ mraccess_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
+}
+
+/*
+ * This is just like xfs_ilock(), except that the caller
+ * is guaranteed not to sleep. It returns 1 if it gets
+ * the requested locks and 0 otherwise. If the IO lock is
+ * obtained but the inode lock cannot be, then the IO lock
+ * is dropped before returning.
+ *
+ * ip -- the inode being locked
+ * lock_flags -- this parameter indicates the inode's locks to be
+ * locked. See the comment for xfs_ilock() for a list
+ * of valid values.
+ */
+int
+xfs_ilock_nowait(
+ xfs_inode_t *ip,
+ uint lock_flags)
+{
+ trace_xfs_ilock_nowait(ip, lock_flags, _RET_IP_);
+
+ /*
+ * You can't set both SHARED and EXCL for the same lock,
+ * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED,
+ * and XFS_ILOCK_EXCL are valid values to set in lock_flags.
+ */
+ ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) !=
+ (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
+ ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) !=
+ (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
+ ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0);
+
+ if (lock_flags & XFS_IOLOCK_EXCL) {
+ if (!mrtryupdate(&ip->i_iolock))
+ goto out;
+ } else if (lock_flags & XFS_IOLOCK_SHARED) {
+ if (!mrtryaccess(&ip->i_iolock))
+ goto out;
+ }
+ if (lock_flags & XFS_ILOCK_EXCL) {
+ if (!mrtryupdate(&ip->i_lock))
+ goto out_undo_iolock;
+ } else if (lock_flags & XFS_ILOCK_SHARED) {
+ if (!mrtryaccess(&ip->i_lock))
+ goto out_undo_iolock;
+ }
+ return 1;
+
+ out_undo_iolock:
+ if (lock_flags & XFS_IOLOCK_EXCL)
+ mrunlock_excl(&ip->i_iolock);
+ else if (lock_flags & XFS_IOLOCK_SHARED)
+ mrunlock_shared(&ip->i_iolock);
+ out:
+ return 0;
+}
+
+/*
+ * xfs_iunlock() is used to drop the inode locks acquired with
+ * xfs_ilock() and xfs_ilock_nowait(). The caller must pass
+ * in the flags given to xfs_ilock() or xfs_ilock_nowait() so
+ * that we know which locks to drop.
+ *
+ * ip -- the inode being unlocked
+ * lock_flags -- this parameter indicates the inode's locks to be
+ * unlocked. See the comment for xfs_ilock() for a list
+ * of valid values for this parameter.
+ *
+ */
+void
+xfs_iunlock(
+ xfs_inode_t *ip,
+ uint lock_flags)
+{
+ /*
+ * You can't set both SHARED and EXCL for the same lock,
+ * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED,
+ * and XFS_ILOCK_EXCL are valid values to set in lock_flags.
+ */
+ ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) !=
+ (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
+ ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) !=
+ (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
+ ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0);
+ ASSERT(lock_flags != 0);
+
+ if (lock_flags & XFS_IOLOCK_EXCL)
+ mrunlock_excl(&ip->i_iolock);
+ else if (lock_flags & XFS_IOLOCK_SHARED)
+ mrunlock_shared(&ip->i_iolock);
+
+ if (lock_flags & XFS_ILOCK_EXCL)
+ mrunlock_excl(&ip->i_lock);
+ else if (lock_flags & XFS_ILOCK_SHARED)
+ mrunlock_shared(&ip->i_lock);
+
+ trace_xfs_iunlock(ip, lock_flags, _RET_IP_);
+}
+
+/*
+ * Give up write locks. The I/O lock cannot be held nested
+ * if it is being demoted.
+ */
+void
+xfs_ilock_demote(
+ xfs_inode_t *ip,
+ uint lock_flags)
+{
+ ASSERT(lock_flags & (XFS_IOLOCK_EXCL|XFS_ILOCK_EXCL));
+ ASSERT((lock_flags & ~(XFS_IOLOCK_EXCL|XFS_ILOCK_EXCL)) == 0);
+
+ if (lock_flags & XFS_ILOCK_EXCL)
+ mrdemote(&ip->i_lock);
+ if (lock_flags & XFS_IOLOCK_EXCL)
+ mrdemote(&ip->i_iolock);
+
+ trace_xfs_ilock_demote(ip, lock_flags, _RET_IP_);
+}
+
+#if defined(DEBUG) || defined(XFS_WARN)
+int
+xfs_isilocked(
+ xfs_inode_t *ip,
+ uint lock_flags)
+{
+ if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
+ if (!(lock_flags & XFS_ILOCK_SHARED))
+ return !!ip->i_lock.mr_writer;
+ return rwsem_is_locked(&ip->i_lock.mr_lock);
+ }
+
+ if (lock_flags & (XFS_IOLOCK_EXCL|XFS_IOLOCK_SHARED)) {
+ if (!(lock_flags & XFS_IOLOCK_SHARED))
+ return !!ip->i_iolock.mr_writer;
+ return rwsem_is_locked(&ip->i_iolock.mr_lock);
+ }
+
+ ASSERT(0);
+ return 0;
+}
+#endif
+
+void
+__xfs_iflock(
+ struct xfs_inode *ip)
+{
+ wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_IFLOCK_BIT);
+ DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_IFLOCK_BIT);
+
+ do {
+ prepare_to_wait_exclusive(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
+ if (xfs_isiflocked(ip))
+ io_schedule();
+ } while (!xfs_iflock_nowait(ip));
+
+ finish_wait(wq, &wait.wait);
+}
+
+/*
+ * This is called to unpin an inode. The caller must have the inode locked
+ * in at least shared mode so that the buffer cannot be subsequently pinned
+ * once someone is waiting for it to be unpinned.
+ */
+static void
+xfs_iunpin(
+ struct xfs_inode *ip)
+{
+ ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
+
+ trace_xfs_inode_unpin_nowait(ip, _RET_IP_);
+
+ /* Give the log a push to start the unpinning I/O */
+ xfs_log_force_lsn(ip->i_mount, ip->i_itemp->ili_last_lsn, 0);
+
+}
+
+static void
+__xfs_iunpin_wait(
+ struct xfs_inode *ip)
+{
+ wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_IPINNED_BIT);
+ DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_IPINNED_BIT);
+
+ xfs_iunpin(ip);
+
+ do {
+ prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
+ if (xfs_ipincount(ip))
+ io_schedule();
+ } while (xfs_ipincount(ip));
+ finish_wait(wq, &wait.wait);
+}
+
+void
+xfs_iunpin_wait(
+ struct xfs_inode *ip)
+{
+ if (xfs_ipincount(ip))
+ __xfs_iunpin_wait(ip);
+}
+
+STATIC uint
+_xfs_dic2xflags(
+ __uint16_t di_flags)
+{
+ uint flags = 0;
+
+ if (di_flags & XFS_DIFLAG_ANY) {
+ if (di_flags & XFS_DIFLAG_REALTIME)
+ flags |= XFS_XFLAG_REALTIME;
+ if (di_flags & XFS_DIFLAG_PREALLOC)
+ flags |= XFS_XFLAG_PREALLOC;
+ if (di_flags & XFS_DIFLAG_IMMUTABLE)
+ flags |= XFS_XFLAG_IMMUTABLE;
+ if (di_flags & XFS_DIFLAG_APPEND)
+ flags |= XFS_XFLAG_APPEND;
+ if (di_flags & XFS_DIFLAG_SYNC)
+ flags |= XFS_XFLAG_SYNC;
+ if (di_flags & XFS_DIFLAG_NOATIME)
+ flags |= XFS_XFLAG_NOATIME;
+ if (di_flags & XFS_DIFLAG_NODUMP)
+ flags |= XFS_XFLAG_NODUMP;
+ if (di_flags & XFS_DIFLAG_RTINHERIT)
+ flags |= XFS_XFLAG_RTINHERIT;
+ if (di_flags & XFS_DIFLAG_PROJINHERIT)
+ flags |= XFS_XFLAG_PROJINHERIT;
+ if (di_flags & XFS_DIFLAG_NOSYMLINKS)
+ flags |= XFS_XFLAG_NOSYMLINKS;
+ if (di_flags & XFS_DIFLAG_EXTSIZE)
+ flags |= XFS_XFLAG_EXTSIZE;
+ if (di_flags & XFS_DIFLAG_EXTSZINHERIT)
+ flags |= XFS_XFLAG_EXTSZINHERIT;
+ if (di_flags & XFS_DIFLAG_NODEFRAG)
+ flags |= XFS_XFLAG_NODEFRAG;
+ if (di_flags & XFS_DIFLAG_FILESTREAM)
+ flags |= XFS_XFLAG_FILESTREAM;
+ }
+
+ return flags;
+}
+
+uint
+xfs_ip2xflags(
+ xfs_inode_t *ip)
+{
+ xfs_icdinode_t *dic = &ip->i_d;
+
+ return _xfs_dic2xflags(dic->di_flags) |
+ (XFS_IFORK_Q(ip) ? XFS_XFLAG_HASATTR : 0);
+}
+
+uint
+xfs_dic2xflags(
+ xfs_dinode_t *dip)
+{
+ return _xfs_dic2xflags(be16_to_cpu(dip->di_flags)) |
+ (XFS_DFORK_Q(dip) ? XFS_XFLAG_HASATTR : 0);
+}
+
+/*
+ * Allocate an inode on disk and return a copy of its in-core version. The
+ * in-core inode is locked exclusively. Set mode, nlink, and rdev appropriately
+ * within the inode. The uid and gid for the inode are set from the fsuid and
+ * fsgid of the current task.
+ *
+ * Use xfs_dialloc() to allocate the on-disk inode. If xfs_dialloc() has a free
+ * inode available, call xfs_iget() to obtain the in-core version of the
+ * allocated inode. Finally, fill in the inode and log its initial contents.
+ * In this case, ialloc_context would be set to NULL.
+ *
+ * If xfs_dialloc() does not have an available inode, it will replenish its
+ * supply by doing an allocation. Since we can only do one allocation within a
+ * transaction without deadlocks, we must commit the current transaction before
+ * returning the inode itself. In this case, therefore, we will set
+ * ialloc_context and return. The caller should then commit the current
+ * transaction, start a new transaction, and call xfs_ialloc() again to actually
+ * get the inode.
+ *
+ * To ensure that some other process does not grab the inode that was allocated
+ * during the first call to xfs_ialloc(), this routine also returns the [locked]
+ * bp pointing to the head of the freelist as ialloc_context. The caller should
+ * hold this buffer across the commit and pass it back into this routine on the
+ * second call.
+ *
+ * If we are allocating quota inodes, we do not have a parent inode to attach to
+ * or associate with (i.e. pip == NULL) because they are not linked into the
+ * directory structure - they are attached directly to the superblock - and so
+ * have no parent.
+ */
+int
+xfs_ialloc(
+ xfs_trans_t *tp,
+ xfs_inode_t *pip,
+ umode_t mode,
+ xfs_nlink_t nlink,
+ xfs_dev_t rdev,
+ prid_t prid,
+ int okalloc,
+ xfs_buf_t **ialloc_context,
+ xfs_inode_t **ipp)
+{
+ struct xfs_mount *mp = tp->t_mountp;
+ xfs_ino_t ino;
+ xfs_inode_t *ip;
+ uint flags;
+ int error;
+ timespec_t tv;
+ int filestreams = 0;
+
+ /*
+ * Call the space management code to pick
+ * the on-disk inode to be allocated.
+ */
+ error = xfs_dialloc(tp, pip ? pip->i_ino : 0, mode, okalloc,
+ ialloc_context, &ino);
+ if (error)
+ return error;
+ if (*ialloc_context || ino == NULLFSINO) {
+ *ipp = NULL;
+ return 0;
+ }
+ ASSERT(*ialloc_context == NULL);
+
+ /*
+ * Get the in-core inode with the lock held exclusively.
+ * This is because we're setting fields here we need
+ * to prevent others from looking at until we're done.
+ */
+ error = xfs_iget(mp, tp, ino, XFS_IGET_CREATE,
+ XFS_ILOCK_EXCL, &ip);
+ if (error)
+ return error;
+ ASSERT(ip != NULL);
+
+ ip->i_d.di_mode = mode;
+ ip->i_d.di_onlink = 0;
+ ip->i_d.di_nlink = nlink;
+ ASSERT(ip->i_d.di_nlink == nlink);
+ ip->i_d.di_uid = current_fsuid();
+ ip->i_d.di_gid = current_fsgid();
+ xfs_set_projid(ip, prid);
+ memset(&(ip->i_d.di_pad[0]), 0, sizeof(ip->i_d.di_pad));
+
+ /*
+ * If the superblock version is up to where we support new format
+ * inodes and this is currently an old format inode, then change
+ * the inode version number now. This way we only do the conversion
+ * here rather than here and in the flush/logging code.
+ */
+ if (xfs_sb_version_hasnlink(&mp->m_sb) &&
+ ip->i_d.di_version == 1) {
+ ip->i_d.di_version = 2;
+ /*
+ * We've already zeroed the old link count, the projid field,
+ * and the pad field.
+ */
+ }
+
+ /*
+ * Project ids won't be stored on disk if we are using a version 1 inode.
+ */
+ if ((prid != 0) && (ip->i_d.di_version == 1))
+ xfs_bump_ino_vers2(tp, ip);
+
+ if (pip && XFS_INHERIT_GID(pip)) {
+ ip->i_d.di_gid = pip->i_d.di_gid;
+ if ((pip->i_d.di_mode & S_ISGID) && S_ISDIR(mode)) {
+ ip->i_d.di_mode |= S_ISGID;
+ }
+ }
+
+ /*
+ * If the group ID of the new file does not match the effective group
+ * ID or one of the supplementary group IDs, the S_ISGID bit is cleared
+ * (but only if the irix_sgid_inherit compatibility variable is set).
+ */
+ if ((irix_sgid_inherit) &&
+ (ip->i_d.di_mode & S_ISGID) &&
+ (!in_group_p((gid_t)ip->i_d.di_gid))) {
+ ip->i_d.di_mode &= ~S_ISGID;
+ }
+
+ ip->i_d.di_size = 0;
+ ip->i_d.di_nextents = 0;
+ ASSERT(ip->i_d.di_nblocks == 0);
+
+ nanotime(&tv);
+ ip->i_d.di_mtime.t_sec = (__int32_t)tv.tv_sec;
+ ip->i_d.di_mtime.t_nsec = (__int32_t)tv.tv_nsec;
+ ip->i_d.di_atime = ip->i_d.di_mtime;
+ ip->i_d.di_ctime = ip->i_d.di_mtime;
+
+ /*
+ * di_gen will have been taken care of in xfs_iread.
+ */
+ ip->i_d.di_extsize = 0;
+ ip->i_d.di_dmevmask = 0;
+ ip->i_d.di_dmstate = 0;
+ ip->i_d.di_flags = 0;
+
+ if (ip->i_d.di_version == 3) {
+ ASSERT(ip->i_d.di_ino == ino);
+ ASSERT(uuid_equal(&ip->i_d.di_uuid, &mp->m_sb.sb_uuid));
+ ip->i_d.di_crc = 0;
+ ip->i_d.di_changecount = 1;
+ ip->i_d.di_lsn = 0;
+ ip->i_d.di_flags2 = 0;
+ memset(&(ip->i_d.di_pad2[0]), 0, sizeof(ip->i_d.di_pad2));
+ ip->i_d.di_crtime = ip->i_d.di_mtime;
+ }
+
+
+ flags = XFS_ILOG_CORE;
+ switch (mode & S_IFMT) {
+ case S_IFIFO:
+ case S_IFCHR:
+ case S_IFBLK:
+ case S_IFSOCK:
+ ip->i_d.di_format = XFS_DINODE_FMT_DEV;
+ ip->i_df.if_u2.if_rdev = rdev;
+ ip->i_df.if_flags = 0;
+ flags |= XFS_ILOG_DEV;
+ break;
+ case S_IFREG:
+ /*
+ * we can't set up filestreams until after the VFS inode
+ * is set up properly.
+ */
+ if (pip && xfs_inode_is_filestream(pip))
+ filestreams = 1;
+ /* fall through */
+ case S_IFDIR:
+ if (pip && (pip->i_d.di_flags & XFS_DIFLAG_ANY)) {
+ uint di_flags = 0;
+
+ if (S_ISDIR(mode)) {
+ if (pip->i_d.di_flags & XFS_DIFLAG_RTINHERIT)
+ di_flags |= XFS_DIFLAG_RTINHERIT;
+ if (pip->i_d.di_flags & XFS_DIFLAG_EXTSZINHERIT) {
+ di_flags |= XFS_DIFLAG_EXTSZINHERIT;
+ ip->i_d.di_extsize = pip->i_d.di_extsize;
+ }
+ } else if (S_ISREG(mode)) {
+ if (pip->i_d.di_flags & XFS_DIFLAG_RTINHERIT)
+ di_flags |= XFS_DIFLAG_REALTIME;
+ if (pip->i_d.di_flags & XFS_DIFLAG_EXTSZINHERIT) {
+ di_flags |= XFS_DIFLAG_EXTSIZE;
+ ip->i_d.di_extsize = pip->i_d.di_extsize;
+ }
+ }
+ if ((pip->i_d.di_flags & XFS_DIFLAG_NOATIME) &&
+ xfs_inherit_noatime)
+ di_flags |= XFS_DIFLAG_NOATIME;
+ if ((pip->i_d.di_flags & XFS_DIFLAG_NODUMP) &&
+ xfs_inherit_nodump)
+ di_flags |= XFS_DIFLAG_NODUMP;
+ if ((pip->i_d.di_flags & XFS_DIFLAG_SYNC) &&
+ xfs_inherit_sync)
+ di_flags |= XFS_DIFLAG_SYNC;
+ if ((pip->i_d.di_flags & XFS_DIFLAG_NOSYMLINKS) &&
+ xfs_inherit_nosymlinks)
+ di_flags |= XFS_DIFLAG_NOSYMLINKS;
+ if (pip->i_d.di_flags & XFS_DIFLAG_PROJINHERIT)
+ di_flags |= XFS_DIFLAG_PROJINHERIT;
+ if ((pip->i_d.di_flags & XFS_DIFLAG_NODEFRAG) &&
+ xfs_inherit_nodefrag)
+ di_flags |= XFS_DIFLAG_NODEFRAG;
+ if (pip->i_d.di_flags & XFS_DIFLAG_FILESTREAM)
+ di_flags |= XFS_DIFLAG_FILESTREAM;
+ ip->i_d.di_flags |= di_flags;
+ }
+ /* FALLTHROUGH */
+ case S_IFLNK:
+ ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
+ ip->i_df.if_flags = XFS_IFEXTENTS;
+ ip->i_df.if_bytes = ip->i_df.if_real_bytes = 0;
+ ip->i_df.if_u1.if_extents = NULL;
+ break;
+ default:
+ ASSERT(0);
+ }
+ /*
+ * Attribute fork settings for new inode.
+ */
+ ip->i_d.di_aformat = XFS_DINODE_FMT_EXTENTS;
+ ip->i_d.di_anextents = 0;
+
+ /*
+ * Log the new values stuffed into the inode.
+ */
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+ xfs_trans_log_inode(tp, ip, flags);
+
+ /* now that we have an i_mode we can set up inode ops and unlock */
+ xfs_setup_inode(ip);
+
+ /* now we have set up the vfs inode we can associate the filestream */
+ if (filestreams) {
+ error = xfs_filestream_associate(pip, ip);
+ if (error < 0)
+ return -error;
+ if (!error)
+ xfs_iflags_set(ip, XFS_IFILESTREAM);
+ }
+
+ *ipp = ip;
+ return 0;
+}
+
+/*
+ * Free up the underlying blocks past new_size. The new size must be smaller
+ * than the current size. This routine can be used both for the attribute and
+ * data fork, and does not modify the inode size, which is left to the caller.
+ *
+ * The transaction passed to this routine must have made a permanent log
+ * reservation of at least XFS_ITRUNCATE_LOG_RES. This routine may commit the
+ * given transaction and start new ones, so make sure everything involved in
+ * the transaction is tidy before calling here. Some transaction will be
+ * returned to the caller to be committed. The incoming transaction must
+ * already include the inode, and both inode locks must be held exclusively.
+ * The inode must also be "held" within the transaction. On return the inode
+ * will be "held" within the returned transaction. This routine does NOT
+ * require any disk space to be reserved for it within the transaction.
+ *
+ * If we get an error, we must return with the inode locked and linked into the
+ * current transaction. This keeps things simple for the higher level code,
+ * because it always knows that the inode is locked and held in the transaction
+ * that returns to it whether errors occur or not. We don't mark the inode
+ * dirty on error so that transactions can be easily aborted if possible.
+ */
+int
+xfs_itruncate_extents(
+ struct xfs_trans **tpp,
+ struct xfs_inode *ip,
+ int whichfork,
+ xfs_fsize_t new_size)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_trans *tp = *tpp;
+ struct xfs_trans *ntp;
+ xfs_bmap_free_t free_list;
+ xfs_fsblock_t first_block;
+ xfs_fileoff_t first_unmap_block;
+ xfs_fileoff_t last_block;
+ xfs_filblks_t unmap_len;
+ int committed;
+ int error = 0;
+ int done = 0;
+
+ ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+ ASSERT(!atomic_read(&VFS_I(ip)->i_count) ||
+ xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+ ASSERT(new_size <= XFS_ISIZE(ip));
+ ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
+ ASSERT(ip->i_itemp != NULL);
+ ASSERT(ip->i_itemp->ili_lock_flags == 0);
+ ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
+
+ trace_xfs_itruncate_extents_start(ip, new_size);
+
+ /*
+ * Since it is possible for space to become allocated beyond
+ * the end of the file (in a crash where the space is allocated
+ * but the inode size is not yet updated), simply remove any
+ * blocks which show up between the new EOF and the maximum
+ * possible file size. If the first block to be removed is
+ * beyond the maximum file size (ie it is the same as last_block),
+ * then there is nothing to do.
+ */
+ first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size);
+ last_block = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
+ if (first_unmap_block == last_block)
+ return 0;
+
+ ASSERT(first_unmap_block < last_block);
+ unmap_len = last_block - first_unmap_block + 1;
+ while (!done) {
+ xfs_bmap_init(&free_list, &first_block);
+ error = xfs_bunmapi(tp, ip,
+ first_unmap_block, unmap_len,
+ xfs_bmapi_aflag(whichfork),
+ XFS_ITRUNC_MAX_EXTENTS,
+ &first_block, &free_list,
+ &done);
+ if (error)
+ goto out_bmap_cancel;
+
+ /*
+ * Duplicate the transaction that has the permanent
+ * reservation and commit the old transaction.
+ */
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (committed)
+ xfs_trans_ijoin(tp, ip, 0);
+ if (error)
+ goto out_bmap_cancel;
+
+ if (committed) {
+ /*
+ * Mark the inode dirty so it will be logged and
+ * moved forward in the log as part of every commit.
+ */
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ }
+
+ ntp = xfs_trans_dup(tp);
+ error = xfs_trans_commit(tp, 0);
+ tp = ntp;
+
+ xfs_trans_ijoin(tp, ip, 0);
+
+ if (error)
+ goto out;
+
+ /*
+ * Transaction commit worked ok so we can drop the extra ticket
+ * reference that we gained in xfs_trans_dup()
+ */
+ xfs_log_ticket_put(tp->t_ticket);
+ error = xfs_trans_reserve(tp, 0,
+ XFS_ITRUNCATE_LOG_RES(mp), 0,
+ XFS_TRANS_PERM_LOG_RES,
+ XFS_ITRUNCATE_LOG_COUNT);
+ if (error)
+ goto out;
+ }
+
+ /*
+ * Always re-log the inode so that our permanent transaction can keep
+ * on rolling it forward in the log.
+ */
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+ trace_xfs_itruncate_extents_end(ip, new_size);
+
+out:
+ *tpp = tp;
+ return error;
+out_bmap_cancel:
+ /*
+ * If the bunmapi call encounters an error, return to the caller where
+ * the transaction can be properly aborted. We just need to make sure
+ * we're not holding any resources that we were not when we came in.
+ */
+ xfs_bmap_cancel(&free_list);
+ goto out;
+}
+
+/*
+ * This is called when the inode's link count goes to 0.
+ * We place the on-disk inode on a list in the AGI. It
+ * will be pulled from this list when the inode is freed.
+ */
+int
+xfs_iunlink(
+ xfs_trans_t *tp,
+ xfs_inode_t *ip)
+{
+ xfs_mount_t *mp;
+ xfs_agi_t *agi;
+ xfs_dinode_t *dip;
+ xfs_buf_t *agibp;
+ xfs_buf_t *ibp;
+ xfs_agino_t agino;
+ short bucket_index;
+ int offset;
+ int error;
+
+ ASSERT(ip->i_d.di_nlink == 0);
+ ASSERT(ip->i_d.di_mode != 0);
+
+ mp = tp->t_mountp;
+
+ /*
+ * Get the agi buffer first. It ensures lock ordering
+ * on the list.
+ */
+ error = xfs_read_agi(mp, tp, XFS_INO_TO_AGNO(mp, ip->i_ino), &agibp);
+ if (error)
+ return error;
+ agi = XFS_BUF_TO_AGI(agibp);
+
+ /*
+ * Get the index into the agi hash table for the
+ * list this inode will go on.
+ */
+ agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
+ ASSERT(agino != 0);
+ bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS;
+ ASSERT(agi->agi_unlinked[bucket_index]);
+ ASSERT(be32_to_cpu(agi->agi_unlinked[bucket_index]) != agino);
+
+ if (agi->agi_unlinked[bucket_index] != cpu_to_be32(NULLAGINO)) {
+ /*
+ * There is already another inode in the bucket we need
+ * to add ourselves to. Add us at the front of the list.
+ * Here we put the head pointer into our next pointer,
+ * and then we fall through to point the head at us.
+ */
+ error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &dip, &ibp,
+ 0, 0);
+ if (error)
+ return error;
+
+ ASSERT(dip->di_next_unlinked == cpu_to_be32(NULLAGINO));
+ dip->di_next_unlinked = agi->agi_unlinked[bucket_index];
+ offset = ip->i_imap.im_boffset +
+ offsetof(xfs_dinode_t, di_next_unlinked);
+
+ /* need to recalc the inode CRC if appropriate */
+ xfs_dinode_calc_crc(mp, dip);
+
+ xfs_trans_inode_buf(tp, ibp);
+ xfs_trans_log_buf(tp, ibp, offset,
+ (offset + sizeof(xfs_agino_t) - 1));
+ xfs_inobp_check(mp, ibp);
+ }
+
+ /*
+ * Point the bucket head pointer at the inode being inserted.
+ */
+ ASSERT(agino != 0);
+ agi->agi_unlinked[bucket_index] = cpu_to_be32(agino);
+ offset = offsetof(xfs_agi_t, agi_unlinked) +
+ (sizeof(xfs_agino_t) * bucket_index);
+ xfs_trans_log_buf(tp, agibp, offset,
+ (offset + sizeof(xfs_agino_t) - 1));
+ return 0;
+}
+
+/*
+ * Pull the on-disk inode from the AGI unlinked list.
+ */
+STATIC int
+xfs_iunlink_remove(
+ xfs_trans_t *tp,
+ xfs_inode_t *ip)
+{
+ xfs_ino_t next_ino;
+ xfs_mount_t *mp;
+ xfs_agi_t *agi;
+ xfs_dinode_t *dip;
+ xfs_buf_t *agibp;
+ xfs_buf_t *ibp;
+ xfs_agnumber_t agno;
+ xfs_agino_t agino;
+ xfs_agino_t next_agino;
+ xfs_buf_t *last_ibp;
+ xfs_dinode_t *last_dip = NULL;
+ short bucket_index;
+ int offset, last_offset = 0;
+ int error;
+
+ mp = tp->t_mountp;
+ agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+
+ /*
+ * Get the agi buffer first. It ensures lock ordering
+ * on the list.
+ */
+ error = xfs_read_agi(mp, tp, agno, &agibp);
+ if (error)
+ return error;
+
+ agi = XFS_BUF_TO_AGI(agibp);
+
+ /*
+ * Get the index into the agi hash table for the
+ * list this inode is on.
+ */
+ agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
+ ASSERT(agino != 0);
+ bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS;
+ ASSERT(agi->agi_unlinked[bucket_index] != cpu_to_be32(NULLAGINO));
+ ASSERT(agi->agi_unlinked[bucket_index]);
+
+ if (be32_to_cpu(agi->agi_unlinked[bucket_index]) == agino) {
+ /*
+ * We're at the head of the list. Get the inode's on-disk
+ * buffer to see if there is anyone after us on the list.
+ * Only modify our next pointer if it is not already NULLAGINO.
+ * This saves us the overhead of dealing with the buffer when
+ * there is no need to change it.
+ */
+ error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &dip, &ibp,
+ 0, 0);
+ if (error) {
+ xfs_warn(mp, "%s: xfs_imap_to_bp returned error %d.",
+ __func__, error);
+ return error;
+ }
+ next_agino = be32_to_cpu(dip->di_next_unlinked);
+ ASSERT(next_agino != 0);
+ if (next_agino != NULLAGINO) {
+ dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
+ offset = ip->i_imap.im_boffset +
+ offsetof(xfs_dinode_t, di_next_unlinked);
+
+ /* need to recalc the inode CRC if appropriate */
+ xfs_dinode_calc_crc(mp, dip);
+
+ xfs_trans_inode_buf(tp, ibp);
+ xfs_trans_log_buf(tp, ibp, offset,
+ (offset + sizeof(xfs_agino_t) - 1));
+ xfs_inobp_check(mp, ibp);
+ } else {
+ xfs_trans_brelse(tp, ibp);
+ }
+ /*
+ * Point the bucket head pointer at the next inode.
+ */
+ ASSERT(next_agino != 0);
+ ASSERT(next_agino != agino);
+ agi->agi_unlinked[bucket_index] = cpu_to_be32(next_agino);
+ offset = offsetof(xfs_agi_t, agi_unlinked) +
+ (sizeof(xfs_agino_t) * bucket_index);
+ xfs_trans_log_buf(tp, agibp, offset,
+ (offset + sizeof(xfs_agino_t) - 1));
+ } else {
+ /*
+ * We need to search the list for the inode being freed.
+ */
+ next_agino = be32_to_cpu(agi->agi_unlinked[bucket_index]);
+ last_ibp = NULL;
+ while (next_agino != agino) {
+ struct xfs_imap imap;
+
+ if (last_ibp)
+ xfs_trans_brelse(tp, last_ibp);
+
+ imap.im_blkno = 0;
+ next_ino = XFS_AGINO_TO_INO(mp, agno, next_agino);
+
+ error = xfs_imap(mp, tp, next_ino, &imap, 0);
+ if (error) {
+ xfs_warn(mp,
+ "%s: xfs_imap returned error %d.",
+ __func__, error);
+ return error;
+ }
+
+ error = xfs_imap_to_bp(mp, tp, &imap, &last_dip,
+ &last_ibp, 0, 0);
+ if (error) {
+ xfs_warn(mp,
+ "%s: xfs_imap_to_bp returned error %d.",
+ __func__, error);
+ return error;
+ }
+
+ last_offset = imap.im_boffset;
+ next_agino = be32_to_cpu(last_dip->di_next_unlinked);
+ ASSERT(next_agino != NULLAGINO);
+ ASSERT(next_agino != 0);
+ }
+
+ /*
+ * Now last_ibp points to the buffer previous to us on the
+ * unlinked list. Pull us from the list.
+ */
+ error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &dip, &ibp,
+ 0, 0);
+ if (error) {
+ xfs_warn(mp, "%s: xfs_imap_to_bp(2) returned error %d.",
+ __func__, error);
+ return error;
+ }
+ next_agino = be32_to_cpu(dip->di_next_unlinked);
+ ASSERT(next_agino != 0);
+ ASSERT(next_agino != agino);
+ if (next_agino != NULLAGINO) {
+ dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
+ offset = ip->i_imap.im_boffset +
+ offsetof(xfs_dinode_t, di_next_unlinked);
+
+ /* need to recalc the inode CRC if appropriate */
+ xfs_dinode_calc_crc(mp, dip);
+
+ xfs_trans_inode_buf(tp, ibp);
+ xfs_trans_log_buf(tp, ibp, offset,
+ (offset + sizeof(xfs_agino_t) - 1));
+ xfs_inobp_check(mp, ibp);
+ } else {
+ xfs_trans_brelse(tp, ibp);
+ }
+ /*
+ * Point the previous inode on the list to the next inode.
+ */
+ last_dip->di_next_unlinked = cpu_to_be32(next_agino);
+ ASSERT(next_agino != 0);
+ offset = last_offset + offsetof(xfs_dinode_t, di_next_unlinked);
+
+ /* need to recalc the inode CRC if appropriate */
+ xfs_dinode_calc_crc(mp, last_dip);
+
+ xfs_trans_inode_buf(tp, last_ibp);
+ xfs_trans_log_buf(tp, last_ibp, offset,
+ (offset + sizeof(xfs_agino_t) - 1));
+ xfs_inobp_check(mp, last_ibp);
+ }
+ return 0;
+}
+
+/*
+ * A big issue when freeing the inode cluster is that we _cannot_ skip any
+ * inodes that are in memory - they all must be marked stale and attached to
+ * the cluster buffer.
+ */
+STATIC int
+xfs_ifree_cluster(
+ xfs_inode_t *free_ip,
+ xfs_trans_t *tp,
+ xfs_ino_t inum)
+{
+ xfs_mount_t *mp = free_ip->i_mount;
+ int blks_per_cluster;
+ int nbufs;
+ int ninodes;
+ int i, j;
+ xfs_daddr_t blkno;
+ xfs_buf_t *bp;
+ xfs_inode_t *ip;
+ xfs_inode_log_item_t *iip;
+ xfs_log_item_t *lip;
+ struct xfs_perag *pag;
+
+ pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, inum));
+ if (mp->m_sb.sb_blocksize >= XFS_INODE_CLUSTER_SIZE(mp)) {
+ blks_per_cluster = 1;
+ ninodes = mp->m_sb.sb_inopblock;
+ nbufs = XFS_IALLOC_BLOCKS(mp);
+ } else {
+ blks_per_cluster = XFS_INODE_CLUSTER_SIZE(mp) /
+ mp->m_sb.sb_blocksize;
+ ninodes = blks_per_cluster * mp->m_sb.sb_inopblock;
+ nbufs = XFS_IALLOC_BLOCKS(mp) / blks_per_cluster;
+ }
+
+ for (j = 0; j < nbufs; j++, inum += ninodes) {
+ blkno = XFS_AGB_TO_DADDR(mp, XFS_INO_TO_AGNO(mp, inum),
+ XFS_INO_TO_AGBNO(mp, inum));
+
+ /*
+ * We obtain and lock the backing buffer first in the process
+ * here, as we have to ensure that any dirty inode that we
+ * can't get the flush lock on is attached to the buffer.
+ * If we scan the in-memory inodes first, then buffer IO can
+ * complete before we get a lock on it, and hence we may fail
+ * to mark all the active inodes on the buffer stale.
+ */
+ bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, blkno,
+ mp->m_bsize * blks_per_cluster,
+ XBF_UNMAPPED);
+
+ if (!bp)
+ return ENOMEM;
+
+ /*
+ * This buffer may not have been correctly initialised as we
+ * didn't read it from disk. That's not important because we are
+ * only using it to mark the buffer as stale in the log, and to
+ * attach stale cached inodes on it. That means it will never be
+ * dispatched for IO. If it is, we want to know about it, and we
+ * want it to fail. We can achieve this by adding a write
+ * verifier to the buffer.
+ */
+ bp->b_ops = &xfs_inode_buf_ops;
+
+ /*
+ * Walk the inodes already attached to the buffer and mark them
+ * stale. These will all have the flush locks held, so an
+ * in-memory inode walk can't lock them. By marking them all
+ * stale first, we will not attempt to lock them in the loop
+ * below as the XFS_ISTALE flag will be set.
+ */
+ lip = bp->b_fspriv;
+ while (lip) {
+ if (lip->li_type == XFS_LI_INODE) {
+ iip = (xfs_inode_log_item_t *)lip;
+ ASSERT(iip->ili_logged == 1);
+ lip->li_cb = xfs_istale_done;
+ xfs_trans_ail_copy_lsn(mp->m_ail,
+ &iip->ili_flush_lsn,
+ &iip->ili_item.li_lsn);
+ xfs_iflags_set(iip->ili_inode, XFS_ISTALE);
+ }
+ lip = lip->li_bio_list;
+ }
+
+
+ /*
+ * For each inode in memory attempt to add it to the inode
+ * buffer and set it up for being staled on buffer IO
+ * completion. This is safe as we've locked out tail pushing
+ * and flushing by locking the buffer.
+ *
+ * We have already marked every inode that was part of a
+ * transaction stale above, which means there is no point in
+ * even trying to lock them.
+ */
+ for (i = 0; i < ninodes; i++) {
+retry:
+ rcu_read_lock();
+ ip = radix_tree_lookup(&pag->pag_ici_root,
+ XFS_INO_TO_AGINO(mp, (inum + i)));
+
+ /* Inode not in memory, nothing to do */
+ if (!ip) {
+ rcu_read_unlock();
+ continue;
+ }
+
+ /*
+ * because this is an RCU protected lookup, we could
+ * find a recently freed or even reallocated inode
+ * during the lookup. We need to check under the
+ * i_flags_lock for a valid inode here. Skip it if it
+ * is not valid, the wrong inode or stale.
+ */
+ spin_lock(&ip->i_flags_lock);
+ if (ip->i_ino != inum + i ||
+ __xfs_iflags_test(ip, XFS_ISTALE)) {
+ spin_unlock(&ip->i_flags_lock);
+ rcu_read_unlock();
+ continue;
+ }
+ spin_unlock(&ip->i_flags_lock);
+
+ /*
+ * Don't try to lock/unlock the current inode, but we
+ * _cannot_ skip the other inodes that we did not find
+ * in the list attached to the buffer and are not
+ * already marked stale. If we can't lock it, back off
+ * and retry.
+ */
+ if (ip != free_ip &&
+ !xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
+ rcu_read_unlock();
+ delay(1);
+ goto retry;
+ }
+ rcu_read_unlock();
+
+ xfs_iflock(ip);
+ xfs_iflags_set(ip, XFS_ISTALE);
+
+ /*
+ * we don't need to attach clean inodes or those only
+ * with unlogged changes (which we throw away, anyway).
+ */
+ iip = ip->i_itemp;
+ if (!iip || xfs_inode_clean(ip)) {
+ ASSERT(ip != free_ip);
+ xfs_ifunlock(ip);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ continue;
+ }
+
+ iip->ili_last_fields = iip->ili_fields;
+ iip->ili_fields = 0;
+ iip->ili_logged = 1;
+ xfs_trans_ail_copy_lsn(mp->m_ail, &iip->ili_flush_lsn,
+ &iip->ili_item.li_lsn);
+
+ xfs_buf_attach_iodone(bp, xfs_istale_done,
+ &iip->ili_item);
+
+ if (ip != free_ip)
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ }
+
+ xfs_trans_stale_inode_buf(tp, bp);
+ xfs_trans_binval(tp, bp);
+ }
+
+ xfs_perag_put(pag);
+ return 0;
+}
+
+/*
+ * This is called to return an inode to the inode free list.
+ * The inode should already be truncated to 0 length and have
+ * no pages associated with it. This routine also assumes that
+ * the inode is already a part of the transaction.
+ *
+ * The on-disk copy of the inode will have been added to the list
+ * of unlinked inodes in the AGI. We need to remove the inode from
+ * that list atomically with respect to freeing it here.
+ */
+int
+xfs_ifree(
+ xfs_trans_t *tp,
+ xfs_inode_t *ip,
+ xfs_bmap_free_t *flist)
+{
+ int error;
+ int delete;
+ xfs_ino_t first_ino;
+
+ ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+ ASSERT(ip->i_d.di_nlink == 0);
+ ASSERT(ip->i_d.di_nextents == 0);
+ ASSERT(ip->i_d.di_anextents == 0);
+ ASSERT(ip->i_d.di_size == 0 || !S_ISREG(ip->i_d.di_mode));
+ ASSERT(ip->i_d.di_nblocks == 0);
+
+ /*
+ * Pull the on-disk inode from the AGI unlinked list.
+ */
+ error = xfs_iunlink_remove(tp, ip);
+ if (error)
+ return error;
+
+ error = xfs_difree(tp, ip->i_ino, flist, &delete, &first_ino);
+ if (error)
+ return error;
+
+ ip->i_d.di_mode = 0; /* mark incore inode as free */
+ ip->i_d.di_flags = 0;
+ ip->i_d.di_dmevmask = 0;
+ ip->i_d.di_forkoff = 0; /* mark the attr fork not in use */
+ ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
+ ip->i_d.di_aformat = XFS_DINODE_FMT_EXTENTS;
+ /*
+ * Bump the generation count so no one will be confused
+ * by reincarnations of this inode.
+ */
+ ip->i_d.di_gen++;
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+ if (delete)
+ error = xfs_ifree_cluster(ip, tp, first_ino);
+
+ return error;
+}
+
+STATIC int
+xfs_iflush_int(
+ struct xfs_inode *ip,
+ struct xfs_buf *bp)
+{
+ struct xfs_inode_log_item *iip = ip->i_itemp;
+ struct xfs_dinode *dip;
+ struct xfs_mount *mp = ip->i_mount;
+
+ ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
+ ASSERT(xfs_isiflocked(ip));
+ ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
+ ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
+ ASSERT(iip != NULL && iip->ili_fields != 0);
+
+ /* set *dip = inode's place in the buffer */
+ dip = (xfs_dinode_t *)xfs_buf_offset(bp, ip->i_imap.im_boffset);
+
+ if (XFS_TEST_ERROR(dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC),
+ mp, XFS_ERRTAG_IFLUSH_1, XFS_RANDOM_IFLUSH_1)) {
+ xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
+ "%s: Bad inode %Lu magic number 0x%x, ptr 0x%p",
+ __func__, ip->i_ino, be16_to_cpu(dip->di_magic), dip);
+ goto corrupt_out;
+ }
+ if (XFS_TEST_ERROR(ip->i_d.di_magic != XFS_DINODE_MAGIC,
+ mp, XFS_ERRTAG_IFLUSH_2, XFS_RANDOM_IFLUSH_2)) {
+ xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
+ "%s: Bad inode %Lu, ptr 0x%p, magic number 0x%x",
+ __func__, ip->i_ino, ip, ip->i_d.di_magic);
+ goto corrupt_out;
+ }
+ if (S_ISREG(ip->i_d.di_mode)) {
+ if (XFS_TEST_ERROR(
+ (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS) &&
+ (ip->i_d.di_format != XFS_DINODE_FMT_BTREE),
+ mp, XFS_ERRTAG_IFLUSH_3, XFS_RANDOM_IFLUSH_3)) {
+ xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
+ "%s: Bad regular inode %Lu, ptr 0x%p",
+ __func__, ip->i_ino, ip);
+ goto corrupt_out;
+ }
+ } else if (S_ISDIR(ip->i_d.di_mode)) {
+ if (XFS_TEST_ERROR(
+ (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS) &&
+ (ip->i_d.di_format != XFS_DINODE_FMT_BTREE) &&
+ (ip->i_d.di_format != XFS_DINODE_FMT_LOCAL),
+ mp, XFS_ERRTAG_IFLUSH_4, XFS_RANDOM_IFLUSH_4)) {
+ xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
+ "%s: Bad directory inode %Lu, ptr 0x%p",
+ __func__, ip->i_ino, ip);
+ goto corrupt_out;
+ }
+ }
+ if (XFS_TEST_ERROR(ip->i_d.di_nextents + ip->i_d.di_anextents >
+ ip->i_d.di_nblocks, mp, XFS_ERRTAG_IFLUSH_5,
+ XFS_RANDOM_IFLUSH_5)) {
+ xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
+ "%s: detected corrupt incore inode %Lu, "
+ "total extents = %d, nblocks = %Ld, ptr 0x%p",
+ __func__, ip->i_ino,
+ ip->i_d.di_nextents + ip->i_d.di_anextents,
+ ip->i_d.di_nblocks, ip);
+ goto corrupt_out;
+ }
+ if (XFS_TEST_ERROR(ip->i_d.di_forkoff > mp->m_sb.sb_inodesize,
+ mp, XFS_ERRTAG_IFLUSH_6, XFS_RANDOM_IFLUSH_6)) {
+ xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
+ "%s: bad inode %Lu, forkoff 0x%x, ptr 0x%p",
+ __func__, ip->i_ino, ip->i_d.di_forkoff, ip);
+ goto corrupt_out;
+ }
+ /*
+ * bump the flush iteration count, used to detect flushes which
+ * postdate a log record during recovery. This is redundant as we now
+ * log every change and hence this can't happen. Still, it doesn't hurt.
+ */
+ ip->i_d.di_flushiter++;
+
+ /*
+ * Copy the dirty parts of the inode into the on-disk
+ * inode. We always copy out the core of the inode,
+ * because if the inode is dirty at all the core must
+ * be.
+ */
+ xfs_dinode_to_disk(dip, &ip->i_d);
+
+ /* Wrap, we never let the log put out DI_MAX_FLUSH */
+ if (ip->i_d.di_flushiter == DI_MAX_FLUSH)
+ ip->i_d.di_flushiter = 0;
+
+ /*
+ * If this is really an old format inode and the superblock version
+ * has not been updated to support only new format inodes, then
+ * convert back to the old inode format. If the superblock version
+ * has been updated, then make the conversion permanent.
+ */
+ ASSERT(ip->i_d.di_version == 1 || xfs_sb_version_hasnlink(&mp->m_sb));
+ if (ip->i_d.di_version == 1) {
+ if (!xfs_sb_version_hasnlink(&mp->m_sb)) {
+ /*
+ * Convert it back.
+ */
+ ASSERT(ip->i_d.di_nlink <= XFS_MAXLINK_1);
+ dip->di_onlink = cpu_to_be16(ip->i_d.di_nlink);
+ } else {
+ /*
+ * The superblock version has already been bumped,
+ * so just make the conversion to the new inode
+ * format permanent.
+ */
+ ip->i_d.di_version = 2;
+ dip->di_version = 2;
+ ip->i_d.di_onlink = 0;
+ dip->di_onlink = 0;
+ memset(&(ip->i_d.di_pad[0]), 0, sizeof(ip->i_d.di_pad));
+ memset(&(dip->di_pad[0]), 0,
+ sizeof(dip->di_pad));
+ ASSERT(xfs_get_projid(ip) == 0);
+ }
+ }
+
+ xfs_iflush_fork(ip, dip, iip, XFS_DATA_FORK, bp);
+ if (XFS_IFORK_Q(ip))
+ xfs_iflush_fork(ip, dip, iip, XFS_ATTR_FORK, bp);
+ xfs_inobp_check(mp, bp);
+
+ /*
+ * We've recorded everything logged in the inode, so we'd like to clear
+ * the ili_fields bits so we don't log and flush things unnecessarily.
+ * However, we can't stop logging all this information until the data
+ * we've copied into the disk buffer is written to disk. If we did we
+ * might overwrite the copy of the inode in the log with all the data
+ * after re-logging only part of it, and in the face of a crash we
+ * wouldn't have all the data we need to recover.
+ *
+ * What we do is move the bits to the ili_last_fields field. When
+ * logging the inode, these bits are moved back to the ili_fields field.
+ * In the xfs_iflush_done() routine we clear ili_last_fields, since we
+ * know that the information those bits represent is permanently on
+ * disk. As long as the flush completes before the inode is logged
+ * again, then both ili_fields and ili_last_fields will be cleared.
+ *
+ * We can play with the ili_fields bits here, because the inode lock
+ * must be held exclusively in order to set bits there and the flush
+ * lock protects the ili_last_fields bits. Set ili_logged so the flush
+ * done routine can tell whether or not to look in the AIL. Also, store
+ * the current LSN of the inode so that we can tell whether the item has
+ * moved in the AIL from xfs_iflush_done(). In order to read the lsn we
+ * need the AIL lock, because it is a 64 bit value that cannot be read
+ * atomically.
+ */
+ iip->ili_last_fields = iip->ili_fields;
+ iip->ili_fields = 0;
+ iip->ili_logged = 1;
+
+ xfs_trans_ail_copy_lsn(mp->m_ail, &iip->ili_flush_lsn,
+ &iip->ili_item.li_lsn);
+
+ /*
+ * Attach the function xfs_iflush_done to the inode's
+ * buffer. This will remove the inode from the AIL
+ * and unlock the inode's flush lock when the inode is
+ * completely written to disk.
+ */
+ xfs_buf_attach_iodone(bp, xfs_iflush_done, &iip->ili_item);
+
+ /* update the lsn in the on disk inode if required */
+ if (ip->i_d.di_version == 3)
+ dip->di_lsn = cpu_to_be64(iip->ili_item.li_lsn);
+
+ /* generate the checksum. */
+ xfs_dinode_calc_crc(mp, dip);
+
+ ASSERT(bp->b_fspriv != NULL);
+ ASSERT(bp->b_iodone != NULL);
+ return 0;
+
+corrupt_out:
+ return XFS_ERROR(EFSCORRUPTED);
+}
+
+STATIC int
+xfs_iflush_cluster(
+ xfs_inode_t *ip,
+ xfs_buf_t *bp)
+{
+ xfs_mount_t *mp = ip->i_mount;
+ struct xfs_perag *pag;
+ unsigned long first_index, mask;
+ unsigned long inodes_per_cluster;
+ int ilist_size;
+ xfs_inode_t **ilist;
+ xfs_inode_t *iq;
+ int nr_found;
+ int clcount = 0;
+ int bufwasdelwri;
+ int i;
+
+ pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
+
+ inodes_per_cluster = XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog;
+ ilist_size = inodes_per_cluster * sizeof(xfs_inode_t *);
+ ilist = kmem_alloc(ilist_size, KM_MAYFAIL|KM_NOFS);
+ if (!ilist)
+ goto out_put;
+
+ mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) - 1);
+ first_index = XFS_INO_TO_AGINO(mp, ip->i_ino) & mask;
+ rcu_read_lock();
+ /* really need a gang lookup range call here */
+ nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)ilist,
+ first_index, inodes_per_cluster);
+ if (nr_found == 0)
+ goto out_free;
+
+ for (i = 0; i < nr_found; i++) {
+ iq = ilist[i];
+ if (iq == ip)
+ continue;
+
+ /*
+ * because this is an RCU protected lookup, we could find a
+ * recently freed or even reallocated inode during the lookup.
+ * We need to check under the i_flags_lock for a valid inode
+ * here. Skip it if it is not valid or the wrong inode.
+ */
+ spin_lock(&iq->i_flags_lock);
+ if (!iq->i_ino ||
+ (XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index) {
+ spin_unlock(&iq->i_flags_lock);
+ continue;
+ }
+ spin_unlock(&iq->i_flags_lock);
+
+ /*
+ * Do an un-protected check to see if the inode is dirty and
+ * is a candidate for flushing. These checks will be repeated
+ * later after the appropriate locks are acquired.
+ */
+ if (xfs_inode_clean(iq) && xfs_ipincount(iq) == 0)
+ continue;
+
+ /*
+ * Try to get locks. If any are unavailable or it is pinned,
+ * then this inode cannot be flushed and is skipped.
+ */
+
+ if (!xfs_ilock_nowait(iq, XFS_ILOCK_SHARED))
+ continue;
+ if (!xfs_iflock_nowait(iq)) {
+ xfs_iunlock(iq, XFS_ILOCK_SHARED);
+ continue;
+ }
+ if (xfs_ipincount(iq)) {
+ xfs_ifunlock(iq);
+ xfs_iunlock(iq, XFS_ILOCK_SHARED);
+ continue;
+ }
+
+ /*
+ * arriving here means that this inode can be flushed. First
+ * re-check that it's dirty before flushing.
+ */
+ if (!xfs_inode_clean(iq)) {
+ int error;
+ error = xfs_iflush_int(iq, bp);
+ if (error) {
+ xfs_iunlock(iq, XFS_ILOCK_SHARED);
+ goto cluster_corrupt_out;
+ }
+ clcount++;
+ } else {
+ xfs_ifunlock(iq);
+ }
+ xfs_iunlock(iq, XFS_ILOCK_SHARED);
+ }
+
+ if (clcount) {
+ XFS_STATS_INC(xs_icluster_flushcnt);
+ XFS_STATS_ADD(xs_icluster_flushinode, clcount);
+ }
+
+out_free:
+ rcu_read_unlock();
+ kmem_free(ilist);
+out_put:
+ xfs_perag_put(pag);
+ return 0;
+
+
+cluster_corrupt_out:
+ /*
+ * Corruption detected in the clustering loop. Invalidate the
+ * inode buffer and shut down the filesystem.
+ */
+ rcu_read_unlock();
+ /*
+ * Clean up the buffer. If it was delwri, just release it --
+ * brelse can handle it with no problems. If not, shut down the
+ * filesystem before releasing the buffer.
+ */
+ bufwasdelwri = (bp->b_flags & _XBF_DELWRI_Q);
+ if (bufwasdelwri)
+ xfs_buf_relse(bp);
+
+ xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+
+ if (!bufwasdelwri) {
+ /*
+ * Just like incore_relse: if we have b_iodone functions,
+ * mark the buffer as an error and call them. Otherwise
+ * mark it as stale and brelse.
+ */
+ if (bp->b_iodone) {
+ XFS_BUF_UNDONE(bp);
+ xfs_buf_stale(bp);
+ xfs_buf_ioerror(bp, EIO);
+ xfs_buf_ioend(bp, 0);
+ } else {
+ xfs_buf_stale(bp);
+ xfs_buf_relse(bp);
+ }
+ }
+
+ /*
+ * Unlocks the flush lock
+ */
+ xfs_iflush_abort(iq, false);
+ kmem_free(ilist);
+ xfs_perag_put(pag);
+ return XFS_ERROR(EFSCORRUPTED);
+}
+
+/*
+ * Flush dirty inode metadata into the backing buffer.
+ *
+ * The caller must have the inode lock and the inode flush lock held. The
+ * inode lock will still be held upon return to the caller, and the inode
+ * flush lock will be released after the inode has reached the disk.
+ *
+ * The caller must write out the buffer returned in *bpp and release it.
+ */
+int
+xfs_iflush(
+ struct xfs_inode *ip,
+ struct xfs_buf **bpp)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_buf *bp;
+ struct xfs_dinode *dip;
+ int error;
+
+ XFS_STATS_INC(xs_iflush_count);
+
+ ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
+ ASSERT(xfs_isiflocked(ip));
+ ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
+ ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
+
+ *bpp = NULL;
+
+ xfs_iunpin_wait(ip);
+
+ /*
+ * For stale inodes we cannot rely on the backing buffer remaining
+ * stale in cache for the remaining life of the stale inode and so
+ * xfs_imap_to_bp() below may give us a buffer that no longer contains
+ * inodes. We have to check this after ensuring the inode is
+ * unpinned so that it is safe to reclaim the stale inode after the
+ * flush call.
+ */
+ if (xfs_iflags_test(ip, XFS_ISTALE)) {
+ xfs_ifunlock(ip);
+ return 0;
+ }
+
+ /*
+ * This may have been unpinned because the filesystem is shutting
+ * down forcibly. If that's the case we must not write this inode
+ * to disk, because the log record didn't make it to disk.
+ *
+ * We also have to remove the log item from the AIL in this case,
+ * as we wait for an empty AIL as part of the unmount process.
+ */
+ if (XFS_FORCED_SHUTDOWN(mp)) {
+ error = XFS_ERROR(EIO);
+ goto abort_out;
+ }
+
+ /*
+ * Get the buffer containing the on-disk inode.
+ */
+ error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK,
+ 0);
+ if (error || !bp) {
+ xfs_ifunlock(ip);
+ return error;
+ }
+
+ /*
+ * First flush out the inode that xfs_iflush was called with.
+ */
+ error = xfs_iflush_int(ip, bp);
+ if (error)
+ goto corrupt_out;
+
+ /*
+ * If the buffer is pinned then push on the log now so we won't
+ * get stuck waiting in the write for too long.
+ */
+ if (xfs_buf_ispinned(bp))
+ xfs_log_force(mp, 0);
+
+ /*
+ * inode clustering:
+ * see if other inodes can be gathered into this write
+ */
+ error = xfs_iflush_cluster(ip, bp);
+ if (error)
+ goto cluster_corrupt_out;
+
+ *bpp = bp;
+ return 0;
+
+corrupt_out:
+ xfs_buf_relse(bp);
+ xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+cluster_corrupt_out:
+ error = XFS_ERROR(EFSCORRUPTED);
+abort_out:
+ /*
+ * Unlocks the flush lock
+ */
+ xfs_iflush_abort(ip, false);
+ return error;
+}
+
diff --git a/fs/xfs/xfs_inode_ops.h b/fs/xfs/xfs_inode_ops.h
new file mode 100644
index 0000000..8556f5a
--- /dev/null
+++ b/fs/xfs/xfs_inode_ops.h
@@ -0,0 +1,353 @@
+/*
+ * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_INODE_OPS_H__
+#define __XFS_INODE_OPS_H__
+
+/* kernel only definitions */
+
+struct xfs_buf;
+struct xfs_bmap_free;
+struct xfs_bmbt_irec;
+struct xfs_inode_log_item;
+struct xfs_mount;
+struct xfs_trans;
+struct xfs_dquot;
+
+typedef struct xfs_inode {
+ /* Inode linking and identification information. */
+ struct xfs_mount *i_mount; /* fs mount struct ptr */
+ struct xfs_dquot *i_udquot; /* user dquot */
+ struct xfs_dquot *i_gdquot; /* group dquot */
+
+ /* Inode location stuff */
+ xfs_ino_t i_ino; /* inode number (agno/agino)*/
+ struct xfs_imap i_imap; /* location for xfs_imap() */
+
+ /* Extent information. */
+ xfs_ifork_t *i_afp; /* attribute fork pointer */
+ xfs_ifork_t i_df; /* data fork */
+
+ /* Transaction and locking information. */
+ struct xfs_inode_log_item *i_itemp; /* logging information */
+ mrlock_t i_lock; /* inode lock */
+ mrlock_t i_iolock; /* inode IO lock */
+ atomic_t i_pincount; /* inode pin count */
+ spinlock_t i_flags_lock; /* inode i_flags lock */
+ /* Miscellaneous state. */
+ unsigned long i_flags; /* see defined flags below */
+ unsigned int i_delayed_blks; /* count of delay alloc blks */
+
+ xfs_icdinode_t i_d; /* most of ondisk inode */
+
+ /* VFS inode */
+ struct inode i_vnode; /* embedded VFS inode */
+} xfs_inode_t;
+
+/* Convert from vfs inode to xfs inode */
+static inline struct xfs_inode *XFS_I(struct inode *inode)
+{
+ return container_of(inode, struct xfs_inode, i_vnode);
+}
+
+/* convert from xfs inode to vfs inode */
+static inline struct inode *VFS_I(struct xfs_inode *ip)
+{
+ return &ip->i_vnode;
+}
+
+/*
+ * For regular files we only update the on-disk filesize when actually
+ * writing data back to disk. Until then only the copy in the VFS inode
+ * is uptodate.
+ */
+static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
+{
+ if (S_ISREG(ip->i_d.di_mode))
+ return i_size_read(VFS_I(ip));
+ return ip->i_d.di_size;
+}
+
+/*
+ * If this I/O goes past the on-disk inode size update it unless it would
+ * be past the current in-core inode size.
+ */
+static inline xfs_fsize_t
+xfs_new_eof(struct xfs_inode *ip, xfs_fsize_t new_size)
+{
+ xfs_fsize_t i_size = i_size_read(VFS_I(ip));
+
+ if (new_size > i_size)
+ new_size = i_size;
+ return new_size > ip->i_d.di_size ? new_size : 0;
+}
+
+/*
+ * i_flags helper functions
+ */
+static inline void
+__xfs_iflags_set(xfs_inode_t *ip, unsigned short flags)
+{
+ ip->i_flags |= flags;
+}
+
+static inline void
+xfs_iflags_set(xfs_inode_t *ip, unsigned short flags)
+{
+ spin_lock(&ip->i_flags_lock);
+ __xfs_iflags_set(ip, flags);
+ spin_unlock(&ip->i_flags_lock);
+}
+
+static inline void
+xfs_iflags_clear(xfs_inode_t *ip, unsigned short flags)
+{
+ spin_lock(&ip->i_flags_lock);
+ ip->i_flags &= ~flags;
+ spin_unlock(&ip->i_flags_lock);
+}
+
+static inline int
+__xfs_iflags_test(xfs_inode_t *ip, unsigned short flags)
+{
+ return (ip->i_flags & flags);
+}
+
+static inline int
+xfs_iflags_test(xfs_inode_t *ip, unsigned short flags)
+{
+ int ret;
+ spin_lock(&ip->i_flags_lock);
+ ret = __xfs_iflags_test(ip, flags);
+ spin_unlock(&ip->i_flags_lock);
+ return ret;
+}
+
+static inline int
+xfs_iflags_test_and_clear(xfs_inode_t *ip, unsigned short flags)
+{
+ int ret;
+
+ spin_lock(&ip->i_flags_lock);
+ ret = ip->i_flags & flags;
+ if (ret)
+ ip->i_flags &= ~flags;
+ spin_unlock(&ip->i_flags_lock);
+ return ret;
+}
+
+static inline int
+xfs_iflags_test_and_set(xfs_inode_t *ip, unsigned short flags)
+{
+ int ret;
+
+ spin_lock(&ip->i_flags_lock);
+ ret = ip->i_flags & flags;
+ if (!ret)
+ ip->i_flags |= flags;
+ spin_unlock(&ip->i_flags_lock);
+ return ret;
+}
+
+/*
+ * Project quota id helpers (previously projid was 16bit only
+ * and using two 16bit values to hold new 32bit projid was chosen
+ * to retain compatibility with "old" filesystems).
+ */
+static inline prid_t
+xfs_get_projid(struct xfs_inode *ip)
+{
+ return (prid_t)ip->i_d.di_projid_hi << 16 | ip->i_d.di_projid_lo;
+}
+
+static inline void
+xfs_set_projid(struct xfs_inode *ip,
+ prid_t projid)
+{
+ ip->i_d.di_projid_hi = (__uint16_t) (projid >> 16);
+ ip->i_d.di_projid_lo = (__uint16_t) (projid & 0xffff);
+}
+
+/*
+ * In-core inode flags.
+ */
+#define XFS_IRECLAIM (1 << 0) /* started reclaiming this inode */
+#define XFS_ISTALE (1 << 1) /* inode has been staled */
+#define XFS_IRECLAIMABLE (1 << 2) /* inode can be reclaimed */
+#define XFS_INEW (1 << 3) /* inode has just been allocated */
+#define XFS_IFILESTREAM (1 << 4) /* inode is in a filestream dir. */
+#define XFS_ITRUNCATED (1 << 5) /* truncated down so flush-on-close */
+#define XFS_IDIRTY_RELEASE (1 << 6) /* dirty release already seen */
+#define __XFS_IFLOCK_BIT 7 /* inode is being flushed right now */
+#define XFS_IFLOCK (1 << __XFS_IFLOCK_BIT)
+#define __XFS_IPINNED_BIT 8 /* wakeup key for zero pin count */
+#define XFS_IPINNED (1 << __XFS_IPINNED_BIT)
+#define XFS_IDONTCACHE (1 << 9) /* don't cache the inode long term */
+
+/*
+ * Per-lifetime flags need to be reset when re-using a reclaimable inode during
+ * inode lookup. This prevents unintended behaviour on the new inode from
+ * occurring.
+ */
+#define XFS_IRECLAIM_RESET_FLAGS \
+ (XFS_IRECLAIMABLE | XFS_IRECLAIM | \
+ XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | \
+ XFS_IFILESTREAM)
+
+/*
+ * Synchronize processes attempting to flush the in-core inode back to disk.
+ */
+
+extern void __xfs_iflock(struct xfs_inode *ip);
+
+static inline int xfs_iflock_nowait(struct xfs_inode *ip)
+{
+ return !xfs_iflags_test_and_set(ip, XFS_IFLOCK);
+}
+
+static inline void xfs_iflock(struct xfs_inode *ip)
+{
+ if (!xfs_iflock_nowait(ip))
+ __xfs_iflock(ip);
+}
+
+static inline void xfs_ifunlock(struct xfs_inode *ip)
+{
+ xfs_iflags_clear(ip, XFS_IFLOCK);
+ smp_mb();
+ wake_up_bit(&ip->i_flags, __XFS_IFLOCK_BIT);
+}
+
+static inline int xfs_isiflocked(struct xfs_inode *ip)
+{
+ return xfs_iflags_test(ip, XFS_IFLOCK);
+}
+
+/*
+ * Flags for inode locking.
+ * Bit ranges: 1<<0 - 1<<16-1 -- iolock/ilock modes (bitfield)
+ * 1<<16 - 1<<32-1 -- lockdep annotation (integers)
+ */
+#define XFS_IOLOCK_EXCL (1<<0)
+#define XFS_IOLOCK_SHARED (1<<1)
+#define XFS_ILOCK_EXCL (1<<2)
+#define XFS_ILOCK_SHARED (1<<3)
+
+#define XFS_LOCK_MASK (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED \
+ | XFS_ILOCK_EXCL | XFS_ILOCK_SHARED)
+
+#define XFS_LOCK_FLAGS \
+ { XFS_IOLOCK_EXCL, "IOLOCK_EXCL" }, \
+ { XFS_IOLOCK_SHARED, "IOLOCK_SHARED" }, \
+ { XFS_ILOCK_EXCL, "ILOCK_EXCL" }, \
+ { XFS_ILOCK_SHARED, "ILOCK_SHARED" }
+
+
+/*
+ * Flags for lockdep annotations.
+ *
+ * XFS_LOCK_PARENT - for directory operations that require locking a
+ * parent directory inode and a child entry inode. The parent gets locked
+ * with this flag so it gets a lockdep subclass of 1 and the child entry
+ * lock will have a lockdep subclass of 0.
+ *
+ * XFS_LOCK_RTBITMAP/XFS_LOCK_RTSUM - the realtime device bitmap and summary
+ * inodes do not participate in the normal lock order, and thus have their
+ * own subclasses.
+ *
+ * XFS_LOCK_INUMORDER - for locking several inodes at the same time
+ * with xfs_lock_inodes(). This flag is used as the starting subclass
+ * and each subsequent lock acquired will increment the subclass by one.
+ * So the first lock acquired will have a lockdep subclass of 4, the
+ * second lock will have a lockdep subclass of 5, and so on. It is
+ * the responsibility of the class builder to shift this to the correct
+ * portion of the lock_mode lockdep mask.
+ */
+#define XFS_LOCK_PARENT 1
+#define XFS_LOCK_RTBITMAP 2
+#define XFS_LOCK_RTSUM 3
+#define XFS_LOCK_INUMORDER 4
+
+#define XFS_IOLOCK_SHIFT 16
+#define XFS_IOLOCK_PARENT (XFS_LOCK_PARENT << XFS_IOLOCK_SHIFT)
+
+#define XFS_ILOCK_SHIFT 24
+#define XFS_ILOCK_PARENT (XFS_LOCK_PARENT << XFS_ILOCK_SHIFT)
+#define XFS_ILOCK_RTBITMAP (XFS_LOCK_RTBITMAP << XFS_ILOCK_SHIFT)
+#define XFS_ILOCK_RTSUM (XFS_LOCK_RTSUM << XFS_ILOCK_SHIFT)
+
+#define XFS_IOLOCK_DEP_MASK 0x00ff0000
+#define XFS_ILOCK_DEP_MASK 0xff000000
+#define XFS_LOCK_DEP_MASK (XFS_IOLOCK_DEP_MASK | XFS_ILOCK_DEP_MASK)
+
+#define XFS_IOLOCK_DEP(flags) (((flags) & XFS_IOLOCK_DEP_MASK) >> XFS_IOLOCK_SHIFT)
+#define XFS_ILOCK_DEP(flags) (((flags) & XFS_ILOCK_DEP_MASK) >> XFS_ILOCK_SHIFT)
+
+/*
+ * For multiple groups support: if S_ISGID bit is set in the parent
+ * directory, group of new file is set to that of the parent, and
+ * new subdirectory gets S_ISGID bit from parent.
+ */
+#define XFS_INHERIT_GID(pip) \
+ (((pip)->i_mount->m_flags & XFS_MOUNT_GRPID) || \
+ ((pip)->i_d.di_mode & S_ISGID))
+
+
+/*
+ * xfs_inode.c prototypes.
+ */
+void xfs_ilock(xfs_inode_t *, uint);
+int xfs_ilock_nowait(xfs_inode_t *, uint);
+void xfs_iunlock(xfs_inode_t *, uint);
+void xfs_ilock_demote(xfs_inode_t *, uint);
+int xfs_isilocked(xfs_inode_t *, uint);
+uint xfs_ilock_map_shared(xfs_inode_t *);
+void xfs_iunlock_map_shared(xfs_inode_t *, uint);
+int xfs_ialloc(struct xfs_trans *, xfs_inode_t *, umode_t,
+ xfs_nlink_t, xfs_dev_t, prid_t, int,
+ struct xfs_buf **, xfs_inode_t **);
+
+uint xfs_ip2xflags(struct xfs_inode *);
+uint xfs_dic2xflags(struct xfs_dinode *);
+int xfs_ifree(struct xfs_trans *, xfs_inode_t *,
+ struct xfs_bmap_free *);
+int xfs_itruncate_extents(struct xfs_trans **, struct xfs_inode *,
+ int, xfs_fsize_t);
+int xfs_iunlink(struct xfs_trans *, xfs_inode_t *);
+
+void xfs_iext_realloc(xfs_inode_t *, int, int);
+void xfs_iunpin_wait(xfs_inode_t *);
+int xfs_iflush(struct xfs_inode *, struct xfs_buf **);
+void xfs_lock_inodes(xfs_inode_t **, int, uint);
+void xfs_lock_two_inodes(xfs_inode_t *, xfs_inode_t *, uint);
+
+xfs_extlen_t xfs_get_extsz_hint(struct xfs_inode *ip);
+
+#define IHOLD(ip) \
+do { \
+ ASSERT(atomic_read(&VFS_I(ip)->i_count) > 0) ; \
+ ihold(VFS_I(ip)); \
+ trace_xfs_ihold(ip, _THIS_IP_); \
+} while (0)
+
+#define IRELE(ip) \
+do { \
+ trace_xfs_irele(ip, _THIS_IP_); \
+ iput(VFS_I(ip)); \
+} while (0)
+
+#endif /* __XFS_INODE_OPS_H__ */
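
Illustration (not part of the patch): the lockdep comment in the header above says the subclass is packed into a dedicated byte of the lock flags, and that locks taken in inode-number order start at subclass 4. The following is a minimal userspace sketch of that packing. The shift, mask and subclass values are copied from the header; the `DEMO_*` lock-mode bits and the helper `demo_inumorder_flags()` are assumptions made up for this example, since the real lock-mode bit values are not shown in this part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the header in this patch. */
#define XFS_LOCK_PARENT		1
#define XFS_LOCK_INUMORDER	4

#define XFS_IOLOCK_SHIFT	16
#define XFS_ILOCK_SHIFT		24

#define XFS_IOLOCK_DEP_MASK	0x00ff0000u
#define XFS_ILOCK_DEP_MASK	0xff000000u

#define XFS_IOLOCK_DEP(flags)	(((flags) & XFS_IOLOCK_DEP_MASK) >> XFS_IOLOCK_SHIFT)
#define XFS_ILOCK_DEP(flags)	(((flags) & XFS_ILOCK_DEP_MASK) >> XFS_ILOCK_SHIFT)

/*
 * ASSUMPTION: stand-in lock-mode bits. They only need to live below
 * bit 16 so they do not collide with the two subclass bytes.
 */
#define DEMO_IOLOCK_EXCL	(1u << 0)
#define DEMO_ILOCK_EXCL		(1u << 2)

/*
 * Hypothetical helper: build the flags for the nth lock (counting from
 * zero) taken in inode-number order. The subclass is written into the
 * byte belonging to each lock type that is actually requested.
 */
static uint32_t demo_inumorder_flags(uint32_t lock_mode, unsigned int nth)
{
	uint32_t subclass = XFS_LOCK_INUMORDER + nth;

	assert(subclass <= 0xff);	/* must fit in one byte */

	if (lock_mode & DEMO_IOLOCK_EXCL)
		lock_mode |= subclass << XFS_IOLOCK_SHIFT;
	if (lock_mode & DEMO_ILOCK_EXCL)
		lock_mode |= subclass << XFS_ILOCK_SHIFT;

	return lock_mode;
}
```

With these definitions, `XFS_ILOCK_DEP(demo_inumorder_flags(DEMO_ILOCK_EXCL, 0))` extracts the starting subclass and each later lock in the sequence extracts a value one higher, while the IOLOCK byte stays zero when no IOLOCK mode bit was requested.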
--
1.7.10.4
* [PATCH 19/60] xfs: consolidate xfs_vnodeops.c into xfs_inode_ops.c
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (17 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 18/60] xfs: split out xfs inode operations " Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 20/60] xfs: move xfs_getbmap to xfs_extent_ops.c Dave Chinner
` (42 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Now that we have xfs_inode_ops.c for holding kernel-only XFS inode
operations, move all the operations from xfs_vnodeops.c to this new
file, as it holds another set of kernel-only inode operations. The
name of this file traces back to the days of Irix and its vnodes,
so it's more relevant to the current code to move them to the inode
operations file...
This consolidates some inode locking functions, some EOF
preallocation functions, and a bunch of xfs inode operations into
the one file.
This leaves only internal preallocation and hole punching functions
in vnodeops.c. Rename this file to xfs_extent_ops.c so we have a
place to consolidate all the various in-kernel physical extent
manipulation and querying functions.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 2 +-
fs/xfs/xfs_acl.c | 1 -
fs/xfs/xfs_aops.c | 1 -
fs/xfs/xfs_attr.c | 1 -
fs/xfs/xfs_attr.h | 8 +
fs/xfs/xfs_bmap.c | 1 -
fs/xfs/xfs_dfrag.c | 1 -
fs/xfs/xfs_dir2.c | 1 -
fs/xfs/xfs_dir2.h | 3 +
fs/xfs/xfs_export.c | 1 -
fs/xfs/xfs_extent_ops.c | 772 +++++++++++++++++++
fs/xfs/xfs_extent_ops.h | 24 +
fs/xfs/xfs_file.c | 2 +-
fs/xfs/xfs_icache.c | 1 -
fs/xfs/xfs_inode.c | 1 -
fs/xfs/xfs_inode_ops.c | 1147 ++++++++++++++++++++++++++++-
fs/xfs/xfs_inode_ops.h | 24 +-
fs/xfs/xfs_ioctl.c | 37 +-
fs/xfs/xfs_ioctl.h | 6 +
fs/xfs/xfs_ioctl32.c | 1 -
fs/xfs/xfs_iops.c | 2 +-
fs/xfs/xfs_iops.h | 13 +
fs/xfs/xfs_rename.c | 1 -
fs/xfs/xfs_super.c | 1 -
fs/xfs/xfs_vnodeops.c | 1875 -----------------------------------------------
fs/xfs/xfs_vnodeops.h | 56 --
fs/xfs/xfs_xattr.c | 1 -
27 files changed, 1996 insertions(+), 1988 deletions(-)
create mode 100644 fs/xfs/xfs_extent_ops.c
create mode 100644 fs/xfs/xfs_extent_ops.h
delete mode 100644 fs/xfs/xfs_vnodeops.c
delete mode 100644 fs/xfs/xfs_vnodeops.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index f372ed9..8567bd6 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -37,6 +37,7 @@ xfs-y += xfs_aops.o \
xfs_error.o \
xfs_export.o \
xfs_extent_busy.o \
+ xfs_extent_ops.o \
xfs_file.o \
xfs_filestream.o \
xfs_fsops.o \
@@ -52,7 +53,6 @@ xfs-y += xfs_aops.o \
xfs_rename.o \
xfs_super.o \
xfs_utils.o \
- xfs_vnodeops.o \
xfs_xattr.o \
kmem.o \
uuid.o
diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index 306d883..fecdcd0 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -20,7 +20,6 @@
#include "xfs_attr.h"
#include "xfs_bmap_btree.h"
#include "xfs_inode.h"
-#include "xfs_vnodeops.h"
#include "xfs_sb.h"
#include "xfs_mount.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 41a6950..2895178 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -28,7 +28,6 @@
#include "xfs_alloc.h"
#include "xfs_error.h"
#include "xfs_iomap.h"
-#include "xfs_vnodeops.h"
#include "xfs_trace.h"
#include "xfs_bmap.h"
#include <linux/aio.h>
diff --git a/fs/xfs/xfs_attr.c b/fs/xfs/xfs_attr.c
index 10c18ff..0c9bb2e 100644
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -38,7 +38,6 @@
#include "xfs_error.h"
#include "xfs_quota.h"
#include "xfs_trans_space.h"
-#include "xfs_vnodeops.h"
#include "xfs_trace.h"
/*
diff --git a/fs/xfs/xfs_attr.h b/fs/xfs/xfs_attr.h
index cb604b5..dd48245 100644
--- a/fs/xfs/xfs_attr.h
+++ b/fs/xfs/xfs_attr.h
@@ -142,5 +142,13 @@ typedef struct xfs_attr_list_context {
int xfs_attr_inactive(struct xfs_inode *dp);
int xfs_attr_list_int(struct xfs_attr_list_context *);
int xfs_inode_hasattr(struct xfs_inode *ip);
+int xfs_attr_get(struct xfs_inode *ip, const unsigned char *name,
+ unsigned char *value, int *valuelenp, int flags);
+int xfs_attr_set(struct xfs_inode *dp, const unsigned char *name,
+ unsigned char *value, int valuelen, int flags);
+int xfs_attr_remove(struct xfs_inode *dp, const unsigned char *name, int flags);
+int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
+ int flags, struct attrlist_cursor_kern *cursor);
+
#endif /* __XFS_ATTR_H__ */
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 2ddeb43..533c786 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -47,7 +47,6 @@
#include "xfs_trans_space.h"
#include "xfs_buf_item.h"
#include "xfs_filestream.h"
-#include "xfs_vnodeops.h"
#include "xfs_trace.h"
#include "xfs_symlink.h"
diff --git a/fs/xfs/xfs_dfrag.c b/fs/xfs/xfs_dfrag.c
index c407e1c..a76356a 100644
--- a/fs/xfs/xfs_dfrag.c
+++ b/fs/xfs/xfs_dfrag.c
@@ -31,7 +31,6 @@
#include "xfs_itable.h"
#include "xfs_dfrag.h"
#include "xfs_error.h"
-#include "xfs_vnodeops.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_dir2.c b/fs/xfs/xfs_dir2.c
index c3263a5..841933c 100644
--- a/fs/xfs/xfs_dir2.c
+++ b/fs/xfs/xfs_dir2.c
@@ -35,7 +35,6 @@
#include "xfs_dir2.h"
#include "xfs_dir2_priv.h"
#include "xfs_error.h"
-#include "xfs_vnodeops.h"
#include "xfs_trace.h"
struct xfs_name xfs_name_dotdot = { (unsigned char *)"..", 2};
diff --git a/fs/xfs/xfs_dir2.h b/fs/xfs/xfs_dir2.h
index 7ef6b0f..4faf66e 100644
--- a/fs/xfs/xfs_dir2.h
+++ b/fs/xfs/xfs_dir2.h
@@ -67,6 +67,9 @@ extern int xfs_dir2_sf_to_block(struct xfs_da_args *args);
*/
extern int xfs_dir3_data_readahead(struct xfs_trans *tp, struct xfs_inode *dp,
xfs_dablk_t bno, xfs_daddr_t mapped_bno);
+extern int xfs_readdir(struct xfs_inode *dp, void *dirent, size_t bufsize,
+ xfs_off_t *offset, filldir_t filldir);
+
/*
* Interface routines used by userspace utilities
diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
index 29c880f..066df42 100644
--- a/fs/xfs/xfs_export.c
+++ b/fs/xfs/xfs_export.c
@@ -26,7 +26,6 @@
#include "xfs_dir2_format.h"
#include "xfs_dir2.h"
#include "xfs_export.h"
-#include "xfs_vnodeops.h"
#include "xfs_bmap_btree.h"
#include "xfs_inode.h"
#include "xfs_inode_item.h"
diff --git a/fs/xfs/xfs_extent_ops.c b/fs/xfs/xfs_extent_ops.c
new file mode 100644
index 0000000..5590d23
--- /dev/null
+++ b/fs/xfs/xfs_extent_ops.c
@@ -0,0 +1,772 @@
+/*
+ * Copyright (c) 2000-2006 Silicon Graphics, Inc.
+ * Copyright (c) 2012 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_inode_item.h"
+#include "xfs_itable.h"
+#include "xfs_ialloc.h"
+#include "xfs_alloc.h"
+#include "xfs_bmap.h"
+#include "xfs_acl.h"
+#include "xfs_attr.h"
+#include "xfs_error.h"
+#include "xfs_quota.h"
+#include "xfs_utils.h"
+#include "xfs_rtalloc.h"
+#include "xfs_trans_space.h"
+#include "xfs_log_priv.h"
+#include "xfs_filestream.h"
+#include "xfs_trace.h"
+#include "xfs_icache.h"
+
+
+/*
+ * xfs_alloc_file_space()
+ * This routine allocates disk space for the given file.
+ *
+ * If alloc_type == 0, this request is for an ALLOCSP type
+ * request which will change the file size. In this case, no
+ * DMAPI event will be generated by the call. A TRUNCATE event
+ * will be generated later by xfs_setattr.
+ *
+ * If alloc_type != 0, this request is for a RESVSP type
+ * request, and a DMAPI DM_EVENT_WRITE will be generated if the
+ * lower block boundary byte address is less than the file's
+ * length.
+ *
+ * RETURNS:
+ * 0 on success
+ * errno on error
+ *
+ */
+STATIC int
+xfs_alloc_file_space(
+ xfs_inode_t *ip,
+ xfs_off_t offset,
+ xfs_off_t len,
+ int alloc_type,
+ int attr_flags)
+{
+ xfs_mount_t *mp = ip->i_mount;
+ xfs_off_t count;
+ xfs_filblks_t allocated_fsb;
+ xfs_filblks_t allocatesize_fsb;
+ xfs_extlen_t extsz, temp;
+ xfs_fileoff_t startoffset_fsb;
+ xfs_fsblock_t firstfsb;
+ int nimaps;
+ int quota_flag;
+ int rt;
+ xfs_trans_t *tp;
+ xfs_bmbt_irec_t imaps[1], *imapp;
+ xfs_bmap_free_t free_list;
+ uint qblocks, resblks, resrtextents;
+ int committed;
+ int error;
+
+ trace_xfs_alloc_file_space(ip);
+
+ if (XFS_FORCED_SHUTDOWN(mp))
+ return XFS_ERROR(EIO);
+
+ error = xfs_qm_dqattach(ip, 0);
+ if (error)
+ return error;
+
+ if (len <= 0)
+ return XFS_ERROR(EINVAL);
+
+ rt = XFS_IS_REALTIME_INODE(ip);
+ extsz = xfs_get_extsz_hint(ip);
+
+ count = len;
+ imapp = &imaps[0];
+ nimaps = 1;
+ startoffset_fsb = XFS_B_TO_FSBT(mp, offset);
+ allocatesize_fsb = XFS_B_TO_FSB(mp, count);
+
+ /*
+ * Allocate file space until done or until there is an error
+ */
+ while (allocatesize_fsb && !error) {
+ xfs_fileoff_t s, e;
+
+ /*
+ * Determine space reservations for data/realtime.
+ */
+ if (unlikely(extsz)) {
+ s = startoffset_fsb;
+ do_div(s, extsz);
+ s *= extsz;
+ e = startoffset_fsb + allocatesize_fsb;
+ if ((temp = do_mod(startoffset_fsb, extsz)))
+ e += temp;
+ if ((temp = do_mod(e, extsz)))
+ e += extsz - temp;
+ } else {
+ s = 0;
+ e = allocatesize_fsb;
+ }
+
+ /*
+ * The transaction reservation is limited to a 32-bit block
+ * count, hence we need to limit the number of blocks we are
+ * trying to reserve to avoid an overflow. We can't allocate
+ * more than @nimaps extents, and an extent is limited on disk
+ * to MAXEXTLEN (21 bits), so use that to enforce the limit.
+ */
+ resblks = min_t(xfs_fileoff_t, (e - s), (MAXEXTLEN * nimaps));
+ if (unlikely(rt)) {
+ resrtextents = qblocks = resblks;
+ resrtextents /= mp->m_sb.sb_rextsize;
+ resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
+ quota_flag = XFS_QMOPT_RES_RTBLKS;
+ } else {
+ resrtextents = 0;
+ resblks = qblocks = XFS_DIOSTRAT_SPACE_RES(mp, resblks);
+ quota_flag = XFS_QMOPT_RES_REGBLKS;
+ }
+
+ /*
+ * Allocate and setup the transaction.
+ */
+ tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
+ error = xfs_trans_reserve(tp, resblks,
+ XFS_WRITE_LOG_RES(mp), resrtextents,
+ XFS_TRANS_PERM_LOG_RES,
+ XFS_WRITE_LOG_COUNT);
+ /*
+ * Check for running out of space
+ */
+ if (error) {
+ /*
+ * Free the transaction structure.
+ */
+ ASSERT(error == ENOSPC || XFS_FORCED_SHUTDOWN(mp));
+ xfs_trans_cancel(tp, 0);
+ break;
+ }
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks,
+ 0, quota_flag);
+ if (error)
+ goto error1;
+
+ xfs_trans_ijoin(tp, ip, 0);
+
+ xfs_bmap_init(&free_list, &firstfsb);
+ error = xfs_bmapi_write(tp, ip, startoffset_fsb,
+ allocatesize_fsb, alloc_type, &firstfsb,
+ 0, imapp, &nimaps, &free_list);
+ if (error) {
+ goto error0;
+ }
+
+ /*
+ * Complete the transaction
+ */
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error) {
+ goto error0;
+ }
+
+ error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ if (error) {
+ break;
+ }
+
+ allocated_fsb = imapp->br_blockcount;
+
+ if (nimaps == 0) {
+ error = XFS_ERROR(ENOSPC);
+ break;
+ }
+
+ startoffset_fsb += allocated_fsb;
+ allocatesize_fsb -= allocated_fsb;
+ }
+
+ return error;
+
+error0: /* Cancel bmap, unlock inode, unreserve quota blocks, cancel trans */
+ xfs_bmap_cancel(&free_list);
+ xfs_trans_unreserve_quota_nblks(tp, ip, (long)qblocks, 0, quota_flag);
+
+error1: /* Just cancel transaction */
+ xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ return error;
+}
+
+/*
+ * Zero file bytes between startoff and endoff inclusive.
+ * The iolock is held exclusive and no blocks are buffered.
+ *
+ * This function is used by xfs_free_file_space() to zero
+ * partial blocks when the range to free is not block aligned.
+ * When unreserving space with boundaries that are not block
+ * aligned we round up the start and round down the end
+ * boundaries and then use this function to zero the parts of
+ * the blocks that got dropped during the rounding.
+ */
+STATIC int
+xfs_zero_remaining_bytes(
+ xfs_inode_t *ip,
+ xfs_off_t startoff,
+ xfs_off_t endoff)
+{
+ xfs_bmbt_irec_t imap;
+ xfs_fileoff_t offset_fsb;
+ xfs_off_t lastoffset;
+ xfs_off_t offset;
+ xfs_buf_t *bp;
+ xfs_mount_t *mp = ip->i_mount;
+ int nimap;
+ int error = 0;
+
+ /*
+ * Avoid doing I/O beyond eof - it's not necessary
+ * since nothing can read beyond eof. The space will
+ * be zeroed when the file is extended anyway.
+ */
+ if (startoff >= XFS_ISIZE(ip))
+ return 0;
+
+ if (endoff > XFS_ISIZE(ip))
+ endoff = XFS_ISIZE(ip);
+
+ bp = xfs_buf_get_uncached(XFS_IS_REALTIME_INODE(ip) ?
+ mp->m_rtdev_targp : mp->m_ddev_targp,
+ BTOBB(mp->m_sb.sb_blocksize), 0);
+ if (!bp)
+ return XFS_ERROR(ENOMEM);
+
+ xfs_buf_unlock(bp);
+
+ for (offset = startoff; offset <= endoff; offset = lastoffset + 1) {
+ offset_fsb = XFS_B_TO_FSBT(mp, offset);
+ nimap = 1;
+ error = xfs_bmapi_read(ip, offset_fsb, 1, &imap, &nimap, 0);
+ if (error || nimap < 1)
+ break;
+ ASSERT(imap.br_blockcount >= 1);
+ ASSERT(imap.br_startoff == offset_fsb);
+ lastoffset = XFS_FSB_TO_B(mp, imap.br_startoff + 1) - 1;
+ if (lastoffset > endoff)
+ lastoffset = endoff;
+ if (imap.br_startblock == HOLESTARTBLOCK)
+ continue;
+ ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
+ if (imap.br_state == XFS_EXT_UNWRITTEN)
+ continue;
+ XFS_BUF_UNDONE(bp);
+ XFS_BUF_UNWRITE(bp);
+ XFS_BUF_READ(bp);
+ XFS_BUF_SET_ADDR(bp, xfs_fsb_to_db(ip, imap.br_startblock));
+ xfsbdstrat(mp, bp);
+ error = xfs_buf_iowait(bp);
+ if (error) {
+ xfs_buf_ioerror_alert(bp,
+ "xfs_zero_remaining_bytes(read)");
+ break;
+ }
+ memset(bp->b_addr +
+ (offset - XFS_FSB_TO_B(mp, imap.br_startoff)),
+ 0, lastoffset - offset + 1);
+ XFS_BUF_UNDONE(bp);
+ XFS_BUF_UNREAD(bp);
+ XFS_BUF_WRITE(bp);
+ xfsbdstrat(mp, bp);
+ error = xfs_buf_iowait(bp);
+ if (error) {
+ xfs_buf_ioerror_alert(bp,
+ "xfs_zero_remaining_bytes(write)");
+ break;
+ }
+ }
+ xfs_buf_free(bp);
+ return error;
+}
+
+/*
+ * xfs_free_file_space()
+ * This routine frees disk space for the given file.
+ *
+ * This routine is only called by xfs_change_file_space
+ * for an UNRESVSP type call.
+ *
+ * RETURNS:
+ * 0 on success
+ * errno on error
+ *
+ */
+STATIC int
+xfs_free_file_space(
+ xfs_inode_t *ip,
+ xfs_off_t offset,
+ xfs_off_t len,
+ int attr_flags)
+{
+ int committed;
+ int done;
+ xfs_fileoff_t endoffset_fsb;
+ int error;
+ xfs_fsblock_t firstfsb;
+ xfs_bmap_free_t free_list;
+ xfs_bmbt_irec_t imap;
+ xfs_off_t ioffset;
+ xfs_extlen_t mod=0;
+ xfs_mount_t *mp;
+ int nimap;
+ uint resblks;
+ xfs_off_t rounding;
+ int rt;
+ xfs_fileoff_t startoffset_fsb;
+ xfs_trans_t *tp;
+ int need_iolock = 1;
+
+ mp = ip->i_mount;
+
+ trace_xfs_free_file_space(ip);
+
+ error = xfs_qm_dqattach(ip, 0);
+ if (error)
+ return error;
+
+ error = 0;
+ if (len <= 0) /* if nothing being freed */
+ return error;
+ rt = XFS_IS_REALTIME_INODE(ip);
+ startoffset_fsb = XFS_B_TO_FSB(mp, offset);
+ endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);
+
+ if (attr_flags & XFS_ATTR_NOLOCK)
+ need_iolock = 0;
+ if (need_iolock) {
+ xfs_ilock(ip, XFS_IOLOCK_EXCL);
+ /* wait for the completion of any pending DIOs */
+ inode_dio_wait(VFS_I(ip));
+ }
+
+ rounding = max_t(xfs_off_t, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
+ ioffset = offset & ~(rounding - 1);
+ error = -filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
+ ioffset, -1);
+ if (error)
+ goto out_unlock_iolock;
+ truncate_pagecache_range(VFS_I(ip), ioffset, -1);
+
+ /*
+ * Need to zero the stuff we're not freeing, on disk.
+ * If it's a realtime file & can't use unwritten extents then we
+ * actually need to zero the extent edges. Otherwise xfs_bunmapi
+ * will take care of it for us.
+ */
+ if (rt && !xfs_sb_version_hasextflgbit(&mp->m_sb)) {
+ nimap = 1;
+ error = xfs_bmapi_read(ip, startoffset_fsb, 1,
+ &imap, &nimap, 0);
+ if (error)
+ goto out_unlock_iolock;
+ ASSERT(nimap == 0 || nimap == 1);
+ if (nimap && imap.br_startblock != HOLESTARTBLOCK) {
+ xfs_daddr_t block;
+
+ ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
+ block = imap.br_startblock;
+ mod = do_div(block, mp->m_sb.sb_rextsize);
+ if (mod)
+ startoffset_fsb += mp->m_sb.sb_rextsize - mod;
+ }
+ nimap = 1;
+ error = xfs_bmapi_read(ip, endoffset_fsb - 1, 1,
+ &imap, &nimap, 0);
+ if (error)
+ goto out_unlock_iolock;
+ ASSERT(nimap == 0 || nimap == 1);
+ if (nimap && imap.br_startblock != HOLESTARTBLOCK) {
+ ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
+ mod++;
+ if (mod && (mod != mp->m_sb.sb_rextsize))
+ endoffset_fsb -= mod;
+ }
+ }
+ if ((done = (endoffset_fsb <= startoffset_fsb)))
+ /*
+ * One contiguous piece to clear
+ */
+ error = xfs_zero_remaining_bytes(ip, offset, offset + len - 1);
+ else {
+ /*
+ * Some full blocks, possibly two pieces to clear
+ */
+ if (offset < XFS_FSB_TO_B(mp, startoffset_fsb))
+ error = xfs_zero_remaining_bytes(ip, offset,
+ XFS_FSB_TO_B(mp, startoffset_fsb) - 1);
+ if (!error &&
+ XFS_FSB_TO_B(mp, endoffset_fsb) < offset + len)
+ error = xfs_zero_remaining_bytes(ip,
+ XFS_FSB_TO_B(mp, endoffset_fsb),
+ offset + len - 1);
+ }
+
+ /*
+ * free file space until done or until there is an error
+ */
+ resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
+ while (!error && !done) {
+
+ /*
+ * allocate and setup the transaction. Allow this
+ * transaction to dip into the reserve blocks to ensure
+ * the freeing of the space succeeds at ENOSPC.
+ */
+ tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
+ tp->t_flags |= XFS_TRANS_RESERVE;
+ error = xfs_trans_reserve(tp,
+ resblks,
+ XFS_WRITE_LOG_RES(mp),
+ 0,
+ XFS_TRANS_PERM_LOG_RES,
+ XFS_WRITE_LOG_COUNT);
+
+ /*
+ * check for running out of space
+ */
+ if (error) {
+ /*
+ * Free the transaction structure.
+ */
+ ASSERT(error == ENOSPC || XFS_FORCED_SHUTDOWN(mp));
+ xfs_trans_cancel(tp, 0);
+ break;
+ }
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ error = xfs_trans_reserve_quota(tp, mp,
+ ip->i_udquot, ip->i_gdquot,
+ resblks, 0, XFS_QMOPT_RES_REGBLKS);
+ if (error)
+ goto error1;
+
+ xfs_trans_ijoin(tp, ip, 0);
+
+ /*
+ * issue the bunmapi() call to free the blocks
+ */
+ xfs_bmap_init(&free_list, &firstfsb);
+ error = xfs_bunmapi(tp, ip, startoffset_fsb,
+ endoffset_fsb - startoffset_fsb,
+ 0, 2, &firstfsb, &free_list, &done);
+ if (error) {
+ goto error0;
+ }
+
+ /*
+ * complete the transaction
+ */
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error) {
+ goto error0;
+ }
+
+ error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ }
+
+ out_unlock_iolock:
+ if (need_iolock)
+ xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+ return error;
+
+ error0:
+ xfs_bmap_cancel(&free_list);
+ error1:
+ xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
+ xfs_iunlock(ip, need_iolock ? (XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL) :
+ XFS_ILOCK_EXCL);
+ return error;
+}
+
+
+STATIC int
+xfs_zero_file_space(
+ struct xfs_inode *ip,
+ xfs_off_t offset,
+ xfs_off_t len,
+ int attr_flags)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ uint granularity;
+ xfs_off_t start_boundary;
+ xfs_off_t end_boundary;
+ int error;
+
+ granularity = max_t(uint, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
+
+ /*
+ * Round the range of extents we are going to convert inwards. If the
+ * offset is aligned, then it doesn't get changed so we zero from the
+ * start of the block offset points to.
+ */
+ start_boundary = round_up(offset, granularity);
+ end_boundary = round_down(offset + len, granularity);
+
+ ASSERT(start_boundary >= offset);
+ ASSERT(end_boundary <= offset + len);
+
+ if (!(attr_flags & XFS_ATTR_NOLOCK))
+ xfs_ilock(ip, XFS_IOLOCK_EXCL);
+
+ if (start_boundary < end_boundary - 1) {
+ /* punch out the page cache over the conversion range */
+ truncate_pagecache_range(VFS_I(ip), start_boundary,
+ end_boundary - 1);
+ /* convert the blocks */
+ error = xfs_alloc_file_space(ip, start_boundary,
+ end_boundary - start_boundary - 1,
+ XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT,
+ attr_flags);
+ if (error)
+ goto out_unlock;
+
+ /* We've handled the interior of the range, now for the edges */
+ if (start_boundary != offset)
+ error = xfs_iozero(ip, offset, start_boundary - offset);
+ if (error)
+ goto out_unlock;
+
+ if (end_boundary != offset + len)
+ error = xfs_iozero(ip, end_boundary,
+ offset + len - end_boundary);
+
+ } else {
+ /*
+ * It's either a sub-granularity range or the range spanned lies
+ * partially across two adjacent blocks.
+ */
+ error = xfs_iozero(ip, offset, len);
+ }
+
+out_unlock:
+ if (!(attr_flags & XFS_ATTR_NOLOCK))
+ xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+ return error;
+
+}
+
+/*
+ * xfs_change_file_space()
+ * This routine allocates or frees disk space for the given file.
+ * The user specified parameters are checked for alignment and size
+ * limitations.
+ *
+ * RETURNS:
+ * 0 on success
+ * errno on error
+ *
+ */
+int
+xfs_change_file_space(
+ xfs_inode_t *ip,
+ int cmd,
+ xfs_flock64_t *bf,
+ xfs_off_t offset,
+ int attr_flags)
+{
+ xfs_mount_t *mp = ip->i_mount;
+ int clrprealloc;
+ int error;
+ xfs_fsize_t fsize;
+ int setprealloc;
+ xfs_off_t startoffset;
+ xfs_trans_t *tp;
+ struct iattr iattr;
+
+ if (!S_ISREG(ip->i_d.di_mode))
+ return XFS_ERROR(EINVAL);
+
+ switch (bf->l_whence) {
+ case 0: /*SEEK_SET*/
+ break;
+ case 1: /*SEEK_CUR*/
+ bf->l_start += offset;
+ break;
+ case 2: /*SEEK_END*/
+ bf->l_start += XFS_ISIZE(ip);
+ break;
+ default:
+ return XFS_ERROR(EINVAL);
+ }
+
+ /*
+ * length of <= 0 for resv/unresv/zero is invalid. length for
+ * alloc/free is ignored completely and we have no idea what userspace
+ * might have set it to, so set it to zero to allow range
+ * checks to pass.
+ */
+ switch (cmd) {
+ case XFS_IOC_ZERO_RANGE:
+ case XFS_IOC_RESVSP:
+ case XFS_IOC_RESVSP64:
+ case XFS_IOC_UNRESVSP:
+ case XFS_IOC_UNRESVSP64:
+ if (bf->l_len <= 0)
+ return XFS_ERROR(EINVAL);
+ break;
+ default:
+ bf->l_len = 0;
+ break;
+ }
+
+ if (bf->l_start < 0 ||
+ bf->l_start > mp->m_super->s_maxbytes ||
+ bf->l_start + bf->l_len < 0 ||
+ bf->l_start + bf->l_len >= mp->m_super->s_maxbytes)
+ return XFS_ERROR(EINVAL);
+
+ bf->l_whence = 0;
+
+ startoffset = bf->l_start;
+ fsize = XFS_ISIZE(ip);
+
+ setprealloc = clrprealloc = 0;
+ switch (cmd) {
+ case XFS_IOC_ZERO_RANGE:
+ error = xfs_zero_file_space(ip, startoffset, bf->l_len,
+ attr_flags);
+ if (error)
+ return error;
+ setprealloc = 1;
+ break;
+
+ case XFS_IOC_RESVSP:
+ case XFS_IOC_RESVSP64:
+ error = xfs_alloc_file_space(ip, startoffset, bf->l_len,
+ XFS_BMAPI_PREALLOC, attr_flags);
+ if (error)
+ return error;
+ setprealloc = 1;
+ break;
+
+ case XFS_IOC_UNRESVSP:
+ case XFS_IOC_UNRESVSP64:
+ if ((error = xfs_free_file_space(ip, startoffset, bf->l_len,
+ attr_flags)))
+ return error;
+ break;
+
+ case XFS_IOC_ALLOCSP:
+ case XFS_IOC_ALLOCSP64:
+ case XFS_IOC_FREESP:
+ case XFS_IOC_FREESP64:
+ /*
+ * These operations actually do IO when extending the file, but
+ * the allocation is done separately to the zeroing that is
+ * done. This set of operations need to be serialised against
+ * other IO operations, such as truncate and buffered IO. We
+ * need to take the IOLOCK here to serialise the allocation and
+ * zeroing IO to prevent other IOLOCK holders (e.g. getbmap,
+ * truncate, direct IO) from racing against the transient
+ * allocated but not written state we can have here.
+ */
+ xfs_ilock(ip, XFS_IOLOCK_EXCL);
+ if (startoffset > fsize) {
+ error = xfs_alloc_file_space(ip, fsize,
+ startoffset - fsize, 0,
+ attr_flags | XFS_ATTR_NOLOCK);
+ if (error) {
+ xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+ break;
+ }
+ }
+
+ iattr.ia_valid = ATTR_SIZE;
+ iattr.ia_size = startoffset;
+
+ error = xfs_setattr_size(ip, &iattr,
+ attr_flags | XFS_ATTR_NOLOCK);
+ xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+
+ if (error)
+ return error;
+
+ clrprealloc = 1;
+ break;
+
+ default:
+ ASSERT(0);
+ return XFS_ERROR(EINVAL);
+ }
+
+ /*
+ * update the inode timestamp, mode, and prealloc flag bits
+ */
+ tp = xfs_trans_alloc(mp, XFS_TRANS_WRITEID);
+
+ if ((error = xfs_trans_reserve(tp, 0, XFS_WRITEID_LOG_RES(mp),
+ 0, 0, 0))) {
+ /* ASSERT(0); */
+ xfs_trans_cancel(tp, 0);
+ return error;
+ }
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+ if ((attr_flags & XFS_ATTR_DMI) == 0) {
+ ip->i_d.di_mode &= ~S_ISUID;
+
+ /*
+ * Note that we don't have to worry about mandatory
+ * file locking being disabled here because we only
+ * clear the S_ISGID bit if the Group execute bit is
+ * on, but if it was on then mandatory locking wouldn't
+ * have been enabled.
+ */
+ if (ip->i_d.di_mode & S_IXGRP)
+ ip->i_d.di_mode &= ~S_ISGID;
+
+ xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+ }
+ if (setprealloc)
+ ip->i_d.di_flags |= XFS_DIFLAG_PREALLOC;
+ else if (clrprealloc)
+ ip->i_d.di_flags &= ~XFS_DIFLAG_PREALLOC;
+
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ if (attr_flags & XFS_ATTR_SYNC)
+ xfs_trans_set_sync(tp);
+ return xfs_trans_commit(tp, 0);
+}
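
Illustration (not part of the patch): xfs_zero_file_space() above rounds the requested byte range inwards to a granularity boundary, converts the aligned interior to unwritten extents, and zeroes the unaligned edges directly. The sketch below reproduces only the boundary arithmetic in userspace so the two cases are easy to see. The struct and the `demo_*` helpers are hypothetical names for this example, and the granularity is assumed to be a power of two, as it is when derived from the block size and page size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where the aligned interior starts and ends. */
struct demo_zero_plan {
	int64_t	start_boundary;		/* offset rounded up */
	int64_t	end_boundary;		/* offset + len rounded down */
	int	convert_interior;	/* non-zero: an aligned interior exists */
};

/* Round x down/up to a multiple of g; g must be a power of two. */
static int64_t demo_round_down(int64_t x, int64_t g)
{
	return x & ~(g - 1);
}

static int64_t demo_round_up(int64_t x, int64_t g)
{
	return (x + g - 1) & ~(g - 1);
}

/*
 * Mirror the boundary calculation and the branch condition used by
 * xfs_zero_file_space(): the interior is converted only when
 * start_boundary < end_boundary - 1; otherwise the whole range is
 * zeroed directly.
 */
static struct demo_zero_plan
demo_plan_zero_range(int64_t offset, int64_t len, int64_t granularity)
{
	struct demo_zero_plan	p;

	assert(granularity > 0 && (granularity & (granularity - 1)) == 0);
	assert(offset >= 0 && len > 0);

	p.start_boundary = demo_round_up(offset, granularity);
	p.end_boundary = demo_round_down(offset + len, granularity);
	p.convert_interior = p.start_boundary < p.end_boundary - 1;
	return p;
}
```

For example, with a 4096-byte granularity a request for offset 1000 and length 10000 has an aligned interior from 4096 to 8192, with the bytes before and after it left as edges. A request for offset 4000 and length 200 straddles a boundary without containing any whole aligned unit, so the sketch reports no interior, which corresponds to the else branch in the function above.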
diff --git a/fs/xfs/xfs_extent_ops.h b/fs/xfs/xfs_extent_ops.h
new file mode 100644
index 0000000..a737640
--- /dev/null
+++ b/fs/xfs/xfs_extent_ops.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef _XFS_EXTENT_OPS_H
+#define _XFS_EXTENT_OPS_H 1
+
+int xfs_change_file_space(struct xfs_inode *ip, int cmd,
+ xfs_flock64_t *bf, xfs_off_t offset, int attr_flags);
+
+#endif /* _XFS_EXTENT_OPS_H */
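
Illustration (not part of the patch): before dispatching on the command, xfs_change_file_space(), declared in the header above, converts the caller's start offset to an absolute file offset according to l_whence and then range-checks it against the filesystem's maximum file size. The following userspace sketch shows that normalisation step only. `struct demo_flock` and `demo_normalise_range()` are hypothetical stand-ins for xfs_flock64_t and the inline logic, and the current file position, file size and maximum size are passed in as plain parameters rather than read from an inode.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of xfs_flock64_t used here. */
struct demo_flock {
	int16_t	l_whence;	/* 0 = SEEK_SET, 1 = SEEK_CUR, 2 = SEEK_END */
	int64_t	l_start;
	int64_t	l_len;
};

/*
 * Convert bf->l_start to an absolute offset and validate the range.
 * Returns 0 on success or a positive errno value, matching the
 * positive-errno convention used by the function in the patch.
 */
static int
demo_normalise_range(struct demo_flock *bf, int64_t file_pos,
		     int64_t file_size, int64_t maxbytes)
{
	switch (bf->l_whence) {
	case 0:				/* SEEK_SET: already absolute */
		break;
	case 1:				/* SEEK_CUR: relative to file position */
		bf->l_start += file_pos;
		break;
	case 2:				/* SEEK_END: relative to file size */
		bf->l_start += file_size;
		break;
	default:
		return EINVAL;
	}

	if (bf->l_start < 0 ||
	    bf->l_start > maxbytes ||
	    bf->l_start + bf->l_len < 0 ||
	    bf->l_start + bf->l_len >= maxbytes)
		return EINVAL;

	bf->l_whence = 0;		/* the range is now absolute */
	return 0;
}
```

As in the patch, the structure is modified in place, so a caller that inspects it afterwards sees an absolute l_start with l_whence reset to zero.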
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 16521f1..aefcc44 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -29,12 +29,12 @@
#include "xfs_inode_item.h"
#include "xfs_bmap.h"
#include "xfs_error.h"
-#include "xfs_vnodeops.h"
#include "xfs_da_btree.h"
#include "xfs_dir2_format.h"
#include "xfs_dir2.h"
#include "xfs_ioctl.h"
#include "xfs_trace.h"
+#include "xfs_extent_ops.h"
#include <linux/aio.h>
#include <linux/dcache.h>
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 96e344e..956399f 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -31,7 +31,6 @@
#include "xfs_dinode.h"
#include "xfs_error.h"
#include "xfs_filestream.h"
-#include "xfs_vnodeops.h"
#include "xfs_inode_item.h"
#include "xfs_quota.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 07fb510..e724464 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -43,7 +43,6 @@
#include "xfs_utils.h"
#include "xfs_quota.h"
#include "xfs_filestream.h"
-#include "xfs_vnodeops.h"
#include "xfs_cksum.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
diff --git a/fs/xfs/xfs_inode_ops.c b/fs/xfs/xfs_inode_ops.c
index 295a393..0699afa 100644
--- a/fs/xfs/xfs_inode_ops.c
+++ b/fs/xfs/xfs_inode_ops.c
@@ -23,14 +23,19 @@
#include "xfs_log.h"
#include "xfs_inum.h"
#include "xfs_trans.h"
+#include "xfs_trans_space.h"
#include "xfs_trans_priv.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
#include "xfs_bmap_btree.h"
#include "xfs_alloc_btree.h"
#include "xfs_ialloc_btree.h"
#include "xfs_attr_sf.h"
+#include "xfs_attr.h"
#include "xfs_dinode.h"
#include "xfs_inode.h"
#include "xfs_buf_item.h"
@@ -43,10 +48,10 @@
#include "xfs_utils.h"
#include "xfs_quota.h"
#include "xfs_filestream.h"
-#include "xfs_vnodeops.h"
#include "xfs_cksum.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
+#include "xfs_symlink.h"
/*
* Used in xfs_itruncate_extents(). This is the maximum number of extents
@@ -69,42 +74,6 @@ xfs_get_extsz_hint(
}
/*
- * Test whether it is appropriate to check an inode for and free post EOF
- * blocks. The 'force' parameter determines whether we should also consider
- * regular files that are marked preallocated or append-only.
- */
-bool
-xfs_can_free_eofblocks(struct xfs_inode *ip, bool force)
-{
- /* prealloc/delalloc exists only on regular files */
- if (!S_ISREG(ip->i_d.di_mode))
- return false;
-
- /*
- * Zero sized files with no cached pages and delalloc blocks will not
- * have speculative prealloc/delalloc blocks to remove.
- */
- if (VFS_I(ip)->i_size == 0 &&
- VN_CACHED(VFS_I(ip)) == 0 &&
- ip->i_delayed_blks == 0)
- return false;
-
- /* If we haven't read in the extent list, then don't do it now. */
- if (!(ip->i_df.if_flags & XFS_IFEXTENTS))
- return false;
-
- /*
- * Do not free real preallocated or append-only files unless the file
- * has delalloc blocks and we are forced to remove them.
- */
- if (ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
- if (!force || ip->i_delayed_blks == 0)
- return false;
-
- return true;
-}
-
-/*
* This is a wrapper routine around the xfs_ilock() routine used to centralize
* some grungy code. It is used in places that wish to lock the inode solely
* for reading the extents. The reason these places can't just call
@@ -338,6 +307,188 @@ xfs_isilocked(
}
#endif
+#ifdef DEBUG
+int xfs_locked_n;
+int xfs_small_retries;
+int xfs_middle_retries;
+int xfs_lots_retries;
+int xfs_lock_delays;
+#endif
+
+/*
+ * Bump the subclass so xfs_lock_inodes() acquires each lock with
+ * a different value
+ */
+static inline int
+xfs_lock_inumorder(int lock_mode, int subclass)
+{
+ if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))
+ lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_IOLOCK_SHIFT;
+ if (lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))
+ lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_ILOCK_SHIFT;
+
+ return lock_mode;
+}
+
+/*
+ * The following routine will lock n inodes in exclusive mode.
+ * We assume the caller calls us with the inodes in i_ino order.
+ *
+ * We need to detect deadlock where an inode that we lock
+ * is in the AIL and we start waiting for another inode that is locked
+ * by a thread in a long running transaction (such as truncate). This can
+ * result in deadlock since the long running trans might need to wait
+ * for the inode we just locked in order to push the tail and free space
+ * in the log.
+ */
+void
+xfs_lock_inodes(
+ xfs_inode_t **ips,
+ int inodes,
+ uint lock_mode)
+{
+ int attempts = 0, i, j, try_lock;
+ xfs_log_item_t *lp;
+
+ ASSERT(ips && (inodes >= 2)); /* we need at least two */
+
+ try_lock = 0;
+ i = 0;
+
+again:
+ for (; i < inodes; i++) {
+ ASSERT(ips[i]);
+
+ if (i && (ips[i] == ips[i-1])) /* Already locked */
+ continue;
+
+ /*
+ * If try_lock is not set yet, make sure all locked inodes
+ * are not in the AIL.
+ * If any are, set try_lock to be used later.
+ */
+
+ if (!try_lock) {
+ for (j = (i - 1); j >= 0 && !try_lock; j--) {
+ lp = (xfs_log_item_t *)ips[j]->i_itemp;
+ if (lp && (lp->li_flags & XFS_LI_IN_AIL)) {
+ try_lock++;
+ }
+ }
+ }
+
+ /*
+ * If any of the previous locks we have locked is in the AIL,
+ * we must TRY to get the second and subsequent locks. If
+ * we can't get any, we must release all we have
+ * and try again.
+ */
+
+ if (try_lock) {
+ /* try_lock must be 0 if i is 0. */
+ /*
+ * try_lock means we have an inode locked
+ * that is in the AIL.
+ */
+ ASSERT(i != 0);
+ if (!xfs_ilock_nowait(ips[i], xfs_lock_inumorder(lock_mode, i))) {
+ attempts++;
+
+ /*
+ * Unlock all previous guys and try again.
+ * xfs_iunlock will try to push the tail
+ * if the inode is in the AIL.
+ */
+
+ for(j = i - 1; j >= 0; j--) {
+
+ /*
+ * Check to see if we've already
+ * unlocked this one.
+ * Not the first one going back,
+ * and the inode ptr is the same.
+ */
+ if ((j != (i - 1)) && ips[j] ==
+ ips[j+1])
+ continue;
+
+ xfs_iunlock(ips[j], lock_mode);
+ }
+
+ if ((attempts % 5) == 0) {
+ delay(1); /* Don't just spin the CPU */
+#ifdef DEBUG
+ xfs_lock_delays++;
+#endif
+ }
+ i = 0;
+ try_lock = 0;
+ goto again;
+ }
+ } else {
+ xfs_ilock(ips[i], xfs_lock_inumorder(lock_mode, i));
+ }
+ }
+
+#ifdef DEBUG
+ if (attempts) {
+ if (attempts < 5) xfs_small_retries++;
+ else if (attempts < 100) xfs_middle_retries++;
+ else xfs_lots_retries++;
+ } else {
+ xfs_locked_n++;
+ }
+#endif
+}
+
+/*
+ * xfs_lock_two_inodes() can only be used to lock one type of lock
+ * at a time - the iolock or the ilock, but not both at once. If
+ * we lock both at once, lockdep will report false positives saying
+ * we have violated locking orders.
+ */
+void
+xfs_lock_two_inodes(
+ xfs_inode_t *ip0,
+ xfs_inode_t *ip1,
+ uint lock_mode)
+{
+ xfs_inode_t *temp;
+ int attempts = 0;
+ xfs_log_item_t *lp;
+
+ if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))
+ ASSERT((lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL)) == 0);
+ ASSERT(ip0->i_ino != ip1->i_ino);
+
+ if (ip0->i_ino > ip1->i_ino) {
+ temp = ip0;
+ ip0 = ip1;
+ ip1 = temp;
+ }
+
+ again:
+ xfs_ilock(ip0, xfs_lock_inumorder(lock_mode, 0));
+
+ /*
+ * If the first lock we have locked is in the AIL, we must TRY to get
+ * the second lock. If we can't get it, we must release the first one
+ * and try again.
+ */
+ lp = (xfs_log_item_t *)ip0->i_itemp;
+ if (lp && (lp->li_flags & XFS_LI_IN_AIL)) {
+ if (!xfs_ilock_nowait(ip1, xfs_lock_inumorder(lock_mode, 1))) {
+ xfs_iunlock(ip0, lock_mode);
+ if ((++attempts % 5) == 0)
+ delay(1); /* Don't just spin the CPU */
+ goto again;
+ }
+ } else {
+ xfs_ilock(ip1, xfs_lock_inumorder(lock_mode, 1));
+ }
+}
+
+
void
__xfs_iflock(
struct xfs_inode *ip)
@@ -456,6 +607,49 @@ xfs_dic2xflags(
}
/*
+ * Looks up an inode from "name". If ci_name is not NULL, then a CI match
+ * is allowed, otherwise it has to be an exact match. If a CI match is found,
+ * ci_name->name will point to the actual name (caller must free) or
+ * will be set to NULL if an exact match is found.
+ */
+int
+xfs_lookup(
+ xfs_inode_t *dp,
+ struct xfs_name *name,
+ xfs_inode_t **ipp,
+ struct xfs_name *ci_name)
+{
+ xfs_ino_t inum;
+ int error;
+ uint lock_mode;
+
+ trace_xfs_lookup(dp, name);
+
+ if (XFS_FORCED_SHUTDOWN(dp->i_mount))
+ return XFS_ERROR(EIO);
+
+ lock_mode = xfs_ilock_map_shared(dp);
+ error = xfs_dir_lookup(NULL, dp, name, &inum, ci_name);
+ xfs_iunlock_map_shared(dp, lock_mode);
+
+ if (error)
+ goto out;
+
+ error = xfs_iget(dp->i_mount, NULL, inum, 0, 0, ipp);
+ if (error)
+ goto out_free_name;
+
+ return 0;
+
+out_free_name:
+ if (ci_name)
+ kmem_free(ci_name->name);
+out:
+ *ipp = NULL;
+ return error;
+}
+
+/*
* Allocate an inode on disk and return a copy of its in-core version. The
* in-core inode is locked exclusively. Set mode, nlink, and rdev appropriately
* within the inode. The uid and gid for the inode are set according to the
@@ -705,6 +899,303 @@ xfs_ialloc(
return 0;
}
+int
+xfs_create(
+ xfs_inode_t *dp,
+ struct xfs_name *name,
+ umode_t mode,
+ xfs_dev_t rdev,
+ xfs_inode_t **ipp)
+{
+ int is_dir = S_ISDIR(mode);
+ struct xfs_mount *mp = dp->i_mount;
+ struct xfs_inode *ip = NULL;
+ struct xfs_trans *tp = NULL;
+ int error;
+ xfs_bmap_free_t free_list;
+ xfs_fsblock_t first_block;
+ bool unlock_dp_on_error = false;
+ uint cancel_flags;
+ int committed;
+ prid_t prid;
+ struct xfs_dquot *udqp = NULL;
+ struct xfs_dquot *gdqp = NULL;
+ uint resblks;
+ uint log_res;
+ uint log_count;
+
+ trace_xfs_create(dp, name);
+
+ if (XFS_FORCED_SHUTDOWN(mp))
+ return XFS_ERROR(EIO);
+
+ if (dp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT)
+ prid = xfs_get_projid(dp);
+ else
+ prid = XFS_PROJID_DEFAULT;
+
+ /*
+ * Make sure that we have allocated dquot(s) on disk.
+ */
+ error = xfs_qm_vop_dqalloc(dp, current_fsuid(), current_fsgid(), prid,
+ XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT, &udqp, &gdqp);
+ if (error)
+ return error;
+
+ if (is_dir) {
+ rdev = 0;
+ resblks = XFS_MKDIR_SPACE_RES(mp, name->len);
+ log_res = XFS_MKDIR_LOG_RES(mp);
+ log_count = XFS_MKDIR_LOG_COUNT;
+ tp = xfs_trans_alloc(mp, XFS_TRANS_MKDIR);
+ } else {
+ resblks = XFS_CREATE_SPACE_RES(mp, name->len);
+ log_res = XFS_CREATE_LOG_RES(mp);
+ log_count = XFS_CREATE_LOG_COUNT;
+ tp = xfs_trans_alloc(mp, XFS_TRANS_CREATE);
+ }
+
+ cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
+
+ /*
+ * Initially assume that the file does not exist and
+ * reserve the resources for that case. If that is not
+ * the case we'll drop the one we have and get a more
+ * appropriate transaction later.
+ */
+ error = xfs_trans_reserve(tp, resblks, log_res, 0,
+ XFS_TRANS_PERM_LOG_RES, log_count);
+ if (error == ENOSPC) {
+ /* flush outstanding delalloc blocks and retry */
+ xfs_flush_inodes(mp);
+ error = xfs_trans_reserve(tp, resblks, log_res, 0,
+ XFS_TRANS_PERM_LOG_RES, log_count);
+ }
+ if (error == ENOSPC) {
+ /* No space at all so try a "no-allocation" reservation */
+ resblks = 0;
+ error = xfs_trans_reserve(tp, 0, log_res, 0,
+ XFS_TRANS_PERM_LOG_RES, log_count);
+ }
+ if (error) {
+ cancel_flags = 0;
+ goto out_trans_cancel;
+ }
+
+ xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
+ unlock_dp_on_error = true;
+
+ xfs_bmap_init(&free_list, &first_block);
+
+ /*
+ * Reserve disk quota and the inode.
+ */
+ error = xfs_trans_reserve_quota(tp, mp, udqp, gdqp, resblks, 1, 0);
+ if (error)
+ goto out_trans_cancel;
+
+ error = xfs_dir_canenter(tp, dp, name, resblks);
+ if (error)
+ goto out_trans_cancel;
+
+ /*
+ * A newly created regular or special file just has one directory
+ * entry pointing to it, but a directory also has the "." entry
+ * pointing to itself.
+ */
+ error = xfs_dir_ialloc(&tp, dp, mode, is_dir ? 2 : 1, rdev,
+ prid, resblks > 0, &ip, &committed);
+ if (error) {
+ if (error == ENOSPC)
+ goto out_trans_cancel;
+ goto out_trans_abort;
+ }
+
+ /*
+ * Now we join the directory inode to the transaction. We do not do it
+ * earlier because xfs_dir_ialloc might commit the previous transaction
+ * (and release all the locks). An error from here on will result in
+ * the transaction cancel unlocking dp so don't do it explicitly in the
+ * error path.
+ */
+ xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
+ unlock_dp_on_error = false;
+
+ error = xfs_dir_createname(tp, dp, name, ip->i_ino,
+ &first_block, &free_list, resblks ?
+ resblks - XFS_IALLOC_SPACE_RES(mp) : 0);
+ if (error) {
+ ASSERT(error != ENOSPC);
+ goto out_trans_abort;
+ }
+ xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+ xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
+
+ if (is_dir) {
+ error = xfs_dir_init(tp, ip, dp);
+ if (error)
+ goto out_bmap_cancel;
+
+ error = xfs_bumplink(tp, dp);
+ if (error)
+ goto out_bmap_cancel;
+ }
+
+ /*
+ * If this is a synchronous mount, make sure that the
+ * create transaction goes to disk before returning to
+ * the user.
+ */
+ if (mp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
+ xfs_trans_set_sync(tp);
+
+ /*
+ * Attach the dquot(s) to the inodes and modify them incore.
+ * The ids of the inode couldn't have changed since the new
+ * inode has been locked ever since it was created.
+ */
+ xfs_qm_vop_create_dqattach(tp, ip, udqp, gdqp);
+
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error)
+ goto out_bmap_cancel;
+
+ error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+ if (error)
+ goto out_release_inode;
+
+ xfs_qm_dqrele(udqp);
+ xfs_qm_dqrele(gdqp);
+
+ *ipp = ip;
+ return 0;
+
+ out_bmap_cancel:
+ xfs_bmap_cancel(&free_list);
+ out_trans_abort:
+ cancel_flags |= XFS_TRANS_ABORT;
+ out_trans_cancel:
+ xfs_trans_cancel(tp, cancel_flags);
+ out_release_inode:
+ /*
+ * Wait until after the current transaction is aborted to
+ * release the inode. This prevents recursive transactions
+ * and deadlocks from xfs_inactive.
+ */
+ if (ip)
+ IRELE(ip);
+
+ xfs_qm_dqrele(udqp);
+ xfs_qm_dqrele(gdqp);
+
+ if (unlock_dp_on_error)
+ xfs_iunlock(dp, XFS_ILOCK_EXCL);
+ return error;
+}
+
+int
+xfs_link(
+ xfs_inode_t *tdp,
+ xfs_inode_t *sip,
+ struct xfs_name *target_name)
+{
+ xfs_mount_t *mp = tdp->i_mount;
+ xfs_trans_t *tp;
+ int error;
+ xfs_bmap_free_t free_list;
+ xfs_fsblock_t first_block;
+ int cancel_flags;
+ int committed;
+ int resblks;
+
+ trace_xfs_link(tdp, target_name);
+
+ ASSERT(!S_ISDIR(sip->i_d.di_mode));
+
+ if (XFS_FORCED_SHUTDOWN(mp))
+ return XFS_ERROR(EIO);
+
+ error = xfs_qm_dqattach(sip, 0);
+ if (error)
+ goto std_return;
+
+ error = xfs_qm_dqattach(tdp, 0);
+ if (error)
+ goto std_return;
+
+ tp = xfs_trans_alloc(mp, XFS_TRANS_LINK);
+ cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
+ resblks = XFS_LINK_SPACE_RES(mp, target_name->len);
+ error = xfs_trans_reserve(tp, resblks, XFS_LINK_LOG_RES(mp), 0,
+ XFS_TRANS_PERM_LOG_RES, XFS_LINK_LOG_COUNT);
+ if (error == ENOSPC) {
+ resblks = 0;
+ error = xfs_trans_reserve(tp, 0, XFS_LINK_LOG_RES(mp), 0,
+ XFS_TRANS_PERM_LOG_RES, XFS_LINK_LOG_COUNT);
+ }
+ if (error) {
+ cancel_flags = 0;
+ goto error_return;
+ }
+
+ xfs_lock_two_inodes(sip, tdp, XFS_ILOCK_EXCL);
+
+ xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
+
+ /*
+ * If we are using project inheritance, we only allow hard link
+ * creation in our tree when the project IDs are the same; else
+ * the tree quota mechanism could be circumvented.
+ */
+ if (unlikely((tdp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT) &&
+ (xfs_get_projid(tdp) != xfs_get_projid(sip)))) {
+ error = XFS_ERROR(EXDEV);
+ goto error_return;
+ }
+
+ error = xfs_dir_canenter(tp, tdp, target_name, resblks);
+ if (error)
+ goto error_return;
+
+ xfs_bmap_init(&free_list, &first_block);
+
+ error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
+ &first_block, &free_list, resblks);
+ if (error)
+ goto abort_return;
+ xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+ xfs_trans_log_inode(tp, tdp, XFS_ILOG_CORE);
+
+ error = xfs_bumplink(tp, sip);
+ if (error)
+ goto abort_return;
+
+ /*
+ * If this is a synchronous mount, make sure that the
+ * link transaction goes to disk before returning to
+ * the user.
+ */
+ if (mp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC)) {
+ xfs_trans_set_sync(tp);
+ }
+
+ error = xfs_bmap_finish (&tp, &free_list, &committed);
+ if (error) {
+ xfs_bmap_cancel(&free_list);
+ goto abort_return;
+ }
+
+ return xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+
+ abort_return:
+ cancel_flags |= XFS_TRANS_ABORT;
+ error_return:
+ xfs_trans_cancel(tp, cancel_flags);
+ std_return:
+ return error;
+}
+
/*
* Free up the underlying blocks past new_size. The new size must be smaller
* than the current size. This routine can be used both for the attribute and
@@ -845,6 +1336,424 @@ out_bmap_cancel:
}
/*
+ * Test whether it is appropriate to check an inode for and free post EOF
+ * blocks. The 'force' parameter determines whether we should also consider
+ * regular files that are marked preallocated or append-only.
+ */
+bool
+xfs_can_free_eofblocks(struct xfs_inode *ip, bool force)
+{
+ /* prealloc/delalloc exists only on regular files */
+ if (!S_ISREG(ip->i_d.di_mode))
+ return false;
+
+ /*
+ * Zero sized files with no cached pages and delalloc blocks will not
+ * have speculative prealloc/delalloc blocks to remove.
+ */
+ if (VFS_I(ip)->i_size == 0 &&
+ VN_CACHED(VFS_I(ip)) == 0 &&
+ ip->i_delayed_blks == 0)
+ return false;
+
+ /* If we haven't read in the extent list, then don't do it now. */
+ if (!(ip->i_df.if_flags & XFS_IFEXTENTS))
+ return false;
+
+ /*
+ * Do not free real preallocated or append-only files unless the file
+ * has delalloc blocks and we are forced to remove them.
+ */
+ if (ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
+ if (!force || ip->i_delayed_blks == 0)
+ return false;
+
+ return true;
+}
+
+/*
+ * This is called by xfs_inactive to free any blocks beyond eof
+ * when the link count isn't zero and by xfs_dm_punch_hole() when
+ * punching a hole to EOF.
+ */
+int
+xfs_free_eofblocks(
+ xfs_mount_t *mp,
+ xfs_inode_t *ip,
+ bool need_iolock)
+{
+ xfs_trans_t *tp;
+ int error;
+ xfs_fileoff_t end_fsb;
+ xfs_fileoff_t last_fsb;
+ xfs_filblks_t map_len;
+ int nimaps;
+ xfs_bmbt_irec_t imap;
+
+ /*
+ * Figure out if there are any blocks beyond the end
+ * of the file. If not, then there is nothing to do.
+ */
+ end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
+ last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
+ if (last_fsb <= end_fsb)
+ return 0;
+ map_len = last_fsb - end_fsb;
+
+ nimaps = 1;
+ xfs_ilock(ip, XFS_ILOCK_SHARED);
+ error = xfs_bmapi_read(ip, end_fsb, map_len, &imap, &nimaps, 0);
+ xfs_iunlock(ip, XFS_ILOCK_SHARED);
+
+ if (!error && (nimaps != 0) &&
+ (imap.br_startblock != HOLESTARTBLOCK ||
+ ip->i_delayed_blks)) {
+ /*
+ * Attach the dquots to the inode up front.
+ */
+ error = xfs_qm_dqattach(ip, 0);
+ if (error)
+ return error;
+
+ /*
+ * There are blocks after the end of file.
+ * Free them up now by truncating the file to
+ * its current size.
+ */
+ tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
+
+ if (need_iolock) {
+ if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
+ xfs_trans_cancel(tp, 0);
+ return EAGAIN;
+ }
+ }
+
+ error = xfs_trans_reserve(tp, 0,
+ XFS_ITRUNCATE_LOG_RES(mp),
+ 0, XFS_TRANS_PERM_LOG_RES,
+ XFS_ITRUNCATE_LOG_COUNT);
+ if (error) {
+ ASSERT(XFS_FORCED_SHUTDOWN(mp));
+ xfs_trans_cancel(tp, 0);
+ if (need_iolock)
+ xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+ return error;
+ }
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, 0);
+
+ /*
+ * Do not update the on-disk file size. If we update the
+ * on-disk file size and then the system crashes before the
+ * contents of the file are flushed to disk then the files
+ * may be full of holes (i.e. the NULL files bug).
+ */
+ error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK,
+ XFS_ISIZE(ip));
+ if (error) {
+ /*
+ * If we get an error at this point we simply don't
+ * bother truncating the file.
+ */
+ xfs_trans_cancel(tp,
+ (XFS_TRANS_RELEASE_LOG_RES |
+ XFS_TRANS_ABORT));
+ } else {
+ error = xfs_trans_commit(tp,
+ XFS_TRANS_RELEASE_LOG_RES);
+ if (!error)
+ xfs_inode_clear_eofblocks_tag(ip);
+ }
+
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ if (need_iolock)
+ xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+ }
+ return error;
+}
+
+int
+xfs_release(
+ xfs_inode_t *ip)
+{
+ xfs_mount_t *mp = ip->i_mount;
+ int error;
+
+ if (!S_ISREG(ip->i_d.di_mode) || (ip->i_d.di_mode == 0))
+ return 0;
+
+ /* If this is a read-only mount, don't do this (would generate I/O) */
+ if (mp->m_flags & XFS_MOUNT_RDONLY)
+ return 0;
+
+ if (!XFS_FORCED_SHUTDOWN(mp)) {
+ int truncated;
+
+ /*
+ * If we are using filestreams, and we have an unlinked
+ * file that we are processing the last close on, then nothing
+ * will be able to reopen and write to this file. Purge this
+ * inode from the filestreams cache so that it doesn't delay
+ * teardown of the inode.
+ */
+ if ((ip->i_d.di_nlink == 0) && xfs_inode_is_filestream(ip))
+ xfs_filestream_deassociate(ip);
+
+ /*
+ * If we previously truncated this file and removed old data
+ * in the process, we want to initiate "early" writeout on
+ * the last close. This is an attempt to combat the notorious
+ * NULL files problem which is particularly noticeable from a
+ * truncate down, buffered (re-)write (delalloc), followed by
+ * a crash. What we are effectively doing here is
+ * significantly reducing the time window where we'd otherwise
+ * be exposed to that problem.
+ */
+ truncated = xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED);
+ if (truncated) {
+ xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
+ if (VN_DIRTY(VFS_I(ip)) && ip->i_delayed_blks > 0) {
+ error = -filemap_flush(VFS_I(ip)->i_mapping);
+ if (error)
+ return error;
+ }
+ }
+ }
+
+ if (ip->i_d.di_nlink == 0)
+ return 0;
+
+ if (xfs_can_free_eofblocks(ip, false)) {
+
+ /*
+ * If we can't get the iolock just skip truncating the blocks
+ * past EOF because we could deadlock with the mmap_sem
+ * otherwise. We'll get another chance to drop them once the
+ * last reference to the inode is dropped, so we'll never leak
+ * blocks permanently.
+ *
+ * Further, if the inode is being opened, written and
+ * closed frequently and we have delayed allocation blocks
+ * outstanding (e.g. streaming writes from the NFS server),
+ * truncating the blocks past EOF will cause fragmentation to
+ * occur.
+ *
+ * In this case don't do the truncation, either, but we have to
+ * be careful how we detect this case. Blocks beyond EOF show
+ * up as i_delayed_blks even when the inode is clean, so we
+ * need to truncate them away first before checking for a dirty
+ * release. Hence on the first dirty close we will still remove
+ * the speculative allocation, but after that we will leave it
+ * in place.
+ */
+ if (xfs_iflags_test(ip, XFS_IDIRTY_RELEASE))
+ return 0;
+
+ error = xfs_free_eofblocks(mp, ip, true);
+ if (error && error != EAGAIN)
+ return error;
+
+ /* delalloc blocks after truncation means it really is dirty */
+ if (ip->i_delayed_blks)
+ xfs_iflags_set(ip, XFS_IDIRTY_RELEASE);
+ }
+ return 0;
+}
+
+/*
+ * xfs_inactive
+ *
+ * This is called when the reference count for the vnode
+ * goes to zero. If the file has been unlinked, then it must
+ * now be truncated. Also, we clear all of the read-ahead state
+ * kept for the inode here since the file is now closed.
+ */
+int
+xfs_inactive(
+ xfs_inode_t *ip)
+{
+ xfs_bmap_free_t free_list;
+ xfs_fsblock_t first_block;
+ int committed;
+ xfs_trans_t *tp;
+ xfs_mount_t *mp;
+ int error;
+ int truncate = 0;
+
+ /*
+ * If the inode is already free, then there can be nothing
+ * to clean up here.
+ */
+ if (ip->i_d.di_mode == 0 || is_bad_inode(VFS_I(ip))) {
+ ASSERT(ip->i_df.if_real_bytes == 0);
+ ASSERT(ip->i_df.if_broot_bytes == 0);
+ return VN_INACTIVE_CACHE;
+ }
+
+ mp = ip->i_mount;
+
+ error = 0;
+
+ /* If this is a read-only mount, don't do this (would generate I/O) */
+ if (mp->m_flags & XFS_MOUNT_RDONLY)
+ goto out;
+
+ if (ip->i_d.di_nlink != 0) {
+ /*
+ * force is true because we are evicting an inode from the
+ * cache. Post-eof blocks must be freed, lest we end up with
+ * broken free space accounting.
+ */
+ if (xfs_can_free_eofblocks(ip, true)) {
+ error = xfs_free_eofblocks(mp, ip, false);
+ if (error)
+ return VN_INACTIVE_CACHE;
+ }
+ goto out;
+ }
+
+ if (S_ISREG(ip->i_d.di_mode) &&
+ (ip->i_d.di_size != 0 || XFS_ISIZE(ip) != 0 ||
+ ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0))
+ truncate = 1;
+
+ error = xfs_qm_dqattach(ip, 0);
+ if (error)
+ return VN_INACTIVE_CACHE;
+
+ tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
+ error = xfs_trans_reserve(tp, 0,
+ (truncate || S_ISLNK(ip->i_d.di_mode)) ?
+ XFS_ITRUNCATE_LOG_RES(mp) :
+ XFS_IFREE_LOG_RES(mp),
+ 0,
+ XFS_TRANS_PERM_LOG_RES,
+ XFS_ITRUNCATE_LOG_COUNT);
+ if (error) {
+ ASSERT(XFS_FORCED_SHUTDOWN(mp));
+ xfs_trans_cancel(tp, 0);
+ return VN_INACTIVE_CACHE;
+ }
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, 0);
+
+ if (S_ISLNK(ip->i_d.di_mode)) {
+ /*
+ * Zero length symlinks _can_ exist.
+ */
+ if (ip->i_d.di_size > XFS_IFORK_DSIZE(ip)) {
+ error = xfs_inactive_symlink_rmt(ip, &tp);
+ if (error)
+ goto out_cancel;
+ } else if (ip->i_df.if_bytes > 0) {
+ xfs_idata_realloc(ip, -(ip->i_df.if_bytes),
+ XFS_DATA_FORK);
+ ASSERT(ip->i_df.if_bytes == 0);
+ }
+ } else if (truncate) {
+ ip->i_d.di_size = 0;
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+ error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
+ if (error)
+ goto out_cancel;
+
+ ASSERT(ip->i_d.di_nextents == 0);
+ }
+
+ /*
+ * If there are attributes associated with the file then blow them away
+ * now. The code calls a routine that recursively deconstructs the
+ * attribute fork. We need to just commit the current transaction
+ * because we can't use it for xfs_attr_inactive().
+ */
+ if (ip->i_d.di_anextents > 0) {
+ ASSERT(ip->i_d.di_forkoff != 0);
+
+ error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+ if (error)
+ goto out_unlock;
+
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+ error = xfs_attr_inactive(ip);
+ if (error)
+ goto out;
+
+ tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
+ error = xfs_trans_reserve(tp, 0,
+ XFS_IFREE_LOG_RES(mp),
+ 0, XFS_TRANS_PERM_LOG_RES,
+ XFS_INACTIVE_LOG_COUNT);
+ if (error) {
+ xfs_trans_cancel(tp, 0);
+ goto out;
+ }
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, 0);
+ }
+
+ if (ip->i_afp)
+ xfs_idestroy_fork(ip, XFS_ATTR_FORK);
+
+ ASSERT(ip->i_d.di_anextents == 0);
+
+ /*
+ * Free the inode.
+ */
+ xfs_bmap_init(&free_list, &first_block);
+ error = xfs_ifree(tp, ip, &free_list);
+ if (error) {
+ /*
+ * If we fail to free the inode, shut down. The cancel
+ * might do that, but we need to make sure. Otherwise the
+ * inode might be lost for a long time or forever.
+ */
+ if (!XFS_FORCED_SHUTDOWN(mp)) {
+ xfs_notice(mp, "%s: xfs_ifree returned error %d",
+ __func__, error);
+ xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
+ }
+ xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_ABORT);
+ } else {
+ /*
+ * Credit the quota account(s). The inode is gone.
+ */
+ xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_ICOUNT, -1);
+
+ /*
+ * Just ignore errors at this point. There is nothing we can
+ * do except to try to keep going. Make sure it's not a silent
+ * error.
+ */
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error)
+ xfs_notice(mp, "%s: xfs_bmap_finish returned error %d",
+ __func__, error);
+ error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+ if (error)
+ xfs_notice(mp, "%s: xfs_trans_commit returned error %d",
+ __func__, error);
+ }
+
+ /*
+ * Release the dquots held by inode, if any.
+ */
+ xfs_qm_dqdetach(ip);
+out_unlock:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+out:
+ return VN_INACTIVE_CACHE;
+out_cancel:
+ xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
+ goto out_unlock;
+}
+
+/*
* This is called when the inode's link count goes to 0.
* We place the on-disk inode on a list in the AGI. It
* will be pulled from this list when the inode is freed.
@@ -1332,6 +2241,170 @@ xfs_ifree(
return error;
}
+int
+xfs_remove(
+ xfs_inode_t *dp,
+ struct xfs_name *name,
+ xfs_inode_t *ip)
+{
+ xfs_mount_t *mp = dp->i_mount;
+ xfs_trans_t *tp = NULL;
+ int is_dir = S_ISDIR(ip->i_d.di_mode);
+ int error = 0;
+ xfs_bmap_free_t free_list;
+ xfs_fsblock_t first_block;
+ int cancel_flags;
+ int committed;
+ int link_zero;
+ uint resblks;
+ uint log_count;
+
+ trace_xfs_remove(dp, name);
+
+ if (XFS_FORCED_SHUTDOWN(mp))
+ return XFS_ERROR(EIO);
+
+ error = xfs_qm_dqattach(dp, 0);
+ if (error)
+ goto std_return;
+
+ error = xfs_qm_dqattach(ip, 0);
+ if (error)
+ goto std_return;
+
+ if (is_dir) {
+ tp = xfs_trans_alloc(mp, XFS_TRANS_RMDIR);
+ log_count = XFS_DEFAULT_LOG_COUNT;
+ } else {
+ tp = xfs_trans_alloc(mp, XFS_TRANS_REMOVE);
+ log_count = XFS_REMOVE_LOG_COUNT;
+ }
+ cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
+
+ /*
+ * We try to get the real space reservation first,
+ * allowing for directory btree deletion(s) implying
+ * possible bmap insert(s). If we can't get the space
+ * reservation then we use 0 instead, and avoid the bmap
+ * btree insert(s) in the directory code by, if the bmap
+ * insert tries to happen, instead trimming the LAST
+ * block from the directory.
+ */
+ resblks = XFS_REMOVE_SPACE_RES(mp);
+ error = xfs_trans_reserve(tp, resblks, XFS_REMOVE_LOG_RES(mp), 0,
+ XFS_TRANS_PERM_LOG_RES, log_count);
+ if (error == ENOSPC) {
+ resblks = 0;
+ error = xfs_trans_reserve(tp, 0, XFS_REMOVE_LOG_RES(mp), 0,
+ XFS_TRANS_PERM_LOG_RES, log_count);
+ }
+ if (error) {
+ ASSERT(error != ENOSPC);
+ cancel_flags = 0;
+ goto out_trans_cancel;
+ }
+
+ xfs_lock_two_inodes(dp, ip, XFS_ILOCK_EXCL);
+
+ xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+ /*
+ * If we're removing a directory perform some additional validation.
+ */
+ if (is_dir) {
+ ASSERT(ip->i_d.di_nlink >= 2);
+ if (ip->i_d.di_nlink != 2) {
+ error = XFS_ERROR(ENOTEMPTY);
+ goto out_trans_cancel;
+ }
+ if (!xfs_dir_isempty(ip)) {
+ error = XFS_ERROR(ENOTEMPTY);
+ goto out_trans_cancel;
+ }
+ }
+
+ xfs_bmap_init(&free_list, &first_block);
+ error = xfs_dir_removename(tp, dp, name, ip->i_ino,
+ &first_block, &free_list, resblks);
+ if (error) {
+ ASSERT(error != ENOENT);
+ goto out_bmap_cancel;
+ }
+ xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+
+ if (is_dir) {
+ /*
+ * Drop the link from ip's "..".
+ */
+ error = xfs_droplink(tp, dp);
+ if (error)
+ goto out_bmap_cancel;
+
+ /*
+ * Drop the "." link from ip to self.
+ */
+ error = xfs_droplink(tp, ip);
+ if (error)
+ goto out_bmap_cancel;
+ } else {
+ /*
+ * When removing a non-directory we need to log the parent
+ * inode here. For a directory this is done implicitly
+ * by the xfs_droplink call for the ".." entry.
+ */
+ xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
+ }
+
+ /*
+ * Drop the link from dp to ip.
+ */
+ error = xfs_droplink(tp, ip);
+ if (error)
+ goto out_bmap_cancel;
+
+ /*
+ * Determine if this is the last link while
+ * we are in the transaction.
+ */
+ link_zero = (ip->i_d.di_nlink == 0);
+
+ /*
+ * If this is a synchronous mount, make sure that the
+ * remove transaction goes to disk before returning to
+ * the user.
+ */
+ if (mp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
+ xfs_trans_set_sync(tp);
+
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error)
+ goto out_bmap_cancel;
+
+ error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+ if (error)
+ goto std_return;
+
+ /*
+ * If we are using filestreams, kill the stream association.
+ * If the file is still open it may get a new one but that
+ * will get killed on last close in xfs_release() so we don't
+ * have to worry about that.
+ */
+ if (!is_dir && link_zero && xfs_inode_is_filestream(ip))
+ xfs_filestream_deassociate(ip);
+
+ return 0;
+
+ out_bmap_cancel:
+ xfs_bmap_cancel(&free_list);
+ cancel_flags |= XFS_TRANS_ABORT;
+ out_trans_cancel:
+ xfs_trans_cancel(tp, cancel_flags);
+ std_return:
+ return error;
+}
+
STATIC int
xfs_iflush_int(
struct xfs_inode *ip,
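For readers following the locking logic moved into xfs_inode.c above, the
ordered-lock-with-backoff scheme used by xfs_lock_two_inodes() can be sketched
in userspace roughly as below. This is only an illustration under stated
assumptions, not the kernel code: struct demo_inode, demo_lock_two() and
demo_unlock_two() are hypothetical names, a pthread mutex stands in for the
inode lock, the in_ail flag stands in for the XFS_LI_IN_AIL check, and
nanosleep() stands in for delay(1).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for an inode: ordering key, AIL flag, lock. */
struct demo_inode {
	unsigned long	ino;	/* ordering key, like i_ino */
	bool		in_ail;	/* stand-in for XFS_LI_IN_AIL */
	pthread_mutex_t	lock;	/* stand-in for the inode lock */
};

/*
 * Lock both objects, lowest ino first. Returns the number of times the
 * first lock had to be dropped and the whole sequence retried.
 */
static int
demo_lock_two(struct demo_inode *a, struct demo_inode *b)
{
	struct timespec	ts = { 0, 1000000 };	/* 1ms backoff */
	int		attempts = 0;

	assert(a->ino != b->ino);
	if (a->ino > b->ino) {
		struct demo_inode *tmp = a;

		a = b;
		b = tmp;
	}

	for (;;) {
		pthread_mutex_lock(&a->lock);
		if (!a->in_ail) {
			/* Nobody can be waiting on us; safe to block. */
			pthread_mutex_lock(&b->lock);
			return attempts;
		}
		/* Holding "a" may stall others, so only try the second lock. */
		if (pthread_mutex_trylock(&b->lock) == 0)
			return attempts;
		/* Drop everything so the holder of "b" can make progress. */
		pthread_mutex_unlock(&a->lock);
		if ((++attempts % 5) == 0)
			nanosleep(&ts, NULL);	/* don't just spin the CPU */
	}
}

/* Release both locks; order does not matter for unlocking. */
static void
demo_unlock_two(struct demo_inode *a, struct demo_inode *b)
{
	pthread_mutex_unlock(&a->lock);
	pthread_mutex_unlock(&b->lock);
}
```

A caller would initialise two objects with distinct ino values, call
demo_lock_two() with them in either order, and later call demo_unlock_two().
The trylock is only used when the first object is flagged, mirroring how the
patch blocks on the second inode unless the first one is in the AIL.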
diff --git a/fs/xfs/xfs_inode_ops.h b/fs/xfs/xfs_inode_ops.h
index 8556f5a..f4e6bf6 100644
--- a/fs/xfs/xfs_inode_ops.h
+++ b/fs/xfs/xfs_inode_ops.h
@@ -307,9 +307,22 @@ static inline int xfs_isiflocked(struct xfs_inode *ip)
((pip)->i_d.di_mode & S_ISGID))
-/*
- * xfs_inode.c prototypes.
- */
+int xfs_release(struct xfs_inode *ip);
+int xfs_inactive(struct xfs_inode *ip);
+int xfs_lookup(struct xfs_inode *dp, struct xfs_name *name,
+ struct xfs_inode **ipp, struct xfs_name *ci_name);
+int xfs_create(struct xfs_inode *dp, struct xfs_name *name,
+ umode_t mode, xfs_dev_t rdev, struct xfs_inode **ipp);
+int xfs_remove(struct xfs_inode *dp, struct xfs_name *name,
+ struct xfs_inode *ip);
+int xfs_link(struct xfs_inode *tdp, struct xfs_inode *sip,
+ struct xfs_name *target_name);
+int xfs_rename(struct xfs_inode *src_dp, struct xfs_name *src_name,
+ struct xfs_inode *src_ip, struct xfs_inode *target_dp,
+ struct xfs_name *target_name,
+ struct xfs_inode *target_ip);
+int xfs_free_eofblocks(struct xfs_mount *, struct xfs_inode *, bool);
+
void xfs_ilock(xfs_inode_t *, uint);
int xfs_ilock_nowait(xfs_inode_t *, uint);
void xfs_iunlock(xfs_inode_t *, uint);
@@ -337,6 +350,11 @@ void xfs_lock_two_inodes(xfs_inode_t *, xfs_inode_t *, uint);
xfs_extlen_t xfs_get_extsz_hint(struct xfs_inode *ip);
+/* from xfs_file.c */
+int xfs_zero_eof(struct xfs_inode *, xfs_off_t, xfs_fsize_t);
+int xfs_iozero(struct xfs_inode *, loff_t, size_t);
+
+
#define IHOLD(ip) \
do { \
ASSERT(atomic_read(&VFS_I(ip)->i_count) > 0) ; \
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 5e99968..633be2e 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -36,13 +36,14 @@
#include "xfs_utils.h"
#include "xfs_dfrag.h"
#include "xfs_fsops.h"
-#include "xfs_vnodeops.h"
#include "xfs_discard.h"
#include "xfs_quota.h"
#include "xfs_inode_item.h"
#include "xfs_export.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
+#include "xfs_symlink.h"
+#include "xfs_extent_ops.h"
#include <linux/capability.h>
#include <linux/dcache.h>
@@ -350,6 +351,40 @@ xfs_readlink_by_handle(
return error;
}
+int
+xfs_set_dmattrs(
+ xfs_inode_t *ip,
+ u_int evmask,
+ u_int16_t state)
+{
+ xfs_mount_t *mp = ip->i_mount;
+ xfs_trans_t *tp;
+ int error;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return XFS_ERROR(EPERM);
+
+ if (XFS_FORCED_SHUTDOWN(mp))
+ return XFS_ERROR(EIO);
+
+ tp = xfs_trans_alloc(mp, XFS_TRANS_SET_DMATTRS);
+ error = xfs_trans_reserve(tp, 0, XFS_ICHANGE_LOG_RES (mp), 0, 0, 0);
+ if (error) {
+ xfs_trans_cancel(tp, 0);
+ return error;
+ }
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+ ip->i_d.di_dmevmask = evmask;
+ ip->i_d.di_dmstate = state;
+
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ error = xfs_trans_commit(tp, 0);
+
+ return error;
+}
+
STATIC int
xfs_fssetdm_by_handle(
struct file *parfilp,
diff --git a/fs/xfs/xfs_ioctl.h b/fs/xfs/xfs_ioctl.h
index d56173b..1233dee 100644
--- a/fs/xfs/xfs_ioctl.h
+++ b/fs/xfs/xfs_ioctl.h
@@ -82,4 +82,10 @@ xfs_file_compat_ioctl(
unsigned int cmd,
unsigned long arg);
+extern int
+xfs_set_dmattrs(
+ struct xfs_inode *ip,
+ u_int evmask,
+ u_int16_t state);
+
#endif
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index c0c6625..55a3072 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -34,7 +34,6 @@
#include "xfs_itable.h"
#include "xfs_error.h"
#include "xfs_dfrag.h"
-#include "xfs_vnodeops.h"
#include "xfs_fsops.h"
#include "xfs_alloc.h"
#include "xfs_rtalloc.h"
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ca9ecaa..7f44efa 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -35,10 +35,10 @@
#include "xfs_attr.h"
#include "xfs_buf_item.h"
#include "xfs_utils.h"
-#include "xfs_vnodeops.h"
#include "xfs_inode_item.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
+#include "xfs_symlink.h"
#include <linux/capability.h>
#include <linux/xattr.h>
diff --git a/fs/xfs/xfs_iops.h b/fs/xfs/xfs_iops.h
index ef41c92..d81fb41 100644
--- a/fs/xfs/xfs_iops.h
+++ b/fs/xfs/xfs_iops.h
@@ -27,4 +27,17 @@ extern ssize_t xfs_vn_listxattr(struct dentry *, char *data, size_t size);
extern void xfs_setup_inode(struct xfs_inode *);
+/*
+ * Internal setattr interfaces.
+ */
+#define XFS_ATTR_DMI 0x01 /* invocation from a DMI function */
+#define XFS_ATTR_NONBLOCK 0x02 /* return EAGAIN if op would block */
+#define XFS_ATTR_NOLOCK 0x04 /* Don't grab any conflicting locks */
+#define XFS_ATTR_NOACL 0x08 /* Don't call xfs_acl_chmod */
+#define XFS_ATTR_SYNC 0x10 /* synchronous operation required */
+
+extern int xfs_setattr_nonsize(struct xfs_inode *ip, struct iattr *vap,
+ int flags);
+extern int xfs_setattr_size(struct xfs_inode *ip, struct iattr *vap, int flags);
+
#endif /* __XFS_IOPS_H__ */
diff --git a/fs/xfs/xfs_rename.c b/fs/xfs/xfs_rename.c
index b0ac54f..e0e59f4 100644
--- a/fs/xfs/xfs_rename.c
+++ b/fs/xfs/xfs_rename.c
@@ -35,7 +35,6 @@
#include "xfs_quota.h"
#include "xfs_utils.h"
#include "xfs_trans_space.h"
-#include "xfs_vnodeops.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 1496094..229c99d 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -40,7 +40,6 @@
#include "xfs_attr.h"
#include "xfs_buf_item.h"
#include "xfs_utils.h"
-#include "xfs_vnodeops.h"
#include "xfs_log_priv.h"
#include "xfs_trans_priv.h"
#include "xfs_filestream.h"
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
deleted file mode 100644
index acec328..0000000
--- a/fs/xfs/xfs_vnodeops.c
+++ /dev/null
@@ -1,1875 +0,0 @@
-/*
- * Copyright (c) 2000-2006 Silicon Graphics, Inc.
- * Copyright (c) 2012 Red Hat, Inc.
- * All Rights Reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it would be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write the Free Software Foundation,
- * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include "xfs.h"
-#include "xfs_fs.h"
-#include "xfs_types.h"
-#include "xfs_bit.h"
-#include "xfs_log.h"
-#include "xfs_trans.h"
-#include "xfs_sb.h"
-#include "xfs_ag.h"
-#include "xfs_mount.h"
-#include "xfs_da_btree.h"
-#include "xfs_dir2_format.h"
-#include "xfs_dir2.h"
-#include "xfs_bmap_btree.h"
-#include "xfs_ialloc_btree.h"
-#include "xfs_dinode.h"
-#include "xfs_inode.h"
-#include "xfs_inode_item.h"
-#include "xfs_itable.h"
-#include "xfs_ialloc.h"
-#include "xfs_alloc.h"
-#include "xfs_bmap.h"
-#include "xfs_acl.h"
-#include "xfs_attr.h"
-#include "xfs_error.h"
-#include "xfs_quota.h"
-#include "xfs_utils.h"
-#include "xfs_rtalloc.h"
-#include "xfs_trans_space.h"
-#include "xfs_log_priv.h"
-#include "xfs_filestream.h"
-#include "xfs_vnodeops.h"
-#include "xfs_trace.h"
-#include "xfs_icache.h"
-#include "xfs_symlink.h"
-
-
-/*
- * This is called by xfs_inactive to free any blocks beyond eof
- * when the link count isn't zero and by xfs_dm_punch_hole() when
- * punching a hole to EOF.
- */
-int
-xfs_free_eofblocks(
- xfs_mount_t *mp,
- xfs_inode_t *ip,
- bool need_iolock)
-{
- xfs_trans_t *tp;
- int error;
- xfs_fileoff_t end_fsb;
- xfs_fileoff_t last_fsb;
- xfs_filblks_t map_len;
- int nimaps;
- xfs_bmbt_irec_t imap;
-
- /*
- * Figure out if there are any blocks beyond the end
- * of the file. If not, then there is nothing to do.
- */
- end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
- last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
- if (last_fsb <= end_fsb)
- return 0;
- map_len = last_fsb - end_fsb;
-
- nimaps = 1;
- xfs_ilock(ip, XFS_ILOCK_SHARED);
- error = xfs_bmapi_read(ip, end_fsb, map_len, &imap, &nimaps, 0);
- xfs_iunlock(ip, XFS_ILOCK_SHARED);
-
- if (!error && (nimaps != 0) &&
- (imap.br_startblock != HOLESTARTBLOCK ||
- ip->i_delayed_blks)) {
- /*
- * Attach the dquots to the inode up front.
- */
- error = xfs_qm_dqattach(ip, 0);
- if (error)
- return error;
-
- /*
- * There are blocks after the end of file.
- * Free them up now by truncating the file to
- * its current size.
- */
- tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
-
- if (need_iolock) {
- if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
- xfs_trans_cancel(tp, 0);
- return EAGAIN;
- }
- }
-
- error = xfs_trans_reserve(tp, 0,
- XFS_ITRUNCATE_LOG_RES(mp),
- 0, XFS_TRANS_PERM_LOG_RES,
- XFS_ITRUNCATE_LOG_COUNT);
- if (error) {
- ASSERT(XFS_FORCED_SHUTDOWN(mp));
- xfs_trans_cancel(tp, 0);
- if (need_iolock)
- xfs_iunlock(ip, XFS_IOLOCK_EXCL);
- return error;
- }
-
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, ip, 0);
-
- /*
- * Do not update the on-disk file size. If we update the
- * on-disk file size and then the system crashes before the
- * contents of the file are flushed to disk then the files
- * may be full of holes (ie NULL files bug).
- */
- error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK,
- XFS_ISIZE(ip));
- if (error) {
- /*
- * If we get an error at this point we simply don't
- * bother truncating the file.
- */
- xfs_trans_cancel(tp,
- (XFS_TRANS_RELEASE_LOG_RES |
- XFS_TRANS_ABORT));
- } else {
- error = xfs_trans_commit(tp,
- XFS_TRANS_RELEASE_LOG_RES);
- if (!error)
- xfs_inode_clear_eofblocks_tag(ip);
- }
-
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- if (need_iolock)
- xfs_iunlock(ip, XFS_IOLOCK_EXCL);
- }
- return error;
-}
-
-int
-xfs_release(
- xfs_inode_t *ip)
-{
- xfs_mount_t *mp = ip->i_mount;
- int error;
-
- if (!S_ISREG(ip->i_d.di_mode) || (ip->i_d.di_mode == 0))
- return 0;
-
- /* If this is a read-only mount, don't do this (would generate I/O) */
- if (mp->m_flags & XFS_MOUNT_RDONLY)
- return 0;
-
- if (!XFS_FORCED_SHUTDOWN(mp)) {
- int truncated;
-
- /*
- * If we are using filestreams, and we have an unlinked
- * file that we are processing the last close on, then nothing
- * will be able to reopen and write to this file. Purge this
- * inode from the filestreams cache so that it doesn't delay
- * teardown of the inode.
- */
- if ((ip->i_d.di_nlink == 0) && xfs_inode_is_filestream(ip))
- xfs_filestream_deassociate(ip);
-
- /*
- * If we previously truncated this file and removed old data
- * in the process, we want to initiate "early" writeout on
- * the last close. This is an attempt to combat the notorious
- * NULL files problem which is particularly noticeable from a
- * truncate down, buffered (re-)write (delalloc), followed by
- * a crash. What we are effectively doing here is
- * significantly reducing the time window where we'd otherwise
- * be exposed to that problem.
- */
- truncated = xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED);
- if (truncated) {
- xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
- if (VN_DIRTY(VFS_I(ip)) && ip->i_delayed_blks > 0) {
- error = -filemap_flush(VFS_I(ip)->i_mapping);
- if (error)
- return error;
- }
- }
- }
-
- if (ip->i_d.di_nlink == 0)
- return 0;
-
- if (xfs_can_free_eofblocks(ip, false)) {
-
- /*
- * If we can't get the iolock just skip truncating the blocks
- * past EOF because we could deadlock with the mmap_sem
- * otherwise. We'll get another chance to drop them once the
- * last reference to the inode is dropped, so we'll never leak
- * blocks permanently.
- *
- * Further, check if the inode is being opened, written and
- * closed frequently and we have delayed allocation blocks
- * outstanding (e.g. streaming writes from the NFS server),
- * truncating the blocks past EOF will cause fragmentation to
- * occur.
- *
- * In this case don't do the truncation, either, but we have to
- * be careful how we detect this case. Blocks beyond EOF show
- * up as i_delayed_blks even when the inode is clean, so we
- * need to truncate them away first before checking for a dirty
- * release. Hence on the first dirty close we will still remove
- * the speculative allocation, but after that we will leave it
- * in place.
- */
- if (xfs_iflags_test(ip, XFS_IDIRTY_RELEASE))
- return 0;
-
- error = xfs_free_eofblocks(mp, ip, true);
- if (error && error != EAGAIN)
- return error;
-
- /* delalloc blocks after truncation means it really is dirty */
- if (ip->i_delayed_blks)
- xfs_iflags_set(ip, XFS_IDIRTY_RELEASE);
- }
- return 0;
-}
-
-/*
- * xfs_inactive
- *
- * This is called when the vnode reference count for the vnode
- * goes to zero. If the file has been unlinked, then it must
- * now be truncated. Also, we clear all of the read-ahead state
- * kept for the inode here since the file is now closed.
- */
-int
-xfs_inactive(
- xfs_inode_t *ip)
-{
- xfs_bmap_free_t free_list;
- xfs_fsblock_t first_block;
- int committed;
- xfs_trans_t *tp;
- xfs_mount_t *mp;
- int error;
- int truncate = 0;
-
- /*
- * If the inode is already free, then there can be nothing
- * to clean up here.
- */
- if (ip->i_d.di_mode == 0 || is_bad_inode(VFS_I(ip))) {
- ASSERT(ip->i_df.if_real_bytes == 0);
- ASSERT(ip->i_df.if_broot_bytes == 0);
- return VN_INACTIVE_CACHE;
- }
-
- mp = ip->i_mount;
-
- error = 0;
-
- /* If this is a read-only mount, don't do this (would generate I/O) */
- if (mp->m_flags & XFS_MOUNT_RDONLY)
- goto out;
-
- if (ip->i_d.di_nlink != 0) {
- /*
- * force is true because we are evicting an inode from the
- * cache. Post-eof blocks must be freed, lest we end up with
- * broken free space accounting.
- */
- if (xfs_can_free_eofblocks(ip, true)) {
- error = xfs_free_eofblocks(mp, ip, false);
- if (error)
- return VN_INACTIVE_CACHE;
- }
- goto out;
- }
-
- if (S_ISREG(ip->i_d.di_mode) &&
- (ip->i_d.di_size != 0 || XFS_ISIZE(ip) != 0 ||
- ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0))
- truncate = 1;
-
- error = xfs_qm_dqattach(ip, 0);
- if (error)
- return VN_INACTIVE_CACHE;
-
- tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
- error = xfs_trans_reserve(tp, 0,
- (truncate || S_ISLNK(ip->i_d.di_mode)) ?
- XFS_ITRUNCATE_LOG_RES(mp) :
- XFS_IFREE_LOG_RES(mp),
- 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_ITRUNCATE_LOG_COUNT);
- if (error) {
- ASSERT(XFS_FORCED_SHUTDOWN(mp));
- xfs_trans_cancel(tp, 0);
- return VN_INACTIVE_CACHE;
- }
-
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, ip, 0);
-
- if (S_ISLNK(ip->i_d.di_mode)) {
- /*
- * Zero length symlinks _can_ exist.
- */
- if (ip->i_d.di_size > XFS_IFORK_DSIZE(ip)) {
- error = xfs_inactive_symlink_rmt(ip, &tp);
- if (error)
- goto out_cancel;
- } else if (ip->i_df.if_bytes > 0) {
- xfs_idata_realloc(ip, -(ip->i_df.if_bytes),
- XFS_DATA_FORK);
- ASSERT(ip->i_df.if_bytes == 0);
- }
- } else if (truncate) {
- ip->i_d.di_size = 0;
- xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-
- error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
- if (error)
- goto out_cancel;
-
- ASSERT(ip->i_d.di_nextents == 0);
- }
-
- /*
- * If there are attributes associated with the file then blow them away
- * now. The code calls a routine that recursively deconstructs the
- * attribute fork. We need to just commit the current transaction
- * because we can't use it for xfs_attr_inactive().
- */
- if (ip->i_d.di_anextents > 0) {
- ASSERT(ip->i_d.di_forkoff != 0);
-
- error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
- if (error)
- goto out_unlock;
-
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
- error = xfs_attr_inactive(ip);
- if (error)
- goto out;
-
- tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
- error = xfs_trans_reserve(tp, 0,
- XFS_IFREE_LOG_RES(mp),
- 0, XFS_TRANS_PERM_LOG_RES,
- XFS_INACTIVE_LOG_COUNT);
- if (error) {
- xfs_trans_cancel(tp, 0);
- goto out;
- }
-
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, ip, 0);
- }
-
- if (ip->i_afp)
- xfs_idestroy_fork(ip, XFS_ATTR_FORK);
-
- ASSERT(ip->i_d.di_anextents == 0);
-
- /*
- * Free the inode.
- */
- xfs_bmap_init(&free_list, &first_block);
- error = xfs_ifree(tp, ip, &free_list);
- if (error) {
- /*
- * If we fail to free the inode, shut down. The cancel
- * might do that, we need to make sure. Otherwise the
- * inode might be lost for a long time or forever.
- */
- if (!XFS_FORCED_SHUTDOWN(mp)) {
- xfs_notice(mp, "%s: xfs_ifree returned error %d",
- __func__, error);
- xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
- }
- xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_ABORT);
- } else {
- /*
- * Credit the quota account(s). The inode is gone.
- */
- xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_ICOUNT, -1);
-
- /*
- * Just ignore errors at this point. There is nothing we can
- * do except to try to keep going. Make sure it's not a silent
- * error.
- */
- error = xfs_bmap_finish(&tp, &free_list, &committed);
- if (error)
- xfs_notice(mp, "%s: xfs_bmap_finish returned error %d",
- __func__, error);
- error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
- if (error)
- xfs_notice(mp, "%s: xfs_trans_commit returned error %d",
- __func__, error);
- }
-
- /*
- * Release the dquots held by inode, if any.
- */
- xfs_qm_dqdetach(ip);
-out_unlock:
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
-out:
- return VN_INACTIVE_CACHE;
-out_cancel:
- xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
- goto out_unlock;
-}
-
-/*
- * Lookups up an inode from "name". If ci_name is not NULL, then a CI match
- * is allowed, otherwise it has to be an exact match. If a CI match is found,
- * ci_name->name will point to a the actual name (caller must free) or
- * will be set to NULL if an exact match is found.
- */
-int
-xfs_lookup(
- xfs_inode_t *dp,
- struct xfs_name *name,
- xfs_inode_t **ipp,
- struct xfs_name *ci_name)
-{
- xfs_ino_t inum;
- int error;
- uint lock_mode;
-
- trace_xfs_lookup(dp, name);
-
- if (XFS_FORCED_SHUTDOWN(dp->i_mount))
- return XFS_ERROR(EIO);
-
- lock_mode = xfs_ilock_map_shared(dp);
- error = xfs_dir_lookup(NULL, dp, name, &inum, ci_name);
- xfs_iunlock_map_shared(dp, lock_mode);
-
- if (error)
- goto out;
-
- error = xfs_iget(dp->i_mount, NULL, inum, 0, 0, ipp);
- if (error)
- goto out_free_name;
-
- return 0;
-
-out_free_name:
- if (ci_name)
- kmem_free(ci_name->name);
-out:
- *ipp = NULL;
- return error;
-}
-
-int
-xfs_create(
- xfs_inode_t *dp,
- struct xfs_name *name,
- umode_t mode,
- xfs_dev_t rdev,
- xfs_inode_t **ipp)
-{
- int is_dir = S_ISDIR(mode);
- struct xfs_mount *mp = dp->i_mount;
- struct xfs_inode *ip = NULL;
- struct xfs_trans *tp = NULL;
- int error;
- xfs_bmap_free_t free_list;
- xfs_fsblock_t first_block;
- bool unlock_dp_on_error = false;
- uint cancel_flags;
- int committed;
- prid_t prid;
- struct xfs_dquot *udqp = NULL;
- struct xfs_dquot *gdqp = NULL;
- uint resblks;
- uint log_res;
- uint log_count;
-
- trace_xfs_create(dp, name);
-
- if (XFS_FORCED_SHUTDOWN(mp))
- return XFS_ERROR(EIO);
-
- if (dp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT)
- prid = xfs_get_projid(dp);
- else
- prid = XFS_PROJID_DEFAULT;
-
- /*
- * Make sure that we have allocated dquot(s) on disk.
- */
- error = xfs_qm_vop_dqalloc(dp, current_fsuid(), current_fsgid(), prid,
- XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT, &udqp, &gdqp);
- if (error)
- return error;
-
- if (is_dir) {
- rdev = 0;
- resblks = XFS_MKDIR_SPACE_RES(mp, name->len);
- log_res = XFS_MKDIR_LOG_RES(mp);
- log_count = XFS_MKDIR_LOG_COUNT;
- tp = xfs_trans_alloc(mp, XFS_TRANS_MKDIR);
- } else {
- resblks = XFS_CREATE_SPACE_RES(mp, name->len);
- log_res = XFS_CREATE_LOG_RES(mp);
- log_count = XFS_CREATE_LOG_COUNT;
- tp = xfs_trans_alloc(mp, XFS_TRANS_CREATE);
- }
-
- cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
-
- /*
- * Initially assume that the file does not exist and
- * reserve the resources for that case. If that is not
- * the case we'll drop the one we have and get a more
- * appropriate transaction later.
- */
- error = xfs_trans_reserve(tp, resblks, log_res, 0,
- XFS_TRANS_PERM_LOG_RES, log_count);
- if (error == ENOSPC) {
- /* flush outstanding delalloc blocks and retry */
- xfs_flush_inodes(mp);
- error = xfs_trans_reserve(tp, resblks, log_res, 0,
- XFS_TRANS_PERM_LOG_RES, log_count);
- }
- if (error == ENOSPC) {
- /* No space at all so try a "no-allocation" reservation */
- resblks = 0;
- error = xfs_trans_reserve(tp, 0, log_res, 0,
- XFS_TRANS_PERM_LOG_RES, log_count);
- }
- if (error) {
- cancel_flags = 0;
- goto out_trans_cancel;
- }
-
- xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
- unlock_dp_on_error = true;
-
- xfs_bmap_init(&free_list, &first_block);
-
- /*
- * Reserve disk quota and the inode.
- */
- error = xfs_trans_reserve_quota(tp, mp, udqp, gdqp, resblks, 1, 0);
- if (error)
- goto out_trans_cancel;
-
- error = xfs_dir_canenter(tp, dp, name, resblks);
- if (error)
- goto out_trans_cancel;
-
- /*
- * A newly created regular or special file just has one directory
- * entry pointing to them, but a directory also the "." entry
- * pointing to itself.
- */
- error = xfs_dir_ialloc(&tp, dp, mode, is_dir ? 2 : 1, rdev,
- prid, resblks > 0, &ip, &committed);
- if (error) {
- if (error == ENOSPC)
- goto out_trans_cancel;
- goto out_trans_abort;
- }
-
- /*
- * Now we join the directory inode to the transaction. We do not do it
- * earlier because xfs_dir_ialloc might commit the previous transaction
- * (and release all the locks). An error from here on will result in
- * the transaction cancel unlocking dp so don't do it explicitly in the
- * error path.
- */
- xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
- unlock_dp_on_error = false;
-
- error = xfs_dir_createname(tp, dp, name, ip->i_ino,
- &first_block, &free_list, resblks ?
- resblks - XFS_IALLOC_SPACE_RES(mp) : 0);
- if (error) {
- ASSERT(error != ENOSPC);
- goto out_trans_abort;
- }
- xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
- xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
-
- if (is_dir) {
- error = xfs_dir_init(tp, ip, dp);
- if (error)
- goto out_bmap_cancel;
-
- error = xfs_bumplink(tp, dp);
- if (error)
- goto out_bmap_cancel;
- }
-
- /*
- * If this is a synchronous mount, make sure that the
- * create transaction goes to disk before returning to
- * the user.
- */
- if (mp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
- xfs_trans_set_sync(tp);
-
- /*
- * Attach the dquot(s) to the inodes and modify them incore.
- * These ids of the inode couldn't have changed since the new
- * inode has been locked ever since it was created.
- */
- xfs_qm_vop_create_dqattach(tp, ip, udqp, gdqp);
-
- error = xfs_bmap_finish(&tp, &free_list, &committed);
- if (error)
- goto out_bmap_cancel;
-
- error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
- if (error)
- goto out_release_inode;
-
- xfs_qm_dqrele(udqp);
- xfs_qm_dqrele(gdqp);
-
- *ipp = ip;
- return 0;
-
- out_bmap_cancel:
- xfs_bmap_cancel(&free_list);
- out_trans_abort:
- cancel_flags |= XFS_TRANS_ABORT;
- out_trans_cancel:
- xfs_trans_cancel(tp, cancel_flags);
- out_release_inode:
- /*
- * Wait until after the current transaction is aborted to
- * release the inode. This prevents recursive transactions
- * and deadlocks from xfs_inactive.
- */
- if (ip)
- IRELE(ip);
-
- xfs_qm_dqrele(udqp);
- xfs_qm_dqrele(gdqp);
-
- if (unlock_dp_on_error)
- xfs_iunlock(dp, XFS_ILOCK_EXCL);
- return error;
-}
-
-#ifdef DEBUG
-int xfs_locked_n;
-int xfs_small_retries;
-int xfs_middle_retries;
-int xfs_lots_retries;
-int xfs_lock_delays;
-#endif
-
-/*
- * Bump the subclass so xfs_lock_inodes() acquires each lock with
- * a different value
- */
-static inline int
-xfs_lock_inumorder(int lock_mode, int subclass)
-{
- if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))
- lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_IOLOCK_SHIFT;
- if (lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))
- lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_ILOCK_SHIFT;
-
- return lock_mode;
-}
-
-/*
- * The following routine will lock n inodes in exclusive mode.
- * We assume the caller calls us with the inodes in i_ino order.
- *
- * We need to detect deadlock where an inode that we lock
- * is in the AIL and we start waiting for another inode that is locked
- * by a thread in a long running transaction (such as truncate). This can
- * result in deadlock since the long running trans might need to wait
- * for the inode we just locked in order to push the tail and free space
- * in the log.
- */
-void
-xfs_lock_inodes(
- xfs_inode_t **ips,
- int inodes,
- uint lock_mode)
-{
- int attempts = 0, i, j, try_lock;
- xfs_log_item_t *lp;
-
- ASSERT(ips && (inodes >= 2)); /* we need at least two */
-
- try_lock = 0;
- i = 0;
-
-again:
- for (; i < inodes; i++) {
- ASSERT(ips[i]);
-
- if (i && (ips[i] == ips[i-1])) /* Already locked */
- continue;
-
- /*
- * If try_lock is not set yet, make sure all locked inodes
- * are not in the AIL.
- * If any are, set try_lock to be used later.
- */
-
- if (!try_lock) {
- for (j = (i - 1); j >= 0 && !try_lock; j--) {
- lp = (xfs_log_item_t *)ips[j]->i_itemp;
- if (lp && (lp->li_flags & XFS_LI_IN_AIL)) {
- try_lock++;
- }
- }
- }
-
- /*
- * If any of the previous locks we have locked is in the AIL,
- * we must TRY to get the second and subsequent locks. If
- * we can't get any, we must release all we have
- * and try again.
- */
-
- if (try_lock) {
- /* try_lock must be 0 if i is 0. */
- /*
- * try_lock means we have an inode locked
- * that is in the AIL.
- */
- ASSERT(i != 0);
- if (!xfs_ilock_nowait(ips[i], xfs_lock_inumorder(lock_mode, i))) {
- attempts++;
-
- /*
- * Unlock all previous guys and try again.
- * xfs_iunlock will try to push the tail
- * if the inode is in the AIL.
- */
-
- for(j = i - 1; j >= 0; j--) {
-
- /*
- * Check to see if we've already
- * unlocked this one.
- * Not the first one going back,
- * and the inode ptr is the same.
- */
- if ((j != (i - 1)) && ips[j] ==
- ips[j+1])
- continue;
-
- xfs_iunlock(ips[j], lock_mode);
- }
-
- if ((attempts % 5) == 0) {
- delay(1); /* Don't just spin the CPU */
-#ifdef DEBUG
- xfs_lock_delays++;
-#endif
- }
- i = 0;
- try_lock = 0;
- goto again;
- }
- } else {
- xfs_ilock(ips[i], xfs_lock_inumorder(lock_mode, i));
- }
- }
-
-#ifdef DEBUG
- if (attempts) {
- if (attempts < 5) xfs_small_retries++;
- else if (attempts < 100) xfs_middle_retries++;
- else xfs_lots_retries++;
- } else {
- xfs_locked_n++;
- }
-#endif
-}
-
-/*
- * xfs_lock_two_inodes() can only be used to lock one type of lock
- * at a time - the iolock or the ilock, but not both at once. If
- * we lock both at once, lockdep will report false positives saying
- * we have violated locking orders.
- */
-void
-xfs_lock_two_inodes(
- xfs_inode_t *ip0,
- xfs_inode_t *ip1,
- uint lock_mode)
-{
- xfs_inode_t *temp;
- int attempts = 0;
- xfs_log_item_t *lp;
-
- if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))
- ASSERT((lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL)) == 0);
- ASSERT(ip0->i_ino != ip1->i_ino);
-
- if (ip0->i_ino > ip1->i_ino) {
- temp = ip0;
- ip0 = ip1;
- ip1 = temp;
- }
-
- again:
- xfs_ilock(ip0, xfs_lock_inumorder(lock_mode, 0));
-
- /*
- * If the first lock we have locked is in the AIL, we must TRY to get
- * the second lock. If we can't get it, we must release the first one
- * and try again.
- */
- lp = (xfs_log_item_t *)ip0->i_itemp;
- if (lp && (lp->li_flags & XFS_LI_IN_AIL)) {
- if (!xfs_ilock_nowait(ip1, xfs_lock_inumorder(lock_mode, 1))) {
- xfs_iunlock(ip0, lock_mode);
- if ((++attempts % 5) == 0)
- delay(1); /* Don't just spin the CPU */
- goto again;
- }
- } else {
- xfs_ilock(ip1, xfs_lock_inumorder(lock_mode, 1));
- }
-}
-
-int
-xfs_remove(
- xfs_inode_t *dp,
- struct xfs_name *name,
- xfs_inode_t *ip)
-{
- xfs_mount_t *mp = dp->i_mount;
- xfs_trans_t *tp = NULL;
- int is_dir = S_ISDIR(ip->i_d.di_mode);
- int error = 0;
- xfs_bmap_free_t free_list;
- xfs_fsblock_t first_block;
- int cancel_flags;
- int committed;
- int link_zero;
- uint resblks;
- uint log_count;
-
- trace_xfs_remove(dp, name);
-
- if (XFS_FORCED_SHUTDOWN(mp))
- return XFS_ERROR(EIO);
-
- error = xfs_qm_dqattach(dp, 0);
- if (error)
- goto std_return;
-
- error = xfs_qm_dqattach(ip, 0);
- if (error)
- goto std_return;
-
- if (is_dir) {
- tp = xfs_trans_alloc(mp, XFS_TRANS_RMDIR);
- log_count = XFS_DEFAULT_LOG_COUNT;
- } else {
- tp = xfs_trans_alloc(mp, XFS_TRANS_REMOVE);
- log_count = XFS_REMOVE_LOG_COUNT;
- }
- cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
-
- /*
- * We try to get the real space reservation first,
- * allowing for directory btree deletion(s) implying
- * possible bmap insert(s). If we can't get the space
- * reservation then we use 0 instead, and avoid the bmap
- * btree insert(s) in the directory code by, if the bmap
- * insert tries to happen, instead trimming the LAST
- * block from the directory.
- */
- resblks = XFS_REMOVE_SPACE_RES(mp);
- error = xfs_trans_reserve(tp, resblks, XFS_REMOVE_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, log_count);
- if (error == ENOSPC) {
- resblks = 0;
- error = xfs_trans_reserve(tp, 0, XFS_REMOVE_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, log_count);
- }
- if (error) {
- ASSERT(error != ENOSPC);
- cancel_flags = 0;
- goto out_trans_cancel;
- }
-
- xfs_lock_two_inodes(dp, ip, XFS_ILOCK_EXCL);
-
- xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
-
- /*
- * If we're removing a directory perform some additional validation.
- */
- if (is_dir) {
- ASSERT(ip->i_d.di_nlink >= 2);
- if (ip->i_d.di_nlink != 2) {
- error = XFS_ERROR(ENOTEMPTY);
- goto out_trans_cancel;
- }
- if (!xfs_dir_isempty(ip)) {
- error = XFS_ERROR(ENOTEMPTY);
- goto out_trans_cancel;
- }
- }
-
- xfs_bmap_init(&free_list, &first_block);
- error = xfs_dir_removename(tp, dp, name, ip->i_ino,
- &first_block, &free_list, resblks);
- if (error) {
- ASSERT(error != ENOENT);
- goto out_bmap_cancel;
- }
- xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
-
- if (is_dir) {
- /*
- * Drop the link from ip's "..".
- */
- error = xfs_droplink(tp, dp);
- if (error)
- goto out_bmap_cancel;
-
- /*
- * Drop the "." link from ip to self.
- */
- error = xfs_droplink(tp, ip);
- if (error)
- goto out_bmap_cancel;
- } else {
- /*
- * When removing a non-directory we need to log the parent
- * inode here. For a directory this is done implicitly
- * by the xfs_droplink call for the ".." entry.
- */
- xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
- }
-
- /*
- * Drop the link from dp to ip.
- */
- error = xfs_droplink(tp, ip);
- if (error)
- goto out_bmap_cancel;
-
- /*
- * Determine if this is the last link while
- * we are in the transaction.
- */
- link_zero = (ip->i_d.di_nlink == 0);
-
- /*
- * If this is a synchronous mount, make sure that the
- * remove transaction goes to disk before returning to
- * the user.
- */
- if (mp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
- xfs_trans_set_sync(tp);
-
- error = xfs_bmap_finish(&tp, &free_list, &committed);
- if (error)
- goto out_bmap_cancel;
-
- error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
- if (error)
- goto std_return;
-
- /*
- * If we are using filestreams, kill the stream association.
- * If the file is still open it may get a new one but that
- * will get killed on last close in xfs_close() so we don't
- * have to worry about that.
- */
- if (!is_dir && link_zero && xfs_inode_is_filestream(ip))
- xfs_filestream_deassociate(ip);
-
- return 0;
-
- out_bmap_cancel:
- xfs_bmap_cancel(&free_list);
- cancel_flags |= XFS_TRANS_ABORT;
- out_trans_cancel:
- xfs_trans_cancel(tp, cancel_flags);
- std_return:
- return error;
-}
-
-int
-xfs_link(
- xfs_inode_t *tdp,
- xfs_inode_t *sip,
- struct xfs_name *target_name)
-{
- xfs_mount_t *mp = tdp->i_mount;
- xfs_trans_t *tp;
- int error;
- xfs_bmap_free_t free_list;
- xfs_fsblock_t first_block;
- int cancel_flags;
- int committed;
- int resblks;
-
- trace_xfs_link(tdp, target_name);
-
- ASSERT(!S_ISDIR(sip->i_d.di_mode));
-
- if (XFS_FORCED_SHUTDOWN(mp))
- return XFS_ERROR(EIO);
-
- error = xfs_qm_dqattach(sip, 0);
- if (error)
- goto std_return;
-
- error = xfs_qm_dqattach(tdp, 0);
- if (error)
- goto std_return;
-
- tp = xfs_trans_alloc(mp, XFS_TRANS_LINK);
- cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
- resblks = XFS_LINK_SPACE_RES(mp, target_name->len);
- error = xfs_trans_reserve(tp, resblks, XFS_LINK_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, XFS_LINK_LOG_COUNT);
- if (error == ENOSPC) {
- resblks = 0;
- error = xfs_trans_reserve(tp, 0, XFS_LINK_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, XFS_LINK_LOG_COUNT);
- }
- if (error) {
- cancel_flags = 0;
- goto error_return;
- }
-
- xfs_lock_two_inodes(sip, tdp, XFS_ILOCK_EXCL);
-
- xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
-
- /*
- * If we are using project inheritance, we only allow hard link
- * creation in our tree when the project IDs are the same; else
- * the tree quota mechanism could be circumvented.
- */
- if (unlikely((tdp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT) &&
- (xfs_get_projid(tdp) != xfs_get_projid(sip)))) {
- error = XFS_ERROR(EXDEV);
- goto error_return;
- }
-
- error = xfs_dir_canenter(tp, tdp, target_name, resblks);
- if (error)
- goto error_return;
-
- xfs_bmap_init(&free_list, &first_block);
-
- error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
- &first_block, &free_list, resblks);
- if (error)
- goto abort_return;
- xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
- xfs_trans_log_inode(tp, tdp, XFS_ILOG_CORE);
-
- error = xfs_bumplink(tp, sip);
- if (error)
- goto abort_return;
-
- /*
- * If this is a synchronous mount, make sure that the
- * link transaction goes to disk before returning to
- * the user.
- */
- if (mp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC)) {
- xfs_trans_set_sync(tp);
- }
-
- error = xfs_bmap_finish (&tp, &free_list, &committed);
- if (error) {
- xfs_bmap_cancel(&free_list);
- goto abort_return;
- }
-
- return xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
-
- abort_return:
- cancel_flags |= XFS_TRANS_ABORT;
- error_return:
- xfs_trans_cancel(tp, cancel_flags);
- std_return:
- return error;
-}
-
-int
-xfs_set_dmattrs(
- xfs_inode_t *ip,
- u_int evmask,
- u_int16_t state)
-{
- xfs_mount_t *mp = ip->i_mount;
- xfs_trans_t *tp;
- int error;
-
- if (!capable(CAP_SYS_ADMIN))
- return XFS_ERROR(EPERM);
-
- if (XFS_FORCED_SHUTDOWN(mp))
- return XFS_ERROR(EIO);
-
- tp = xfs_trans_alloc(mp, XFS_TRANS_SET_DMATTRS);
- error = xfs_trans_reserve(tp, 0, XFS_ICHANGE_LOG_RES (mp), 0, 0, 0);
- if (error) {
- xfs_trans_cancel(tp, 0);
- return error;
- }
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
-
- ip->i_d.di_dmevmask = evmask;
- ip->i_d.di_dmstate = state;
-
- xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
- error = xfs_trans_commit(tp, 0);
-
- return error;
-}
-
-/*
- * xfs_alloc_file_space()
- * This routine allocates disk space for the given file.
- *
- * If alloc_type == 0, this request is for an ALLOCSP type
- * request which will change the file size. In this case, no
- * DMAPI event will be generated by the call. A TRUNCATE event
- * will be generated later by xfs_setattr.
- *
- * If alloc_type != 0, this request is for a RESVSP type
- * request, and a DMAPI DM_EVENT_WRITE will be generated if the
- * lower block boundary byte address is less than the file's
- * length.
- *
- * RETURNS:
- * 0 on success
- * errno on error
- *
- */
-STATIC int
-xfs_alloc_file_space(
- xfs_inode_t *ip,
- xfs_off_t offset,
- xfs_off_t len,
- int alloc_type,
- int attr_flags)
-{
- xfs_mount_t *mp = ip->i_mount;
- xfs_off_t count;
- xfs_filblks_t allocated_fsb;
- xfs_filblks_t allocatesize_fsb;
- xfs_extlen_t extsz, temp;
- xfs_fileoff_t startoffset_fsb;
- xfs_fsblock_t firstfsb;
- int nimaps;
- int quota_flag;
- int rt;
- xfs_trans_t *tp;
- xfs_bmbt_irec_t imaps[1], *imapp;
- xfs_bmap_free_t free_list;
- uint qblocks, resblks, resrtextents;
- int committed;
- int error;
-
- trace_xfs_alloc_file_space(ip);
-
- if (XFS_FORCED_SHUTDOWN(mp))
- return XFS_ERROR(EIO);
-
- error = xfs_qm_dqattach(ip, 0);
- if (error)
- return error;
-
- if (len <= 0)
- return XFS_ERROR(EINVAL);
-
- rt = XFS_IS_REALTIME_INODE(ip);
- extsz = xfs_get_extsz_hint(ip);
-
- count = len;
- imapp = &imaps[0];
- nimaps = 1;
- startoffset_fsb = XFS_B_TO_FSBT(mp, offset);
- allocatesize_fsb = XFS_B_TO_FSB(mp, count);
-
- /*
- * Allocate file space until done or until there is an error
- */
- while (allocatesize_fsb && !error) {
- xfs_fileoff_t s, e;
-
- /*
- * Determine space reservations for data/realtime.
- */
- if (unlikely(extsz)) {
- s = startoffset_fsb;
- do_div(s, extsz);
- s *= extsz;
- e = startoffset_fsb + allocatesize_fsb;
- if ((temp = do_mod(startoffset_fsb, extsz)))
- e += temp;
- if ((temp = do_mod(e, extsz)))
- e += extsz - temp;
- } else {
- s = 0;
- e = allocatesize_fsb;
- }
-
- /*
- * The transaction reservation is limited to a 32-bit block
- * count, hence we need to limit the number of blocks we are
- * trying to reserve to avoid an overflow. We can't allocate
- * more than @nimaps extents, and an extent is limited on disk
- * to MAXEXTLEN (21 bits), so use that to enforce the limit.
- */
- resblks = min_t(xfs_fileoff_t, (e - s), (MAXEXTLEN * nimaps));
- if (unlikely(rt)) {
- resrtextents = qblocks = resblks;
- resrtextents /= mp->m_sb.sb_rextsize;
- resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
- quota_flag = XFS_QMOPT_RES_RTBLKS;
- } else {
- resrtextents = 0;
- resblks = qblocks = XFS_DIOSTRAT_SPACE_RES(mp, resblks);
- quota_flag = XFS_QMOPT_RES_REGBLKS;
- }
-
- /*
- * Allocate and setup the transaction.
- */
- tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
- error = xfs_trans_reserve(tp, resblks,
- XFS_WRITE_LOG_RES(mp), resrtextents,
- XFS_TRANS_PERM_LOG_RES,
- XFS_WRITE_LOG_COUNT);
- /*
- * Check for running out of space
- */
- if (error) {
- /*
- * Free the transaction structure.
- */
- ASSERT(error == ENOSPC || XFS_FORCED_SHUTDOWN(mp));
- xfs_trans_cancel(tp, 0);
- break;
- }
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks,
- 0, quota_flag);
- if (error)
- goto error1;
-
- xfs_trans_ijoin(tp, ip, 0);
-
- xfs_bmap_init(&free_list, &firstfsb);
- error = xfs_bmapi_write(tp, ip, startoffset_fsb,
- allocatesize_fsb, alloc_type, &firstfsb,
- 0, imapp, &nimaps, &free_list);
- if (error) {
- goto error0;
- }
-
- /*
- * Complete the transaction
- */
- error = xfs_bmap_finish(&tp, &free_list, &committed);
- if (error) {
- goto error0;
- }
-
- error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- if (error) {
- break;
- }
-
- allocated_fsb = imapp->br_blockcount;
-
- if (nimaps == 0) {
- error = XFS_ERROR(ENOSPC);
- break;
- }
-
- startoffset_fsb += allocated_fsb;
- allocatesize_fsb -= allocated_fsb;
- }
-
- return error;
-
-error0: /* Cancel bmap, unlock inode, unreserve quota blocks, cancel trans */
- xfs_bmap_cancel(&free_list);
- xfs_trans_unreserve_quota_nblks(tp, ip, (long)qblocks, 0, quota_flag);
-
-error1: /* Just cancel transaction */
- xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- return error;
-}
-
-/*
- * Zero file bytes between startoff and endoff inclusive.
- * The iolock is held exclusive and no blocks are buffered.
- *
- * This function is used by xfs_free_file_space() to zero
- * partial blocks when the range to free is not block aligned.
- * When unreserving space with boundaries that are not block
- * aligned we round up the start and round down the end
- * boundaries and then use this function to zero the parts of
- * the blocks that got dropped during the rounding.
- */
-STATIC int
-xfs_zero_remaining_bytes(
- xfs_inode_t *ip,
- xfs_off_t startoff,
- xfs_off_t endoff)
-{
- xfs_bmbt_irec_t imap;
- xfs_fileoff_t offset_fsb;
- xfs_off_t lastoffset;
- xfs_off_t offset;
- xfs_buf_t *bp;
- xfs_mount_t *mp = ip->i_mount;
- int nimap;
- int error = 0;
-
- /*
- * Avoid doing I/O beyond eof - it's not necessary
- * since nothing can read beyond eof. The space will
- * be zeroed when the file is extended anyway.
- */
- if (startoff >= XFS_ISIZE(ip))
- return 0;
-
- if (endoff > XFS_ISIZE(ip))
- endoff = XFS_ISIZE(ip);
-
- bp = xfs_buf_get_uncached(XFS_IS_REALTIME_INODE(ip) ?
- mp->m_rtdev_targp : mp->m_ddev_targp,
- BTOBB(mp->m_sb.sb_blocksize), 0);
- if (!bp)
- return XFS_ERROR(ENOMEM);
-
- xfs_buf_unlock(bp);
-
- for (offset = startoff; offset <= endoff; offset = lastoffset + 1) {
- offset_fsb = XFS_B_TO_FSBT(mp, offset);
- nimap = 1;
- error = xfs_bmapi_read(ip, offset_fsb, 1, &imap, &nimap, 0);
- if (error || nimap < 1)
- break;
- ASSERT(imap.br_blockcount >= 1);
- ASSERT(imap.br_startoff == offset_fsb);
- lastoffset = XFS_FSB_TO_B(mp, imap.br_startoff + 1) - 1;
- if (lastoffset > endoff)
- lastoffset = endoff;
- if (imap.br_startblock == HOLESTARTBLOCK)
- continue;
- ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
- if (imap.br_state == XFS_EXT_UNWRITTEN)
- continue;
- XFS_BUF_UNDONE(bp);
- XFS_BUF_UNWRITE(bp);
- XFS_BUF_READ(bp);
- XFS_BUF_SET_ADDR(bp, xfs_fsb_to_db(ip, imap.br_startblock));
- xfsbdstrat(mp, bp);
- error = xfs_buf_iowait(bp);
- if (error) {
- xfs_buf_ioerror_alert(bp,
- "xfs_zero_remaining_bytes(read)");
- break;
- }
- memset(bp->b_addr +
- (offset - XFS_FSB_TO_B(mp, imap.br_startoff)),
- 0, lastoffset - offset + 1);
- XFS_BUF_UNDONE(bp);
- XFS_BUF_UNREAD(bp);
- XFS_BUF_WRITE(bp);
- xfsbdstrat(mp, bp);
- error = xfs_buf_iowait(bp);
- if (error) {
- xfs_buf_ioerror_alert(bp,
- "xfs_zero_remaining_bytes(write)");
- break;
- }
- }
- xfs_buf_free(bp);
- return error;
-}
-
-/*
- * xfs_free_file_space()
- * This routine frees disk space for the given file.
- *
- * This routine is only called by xfs_change_file_space
- * for an UNRESVSP type call.
- *
- * RETURNS:
- * 0 on success
- * errno on error
- *
- */
-STATIC int
-xfs_free_file_space(
- xfs_inode_t *ip,
- xfs_off_t offset,
- xfs_off_t len,
- int attr_flags)
-{
- int committed;
- int done;
- xfs_fileoff_t endoffset_fsb;
- int error;
- xfs_fsblock_t firstfsb;
- xfs_bmap_free_t free_list;
- xfs_bmbt_irec_t imap;
- xfs_off_t ioffset;
- xfs_extlen_t mod=0;
- xfs_mount_t *mp;
- int nimap;
- uint resblks;
- xfs_off_t rounding;
- int rt;
- xfs_fileoff_t startoffset_fsb;
- xfs_trans_t *tp;
- int need_iolock = 1;
-
- mp = ip->i_mount;
-
- trace_xfs_free_file_space(ip);
-
- error = xfs_qm_dqattach(ip, 0);
- if (error)
- return error;
-
- error = 0;
- if (len <= 0) /* if nothing being freed */
- return error;
- rt = XFS_IS_REALTIME_INODE(ip);
- startoffset_fsb = XFS_B_TO_FSB(mp, offset);
- endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);
-
- if (attr_flags & XFS_ATTR_NOLOCK)
- need_iolock = 0;
- if (need_iolock) {
- xfs_ilock(ip, XFS_IOLOCK_EXCL);
- /* wait for the completion of any pending DIOs */
- inode_dio_wait(VFS_I(ip));
- }
-
- rounding = max_t(xfs_off_t, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
- ioffset = offset & ~(rounding - 1);
- error = -filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
- ioffset, -1);
- if (error)
- goto out_unlock_iolock;
- truncate_pagecache_range(VFS_I(ip), ioffset, -1);
-
- /*
- * Need to zero the stuff we're not freeing, on disk.
- * If it's a realtime file & can't use unwritten extents then we
- * actually need to zero the extent edges. Otherwise xfs_bunmapi
- * will take care of it for us.
- */
- if (rt && !xfs_sb_version_hasextflgbit(&mp->m_sb)) {
- nimap = 1;
- error = xfs_bmapi_read(ip, startoffset_fsb, 1,
- &imap, &nimap, 0);
- if (error)
- goto out_unlock_iolock;
- ASSERT(nimap == 0 || nimap == 1);
- if (nimap && imap.br_startblock != HOLESTARTBLOCK) {
- xfs_daddr_t block;
-
- ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
- block = imap.br_startblock;
- mod = do_div(block, mp->m_sb.sb_rextsize);
- if (mod)
- startoffset_fsb += mp->m_sb.sb_rextsize - mod;
- }
- nimap = 1;
- error = xfs_bmapi_read(ip, endoffset_fsb - 1, 1,
- &imap, &nimap, 0);
- if (error)
- goto out_unlock_iolock;
- ASSERT(nimap == 0 || nimap == 1);
- if (nimap && imap.br_startblock != HOLESTARTBLOCK) {
- ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
- mod++;
- if (mod && (mod != mp->m_sb.sb_rextsize))
- endoffset_fsb -= mod;
- }
- }
- if ((done = (endoffset_fsb <= startoffset_fsb)))
- /*
- * One contiguous piece to clear
- */
- error = xfs_zero_remaining_bytes(ip, offset, offset + len - 1);
- else {
- /*
- * Some full blocks, possibly two pieces to clear
- */
- if (offset < XFS_FSB_TO_B(mp, startoffset_fsb))
- error = xfs_zero_remaining_bytes(ip, offset,
- XFS_FSB_TO_B(mp, startoffset_fsb) - 1);
- if (!error &&
- XFS_FSB_TO_B(mp, endoffset_fsb) < offset + len)
- error = xfs_zero_remaining_bytes(ip,
- XFS_FSB_TO_B(mp, endoffset_fsb),
- offset + len - 1);
- }
-
- /*
- * free file space until done or until there is an error
- */
- resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
- while (!error && !done) {
-
- /*
- * allocate and setup the transaction. Allow this
- * transaction to dip into the reserve blocks to ensure
- * the freeing of the space succeeds at ENOSPC.
- */
- tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
- tp->t_flags |= XFS_TRANS_RESERVE;
- error = xfs_trans_reserve(tp,
- resblks,
- XFS_WRITE_LOG_RES(mp),
- 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_WRITE_LOG_COUNT);
-
- /*
- * check for running out of space
- */
- if (error) {
- /*
- * Free the transaction structure.
- */
- ASSERT(error == ENOSPC || XFS_FORCED_SHUTDOWN(mp));
- xfs_trans_cancel(tp, 0);
- break;
- }
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- error = xfs_trans_reserve_quota(tp, mp,
- ip->i_udquot, ip->i_gdquot,
- resblks, 0, XFS_QMOPT_RES_REGBLKS);
- if (error)
- goto error1;
-
- xfs_trans_ijoin(tp, ip, 0);
-
- /*
- * issue the bunmapi() call to free the blocks
- */
- xfs_bmap_init(&free_list, &firstfsb);
- error = xfs_bunmapi(tp, ip, startoffset_fsb,
- endoffset_fsb - startoffset_fsb,
- 0, 2, &firstfsb, &free_list, &done);
- if (error) {
- goto error0;
- }
-
- /*
- * complete the transaction
- */
- error = xfs_bmap_finish(&tp, &free_list, &committed);
- if (error) {
- goto error0;
- }
-
- error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- }
-
- out_unlock_iolock:
- if (need_iolock)
- xfs_iunlock(ip, XFS_IOLOCK_EXCL);
- return error;
-
- error0:
- xfs_bmap_cancel(&free_list);
- error1:
- xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
- xfs_iunlock(ip, need_iolock ? (XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL) :
- XFS_ILOCK_EXCL);
- return error;
-}
-
-
-STATIC int
-xfs_zero_file_space(
- struct xfs_inode *ip,
- xfs_off_t offset,
- xfs_off_t len,
- int attr_flags)
-{
- struct xfs_mount *mp = ip->i_mount;
- uint granularity;
- xfs_off_t start_boundary;
- xfs_off_t end_boundary;
- int error;
-
- granularity = max_t(uint, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
-
- /*
- * Round the range of extents we are going to convert inwards. If the
- * offset is aligned, then it doesn't get changed so we zero from the
- * start of the block offset points to.
- */
- start_boundary = round_up(offset, granularity);
- end_boundary = round_down(offset + len, granularity);
-
- ASSERT(start_boundary >= offset);
- ASSERT(end_boundary <= offset + len);
-
- if (!(attr_flags & XFS_ATTR_NOLOCK))
- xfs_ilock(ip, XFS_IOLOCK_EXCL);
-
- if (start_boundary < end_boundary - 1) {
- /* punch out the page cache over the conversion range */
- truncate_pagecache_range(VFS_I(ip), start_boundary,
- end_boundary - 1);
- /* convert the blocks */
- error = xfs_alloc_file_space(ip, start_boundary,
- end_boundary - start_boundary - 1,
- XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT,
- attr_flags);
- if (error)
- goto out_unlock;
-
- /* We've handled the interior of the range, now for the edges */
- if (start_boundary != offset)
- error = xfs_iozero(ip, offset, start_boundary - offset);
- if (error)
- goto out_unlock;
-
- if (end_boundary != offset + len)
- error = xfs_iozero(ip, end_boundary,
- offset + len - end_boundary);
-
- } else {
- /*
- * It's either a sub-granularity range or the range spanned lies
- * partially across two adjacent blocks.
- */
- error = xfs_iozero(ip, offset, len);
- }
-
-out_unlock:
- if (!(attr_flags & XFS_ATTR_NOLOCK))
- xfs_iunlock(ip, XFS_IOLOCK_EXCL);
- return error;
-
-}
-
-/*
- * xfs_change_file_space()
- * This routine allocates or frees disk space for the given file.
- * The user specified parameters are checked for alignment and size
- * limitations.
- *
- * RETURNS:
- * 0 on success
- * errno on error
- *
- */
-int
-xfs_change_file_space(
- xfs_inode_t *ip,
- int cmd,
- xfs_flock64_t *bf,
- xfs_off_t offset,
- int attr_flags)
-{
- xfs_mount_t *mp = ip->i_mount;
- int clrprealloc;
- int error;
- xfs_fsize_t fsize;
- int setprealloc;
- xfs_off_t startoffset;
- xfs_trans_t *tp;
- struct iattr iattr;
-
- if (!S_ISREG(ip->i_d.di_mode))
- return XFS_ERROR(EINVAL);
-
- switch (bf->l_whence) {
- case 0: /*SEEK_SET*/
- break;
- case 1: /*SEEK_CUR*/
- bf->l_start += offset;
- break;
- case 2: /*SEEK_END*/
- bf->l_start += XFS_ISIZE(ip);
- break;
- default:
- return XFS_ERROR(EINVAL);
- }
-
- /*
- * length of <= 0 for resv/unresv/zero is invalid. length for
- * alloc/free is ignored completely and we have no idea what userspace
- * might have set it to, so set it to zero to allow range
- * checks to pass.
- */
- switch (cmd) {
- case XFS_IOC_ZERO_RANGE:
- case XFS_IOC_RESVSP:
- case XFS_IOC_RESVSP64:
- case XFS_IOC_UNRESVSP:
- case XFS_IOC_UNRESVSP64:
- if (bf->l_len <= 0)
- return XFS_ERROR(EINVAL);
- break;
- default:
- bf->l_len = 0;
- break;
- }
-
- if (bf->l_start < 0 ||
- bf->l_start > mp->m_super->s_maxbytes ||
- bf->l_start + bf->l_len < 0 ||
- bf->l_start + bf->l_len >= mp->m_super->s_maxbytes)
- return XFS_ERROR(EINVAL);
-
- bf->l_whence = 0;
-
- startoffset = bf->l_start;
- fsize = XFS_ISIZE(ip);
-
- setprealloc = clrprealloc = 0;
- switch (cmd) {
- case XFS_IOC_ZERO_RANGE:
- error = xfs_zero_file_space(ip, startoffset, bf->l_len,
- attr_flags);
- if (error)
- return error;
- setprealloc = 1;
- break;
-
- case XFS_IOC_RESVSP:
- case XFS_IOC_RESVSP64:
- error = xfs_alloc_file_space(ip, startoffset, bf->l_len,
- XFS_BMAPI_PREALLOC, attr_flags);
- if (error)
- return error;
- setprealloc = 1;
- break;
-
- case XFS_IOC_UNRESVSP:
- case XFS_IOC_UNRESVSP64:
- if ((error = xfs_free_file_space(ip, startoffset, bf->l_len,
- attr_flags)))
- return error;
- break;
-
- case XFS_IOC_ALLOCSP:
- case XFS_IOC_ALLOCSP64:
- case XFS_IOC_FREESP:
- case XFS_IOC_FREESP64:
- /*
- * These operations actually do IO when extending the file, but
- * the allocation is done seperately to the zeroing that is
- * done. This set of operations need to be serialised against
- * other IO operations, such as truncate and buffered IO. We
- * need to take the IOLOCK here to serialise the allocation and
- * zeroing IO to prevent other IOLOCK holders (e.g. getbmap,
- * truncate, direct IO) from racing against the transient
- * allocated but not written state we can have here.
- */
- xfs_ilock(ip, XFS_IOLOCK_EXCL);
- if (startoffset > fsize) {
- error = xfs_alloc_file_space(ip, fsize,
- startoffset - fsize, 0,
- attr_flags | XFS_ATTR_NOLOCK);
- if (error) {
- xfs_iunlock(ip, XFS_IOLOCK_EXCL);
- break;
- }
- }
-
- iattr.ia_valid = ATTR_SIZE;
- iattr.ia_size = startoffset;
-
- error = xfs_setattr_size(ip, &iattr,
- attr_flags | XFS_ATTR_NOLOCK);
- xfs_iunlock(ip, XFS_IOLOCK_EXCL);
-
- if (error)
- return error;
-
- clrprealloc = 1;
- break;
-
- default:
- ASSERT(0);
- return XFS_ERROR(EINVAL);
- }
-
- /*
- * update the inode timestamp, mode, and prealloc flag bits
- */
- tp = xfs_trans_alloc(mp, XFS_TRANS_WRITEID);
-
- if ((error = xfs_trans_reserve(tp, 0, XFS_WRITEID_LOG_RES(mp),
- 0, 0, 0))) {
- /* ASSERT(0); */
- xfs_trans_cancel(tp, 0);
- return error;
- }
-
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
-
- if ((attr_flags & XFS_ATTR_DMI) == 0) {
- ip->i_d.di_mode &= ~S_ISUID;
-
- /*
- * Note that we don't have to worry about mandatory
- * file locking being disabled here because we only
- * clear the S_ISGID bit if the Group execute bit is
- * on, but if it was on then mandatory locking wouldn't
- * have been enabled.
- */
- if (ip->i_d.di_mode & S_IXGRP)
- ip->i_d.di_mode &= ~S_ISGID;
-
- xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
- }
- if (setprealloc)
- ip->i_d.di_flags |= XFS_DIFLAG_PREALLOC;
- else if (clrprealloc)
- ip->i_d.di_flags &= ~XFS_DIFLAG_PREALLOC;
-
- xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
- if (attr_flags & XFS_ATTR_SYNC)
- xfs_trans_set_sync(tp);
- return xfs_trans_commit(tp, 0);
-}
diff --git a/fs/xfs/xfs_vnodeops.h b/fs/xfs/xfs_vnodeops.h
deleted file mode 100644
index 5163022..0000000
--- a/fs/xfs/xfs_vnodeops.h
+++ /dev/null
@@ -1,56 +0,0 @@
-#ifndef _XFS_VNODEOPS_H
-#define _XFS_VNODEOPS_H 1
-
-struct attrlist_cursor_kern;
-struct file;
-struct iattr;
-struct inode;
-struct iovec;
-struct kiocb;
-struct pipe_inode_info;
-struct uio;
-struct xfs_inode;
-
-
-int xfs_setattr_nonsize(struct xfs_inode *ip, struct iattr *vap, int flags);
-int xfs_setattr_size(struct xfs_inode *ip, struct iattr *vap, int flags);
-#define XFS_ATTR_DMI 0x01 /* invocation from a DMI function */
-#define XFS_ATTR_NONBLOCK 0x02 /* return EAGAIN if operation would block */
-#define XFS_ATTR_NOLOCK 0x04 /* Don't grab any conflicting locks */
-#define XFS_ATTR_NOACL 0x08 /* Don't call xfs_acl_chmod */
-#define XFS_ATTR_SYNC 0x10 /* synchronous operation required */
-
-int xfs_readlink(struct xfs_inode *ip, char *link);
-int xfs_release(struct xfs_inode *ip);
-int xfs_inactive(struct xfs_inode *ip);
-int xfs_lookup(struct xfs_inode *dp, struct xfs_name *name,
- struct xfs_inode **ipp, struct xfs_name *ci_name);
-int xfs_create(struct xfs_inode *dp, struct xfs_name *name, umode_t mode,
- xfs_dev_t rdev, struct xfs_inode **ipp);
-int xfs_remove(struct xfs_inode *dp, struct xfs_name *name,
- struct xfs_inode *ip);
-int xfs_link(struct xfs_inode *tdp, struct xfs_inode *sip,
- struct xfs_name *target_name);
-int xfs_readdir(struct xfs_inode *dp, void *dirent, size_t bufsize,
- xfs_off_t *offset, filldir_t filldir);
-int xfs_symlink(struct xfs_inode *dp, struct xfs_name *link_name,
- const char *target_path, umode_t mode, struct xfs_inode **ipp);
-int xfs_set_dmattrs(struct xfs_inode *ip, u_int evmask, u_int16_t state);
-int xfs_change_file_space(struct xfs_inode *ip, int cmd,
- xfs_flock64_t *bf, xfs_off_t offset, int attr_flags);
-int xfs_rename(struct xfs_inode *src_dp, struct xfs_name *src_name,
- struct xfs_inode *src_ip, struct xfs_inode *target_dp,
- struct xfs_name *target_name, struct xfs_inode *target_ip);
-int xfs_attr_get(struct xfs_inode *ip, const unsigned char *name,
- unsigned char *value, int *valuelenp, int flags);
-int xfs_attr_set(struct xfs_inode *dp, const unsigned char *name,
- unsigned char *value, int valuelen, int flags);
-int xfs_attr_remove(struct xfs_inode *dp, const unsigned char *name, int flags);
-int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
- int flags, struct attrlist_cursor_kern *cursor);
-
-int xfs_iozero(struct xfs_inode *, loff_t, size_t);
-int xfs_zero_eof(struct xfs_inode *, xfs_off_t, xfs_fsize_t);
-int xfs_free_eofblocks(struct xfs_mount *, struct xfs_inode *, bool);
-
-#endif /* _XFS_VNODEOPS_H */
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 87d3e03..dd72088 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -23,7 +23,6 @@
#include "xfs_attr.h"
#include "xfs_attr_leaf.h"
#include "xfs_acl.h"
-#include "xfs_vnodeops.h"
#include <linux/posix_acl_xattr.h>
#include <linux/xattr.h>
--
1.7.10.4
* [PATCH 20/60] xfs: move xfs_getbmap to xfs_extent_ops.c
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (18 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 19/60] xfs: consolidate xfs_vnodeops.c into xfs_inode_ops.c Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 21/60] xfs: introduce xfs_sb.c for sharing with libxfs Dave Chinner
` (41 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
While xfs_getbmap() walks the extent tree and formats it for
userspace, it's not really a core function of the block mapping API.
It is a user of that interface, and it is kernel-only functionality.
xfs_bmap.c is shared with libxfs, so to minimise the differences
between the userspace and kernel files, move the getbmap
functionality to xfs_extent_ops.c. This cleanly separates the core
block mapping code from the implementation of kernel interfaces.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_bmap.c | 281 ----------------------------------------------
fs/xfs/xfs_bmap.h | 5 -
fs/xfs/xfs_extent_ops.c | 283 +++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_extent_ops.h | 6 +
fs/xfs/xfs_iops.c | 1 +
5 files changed, 290 insertions(+), 286 deletions(-)
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 533c786..181e7b6 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -5791,287 +5791,6 @@ error0:
}
/*
- * returns 1 for success, 0 if we failed to map the extent.
- */
-STATIC int
-xfs_getbmapx_fix_eof_hole(
- xfs_inode_t *ip, /* xfs incore inode pointer */
- struct getbmapx *out, /* output structure */
- int prealloced, /* this is a file with
- * preallocated data space */
- __int64_t end, /* last block requested */
- xfs_fsblock_t startblock)
-{
- __int64_t fixlen;
- xfs_mount_t *mp; /* file system mount point */
- xfs_ifork_t *ifp; /* inode fork pointer */
- xfs_extnum_t lastx; /* last extent pointer */
- xfs_fileoff_t fileblock;
-
- if (startblock == HOLESTARTBLOCK) {
- mp = ip->i_mount;
- out->bmv_block = -1;
- fixlen = XFS_FSB_TO_BB(mp, XFS_B_TO_FSB(mp, XFS_ISIZE(ip)));
- fixlen -= out->bmv_offset;
- if (prealloced && out->bmv_offset + out->bmv_length == end) {
- /* Came to hole at EOF. Trim it. */
- if (fixlen <= 0)
- return 0;
- out->bmv_length = fixlen;
- }
- } else {
- if (startblock == DELAYSTARTBLOCK)
- out->bmv_block = -2;
- else
- out->bmv_block = xfs_fsb_to_db(ip, startblock);
- fileblock = XFS_BB_TO_FSB(ip->i_mount, out->bmv_offset);
- ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
- if (xfs_iext_bno_to_ext(ifp, fileblock, &lastx) &&
- (lastx == (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t))-1))
- out->bmv_oflags |= BMV_OF_LAST;
- }
-
- return 1;
-}
-
-/*
- * Get inode's extents as described in bmv, and format for output.
- * Calls formatter to fill the user's buffer until all extents
- * are mapped, until the passed-in bmv->bmv_count slots have
- * been filled, or until the formatter short-circuits the loop,
- * if it is tracking filled-in extents on its own.
- */
-int /* error code */
-xfs_getbmap(
- xfs_inode_t *ip,
- struct getbmapx *bmv, /* user bmap structure */
- xfs_bmap_format_t formatter, /* format to user */
- void *arg) /* formatter arg */
-{
- __int64_t bmvend; /* last block requested */
- int error = 0; /* return value */
- __int64_t fixlen; /* length for -1 case */
- int i; /* extent number */
- int lock; /* lock state */
- xfs_bmbt_irec_t *map; /* buffer for user's data */
- xfs_mount_t *mp; /* file system mount point */
- int nex; /* # of user extents can do */
- int nexleft; /* # of user extents left */
- int subnex; /* # of bmapi's can do */
- int nmap; /* number of map entries */
- struct getbmapx *out; /* output structure */
- int whichfork; /* data or attr fork */
- int prealloced; /* this is a file with
- * preallocated data space */
- int iflags; /* interface flags */
- int bmapi_flags; /* flags for xfs_bmapi */
- int cur_ext = 0;
-
- mp = ip->i_mount;
- iflags = bmv->bmv_iflags;
- whichfork = iflags & BMV_IF_ATTRFORK ? XFS_ATTR_FORK : XFS_DATA_FORK;
-
- if (whichfork == XFS_ATTR_FORK) {
- if (XFS_IFORK_Q(ip)) {
- if (ip->i_d.di_aformat != XFS_DINODE_FMT_EXTENTS &&
- ip->i_d.di_aformat != XFS_DINODE_FMT_BTREE &&
- ip->i_d.di_aformat != XFS_DINODE_FMT_LOCAL)
- return XFS_ERROR(EINVAL);
- } else if (unlikely(
- ip->i_d.di_aformat != 0 &&
- ip->i_d.di_aformat != XFS_DINODE_FMT_EXTENTS)) {
- XFS_ERROR_REPORT("xfs_getbmap", XFS_ERRLEVEL_LOW,
- ip->i_mount);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- prealloced = 0;
- fixlen = 1LL << 32;
- } else {
- if (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
- ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
- ip->i_d.di_format != XFS_DINODE_FMT_LOCAL)
- return XFS_ERROR(EINVAL);
-
- if (xfs_get_extsz_hint(ip) ||
- ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC|XFS_DIFLAG_APPEND)){
- prealloced = 1;
- fixlen = mp->m_super->s_maxbytes;
- } else {
- prealloced = 0;
- fixlen = XFS_ISIZE(ip);
- }
- }
-
- if (bmv->bmv_length == -1) {
- fixlen = XFS_FSB_TO_BB(mp, XFS_B_TO_FSB(mp, fixlen));
- bmv->bmv_length =
- max_t(__int64_t, fixlen - bmv->bmv_offset, 0);
- } else if (bmv->bmv_length == 0) {
- bmv->bmv_entries = 0;
- return 0;
- } else if (bmv->bmv_length < 0) {
- return XFS_ERROR(EINVAL);
- }
-
- nex = bmv->bmv_count - 1;
- if (nex <= 0)
- return XFS_ERROR(EINVAL);
- bmvend = bmv->bmv_offset + bmv->bmv_length;
-
-
- if (bmv->bmv_count > ULONG_MAX / sizeof(struct getbmapx))
- return XFS_ERROR(ENOMEM);
- out = kmem_zalloc(bmv->bmv_count * sizeof(struct getbmapx), KM_MAYFAIL);
- if (!out) {
- out = kmem_zalloc_large(bmv->bmv_count *
- sizeof(struct getbmapx));
- if (!out)
- return XFS_ERROR(ENOMEM);
- }
-
- xfs_ilock(ip, XFS_IOLOCK_SHARED);
- if (whichfork == XFS_DATA_FORK && !(iflags & BMV_IF_DELALLOC)) {
- if (ip->i_delayed_blks || XFS_ISIZE(ip) > ip->i_d.di_size) {
- error = -filemap_write_and_wait(VFS_I(ip)->i_mapping);
- if (error)
- goto out_unlock_iolock;
- }
- /*
- * even after flushing the inode, there can still be delalloc
- * blocks on the inode beyond EOF due to speculative
- * preallocation. These are not removed until the release
- * function is called or the inode is inactivated. Hence we
- * cannot assert here that ip->i_delayed_blks == 0.
- */
- }
-
- lock = xfs_ilock_map_shared(ip);
-
- /*
- * Don't let nex be bigger than the number of extents
- * we can have assuming alternating holes and real extents.
- */
- if (nex > XFS_IFORK_NEXTENTS(ip, whichfork) * 2 + 1)
- nex = XFS_IFORK_NEXTENTS(ip, whichfork) * 2 + 1;
-
- bmapi_flags = xfs_bmapi_aflag(whichfork);
- if (!(iflags & BMV_IF_PREALLOC))
- bmapi_flags |= XFS_BMAPI_IGSTATE;
-
- /*
- * Allocate enough space to handle "subnex" maps at a time.
- */
- error = ENOMEM;
- subnex = 16;
- map = kmem_alloc(subnex * sizeof(*map), KM_MAYFAIL | KM_NOFS);
- if (!map)
- goto out_unlock_ilock;
-
- bmv->bmv_entries = 0;
-
- if (XFS_IFORK_NEXTENTS(ip, whichfork) == 0 &&
- (whichfork == XFS_ATTR_FORK || !(iflags & BMV_IF_DELALLOC))) {
- error = 0;
- goto out_free_map;
- }
-
- nexleft = nex;
-
- do {
- nmap = (nexleft > subnex) ? subnex : nexleft;
- error = xfs_bmapi_read(ip, XFS_BB_TO_FSBT(mp, bmv->bmv_offset),
- XFS_BB_TO_FSB(mp, bmv->bmv_length),
- map, &nmap, bmapi_flags);
- if (error)
- goto out_free_map;
- ASSERT(nmap <= subnex);
-
- for (i = 0; i < nmap && nexleft && bmv->bmv_length; i++) {
- out[cur_ext].bmv_oflags = 0;
- if (map[i].br_state == XFS_EXT_UNWRITTEN)
- out[cur_ext].bmv_oflags |= BMV_OF_PREALLOC;
- else if (map[i].br_startblock == DELAYSTARTBLOCK)
- out[cur_ext].bmv_oflags |= BMV_OF_DELALLOC;
- out[cur_ext].bmv_offset =
- XFS_FSB_TO_BB(mp, map[i].br_startoff);
- out[cur_ext].bmv_length =
- XFS_FSB_TO_BB(mp, map[i].br_blockcount);
- out[cur_ext].bmv_unused1 = 0;
- out[cur_ext].bmv_unused2 = 0;
-
- /*
- * delayed allocation extents that start beyond EOF can
- * occur due to speculative EOF allocation when the
- * delalloc extent is larger than the largest freespace
- * extent at conversion time. These extents cannot be
- * converted by data writeback, so can exist here even
- * if we are not supposed to be finding delalloc
- * extents.
- */
- if (map[i].br_startblock == DELAYSTARTBLOCK &&
- map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
- ASSERT((iflags & BMV_IF_DELALLOC) != 0);
-
- if (map[i].br_startblock == HOLESTARTBLOCK &&
- whichfork == XFS_ATTR_FORK) {
- /* came to the end of attribute fork */
- out[cur_ext].bmv_oflags |= BMV_OF_LAST;
- goto out_free_map;
- }
-
- if (!xfs_getbmapx_fix_eof_hole(ip, &out[cur_ext],
- prealloced, bmvend,
- map[i].br_startblock))
- goto out_free_map;
-
- bmv->bmv_offset =
- out[cur_ext].bmv_offset +
- out[cur_ext].bmv_length;
- bmv->bmv_length =
- max_t(__int64_t, 0, bmvend - bmv->bmv_offset);
-
- /*
- * In case we don't want to return the hole,
- * don't increase cur_ext so that we can reuse
- * it in the next loop.
- */
- if ((iflags & BMV_IF_NO_HOLES) &&
- map[i].br_startblock == HOLESTARTBLOCK) {
- memset(&out[cur_ext], 0, sizeof(out[cur_ext]));
- continue;
- }
-
- nexleft--;
- bmv->bmv_entries++;
- cur_ext++;
- }
- } while (nmap && nexleft && bmv->bmv_length);
-
- out_free_map:
- kmem_free(map);
- out_unlock_ilock:
- xfs_iunlock_map_shared(ip, lock);
- out_unlock_iolock:
- xfs_iunlock(ip, XFS_IOLOCK_SHARED);
-
- for (i = 0; i < cur_ext; i++) {
- int full = 0; /* user array is full */
-
- /* format results & advance arg */
- error = formatter(&arg, &out[i], &full);
- if (error || full)
- break;
- }
-
- if (is_vmalloc_addr(out))
- kmem_free_large(out);
- else
- kmem_free(out);
- return error;
-}
-
-/*
* dead simple method of punching delalyed allocation blocks from a range in
* the inode. Walks a block at a time so will be slow, but is only executed in
* rare error cases so the overhead is not critical. This will alays punch out
diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
index 1cf1292..cbcaf00 100644
--- a/fs/xfs/xfs_bmap.h
+++ b/fs/xfs/xfs_bmap.h
@@ -18,7 +18,6 @@
#ifndef __XFS_BMAP_H__
#define __XFS_BMAP_H__
-struct getbmap;
struct xfs_bmbt_irec;
struct xfs_ifork;
struct xfs_inode;
@@ -206,13 +205,9 @@ int xfs_check_nostate_extents(struct xfs_ifork *ifp, xfs_extnum_t idx,
uint xfs_default_attroffset(struct xfs_inode *ip);
#ifdef __KERNEL__
-/* bmap to userspace formatter - copy to user & advance pointer */
-typedef int (*xfs_bmap_format_t)(void **, struct getbmapx *, int *);
int xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist,
int *committed);
-int xfs_getbmap(struct xfs_inode *ip, struct getbmapx *bmv,
- xfs_bmap_format_t formatter, void *arg);
int xfs_bmap_eof(struct xfs_inode *ip, xfs_fileoff_t endoff,
int whichfork, int *eof);
int xfs_bmap_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip,
diff --git a/fs/xfs/xfs_extent_ops.c b/fs/xfs/xfs_extent_ops.c
index 5590d23..a2f136c 100644
--- a/fs/xfs/xfs_extent_ops.c
+++ b/fs/xfs/xfs_extent_ops.c
@@ -49,6 +49,7 @@
#include "xfs_filestream.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
+#include "xfs_extent_ops.h"
/*
@@ -770,3 +771,285 @@ xfs_change_file_space(
xfs_trans_set_sync(tp);
return xfs_trans_commit(tp, 0);
}
+
+/*
+ * returns 1 for success, 0 if we failed to map the extent.
+ */
+STATIC int
+xfs_getbmapx_fix_eof_hole(
+ xfs_inode_t *ip, /* xfs incore inode pointer */
+ struct getbmapx *out, /* output structure */
+ int prealloced, /* this is a file with
+ * preallocated data space */
+ __int64_t end, /* last block requested */
+ xfs_fsblock_t startblock)
+{
+ __int64_t fixlen;
+ xfs_mount_t *mp; /* file system mount point */
+ xfs_ifork_t *ifp; /* inode fork pointer */
+ xfs_extnum_t lastx; /* last extent pointer */
+ xfs_fileoff_t fileblock;
+
+ if (startblock == HOLESTARTBLOCK) {
+ mp = ip->i_mount;
+ out->bmv_block = -1;
+ fixlen = XFS_FSB_TO_BB(mp, XFS_B_TO_FSB(mp, XFS_ISIZE(ip)));
+ fixlen -= out->bmv_offset;
+ if (prealloced && out->bmv_offset + out->bmv_length == end) {
+ /* Came to hole at EOF. Trim it. */
+ if (fixlen <= 0)
+ return 0;
+ out->bmv_length = fixlen;
+ }
+ } else {
+ if (startblock == DELAYSTARTBLOCK)
+ out->bmv_block = -2;
+ else
+ out->bmv_block = xfs_fsb_to_db(ip, startblock);
+ fileblock = XFS_BB_TO_FSB(ip->i_mount, out->bmv_offset);
+ ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+ if (xfs_iext_bno_to_ext(ifp, fileblock, &lastx) &&
+ (lastx == (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t))-1))
+ out->bmv_oflags |= BMV_OF_LAST;
+ }
+
+ return 1;
+}
+
+/*
+ * Get inode's extents as described in bmv, and format for output.
+ * Calls formatter to fill the user's buffer until all extents
+ * are mapped, until the passed-in bmv->bmv_count slots have
+ * been filled, or until the formatter short-circuits the loop,
+ * if it is tracking filled-in extents on its own.
+ */
+int /* error code */
+xfs_getbmap(
+ xfs_inode_t *ip,
+ struct getbmapx *bmv, /* user bmap structure */
+ xfs_bmap_format_t formatter, /* format to user */
+ void *arg) /* formatter arg */
+{
+ __int64_t bmvend; /* last block requested */
+ int error = 0; /* return value */
+ __int64_t fixlen; /* length for -1 case */
+ int i; /* extent number */
+ int lock; /* lock state */
+ xfs_bmbt_irec_t *map; /* buffer for user's data */
+ xfs_mount_t *mp; /* file system mount point */
+ int nex; /* # of user extents can do */
+ int nexleft; /* # of user extents left */
+ int subnex; /* # of bmapi's can do */
+ int nmap; /* number of map entries */
+ struct getbmapx *out; /* output structure */
+ int whichfork; /* data or attr fork */
+ int prealloced; /* this is a file with
+ * preallocated data space */
+ int iflags; /* interface flags */
+ int bmapi_flags; /* flags for xfs_bmapi */
+ int cur_ext = 0;
+
+ mp = ip->i_mount;
+ iflags = bmv->bmv_iflags;
+ whichfork = iflags & BMV_IF_ATTRFORK ? XFS_ATTR_FORK : XFS_DATA_FORK;
+
+ if (whichfork == XFS_ATTR_FORK) {
+ if (XFS_IFORK_Q(ip)) {
+ if (ip->i_d.di_aformat != XFS_DINODE_FMT_EXTENTS &&
+ ip->i_d.di_aformat != XFS_DINODE_FMT_BTREE &&
+ ip->i_d.di_aformat != XFS_DINODE_FMT_LOCAL)
+ return XFS_ERROR(EINVAL);
+ } else if (unlikely(
+ ip->i_d.di_aformat != 0 &&
+ ip->i_d.di_aformat != XFS_DINODE_FMT_EXTENTS)) {
+ XFS_ERROR_REPORT("xfs_getbmap", XFS_ERRLEVEL_LOW,
+ ip->i_mount);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ prealloced = 0;
+ fixlen = 1LL << 32;
+ } else {
+ if (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
+ ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
+ ip->i_d.di_format != XFS_DINODE_FMT_LOCAL)
+ return XFS_ERROR(EINVAL);
+
+ if (xfs_get_extsz_hint(ip) ||
+ ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC|XFS_DIFLAG_APPEND)){
+ prealloced = 1;
+ fixlen = mp->m_super->s_maxbytes;
+ } else {
+ prealloced = 0;
+ fixlen = XFS_ISIZE(ip);
+ }
+ }
+
+ if (bmv->bmv_length == -1) {
+ fixlen = XFS_FSB_TO_BB(mp, XFS_B_TO_FSB(mp, fixlen));
+ bmv->bmv_length =
+ max_t(__int64_t, fixlen - bmv->bmv_offset, 0);
+ } else if (bmv->bmv_length == 0) {
+ bmv->bmv_entries = 0;
+ return 0;
+ } else if (bmv->bmv_length < 0) {
+ return XFS_ERROR(EINVAL);
+ }
+
+ nex = bmv->bmv_count - 1;
+ if (nex <= 0)
+ return XFS_ERROR(EINVAL);
+ bmvend = bmv->bmv_offset + bmv->bmv_length;
+
+
+ if (bmv->bmv_count > ULONG_MAX / sizeof(struct getbmapx))
+ return XFS_ERROR(ENOMEM);
+ out = kmem_zalloc(bmv->bmv_count * sizeof(struct getbmapx), KM_MAYFAIL);
+ if (!out) {
+ out = kmem_zalloc_large(bmv->bmv_count *
+ sizeof(struct getbmapx));
+ if (!out)
+ return XFS_ERROR(ENOMEM);
+ }
+
+ xfs_ilock(ip, XFS_IOLOCK_SHARED);
+ if (whichfork == XFS_DATA_FORK && !(iflags & BMV_IF_DELALLOC)) {
+ if (ip->i_delayed_blks || XFS_ISIZE(ip) > ip->i_d.di_size) {
+ error = -filemap_write_and_wait(VFS_I(ip)->i_mapping);
+ if (error)
+ goto out_unlock_iolock;
+ }
+ /*
+ * even after flushing the inode, there can still be delalloc
+ * blocks on the inode beyond EOF due to speculative
+ * preallocation. These are not removed until the release
+ * function is called or the inode is inactivated. Hence we
+ * cannot assert here that ip->i_delayed_blks == 0.
+ */
+ }
+
+ lock = xfs_ilock_map_shared(ip);
+
+ /*
+ * Don't let nex be bigger than the number of extents
+ * we can have assuming alternating holes and real extents.
+ */
+ if (nex > XFS_IFORK_NEXTENTS(ip, whichfork) * 2 + 1)
+ nex = XFS_IFORK_NEXTENTS(ip, whichfork) * 2 + 1;
+
+ bmapi_flags = xfs_bmapi_aflag(whichfork);
+ if (!(iflags & BMV_IF_PREALLOC))
+ bmapi_flags |= XFS_BMAPI_IGSTATE;
+
+ /*
+ * Allocate enough space to handle "subnex" maps at a time.
+ */
+ error = ENOMEM;
+ subnex = 16;
+ map = kmem_alloc(subnex * sizeof(*map), KM_MAYFAIL | KM_NOFS);
+ if (!map)
+ goto out_unlock_ilock;
+
+ bmv->bmv_entries = 0;
+
+ if (XFS_IFORK_NEXTENTS(ip, whichfork) == 0 &&
+ (whichfork == XFS_ATTR_FORK || !(iflags & BMV_IF_DELALLOC))) {
+ error = 0;
+ goto out_free_map;
+ }
+
+ nexleft = nex;
+
+ do {
+ nmap = (nexleft > subnex) ? subnex : nexleft;
+ error = xfs_bmapi_read(ip, XFS_BB_TO_FSBT(mp, bmv->bmv_offset),
+ XFS_BB_TO_FSB(mp, bmv->bmv_length),
+ map, &nmap, bmapi_flags);
+ if (error)
+ goto out_free_map;
+ ASSERT(nmap <= subnex);
+
+ for (i = 0; i < nmap && nexleft && bmv->bmv_length; i++) {
+ out[cur_ext].bmv_oflags = 0;
+ if (map[i].br_state == XFS_EXT_UNWRITTEN)
+ out[cur_ext].bmv_oflags |= BMV_OF_PREALLOC;
+ else if (map[i].br_startblock == DELAYSTARTBLOCK)
+ out[cur_ext].bmv_oflags |= BMV_OF_DELALLOC;
+ out[cur_ext].bmv_offset =
+ XFS_FSB_TO_BB(mp, map[i].br_startoff);
+ out[cur_ext].bmv_length =
+ XFS_FSB_TO_BB(mp, map[i].br_blockcount);
+ out[cur_ext].bmv_unused1 = 0;
+ out[cur_ext].bmv_unused2 = 0;
+
+ /*
+ * delayed allocation extents that start beyond EOF can
+ * occur due to speculative EOF allocation when the
+ * delalloc extent is larger than the largest freespace
+ * extent at conversion time. These extents cannot be
+ * converted by data writeback, so can exist here even
+ * if we are not supposed to be finding delalloc
+ * extents.
+ */
+ if (map[i].br_startblock == DELAYSTARTBLOCK &&
+ map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
+ ASSERT((iflags & BMV_IF_DELALLOC) != 0);
+
+ if (map[i].br_startblock == HOLESTARTBLOCK &&
+ whichfork == XFS_ATTR_FORK) {
+ /* came to the end of attribute fork */
+ out[cur_ext].bmv_oflags |= BMV_OF_LAST;
+ goto out_free_map;
+ }
+
+ if (!xfs_getbmapx_fix_eof_hole(ip, &out[cur_ext],
+ prealloced, bmvend,
+ map[i].br_startblock))
+ goto out_free_map;
+
+ bmv->bmv_offset =
+ out[cur_ext].bmv_offset +
+ out[cur_ext].bmv_length;
+ bmv->bmv_length =
+ max_t(__int64_t, 0, bmvend - bmv->bmv_offset);
+
+ /*
+ * In case we don't want to return the hole,
+ * don't increase cur_ext so that we can reuse
+ * it in the next loop.
+ */
+ if ((iflags & BMV_IF_NO_HOLES) &&
+ map[i].br_startblock == HOLESTARTBLOCK) {
+ memset(&out[cur_ext], 0, sizeof(out[cur_ext]));
+ continue;
+ }
+
+ nexleft--;
+ bmv->bmv_entries++;
+ cur_ext++;
+ }
+ } while (nmap && nexleft && bmv->bmv_length);
+
+ out_free_map:
+ kmem_free(map);
+ out_unlock_ilock:
+ xfs_iunlock_map_shared(ip, lock);
+ out_unlock_iolock:
+ xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+
+ for (i = 0; i < cur_ext; i++) {
+ int full = 0; /* user array is full */
+
+ /* format results & advance arg */
+ error = formatter(&arg, &out[i], &full);
+ if (error || full)
+ break;
+ }
+
+ if (is_vmalloc_addr(out))
+ kmem_free_large(out);
+ else
+ kmem_free(out);
+ return error;
+}
+
diff --git a/fs/xfs/xfs_extent_ops.h b/fs/xfs/xfs_extent_ops.h
index a737640..550a37d 100644
--- a/fs/xfs/xfs_extent_ops.h
+++ b/fs/xfs/xfs_extent_ops.h
@@ -21,4 +21,10 @@
int xfs_change_file_space(struct xfs_inode *ip, int cmd,
xfs_flock64_t *bf, xfs_off_t offset, int attr_flags);
+/* bmap to userspace formatter - copy to user & advance pointer */
+typedef int (*xfs_bmap_format_t)(void **, struct getbmapx *, int *);
+
+int xfs_getbmap(struct xfs_inode *ip, struct getbmapx *bmv,
+ xfs_bmap_format_t formatter, void *arg);
+
#endif /* _XFS_EXTENT_OPS_H */
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 7f44efa..99775a2 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -39,6 +39,7 @@
#include "xfs_trace.h"
#include "xfs_icache.h"
#include "xfs_symlink.h"
+#include "xfs_extent_ops.h"
#include <linux/capability.h>
#include <linux/xattr.h>
--
1.7.10.4
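A note for readers of the xfs_getbmap() move above: the function first collects extents into a temporary array, and only after dropping its locks hands each record to the caller-supplied formatter, stopping when the formatter returns an error or sets its "full" flag. The following is a minimal userspace sketch of that callback contract, not the kernel code. The names demo_bmap, demo_dest, demo_copy_format and demo_emit are hypothetical stand-ins, and the formatter here copies into a plain array rather than to user memory.

```c
/* Hypothetical stand-in for struct getbmapx; only two fields are modelled. */
struct demo_bmap {
	long long offset;	/* start of extent */
	long long length;	/* length of extent */
};

/* Same shape as xfs_bmap_format_t: cursor, record, "array is full" flag. */
typedef int (*demo_format_t)(void **, struct demo_bmap *, int *);

/* Hypothetical description of the destination array. */
struct demo_dest {
	struct demo_bmap *next;	/* next free slot */
	struct demo_bmap *end;	/* one past the last slot */
};

/* Copy one record into the destination and report when it is full. */
static int demo_copy_format(void **arg, struct demo_bmap *rec, int *full)
{
	struct demo_dest *dst = *arg;

	if (dst->next == dst->end) {	/* no room at all: ask caller to stop */
		*full = 1;
		return 0;
	}
	*dst->next++ = *rec;
	if (dst->next == dst->end)	/* that was the last free slot */
		*full = 1;
	return 0;
}

/* Mirrors the loop at the tail of xfs_getbmap(): stop on error or "full". */
static int demo_emit(struct demo_bmap *recs, int nrecs,
		     demo_format_t formatter, void *arg)
{
	int error = 0;
	int i;

	for (i = 0; i < nrecs; i++) {
		int full = 0;	/* destination array is full */

		error = formatter(&arg, &recs[i], &full);
		if (error || full)
			break;
	}
	return error;
}
```

Keeping the copy-out step behind a callback is what lets the same extent walk serve more than one output format, since the walker itself never needs to know how the destination is laid out.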
* [PATCH 21/60] xfs: introduce xfs_sb.c for sharing with libxfs
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (19 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 20/60] xfs: move xfs_getbmap to xfs_extent_ops.c Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 22/60] xfs: move xfs_trans_reservations to xfs_trans.h Dave Chinner
` (40 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
xfs_mount.c is shared with userspace, but the only functions in it
that userspace actually uses are the ones that manipulate the
physical superblock. As a result, less than 25% of the xfs_mount.c
code is really shared. Move all the superblock functions to xfs_sb.c
and share that file with libxfs instead.
Note that this leaves all the in-core, transaction-related
superblock counter modifications in xfs_mount.c, as none of that code
is shared with userspace. With a few more small changes, xfs_mount.h
won't need to be shared with userspace any more, either.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
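For orientation, the xfs_sb_to_disk() code that this patch moves out of xfs_mount.c is table-driven: an offset table describes each superblock field, a bitmask selects the fields to copy, and each field's size is derived from the offset of the next table entry. The sketch below shows that technique in portable userspace C under stated assumptions. struct demo_sb, demo_sb_info and demo_sb_to_disk are hypothetical, the layout is not the real xfs_sb_t, and the big-endian stores are written by hand instead of using the kernel's cpu_to_be*() helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical four-field "superblock"; not the real xfs_sb_t layout. */
struct demo_sb {
	uint32_t magic;
	uint16_t version;
	uint8_t  label[2];
	uint64_t blocks;
};

#define DEMO_NFIELDS 4

static const struct {
	short offset;
	short type;	/* 0 = integer, 1 = raw bytes (no translation) */
} demo_sb_info[] = {
	{ offsetof(struct demo_sb, magic),   0 },
	{ offsetof(struct demo_sb, version), 0 },
	{ offsetof(struct demo_sb, label),   1 },
	{ offsetof(struct demo_sb, blocks),  0 },
	{ sizeof(struct demo_sb),            0 },	/* sentinel: end of last field */
};

/* Copy the fields selected by the bitmask to "to" in big-endian order. */
static void demo_sb_to_disk(unsigned char *to, const struct demo_sb *from,
			    uint64_t fields)
{
	const unsigned char *from_ptr = (const unsigned char *)from;

	while (fields) {
		uint64_t v = 0;
		int f = 0;
		int first, size, i;

		while (!(fields & (1ULL << f)))	/* index of lowest set bit */
			f++;
		assert(f < DEMO_NFIELDS);
		first = demo_sb_info[f].offset;
		size = demo_sb_info[f + 1].offset - first;

		if (size == 1 || demo_sb_info[f].type == 1) {
			/* single bytes and raw data need no byte swapping */
			memcpy(to + first, from_ptr + first, size);
		} else {
			switch (size) {
			case 2: { uint16_t t; memcpy(&t, from_ptr + first, 2); v = t; break; }
			case 4: { uint32_t t; memcpy(&t, from_ptr + first, 4); v = t; break; }
			case 8: { uint64_t t; memcpy(&t, from_ptr + first, 8); v = t; break; }
			default: assert(0);
			}
			for (i = 0; i < size; i++)	/* most significant byte first */
				to[first + i] = (unsigned char)(v >> (8 * (size - 1 - i)));
		}
		fields &= ~(1ULL << f);
	}
}
```

The sentinel entry at the end of the table is what allows the size of the last real field to be computed the same way as every other field.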
fs/xfs/Makefile | 3 +-
fs/xfs/xfs_mount.c | 828 ++++++----------------------------------------------
fs/xfs/xfs_mount.h | 19 +-
fs/xfs/xfs_sb.c | 716 +++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_sb.h | 19 ++
5 files changed, 823 insertions(+), 762 deletions(-)
create mode 100644 fs/xfs/xfs_sb.c
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 8567bd6..ba7d292 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -49,6 +49,7 @@ xfs-y += xfs_aops.o \
xfs_iops.o \
xfs_itable.o \
xfs_message.o \
+ xfs_mount.o \
xfs_mru_cache.o \
xfs_rename.o \
xfs_super.o \
@@ -78,7 +79,7 @@ xfs-y += xfs_alloc.o \
xfs_icreate_item.o \
xfs_inode.o \
xfs_log_recover.o \
- xfs_mount.o \
+ xfs_sb.o \
xfs_symlink.o \
xfs_trans.o
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index e7253e2..5117587 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -61,69 +61,6 @@ STATIC void xfs_icsb_disable_counter(xfs_mount_t *, xfs_sb_field_t);
#define xfs_icsb_balance_counter_locked(mp, a, b) do { } while (0)
#endif
-static const struct {
- short offset;
- short type; /* 0 = integer
- * 1 = binary / string (no translation)
- */
-} xfs_sb_info[] = {
- { offsetof(xfs_sb_t, sb_magicnum), 0 },
- { offsetof(xfs_sb_t, sb_blocksize), 0 },
- { offsetof(xfs_sb_t, sb_dblocks), 0 },
- { offsetof(xfs_sb_t, sb_rblocks), 0 },
- { offsetof(xfs_sb_t, sb_rextents), 0 },
- { offsetof(xfs_sb_t, sb_uuid), 1 },
- { offsetof(xfs_sb_t, sb_logstart), 0 },
- { offsetof(xfs_sb_t, sb_rootino), 0 },
- { offsetof(xfs_sb_t, sb_rbmino), 0 },
- { offsetof(xfs_sb_t, sb_rsumino), 0 },
- { offsetof(xfs_sb_t, sb_rextsize), 0 },
- { offsetof(xfs_sb_t, sb_agblocks), 0 },
- { offsetof(xfs_sb_t, sb_agcount), 0 },
- { offsetof(xfs_sb_t, sb_rbmblocks), 0 },
- { offsetof(xfs_sb_t, sb_logblocks), 0 },
- { offsetof(xfs_sb_t, sb_versionnum), 0 },
- { offsetof(xfs_sb_t, sb_sectsize), 0 },
- { offsetof(xfs_sb_t, sb_inodesize), 0 },
- { offsetof(xfs_sb_t, sb_inopblock), 0 },
- { offsetof(xfs_sb_t, sb_fname[0]), 1 },
- { offsetof(xfs_sb_t, sb_blocklog), 0 },
- { offsetof(xfs_sb_t, sb_sectlog), 0 },
- { offsetof(xfs_sb_t, sb_inodelog), 0 },
- { offsetof(xfs_sb_t, sb_inopblog), 0 },
- { offsetof(xfs_sb_t, sb_agblklog), 0 },
- { offsetof(xfs_sb_t, sb_rextslog), 0 },
- { offsetof(xfs_sb_t, sb_inprogress), 0 },
- { offsetof(xfs_sb_t, sb_imax_pct), 0 },
- { offsetof(xfs_sb_t, sb_icount), 0 },
- { offsetof(xfs_sb_t, sb_ifree), 0 },
- { offsetof(xfs_sb_t, sb_fdblocks), 0 },
- { offsetof(xfs_sb_t, sb_frextents), 0 },
- { offsetof(xfs_sb_t, sb_uquotino), 0 },
- { offsetof(xfs_sb_t, sb_gquotino), 0 },
- { offsetof(xfs_sb_t, sb_qflags), 0 },
- { offsetof(xfs_sb_t, sb_flags), 0 },
- { offsetof(xfs_sb_t, sb_shared_vn), 0 },
- { offsetof(xfs_sb_t, sb_inoalignmt), 0 },
- { offsetof(xfs_sb_t, sb_unit), 0 },
- { offsetof(xfs_sb_t, sb_width), 0 },
- { offsetof(xfs_sb_t, sb_dirblklog), 0 },
- { offsetof(xfs_sb_t, sb_logsectlog), 0 },
- { offsetof(xfs_sb_t, sb_logsectsize),0 },
- { offsetof(xfs_sb_t, sb_logsunit), 0 },
- { offsetof(xfs_sb_t, sb_features2), 0 },
- { offsetof(xfs_sb_t, sb_bad_features2), 0 },
- { offsetof(xfs_sb_t, sb_features_compat), 0 },
- { offsetof(xfs_sb_t, sb_features_ro_compat), 0 },
- { offsetof(xfs_sb_t, sb_features_incompat), 0 },
- { offsetof(xfs_sb_t, sb_features_log_incompat), 0 },
- { offsetof(xfs_sb_t, sb_crc), 0 },
- { offsetof(xfs_sb_t, sb_pad), 0 },
- { offsetof(xfs_sb_t, sb_pquotino), 0 },
- { offsetof(xfs_sb_t, sb_lsn), 0 },
- { sizeof(xfs_sb_t), 0 }
-};
-
static DEFINE_MUTEX(xfs_uuid_table_mutex);
static int xfs_uuid_table_size;
static uuid_t *xfs_uuid_table;
@@ -199,64 +136,6 @@ xfs_uuid_unmount(
}
-/*
- * Reference counting access wrappers to the perag structures.
- * Because we never free per-ag structures, the only thing we
- * have to protect against changes is the tree structure itself.
- */
-struct xfs_perag *
-xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno)
-{
- struct xfs_perag *pag;
- int ref = 0;
-
- rcu_read_lock();
- pag = radix_tree_lookup(&mp->m_perag_tree, agno);
- if (pag) {
- ASSERT(atomic_read(&pag->pag_ref) >= 0);
- ref = atomic_inc_return(&pag->pag_ref);
- }
- rcu_read_unlock();
- trace_xfs_perag_get(mp, agno, ref, _RET_IP_);
- return pag;
-}
-
-/*
- * search from @first to find the next perag with the given tag set.
- */
-struct xfs_perag *
-xfs_perag_get_tag(
- struct xfs_mount *mp,
- xfs_agnumber_t first,
- int tag)
-{
- struct xfs_perag *pag;
- int found;
- int ref;
-
- rcu_read_lock();
- found = radix_tree_gang_lookup_tag(&mp->m_perag_tree,
- (void **)&pag, first, 1, tag);
- if (found <= 0) {
- rcu_read_unlock();
- return NULL;
- }
- ref = atomic_inc_return(&pag->pag_ref);
- rcu_read_unlock();
- trace_xfs_perag_get_tag(mp, pag->pag_agno, ref, _RET_IP_);
- return pag;
-}
-
-void
-xfs_perag_put(struct xfs_perag *pag)
-{
- int ref;
-
- ASSERT(atomic_read(&pag->pag_ref) > 0);
- ref = atomic_dec_return(&pag->pag_ref);
- trace_xfs_perag_put(pag->pag_mount, pag->pag_agno, ref, _RET_IP_);
-}
-
STATIC void
__xfs_free_perag(
struct rcu_head *head)
@@ -287,198 +166,6 @@ xfs_free_perag(
}
}
-/*
- * Check size of device based on the (data/realtime) block count.
- * Note: this check is used by the growfs code as well as mount.
- */
-int
-xfs_sb_validate_fsb_count(
- xfs_sb_t *sbp,
- __uint64_t nblocks)
-{
- ASSERT(PAGE_SHIFT >= sbp->sb_blocklog);
- ASSERT(sbp->sb_blocklog >= BBSHIFT);
-
-#if XFS_BIG_BLKNOS /* Limited by ULONG_MAX of page cache index */
- if (nblocks >> (PAGE_CACHE_SHIFT - sbp->sb_blocklog) > ULONG_MAX)
- return EFBIG;
-#else /* Limited by UINT_MAX of sectors */
- if (nblocks << (sbp->sb_blocklog - BBSHIFT) > UINT_MAX)
- return EFBIG;
-#endif
- return 0;
-}
-
-/*
- * Check the validity of the SB found.
- */
-STATIC int
-xfs_mount_validate_sb(
- xfs_mount_t *mp,
- xfs_sb_t *sbp,
- bool check_inprogress,
- bool check_version)
-{
-
- /*
- * If the log device and data device have the
- * same device number, the log is internal.
- * Consequently, the sb_logstart should be non-zero. If
- * we have a zero sb_logstart in this case, we may be trying to mount
- * a volume filesystem in a non-volume manner.
- */
- if (sbp->sb_magicnum != XFS_SB_MAGIC) {
- xfs_warn(mp, "bad magic number");
- return XFS_ERROR(EWRONGFS);
- }
-
-
- if (!xfs_sb_good_version(sbp)) {
- xfs_warn(mp, "bad version");
- return XFS_ERROR(EWRONGFS);
- }
-
- /*
- * Version 5 superblock feature mask validation. Reject combinations the
- * kernel cannot support up front before checking anything else. For
- * write validation, we don't need to check feature masks.
- */
- if (check_version && XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
- xfs_alert(mp,
-"Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!\n"
-"Use of these features in this kernel is at your own risk!");
-
- if (xfs_sb_has_compat_feature(sbp,
- XFS_SB_FEAT_COMPAT_UNKNOWN)) {
- xfs_warn(mp,
-"Superblock has unknown compatible features (0x%x) enabled.\n"
-"Using a more recent kernel is recommended.",
- (sbp->sb_features_compat &
- XFS_SB_FEAT_COMPAT_UNKNOWN));
- }
-
- if (xfs_sb_has_ro_compat_feature(sbp,
- XFS_SB_FEAT_RO_COMPAT_UNKNOWN)) {
- xfs_alert(mp,
-"Superblock has unknown read-only compatible features (0x%x) enabled.",
- (sbp->sb_features_ro_compat &
- XFS_SB_FEAT_RO_COMPAT_UNKNOWN));
- if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
- xfs_warn(mp,
-"Attempted to mount read-only compatible filesystem read-write.\n"
-"Filesystem can only be safely mounted read only.");
- return XFS_ERROR(EINVAL);
- }
- }
- if (xfs_sb_has_incompat_feature(sbp,
- XFS_SB_FEAT_INCOMPAT_UNKNOWN)) {
- xfs_warn(mp,
-"Superblock has unknown incompatible features (0x%x) enabled.\n"
-"Filesystem can not be safely mounted by this kernel.",
- (sbp->sb_features_incompat &
- XFS_SB_FEAT_INCOMPAT_UNKNOWN));
- return XFS_ERROR(EINVAL);
- }
- }
-
- if (unlikely(
- sbp->sb_logstart == 0 && mp->m_logdev_targp == mp->m_ddev_targp)) {
- xfs_warn(mp,
- "filesystem is marked as having an external log; "
- "specify logdev on the mount command line.");
- return XFS_ERROR(EINVAL);
- }
-
- if (unlikely(
- sbp->sb_logstart != 0 && mp->m_logdev_targp != mp->m_ddev_targp)) {
- xfs_warn(mp,
- "filesystem is marked as having an internal log; "
- "do not specify logdev on the mount command line.");
- return XFS_ERROR(EINVAL);
- }
-
- /*
- * More sanity checking. Most of these were stolen directly from
- * xfs_repair.
- */
- if (unlikely(
- sbp->sb_agcount <= 0 ||
- sbp->sb_sectsize < XFS_MIN_SECTORSIZE ||
- sbp->sb_sectsize > XFS_MAX_SECTORSIZE ||
- sbp->sb_sectlog < XFS_MIN_SECTORSIZE_LOG ||
- sbp->sb_sectlog > XFS_MAX_SECTORSIZE_LOG ||
- sbp->sb_sectsize != (1 << sbp->sb_sectlog) ||
- sbp->sb_blocksize < XFS_MIN_BLOCKSIZE ||
- sbp->sb_blocksize > XFS_MAX_BLOCKSIZE ||
- sbp->sb_blocklog < XFS_MIN_BLOCKSIZE_LOG ||
- sbp->sb_blocklog > XFS_MAX_BLOCKSIZE_LOG ||
- sbp->sb_blocksize != (1 << sbp->sb_blocklog) ||
- sbp->sb_inodesize < XFS_DINODE_MIN_SIZE ||
- sbp->sb_inodesize > XFS_DINODE_MAX_SIZE ||
- sbp->sb_inodelog < XFS_DINODE_MIN_LOG ||
- sbp->sb_inodelog > XFS_DINODE_MAX_LOG ||
- sbp->sb_inodesize != (1 << sbp->sb_inodelog) ||
- (sbp->sb_blocklog - sbp->sb_inodelog != sbp->sb_inopblog) ||
- (sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE) ||
- (sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE) ||
- (sbp->sb_imax_pct > 100 /* zero sb_imax_pct is valid */) ||
- sbp->sb_dblocks == 0 ||
- sbp->sb_dblocks > XFS_MAX_DBLOCKS(sbp) ||
- sbp->sb_dblocks < XFS_MIN_DBLOCKS(sbp))) {
- XFS_CORRUPTION_ERROR("SB sanity check failed",
- XFS_ERRLEVEL_LOW, mp, sbp);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- /*
- * Until this is fixed only page-sized or smaller data blocks work.
- */
- if (unlikely(sbp->sb_blocksize > PAGE_SIZE)) {
- xfs_warn(mp,
- "File system with blocksize %d bytes. "
- "Only pagesize (%ld) or less will currently work.",
- sbp->sb_blocksize, PAGE_SIZE);
- return XFS_ERROR(ENOSYS);
- }
-
- /*
- * Currently only very few inode sizes are supported.
- */
- switch (sbp->sb_inodesize) {
- case 256:
- case 512:
- case 1024:
- case 2048:
- break;
- default:
- xfs_warn(mp, "inode size of %d bytes not supported",
- sbp->sb_inodesize);
- return XFS_ERROR(ENOSYS);
- }
-
- if (xfs_sb_validate_fsb_count(sbp, sbp->sb_dblocks) ||
- xfs_sb_validate_fsb_count(sbp, sbp->sb_rblocks)) {
- xfs_warn(mp,
- "file system too large to be mounted on this system.");
- return XFS_ERROR(EFBIG);
- }
-
- if (check_inprogress && sbp->sb_inprogress) {
- xfs_warn(mp, "Offline file system operation in progress!");
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- /*
- * Version 1 directory format has never worked on Linux.
- */
- if (unlikely(!xfs_sb_version_hasdirv2(sbp))) {
- xfs_warn(mp, "file system using version 1 directory format");
- return XFS_ERROR(ENOSYS);
- }
-
- return 0;
-}
-
int
xfs_initialize_perag(
xfs_mount_t *mp,
@@ -563,241 +250,114 @@ out_unwind:
return error;
}
-void
-xfs_sb_from_disk(
- struct xfs_sb *to,
- xfs_dsb_t *from)
-{
- to->sb_magicnum = be32_to_cpu(from->sb_magicnum);
- to->sb_blocksize = be32_to_cpu(from->sb_blocksize);
- to->sb_dblocks = be64_to_cpu(from->sb_dblocks);
- to->sb_rblocks = be64_to_cpu(from->sb_rblocks);
- to->sb_rextents = be64_to_cpu(from->sb_rextents);
- memcpy(&to->sb_uuid, &from->sb_uuid, sizeof(to->sb_uuid));
- to->sb_logstart = be64_to_cpu(from->sb_logstart);
- to->sb_rootino = be64_to_cpu(from->sb_rootino);
- to->sb_rbmino = be64_to_cpu(from->sb_rbmino);
- to->sb_rsumino = be64_to_cpu(from->sb_rsumino);
- to->sb_rextsize = be32_to_cpu(from->sb_rextsize);
- to->sb_agblocks = be32_to_cpu(from->sb_agblocks);
- to->sb_agcount = be32_to_cpu(from->sb_agcount);
- to->sb_rbmblocks = be32_to_cpu(from->sb_rbmblocks);
- to->sb_logblocks = be32_to_cpu(from->sb_logblocks);
- to->sb_versionnum = be16_to_cpu(from->sb_versionnum);
- to->sb_sectsize = be16_to_cpu(from->sb_sectsize);
- to->sb_inodesize = be16_to_cpu(from->sb_inodesize);
- to->sb_inopblock = be16_to_cpu(from->sb_inopblock);
- memcpy(&to->sb_fname, &from->sb_fname, sizeof(to->sb_fname));
- to->sb_blocklog = from->sb_blocklog;
- to->sb_sectlog = from->sb_sectlog;
- to->sb_inodelog = from->sb_inodelog;
- to->sb_inopblog = from->sb_inopblog;
- to->sb_agblklog = from->sb_agblklog;
- to->sb_rextslog = from->sb_rextslog;
- to->sb_inprogress = from->sb_inprogress;
- to->sb_imax_pct = from->sb_imax_pct;
- to->sb_icount = be64_to_cpu(from->sb_icount);
- to->sb_ifree = be64_to_cpu(from->sb_ifree);
- to->sb_fdblocks = be64_to_cpu(from->sb_fdblocks);
- to->sb_frextents = be64_to_cpu(from->sb_frextents);
- to->sb_uquotino = be64_to_cpu(from->sb_uquotino);
- to->sb_gquotino = be64_to_cpu(from->sb_gquotino);
- to->sb_qflags = be16_to_cpu(from->sb_qflags);
- to->sb_flags = from->sb_flags;
- to->sb_shared_vn = from->sb_shared_vn;
- to->sb_inoalignmt = be32_to_cpu(from->sb_inoalignmt);
- to->sb_unit = be32_to_cpu(from->sb_unit);
- to->sb_width = be32_to_cpu(from->sb_width);
- to->sb_dirblklog = from->sb_dirblklog;
- to->sb_logsectlog = from->sb_logsectlog;
- to->sb_logsectsize = be16_to_cpu(from->sb_logsectsize);
- to->sb_logsunit = be32_to_cpu(from->sb_logsunit);
- to->sb_features2 = be32_to_cpu(from->sb_features2);
- to->sb_bad_features2 = be32_to_cpu(from->sb_bad_features2);
- to->sb_features_compat = be32_to_cpu(from->sb_features_compat);
- to->sb_features_ro_compat = be32_to_cpu(from->sb_features_ro_compat);
- to->sb_features_incompat = be32_to_cpu(from->sb_features_incompat);
- to->sb_features_log_incompat =
- be32_to_cpu(from->sb_features_log_incompat);
- to->sb_pad = 0;
- to->sb_pquotino = be64_to_cpu(from->sb_pquotino);
- to->sb_lsn = be64_to_cpu(from->sb_lsn);
-}
-
/*
- * Copy in core superblock to ondisk one.
- *
- * The fields argument is mask of superblock fields to copy.
+ * Update alignment values based on mount options and sb values
*/
-void
-xfs_sb_to_disk(
- xfs_dsb_t *to,
- xfs_sb_t *from,
- __int64_t fields)
+STATIC int
+xfs_update_alignment(xfs_mount_t *mp)
{
- xfs_caddr_t to_ptr = (xfs_caddr_t)to;
- xfs_caddr_t from_ptr = (xfs_caddr_t)from;
- xfs_sb_field_t f;
- int first;
- int size;
-
- ASSERT(fields);
- if (!fields)
- return;
-
- while (fields) {
- f = (xfs_sb_field_t)xfs_lowbit64((__uint64_t)fields);
- first = xfs_sb_info[f].offset;
- size = xfs_sb_info[f + 1].offset - first;
-
- ASSERT(xfs_sb_info[f].type == 0 || xfs_sb_info[f].type == 1);
+ xfs_sb_t *sbp = &(mp->m_sb);
- if (size == 1 || xfs_sb_info[f].type == 1) {
- memcpy(to_ptr + first, from_ptr + first, size);
+ if (mp->m_dalign) {
+ /*
+ * If stripe unit and stripe width are not multiples
+ * of the fs blocksize turn off alignment.
+ */
+ if ((BBTOB(mp->m_dalign) & mp->m_blockmask) ||
+ (BBTOB(mp->m_swidth) & mp->m_blockmask)) {
+ if (mp->m_flags & XFS_MOUNT_RETERR) {
+ xfs_warn(mp, "alignment check failed: "
+ "(sunit/swidth vs. blocksize)");
+ return XFS_ERROR(EINVAL);
+ }
+ mp->m_dalign = mp->m_swidth = 0;
} else {
- switch (size) {
- case 2:
- *(__be16 *)(to_ptr + first) =
- cpu_to_be16(*(__u16 *)(from_ptr + first));
- break;
- case 4:
- *(__be32 *)(to_ptr + first) =
- cpu_to_be32(*(__u32 *)(from_ptr + first));
- break;
- case 8:
- *(__be64 *)(to_ptr + first) =
- cpu_to_be64(*(__u64 *)(from_ptr + first));
- break;
- default:
- ASSERT(0);
+ /*
+ * Convert the stripe unit and width to FSBs.
+ */
+ mp->m_dalign = XFS_BB_TO_FSBT(mp, mp->m_dalign);
+ if (mp->m_dalign && (sbp->sb_agblocks % mp->m_dalign)) {
+ if (mp->m_flags & XFS_MOUNT_RETERR) {
+ xfs_warn(mp, "alignment check failed: "
+ "(sunit/swidth vs. ag size)");
+ return XFS_ERROR(EINVAL);
+ }
+ xfs_warn(mp,
+ "stripe alignment turned off: sunit(%d)/swidth(%d) "
+ "incompatible with agsize(%d)",
+ mp->m_dalign, mp->m_swidth,
+ sbp->sb_agblocks);
+
+ mp->m_dalign = 0;
+ mp->m_swidth = 0;
+ } else if (mp->m_dalign) {
+ mp->m_swidth = XFS_BB_TO_FSBT(mp, mp->m_swidth);
+ } else {
+ if (mp->m_flags & XFS_MOUNT_RETERR) {
+ xfs_warn(mp, "alignment check failed: "
+ "sunit(%d) less than bsize(%d)",
+ mp->m_dalign,
+ mp->m_blockmask +1);
+ return XFS_ERROR(EINVAL);
+ }
+ mp->m_swidth = 0;
}
}
- fields &= ~(1LL << f);
- }
-}
-
-static int
-xfs_sb_verify(
- struct xfs_buf *bp,
- bool check_version)
-{
- struct xfs_mount *mp = bp->b_target->bt_mount;
- struct xfs_sb sb;
-
- xfs_sb_from_disk(&sb, XFS_BUF_TO_SBP(bp));
-
- /*
- * Only check the in progress field for the primary superblock as
- * mkfs.xfs doesn't clear it from secondary superblocks.
- */
- return xfs_mount_validate_sb(mp, &sb, bp->b_bn == XFS_SB_DADDR,
- check_version);
-}
-
-/*
- * If the superblock has the CRC feature bit set or the CRC field is non-null,
- * check that the CRC is valid. We check the CRC field is non-null because a
- * single bit error could clear the feature bit and unused parts of the
- * superblock are supposed to be zero. Hence a non-null crc field indicates that
- * we've potentially lost a feature bit and we should check it anyway.
- */
-static void
-xfs_sb_read_verify(
- struct xfs_buf *bp)
-{
- struct xfs_mount *mp = bp->b_target->bt_mount;
- struct xfs_dsb *dsb = XFS_BUF_TO_SBP(bp);
- int error;
-
- /*
- * open code the version check to avoid needing to convert the entire
- * superblock from disk order just to check the version number
- */
- if (dsb->sb_magicnum == cpu_to_be32(XFS_SB_MAGIC) &&
- (((be16_to_cpu(dsb->sb_versionnum) & XFS_SB_VERSION_NUMBITS) ==
- XFS_SB_VERSION_5) ||
- dsb->sb_crc != 0)) {
-
- if (!xfs_verify_cksum(bp->b_addr, be16_to_cpu(dsb->sb_sectsize),
- offsetof(struct xfs_sb, sb_crc))) {
- error = EFSCORRUPTED;
- goto out_error;
+ /*
+ * Update superblock with new values
+ * and log changes
+ */
+ if (xfs_sb_version_hasdalign(sbp)) {
+ if (sbp->sb_unit != mp->m_dalign) {
+ sbp->sb_unit = mp->m_dalign;
+ mp->m_update_flags |= XFS_SB_UNIT;
+ }
+ if (sbp->sb_width != mp->m_swidth) {
+ sbp->sb_width = mp->m_swidth;
+ mp->m_update_flags |= XFS_SB_WIDTH;
+ }
}
+ } else if ((mp->m_flags & XFS_MOUNT_NOALIGN) != XFS_MOUNT_NOALIGN &&
+ xfs_sb_version_hasdalign(&mp->m_sb)) {
+ mp->m_dalign = sbp->sb_unit;
+ mp->m_swidth = sbp->sb_width;
}
- error = xfs_sb_verify(bp, true);
-out_error:
- if (error) {
- XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
- xfs_buf_ioerror(bp, error);
- }
+ return 0;
}
/*
- * We may be probed for a filesystem match, so we may not want to emit
- * messages when the superblock buffer is not actually an XFS superblock.
- * If we find an XFS superblock, the run a normal, noisy mount because we are
- * really going to mount it and want to know about errors.
+ * Check size of device based on the (data/realtime) block count.
+ * Note: this check is used by the growfs code as well as mount.
*/
-static void
-xfs_sb_quiet_read_verify(
- struct xfs_buf *bp)
-{
- struct xfs_dsb *dsb = XFS_BUF_TO_SBP(bp);
-
-
- if (dsb->sb_magicnum == cpu_to_be32(XFS_SB_MAGIC)) {
- /* XFS filesystem, verify noisily! */
- xfs_sb_read_verify(bp);
- return;
- }
- /* quietly fail */
- xfs_buf_ioerror(bp, EWRONGFS);
-}
-
-static void
-xfs_sb_write_verify(
- struct xfs_buf *bp)
+int
+xfs_sb_validate_fsb_count(
+ xfs_sb_t *sbp,
+ __uint64_t nblocks)
{
- struct xfs_mount *mp = bp->b_target->bt_mount;
- struct xfs_buf_log_item *bip = bp->b_fspriv;
- int error;
-
- error = xfs_sb_verify(bp, false);
- if (error) {
- XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
- xfs_buf_ioerror(bp, error);
- return;
- }
-
- if (!xfs_sb_version_hascrc(&mp->m_sb))
- return;
-
- if (bip)
- XFS_BUF_TO_SBP(bp)->sb_lsn = cpu_to_be64(bip->bli_item.li_lsn);
+ ASSERT(PAGE_SHIFT >= sbp->sb_blocklog);
+ ASSERT(sbp->sb_blocklog >= BBSHIFT);
- xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length),
- offsetof(struct xfs_sb, sb_crc));
+#if XFS_BIG_BLKNOS /* Limited by ULONG_MAX of page cache index */
+ if (nblocks >> (PAGE_CACHE_SHIFT - sbp->sb_blocklog) > ULONG_MAX)
+ return EFBIG;
+#else /* Limited by UINT_MAX of sectors */
+ if (nblocks << (sbp->sb_blocklog - BBSHIFT) > UINT_MAX)
+ return EFBIG;
+#endif
+ return 0;
}
-const struct xfs_buf_ops xfs_sb_buf_ops = {
- .verify_read = xfs_sb_read_verify,
- .verify_write = xfs_sb_write_verify,
-};
-
-static const struct xfs_buf_ops xfs_sb_quiet_buf_ops = {
- .verify_read = xfs_sb_quiet_read_verify,
- .verify_write = xfs_sb_write_verify,
-};
-
/*
* xfs_readsb
*
* Does the initial read of the superblock.
*/
int
-xfs_readsb(xfs_mount_t *mp, int flags)
+xfs_readsb(
+ struct xfs_mount *mp,
+ int flags)
{
unsigned int sector_size;
struct xfs_buf *bp;
@@ -873,184 +433,6 @@ release_buf:
return error;
}
-
-/*
- * xfs_mount_common
- *
- * Mount initialization code establishing various mount
- * fields from the superblock associated with the given
- * mount structure
- */
-STATIC void
-xfs_mount_common(xfs_mount_t *mp, xfs_sb_t *sbp)
-{
- mp->m_agfrotor = mp->m_agirotor = 0;
- spin_lock_init(&mp->m_agirotor_lock);
- mp->m_maxagi = mp->m_sb.sb_agcount;
- mp->m_blkbit_log = sbp->sb_blocklog + XFS_NBBYLOG;
- mp->m_blkbb_log = sbp->sb_blocklog - BBSHIFT;
- mp->m_sectbb_log = sbp->sb_sectlog - BBSHIFT;
- mp->m_agno_log = xfs_highbit32(sbp->sb_agcount - 1) + 1;
- mp->m_agino_log = sbp->sb_inopblog + sbp->sb_agblklog;
- mp->m_blockmask = sbp->sb_blocksize - 1;
- mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG;
- mp->m_blockwmask = mp->m_blockwsize - 1;
-
- mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1);
- mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0);
- mp->m_alloc_mnr[0] = mp->m_alloc_mxr[0] / 2;
- mp->m_alloc_mnr[1] = mp->m_alloc_mxr[1] / 2;
-
- mp->m_inobt_mxr[0] = xfs_inobt_maxrecs(mp, sbp->sb_blocksize, 1);
- mp->m_inobt_mxr[1] = xfs_inobt_maxrecs(mp, sbp->sb_blocksize, 0);
- mp->m_inobt_mnr[0] = mp->m_inobt_mxr[0] / 2;
- mp->m_inobt_mnr[1] = mp->m_inobt_mxr[1] / 2;
-
- mp->m_bmap_dmxr[0] = xfs_bmbt_maxrecs(mp, sbp->sb_blocksize, 1);
- mp->m_bmap_dmxr[1] = xfs_bmbt_maxrecs(mp, sbp->sb_blocksize, 0);
- mp->m_bmap_dmnr[0] = mp->m_bmap_dmxr[0] / 2;
- mp->m_bmap_dmnr[1] = mp->m_bmap_dmxr[1] / 2;
-
- mp->m_bsize = XFS_FSB_TO_BB(mp, 1);
- mp->m_ialloc_inos = (int)MAX((__uint16_t)XFS_INODES_PER_CHUNK,
- sbp->sb_inopblock);
- mp->m_ialloc_blks = mp->m_ialloc_inos >> sbp->sb_inopblog;
-}
-
-/*
- * xfs_initialize_perag_data
- *
- * Read in each per-ag structure so we can count up the number of
- * allocated inodes, free inodes and used filesystem blocks as this
- * information is no longer persistent in the superblock. Once we have
- * this information, write it into the in-core superblock structure.
- */
-STATIC int
-xfs_initialize_perag_data(xfs_mount_t *mp, xfs_agnumber_t agcount)
-{
- xfs_agnumber_t index;
- xfs_perag_t *pag;
- xfs_sb_t *sbp = &mp->m_sb;
- uint64_t ifree = 0;
- uint64_t ialloc = 0;
- uint64_t bfree = 0;
- uint64_t bfreelst = 0;
- uint64_t btree = 0;
- int error;
-
- for (index = 0; index < agcount; index++) {
- /*
- * read the agf, then the agi. This gets us
- * all the information we need and populates the
- * per-ag structures for us.
- */
- error = xfs_alloc_pagf_init(mp, NULL, index, 0);
- if (error)
- return error;
-
- error = xfs_ialloc_pagi_init(mp, NULL, index);
- if (error)
- return error;
- pag = xfs_perag_get(mp, index);
- ifree += pag->pagi_freecount;
- ialloc += pag->pagi_count;
- bfree += pag->pagf_freeblks;
- bfreelst += pag->pagf_flcount;
- btree += pag->pagf_btreeblks;
- xfs_perag_put(pag);
- }
- /*
- * Overwrite incore superblock counters with just-read data
- */
- spin_lock(&mp->m_sb_lock);
- sbp->sb_ifree = ifree;
- sbp->sb_icount = ialloc;
- sbp->sb_fdblocks = bfree + bfreelst + btree;
- spin_unlock(&mp->m_sb_lock);
-
- /* Fixup the per-cpu counters as well. */
- xfs_icsb_reinit_counters(mp);
-
- return 0;
-}
-
-/*
- * Update alignment values based on mount options and sb values
- */
-STATIC int
-xfs_update_alignment(xfs_mount_t *mp)
-{
- xfs_sb_t *sbp = &(mp->m_sb);
-
- if (mp->m_dalign) {
- /*
- * If stripe unit and stripe width are not multiples
- * of the fs blocksize turn off alignment.
- */
- if ((BBTOB(mp->m_dalign) & mp->m_blockmask) ||
- (BBTOB(mp->m_swidth) & mp->m_blockmask)) {
- if (mp->m_flags & XFS_MOUNT_RETERR) {
- xfs_warn(mp, "alignment check failed: "
- "(sunit/swidth vs. blocksize)");
- return XFS_ERROR(EINVAL);
- }
- mp->m_dalign = mp->m_swidth = 0;
- } else {
- /*
- * Convert the stripe unit and width to FSBs.
- */
- mp->m_dalign = XFS_BB_TO_FSBT(mp, mp->m_dalign);
- if (mp->m_dalign && (sbp->sb_agblocks % mp->m_dalign)) {
- if (mp->m_flags & XFS_MOUNT_RETERR) {
- xfs_warn(mp, "alignment check failed: "
- "(sunit/swidth vs. ag size)");
- return XFS_ERROR(EINVAL);
- }
- xfs_warn(mp,
- "stripe alignment turned off: sunit(%d)/swidth(%d) "
- "incompatible with agsize(%d)",
- mp->m_dalign, mp->m_swidth,
- sbp->sb_agblocks);
-
- mp->m_dalign = 0;
- mp->m_swidth = 0;
- } else if (mp->m_dalign) {
- mp->m_swidth = XFS_BB_TO_FSBT(mp, mp->m_swidth);
- } else {
- if (mp->m_flags & XFS_MOUNT_RETERR) {
- xfs_warn(mp, "alignment check failed: "
- "sunit(%d) less than bsize(%d)",
- mp->m_dalign,
- mp->m_blockmask +1);
- return XFS_ERROR(EINVAL);
- }
- mp->m_swidth = 0;
- }
- }
-
- /*
- * Update superblock with new values
- * and log changes
- */
- if (xfs_sb_version_hasdalign(sbp)) {
- if (sbp->sb_unit != mp->m_dalign) {
- sbp->sb_unit = mp->m_dalign;
- mp->m_update_flags |= XFS_SB_UNIT;
- }
- if (sbp->sb_width != mp->m_swidth) {
- sbp->sb_width = mp->m_swidth;
- mp->m_update_flags |= XFS_SB_WIDTH;
- }
- }
- } else if ((mp->m_flags & XFS_MOUNT_NOALIGN) != XFS_MOUNT_NOALIGN &&
- xfs_sb_version_hasdalign(&mp->m_sb)) {
- mp->m_dalign = sbp->sb_unit;
- mp->m_swidth = sbp->sb_width;
- }
-
- return 0;
-}
-
/*
* Set the maximum inode count for this filesystem
*/
@@ -1277,7 +659,7 @@ xfs_mountfs(
uint quotaflags = 0;
int error = 0;
- xfs_mount_common(mp, sbp);
+ xfs_sb_mount_common(mp, sbp);
/*
* Check for a mismatched features2 values. Older kernels
@@ -1714,48 +1096,6 @@ xfs_log_sbcount(xfs_mount_t *mp)
}
/*
- * xfs_mod_sb() can be used to copy arbitrary changes to the
- * in-core superblock into the superblock buffer to be logged.
- * It does not provide the higher level of locking that is
- * needed to protect the in-core superblock from concurrent
- * access.
- */
-void
-xfs_mod_sb(xfs_trans_t *tp, __int64_t fields)
-{
- xfs_buf_t *bp;
- int first;
- int last;
- xfs_mount_t *mp;
- xfs_sb_field_t f;
-
- ASSERT(fields);
- if (!fields)
- return;
- mp = tp->t_mountp;
- bp = xfs_trans_getsb(tp, mp, 0);
- first = sizeof(xfs_sb_t);
- last = 0;
-
- /* translate/copy */
-
- xfs_sb_to_disk(XFS_BUF_TO_SBP(bp), &mp->m_sb, fields);
-
- /* find modified range */
- f = (xfs_sb_field_t)xfs_highbit64((__uint64_t)fields);
- ASSERT((1LL << f) & XFS_SB_MOD_BITS);
- last = xfs_sb_info[f + 1].offset - 1;
-
- f = (xfs_sb_field_t)xfs_lowbit64((__uint64_t)fields);
- ASSERT((1LL << f) & XFS_SB_MOD_BITS);
- first = xfs_sb_info[f].offset;
-
- xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
- xfs_trans_log_buf(tp, bp, first, last);
-}
-
-
-/*
* xfs_mod_incore_sb_unlocked() is a utility routine common used to apply
* a delta to a specified field in the in-core superblock. Simply
* switch on the field indicated and apply the delta to that field.
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b004cec..10761c3 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -334,14 +334,6 @@ xfs_daddr_to_agbno(struct xfs_mount *mp, xfs_daddr_t d)
}
/*
- * perag get/put wrappers for ref counting
- */
-struct xfs_perag *xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno);
-struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *mp, xfs_agnumber_t agno,
- int tag);
-void xfs_perag_put(struct xfs_perag *pag);
-
-/*
* Per-cpu superblock locking functions
*/
#ifdef HAVE_PERCPU_SB
@@ -373,6 +365,8 @@ typedef struct xfs_mod_sb {
extern int xfs_log_sbcount(xfs_mount_t *);
extern __uint64_t xfs_default_resblks(xfs_mount_t *mp);
extern int xfs_mountfs(xfs_mount_t *mp);
+extern int xfs_initialize_perag(xfs_mount_t *mp, xfs_agnumber_t agcount,
+ xfs_agnumber_t *maxagi);
extern void xfs_unmountfs(xfs_mount_t *);
extern int xfs_mod_incore_sb(xfs_mount_t *, xfs_sb_field_t, int64_t, int);
@@ -391,13 +385,4 @@ extern void xfs_set_low_space_thresholds(struct xfs_mount *);
#endif /* __KERNEL__ */
-extern void xfs_sb_calc_crc(struct xfs_buf *);
-extern void xfs_mod_sb(struct xfs_trans *, __int64_t);
-extern int xfs_initialize_perag(struct xfs_mount *, xfs_agnumber_t,
- xfs_agnumber_t *);
-extern void xfs_sb_from_disk(struct xfs_sb *, struct xfs_dsb *);
-extern void xfs_sb_to_disk(struct xfs_dsb *, struct xfs_sb *, __int64_t);
-
-extern const struct xfs_buf_ops xfs_sb_buf_ops;
-
#endif /* __XFS_MOUNT_H__ */
diff --git a/fs/xfs/xfs_sb.c b/fs/xfs/xfs_sb.c
new file mode 100644
index 0000000..2a7e87a
--- /dev/null
+++ b/fs/xfs/xfs_sb.c
@@ -0,0 +1,716 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_inum.h"
+#include "xfs_trans.h"
+#include "xfs_trans_priv.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2_format.h"
+#include "xfs_dir2.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_alloc.h"
+#include "xfs_rtalloc.h"
+#include "xfs_bmap.h"
+#include "xfs_error.h"
+#include "xfs_quota.h"
+#include "xfs_fsops.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_buf_item.h"
+
+/*
+ * Physical superblock buffer manipulations. Shared with libxfs in userspace.
+ */
+
+static const struct {
+ short offset;
+ short type; /* 0 = integer
+ * 1 = binary / string (no translation)
+ */
+} xfs_sb_info[] = {
+ { offsetof(xfs_sb_t, sb_magicnum), 0 },
+ { offsetof(xfs_sb_t, sb_blocksize), 0 },
+ { offsetof(xfs_sb_t, sb_dblocks), 0 },
+ { offsetof(xfs_sb_t, sb_rblocks), 0 },
+ { offsetof(xfs_sb_t, sb_rextents), 0 },
+ { offsetof(xfs_sb_t, sb_uuid), 1 },
+ { offsetof(xfs_sb_t, sb_logstart), 0 },
+ { offsetof(xfs_sb_t, sb_rootino), 0 },
+ { offsetof(xfs_sb_t, sb_rbmino), 0 },
+ { offsetof(xfs_sb_t, sb_rsumino), 0 },
+ { offsetof(xfs_sb_t, sb_rextsize), 0 },
+ { offsetof(xfs_sb_t, sb_agblocks), 0 },
+ { offsetof(xfs_sb_t, sb_agcount), 0 },
+ { offsetof(xfs_sb_t, sb_rbmblocks), 0 },
+ { offsetof(xfs_sb_t, sb_logblocks), 0 },
+ { offsetof(xfs_sb_t, sb_versionnum), 0 },
+ { offsetof(xfs_sb_t, sb_sectsize), 0 },
+ { offsetof(xfs_sb_t, sb_inodesize), 0 },
+ { offsetof(xfs_sb_t, sb_inopblock), 0 },
+ { offsetof(xfs_sb_t, sb_fname[0]), 1 },
+ { offsetof(xfs_sb_t, sb_blocklog), 0 },
+ { offsetof(xfs_sb_t, sb_sectlog), 0 },
+ { offsetof(xfs_sb_t, sb_inodelog), 0 },
+ { offsetof(xfs_sb_t, sb_inopblog), 0 },
+ { offsetof(xfs_sb_t, sb_agblklog), 0 },
+ { offsetof(xfs_sb_t, sb_rextslog), 0 },
+ { offsetof(xfs_sb_t, sb_inprogress), 0 },
+ { offsetof(xfs_sb_t, sb_imax_pct), 0 },
+ { offsetof(xfs_sb_t, sb_icount), 0 },
+ { offsetof(xfs_sb_t, sb_ifree), 0 },
+ { offsetof(xfs_sb_t, sb_fdblocks), 0 },
+ { offsetof(xfs_sb_t, sb_frextents), 0 },
+ { offsetof(xfs_sb_t, sb_uquotino), 0 },
+ { offsetof(xfs_sb_t, sb_gquotino), 0 },
+ { offsetof(xfs_sb_t, sb_qflags), 0 },
+ { offsetof(xfs_sb_t, sb_flags), 0 },
+ { offsetof(xfs_sb_t, sb_shared_vn), 0 },
+ { offsetof(xfs_sb_t, sb_inoalignmt), 0 },
+ { offsetof(xfs_sb_t, sb_unit), 0 },
+ { offsetof(xfs_sb_t, sb_width), 0 },
+ { offsetof(xfs_sb_t, sb_dirblklog), 0 },
+ { offsetof(xfs_sb_t, sb_logsectlog), 0 },
+ { offsetof(xfs_sb_t, sb_logsectsize),0 },
+ { offsetof(xfs_sb_t, sb_logsunit), 0 },
+ { offsetof(xfs_sb_t, sb_features2), 0 },
+ { offsetof(xfs_sb_t, sb_bad_features2), 0 },
+ { offsetof(xfs_sb_t, sb_features_compat), 0 },
+ { offsetof(xfs_sb_t, sb_features_ro_compat), 0 },
+ { offsetof(xfs_sb_t, sb_features_incompat), 0 },
+ { offsetof(xfs_sb_t, sb_features_log_incompat), 0 },
+ { offsetof(xfs_sb_t, sb_crc), 0 },
+ { offsetof(xfs_sb_t, sb_pad), 0 },
+ { offsetof(xfs_sb_t, sb_pquotino), 0 },
+ { offsetof(xfs_sb_t, sb_lsn), 0 },
+ { sizeof(xfs_sb_t), 0 }
+};
+
+/*
+ * Reference counting access wrappers to the perag structures.
+ * Because we never free per-ag structures, the only thing we
+ * have to protect against changes is the tree structure itself.
+ */
+struct xfs_perag *
+xfs_perag_get(
+ struct xfs_mount *mp,
+ xfs_agnumber_t agno)
+{
+ struct xfs_perag *pag;
+ int ref = 0;
+
+ rcu_read_lock();
+ pag = radix_tree_lookup(&mp->m_perag_tree, agno);
+ if (pag) {
+ ASSERT(atomic_read(&pag->pag_ref) >= 0);
+ ref = atomic_inc_return(&pag->pag_ref);
+ }
+ rcu_read_unlock();
+ trace_xfs_perag_get(mp, agno, ref, _RET_IP_);
+ return pag;
+}
+
+/*
+ * search from @first to find the next perag with the given tag set.
+ */
+struct xfs_perag *
+xfs_perag_get_tag(
+ struct xfs_mount *mp,
+ xfs_agnumber_t first,
+ int tag)
+{
+ struct xfs_perag *pag;
+ int found;
+ int ref;
+
+ rcu_read_lock();
+ found = radix_tree_gang_lookup_tag(&mp->m_perag_tree,
+ (void **)&pag, first, 1, tag);
+ if (found <= 0) {
+ rcu_read_unlock();
+ return NULL;
+ }
+ ref = atomic_inc_return(&pag->pag_ref);
+ rcu_read_unlock();
+ trace_xfs_perag_get_tag(mp, pag->pag_agno, ref, _RET_IP_);
+ return pag;
+}
+
+void
+xfs_perag_put(
+ struct xfs_perag *pag)
+{
+ int ref;
+
+ ASSERT(atomic_read(&pag->pag_ref) > 0);
+ ref = atomic_dec_return(&pag->pag_ref);
+ trace_xfs_perag_put(pag->pag_mount, pag->pag_agno, ref, _RET_IP_);
+}
+
+/*
+ * Check the validity of the SB found.
+ */
+STATIC int
+xfs_mount_validate_sb(
+ xfs_mount_t *mp,
+ xfs_sb_t *sbp,
+ bool check_inprogress,
+ bool check_version)
+{
+
+ /*
+ * If the log device and data device have the
+ * same device number, the log is internal.
+ * Consequently, the sb_logstart should be non-zero. If
+ * we have a zero sb_logstart in this case, we may be trying to mount
+ * a volume filesystem in a non-volume manner.
+ */
+ if (sbp->sb_magicnum != XFS_SB_MAGIC) {
+ xfs_warn(mp, "bad magic number");
+ return XFS_ERROR(EWRONGFS);
+ }
+
+
+ if (!xfs_sb_good_version(sbp)) {
+ xfs_warn(mp, "bad version");
+ return XFS_ERROR(EWRONGFS);
+ }
+
+ /*
+ * Version 5 superblock feature mask validation. Reject combinations the
+ * kernel cannot support up front before checking anything else. For
+ * write validation, we don't need to check feature masks.
+ */
+ if (check_version && XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) {
+ xfs_alert(mp,
+"Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!\n"
+"Use of these features in this kernel is at your own risk!");
+
+ if (xfs_sb_has_compat_feature(sbp,
+ XFS_SB_FEAT_COMPAT_UNKNOWN)) {
+ xfs_warn(mp,
+"Superblock has unknown compatible features (0x%x) enabled.\n"
+"Using a more recent kernel is recommended.",
+ (sbp->sb_features_compat &
+ XFS_SB_FEAT_COMPAT_UNKNOWN));
+ }
+
+ if (xfs_sb_has_ro_compat_feature(sbp,
+ XFS_SB_FEAT_RO_COMPAT_UNKNOWN)) {
+ xfs_alert(mp,
+"Superblock has unknown read-only compatible features (0x%x) enabled.",
+ (sbp->sb_features_ro_compat &
+ XFS_SB_FEAT_RO_COMPAT_UNKNOWN));
+ if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
+ xfs_warn(mp,
+"Attempted to mount read-only compatible filesystem read-write.\n"
+"Filesystem can only be safely mounted read only.");
+ return XFS_ERROR(EINVAL);
+ }
+ }
+ if (xfs_sb_has_incompat_feature(sbp,
+ XFS_SB_FEAT_INCOMPAT_UNKNOWN)) {
+ xfs_warn(mp,
+"Superblock has unknown incompatible features (0x%x) enabled.\n"
+"Filesystem can not be safely mounted by this kernel.",
+ (sbp->sb_features_incompat &
+ XFS_SB_FEAT_INCOMPAT_UNKNOWN));
+ return XFS_ERROR(EINVAL);
+ }
+ }
+
+ if (unlikely(
+ sbp->sb_logstart == 0 && mp->m_logdev_targp == mp->m_ddev_targp)) {
+ xfs_warn(mp,
+ "filesystem is marked as having an external log; "
+ "specify logdev on the mount command line.");
+ return XFS_ERROR(EINVAL);
+ }
+
+ if (unlikely(
+ sbp->sb_logstart != 0 && mp->m_logdev_targp != mp->m_ddev_targp)) {
+ xfs_warn(mp,
+ "filesystem is marked as having an internal log; "
+ "do not specify logdev on the mount command line.");
+ return XFS_ERROR(EINVAL);
+ }
+
+ /*
+ * More sanity checking. Most of these were stolen directly from
+ * xfs_repair.
+ */
+ if (unlikely(
+ sbp->sb_agcount <= 0 ||
+ sbp->sb_sectsize < XFS_MIN_SECTORSIZE ||
+ sbp->sb_sectsize > XFS_MAX_SECTORSIZE ||
+ sbp->sb_sectlog < XFS_MIN_SECTORSIZE_LOG ||
+ sbp->sb_sectlog > XFS_MAX_SECTORSIZE_LOG ||
+ sbp->sb_sectsize != (1 << sbp->sb_sectlog) ||
+ sbp->sb_blocksize < XFS_MIN_BLOCKSIZE ||
+ sbp->sb_blocksize > XFS_MAX_BLOCKSIZE ||
+ sbp->sb_blocklog < XFS_MIN_BLOCKSIZE_LOG ||
+ sbp->sb_blocklog > XFS_MAX_BLOCKSIZE_LOG ||
+ sbp->sb_blocksize != (1 << sbp->sb_blocklog) ||
+ sbp->sb_inodesize < XFS_DINODE_MIN_SIZE ||
+ sbp->sb_inodesize > XFS_DINODE_MAX_SIZE ||
+ sbp->sb_inodelog < XFS_DINODE_MIN_LOG ||
+ sbp->sb_inodelog > XFS_DINODE_MAX_LOG ||
+ sbp->sb_inodesize != (1 << sbp->sb_inodelog) ||
+ (sbp->sb_blocklog - sbp->sb_inodelog != sbp->sb_inopblog) ||
+ (sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE) ||
+ (sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE) ||
+ (sbp->sb_imax_pct > 100 /* zero sb_imax_pct is valid */) ||
+ sbp->sb_dblocks == 0 ||
+ sbp->sb_dblocks > XFS_MAX_DBLOCKS(sbp) ||
+ sbp->sb_dblocks < XFS_MIN_DBLOCKS(sbp))) {
+ XFS_CORRUPTION_ERROR("SB sanity check failed",
+ XFS_ERRLEVEL_LOW, mp, sbp);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ /*
+ * Until this is fixed only page-sized or smaller data blocks work.
+ */
+ if (unlikely(sbp->sb_blocksize > PAGE_SIZE)) {
+ xfs_warn(mp,
+ "File system with blocksize %d bytes. "
+ "Only pagesize (%ld) or less will currently work.",
+ sbp->sb_blocksize, PAGE_SIZE);
+ return XFS_ERROR(ENOSYS);
+ }
+
+ /*
+ * Currently only very few inode sizes are supported.
+ */
+ switch (sbp->sb_inodesize) {
+ case 256:
+ case 512:
+ case 1024:
+ case 2048:
+ break;
+ default:
+ xfs_warn(mp, "inode size of %d bytes not supported",
+ sbp->sb_inodesize);
+ return XFS_ERROR(ENOSYS);
+ }
+
+ if (xfs_sb_validate_fsb_count(sbp, sbp->sb_dblocks) ||
+ xfs_sb_validate_fsb_count(sbp, sbp->sb_rblocks)) {
+ xfs_warn(mp,
+ "file system too large to be mounted on this system.");
+ return XFS_ERROR(EFBIG);
+ }
+
+ if (check_inprogress && sbp->sb_inprogress) {
+ xfs_warn(mp, "Offline file system operation in progress!");
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ /*
+ * Version 1 directory format has never worked on Linux.
+ */
+ if (unlikely(!xfs_sb_version_hasdirv2(sbp))) {
+ xfs_warn(mp, "file system using version 1 directory format");
+ return XFS_ERROR(ENOSYS);
+ }
+
+ return 0;
+}
+
+void
+xfs_sb_from_disk(
+ struct xfs_sb *to,
+ xfs_dsb_t *from)
+{
+ to->sb_magicnum = be32_to_cpu(from->sb_magicnum);
+ to->sb_blocksize = be32_to_cpu(from->sb_blocksize);
+ to->sb_dblocks = be64_to_cpu(from->sb_dblocks);
+ to->sb_rblocks = be64_to_cpu(from->sb_rblocks);
+ to->sb_rextents = be64_to_cpu(from->sb_rextents);
+ memcpy(&to->sb_uuid, &from->sb_uuid, sizeof(to->sb_uuid));
+ to->sb_logstart = be64_to_cpu(from->sb_logstart);
+ to->sb_rootino = be64_to_cpu(from->sb_rootino);
+ to->sb_rbmino = be64_to_cpu(from->sb_rbmino);
+ to->sb_rsumino = be64_to_cpu(from->sb_rsumino);
+ to->sb_rextsize = be32_to_cpu(from->sb_rextsize);
+ to->sb_agblocks = be32_to_cpu(from->sb_agblocks);
+ to->sb_agcount = be32_to_cpu(from->sb_agcount);
+ to->sb_rbmblocks = be32_to_cpu(from->sb_rbmblocks);
+ to->sb_logblocks = be32_to_cpu(from->sb_logblocks);
+ to->sb_versionnum = be16_to_cpu(from->sb_versionnum);
+ to->sb_sectsize = be16_to_cpu(from->sb_sectsize);
+ to->sb_inodesize = be16_to_cpu(from->sb_inodesize);
+ to->sb_inopblock = be16_to_cpu(from->sb_inopblock);
+ memcpy(&to->sb_fname, &from->sb_fname, sizeof(to->sb_fname));
+ to->sb_blocklog = from->sb_blocklog;
+ to->sb_sectlog = from->sb_sectlog;
+ to->sb_inodelog = from->sb_inodelog;
+ to->sb_inopblog = from->sb_inopblog;
+ to->sb_agblklog = from->sb_agblklog;
+ to->sb_rextslog = from->sb_rextslog;
+ to->sb_inprogress = from->sb_inprogress;
+ to->sb_imax_pct = from->sb_imax_pct;
+ to->sb_icount = be64_to_cpu(from->sb_icount);
+ to->sb_ifree = be64_to_cpu(from->sb_ifree);
+ to->sb_fdblocks = be64_to_cpu(from->sb_fdblocks);
+ to->sb_frextents = be64_to_cpu(from->sb_frextents);
+ to->sb_uquotino = be64_to_cpu(from->sb_uquotino);
+ to->sb_gquotino = be64_to_cpu(from->sb_gquotino);
+ to->sb_qflags = be16_to_cpu(from->sb_qflags);
+ to->sb_flags = from->sb_flags;
+ to->sb_shared_vn = from->sb_shared_vn;
+ to->sb_inoalignmt = be32_to_cpu(from->sb_inoalignmt);
+ to->sb_unit = be32_to_cpu(from->sb_unit);
+ to->sb_width = be32_to_cpu(from->sb_width);
+ to->sb_dirblklog = from->sb_dirblklog;
+ to->sb_logsectlog = from->sb_logsectlog;
+ to->sb_logsectsize = be16_to_cpu(from->sb_logsectsize);
+ to->sb_logsunit = be32_to_cpu(from->sb_logsunit);
+ to->sb_features2 = be32_to_cpu(from->sb_features2);
+ to->sb_bad_features2 = be32_to_cpu(from->sb_bad_features2);
+ to->sb_features_compat = be32_to_cpu(from->sb_features_compat);
+ to->sb_features_ro_compat = be32_to_cpu(from->sb_features_ro_compat);
+ to->sb_features_incompat = be32_to_cpu(from->sb_features_incompat);
+ to->sb_features_log_incompat =
+ be32_to_cpu(from->sb_features_log_incompat);
+ to->sb_pad = 0;
+ to->sb_pquotino = be64_to_cpu(from->sb_pquotino);
+ to->sb_lsn = be64_to_cpu(from->sb_lsn);
+}
+
+/*
+ * Copy in core superblock to ondisk one.
+ *
+ * The fields argument is a mask of superblock fields to copy.
+ */
+void
+xfs_sb_to_disk(
+ xfs_dsb_t *to,
+ xfs_sb_t *from,
+ __int64_t fields)
+{
+ xfs_caddr_t to_ptr = (xfs_caddr_t)to;
+ xfs_caddr_t from_ptr = (xfs_caddr_t)from;
+ xfs_sb_field_t f;
+ int first;
+ int size;
+
+ ASSERT(fields);
+ if (!fields)
+ return;
+
+ while (fields) {
+ f = (xfs_sb_field_t)xfs_lowbit64((__uint64_t)fields);
+ first = xfs_sb_info[f].offset;
+ size = xfs_sb_info[f + 1].offset - first;
+
+ ASSERT(xfs_sb_info[f].type == 0 || xfs_sb_info[f].type == 1);
+
+ if (size == 1 || xfs_sb_info[f].type == 1) {
+ memcpy(to_ptr + first, from_ptr + first, size);
+ } else {
+ switch (size) {
+ case 2:
+ *(__be16 *)(to_ptr + first) =
+ cpu_to_be16(*(__u16 *)(from_ptr + first));
+ break;
+ case 4:
+ *(__be32 *)(to_ptr + first) =
+ cpu_to_be32(*(__u32 *)(from_ptr + first));
+ break;
+ case 8:
+ *(__be64 *)(to_ptr + first) =
+ cpu_to_be64(*(__u64 *)(from_ptr + first));
+ break;
+ default:
+ ASSERT(0);
+ }
+ }
+
+ fields &= ~(1LL << f);
+ }
+}
+
+static int
+xfs_sb_verify(
+ struct xfs_buf *bp,
+ bool check_version)
+{
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+ struct xfs_sb sb;
+
+ xfs_sb_from_disk(&sb, XFS_BUF_TO_SBP(bp));
+
+ /*
+ * Only check the in progress field for the primary superblock as
+ * mkfs.xfs doesn't clear it from secondary superblocks.
+ */
+ return xfs_mount_validate_sb(mp, &sb, bp->b_bn == XFS_SB_DADDR,
+ check_version);
+}
+
+/*
+ * If the superblock has the CRC feature bit set or the CRC field is non-null,
+ * check that the CRC is valid. We check the CRC field is non-null because a
+ * single bit error could clear the feature bit and unused parts of the
+ * superblock are supposed to be zero. Hence a non-null crc field indicates that
+ * we've potentially lost a feature bit and we should check it anyway.
+ */
+static void
+xfs_sb_read_verify(
+ struct xfs_buf *bp)
+{
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+ struct xfs_dsb *dsb = XFS_BUF_TO_SBP(bp);
+ int error;
+
+ /*
+ * open code the version check to avoid needing to convert the entire
+ * superblock from disk order just to check the version number
+ */
+ if (dsb->sb_magicnum == cpu_to_be32(XFS_SB_MAGIC) &&
+ (((be16_to_cpu(dsb->sb_versionnum) & XFS_SB_VERSION_NUMBITS) ==
+ XFS_SB_VERSION_5) ||
+ dsb->sb_crc != 0)) {
+
+ if (!xfs_verify_cksum(bp->b_addr, be16_to_cpu(dsb->sb_sectsize),
+ offsetof(struct xfs_sb, sb_crc))) {
+ error = EFSCORRUPTED;
+ goto out_error;
+ }
+ }
+ error = xfs_sb_verify(bp, true);
+
+out_error:
+ if (error) {
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
+ xfs_buf_ioerror(bp, error);
+ }
+}
+
+/*
+ * We may be probed for a filesystem match, so we may not want to emit
+ * messages when the superblock buffer is not actually an XFS superblock.
+ * If we find an XFS superblock, then run a normal, noisy mount because we are
+ * really going to mount it and want to know about errors.
+ */
+static void
+xfs_sb_quiet_read_verify(
+ struct xfs_buf *bp)
+{
+ struct xfs_dsb *dsb = XFS_BUF_TO_SBP(bp);
+
+
+ if (dsb->sb_magicnum == cpu_to_be32(XFS_SB_MAGIC)) {
+ /* XFS filesystem, verify noisily! */
+ xfs_sb_read_verify(bp);
+ return;
+ }
+ /* quietly fail */
+ xfs_buf_ioerror(bp, EWRONGFS);
+}
+
+static void
+xfs_sb_write_verify(
+ struct xfs_buf *bp)
+{
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+ struct xfs_buf_log_item *bip = bp->b_fspriv;
+ int error;
+
+ error = xfs_sb_verify(bp, false);
+ if (error) {
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
+ xfs_buf_ioerror(bp, error);
+ return;
+ }
+
+ if (!xfs_sb_version_hascrc(&mp->m_sb))
+ return;
+
+ if (bip)
+ XFS_BUF_TO_SBP(bp)->sb_lsn = cpu_to_be64(bip->bli_item.li_lsn);
+
+ xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length),
+ offsetof(struct xfs_sb, sb_crc));
+}
+
+const struct xfs_buf_ops xfs_sb_buf_ops = {
+ .verify_read = xfs_sb_read_verify,
+ .verify_write = xfs_sb_write_verify,
+};
+
+const struct xfs_buf_ops xfs_sb_quiet_buf_ops = {
+ .verify_read = xfs_sb_quiet_read_verify,
+ .verify_write = xfs_sb_write_verify,
+};
+
+/*
+ * xfs_sb_mount_common
+ *
+ * Mount initialization code establishing various mount
+ * fields from the superblock associated with the given
+ * mount structure
+ */
+void
+xfs_sb_mount_common(
+ struct xfs_mount *mp,
+ struct xfs_sb *sbp)
+{
+ mp->m_agfrotor = mp->m_agirotor = 0;
+ spin_lock_init(&mp->m_agirotor_lock);
+ mp->m_maxagi = mp->m_sb.sb_agcount;
+ mp->m_blkbit_log = sbp->sb_blocklog + XFS_NBBYLOG;
+ mp->m_blkbb_log = sbp->sb_blocklog - BBSHIFT;
+ mp->m_sectbb_log = sbp->sb_sectlog - BBSHIFT;
+ mp->m_agno_log = xfs_highbit32(sbp->sb_agcount - 1) + 1;
+ mp->m_agino_log = sbp->sb_inopblog + sbp->sb_agblklog;
+ mp->m_blockmask = sbp->sb_blocksize - 1;
+ mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG;
+ mp->m_blockwmask = mp->m_blockwsize - 1;
+
+ mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1);
+ mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0);
+ mp->m_alloc_mnr[0] = mp->m_alloc_mxr[0] / 2;
+ mp->m_alloc_mnr[1] = mp->m_alloc_mxr[1] / 2;
+
+ mp->m_inobt_mxr[0] = xfs_inobt_maxrecs(mp, sbp->sb_blocksize, 1);
+ mp->m_inobt_mxr[1] = xfs_inobt_maxrecs(mp, sbp->sb_blocksize, 0);
+ mp->m_inobt_mnr[0] = mp->m_inobt_mxr[0] / 2;
+ mp->m_inobt_mnr[1] = mp->m_inobt_mxr[1] / 2;
+
+ mp->m_bmap_dmxr[0] = xfs_bmbt_maxrecs(mp, sbp->sb_blocksize, 1);
+ mp->m_bmap_dmxr[1] = xfs_bmbt_maxrecs(mp, sbp->sb_blocksize, 0);
+ mp->m_bmap_dmnr[0] = mp->m_bmap_dmxr[0] / 2;
+ mp->m_bmap_dmnr[1] = mp->m_bmap_dmxr[1] / 2;
+
+ mp->m_bsize = XFS_FSB_TO_BB(mp, 1);
+ mp->m_ialloc_inos = (int)MAX((__uint16_t)XFS_INODES_PER_CHUNK,
+ sbp->sb_inopblock);
+ mp->m_ialloc_blks = mp->m_ialloc_inos >> sbp->sb_inopblog;
+}
+
+/*
+ * xfs_initialize_perag_data
+ *
+ * Read in each per-ag structure so we can count up the number of
+ * allocated inodes, free inodes and used filesystem blocks as this
+ * information is no longer persistent in the superblock. Once we have
+ * this information, write it into the in-core superblock structure.
+ */
+int
+xfs_initialize_perag_data(xfs_mount_t *mp, xfs_agnumber_t agcount)
+{
+ xfs_agnumber_t index;
+ xfs_perag_t *pag;
+ xfs_sb_t *sbp = &mp->m_sb;
+ uint64_t ifree = 0;
+ uint64_t ialloc = 0;
+ uint64_t bfree = 0;
+ uint64_t bfreelst = 0;
+ uint64_t btree = 0;
+ int error;
+
+ for (index = 0; index < agcount; index++) {
+ /*
+ * read the agf, then the agi. This gets us
+ * all the information we need and populates the
+ * per-ag structures for us.
+ */
+ error = xfs_alloc_pagf_init(mp, NULL, index, 0);
+ if (error)
+ return error;
+
+ error = xfs_ialloc_pagi_init(mp, NULL, index);
+ if (error)
+ return error;
+ pag = xfs_perag_get(mp, index);
+ ifree += pag->pagi_freecount;
+ ialloc += pag->pagi_count;
+ bfree += pag->pagf_freeblks;
+ bfreelst += pag->pagf_flcount;
+ btree += pag->pagf_btreeblks;
+ xfs_perag_put(pag);
+ }
+ /*
+ * Overwrite incore superblock counters with just-read data
+ */
+ spin_lock(&mp->m_sb_lock);
+ sbp->sb_ifree = ifree;
+ sbp->sb_icount = ialloc;
+ sbp->sb_fdblocks = bfree + bfreelst + btree;
+ spin_unlock(&mp->m_sb_lock);
+
+ /* Fixup the per-cpu counters as well. */
+ xfs_icsb_reinit_counters(mp);
+
+ return 0;
+}
+
+/*
+ * xfs_mod_sb() can be used to copy arbitrary changes to the
+ * in-core superblock into the superblock buffer to be logged.
+ * It does not provide the higher level of locking that is
+ * needed to protect the in-core superblock from concurrent
+ * access.
+ */
+void
+xfs_mod_sb(xfs_trans_t *tp, __int64_t fields)
+{
+ xfs_buf_t *bp;
+ int first;
+ int last;
+ xfs_mount_t *mp;
+ xfs_sb_field_t f;
+
+ ASSERT(fields);
+ if (!fields)
+ return;
+ mp = tp->t_mountp;
+ bp = xfs_trans_getsb(tp, mp, 0);
+ first = sizeof(xfs_sb_t);
+ last = 0;
+
+ /* translate/copy */
+
+ xfs_sb_to_disk(XFS_BUF_TO_SBP(bp), &mp->m_sb, fields);
+
+ /* find modified range */
+ f = (xfs_sb_field_t)xfs_highbit64((__uint64_t)fields);
+ ASSERT((1LL << f) & XFS_SB_MOD_BITS);
+ last = xfs_sb_info[f + 1].offset - 1;
+
+ f = (xfs_sb_field_t)xfs_lowbit64((__uint64_t)fields);
+ ASSERT((1LL << f) & XFS_SB_MOD_BITS);
+ first = xfs_sb_info[f].offset;
+
+ xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
+ xfs_trans_log_buf(tp, bp, first, last);
+}
diff --git a/fs/xfs/xfs_sb.h b/fs/xfs/xfs_sb.h
index 2de58a8..25d19d0 100644
--- a/fs/xfs/xfs_sb.h
+++ b/fs/xfs/xfs_sb.h
@@ -26,6 +26,7 @@
struct xfs_buf;
struct xfs_mount;
+struct xfs_trans;
#define XFS_SB_MAGIC 0x58465342 /* 'XFSB' */
#define XFS_SB_VERSION_1 1 /* 5.3, 6.0.1, 6.1 */
@@ -654,4 +655,22 @@ xfs_sb_has_incompat_log_feature(
#define XFS_B_TO_FSBT(mp,b) (((__uint64_t)(b)) >> (mp)->m_sb.sb_blocklog)
#define XFS_B_FSB_OFFSET(mp,b) ((b) & (mp)->m_blockmask)
+/*
+ * perag get/put wrappers for ref counting
+ */
+extern struct xfs_perag *xfs_perag_get(struct xfs_mount *, xfs_agnumber_t);
+extern struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *, xfs_agnumber_t,
+ int tag);
+extern void xfs_perag_put(struct xfs_perag *pag);
+extern int xfs_initialize_perag_data(struct xfs_mount *, xfs_agnumber_t);
+
+extern void xfs_sb_calc_crc(struct xfs_buf *);
+extern void xfs_mod_sb(struct xfs_trans *, __int64_t);
+extern void xfs_sb_mount_common(struct xfs_mount *, struct xfs_sb *);
+extern void xfs_sb_from_disk(struct xfs_sb *, struct xfs_dsb *);
+extern void xfs_sb_to_disk(struct xfs_dsb *, struct xfs_sb *, __int64_t);
+
+extern const struct xfs_buf_ops xfs_sb_buf_ops;
+extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops;
+
#endif /* __XFS_SB_H__ */
--
1.7.10.4
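
The table-driven conversion that xfs_sb_to_disk() and xfs_mod_sb() share may be easier to follow in isolation. What follows is a minimal userspace sketch, not the kernel code: struct demo_sb, the F_* field numbers, demo_sb_info[] and every demo_* helper are hypothetical names invented for illustration, and the byte swap is done by hand rather than with cpu_to_be16/32/64. It shows the two ideas in the patch: a field's size is the distance to the next table offset, and the range to log runs from the lowest set field's offset to the byte before the field after the highest set one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature superblock, ordered so the compiler adds no padding. */
struct demo_sb {
	uint32_t magic;
	uint32_t blocksize;
	uint64_t dblocks;
	char     label[8];
	uint16_t versionnum;
	uint8_t  blocklog;
	uint8_t  sectlog;
	uint32_t unit;
};

/* Bit numbers for the field mask, in the same order as the table below. */
enum demo_field {
	F_MAGIC, F_BLOCKSIZE, F_DBLOCKS, F_LABEL,
	F_VERSIONNUM, F_BLOCKLOG, F_SECTLOG, F_UNIT, F_COUNT
};

/* type 0 = integer (byte-swapped), type 1 = binary/string (copied as-is). */
static const struct { short offset; short type; } demo_sb_info[] = {
	{ offsetof(struct demo_sb, magic),      0 },
	{ offsetof(struct demo_sb, blocksize),  0 },
	{ offsetof(struct demo_sb, dblocks),    0 },
	{ offsetof(struct demo_sb, label),      1 },
	{ offsetof(struct demo_sb, versionnum), 0 },
	{ offsetof(struct demo_sb, blocklog),   0 },
	{ offsetof(struct demo_sb, sectlog),    0 },
	{ offsetof(struct demo_sb, unit),       0 },
	{ sizeof(struct demo_sb),               0 },	/* sentinel: end of last field */
};

/* Index of the lowest set bit; the caller guarantees v != 0. */
static int demo_lowbit64(uint64_t v)
{
	int n = 0;

	while (!(v & 1)) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Index of the highest set bit; the caller guarantees v != 0. */
static int demo_highbit64(uint64_t v)
{
	int n = -1;

	while (v) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Read a 2, 4 or 8 byte host-order integer without alignment assumptions. */
static uint64_t demo_load_host(const unsigned char *p, int size)
{
	uint16_t v16;
	uint32_t v32;
	uint64_t v64;

	switch (size) {
	case 2: memcpy(&v16, p, 2); return v16;
	case 4: memcpy(&v32, p, 4); return v32;
	case 8: memcpy(&v64, p, 8); return v64;
	default: assert(0); return 0;
	}
}

/* Copy only the fields named in the mask, converting integers to big-endian. */
void demo_sb_to_disk(unsigned char *to, const struct demo_sb *from,
		     uint64_t fields)
{
	const unsigned char *from_ptr = (const unsigned char *)from;

	while (fields) {
		int f = demo_lowbit64(fields);
		int first, size, i;

		assert(f < F_COUNT);
		first = demo_sb_info[f].offset;
		size = demo_sb_info[f + 1].offset - first;

		if (size == 1 || demo_sb_info[f].type == 1) {
			memcpy(to + first, from_ptr + first, size);
		} else {
			uint64_t v = demo_load_host(from_ptr + first, size);

			/* Most significant byte goes to the lowest address. */
			for (i = size - 1; i >= 0; i--) {
				to[first + i] = (unsigned char)(v & 0xff);
				v >>= 8;
			}
		}
		fields &= ~(1ULL << f);
	}
}

/* Byte range covering every field in the mask, inclusive at both ends. */
void demo_modified_range(uint64_t fields, int *first, int *last)
{
	*first = demo_sb_info[demo_lowbit64(fields)].offset;
	*last = demo_sb_info[demo_highbit64(fields) + 1].offset - 1;
}
```

Calling demo_sb_to_disk() with only the F_MAGIC and F_VERSIONNUM bits set rewrites just those bytes in the destination buffer and leaves the rest untouched. The trailing sizeof entry in the table is what lets the last real field compute its size the same way as all the others.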
* [PATCH 22/60] xfs: move xfs_trans_reservations to xfs_trans.h
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (20 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 21/60] xfs: introduce xfs_sb.c for sharing with libxfs Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 23/60] xfs: sync minor header differences needed by userspace Dave Chinner
` (39 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
struct xfs_trans_reservations is the only non kernel-only definition
in xfs_mount.h. Move it to xfs_trans.h so that we no longer need
xfs_mount.h in the userspace code. Most places that include
xfs_mount.h already include xfs_trans.h and xfs_trans.h is shared
with userspace, so there are no major concerns about moving this
structure there.
While in xfs_mount.h, remove the remaining two #defines that are
defined for userspace - the kernel code has a separate definition of
the macros so this is purely userspace code. This allows us to
remove the __KERNEL__ switches from xfs_mount.h entirely.
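
For readers unfamiliar with the two userspace #defines being removed (xfs_daddr_to_agno and xfs_daddr_to_agbno), the arithmetic they perform can be sketched as below. This is an illustrative standalone version only: struct demo_geom and the demo_* functions are hypothetical names, the shift stands in for XFS_BB_TO_FSBT under the assumption that it shifts by (sb_blocklog - BBSHIFT), and DEMO_BBSHIFT is assumed to be 9 (512-byte basic blocks).

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BBSHIFT 9	/* assumed: 512-byte basic blocks, as with BBSHIFT */

/* Hypothetical subset of the superblock geometry the macros depend on. */
struct demo_geom {
	uint8_t  blocklog;	/* log2 of the fs block size, like sb_blocklog */
	uint32_t agblocks;	/* fs blocks per allocation group, like sb_agblocks */
};

/* Basic-block (disk) address to filesystem block number, truncating. */
static uint64_t demo_bb_to_fsbt(const struct demo_geom *g, uint64_t daddr)
{
	assert(g->blocklog >= DEMO_BBSHIFT);
	return daddr >> (g->blocklog - DEMO_BBSHIFT);
}

/* Which allocation group the disk address falls in. */
uint32_t demo_daddr_to_agno(const struct demo_geom *g, uint64_t daddr)
{
	return (uint32_t)(demo_bb_to_fsbt(g, daddr) / g->agblocks);
}

/* Block offset of the disk address within its allocation group. */
uint32_t demo_daddr_to_agbno(const struct demo_geom *g, uint64_t daddr)
{
	return (uint32_t)(demo_bb_to_fsbt(g, daddr) % g->agblocks);
}
```

With 4096-byte blocks (blocklog 12) and 1000 blocks per AG, disk address 20003 maps to filesystem block 2500, which is block 500 of AG 2.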
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_acl.c | 2 ++
fs/xfs/xfs_buf.c | 1 +
fs/xfs/xfs_discard.c | 1 +
fs/xfs/xfs_mount.h | 44 +-------------------------------------------
fs/xfs/xfs_quotaops.c | 1 +
fs/xfs/xfs_trans.h | 33 +++++++++++++++++++++++++++++++++
6 files changed, 39 insertions(+), 43 deletions(-)
diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index fecdcd0..b9b9740 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -21,6 +21,8 @@
#include "xfs_bmap_btree.h"
#include "xfs_inode.h"
#include "xfs_sb.h"
+#include "xfs_log.h"
+#include "xfs_trans.h"
#include "xfs_mount.h"
#include "xfs_trace.h"
#include <linux/slab.h>
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 1b2472a..e0a6d0e 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -37,6 +37,7 @@
#include "xfs_sb.h"
#include "xfs_log.h"
#include "xfs_ag.h"
+#include "xfs_trans.h"
#include "xfs_mount.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index 69cf4fc..9fab41e 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -19,6 +19,7 @@
#include "xfs_sb.h"
#include "xfs_log.h"
#include "xfs_ag.h"
+#include "xfs_trans.h"
#include "xfs_mount.h"
#include "xfs_quota.h"
#include "xfs_trans.h"
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 10761c3..dedda31 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -18,46 +18,6 @@
#ifndef __XFS_MOUNT_H__
#define __XFS_MOUNT_H__
-typedef struct xfs_trans_reservations {
- uint tr_write; /* extent alloc trans */
- uint tr_itruncate; /* truncate trans */
- uint tr_rename; /* rename trans */
- uint tr_link; /* link trans */
- uint tr_remove; /* unlink trans */
- uint tr_symlink; /* symlink trans */
- uint tr_create; /* create trans */
- uint tr_mkdir; /* mkdir trans */
- uint tr_ifree; /* inode free trans */
- uint tr_ichange; /* inode update trans */
- uint tr_growdata; /* fs data section grow trans */
- uint tr_swrite; /* sync write inode trans */
- uint tr_addafork; /* cvt inode to attributed trans */
- uint tr_writeid; /* write setuid/setgid file */
- uint tr_attrinval; /* attr fork buffer invalidation */
- uint tr_attrsetm; /* set/create an attribute at mount time */
- uint tr_attrsetrt; /* set/create an attribute at runtime */
- uint tr_attrrm; /* remove an attribute */
- uint tr_clearagi; /* clear bad agi unlinked ino bucket */
- uint tr_growrtalloc; /* grow realtime allocations */
- uint tr_growrtzero; /* grow realtime zeroing */
- uint tr_growrtfree; /* grow realtime freeing */
- uint tr_qm_sbchange; /* change quota flags */
- uint tr_qm_setqlim; /* adjust quota limits */
- uint tr_qm_dqalloc; /* allocate quota on disk */
- uint tr_qm_quotaoff; /* turn quota off */
- uint tr_qm_equotaoff;/* end of turn quota off */
- uint tr_sb; /* modify superblock */
-} xfs_trans_reservations_t;
-
-#ifndef __KERNEL__
-
-#define xfs_daddr_to_agno(mp,d) \
- ((xfs_agnumber_t)(XFS_BB_TO_FSBT(mp, d) / (mp)->m_sb.sb_agblocks))
-#define xfs_daddr_to_agbno(mp,d) \
- ((xfs_agblock_t)(XFS_BB_TO_FSBT(mp, d) % (mp)->m_sb.sb_agblocks))
-
-#else /* __KERNEL__ */
-
struct xlog;
struct xfs_inode;
struct xfs_mru_cache;
@@ -174,7 +134,7 @@ typedef struct xfs_mount {
int m_ialloc_blks; /* blocks in inode allocation */
int m_inoalign_mask;/* mask sb_inoalignmt if used */
uint m_qflags; /* quota status flags */
- xfs_trans_reservations_t m_reservations;/* precomputed res values */
+ struct xfs_trans_reservations m_reservations;/* precomputed res values */
__uint64_t m_maxicount; /* maximum inode count */
__uint64_t m_resblks; /* total reserved blocks */
__uint64_t m_resblks_avail;/* available reserved blocks */
@@ -383,6 +343,4 @@ extern int xfs_dev_is_read_only(struct xfs_mount *, char *);
extern void xfs_set_low_space_thresholds(struct xfs_mount *);
-#endif /* __KERNEL__ */
-
#endif /* __XFS_MOUNT_H__ */
diff --git a/fs/xfs/xfs_quotaops.c b/fs/xfs/xfs_quotaops.c
index 71926d6..d7e2e43 100644
--- a/fs/xfs/xfs_quotaops.c
+++ b/fs/xfs/xfs_quotaops.c
@@ -19,6 +19,7 @@
#include "xfs_sb.h"
#include "xfs_log.h"
#include "xfs_ag.h"
+#include "xfs_trans.h"
#include "xfs_mount.h"
#include "xfs_quota.h"
#include "xfs_trans.h"
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 079d5a0..60427aa 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -208,6 +208,39 @@ struct xfs_log_item_desc {
#define XFS_TRANS_SB_REXTENTS 0x00001000
#define XFS_TRANS_SB_REXTSLOG 0x00002000
+/*
+ * structure for maintaining pre-calculated transaction reservations.
+ */
+struct xfs_trans_reservations {
+ uint tr_write; /* extent alloc trans */
+ uint tr_itruncate; /* truncate trans */
+ uint tr_rename; /* rename trans */
+ uint tr_link; /* link trans */
+ uint tr_remove; /* unlink trans */
+ uint tr_symlink; /* symlink trans */
+ uint tr_create; /* create trans */
+ uint tr_mkdir; /* mkdir trans */
+ uint tr_ifree; /* inode free trans */
+ uint tr_ichange; /* inode update trans */
+ uint tr_growdata; /* fs data section grow trans */
+ uint tr_swrite; /* sync write inode trans */
+ uint tr_addafork; /* cvt inode to attributed trans */
+ uint tr_writeid; /* write setuid/setgid file */
+ uint tr_attrinval; /* attr fork buffer invalidation */
+ uint tr_attrsetm; /* set/create an attribute at mount time */
+ uint tr_attrsetrt; /* set/create an attribute at runtime */
+ uint tr_attrrm; /* remove an attribute */
+ uint tr_clearagi; /* clear bad agi unlinked ino bucket */
+ uint tr_growrtalloc; /* grow realtime allocations */
+ uint tr_growrtzero; /* grow realtime zeroing */
+ uint tr_growrtfree; /* grow realtime freeing */
+ uint tr_qm_sbchange; /* change quota flags */
+ uint tr_qm_setqlim; /* adjust quota limits */
+ uint tr_qm_dqalloc; /* allocate quota on disk */
+ uint tr_qm_quotaoff; /* turn quota off */
+ uint tr_qm_equotaoff;/* end of turn quota off */
+ uint tr_sb; /* modify superblock */
+};
/*
* Per-extent log reservation for the allocation btree changes
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH 23/60] xfs: sync minor header differences needed by userspace.
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (21 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 22/60] xfs: move xfs_trans_reservations to xfs_trans.h Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 24/60] xfs: move xfs_bmap_punch_delalloc() to xfs_aops.c Dave Chinner
` (38 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Sync the little things, like exported functions and __KERNEL__ protections,
that ensure the headers shared by user and kernel code are identical.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
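Note for readers: the bstat_get_projid() helper added to xfs_fs.h below
rebuilds a 32-bit project ID from two 16-bit halves. The sketch here shows
that composition in isolation, plus the inverse split. struct demo_bstat and
demo_set_projid() are hypothetical and exist only for illustration; only the
hi/lo layout is taken from the patch.

```c
#include <stdint.h>

/* Hypothetical cut-down record holding only the two project ID halves. */
struct demo_bstat {
	uint16_t projid_lo;	/* lower 16 bits of the project ID */
	uint16_t projid_hi;	/* upper 16 bits of the project ID */
};

/* Combine the halves; widen before shifting so the high bits are kept. */
static inline uint32_t
demo_get_projid(const struct demo_bstat *bs)
{
	return (uint32_t)bs->projid_hi << 16 | bs->projid_lo;
}

/* Inverse operation (illustrative only, not part of the patch). */
static inline void
demo_set_projid(struct demo_bstat *bs, uint32_t projid)
{
	bs->projid_lo = (uint16_t)(projid & 0xffff);
	bs->projid_hi = (uint16_t)(projid >> 16);
}
```

A filesystem that only ever used 16-bit project IDs has a zero high half, so
the combined value equals the old 16-bit ID, which is how the split layout
stays compatible with "old" filesystems.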
fs/xfs/xfs_attr.c | 1 +
fs/xfs/xfs_attr_inactive.c | 1 +
fs/xfs/xfs_attr_leaf.c | 1 +
fs/xfs/xfs_attr_leaf.h | 2 ++
fs/xfs/xfs_attr_remote.c | 1 +
fs/xfs/xfs_bmap.h | 2 ++
fs/xfs/xfs_btree.h | 2 --
fs/xfs/xfs_dir2_format.h | 3 +++
fs/xfs/xfs_fs.h | 17 +++++++++++++++++
fs/xfs/xfs_icreate_item.h | 4 ++++
fs/xfs/xfs_sb.h | 7 +++++++
fs/xfs/xfs_symlink.h | 3 ++-
fs/xfs/xfs_trans.h | 8 +++-----
fs/xfs/xfs_trans_priv.h | 3 +++
fs/xfs/xfs_types.h | 3 ++-
15 files changed, 49 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/xfs_attr.c b/fs/xfs/xfs_attr.c
index 0c9bb2e..f61d2bb 100644
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -21,6 +21,7 @@
#include "xfs_bit.h"
#include "xfs_log.h"
#include "xfs_trans.h"
+#include "xfs_trans_priv.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
#include "xfs_mount.h"
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index 0d344ed..a19439b 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -22,6 +22,7 @@
#include "xfs_bit.h"
#include "xfs_log.h"
#include "xfs_trans.h"
+#include "xfs_trans_priv.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
#include "xfs_mount.h"
diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index 791ade2..a7fdd0d 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -22,6 +22,7 @@
#include "xfs_bit.h"
#include "xfs_log.h"
#include "xfs_trans.h"
+#include "xfs_trans_priv.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
#include "xfs_mount.h"
diff --git a/fs/xfs/xfs_attr_leaf.h b/fs/xfs/xfs_attr_leaf.h
index 444a770..c102213 100644
--- a/fs/xfs/xfs_attr_leaf.h
+++ b/fs/xfs/xfs_attr_leaf.h
@@ -333,6 +333,8 @@ int xfs_attr3_leaf_read(struct xfs_trans *tp, struct xfs_inode *dp,
struct xfs_buf **bpp);
void xfs_attr3_leaf_hdr_from_disk(struct xfs_attr3_icleaf_hdr *to,
struct xfs_attr_leafblock *from);
+void xfs_attr3_leaf_hdr_to_disk(struct xfs_attr_leafblock *to,
+ struct xfs_attr3_icleaf_hdr *from);
extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index ef6b0c1..39a59ea 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -22,6 +22,7 @@
#include "xfs_bit.h"
#include "xfs_log.h"
#include "xfs_trans.h"
+#include "xfs_trans_priv.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
#include "xfs_mount.h"
diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
index cbcaf00..31b7c0e 100644
--- a/fs/xfs/xfs_bmap.h
+++ b/fs/xfs/xfs_bmap.h
@@ -136,9 +136,11 @@ typedef struct xfs_bmalloca {
char conv; /* overwriting unwritten extents */
char stack_switch;
int flags;
+#ifdef __KERNEL__
struct completion *done;
struct work_struct work;
int result;
+#endif /* __KERNEL__ */
} xfs_bmalloca_t;
/*
diff --git a/fs/xfs/xfs_btree.h b/fs/xfs/xfs_btree.h
index 55e3c7c..c8473c7 100644
--- a/fs/xfs/xfs_btree.h
+++ b/fs/xfs/xfs_btree.h
@@ -88,13 +88,11 @@ struct xfs_btree_block {
#define XFS_BTREE_SBLOCK_CRC_LEN (XFS_BTREE_SBLOCK_LEN + 40)
#define XFS_BTREE_LBLOCK_CRC_LEN (XFS_BTREE_LBLOCK_LEN + 48)
-
#define XFS_BTREE_SBLOCK_CRC_OFF \
offsetof(struct xfs_btree_block, bb_u.s.bb_crc)
#define XFS_BTREE_LBLOCK_CRC_OFF \
offsetof(struct xfs_btree_block, bb_u.l.bb_crc)
-
/*
* Generic key, ptr and record wrapper structures.
*
diff --git a/fs/xfs/xfs_dir2_format.h b/fs/xfs/xfs_dir2_format.h
index 7826782..2095e17 100644
--- a/fs/xfs/xfs_dir2_format.h
+++ b/fs/xfs/xfs_dir2_format.h
@@ -519,6 +519,9 @@ struct xfs_dir3_leaf {
#define XFS_DIR3_LEAF_CRC_OFF offsetof(struct xfs_dir3_leaf_hdr, info.crc)
+extern void xfs_dir3_leaf_hdr_from_disk(struct xfs_dir3_icleaf_hdr *to,
+ struct xfs_dir2_leaf *from);
+
static inline int
xfs_dir3_leaf_hdr_size(struct xfs_dir2_leaf *lp)
{
diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
index d046955..68c2e18 100644
--- a/fs/xfs/xfs_fs.h
+++ b/fs/xfs/xfs_fs.h
@@ -311,6 +311,17 @@ typedef struct xfs_bstat {
} xfs_bstat_t;
/*
+ * Project quota id helpers (previously projid was 16bit only
+ * and using two 16bit values to hold new 32bit projid was chosen
+ * to retain compatibility with "old" filesystems).
+ */
+static inline __uint32_t
+bstat_get_projid(struct xfs_bstat *bs)
+{
+ return (__uint32_t)bs->bs_projid_hi << 16 | bs->bs_projid_lo;
+}
+
+/*
* The user-level BulkStat Request interface structure.
*/
typedef struct xfs_fsop_bulkreq {
@@ -511,8 +522,14 @@ typedef struct xfs_handle {
#define XFS_IOC_ERROR_INJECTION _IOW ('X', 116, struct xfs_error_injection)
#define XFS_IOC_ERROR_CLEARALL _IOW ('X', 117, struct xfs_error_injection)
/* XFS_IOC_ATTRCTL_BY_HANDLE -- deprecated 118 */
+
/* XFS_IOC_FREEZE -- FIFREEZE 119 */
/* XFS_IOC_THAW -- FITHAW 120 */
+#ifndef FIFREEZE
+#define XFS_IOC_FREEZE _IOWR('X', 119, int)
+#define XFS_IOC_THAW _IOWR('X', 120, int)
+#endif
+
#define XFS_IOC_FSSETDM_BY_HANDLE _IOW ('X', 121, struct xfs_fsop_setdm_handlereq)
#define XFS_IOC_ATTRLIST_BY_HANDLE _IOW ('X', 122, struct xfs_fsop_attrlist_handlereq)
#define XFS_IOC_ATTRMULTI_BY_HANDLE _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
diff --git a/fs/xfs/xfs_icreate_item.h b/fs/xfs/xfs_icreate_item.h
index 88ba8aa..79df981 100644
--- a/fs/xfs/xfs_icreate_item.h
+++ b/fs/xfs/xfs_icreate_item.h
@@ -36,6 +36,8 @@ struct xfs_icreate_log {
__be32 icl_gen; /* inode generation number to use */
};
+#ifdef __KERNEL__
+
/* in memory log item structure */
struct xfs_icreate_item {
struct xfs_log_item ic_item;
@@ -49,4 +51,6 @@ void xfs_icreate_log(struct xfs_trans *tp, xfs_agnumber_t agno,
unsigned int inode_size, xfs_agblock_t length,
unsigned int generation);
+#endif /* __KERNEL__ */
+
#endif /* XFS_ICREATE_ITEM_H */
diff --git a/fs/xfs/xfs_sb.h b/fs/xfs/xfs_sb.h
index 25d19d0..8f78df1 100644
--- a/fs/xfs/xfs_sb.h
+++ b/fs/xfs/xfs_sb.h
@@ -555,6 +555,13 @@ static inline int xfs_sb_version_hasprojid32bit(xfs_sb_t *sbp)
(sbp->sb_features2 & XFS_SB_VERSION2_PROJID32BIT));
}
+static inline void xfs_sb_version_addprojid32bit(xfs_sb_t *sbp)
+{
+ sbp->sb_versionnum |= XFS_SB_VERSION_MOREBITSBIT;
+ sbp->sb_features2 |= XFS_SB_VERSION2_PROJID32BIT;
+ sbp->sb_bad_features2 |= XFS_SB_VERSION2_PROJID32BIT;
+}
+
static inline int xfs_sb_version_hascrc(xfs_sb_t *sbp)
{
return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5;
diff --git a/fs/xfs/xfs_symlink.h b/fs/xfs/xfs_symlink.h
index b39398d..4818edf 100644
--- a/fs/xfs/xfs_symlink.h
+++ b/fs/xfs/xfs_symlink.h
@@ -49,7 +49,8 @@ struct xfs_dsymlink_hdr {
sizeof(struct xfs_dsymlink_hdr) : 0))
int xfs_symlink_blocks(struct xfs_mount *mp, int pathlen);
-
+bool xfs_symlink_hdr_ok(struct xfs_mount *mp, xfs_ino_t ino, uint32_t offset,
+ uint32_t size, struct xfs_buf *bp);
void xfs_symlink_local_to_remote(struct xfs_trans *tp, struct xfs_buf *bp,
struct xfs_inode *ip, struct xfs_ifork *ifp);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 60427aa..2d6b5e8 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -57,7 +57,8 @@ typedef struct xfs_trans_header {
{ XFS_LI_INODE, "XFS_LI_INODE" }, \
{ XFS_LI_BUF, "XFS_LI_BUF" }, \
{ XFS_LI_DQUOT, "XFS_LI_DQUOT" }, \
- { XFS_LI_QUOTAOFF, "XFS_LI_QUOTAOFF" }
+ { XFS_LI_QUOTAOFF, "XFS_LI_QUOTAOFF" }, \
+ { XFS_LI_ICREATE, "XFS_LI_ICREATE" }
/*
* Transaction types. Used to distinguish types of buffers.
@@ -291,7 +292,7 @@ struct xfs_trans_reservations {
#define XFS_ADDAFORK_LOG_RES(mp) ((mp)->m_reservations.tr_addafork)
#define XFS_ATTRINVAL_LOG_RES(mp) ((mp)->m_reservations.tr_attrinval)
#define XFS_ATTRSETM_LOG_RES(mp) ((mp)->m_reservations.tr_attrsetm)
-#define XFS_ATTRSETRT_LOG_RES(mp) ((mp)->m_reservations.tr_attrsetrt)
+#define XFS_ATTRSETRT_LOG_RES(mp) ((mp)->m_reservations.tr_attrsetrt)
#define XFS_ATTRRM_LOG_RES(mp) ((mp)->m_reservations.tr_attrrm)
#define XFS_CLEAR_AGI_BUCKET_LOG_RES(mp) ((mp)->m_reservations.tr_clearagi)
#define XFS_QM_SBCHANGE_LOG_RES(mp) ((mp)->m_reservations.tr_qm_sbchange)
@@ -568,7 +569,4 @@ extern kmem_zone_t *xfs_log_item_desc_zone;
#endif /* __KERNEL__ */
-void xfs_trans_init(struct xfs_mount *);
-int xfs_trans_roll(struct xfs_trans **, struct xfs_inode *);
-
#endif /* __XFS_TRANS_H__ */
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 53b7c9b..d43b130 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -25,6 +25,9 @@ struct xfs_trans;
struct xfs_ail;
struct xfs_log_vec;
+
+void xfs_trans_init(struct xfs_mount *);
+int xfs_trans_roll(struct xfs_trans **, struct xfs_inode *);
void xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *);
void xfs_trans_del_item(struct xfs_log_item *);
void xfs_trans_free_items(struct xfs_trans *tp, xfs_lsn_t commit_lsn,
diff --git a/fs/xfs/xfs_types.h b/fs/xfs/xfs_types.h
index 61ba1cf..dd6bf71 100644
--- a/fs/xfs/xfs_types.h
+++ b/fs/xfs/xfs_types.h
@@ -32,7 +32,6 @@ typedef unsigned int __uint32_t;
typedef signed long long int __int64_t;
typedef unsigned long long int __uint64_t;
-typedef __uint32_t prid_t; /* project ID */
typedef __uint32_t inst_t; /* an instruction */
typedef __s64 xfs_off_t; /* <file offset> type */
@@ -55,6 +54,8 @@ typedef __uint64_t __psunsigned_t;
#endif /* __KERNEL__ */
+typedef __uint32_t prid_t; /* project ID */
+
typedef __uint32_t xfs_agblock_t; /* blockno in alloc. group */
typedef __uint32_t xfs_agino_t; /* inode # within allocation grp */
typedef __uint32_t xfs_extlen_t; /* extent length in blocks */
--
1.7.10.4
* [PATCH 24/60] xfs: move xfs_bmap_punch_delalloc() to xfs_aops.c
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (22 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 23/60] xfs: sync minor header differences needed by userspace Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 25/60] xfs: split out transaction reservation code Dave Chinner
` (37 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
It is only used in that file, and delayed allocation is only used
in the kernel. Hence it has to be removed from xfs_bmap.c in
userspace, so to minimise the differences between kernel and
userspace, move this function out of xfs_bmap.c.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
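Note for readers: the function being moved walks the range one block at a
time and only removes blocks that are still delayed allocations, skipping
holes and already-converted extents. The toy model below illustrates that
control flow only. The block-state array and all demo_* names are
hypothetical; the real code looks up each block with xfs_bmapi_read() and
removes it with xfs_bunmapi(), and can fail where this sketch cannot.

```c
#include <stddef.h>

/* Hypothetical per-block state standing in for an extent map lookup. */
enum demo_blk_state { DEMO_HOLE, DEMO_DELALLOC, DEMO_REAL };

/*
 * Punch delalloc blocks out of [start, start + length), one block at a
 * time. Holes and real (converted) blocks are left untouched.
 * Returns the number of blocks punched.
 */
static size_t
demo_punch_delalloc_range(enum demo_blk_state *map, size_t nblocks,
			  size_t start, size_t length)
{
	size_t punched = 0;

	for (size_t i = 0; i < length && start + i < nblocks; i++) {
		if (map[start + i] != DEMO_DELALLOC)
			continue;	/* nothing there, or been converted */
		map[start + i] = DEMO_HOLE;
		punched++;
	}
	return punched;
}
```

The per-block walk is slow for long ranges, which matches the comment in the
patch: it is acceptable because the path only runs in rare error cases.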
fs/xfs/xfs_aops.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++--
fs/xfs/xfs_bmap.c | 75 -------------------------------------------------
fs/xfs/xfs_bmap.h | 2 --
3 files changed, 78 insertions(+), 79 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 2895178..82d2bfd 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -848,6 +848,82 @@ xfs_vm_invalidatepage(
block_invalidatepage(page, offset);
}
+
+/*
+ * dead simple method of punching delayed allocation blocks from a range in
+ * the inode. Walks a block at a time so will be slow, but is only executed in
+ * rare error cases so the overhead is not critical. This will always punch out
+ * both the start and end blocks, even if the ranges only partially overlap
+ * them, so it is up to the caller to ensure that partial blocks are not
+ * passed in.
+ */
+STATIC int
+xfs_punch_delalloc_range(
+ struct xfs_inode *ip,
+ xfs_fileoff_t start_fsb,
+ xfs_fileoff_t length)
+{
+ xfs_fileoff_t remaining = length;
+ int error = 0;
+
+ ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+
+ do {
+ int done;
+ xfs_bmbt_irec_t imap;
+ int nimaps = 1;
+ xfs_fsblock_t firstblock;
+ xfs_bmap_free_t flist;
+
+ /*
+ * Map the range first and check that it is a delalloc extent
+ * before trying to unmap the range. Otherwise we will be
+ * trying to remove a real extent (which requires a
+ * transaction) or a hole, which is probably a bad idea...
+ */
+ error = xfs_bmapi_read(ip, start_fsb, 1, &imap, &nimaps,
+ XFS_BMAPI_ENTIRE);
+
+ if (error) {
+ /* something screwed, just bail */
+ if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
+ xfs_alert(ip->i_mount,
+ "Failed delalloc mapping lookup ino %lld fsb %lld.",
+ ip->i_ino, start_fsb);
+ }
+ break;
+ }
+ if (!nimaps) {
+ /* nothing there */
+ goto next_block;
+ }
+ if (imap.br_startblock != DELAYSTARTBLOCK) {
+ /* been converted, ignore */
+ goto next_block;
+ }
+ WARN_ON(imap.br_blockcount == 0);
+
+ /*
+ * Note: while we initialise the firstblock/flist pair, they
+ * should never be used because blocks should never be
+ * allocated or freed for a delalloc extent and hence we don't
+ * need to cancel or finish them after the xfs_bunmapi() call.
+ */
+ xfs_bmap_init(&flist, &firstblock);
+ error = xfs_bunmapi(NULL, ip, start_fsb, 1, 0, 1, &firstblock,
+ &flist, &done);
+ if (error)
+ break;
+
+ ASSERT(!flist.xbf_count && !flist.xbf_first);
+next_block:
+ start_fsb++;
+ remaining--;
+ } while(remaining > 0);
+
+ return error;
+}
+
/*
* If the page has delalloc buffers on it, we need to punch them out before we
* invalidate the page. If we don't, we leave a stale delalloc mapping on the
@@ -893,7 +969,7 @@ xfs_aops_discard_page(
goto next_buffer;
start_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
- error = xfs_bmap_punch_delalloc_range(ip, start_fsb, 1);
+ error = xfs_punch_delalloc_range(ip, start_fsb, 1);
if (error) {
/* something screwed, just bail */
if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
@@ -1493,7 +1569,7 @@ xfs_vm_kill_delalloc_range(
return;
xfs_ilock(ip, XFS_ILOCK_EXCL);
- error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
+ error = xfs_punch_delalloc_range(ip, start_fsb,
end_fsb - start_fsb);
if (error) {
/* something screwed, just bail */
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 181e7b6..386758b 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -5789,78 +5789,3 @@ error0:
}
return error;
}
-
-/*
- * dead simple method of punching delalyed allocation blocks from a range in
- * the inode. Walks a block at a time so will be slow, but is only executed in
- * rare error cases so the overhead is not critical. This will alays punch out
- * both the start and end blocks, even if the ranges only partially overlap
- * them, so it is up to the caller to ensure that partial blocks are not
- * passed in.
- */
-int
-xfs_bmap_punch_delalloc_range(
- struct xfs_inode *ip,
- xfs_fileoff_t start_fsb,
- xfs_fileoff_t length)
-{
- xfs_fileoff_t remaining = length;
- int error = 0;
-
- ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-
- do {
- int done;
- xfs_bmbt_irec_t imap;
- int nimaps = 1;
- xfs_fsblock_t firstblock;
- xfs_bmap_free_t flist;
-
- /*
- * Map the range first and check that it is a delalloc extent
- * before trying to unmap the range. Otherwise we will be
- * trying to remove a real extent (which requires a
- * transaction) or a hole, which is probably a bad idea...
- */
- error = xfs_bmapi_read(ip, start_fsb, 1, &imap, &nimaps,
- XFS_BMAPI_ENTIRE);
-
- if (error) {
- /* something screwed, just bail */
- if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
- xfs_alert(ip->i_mount,
- "Failed delalloc mapping lookup ino %lld fsb %lld.",
- ip->i_ino, start_fsb);
- }
- break;
- }
- if (!nimaps) {
- /* nothing there */
- goto next_block;
- }
- if (imap.br_startblock != DELAYSTARTBLOCK) {
- /* been converted, ignore */
- goto next_block;
- }
- WARN_ON(imap.br_blockcount == 0);
-
- /*
- * Note: while we initialise the firstblock/flist pair, they
- * should never be used because blocks should never be
- * allocated or freed for a delalloc extent and hence we need
- * don't cancel or finish them after the xfs_bunmapi() call.
- */
- xfs_bmap_init(&flist, &firstblock);
- error = xfs_bunmapi(NULL, ip, start_fsb, 1, 0, 1, &firstblock,
- &flist, &done);
- if (error)
- break;
-
- ASSERT(!flist.xbf_count && !flist.xbf_first);
-next_block:
- start_fsb++;
- remaining--;
- } while(remaining > 0);
-
- return error;
-}
diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
index 31b7c0e..262f760 100644
--- a/fs/xfs/xfs_bmap.h
+++ b/fs/xfs/xfs_bmap.h
@@ -214,8 +214,6 @@ int xfs_bmap_eof(struct xfs_inode *ip, xfs_fileoff_t endoff,
int whichfork, int *eof);
int xfs_bmap_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip,
int whichfork, int *count);
-int xfs_bmap_punch_delalloc_range(struct xfs_inode *ip,
- xfs_fileoff_t start_fsb, xfs_fileoff_t length);
xfs_daddr_t xfs_fsb_to_db(struct xfs_inode *ip, xfs_fsblock_t fsb);
--
1.7.10.4
* [PATCH 25/60] xfs: split out transaction reservation code
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (23 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 24/60] xfs: move xfs_bmap_punch_delalloc() to xfs_aops.c Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 26/60] xfs: minor cleanups Dave Chinner
` (36 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The transaction reservation size calculations are used by both kernel
and userspace, but most of the transaction code in xfs_trans.c is
kernel specific. Split all the transaction reservation code out into
its own files to make sharing with userspace simpler.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
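Note for readers: every reservation moved by this patch is built from one
primitive, nbufs * (size + per-buffer log overhead), where the overhead is
rounded up to a multiple of 128 bytes. A standalone sketch of that primitive
follows. The raw header size is passed in as a parameter because the real
value comes from sizeof() on kernel structures; the demo_* names and the
numbers used with them are assumptions for illustration only.

```c
#define DEMO_RES_ALIGN	128u	/* historical rounding unit, in bytes */

/* Round value up to the next multiple of align (align must be non-zero). */
static unsigned int
demo_round_up(unsigned int value, unsigned int align)
{
	return ((value + align - 1) / align) * align;
}

/* Per-buffer log overhead: raw format header size rounded up to 128. */
static unsigned int
demo_buf_log_overhead(unsigned int raw_header_bytes)
{
	return demo_round_up(raw_header_bytes, DEMO_RES_ALIGN);
}

/* Bytes to reserve for nbufs logged buffers of size bytes each. */
static unsigned int
demo_calc_buf_res(unsigned int nbufs, unsigned int size,
		  unsigned int raw_header_bytes)
{
	return nbufs * (size + demo_buf_log_overhead(raw_header_bytes));
}
```

For example, three 512-byte sector buffers with a hypothetical 100-byte raw
header reserve 3 * (512 + 128) = 1920 bytes. The per-transaction functions
in the diff sum several such terms and take the MAX over the phases of the
transaction.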
fs/xfs/Makefile | 3 +-
fs/xfs/xfs_mount.h | 2 +-
fs/xfs/xfs_trans.c | 654 +------------------------------------------
fs/xfs/xfs_trans.h | 114 +-------
fs/xfs/xfs_trans_resv.c | 705 +++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_trans_resv.h | 135 +++++++++
6 files changed, 846 insertions(+), 767 deletions(-)
create mode 100644 fs/xfs/xfs_trans_resv.c
create mode 100644 fs/xfs/xfs_trans_resv.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index ba7d292..3281a20 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -53,6 +53,7 @@ xfs-y += xfs_aops.o \
xfs_mru_cache.o \
xfs_rename.o \
xfs_super.o \
+ xfs_trans.o \
xfs_utils.o \
xfs_xattr.o \
kmem.o \
@@ -81,7 +82,7 @@ xfs-y += xfs_alloc.o \
xfs_log_recover.o \
xfs_sb.o \
xfs_symlink.o \
- xfs_trans.o
+ xfs_trans_resv.o
# low-level transaction/log code
xfs-y += xfs_log.o \
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index dedda31..e3acebd 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -134,7 +134,7 @@ typedef struct xfs_mount {
int m_ialloc_blks; /* blocks in inode allocation */
int m_inoalign_mask;/* mask sb_inoalignmt if used */
uint m_qflags; /* quota status flags */
- struct xfs_trans_reservations m_reservations;/* precomputed res values */
+ struct xfs_trans_resv m_reservations;/* precomputed res values */
__uint64_t m_maxicount; /* maximum inode count */
__uint64_t m_resblks; /* total reserved blocks */
__uint64_t m_resblks_avail;/* available reserved blocks */
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 35a2299..ed24650 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -49,629 +49,6 @@ kmem_zone_t *xfs_trans_zone;
kmem_zone_t *xfs_log_item_desc_zone;
/*
- * A buffer has a format structure overhead in the log in addition
- * to the data, so we need to take this into account when reserving
- * space in a transaction for a buffer. Round the space required up
- * to a multiple of 128 bytes so that we don't change the historical
- * reservation that has been used for this overhead.
- */
-STATIC uint
-xfs_buf_log_overhead(void)
-{
- return round_up(sizeof(struct xlog_op_header) +
- sizeof(struct xfs_buf_log_format), 128);
-}
-
-/*
- * Calculate out transaction log reservation per item in bytes.
- *
- * The nbufs argument is used to indicate the number of items that
- * will be changed in a transaction. size is used to tell how many
- * bytes should be reserved per item.
- */
-STATIC uint
-xfs_calc_buf_res(
- uint nbufs,
- uint size)
-{
- return nbufs * (size + xfs_buf_log_overhead());
-}
-
-/*
- * Various log reservation values.
- *
- * These are based on the size of the file system block because that is what
- * most transactions manipulate. Each adds in an additional 128 bytes per
- * item logged to try to account for the overhead of the transaction mechanism.
- *
- * Note: Most of the reservations underestimate the number of allocation
- * groups into which they could free extents in the xfs_bmap_finish() call.
- * This is because the number in the worst case is quite high and quite
- * unusual. In order to fix this we need to change xfs_bmap_finish() to free
- * extents in only a single AG at a time. This will require changes to the
- * EFI code as well, however, so that the EFI for the extents not freed is
- * logged again in each transaction. See SGI PV #261917.
- *
- * Reservation functions here avoid a huge stack in xfs_trans_init due to
- * register overflow from temporaries in the calculations.
- */
-
-
-/*
- * In a write transaction we can allocate a maximum of 2
- * extents. This gives:
- * the inode getting the new extents: inode size
- * the inode's bmap btree: max depth * block size
- * the agfs of the ags from which the extents are allocated: 2 * sector
- * the superblock free block counter: sector size
- * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
- * And the bmap_finish transaction can free bmap blocks in a join:
- * the agfs of the ags containing the blocks: 2 * sector size
- * the agfls of the ags containing the blocks: 2 * sector size
- * the super block free block counter: sector size
- * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
- */
-STATIC uint
-xfs_calc_write_reservation(
- struct xfs_mount *mp)
-{
- return XFS_DQUOT_LOGRES(mp) +
- MAX((xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
- XFS_FSB_TO_B(mp, 1))),
- (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
- XFS_FSB_TO_B(mp, 1))));
-}
-
-/*
- * In truncating a file we free up to two extents at once. We can modify:
- * the inode being truncated: inode size
- * the inode's bmap btree: (max depth + 1) * block size
- * And the bmap_finish transaction can free the blocks and bmap blocks:
- * the agf for each of the ags: 4 * sector size
- * the agfl for each of the ags: 4 * sector size
- * the super block to reflect the freed blocks: sector size
- * worst case split in allocation btrees per extent assuming 4 extents:
- * 4 exts * 2 trees * (2 * max depth - 1) * block size
- * the inode btree: max depth * blocksize
- * the allocation btrees: 2 trees * (max depth - 1) * block size
- */
-STATIC uint
-xfs_calc_itruncate_reservation(
- struct xfs_mount *mp)
-{
- return XFS_DQUOT_LOGRES(mp) +
- MAX((xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1,
- XFS_FSB_TO_B(mp, 1))),
- (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 4),
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(5, 0) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(2 + XFS_IALLOC_BLOCKS(mp) +
- mp->m_in_maxlevels, 0)));
-}
-
-/*
- * In renaming a files we can modify:
- * the four inodes involved: 4 * inode size
- * the two directory btrees: 2 * (max depth + v2) * dir block size
- * the two directory bmap btrees: 2 * max depth * block size
- * And the bmap_finish transaction can free dir and bmap blocks (two sets
- * of bmap blocks) giving:
- * the agf for the ags in which the blocks live: 3 * sector size
- * the agfl for the ags in which the blocks live: 3 * sector size
- * the superblock for the free block count: sector size
- * the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size
- */
-STATIC uint
-xfs_calc_rename_reservation(
- struct xfs_mount *mp)
-{
- return XFS_DQUOT_LOGRES(mp) +
- MAX((xfs_calc_buf_res(4, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
- XFS_FSB_TO_B(mp, 1))),
- (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 3),
- XFS_FSB_TO_B(mp, 1))));
-}
-
-/*
- * For creating a link to an inode:
- * the parent directory inode: inode size
- * the linked inode: inode size
- * the directory btree could split: (max depth + v2) * dir block size
- * the directory bmap btree could join or split: (max depth + v2) * blocksize
- * And the bmap_finish transaction can free some bmap blocks giving:
- * the agf for the ag in which the blocks live: sector size
- * the agfl for the ag in which the blocks live: sector size
- * the superblock for the free block count: sector size
- * the allocation btrees: 2 trees * (2 * max depth - 1) * block size
- */
-STATIC uint
-xfs_calc_link_reservation(
- struct xfs_mount *mp)
-{
- return XFS_DQUOT_LOGRES(mp) +
- MAX((xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
- XFS_FSB_TO_B(mp, 1))),
- (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
- XFS_FSB_TO_B(mp, 1))));
-}
-
-/*
- * For removing a directory entry we can modify:
- * the parent directory inode: inode size
- * the removed inode: inode size
- * the directory btree could join: (max depth + v2) * dir block size
- * the directory bmap btree could join or split: (max depth + v2) * blocksize
- * And the bmap_finish transaction can free the dir and bmap blocks giving:
- * the agf for the ag in which the blocks live: 2 * sector size
- * the agfl for the ag in which the blocks live: 2 * sector size
- * the superblock for the free block count: sector size
- * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
- */
-STATIC uint
-xfs_calc_remove_reservation(
- struct xfs_mount *mp)
-{
- return XFS_DQUOT_LOGRES(mp) +
- MAX((xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
- XFS_FSB_TO_B(mp, 1))),
- (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
- XFS_FSB_TO_B(mp, 1))));
-}
-
-/*
- * For create, break it in to the two cases that the transaction
- * covers. We start with the modify case - allocation done by modification
- * of the state of existing inodes - and the allocation case.
- */
-
-/*
- * For create we can modify:
- * the parent directory inode: inode size
- * the new inode: inode size
- * the inode btree entry: block size
- * the superblock for the nlink flag: sector size
- * the directory btree: (max depth + v2) * dir block size
- * the directory inode's bmap btree: (max depth + v2) * block size
- */
-STATIC uint
-xfs_calc_create_resv_modify(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
- (uint)XFS_FSB_TO_B(mp, 1) +
- xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1));
-}
-
-/*
- * For create we can allocate some inodes giving:
- * the agi and agf of the ag getting the new inodes: 2 * sectorsize
- * the superblock for the nlink flag: sector size
- * the inode blocks allocated: XFS_IALLOC_BLOCKS * blocksize
- * the inode btree: max depth * blocksize
- * the allocation btrees: 2 trees * (max depth - 1) * block size
- */
-STATIC uint
-xfs_calc_create_resv_alloc(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
- mp->m_sb.sb_sectsize +
- xfs_calc_buf_res(XFS_IALLOC_BLOCKS(mp), XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
- XFS_FSB_TO_B(mp, 1));
-}
-
-STATIC uint
-__xfs_calc_create_reservation(
- struct xfs_mount *mp)
-{
- return XFS_DQUOT_LOGRES(mp) +
- MAX(xfs_calc_create_resv_alloc(mp),
- xfs_calc_create_resv_modify(mp));
-}
-
-/*
- * For icreate we can allocate some inodes giving:
- * the agi and agf of the ag getting the new inodes: 2 * sectorsize
- * the superblock for the nlink flag: sector size
- * the inode btree: max depth * blocksize
- * the allocation btrees: 2 trees * (max depth - 1) * block size
- */
-STATIC uint
-xfs_calc_icreate_resv_alloc(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
- mp->m_sb.sb_sectsize +
- xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
- XFS_FSB_TO_B(mp, 1));
-}
-
-STATIC uint
-xfs_calc_icreate_reservation(xfs_mount_t *mp)
-{
- return XFS_DQUOT_LOGRES(mp) +
- MAX(xfs_calc_icreate_resv_alloc(mp),
- xfs_calc_create_resv_modify(mp));
-}
-
-STATIC uint
-xfs_calc_create_reservation(
- struct xfs_mount *mp)
-{
- if (xfs_sb_version_hascrc(&mp->m_sb))
- return xfs_calc_icreate_reservation(mp);
- return __xfs_calc_create_reservation(mp);
-
-}
-
-/*
- * Making a new directory is the same as creating a new file.
- */
-STATIC uint
-xfs_calc_mkdir_reservation(
- struct xfs_mount *mp)
-{
- return xfs_calc_create_reservation(mp);
-}
-
-
-/*
- * Making a new symplink is the same as creating a new file, but
- * with the added blocks for remote symlink data which can be up to 1kB in
- * length (MAXPATHLEN).
- */
-STATIC uint
-xfs_calc_symlink_reservation(
- struct xfs_mount *mp)
-{
- return xfs_calc_create_reservation(mp) +
- xfs_calc_buf_res(1, MAXPATHLEN);
-}
-
-/*
- * In freeing an inode we can modify:
- * the inode being freed: inode size
- * the super block free inode counter: sector size
- * the agi hash list and counters: sector size
- * the inode btree entry: block size
- * the on disk inode before ours in the agi hash list: inode cluster size
- * the inode btree: max depth * blocksize
- * the allocation btrees: 2 trees * (max depth - 1) * block size
- */
-STATIC uint
-xfs_calc_ifree_reservation(
- struct xfs_mount *mp)
-{
- return XFS_DQUOT_LOGRES(mp) +
- xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, 1)) +
- MAX((__uint16_t)XFS_FSB_TO_B(mp, 1),
- XFS_INODE_CLUSTER_SIZE(mp)) +
- xfs_calc_buf_res(1, 0) +
- xfs_calc_buf_res(2 + XFS_IALLOC_BLOCKS(mp) +
- mp->m_in_maxlevels, 0) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
- XFS_FSB_TO_B(mp, 1));
-}
-
-/*
- * When only changing the inode we log the inode and possibly the superblock
- * We also add a bit of slop for the transaction stuff.
- */
-STATIC uint
-xfs_calc_ichange_reservation(
- struct xfs_mount *mp)
-{
- return XFS_DQUOT_LOGRES(mp) +
- mp->m_sb.sb_inodesize +
- mp->m_sb.sb_sectsize +
- 512;
-
-}
-
-/*
- * Growing the data section of the filesystem.
- * superblock
- * agi and agf
- * allocation btrees
- */
-STATIC uint
-xfs_calc_growdata_reservation(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
- XFS_FSB_TO_B(mp, 1));
-}
-
-/*
- * Growing the rt section of the filesystem.
- * In the first set of transactions (ALLOC) we allocate space to the
- * bitmap or summary files.
- * superblock: sector size
- * agf of the ag from which the extent is allocated: sector size
- * bmap btree for bitmap/summary inode: max depth * blocksize
- * bitmap/summary inode: inode size
- * allocation btrees for 1 block alloc: 2 * (2 * maxdepth - 1) * blocksize
- */
-STATIC uint
-xfs_calc_growrtalloc_reservation(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
- XFS_FSB_TO_B(mp, 1));
-}
-
-/*
- * Growing the rt section of the filesystem.
- * In the second set of transactions (ZERO) we zero the new metadata blocks.
- * one bitmap/summary block: blocksize
- */
-STATIC uint
-xfs_calc_growrtzero_reservation(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(1, mp->m_sb.sb_blocksize);
-}
-
-/*
- * Growing the rt section of the filesystem.
- * In the third set of transactions (FREE) we update metadata without
- * allocating any new blocks.
- * superblock: sector size
- * bitmap inode: inode size
- * summary inode: inode size
- * one bitmap block: blocksize
- * summary blocks: new summary size
- */
-STATIC uint
-xfs_calc_growrtfree_reservation(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(1, mp->m_sb.sb_blocksize) +
- xfs_calc_buf_res(1, mp->m_rsumsize);
-}
-
-/*
- * Logging the inode modification timestamp on a synchronous write.
- * inode
- */
-STATIC uint
-xfs_calc_swrite_reservation(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(1, mp->m_sb.sb_inodesize);
-}
-
-/*
- * Logging the inode mode bits when writing a setuid/setgid file
- * inode
- */
-STATIC uint
-xfs_calc_writeid_reservation(xfs_mount_t *mp)
-{
- return xfs_calc_buf_res(1, mp->m_sb.sb_inodesize);
-}
-
-/*
- * Converting the inode from non-attributed to attributed.
- * the inode being converted: inode size
- * agf block and superblock (for block allocation)
- * the new block (directory sized)
- * bmap blocks for the new directory block
- * allocation btrees
- */
-STATIC uint
-xfs_calc_addafork_reservation(
- struct xfs_mount *mp)
-{
- return XFS_DQUOT_LOGRES(mp) +
- xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(1, mp->m_dirblksize) +
- xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1,
- XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
- XFS_FSB_TO_B(mp, 1));
-}
-
-/*
- * Removing the attribute fork of a file
- * the inode being truncated: inode size
- * the inode's bmap btree: max depth * block size
- * And the bmap_finish transaction can free the blocks and bmap blocks:
- * the agf for each of the ags: 4 * sector size
- * the agfl for each of the ags: 4 * sector size
- * the super block to reflect the freed blocks: sector size
- * worst case split in allocation btrees per extent assuming 4 extents:
- * 4 exts * 2 trees * (2 * max depth - 1) * block size
- */
-STATIC uint
-xfs_calc_attrinval_reservation(
- struct xfs_mount *mp)
-{
- return MAX((xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
- XFS_FSB_TO_B(mp, 1))),
- (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 4),
- XFS_FSB_TO_B(mp, 1))));
-}
-
-/*
- * Setting an attribute at mount time.
- * the inode getting the attribute
- * the superblock for allocations
- * the agfs extents are allocated from
- * the attribute btree * max depth
- * the inode allocation btree
- * Since attribute transaction space is dependent on the size of the attribute,
- * the calculation is done partially at mount time and partially at runtime(see
- * below).
- */
-STATIC uint
-xfs_calc_attrsetm_reservation(
- struct xfs_mount *mp)
-{
- return XFS_DQUOT_LOGRES(mp) +
- xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_DA_NODE_MAXDEPTH, XFS_FSB_TO_B(mp, 1));
-}
-
-/*
- * Setting an attribute at runtime, transaction space unit per block.
- * the superblock for allocations: sector size
- * the inode bmap btree could join or split: max depth * block size
- * Since the runtime attribute transaction space is dependent on the total
- * blocks needed for the 1st bmap, here we calculate out the space unit for
- * one block so that the caller could figure out the total space according
- * to the attibute extent length in blocks by: ext * XFS_ATTRSETRT_LOG_RES(mp).
- */
-STATIC uint
-xfs_calc_attrsetrt_reservation(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
- XFS_FSB_TO_B(mp, 1));
-}
-
-/*
- * Removing an attribute.
- * the inode: inode size
- * the attribute btree could join: max depth * block size
- * the inode bmap btree could join or split: max depth * block size
- * And the bmap_finish transaction can free the attr blocks freed giving:
- * the agf for the ag in which the blocks live: 2 * sector size
- * the agfl for the ag in which the blocks live: 2 * sector size
- * the superblock for the free block count: sector size
- * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
- */
-STATIC uint
-xfs_calc_attrrm_reservation(
- struct xfs_mount *mp)
-{
- return XFS_DQUOT_LOGRES(mp) +
- MAX((xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
- xfs_calc_buf_res(XFS_DA_NODE_MAXDEPTH,
- XFS_FSB_TO_B(mp, 1)) +
- (uint)XFS_FSB_TO_B(mp,
- XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
- xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)),
- (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
- XFS_FSB_TO_B(mp, 1))));
-}
-
-/*
- * Clearing a bad agino number in an agi hash bucket.
- */
-STATIC uint
-xfs_calc_clear_agi_bucket_reservation(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
-}
-
-/*
- * Clearing the quotaflags in the superblock.
- * the super block for changing quota flags: sector size
- */
-STATIC uint
-xfs_calc_qm_sbchange_reservation(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
-}
-
-/*
- * Adjusting quota limits.
- * the xfs_disk_dquot_t: sizeof(struct xfs_disk_dquot)
- */
-STATIC uint
-xfs_calc_qm_setqlim_reservation(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(1, sizeof(struct xfs_disk_dquot));
-}
-
-/*
- * Allocating quota on disk if needed.
- * the write transaction log space: XFS_WRITE_LOG_RES(mp)
- * the unit of quota allocation: one system block size
- */
-STATIC uint
-xfs_calc_qm_dqalloc_reservation(
- struct xfs_mount *mp)
-{
- return XFS_WRITE_LOG_RES(mp) +
- xfs_calc_buf_res(1,
- XFS_FSB_TO_B(mp, XFS_DQUOT_CLUSTER_SIZE_FSB) - 1);
-}
-
-/*
- * Turning off quotas.
- * the xfs_qoff_logitem_t: sizeof(struct xfs_qoff_logitem) * 2
- * the superblock for the quota flags: sector size
- */
-STATIC uint
-xfs_calc_qm_quotaoff_reservation(
- struct xfs_mount *mp)
-{
- return sizeof(struct xfs_qoff_logitem) * 2 +
- xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
-}
-
-/*
- * End of turning off quotas.
- * the xfs_qoff_logitem_t: sizeof(struct xfs_qoff_logitem) * 2
- */
-STATIC uint
-xfs_calc_qm_quotaoff_end_reservation(
- struct xfs_mount *mp)
-{
- return sizeof(struct xfs_qoff_logitem) * 2;
-}
-
-/*
- * Syncing the incore super block changes to disk.
- * the super block to reflect the changes: sector size
- */
-STATIC uint
-xfs_calc_sb_reservation(
- struct xfs_mount *mp)
-{
- return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
-}
-
-/*
* Initialize the precomputed transaction reservation values
* in the mount structure.
*/
@@ -679,36 +56,7 @@ void
xfs_trans_init(
struct xfs_mount *mp)
{
- struct xfs_trans_reservations *resp = &mp->m_reservations;
-
- resp->tr_write = xfs_calc_write_reservation(mp);
- resp->tr_itruncate = xfs_calc_itruncate_reservation(mp);
- resp->tr_rename = xfs_calc_rename_reservation(mp);
- resp->tr_link = xfs_calc_link_reservation(mp);
- resp->tr_remove = xfs_calc_remove_reservation(mp);
- resp->tr_symlink = xfs_calc_symlink_reservation(mp);
- resp->tr_create = xfs_calc_create_reservation(mp);
- resp->tr_mkdir = xfs_calc_mkdir_reservation(mp);
- resp->tr_ifree = xfs_calc_ifree_reservation(mp);
- resp->tr_ichange = xfs_calc_ichange_reservation(mp);
- resp->tr_growdata = xfs_calc_growdata_reservation(mp);
- resp->tr_swrite = xfs_calc_swrite_reservation(mp);
- resp->tr_writeid = xfs_calc_writeid_reservation(mp);
- resp->tr_addafork = xfs_calc_addafork_reservation(mp);
- resp->tr_attrinval = xfs_calc_attrinval_reservation(mp);
- resp->tr_attrsetm = xfs_calc_attrsetm_reservation(mp);
- resp->tr_attrsetrt = xfs_calc_attrsetrt_reservation(mp);
- resp->tr_attrrm = xfs_calc_attrrm_reservation(mp);
- resp->tr_clearagi = xfs_calc_clear_agi_bucket_reservation(mp);
- resp->tr_growrtalloc = xfs_calc_growrtalloc_reservation(mp);
- resp->tr_growrtzero = xfs_calc_growrtzero_reservation(mp);
- resp->tr_growrtfree = xfs_calc_growrtfree_reservation(mp);
- resp->tr_qm_sbchange = xfs_calc_qm_sbchange_reservation(mp);
- resp->tr_qm_setqlim = xfs_calc_qm_setqlim_reservation(mp);
- resp->tr_qm_dqalloc = xfs_calc_qm_dqalloc_reservation(mp);
- resp->tr_qm_quotaoff = xfs_calc_qm_quotaoff_reservation(mp);
- resp->tr_qm_equotaoff = xfs_calc_qm_quotaoff_end_reservation(mp);
- resp->tr_sb = xfs_calc_sb_reservation(mp);
+ xfs_trans_resv_calc(mp, &mp->m_reservations);
}
/*
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 2d6b5e8..21cafaa 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -20,6 +20,8 @@
struct xfs_log_item;
+#include "xfs_trans_resv.h"
+
/*
* This is the structure written in the log at the head of
* every transaction. It identifies the type and id of the
@@ -210,118 +212,6 @@ struct xfs_log_item_desc {
#define XFS_TRANS_SB_REXTSLOG 0x00002000
/*
- * structure for maintaining pre-calculated transaction reservations.
- */
-struct xfs_trans_reservations {
- uint tr_write; /* extent alloc trans */
- uint tr_itruncate; /* truncate trans */
- uint tr_rename; /* rename trans */
- uint tr_link; /* link trans */
- uint tr_remove; /* unlink trans */
- uint tr_symlink; /* symlink trans */
- uint tr_create; /* create trans */
- uint tr_mkdir; /* mkdir trans */
- uint tr_ifree; /* inode free trans */
- uint tr_ichange; /* inode update trans */
- uint tr_growdata; /* fs data section grow trans */
- uint tr_swrite; /* sync write inode trans */
- uint tr_addafork; /* cvt inode to attributed trans */
- uint tr_writeid; /* write setuid/setgid file */
- uint tr_attrinval; /* attr fork buffer invalidation */
- uint tr_attrsetm; /* set/create an attribute at mount time */
- uint tr_attrsetrt; /* set/create an attribute at runtime */
- uint tr_attrrm; /* remove an attribute */
- uint tr_clearagi; /* clear bad agi unlinked ino bucket */
- uint tr_growrtalloc; /* grow realtime allocations */
- uint tr_growrtzero; /* grow realtime zeroing */
- uint tr_growrtfree; /* grow realtime freeing */
- uint tr_qm_sbchange; /* change quota flags */
- uint tr_qm_setqlim; /* adjust quota limits */
- uint tr_qm_dqalloc; /* allocate quota on disk */
- uint tr_qm_quotaoff; /* turn quota off */
- uint tr_qm_equotaoff;/* end of turn quota off */
- uint tr_sb; /* modify superblock */
-};
-
-/*
- * Per-extent log reservation for the allocation btree changes
- * involved in freeing or allocating an extent.
- * 2 trees * (2 blocks/level * max depth - 1) * block size
- */
-#define XFS_ALLOCFREE_LOG_RES(mp,nx) \
- ((nx) * (2 * XFS_FSB_TO_B((mp), 2 * XFS_AG_MAXLEVELS(mp) - 1)))
-#define XFS_ALLOCFREE_LOG_COUNT(mp,nx) \
- ((nx) * (2 * (2 * XFS_AG_MAXLEVELS(mp) - 1)))
-
-/*
- * Per-directory log reservation for any directory change.
- * dir blocks: (1 btree block per level + data block + free block) * dblock size
- * bmap btree: (levels + 2) * max depth * block size
- * v2 directory blocks can be fragmented below the dirblksize down to the fsb
- * size, so account for that in the DAENTER macros.
- */
-#define XFS_DIROP_LOG_RES(mp) \
- (XFS_FSB_TO_B(mp, XFS_DAENTER_BLOCKS(mp, XFS_DATA_FORK)) + \
- (XFS_FSB_TO_B(mp, XFS_DAENTER_BMAPS(mp, XFS_DATA_FORK) + 1)))
-#define XFS_DIROP_LOG_COUNT(mp) \
- (XFS_DAENTER_BLOCKS(mp, XFS_DATA_FORK) + \
- XFS_DAENTER_BMAPS(mp, XFS_DATA_FORK) + 1)
-
-
-#define XFS_WRITE_LOG_RES(mp) ((mp)->m_reservations.tr_write)
-#define XFS_ITRUNCATE_LOG_RES(mp) ((mp)->m_reservations.tr_itruncate)
-#define XFS_RENAME_LOG_RES(mp) ((mp)->m_reservations.tr_rename)
-#define XFS_LINK_LOG_RES(mp) ((mp)->m_reservations.tr_link)
-#define XFS_REMOVE_LOG_RES(mp) ((mp)->m_reservations.tr_remove)
-#define XFS_SYMLINK_LOG_RES(mp) ((mp)->m_reservations.tr_symlink)
-#define XFS_CREATE_LOG_RES(mp) ((mp)->m_reservations.tr_create)
-#define XFS_MKDIR_LOG_RES(mp) ((mp)->m_reservations.tr_mkdir)
-#define XFS_IFREE_LOG_RES(mp) ((mp)->m_reservations.tr_ifree)
-#define XFS_ICHANGE_LOG_RES(mp) ((mp)->m_reservations.tr_ichange)
-#define XFS_GROWDATA_LOG_RES(mp) ((mp)->m_reservations.tr_growdata)
-#define XFS_GROWRTALLOC_LOG_RES(mp) ((mp)->m_reservations.tr_growrtalloc)
-#define XFS_GROWRTZERO_LOG_RES(mp) ((mp)->m_reservations.tr_growrtzero)
-#define XFS_GROWRTFREE_LOG_RES(mp) ((mp)->m_reservations.tr_growrtfree)
-#define XFS_SWRITE_LOG_RES(mp) ((mp)->m_reservations.tr_swrite)
-/*
- * Logging the inode timestamps on an fsync -- same as SWRITE
- * as long as SWRITE logs the entire inode core
- */
-#define XFS_FSYNC_TS_LOG_RES(mp) ((mp)->m_reservations.tr_swrite)
-#define XFS_WRITEID_LOG_RES(mp) ((mp)->m_reservations.tr_swrite)
-#define XFS_ADDAFORK_LOG_RES(mp) ((mp)->m_reservations.tr_addafork)
-#define XFS_ATTRINVAL_LOG_RES(mp) ((mp)->m_reservations.tr_attrinval)
-#define XFS_ATTRSETM_LOG_RES(mp) ((mp)->m_reservations.tr_attrsetm)
-#define XFS_ATTRSETRT_LOG_RES(mp) ((mp)->m_reservations.tr_attrsetrt)
-#define XFS_ATTRRM_LOG_RES(mp) ((mp)->m_reservations.tr_attrrm)
-#define XFS_CLEAR_AGI_BUCKET_LOG_RES(mp) ((mp)->m_reservations.tr_clearagi)
-#define XFS_QM_SBCHANGE_LOG_RES(mp) ((mp)->m_reservations.tr_qm_sbchange)
-#define XFS_QM_SETQLIM_LOG_RES(mp) ((mp)->m_reservations.tr_qm_setqlim)
-#define XFS_QM_DQALLOC_LOG_RES(mp) ((mp)->m_reservations.tr_qm_dqalloc)
-#define XFS_QM_QUOTAOFF_LOG_RES(mp) ((mp)->m_reservations.tr_qm_quotaoff)
-#define XFS_QM_QUOTAOFF_END_LOG_RES(mp) ((mp)->m_reservations.tr_qm_equotaoff)
-#define XFS_SB_LOG_RES(mp) ((mp)->m_reservations.tr_sb)
-
-/*
- * Various log count values.
- */
-#define XFS_DEFAULT_LOG_COUNT 1
-#define XFS_DEFAULT_PERM_LOG_COUNT 2
-#define XFS_ITRUNCATE_LOG_COUNT 2
-#define XFS_INACTIVE_LOG_COUNT 2
-#define XFS_CREATE_LOG_COUNT 2
-#define XFS_MKDIR_LOG_COUNT 3
-#define XFS_SYMLINK_LOG_COUNT 3
-#define XFS_REMOVE_LOG_COUNT 2
-#define XFS_LINK_LOG_COUNT 2
-#define XFS_RENAME_LOG_COUNT 2
-#define XFS_WRITE_LOG_COUNT 2
-#define XFS_ADDAFORK_LOG_COUNT 2
-#define XFS_ATTRINVAL_LOG_COUNT 1
-#define XFS_ATTRSET_LOG_COUNT 3
-#define XFS_ATTRRM_LOG_COUNT 3
-
-/*
* Here we centralize the specification of XFS meta-data buffer
* reference count values. This determine how hard the buffer
* cache tries to hold onto the buffer.
diff --git a/fs/xfs/xfs_trans_resv.c b/fs/xfs/xfs_trans_resv.c
new file mode 100644
index 0000000..91c6f42
--- /dev/null
+++ b/fs/xfs/xfs_trans_resv.c
@@ -0,0 +1,705 @@
+/*
+ * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc.
+ * Copyright (C) 2010 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_log.h"
+#include "xfs_trans_resv.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_error.h"
+#include "xfs_da_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_alloc.h"
+#include "xfs_extent_busy.h"
+#include "xfs_bmap.h"
+#include "xfs_quota.h"
+#include "xfs_qm.h"
+#include "xfs_trans_priv.h"
+#include "xfs_trans_space.h"
+#include "xfs_inode_item.h"
+#include "xfs_log_priv.h"
+#include "xfs_buf_item.h"
+#include "xfs_trace.h"
+
+/*
+ * A buffer has a format structure overhead in the log in addition
+ * to the data, so we need to take this into account when reserving
+ * space in a transaction for a buffer. Round the space required up
+ * to a multiple of 128 bytes so that we don't change the historical
+ * reservation that has been used for this overhead.
+ */
+STATIC uint
+xfs_buf_log_overhead(void)
+{
+ return round_up(sizeof(struct xlog_op_header) +
+ sizeof(struct xfs_buf_log_format), 128);
+}
+
+/*
+ * Calculate the transaction log reservation per item in bytes.
+ *
+ * The nbufs argument is used to indicate the number of items that
+ * will be changed in a transaction. size is used to tell how many
+ * bytes should be reserved per item.
+ */
+STATIC uint
+xfs_calc_buf_res(
+ uint nbufs,
+ uint size)
+{
+ return nbufs * (size + xfs_buf_log_overhead());
+}
+
+/*
+ * Various log reservation values.
+ *
+ * These are based on the size of the file system block because that is what
+ * most transactions manipulate. Each adds in an additional 128 bytes per
+ * item logged to try to account for the overhead of the transaction mechanism.
+ *
+ * Note: Most of the reservations underestimate the number of allocation
+ * groups into which they could free extents in the xfs_bmap_finish() call.
+ * This is because the number in the worst case is quite high and quite
+ * unusual. In order to fix this we need to change xfs_bmap_finish() to free
+ * extents in only a single AG at a time. This will require changes to the
+ * EFI code as well, however, so that the EFI for the extents not freed is
+ * logged again in each transaction. See SGI PV #261917.
+ *
+ * Reservation functions here avoid a huge stack in xfs_trans_init due to
+ * register overflow from temporaries in the calculations.
+ */
+
+
+/*
+ * In a write transaction we can allocate a maximum of 2
+ * extents. This gives:
+ * the inode getting the new extents: inode size
+ * the inode's bmap btree: max depth * block size
+ * the agfs of the ags from which the extents are allocated: 2 * sector
+ * the superblock free block counter: sector size
+ * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ * And the bmap_finish transaction can free bmap blocks in a join:
+ * the agfs of the ags containing the blocks: 2 * sector size
+ * the agfls of the ags containing the blocks: 2 * sector size
+ * the super block free block counter: sector size
+ * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ */
+STATIC uint
+xfs_calc_write_reservation(
+ struct xfs_mount *mp)
+{
+ return XFS_DQUOT_LOGRES(mp) +
+ MAX((xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
+ XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+ XFS_FSB_TO_B(mp, 1))),
+ (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+ XFS_FSB_TO_B(mp, 1))));
+}
+
+/*
+ * In truncating a file we free up to two extents at once. We can modify:
+ * the inode being truncated: inode size
+ * the inode's bmap btree: (max depth + 1) * block size
+ * And the bmap_finish transaction can free the blocks and bmap blocks:
+ * the agf for each of the ags: 4 * sector size
+ * the agfl for each of the ags: 4 * sector size
+ * the super block to reflect the freed blocks: sector size
+ * worst case split in allocation btrees per extent assuming 4 extents:
+ * 4 exts * 2 trees * (2 * max depth - 1) * block size
+ * the inode btree: max depth * blocksize
+ * the allocation btrees: 2 trees * (max depth - 1) * block size
+ */
+STATIC uint
+xfs_calc_itruncate_reservation(
+ struct xfs_mount *mp)
+{
+ return XFS_DQUOT_LOGRES(mp) +
+ MAX((xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1,
+ XFS_FSB_TO_B(mp, 1))),
+ (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 4),
+ XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(5, 0) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(2 + XFS_IALLOC_BLOCKS(mp) +
+ mp->m_in_maxlevels, 0)));
+}
+
+/*
+ * In renaming a file we can modify:
+ * the four inodes involved: 4 * inode size
+ * the two directory btrees: 2 * (max depth + v2) * dir block size
+ * the two directory bmap btrees: 2 * max depth * block size
+ * And the bmap_finish transaction can free dir and bmap blocks (two sets
+ * of bmap blocks) giving:
+ * the agf for the ags in which the blocks live: 3 * sector size
+ * the agfl for the ags in which the blocks live: 3 * sector size
+ * the superblock for the free block count: sector size
+ * the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size
+ */
+STATIC uint
+xfs_calc_rename_reservation(
+ struct xfs_mount *mp)
+{
+ return XFS_DQUOT_LOGRES(mp) +
+ MAX((xfs_calc_buf_res(4, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
+ XFS_FSB_TO_B(mp, 1))),
+ (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 3),
+ XFS_FSB_TO_B(mp, 1))));
+}
+
+/*
+ * For creating a link to an inode:
+ * the parent directory inode: inode size
+ * the linked inode: inode size
+ * the directory btree could split: (max depth + v2) * dir block size
+ * the directory bmap btree could join or split: (max depth + v2) * blocksize
+ * And the bmap_finish transaction can free some bmap blocks giving:
+ * the agf for the ag in which the blocks live: sector size
+ * the agfl for the ag in which the blocks live: sector size
+ * the superblock for the free block count: sector size
+ * the allocation btrees: 2 trees * (2 * max depth - 1) * block size
+ */
+STATIC uint
+xfs_calc_link_reservation(
+ struct xfs_mount *mp)
+{
+ return XFS_DQUOT_LOGRES(mp) +
+ MAX((xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
+ XFS_FSB_TO_B(mp, 1))),
+ (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ XFS_FSB_TO_B(mp, 1))));
+}
+
+/*
+ * For removing a directory entry we can modify:
+ * the parent directory inode: inode size
+ * the removed inode: inode size
+ * the directory btree could join: (max depth + v2) * dir block size
+ * the directory bmap btree could join or split: (max depth + v2) * blocksize
+ * And the bmap_finish transaction can free the dir and bmap blocks giving:
+ * the agf for the ag in which the blocks live: 2 * sector size
+ * the agfl for the ag in which the blocks live: 2 * sector size
+ * the superblock for the free block count: sector size
+ * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ */
+STATIC uint
+xfs_calc_remove_reservation(
+ struct xfs_mount *mp)
+{
+ return XFS_DQUOT_LOGRES(mp) +
+ MAX((xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
+ XFS_FSB_TO_B(mp, 1))),
+ (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+ XFS_FSB_TO_B(mp, 1))));
+}
+
+/*
+ * For create, break it into the two cases that the transaction
+ * covers. We start with the modify case - allocation done by modification
+ * of the state of existing inodes - and the allocation case.
+ */
+
+/*
+ * For create we can modify:
+ * the parent directory inode: inode size
+ * the new inode: inode size
+ * the inode btree entry: block size
+ * the superblock for the nlink flag: sector size
+ * the directory btree: (max depth + v2) * dir block size
+ * the directory inode's bmap btree: (max depth + v2) * block size
+ */
+STATIC uint
+xfs_calc_create_resv_modify(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+ (uint)XFS_FSB_TO_B(mp, 1) +
+ xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1));
+}
+
+/*
+ * For create we can allocate some inodes giving:
+ * the agi and agf of the ag getting the new inodes: 2 * sectorsize
+ * the superblock for the nlink flag: sector size
+ * the inode blocks allocated: XFS_IALLOC_BLOCKS * blocksize
+ * the inode btree: max depth * blocksize
+ * the allocation btrees: 2 trees * (max depth - 1) * block size
+ */
+STATIC uint
+xfs_calc_create_resv_alloc(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
+ mp->m_sb.sb_sectsize +
+ xfs_calc_buf_res(XFS_IALLOC_BLOCKS(mp), XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ XFS_FSB_TO_B(mp, 1));
+}
+
+STATIC uint
+__xfs_calc_create_reservation(
+ struct xfs_mount *mp)
+{
+ return XFS_DQUOT_LOGRES(mp) +
+ MAX(xfs_calc_create_resv_alloc(mp),
+ xfs_calc_create_resv_modify(mp));
+}
+
+/*
+ * For icreate we can allocate some inodes giving:
+ * the agi and agf of the ag getting the new inodes: 2 * sectorsize
+ * the superblock for the nlink flag: sector size
+ * the inode btree: max depth * blocksize
+ * the allocation btrees: 2 trees * (max depth - 1) * block size
+ */
+STATIC uint
+xfs_calc_icreate_resv_alloc(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
+ mp->m_sb.sb_sectsize +
+ xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ XFS_FSB_TO_B(mp, 1));
+}
+
+STATIC uint
+xfs_calc_icreate_reservation(xfs_mount_t *mp)
+{
+ return XFS_DQUOT_LOGRES(mp) +
+ MAX(xfs_calc_icreate_resv_alloc(mp),
+ xfs_calc_create_resv_modify(mp));
+}
+
+STATIC uint
+xfs_calc_create_reservation(
+ struct xfs_mount *mp)
+{
+ if (xfs_sb_version_hascrc(&mp->m_sb))
+ return xfs_calc_icreate_reservation(mp);
+ return __xfs_calc_create_reservation(mp);
+
+}
+
+/*
+ * Making a new directory is the same as creating a new file.
+ */
+STATIC uint
+xfs_calc_mkdir_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_create_reservation(mp);
+}
+
+
+/*
+ * Making a new symlink is the same as creating a new file, but
+ * with the added blocks for remote symlink data which can be up to 1kB in
+ * length (MAXPATHLEN).
+ */
+STATIC uint
+xfs_calc_symlink_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_create_reservation(mp) +
+ xfs_calc_buf_res(1, MAXPATHLEN);
+}
+
+/*
+ * In freeing an inode we can modify:
+ * the inode being freed: inode size
+ * the super block free inode counter: sector size
+ * the agi hash list and counters: sector size
+ * the inode btree entry: block size
+ * the on disk inode before ours in the agi hash list: inode cluster size
+ * the inode btree: max depth * blocksize
+ * the allocation btrees: 2 trees * (max depth - 1) * block size
+ */
+STATIC uint
+xfs_calc_ifree_reservation(
+ struct xfs_mount *mp)
+{
+ return XFS_DQUOT_LOGRES(mp) +
+ xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, 1)) +
+ MAX((__uint16_t)XFS_FSB_TO_B(mp, 1),
+ XFS_INODE_CLUSTER_SIZE(mp)) +
+ xfs_calc_buf_res(1, 0) +
+ xfs_calc_buf_res(2 + XFS_IALLOC_BLOCKS(mp) +
+ mp->m_in_maxlevels, 0) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ XFS_FSB_TO_B(mp, 1));
+}
+
+/*
+ * When only changing the inode we log the inode and possibly the superblock.
+ * We also add a bit of slop for the transaction stuff.
+ */
+STATIC uint
+xfs_calc_ichange_reservation(
+ struct xfs_mount *mp)
+{
+ return XFS_DQUOT_LOGRES(mp) +
+ mp->m_sb.sb_inodesize +
+ mp->m_sb.sb_sectsize +
+ 512;
+
+}
+
+/*
+ * Growing the data section of the filesystem.
+ * superblock
+ * agi and agf
+ * allocation btrees
+ */
+STATIC uint
+xfs_calc_growdata_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ XFS_FSB_TO_B(mp, 1));
+}
+
+/*
+ * Growing the rt section of the filesystem.
+ * In the first set of transactions (ALLOC) we allocate space to the
+ * bitmap or summary files.
+ * superblock: sector size
+ * agf of the ag from which the extent is allocated: sector size
+ * bmap btree for bitmap/summary inode: max depth * blocksize
+ * bitmap/summary inode: inode size
+ * allocation btrees for 1 block alloc: 2 * (2 * maxdepth - 1) * blocksize
+ */
+STATIC uint
+xfs_calc_growrtalloc_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
+ XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ XFS_FSB_TO_B(mp, 1));
+}
+
+/*
+ * Growing the rt section of the filesystem.
+ * In the second set of transactions (ZERO) we zero the new metadata blocks.
+ * one bitmap/summary block: blocksize
+ */
+STATIC uint
+xfs_calc_growrtzero_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(1, mp->m_sb.sb_blocksize);
+}
+
+/*
+ * Growing the rt section of the filesystem.
+ * In the third set of transactions (FREE) we update metadata without
+ * allocating any new blocks.
+ * superblock: sector size
+ * bitmap inode: inode size
+ * summary inode: inode size
+ * one bitmap block: blocksize
+ * summary blocks: new summary size
+ */
+STATIC uint
+xfs_calc_growrtfree_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(2, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(1, mp->m_sb.sb_blocksize) +
+ xfs_calc_buf_res(1, mp->m_rsumsize);
+}
+
+/*
+ * Logging the inode modification timestamp on a synchronous write.
+ * inode
+ */
+STATIC uint
+xfs_calc_swrite_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(1, mp->m_sb.sb_inodesize);
+}
+
+/*
+ * Logging the inode mode bits when writing a setuid/setgid file
+ * inode
+ */
+STATIC uint
+xfs_calc_writeid_reservation(xfs_mount_t *mp)
+{
+ return xfs_calc_buf_res(1, mp->m_sb.sb_inodesize);
+}
+
+/*
+ * Converting the inode from non-attributed to attributed.
+ * the inode being converted: inode size
+ * agf block and superblock (for block allocation)
+ * the new block (directory sized)
+ * bmap blocks for the new directory block
+ * allocation btrees
+ */
+STATIC uint
+xfs_calc_addafork_reservation(
+ struct xfs_mount *mp)
+{
+ return XFS_DQUOT_LOGRES(mp) +
+ xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(1, mp->m_dirblksize) +
+ xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1,
+ XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ XFS_FSB_TO_B(mp, 1));
+}
+
+/*
+ * Removing the attribute fork of a file
+ * the inode being truncated: inode size
+ * the inode's bmap btree: max depth * block size
+ * And the bmap_finish transaction can free the blocks and bmap blocks:
+ * the agf for each of the ags: 4 * sector size
+ * the agfl for each of the ags: 4 * sector size
+ * the super block to reflect the freed blocks: sector size
+ * worst case split in allocation btrees per extent assuming 4 extents:
+ * 4 exts * 2 trees * (2 * max depth - 1) * block size
+ */
+STATIC uint
+xfs_calc_attrinval_reservation(
+ struct xfs_mount *mp)
+{
+ return MAX((xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
+ XFS_FSB_TO_B(mp, 1))),
+ (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 4),
+ XFS_FSB_TO_B(mp, 1))));
+}
+
+/*
+ * Setting an attribute at mount time.
+ * the inode getting the attribute
+ * the superblock for allocations
+ * the agfs extents are allocated from
+ * the attribute btree * max depth
+ * the inode allocation btree
+ * Since attribute transaction space is dependent on the size of the attribute,
+ * the calculation is done partially at mount time and partially at runtime (see
+ * below).
+ */
+STATIC uint
+xfs_calc_attrsetm_reservation(
+ struct xfs_mount *mp)
+{
+ return XFS_DQUOT_LOGRES(mp) +
+ xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(XFS_DA_NODE_MAXDEPTH, XFS_FSB_TO_B(mp, 1));
+}
+
+/*
+ * Setting an attribute at runtime, transaction space unit per block.
+ * the superblock for allocations: sector size
+ * the inode bmap btree could join or split: max depth * block size
+ * Since the runtime attribute transaction space is dependent on the total
+ * blocks needed for the 1st bmap, here we calculate out the space unit for
+ * one block so that the caller could figure out the total space according
+ * to the attribute extent length in blocks by: ext * XFS_ATTRSETRT_LOG_RES(mp).
+ */
+STATIC uint
+xfs_calc_attrsetrt_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
+ XFS_FSB_TO_B(mp, 1));
+}
+
+/*
+ * Removing an attribute.
+ * the inode: inode size
+ * the attribute btree could join: max depth * block size
+ * the inode bmap btree could join or split: max depth * block size
+ * And the bmap_finish transaction can free the attr blocks freed giving:
+ * the agf for the ag in which the blocks live: 2 * sector size
+ * the agfl for the ag in which the blocks live: 2 * sector size
+ * the superblock for the free block count: sector size
+ * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ */
+STATIC uint
+xfs_calc_attrrm_reservation(
+ struct xfs_mount *mp)
+{
+ return XFS_DQUOT_LOGRES(mp) +
+ MAX((xfs_calc_buf_res(1, mp->m_sb.sb_inodesize) +
+ xfs_calc_buf_res(XFS_DA_NODE_MAXDEPTH,
+ XFS_FSB_TO_B(mp, 1)) +
+ (uint)XFS_FSB_TO_B(mp,
+ XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
+ xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)),
+ (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
+ xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+ XFS_FSB_TO_B(mp, 1))));
+}
+
+/*
+ * Clearing a bad agino number in an agi hash bucket.
+ */
+STATIC uint
+xfs_calc_clear_agi_bucket_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
+}
+
+/*
+ * Clearing the quotaflags in the superblock.
+ * the super block for changing quota flags: sector size
+ */
+STATIC uint
+xfs_calc_qm_sbchange_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
+}
+
+/*
+ * Adjusting quota limits.
+ * the xfs_disk_dquot_t: sizeof(struct xfs_disk_dquot)
+ */
+STATIC uint
+xfs_calc_qm_setqlim_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(1, sizeof(struct xfs_disk_dquot));
+}
+
+/*
+ * Allocating quota on disk if needed.
+ * the write transaction log space: XFS_WRITE_LOG_RES(mp)
+ * the unit of quota allocation: one system block size
+ */
+STATIC uint
+xfs_calc_qm_dqalloc_reservation(
+ struct xfs_mount *mp)
+{
+ return XFS_WRITE_LOG_RES(mp) +
+ xfs_calc_buf_res(1,
+ XFS_FSB_TO_B(mp, XFS_DQUOT_CLUSTER_SIZE_FSB) - 1);
+}
+
+/*
+ * Turning off quotas.
+ * the xfs_qoff_logitem_t: sizeof(struct xfs_qoff_logitem) * 2
+ * the superblock for the quota flags: sector size
+ */
+STATIC uint
+xfs_calc_qm_quotaoff_reservation(
+ struct xfs_mount *mp)
+{
+ return sizeof(struct xfs_qoff_logitem) * 2 +
+ xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
+}
+
+/*
+ * End of turning off quotas.
+ * the xfs_qoff_logitem_t: sizeof(struct xfs_qoff_logitem) * 2
+ */
+STATIC uint
+xfs_calc_qm_quotaoff_end_reservation(
+ struct xfs_mount *mp)
+{
+ return sizeof(struct xfs_qoff_logitem) * 2;
+}
+
+/*
+ * Syncing the incore super block changes to disk.
+ * the super block to reflect the changes: sector size
+ */
+STATIC uint
+xfs_calc_sb_reservation(
+ struct xfs_mount *mp)
+{
+ return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
+}
+
+void
+xfs_trans_resv_calc(
+ struct xfs_mount *mp,
+ struct xfs_trans_resv *resp)
+{
+ resp->tr_write = xfs_calc_write_reservation(mp);
+ resp->tr_itruncate = xfs_calc_itruncate_reservation(mp);
+ resp->tr_rename = xfs_calc_rename_reservation(mp);
+ resp->tr_link = xfs_calc_link_reservation(mp);
+ resp->tr_remove = xfs_calc_remove_reservation(mp);
+ resp->tr_symlink = xfs_calc_symlink_reservation(mp);
+ resp->tr_create = xfs_calc_create_reservation(mp);
+ resp->tr_mkdir = xfs_calc_mkdir_reservation(mp);
+ resp->tr_ifree = xfs_calc_ifree_reservation(mp);
+ resp->tr_ichange = xfs_calc_ichange_reservation(mp);
+ resp->tr_growdata = xfs_calc_growdata_reservation(mp);
+ resp->tr_swrite = xfs_calc_swrite_reservation(mp);
+ resp->tr_writeid = xfs_calc_writeid_reservation(mp);
+ resp->tr_addafork = xfs_calc_addafork_reservation(mp);
+ resp->tr_attrinval = xfs_calc_attrinval_reservation(mp);
+ resp->tr_attrsetm = xfs_calc_attrsetm_reservation(mp);
+ resp->tr_attrsetrt = xfs_calc_attrsetrt_reservation(mp);
+ resp->tr_attrrm = xfs_calc_attrrm_reservation(mp);
+ resp->tr_clearagi = xfs_calc_clear_agi_bucket_reservation(mp);
+ resp->tr_growrtalloc = xfs_calc_growrtalloc_reservation(mp);
+ resp->tr_growrtzero = xfs_calc_growrtzero_reservation(mp);
+ resp->tr_growrtfree = xfs_calc_growrtfree_reservation(mp);
+ resp->tr_qm_sbchange = xfs_calc_qm_sbchange_reservation(mp);
+ resp->tr_qm_setqlim = xfs_calc_qm_setqlim_reservation(mp);
+ resp->tr_qm_dqalloc = xfs_calc_qm_dqalloc_reservation(mp);
+ resp->tr_qm_quotaoff = xfs_calc_qm_quotaoff_reservation(mp);
+ resp->tr_qm_equotaoff = xfs_calc_qm_quotaoff_end_reservation(mp);
+ resp->tr_sb = xfs_calc_sb_reservation(mp);
+}
diff --git a/fs/xfs/xfs_trans_resv.h b/fs/xfs/xfs_trans_resv.h
new file mode 100644
index 0000000..0936f93
--- /dev/null
+++ b/fs/xfs/xfs_trans_resv.h
@@ -0,0 +1,135 @@
+/*
+ * Copyright (c) 2000-2002,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_TRANS_RESV_H__
+#define __XFS_TRANS_RESV_H__
+
+/*
+ * structure for maintaining pre-calculated transaction reservations.
+ */
+struct xfs_trans_resv {
+ uint tr_write; /* extent alloc trans */
+ uint tr_itruncate; /* truncate trans */
+ uint tr_rename; /* rename trans */
+ uint tr_link; /* link trans */
+ uint tr_remove; /* unlink trans */
+ uint tr_symlink; /* symlink trans */
+ uint tr_create; /* create trans */
+ uint tr_mkdir; /* mkdir trans */
+ uint tr_ifree; /* inode free trans */
+ uint tr_ichange; /* inode update trans */
+ uint tr_growdata; /* fs data section grow trans */
+ uint tr_swrite; /* sync write inode trans */
+ uint tr_addafork; /* cvt inode to attributed trans */
+ uint tr_writeid; /* write setuid/setgid file */
+ uint tr_attrinval; /* attr fork buffer invalidation */
+ uint tr_attrsetm; /* set/create an attribute at mount time */
+ uint tr_attrsetrt; /* set/create an attribute at runtime */
+ uint tr_attrrm; /* remove an attribute */
+ uint tr_clearagi; /* clear bad agi unlinked ino bucket */
+ uint tr_growrtalloc; /* grow realtime allocations */
+ uint tr_growrtzero; /* grow realtime zeroing */
+ uint tr_growrtfree; /* grow realtime freeing */
+ uint tr_qm_sbchange; /* change quota flags */
+ uint tr_qm_setqlim; /* adjust quota limits */
+ uint tr_qm_dqalloc; /* allocate quota on disk */
+ uint tr_qm_quotaoff; /* turn quota off */
+ uint tr_qm_equotaoff;/* end of turn quota off */
+ uint tr_sb; /* modify superblock */
+};
+
+/*
+ * Per-extent log reservation for the allocation btree changes
+ * involved in freeing or allocating an extent.
+ * 2 trees * (2 blocks/level * max depth - 1) * block size
+ */
+#define XFS_ALLOCFREE_LOG_RES(mp,nx) \
+ ((nx) * (2 * XFS_FSB_TO_B((mp), 2 * XFS_AG_MAXLEVELS(mp) - 1)))
+#define XFS_ALLOCFREE_LOG_COUNT(mp,nx) \
+ ((nx) * (2 * (2 * XFS_AG_MAXLEVELS(mp) - 1)))
+
+/*
+ * Per-directory log reservation for any directory change.
+ * dir blocks: (1 btree block per level + data block + free block) * dblock size
+ * bmap btree: (levels + 2) * max depth * block size
+ * v2 directory blocks can be fragmented below the dirblksize down to the fsb
+ * size, so account for that in the DAENTER macros.
+ */
+#define XFS_DIROP_LOG_RES(mp) \
+ (XFS_FSB_TO_B(mp, XFS_DAENTER_BLOCKS(mp, XFS_DATA_FORK)) + \
+ (XFS_FSB_TO_B(mp, XFS_DAENTER_BMAPS(mp, XFS_DATA_FORK) + 1)))
+#define XFS_DIROP_LOG_COUNT(mp) \
+ (XFS_DAENTER_BLOCKS(mp, XFS_DATA_FORK) + \
+ XFS_DAENTER_BMAPS(mp, XFS_DATA_FORK) + 1)
+
+
+#define XFS_WRITE_LOG_RES(mp) ((mp)->m_reservations.tr_write)
+#define XFS_ITRUNCATE_LOG_RES(mp) ((mp)->m_reservations.tr_itruncate)
+#define XFS_RENAME_LOG_RES(mp) ((mp)->m_reservations.tr_rename)
+#define XFS_LINK_LOG_RES(mp) ((mp)->m_reservations.tr_link)
+#define XFS_REMOVE_LOG_RES(mp) ((mp)->m_reservations.tr_remove)
+#define XFS_SYMLINK_LOG_RES(mp) ((mp)->m_reservations.tr_symlink)
+#define XFS_CREATE_LOG_RES(mp) ((mp)->m_reservations.tr_create)
+#define XFS_MKDIR_LOG_RES(mp) ((mp)->m_reservations.tr_mkdir)
+#define XFS_IFREE_LOG_RES(mp) ((mp)->m_reservations.tr_ifree)
+#define XFS_ICHANGE_LOG_RES(mp) ((mp)->m_reservations.tr_ichange)
+#define XFS_GROWDATA_LOG_RES(mp) ((mp)->m_reservations.tr_growdata)
+#define XFS_GROWRTALLOC_LOG_RES(mp) ((mp)->m_reservations.tr_growrtalloc)
+#define XFS_GROWRTZERO_LOG_RES(mp) ((mp)->m_reservations.tr_growrtzero)
+#define XFS_GROWRTFREE_LOG_RES(mp) ((mp)->m_reservations.tr_growrtfree)
+#define XFS_SWRITE_LOG_RES(mp) ((mp)->m_reservations.tr_swrite)
+/*
+ * Logging the inode timestamps on an fsync -- same as SWRITE
+ * as long as SWRITE logs the entire inode core
+ */
+#define XFS_FSYNC_TS_LOG_RES(mp) ((mp)->m_reservations.tr_swrite)
+#define XFS_WRITEID_LOG_RES(mp) ((mp)->m_reservations.tr_swrite)
+#define XFS_ADDAFORK_LOG_RES(mp) ((mp)->m_reservations.tr_addafork)
+#define XFS_ATTRINVAL_LOG_RES(mp) ((mp)->m_reservations.tr_attrinval)
+#define XFS_ATTRSETM_LOG_RES(mp) ((mp)->m_reservations.tr_attrsetm)
+#define XFS_ATTRSETRT_LOG_RES(mp) ((mp)->m_reservations.tr_attrsetrt)
+#define XFS_ATTRRM_LOG_RES(mp) ((mp)->m_reservations.tr_attrrm)
+#define XFS_CLEAR_AGI_BUCKET_LOG_RES(mp) ((mp)->m_reservations.tr_clearagi)
+#define XFS_QM_SBCHANGE_LOG_RES(mp) ((mp)->m_reservations.tr_qm_sbchange)
+#define XFS_QM_SETQLIM_LOG_RES(mp) ((mp)->m_reservations.tr_qm_setqlim)
+#define XFS_QM_DQALLOC_LOG_RES(mp) ((mp)->m_reservations.tr_qm_dqalloc)
+#define XFS_QM_QUOTAOFF_LOG_RES(mp) ((mp)->m_reservations.tr_qm_quotaoff)
+#define XFS_QM_QUOTAOFF_END_LOG_RES(mp) ((mp)->m_reservations.tr_qm_equotaoff)
+#define XFS_SB_LOG_RES(mp) ((mp)->m_reservations.tr_sb)
+
+/*
+ * Various log count values.
+ */
+#define XFS_DEFAULT_LOG_COUNT 1
+#define XFS_DEFAULT_PERM_LOG_COUNT 2
+#define XFS_ITRUNCATE_LOG_COUNT 2
+#define XFS_INACTIVE_LOG_COUNT 2
+#define XFS_CREATE_LOG_COUNT 2
+#define XFS_MKDIR_LOG_COUNT 3
+#define XFS_SYMLINK_LOG_COUNT 3
+#define XFS_REMOVE_LOG_COUNT 2
+#define XFS_LINK_LOG_COUNT 2
+#define XFS_RENAME_LOG_COUNT 2
+#define XFS_WRITE_LOG_COUNT 2
+#define XFS_ADDAFORK_LOG_COUNT 2
+#define XFS_ATTRINVAL_LOG_COUNT 1
+#define XFS_ATTRSET_LOG_COUNT 3
+#define XFS_ATTRRM_LOG_COUNT 3
+
+void xfs_trans_resv_calc(struct xfs_mount *mp, struct xfs_trans_resv *resp);
+
+#endif /* __XFS_TRANS_RESV_H__ */
--
1.7.10.4
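As a worked example of the per-extent allocation btree arithmetic behind
XFS_ALLOCFREE_LOG_COUNT and XFS_ALLOCFREE_LOG_RES in the new header, here
is a minimal userspace sketch. The struct sketch_mount type and the
sketch_* function names are hypothetical stand-ins, and the field values
are assumptions for illustration rather than anything taken from the patch.

```c
#include <assert.h>

/*
 * Userspace sketch of the XFS_ALLOCFREE_LOG_COUNT / XFS_ALLOCFREE_LOG_RES
 * arithmetic.  ag_maxlevels and blocklog are hypothetical stand-ins for
 * XFS_AG_MAXLEVELS(mp) and the filesystem block size shift.
 */
struct sketch_mount {
	unsigned int	ag_maxlevels;	/* max depth of the free space btrees */
	unsigned int	blocklog;	/* log2(filesystem block size) */
};

/* Blocks logged: nx extents * 2 trees * (2 blocks/level * depth - 1). */
static unsigned int
sketch_allocfree_log_count(const struct sketch_mount *mp, unsigned int nx)
{
	assert(mp->ag_maxlevels > 0);
	return nx * (2 * (2 * mp->ag_maxlevels - 1));
}

/* The same quantity in bytes: the block count shifted by the block size. */
static unsigned int
sketch_allocfree_log_res(const struct sketch_mount *mp, unsigned int nx)
{
	return sketch_allocfree_log_count(mp, nx) << mp->blocklog;
}
```

With an assumed maximum btree depth of 5, one extent gives
2 * (2 * 5 - 1) = 18 blocks, or 73728 bytes for 4096-byte blocks. The
xfs_calc_buf_res() callers in the patch pass the block count and the
block size separately, so this sketch covers only the two macros.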
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH 26/60] xfs: minor cleanups
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (24 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 25/60] xfs: split out transaction reservation code Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 27/60] xfs: fix issues that cause userspace warnings Dave Chinner
` (35 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
These come from syncing the shared userspace and kernel code. They are
small whitespace fixes and trivial cleanups.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_attr_leaf.c | 12 ------------
fs/xfs/xfs_attr_remote.c | 8 +-------
fs/xfs/xfs_bmap_btree.c | 4 +---
fs/xfs/xfs_inode.c | 4 ++--
4 files changed, 4 insertions(+), 24 deletions(-)
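The xfs_iextents_copy() hunk in this patch replaces
put_unaligned(cpu_to_be64(x), p) with put_unaligned_be64(x, p). What such
a store does can be sketched in portable userspace C as follows. The name
sketch_put_unaligned_be64() is a hypothetical stand-in for illustration;
this is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's put_unaligned_be64():
 * store val most-significant byte first, one byte at a time, so the
 * destination needs no particular alignment.
 */
static void
sketch_put_unaligned_be64(uint64_t val, void *p)
{
	unsigned char	*out = p;
	size_t		i;

	assert(p != NULL);
	for (i = 0; i < 8; i++)
		out[i] = (unsigned char)(val >> (56 - 8 * i));
}
```

Writing byte by byte sidesteps both the host byte order and the alignment
of the destination, which is the point of folding the conversion and the
unaligned store into one helper.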
diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index a7fdd0d..18aada8 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -79,16 +79,6 @@ STATIC int xfs_attr3_leaf_figure_balance(xfs_da_state_t *state,
int *number_usedbytes_in_blk1);
/*
- * Routines used for shrinking the Btree.
- */
-STATIC int xfs_attr3_node_inactive(xfs_trans_t **trans, xfs_inode_t *dp,
- struct xfs_buf *bp, int level);
-STATIC int xfs_attr3_leaf_inactive(xfs_trans_t **trans, xfs_inode_t *dp,
- struct xfs_buf *bp);
-STATIC int xfs_attr3_leaf_freextent(xfs_trans_t **trans, xfs_inode_t *dp,
- xfs_dablk_t blkno, int blkcnt);
-
-/*
* Utility routines.
*/
STATIC void xfs_attr3_leaf_moveents(struct xfs_attr_leafblock *src_leaf,
@@ -1307,7 +1297,6 @@ xfs_attr3_leaf_compact(
ichdr_dst->freemap[0].size = ichdr_dst->firstused -
ichdr_dst->freemap[0].base;
-
/* write the header back to initialise the underlying buffer */
xfs_attr3_leaf_hdr_to_disk(leaf_dst, ichdr_dst);
@@ -2712,4 +2701,3 @@ xfs_attr3_leaf_flipflags(
return error;
}
-
diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index 39a59ea..fed6d78 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -250,7 +250,7 @@ xfs_attr_rmtval_copyout(
int hdr_size = 0;
int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, XFS_LBSIZE(mp));
- byte_cnt = min_t(int, *valuelen, byte_cnt);
+ byte_cnt = min(*valuelen, byte_cnt);
if (xfs_sb_version_hascrc(&mp->m_sb)) {
if (!xfs_attr3_rmt_hdr_ok(mp, src, ino, *offset,
@@ -544,11 +544,6 @@ xfs_attr_rmtval_remove(
/*
* Roll through the "value", invalidating the attribute value's blocks.
- * Note that args->rmtblkcnt is the minimum number of data blocks we'll
- * see for a CRC enabled remote attribute. Each extent will have a
- * header, and so we may have more blocks than we realise here. If we
- * fail to map the blocks correctly, we'll have problems with the buffer
- * lookups.
*/
lblkno = args->rmtblkno;
blkcnt = args->rmtblkcnt;
@@ -629,4 +624,3 @@ xfs_attr_rmtval_remove(
}
return(0);
}
-
diff --git a/fs/xfs/xfs_bmap_btree.c b/fs/xfs/xfs_bmap_btree.c
index 0c61a22..843ff45 100644
--- a/fs/xfs/xfs_bmap_btree.c
+++ b/fs/xfs/xfs_bmap_btree.c
@@ -722,7 +722,7 @@ xfs_bmbt_key_diff(
cur->bc_rec.b.br_startoff;
}
-static int
+static bool
xfs_bmbt_verify(
struct xfs_buf *bp)
{
@@ -775,7 +775,6 @@ xfs_bmbt_verify(
return false;
return true;
-
}
static void
@@ -789,7 +788,6 @@ xfs_bmbt_read_verify(
bp->b_target->bt_mount, bp->b_addr);
xfs_buf_ioerror(bp, EFSCORRUPTED);
}
-
}
static void
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e724464..111ebf4 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1192,8 +1192,8 @@ xfs_iextents_copy(
}
/* Translate to on disk format */
- put_unaligned(cpu_to_be64(ep->l0), &dp->l0);
- put_unaligned(cpu_to_be64(ep->l1), &dp->l1);
+ put_unaligned_be64(ep->l0, &dp->l0);
+ put_unaligned_be64(ep->l1, &dp->l1);
dp++;
copied++;
}
--
1.7.10.4
* [PATCH 27/60] xfs: fix issues that cause userspace warnings
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (25 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 26/60] xfs: minor cleanups Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 28/60] xfs: consolidate xfs_rename.c Dave Chinner
` (34 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Some of the code shared with userspace causes compilation warnings
from checks that are turned off in the kernel build, such as
differences in variable signedness. Fix those issues.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_attr_remote.c | 8 ++++----
fs/xfs/xfs_bmap.c | 3 ---
fs/xfs/xfs_da_btree.c | 6 +++---
fs/xfs/xfs_dir2_node.c | 2 ++
fs/xfs/xfs_ialloc.c | 2 +-
fs/xfs/xfs_rtalloc.c | 4 ++--
6 files changed, 12 insertions(+), 13 deletions(-)
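To illustrate why mixed signedness is worth a warning, here is a small
userspace sketch of what happens when an int is compared against an
unsigned int. The function names are hypothetical and the code is not
taken from XFS; it only shows the C conversion rule involved.

```c
#include <assert.h>

/*
 * What "a < b" does when a is int and b is unsigned int: the usual
 * arithmetic conversions turn a into unsigned int first.  The cast is
 * written out here so the sketch itself compiles without a warning.
 */
static int
sketch_mixed_less(int a, unsigned int b)
{
	return (unsigned int)a < b;
}

/* Sign-aware comparison: a negative a is less than any unsigned b. */
static int
sketch_safe_less(int a, unsigned int b)
{
	return a < 0 || (unsigned int)a < b;
}

/* Show the two comparisons disagreeing for a = -1, b = 1. */
static void
sketch_show_difference(void)
{
	assert(!sketch_mixed_less(-1, 1u));	/* -1 was converted to UINT_MAX */
	assert(sketch_safe_less(-1, 1u));
}
```

Declaring both sides with the same signedness, as the hunks in this patch
do, avoids the question entirely.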
diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index fed6d78..b8ab302 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -238,7 +238,7 @@ xfs_attr_rmtval_copyout(
xfs_ino_t ino,
int *offset,
int *valuelen,
- char **dst)
+ __uint8_t **dst)
{
char *src = bp->b_addr;
xfs_daddr_t bno = bp->b_bn;
@@ -285,7 +285,7 @@ xfs_attr_rmtval_copyin(
xfs_ino_t ino,
int *offset,
int *valuelen,
- char **src)
+ __uint8_t **src)
{
char *dst = bp->b_addr;
xfs_daddr_t bno = bp->b_bn;
@@ -338,7 +338,7 @@ xfs_attr_rmtval_get(
struct xfs_mount *mp = args->dp->i_mount;
struct xfs_buf *bp;
xfs_dablk_t lblkno = args->rmtblkno;
- char *dst = args->value;
+ __uint8_t *dst = args->value;
int valuelen = args->valuelen;
int nmap;
int error;
@@ -402,7 +402,7 @@ xfs_attr_rmtval_set(
struct xfs_bmbt_irec map;
xfs_dablk_t lblkno;
xfs_fileoff_t lfileoff = 0;
- char *src = args->value;
+ __uint8_t *src = args->value;
int blkcnt;
int valuelen;
int nmap;
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 386758b..8f57042 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -4648,12 +4648,9 @@ __xfs_bmapi_allocate(
struct xfs_ifork *ifp = XFS_IFORK_PTR(bma->ip, whichfork);
int tmp_logflags = 0;
int error;
- int rt;
ASSERT(bma->length > 0);
- rt = (whichfork == XFS_DATA_FORK) && XFS_IS_REALTIME_INODE(bma->ip);
-
/*
* For the wasdelay case, we could also just allocate the stuff asked
* for in this bmap call but that wouldn't be as good.
diff --git a/fs/xfs/xfs_da_btree.c b/fs/xfs/xfs_da_btree.c
index 8bbd7ef..d4e59a4 100644
--- a/fs/xfs/xfs_da_btree.c
+++ b/fs/xfs/xfs_da_btree.c
@@ -399,7 +399,7 @@ xfs_da3_split(
struct xfs_da_intnode *node;
struct xfs_buf *bp;
int max;
- int action;
+ int action = 0;
int error;
int i;
@@ -2454,9 +2454,9 @@ static int
xfs_buf_map_from_irec(
struct xfs_mount *mp,
struct xfs_buf_map **mapp,
- unsigned int *nmaps,
+ int *nmaps,
struct xfs_bmbt_irec *irecs,
- unsigned int nirecs)
+ int nirecs)
{
struct xfs_buf_map *map;
int i;
diff --git a/fs/xfs/xfs_dir2_node.c b/fs/xfs/xfs_dir2_node.c
index b4bd9b6..18e287d 100644
--- a/fs/xfs/xfs_dir2_node.c
+++ b/fs/xfs/xfs_dir2_node.c
@@ -313,11 +313,13 @@ xfs_dir2_free_log_header(
struct xfs_trans *tp,
struct xfs_buf *bp)
{
+#ifdef DEBUG
xfs_dir2_free_t *free; /* freespace structure */
free = bp->b_addr;
ASSERT(free->hdr.magic == cpu_to_be32(XFS_DIR2_FREE_MAGIC) ||
free->hdr.magic == cpu_to_be32(XFS_DIR3_FREE_MAGIC));
+#endif
xfs_trans_log_buf(tp, bp, 0, xfs_dir3_free_hdr_size(tp->t_mountp) - 1);
}
diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index 319d9e4..96a53bd 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -1342,7 +1342,7 @@ xfs_imap(
xfs_agblock_t cluster_agbno; /* first block in inode cluster */
int error; /* error code */
int offset; /* index of inode in its buffer */
- int offset_agbno; /* blks from chunk start to inode */
+ xfs_agblock_t offset_agbno; /* blks from chunk start to inode */
ASSERT(ino != NULLFSINO);
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 7fed9c2..8623ba7 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -735,8 +735,8 @@ xfs_rtallocate_range(
{
xfs_rtblock_t end; /* end of the allocated extent */
int error; /* error value */
- xfs_rtblock_t postblock; /* first block allocated > end */
- xfs_rtblock_t preblock; /* first block allocated < start */
+ xfs_rtblock_t postblock = 0; /* first block allocated > end */
+ xfs_rtblock_t preblock = 0; /* first block allocated < start */
end = start + len - 1;
/*
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 95+ messages in thread* [PATCH 28/60] xfs: consolidate xfs_rename.c
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (26 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 27/60] xfs: fix issues that cause userspace warnings Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 29/60] xfs: consolidate xfs_utils.c Dave Chinner
` (33 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Move the rename code to xfs_inode_ops.c to continue consolidating
all the kernel xfs_inode operations in one place.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1 -
fs/xfs/xfs_inode_ops.c | 306 ++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_rename.c | 346 ------------------------------------------------
3 files changed, 306 insertions(+), 347 deletions(-)
delete mode 100644 fs/xfs/xfs_rename.c
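The xfs_sort_for_rename() helper moved by this patch orders up to four
inodes by inode number so that the locks are then acquired in that order.
A minimal userspace sketch of the same idea follows, assuming a
hypothetical struct sketch_inode that carries only the inode number; it
illustrates the ordering and is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for xfs_inode_t; only the inode number matters. */
struct sketch_inode {
	uint64_t	i_ino;
};

/*
 * Sort up to four inode pointers by ascending inode number with a
 * bubble sort, so that every caller sees the same acquisition order.
 * Duplicates (the same inode appearing twice) end up adjacent.
 */
static void
sketch_sort_for_lock(struct sketch_inode **tab, int n)
{
	struct sketch_inode	*tmp;
	int			i, j;

	assert(n >= 0 && n <= 4);
	for (i = 0; i < n; i++) {
		for (j = 1; j < n; j++) {
			if (tab[j]->i_ino < tab[j - 1]->i_ino) {
				tmp = tab[j];
				tab[j] = tab[j - 1];
				tab[j - 1] = tmp;
			}
		}
	}
}
```

Taking locks in one agreed order is a common way to keep two concurrent
operations on the same inodes from deadlocking, and with at most four
entries a bubble sort is adequate.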
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 3281a20..000738f 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -51,7 +51,6 @@ xfs-y += xfs_aops.o \
xfs_message.o \
xfs_mount.o \
xfs_mru_cache.o \
- xfs_rename.o \
xfs_super.o \
xfs_trans.o \
xfs_utils.o \
diff --git a/fs/xfs/xfs_inode_ops.c b/fs/xfs/xfs_inode_ops.c
index 0699afa..a699f81 100644
--- a/fs/xfs/xfs_inode_ops.c
+++ b/fs/xfs/xfs_inode_ops.c
@@ -2843,3 +2843,309 @@ abort_out:
return error;
}
+/*
+ * Enter all inodes for a rename transaction into a sorted array.
+ */
+STATIC void
+xfs_sort_for_rename(
+ xfs_inode_t *dp1, /* in: old (source) directory inode */
+ xfs_inode_t *dp2, /* in: new (target) directory inode */
+ xfs_inode_t *ip1, /* in: inode of old entry */
+ xfs_inode_t *ip2, /* in: inode of new entry, if it
+ already exists, NULL otherwise. */
+ xfs_inode_t **i_tab,/* out: array of inodes returned, sorted */
+ int *num_inodes) /* out: number of inodes in array */
+{
+ xfs_inode_t *temp;
+ int i, j;
+
+ /*
+ * i_tab contains a list of pointers to inodes. We initialize
+ * the table here & we'll sort it. We will then use it to
+ * order the acquisition of the inode locks.
+ *
+ * Note that the table may contain duplicates. e.g., dp1 == dp2.
+ */
+ i_tab[0] = dp1;
+ i_tab[1] = dp2;
+ i_tab[2] = ip1;
+ if (ip2) {
+ *num_inodes = 4;
+ i_tab[3] = ip2;
+ } else {
+ *num_inodes = 3;
+ i_tab[3] = NULL;
+ }
+
+ /*
+ * Sort the elements via bubble sort. (Remember, there are at
+ * most 4 elements to sort, so this is adequate.)
+ */
+ for (i = 0; i < *num_inodes; i++) {
+ for (j = 1; j < *num_inodes; j++) {
+ if (i_tab[j]->i_ino < i_tab[j-1]->i_ino) {
+ temp = i_tab[j];
+ i_tab[j] = i_tab[j-1];
+ i_tab[j-1] = temp;
+ }
+ }
+ }
+}
+
+/*
+ * xfs_rename
+ */
+int
+xfs_rename(
+ xfs_inode_t *src_dp,
+ struct xfs_name *src_name,
+ xfs_inode_t *src_ip,
+ xfs_inode_t *target_dp,
+ struct xfs_name *target_name,
+ xfs_inode_t *target_ip)
+{
+ xfs_trans_t *tp = NULL;
+ xfs_mount_t *mp = src_dp->i_mount;
+ int new_parent; /* moving to a new dir */
+ int src_is_directory; /* src_name is a directory */
+ int error;
+ xfs_bmap_free_t free_list;
+ xfs_fsblock_t first_block;
+ int cancel_flags;
+ int committed;
+ xfs_inode_t *inodes[4];
+ int spaceres;
+ int num_inodes;
+
+ trace_xfs_rename(src_dp, target_dp, src_name, target_name);
+
+ new_parent = (src_dp != target_dp);
+ src_is_directory = S_ISDIR(src_ip->i_d.di_mode);
+
+ xfs_sort_for_rename(src_dp, target_dp, src_ip, target_ip,
+ inodes, &num_inodes);
+
+ xfs_bmap_init(&free_list, &first_block);
+ tp = xfs_trans_alloc(mp, XFS_TRANS_RENAME);
+ cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
+ spaceres = XFS_RENAME_SPACE_RES(mp, target_name->len);
+ error = xfs_trans_reserve(tp, spaceres, XFS_RENAME_LOG_RES(mp), 0,
+ XFS_TRANS_PERM_LOG_RES, XFS_RENAME_LOG_COUNT);
+ if (error == ENOSPC) {
+ spaceres = 0;
+ error = xfs_trans_reserve(tp, 0, XFS_RENAME_LOG_RES(mp), 0,
+ XFS_TRANS_PERM_LOG_RES, XFS_RENAME_LOG_COUNT);
+ }
+ if (error) {
+ xfs_trans_cancel(tp, 0);
+ goto std_return;
+ }
+
+ /*
+ * Attach the dquots to the inodes
+ */
+ error = xfs_qm_vop_rename_dqattach(inodes);
+ if (error) {
+ xfs_trans_cancel(tp, cancel_flags);
+ goto std_return;
+ }
+
+ /*
+ * Lock all the participating inodes. Depending upon whether
+ * the target_name exists in the target directory, and
+ * whether the target directory is the same as the source
+ * directory, we can lock from 2 to 4 inodes.
+ */
+ xfs_lock_inodes(inodes, num_inodes, XFS_ILOCK_EXCL);
+
+ /*
+ * Join all the inodes to the transaction. From this point on,
+ * we can rely on either trans_commit or trans_cancel to unlock
+ * them.
+ */
+ xfs_trans_ijoin(tp, src_dp, XFS_ILOCK_EXCL);
+ if (new_parent)
+ xfs_trans_ijoin(tp, target_dp, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, src_ip, XFS_ILOCK_EXCL);
+ if (target_ip)
+ xfs_trans_ijoin(tp, target_ip, XFS_ILOCK_EXCL);
+
+ /*
+ * If we are using project inheritance, we only allow renames
+ * into our tree when the project IDs are the same; else the
+ * tree quota mechanism would be circumvented.
+ */
+ if (unlikely((target_dp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT) &&
+ (xfs_get_projid(target_dp) != xfs_get_projid(src_ip)))) {
+ error = XFS_ERROR(EXDEV);
+ goto error_return;
+ }
+
+ /*
+ * Set up the target.
+ */
+ if (target_ip == NULL) {
+ /*
+ * If there's no space reservation, check the entry will
+ * fit before actually inserting it.
+ */
+ error = xfs_dir_canenter(tp, target_dp, target_name, spaceres);
+ if (error)
+ goto error_return;
+ /*
+ * If target does not exist and the rename crosses
+ * directories, adjust the target directory link count
+ * to account for the ".." reference from the new entry.
+ */
+ error = xfs_dir_createname(tp, target_dp, target_name,
+ src_ip->i_ino, &first_block,
+ &free_list, spaceres);
+ if (error == ENOSPC)
+ goto error_return;
+ if (error)
+ goto abort_return;
+
+ xfs_trans_ichgtime(tp, target_dp,
+ XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+
+ if (new_parent && src_is_directory) {
+ error = xfs_bumplink(tp, target_dp);
+ if (error)
+ goto abort_return;
+ }
+ } else { /* target_ip != NULL */
+ /*
+ * If target exists and it's a directory, check that both
+ * target and source are directories and that target can be
+ * destroyed, or that neither is a directory.
+ */
+ if (S_ISDIR(target_ip->i_d.di_mode)) {
+ /*
+ * Make sure target dir is empty.
+ */
+ if (!(xfs_dir_isempty(target_ip)) ||
+ (target_ip->i_d.di_nlink > 2)) {
+ error = XFS_ERROR(EEXIST);
+ goto error_return;
+ }
+ }
+
+ /*
+ * Link the source inode under the target name.
+ * If the source inode is a directory and we are moving
+ * it across directories, its ".." entry will be
+ * inconsistent until we replace that down below.
+ *
+ * In case there is already an entry with the same
+ * name at the destination directory, remove it first.
+ */
+ error = xfs_dir_replace(tp, target_dp, target_name,
+ src_ip->i_ino,
+ &first_block, &free_list, spaceres);
+ if (error)
+ goto abort_return;
+
+ xfs_trans_ichgtime(tp, target_dp,
+ XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+
+ /*
+ * Decrement the link count on the target since the target
+ * dir no longer points to it.
+ */
+ error = xfs_droplink(tp, target_ip);
+ if (error)
+ goto abort_return;
+
+ if (src_is_directory) {
+ /*
+ * Drop the link from the old "." entry.
+ */
+ error = xfs_droplink(tp, target_ip);
+ if (error)
+ goto abort_return;
+ }
+ } /* target_ip != NULL */
+
+ /*
+ * Remove the source.
+ */
+ if (new_parent && src_is_directory) {
+ /*
+ * Rewrite the ".." entry to point to the new
+ * directory.
+ */
+ error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot,
+ target_dp->i_ino,
+ &first_block, &free_list, spaceres);
+ ASSERT(error != EEXIST);
+ if (error)
+ goto abort_return;
+ }
+
+ /*
+ * We always want to hit the ctime on the source inode.
+ *
+ * This isn't strictly required by the standards since the source
+ * inode isn't really being changed, but old unix file systems did
+ * it and some incremental backup programs won't work without it.
+ */
+ xfs_trans_ichgtime(tp, src_ip, XFS_ICHGTIME_CHG);
+ xfs_trans_log_inode(tp, src_ip, XFS_ILOG_CORE);
+
+ /*
+ * Adjust the link count on src_dp. This is necessary when
+ * renaming a directory, either within one parent when
+ * the target existed, or across two parent directories.
+ */
+ if (src_is_directory && (new_parent || target_ip != NULL)) {
+
+ /*
+ * Decrement link count on src_directory since the
+ * entry that's moved no longer points to it.
+ */
+ error = xfs_droplink(tp, src_dp);
+ if (error)
+ goto abort_return;
+ }
+
+ error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
+ &first_block, &free_list, spaceres);
+ if (error)
+ goto abort_return;
+
+ xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+ xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE);
+ if (new_parent)
+ xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE);
+
+ /*
+ * If this is a synchronous mount, make sure that the
+ * rename transaction goes to disk before returning to
+ * the user.
+ */
+ if (mp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC)) {
+ xfs_trans_set_sync(tp);
+ }
+
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error) {
+ xfs_bmap_cancel(&free_list);
+ xfs_trans_cancel(tp, (XFS_TRANS_RELEASE_LOG_RES |
+ XFS_TRANS_ABORT));
+ goto std_return;
+ }
+
+ /*
+ * trans_commit will unlock src_ip, target_ip & decrement
+ * the vnode references.
+ */
+ return xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+
+ abort_return:
+ cancel_flags |= XFS_TRANS_ABORT;
+ error_return:
+ xfs_bmap_cancel(&free_list);
+ xfs_trans_cancel(tp, cancel_flags);
+ std_return:
+ return error;
+}
diff --git a/fs/xfs/xfs_rename.c b/fs/xfs/xfs_rename.c
deleted file mode 100644
index e0e59f4..0000000
--- a/fs/xfs/xfs_rename.c
+++ /dev/null
@@ -1,346 +0,0 @@
-/*
- * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc.
- * All Rights Reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it would be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write the Free Software Foundation,
- * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-#include "xfs.h"
-#include "xfs_fs.h"
-#include "xfs_types.h"
-#include "xfs_log.h"
-#include "xfs_trans.h"
-#include "xfs_sb.h"
-#include "xfs_ag.h"
-#include "xfs_mount.h"
-#include "xfs_da_btree.h"
-#include "xfs_dir2_format.h"
-#include "xfs_dir2.h"
-#include "xfs_bmap_btree.h"
-#include "xfs_dinode.h"
-#include "xfs_inode.h"
-#include "xfs_inode_item.h"
-#include "xfs_bmap.h"
-#include "xfs_error.h"
-#include "xfs_quota.h"
-#include "xfs_utils.h"
-#include "xfs_trans_space.h"
-#include "xfs_trace.h"
-
-
-/*
- * Enter all inodes for a rename transaction into a sorted array.
- */
-STATIC void
-xfs_sort_for_rename(
- xfs_inode_t *dp1, /* in: old (source) directory inode */
- xfs_inode_t *dp2, /* in: new (target) directory inode */
- xfs_inode_t *ip1, /* in: inode of old entry */
- xfs_inode_t *ip2, /* in: inode of new entry, if it
- already exists, NULL otherwise. */
- xfs_inode_t **i_tab,/* out: array of inode returned, sorted */
- int *num_inodes) /* out: number of inodes in array */
-{
- xfs_inode_t *temp;
- int i, j;
-
- /*
- * i_tab contains a list of pointers to inodes. We initialize
- * the table here & we'll sort it. We will then use it to
- * order the acquisition of the inode locks.
- *
- * Note that the table may contain duplicates. e.g., dp1 == dp2.
- */
- i_tab[0] = dp1;
- i_tab[1] = dp2;
- i_tab[2] = ip1;
- if (ip2) {
- *num_inodes = 4;
- i_tab[3] = ip2;
- } else {
- *num_inodes = 3;
- i_tab[3] = NULL;
- }
-
- /*
- * Sort the elements via bubble sort. (Remember, there are at
- * most 4 elements to sort, so this is adequate.)
- */
- for (i = 0; i < *num_inodes; i++) {
- for (j = 1; j < *num_inodes; j++) {
- if (i_tab[j]->i_ino < i_tab[j-1]->i_ino) {
- temp = i_tab[j];
- i_tab[j] = i_tab[j-1];
- i_tab[j-1] = temp;
- }
- }
- }
-}
-
-/*
- * xfs_rename
- */
-int
-xfs_rename(
- xfs_inode_t *src_dp,
- struct xfs_name *src_name,
- xfs_inode_t *src_ip,
- xfs_inode_t *target_dp,
- struct xfs_name *target_name,
- xfs_inode_t *target_ip)
-{
- xfs_trans_t *tp = NULL;
- xfs_mount_t *mp = src_dp->i_mount;
- int new_parent; /* moving to a new dir */
- int src_is_directory; /* src_name is a directory */
- int error;
- xfs_bmap_free_t free_list;
- xfs_fsblock_t first_block;
- int cancel_flags;
- int committed;
- xfs_inode_t *inodes[4];
- int spaceres;
- int num_inodes;
-
- trace_xfs_rename(src_dp, target_dp, src_name, target_name);
-
- new_parent = (src_dp != target_dp);
- src_is_directory = S_ISDIR(src_ip->i_d.di_mode);
-
- xfs_sort_for_rename(src_dp, target_dp, src_ip, target_ip,
- inodes, &num_inodes);
-
- xfs_bmap_init(&free_list, &first_block);
- tp = xfs_trans_alloc(mp, XFS_TRANS_RENAME);
- cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
- spaceres = XFS_RENAME_SPACE_RES(mp, target_name->len);
- error = xfs_trans_reserve(tp, spaceres, XFS_RENAME_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, XFS_RENAME_LOG_COUNT);
- if (error == ENOSPC) {
- spaceres = 0;
- error = xfs_trans_reserve(tp, 0, XFS_RENAME_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, XFS_RENAME_LOG_COUNT);
- }
- if (error) {
- xfs_trans_cancel(tp, 0);
- goto std_return;
- }
-
- /*
- * Attach the dquots to the inodes
- */
- error = xfs_qm_vop_rename_dqattach(inodes);
- if (error) {
- xfs_trans_cancel(tp, cancel_flags);
- goto std_return;
- }
-
- /*
- * Lock all the participating inodes. Depending upon whether
- * the target_name exists in the target directory, and
- * whether the target directory is the same as the source
- * directory, we can lock from 2 to 4 inodes.
- */
- xfs_lock_inodes(inodes, num_inodes, XFS_ILOCK_EXCL);
-
- /*
- * Join all the inodes to the transaction. From this point on,
- * we can rely on either trans_commit or trans_cancel to unlock
- * them.
- */
- xfs_trans_ijoin(tp, src_dp, XFS_ILOCK_EXCL);
- if (new_parent)
- xfs_trans_ijoin(tp, target_dp, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, src_ip, XFS_ILOCK_EXCL);
- if (target_ip)
- xfs_trans_ijoin(tp, target_ip, XFS_ILOCK_EXCL);
-
- /*
- * If we are using project inheritance, we only allow renames
- * into our tree when the project IDs are the same; else the
- * tree quota mechanism would be circumvented.
- */
- if (unlikely((target_dp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT) &&
- (xfs_get_projid(target_dp) != xfs_get_projid(src_ip)))) {
- error = XFS_ERROR(EXDEV);
- goto error_return;
- }
-
- /*
- * Set up the target.
- */
- if (target_ip == NULL) {
- /*
- * If there's no space reservation, check the entry will
- * fit before actually inserting it.
- */
- error = xfs_dir_canenter(tp, target_dp, target_name, spaceres);
- if (error)
- goto error_return;
- /*
- * If target does not exist and the rename crosses
- * directories, adjust the target directory link count
- * to account for the ".." reference from the new entry.
- */
- error = xfs_dir_createname(tp, target_dp, target_name,
- src_ip->i_ino, &first_block,
- &free_list, spaceres);
- if (error == ENOSPC)
- goto error_return;
- if (error)
- goto abort_return;
-
- xfs_trans_ichgtime(tp, target_dp,
- XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
-
- if (new_parent && src_is_directory) {
- error = xfs_bumplink(tp, target_dp);
- if (error)
- goto abort_return;
- }
- } else { /* target_ip != NULL */
- /*
- * If target exists and it's a directory, check that both
- * target and source are directories and that target can be
- * destroyed, or that neither is a directory.
- */
- if (S_ISDIR(target_ip->i_d.di_mode)) {
- /*
- * Make sure target dir is empty.
- */
- if (!(xfs_dir_isempty(target_ip)) ||
- (target_ip->i_d.di_nlink > 2)) {
- error = XFS_ERROR(EEXIST);
- goto error_return;
- }
- }
-
- /*
- * Link the source inode under the target name.
- * If the source inode is a directory and we are moving
- * it across directories, its ".." entry will be
- * inconsistent until we replace that down below.
- *
- * In case there is already an entry with the same
- * name at the destination directory, remove it first.
- */
- error = xfs_dir_replace(tp, target_dp, target_name,
- src_ip->i_ino,
- &first_block, &free_list, spaceres);
- if (error)
- goto abort_return;
-
- xfs_trans_ichgtime(tp, target_dp,
- XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
-
- /*
- * Decrement the link count on the target since the target
- * dir no longer points to it.
- */
- error = xfs_droplink(tp, target_ip);
- if (error)
- goto abort_return;
-
- if (src_is_directory) {
- /*
- * Drop the link from the old "." entry.
- */
- error = xfs_droplink(tp, target_ip);
- if (error)
- goto abort_return;
- }
- } /* target_ip != NULL */
-
- /*
- * Remove the source.
- */
- if (new_parent && src_is_directory) {
- /*
- * Rewrite the ".." entry to point to the new
- * directory.
- */
- error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot,
- target_dp->i_ino,
- &first_block, &free_list, spaceres);
- ASSERT(error != EEXIST);
- if (error)
- goto abort_return;
- }
-
- /*
- * We always want to hit the ctime on the source inode.
- *
- * This isn't strictly required by the standards since the source
- * inode isn't really being changed, but old unix file systems did
- * it and some incremental backup programs won't work without it.
- */
- xfs_trans_ichgtime(tp, src_ip, XFS_ICHGTIME_CHG);
- xfs_trans_log_inode(tp, src_ip, XFS_ILOG_CORE);
-
- /*
- * Adjust the link count on src_dp. This is necessary when
- * renaming a directory, either within one parent when
- * the target existed, or across two parent directories.
- */
- if (src_is_directory && (new_parent || target_ip != NULL)) {
-
- /*
- * Decrement link count on src_directory since the
- * entry that's moved no longer points to it.
- */
- error = xfs_droplink(tp, src_dp);
- if (error)
- goto abort_return;
- }
-
- error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
- &first_block, &free_list, spaceres);
- if (error)
- goto abort_return;
-
- xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
- xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE);
- if (new_parent)
- xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE);
-
- /*
- * If this is a synchronous mount, make sure that the
- * rename transaction goes to disk before returning to
- * the user.
- */
- if (mp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC)) {
- xfs_trans_set_sync(tp);
- }
-
- error = xfs_bmap_finish(&tp, &free_list, &committed);
- if (error) {
- xfs_bmap_cancel(&free_list);
- xfs_trans_cancel(tp, (XFS_TRANS_RELEASE_LOG_RES |
- XFS_TRANS_ABORT));
- goto std_return;
- }
-
- /*
- * trans_commit will unlock src_ip, target_ip & decrement
- * the vnode references.
- */
- return xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
-
- abort_return:
- cancel_flags |= XFS_TRANS_ABORT;
- error_return:
- xfs_bmap_cancel(&free_list);
- xfs_trans_cancel(tp, cancel_flags);
- std_return:
- return error;
-}
--
1.7.10.4
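For readers following the move, the lock-ordering step in xfs_sort_for_rename() (part of the xfs_rename.c code relocated by this patch) can be illustrated in isolation. The following is a minimal userspace sketch, not the kernel code: struct demo_inode and demo_sort_for_rename() are hypothetical stand-ins, and only the inode number is modelled. It follows the same idea as the original: collect the three or four participating inodes, then bubble-sort them by inode number so that locks can be taken in a consistent order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an in-core inode; only the number is modelled. */
struct demo_inode {
	uint64_t	i_ino;
};

/*
 * Fill i_tab (room for 4 entries) with the inodes taking part in a rename
 * and sort them by ascending inode number. ip2 may be NULL when the target
 * name does not exist yet. Returns the number of valid entries (3 or 4).
 */
int
demo_sort_for_rename(
	struct demo_inode	*dp1,	/* old (source) directory */
	struct demo_inode	*dp2,	/* new (target) directory */
	struct demo_inode	*ip1,	/* inode of the old entry */
	struct demo_inode	*ip2,	/* inode of the new entry, or NULL */
	struct demo_inode	**i_tab)
{
	struct demo_inode	*temp;
	int			num_inodes = ip2 ? 4 : 3;
	int			i, j;

	assert(dp1 && dp2 && ip1);

	/* The table may contain duplicates, e.g. when dp1 == dp2. */
	i_tab[0] = dp1;
	i_tab[1] = dp2;
	i_tab[2] = ip1;
	i_tab[3] = ip2;		/* stays NULL when there is no target */

	/* Bubble sort: at most 4 elements, so this is adequate. */
	for (i = 0; i < num_inodes; i++) {
		for (j = 1; j < num_inodes; j++) {
			if (i_tab[j]->i_ino < i_tab[j - 1]->i_ino) {
				temp = i_tab[j];
				i_tab[j] = i_tab[j - 1];
				i_tab[j - 1] = temp;
			}
		}
	}
	return num_inodes;
}
```

Because the table can hold the same inode twice (a rename within one directory), whatever consumes the sorted table has to cope with adjacent duplicates rather than locking the same inode twice.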
* [PATCH 29/60] xfs: consolidate xfs_utils.c
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (27 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 28/60] xfs: consolidate xfs_rename.c Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 9:40 ` Christoph Hellwig
2013-06-19 4:50 ` [PATCH 30/60] xfs: split out inode log item format definition Dave Chinner
` (32 subsequent siblings)
61 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
There are a few small helper functions in xfs_utils.c, all related to
xfs_inode modifications. Move them all to xfs_inode_ops.c so all
xfs_inode operations are consolidated in the one place.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
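As an illustration only (not part of the patch), the link-count bookkeeping done by two of the moved helpers, xfs_droplink() and xfs_bumplink(), can be modelled in userspace roughly as below. Everything named demo_* is hypothetical, the transaction and logging calls are left out, and DEMO_MAXLINK_1 is an assumed value for the version 1 inode limit, chosen to match the 16-bit link count mentioned in the code comments. The sketch shows two behaviours: dropping the last link marks the inode for the unlinked list, and bumping a version 1 inode past the old limit upgrades it to version 2.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed v1 limit: the largest value a 16-bit link count can hold. */
#define DEMO_MAXLINK_1	65535U

/* Hypothetical, heavily reduced model of the in-core inode fields used. */
struct demo_inode {
	uint32_t	nlink;		/* link count */
	int		version;	/* inode format version: 1 or 2 */
	int		unlinked;	/* 1 once placed on the unlinked list */
};

/*
 * Decrement the link count. When it reaches zero, flag the inode as being
 * on the unlinked list (standing in for the real xfs_iunlink() call).
 */
int
demo_droplink(struct demo_inode *ip)
{
	assert(ip->nlink > 0);
	ip->nlink--;
	if (ip->nlink == 0)
		ip->unlinked = 1;
	return 0;
}

/*
 * Increment the link count. A version 1 inode that grows beyond the old
 * limit is converted to version 2, which has a 32-bit link count.
 */
int
demo_bumplink(struct demo_inode *ip)
{
	assert(ip->nlink > 0);
	ip->nlink++;
	if (ip->version == 1 && ip->nlink > DEMO_MAXLINK_1)
		ip->version = 2;
	return 0;
}
```

The real helpers also update the change time, keep the VFS inode's link count in step, and log the inode core in the transaction; none of that is modelled here.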
fs/xfs/Makefile | 1 -
fs/xfs/xfs_error.c | 1 -
fs/xfs/xfs_extent_ops.c | 1 -
fs/xfs/xfs_filestream.c | 1 -
fs/xfs/xfs_inode.c | 1 -
fs/xfs/xfs_inode_ops.c | 279 +++++++++++++++++++++++++++++++++++++++-
fs/xfs/xfs_inode_ops.h | 7 +
fs/xfs/xfs_ioctl.c | 1 -
fs/xfs/xfs_iomap.c | 1 -
fs/xfs/xfs_iops.c | 1 -
fs/xfs/xfs_log_recover.c | 1 -
fs/xfs/xfs_mount.c | 1 -
fs/xfs/xfs_qm.c | 1 -
fs/xfs/xfs_qm_syscalls.c | 1 -
fs/xfs/xfs_rtalloc.c | 1 -
fs/xfs/xfs_super.c | 1 -
fs/xfs/xfs_symlink.c | 1 -
fs/xfs/xfs_utils.c | 316 ----------------------------------------------
fs/xfs/xfs_utils.h | 27 ----
19 files changed, 285 insertions(+), 359 deletions(-)
delete mode 100644 fs/xfs/xfs_utils.c
delete mode 100644 fs/xfs/xfs_utils.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 000738f..66bbeb3 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -53,7 +53,6 @@ xfs-y += xfs_aops.o \
xfs_mru_cache.o \
xfs_super.o \
xfs_trans.o \
- xfs_utils.o \
xfs_xattr.o \
kmem.o \
uuid.o
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index 35d3f5b..1123d93f 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -26,7 +26,6 @@
#include "xfs_bmap_btree.h"
#include "xfs_dinode.h"
#include "xfs_inode.h"
-#include "xfs_utils.h"
#include "xfs_error.h"
#ifdef DEBUG
diff --git a/fs/xfs/xfs_extent_ops.c b/fs/xfs/xfs_extent_ops.c
index a2f136c..527af72 100644
--- a/fs/xfs/xfs_extent_ops.c
+++ b/fs/xfs/xfs_extent_ops.c
@@ -42,7 +42,6 @@
#include "xfs_attr.h"
#include "xfs_error.h"
#include "xfs_quota.h"
-#include "xfs_utils.h"
#include "xfs_rtalloc.h"
#include "xfs_trans_space.h"
#include "xfs_log_priv.h"
diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 5170306..86cd468 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -27,7 +27,6 @@
#include "xfs_mount.h"
#include "xfs_bmap.h"
#include "xfs_alloc.h"
-#include "xfs_utils.h"
#include "xfs_mru_cache.h"
#include "xfs_filestream.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 111ebf4..8740562 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -40,7 +40,6 @@
#include "xfs_ialloc.h"
#include "xfs_bmap.h"
#include "xfs_error.h"
-#include "xfs_utils.h"
#include "xfs_quota.h"
#include "xfs_filestream.h"
#include "xfs_cksum.h"
diff --git a/fs/xfs/xfs_inode_ops.c b/fs/xfs/xfs_inode_ops.c
index a699f81..7a7f19d 100644
--- a/fs/xfs/xfs_inode_ops.c
+++ b/fs/xfs/xfs_inode_ops.c
@@ -45,7 +45,6 @@
#include "xfs_ialloc.h"
#include "xfs_bmap.h"
#include "xfs_error.h"
-#include "xfs_utils.h"
#include "xfs_quota.h"
#include "xfs_filestream.h"
#include "xfs_cksum.h"
@@ -899,6 +898,284 @@ xfs_ialloc(
return 0;
}
+/*
+ * Allocates a new inode from disk and return a pointer to the
+ * incore copy. This routine will internally commit the current
+ * transaction and allocate a new one if the Space Manager needed
+ * to do an allocation to replenish the inode free-list.
+ *
+ * This routine is designed to be called from xfs_create and
+ * xfs_create_dir.
+ *
+ */
+int
+xfs_dir_ialloc(
+ xfs_trans_t **tpp, /* input: current transaction;
+ output: may be a new transaction. */
+ xfs_inode_t *dp, /* directory within whose allocate
+ the inode. */
+ umode_t mode,
+ xfs_nlink_t nlink,
+ xfs_dev_t rdev,
+ prid_t prid, /* project id */
+ int okalloc, /* ok to allocate new space */
+ xfs_inode_t **ipp, /* pointer to inode; it will be
+ locked. */
+ int *committed)
+
+{
+ xfs_trans_t *tp;
+ xfs_trans_t *ntp;
+ xfs_inode_t *ip;
+ xfs_buf_t *ialloc_context = NULL;
+ int code;
+ uint log_res;
+ uint log_count;
+ void *dqinfo;
+ uint tflags;
+
+ tp = *tpp;
+ ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
+
+ /*
+ * xfs_ialloc will return a pointer to an incore inode if
+ * the Space Manager has an available inode on the free
+ * list. Otherwise, it will do an allocation and replenish
+ * the freelist. Since we can only do one allocation per
+ * transaction without deadlocks, we will need to commit the
+ * current transaction and start a new one. We will then
+ * need to call xfs_ialloc again to get the inode.
+ *
+ * If xfs_ialloc did an allocation to replenish the freelist,
+ * it returns the bp containing the head of the freelist as
+ * ialloc_context. We will hold a lock on it across the
+ * transaction commit so that no other process can steal
+ * the inode(s) that we've just allocated.
+ */
+ code = xfs_ialloc(tp, dp, mode, nlink, rdev, prid, okalloc,
+ &ialloc_context, &ip);
+
+ /*
+ * Return an error if we were unable to allocate a new inode.
+ * This should only happen if we run out of space on disk or
+ * encounter a disk error.
+ */
+ if (code) {
+ *ipp = NULL;
+ return code;
+ }
+ if (!ialloc_context && !ip) {
+ *ipp = NULL;
+ return XFS_ERROR(ENOSPC);
+ }
+
+ /*
+ * If the AGI buffer is non-NULL, then we were unable to get an
+ * inode in one operation. We need to commit the current
+ * transaction and call xfs_ialloc() again. It is guaranteed
+ * to succeed the second time.
+ */
+ if (ialloc_context) {
+ /*
+ * Normally, xfs_trans_commit releases all the locks.
+ * We call bhold to hang on to the ialloc_context across
+ * the commit. Holding this buffer prevents any other
+ * processes from doing any allocations in this
+ * allocation group.
+ */
+ xfs_trans_bhold(tp, ialloc_context);
+ /*
+ * Save the log reservation so we can use
+ * them in the next transaction.
+ */
+ log_res = xfs_trans_get_log_res(tp);
+ log_count = xfs_trans_get_log_count(tp);
+
+ /*
+ * We want the quota changes to be associated with the next
+ * transaction, NOT this one. So, detach the dqinfo from this
+ * and attach it to the next transaction.
+ */
+ dqinfo = NULL;
+ tflags = 0;
+ if (tp->t_dqinfo) {
+ dqinfo = (void *)tp->t_dqinfo;
+ tp->t_dqinfo = NULL;
+ tflags = tp->t_flags & XFS_TRANS_DQ_DIRTY;
+ tp->t_flags &= ~(XFS_TRANS_DQ_DIRTY);
+ }
+
+ ntp = xfs_trans_dup(tp);
+ code = xfs_trans_commit(tp, 0);
+ tp = ntp;
+ if (committed != NULL) {
+ *committed = 1;
+ }
+ /*
+ * If we get an error during the commit processing,
+ * release the buffer that is still held and return
+ * to the caller.
+ */
+ if (code) {
+ xfs_buf_relse(ialloc_context);
+ if (dqinfo) {
+ tp->t_dqinfo = dqinfo;
+ xfs_trans_free_dqinfo(tp);
+ }
+ *tpp = ntp;
+ *ipp = NULL;
+ return code;
+ }
+
+ /*
+ * transaction commit worked ok so we can drop the extra ticket
+ * reference that we gained in xfs_trans_dup()
+ */
+ xfs_log_ticket_put(tp->t_ticket);
+ code = xfs_trans_reserve(tp, 0, log_res, 0,
+ XFS_TRANS_PERM_LOG_RES, log_count);
+ /*
+ * Re-attach the quota info that we detached from prev trx.
+ */
+ if (dqinfo) {
+ tp->t_dqinfo = dqinfo;
+ tp->t_flags |= tflags;
+ }
+
+ if (code) {
+ xfs_buf_relse(ialloc_context);
+ *tpp = ntp;
+ *ipp = NULL;
+ return code;
+ }
+ xfs_trans_bjoin(tp, ialloc_context);
+
+ /*
+ * Call ialloc again. Since we've locked out all
+ * other allocations in this allocation group,
+ * this call should always succeed.
+ */
+ code = xfs_ialloc(tp, dp, mode, nlink, rdev, prid,
+ okalloc, &ialloc_context, &ip);
+
+ /*
+ * If we get an error at this point, return to the caller
+ * so that the current transaction can be aborted.
+ */
+ if (code) {
+ *tpp = tp;
+ *ipp = NULL;
+ return code;
+ }
+ ASSERT(!ialloc_context && ip);
+
+ } else {
+ if (committed != NULL)
+ *committed = 0;
+ }
+
+ *ipp = ip;
+ *tpp = tp;
+
+ return 0;
+}
+
+/*
+ * Decrement the link count on an inode & log the change.
+ * If this causes the link count to go to zero, initiate the
+ * logging activity required to truncate a file.
+ */
+int /* error */
+xfs_droplink(
+ xfs_trans_t *tp,
+ xfs_inode_t *ip)
+{
+ int error;
+
+ xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
+
+ ASSERT (ip->i_d.di_nlink > 0);
+ ip->i_d.di_nlink--;
+ drop_nlink(VFS_I(ip));
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+ error = 0;
+ if (ip->i_d.di_nlink == 0) {
+ /*
+ * We're dropping the last link to this file.
+ * Move the on-disk inode to the AGI unlinked list.
+ * From xfs_inactive() we will pull the inode from
+ * the list and free it.
+ */
+ error = xfs_iunlink(tp, ip);
+ }
+ return error;
+}
+
+/*
+ * This gets called when the inode's version needs to be changed from 1 to 2.
+ * Currently this happens when the nlink field overflows the old 16-bit value
+ * or when chproj is called to change the project for the first time.
+ * As a side effect the superblock version will also get rev'd
+ * to contain the NLINK bit.
+ */
+void
+xfs_bump_ino_vers2(
+ xfs_trans_t *tp,
+ xfs_inode_t *ip)
+{
+ xfs_mount_t *mp;
+
+ ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+ ASSERT(ip->i_d.di_version == 1);
+
+ ip->i_d.di_version = 2;
+ ip->i_d.di_onlink = 0;
+ memset(&(ip->i_d.di_pad[0]), 0, sizeof(ip->i_d.di_pad));
+ mp = tp->t_mountp;
+ if (!xfs_sb_version_hasnlink(&mp->m_sb)) {
+ spin_lock(&mp->m_sb_lock);
+ if (!xfs_sb_version_hasnlink(&mp->m_sb)) {
+ xfs_sb_version_addnlink(&mp->m_sb);
+ spin_unlock(&mp->m_sb_lock);
+ xfs_mod_sb(tp, XFS_SB_VERSIONNUM);
+ } else {
+ spin_unlock(&mp->m_sb_lock);
+ }
+ }
+ /* Caller must log the inode */
+}
+
+/*
+ * Increment the link count on an inode & log the change.
+ */
+int
+xfs_bumplink(
+ xfs_trans_t *tp,
+ xfs_inode_t *ip)
+{
+ xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
+
+ ASSERT(ip->i_d.di_nlink > 0);
+ ip->i_d.di_nlink++;
+ inc_nlink(VFS_I(ip));
+ if ((ip->i_d.di_version == 1) &&
+ (ip->i_d.di_nlink > XFS_MAXLINK_1)) {
+ /*
+ * The inode has increased its number of links beyond
+ * what can fit in an old format inode. It now needs
+ * to be converted to a version 2 inode with a 32 bit
+ * link count. If this is the first inode in the file
+ * system to do this, then we need to bump the superblock
+ * version number as well.
+ */
+ xfs_bump_ino_vers2(tp, ip);
+ }
+
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ return 0;
+}
+
int
xfs_create(
xfs_inode_t *dp,
diff --git a/fs/xfs/xfs_inode_ops.h b/fs/xfs/xfs_inode_ops.h
index f4e6bf6..0c34561 100644
--- a/fs/xfs/xfs_inode_ops.h
+++ b/fs/xfs/xfs_inode_ops.h
@@ -368,4 +368,11 @@ do { \
iput(VFS_I(ip)); \
} while (0)
+int xfs_dir_ialloc(struct xfs_trans **, struct xfs_inode *, umode_t,
+ xfs_nlink_t, xfs_dev_t, prid_t, int,
+ struct xfs_inode **, int *);
+int xfs_droplink(struct xfs_trans *, struct xfs_inode *);
+int xfs_bumplink(struct xfs_trans *, struct xfs_inode *);
+void xfs_bump_ino_vers2(struct xfs_trans *, struct xfs_inode *);
+
#endif /* __XFS_INODE_OPS_H__ */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 633be2e..964a6dd 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -33,7 +33,6 @@
#include "xfs_attr.h"
#include "xfs_bmap.h"
#include "xfs_buf_item.h"
-#include "xfs_utils.h"
#include "xfs_dfrag.h"
#include "xfs_fsops.h"
#include "xfs_discard.h"
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 6a70964..effd1bc4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -38,7 +38,6 @@
#include "xfs_attr.h"
#include "xfs_buf_item.h"
#include "xfs_trans_space.h"
-#include "xfs_utils.h"
#include "xfs_iomap.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 99775a2..29d2108 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -34,7 +34,6 @@
#include "xfs_itable.h"
#include "xfs_attr.h"
#include "xfs_buf_item.h"
-#include "xfs_utils.h"
#include "xfs_inode_item.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index ffa519d..8a810fe 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -41,7 +41,6 @@
#include "xfs_extfree_item.h"
#include "xfs_trans_priv.h"
#include "xfs_quota.h"
-#include "xfs_utils.h"
#include "xfs_cksum.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 5117587..6846b77 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -42,7 +42,6 @@
#include "xfs_error.h"
#include "xfs_quota.h"
#include "xfs_fsops.h"
-#include "xfs_utils.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
#include "xfs_cksum.h"
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index b75c9bb..6dddf1a 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -37,7 +37,6 @@
#include "xfs_attr.h"
#include "xfs_buf_item.h"
#include "xfs_trans_space.h"
-#include "xfs_utils.h"
#include "xfs_qm.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 6cdf6ff..7c260f8 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -37,7 +37,6 @@
#include "xfs_error.h"
#include "xfs_attr.h"
#include "xfs_buf_item.h"
-#include "xfs_utils.h"
#include "xfs_qm.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 8623ba7..490fc05 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -34,7 +34,6 @@
#include "xfs_error.h"
#include "xfs_inode_item.h"
#include "xfs_trans_space.h"
-#include "xfs_utils.h"
#include "xfs_trace.h"
#include "xfs_buf.h"
#include "xfs_icache.h"
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 229c99d..d094a41 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -39,7 +39,6 @@
#include "xfs_fsops.h"
#include "xfs_attr.h"
#include "xfs_buf_item.h"
-#include "xfs_utils.h"
#include "xfs_log_priv.h"
#include "xfs_trans_priv.h"
#include "xfs_filestream.h"
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 996925b..69ec1e6 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -39,7 +39,6 @@
#include "xfs_bmap.h"
#include "xfs_error.h"
#include "xfs_quota.h"
-#include "xfs_utils.h"
#include "xfs_trans_space.h"
#include "xfs_log_priv.h"
#include "xfs_trace.h"
diff --git a/fs/xfs/xfs_utils.c b/fs/xfs/xfs_utils.c
deleted file mode 100644
index 7bf8f13..0000000
--- a/fs/xfs/xfs_utils.c
+++ /dev/null
@@ -1,316 +0,0 @@
-/*
- * Copyright (c) 2000-2002,2005 Silicon Graphics, Inc.
- * All Rights Reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it would be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write the Free Software Foundation,
- * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-#include "xfs.h"
-#include "xfs_fs.h"
-#include "xfs_types.h"
-#include "xfs_log.h"
-#include "xfs_trans.h"
-#include "xfs_sb.h"
-#include "xfs_ag.h"
-#include "xfs_mount.h"
-#include "xfs_da_btree.h"
-#include "xfs_dir2_format.h"
-#include "xfs_dir2.h"
-#include "xfs_bmap_btree.h"
-#include "xfs_dinode.h"
-#include "xfs_inode.h"
-#include "xfs_inode_item.h"
-#include "xfs_bmap.h"
-#include "xfs_error.h"
-#include "xfs_quota.h"
-#include "xfs_itable.h"
-#include "xfs_utils.h"
-
-
-/*
- * Allocates a new inode from disk and return a pointer to the
- * incore copy. This routine will internally commit the current
- * transaction and allocate a new one if the Space Manager needed
- * to do an allocation to replenish the inode free-list.
- *
- * This routine is designed to be called from xfs_create and
- * xfs_create_dir.
- *
- */
-int
-xfs_dir_ialloc(
- xfs_trans_t **tpp, /* input: current transaction;
- output: may be a new transaction. */
- xfs_inode_t *dp, /* directory within whose allocate
- the inode. */
- umode_t mode,
- xfs_nlink_t nlink,
- xfs_dev_t rdev,
- prid_t prid, /* project id */
- int okalloc, /* ok to allocate new space */
- xfs_inode_t **ipp, /* pointer to inode; it will be
- locked. */
- int *committed)
-
-{
- xfs_trans_t *tp;
- xfs_trans_t *ntp;
- xfs_inode_t *ip;
- xfs_buf_t *ialloc_context = NULL;
- int code;
- uint log_res;
- uint log_count;
- void *dqinfo;
- uint tflags;
-
- tp = *tpp;
- ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
-
- /*
- * xfs_ialloc will return a pointer to an incore inode if
- * the Space Manager has an available inode on the free
- * list. Otherwise, it will do an allocation and replenish
- * the freelist. Since we can only do one allocation per
- * transaction without deadlocks, we will need to commit the
- * current transaction and start a new one. We will then
- * need to call xfs_ialloc again to get the inode.
- *
- * If xfs_ialloc did an allocation to replenish the freelist,
- * it returns the bp containing the head of the freelist as
- * ialloc_context. We will hold a lock on it across the
- * transaction commit so that no other process can steal
- * the inode(s) that we've just allocated.
- */
- code = xfs_ialloc(tp, dp, mode, nlink, rdev, prid, okalloc,
- &ialloc_context, &ip);
-
- /*
- * Return an error if we were unable to allocate a new inode.
- * This should only happen if we run out of space on disk or
- * encounter a disk error.
- */
- if (code) {
- *ipp = NULL;
- return code;
- }
- if (!ialloc_context && !ip) {
- *ipp = NULL;
- return XFS_ERROR(ENOSPC);
- }
-
- /*
- * If the AGI buffer is non-NULL, then we were unable to get an
- * inode in one operation. We need to commit the current
- * transaction and call xfs_ialloc() again. It is guaranteed
- * to succeed the second time.
- */
- if (ialloc_context) {
- /*
- * Normally, xfs_trans_commit releases all the locks.
- * We call bhold to hang on to the ialloc_context across
- * the commit. Holding this buffer prevents any other
- * processes from doing any allocations in this
- * allocation group.
- */
- xfs_trans_bhold(tp, ialloc_context);
- /*
- * Save the log reservation so we can use
- * them in the next transaction.
- */
- log_res = xfs_trans_get_log_res(tp);
- log_count = xfs_trans_get_log_count(tp);
-
- /*
- * We want the quota changes to be associated with the next
- * transaction, NOT this one. So, detach the dqinfo from this
- * and attach it to the next transaction.
- */
- dqinfo = NULL;
- tflags = 0;
- if (tp->t_dqinfo) {
- dqinfo = (void *)tp->t_dqinfo;
- tp->t_dqinfo = NULL;
- tflags = tp->t_flags & XFS_TRANS_DQ_DIRTY;
- tp->t_flags &= ~(XFS_TRANS_DQ_DIRTY);
- }
-
- ntp = xfs_trans_dup(tp);
- code = xfs_trans_commit(tp, 0);
- tp = ntp;
- if (committed != NULL) {
- *committed = 1;
- }
- /*
- * If we get an error during the commit processing,
- * release the buffer that is still held and return
- * to the caller.
- */
- if (code) {
- xfs_buf_relse(ialloc_context);
- if (dqinfo) {
- tp->t_dqinfo = dqinfo;
- xfs_trans_free_dqinfo(tp);
- }
- *tpp = ntp;
- *ipp = NULL;
- return code;
- }
-
- /*
- * transaction commit worked ok so we can drop the extra ticket
- * reference that we gained in xfs_trans_dup()
- */
- xfs_log_ticket_put(tp->t_ticket);
- code = xfs_trans_reserve(tp, 0, log_res, 0,
- XFS_TRANS_PERM_LOG_RES, log_count);
- /*
- * Re-attach the quota info that we detached from prev trx.
- */
- if (dqinfo) {
- tp->t_dqinfo = dqinfo;
- tp->t_flags |= tflags;
- }
-
- if (code) {
- xfs_buf_relse(ialloc_context);
- *tpp = ntp;
- *ipp = NULL;
- return code;
- }
- xfs_trans_bjoin(tp, ialloc_context);
-
- /*
- * Call ialloc again. Since we've locked out all
- * other allocations in this allocation group,
- * this call should always succeed.
- */
- code = xfs_ialloc(tp, dp, mode, nlink, rdev, prid,
- okalloc, &ialloc_context, &ip);
-
- /*
- * If we get an error at this point, return to the caller
- * so that the current transaction can be aborted.
- */
- if (code) {
- *tpp = tp;
- *ipp = NULL;
- return code;
- }
- ASSERT(!ialloc_context && ip);
-
- } else {
- if (committed != NULL)
- *committed = 0;
- }
-
- *ipp = ip;
- *tpp = tp;
-
- return 0;
-}
-
-/*
- * Decrement the link count on an inode & log the change.
- * If this causes the link count to go to zero, initiate the
- * logging activity required to truncate a file.
- */
-int /* error */
-xfs_droplink(
- xfs_trans_t *tp,
- xfs_inode_t *ip)
-{
- int error;
-
- xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
-
- ASSERT (ip->i_d.di_nlink > 0);
- ip->i_d.di_nlink--;
- drop_nlink(VFS_I(ip));
- xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-
- error = 0;
- if (ip->i_d.di_nlink == 0) {
- /*
- * We're dropping the last link to this file.
- * Move the on-disk inode to the AGI unlinked list.
- * From xfs_inactive() we will pull the inode from
- * the list and free it.
- */
- error = xfs_iunlink(tp, ip);
- }
- return error;
-}
-
-/*
- * This gets called when the inode's version needs to be changed from 1 to 2.
- * Currently this happens when the nlink field overflows the old 16-bit value
- * or when chproj is called to change the project for the first time.
- * As a side effect the superblock version will also get rev'd
- * to contain the NLINK bit.
- */
-void
-xfs_bump_ino_vers2(
- xfs_trans_t *tp,
- xfs_inode_t *ip)
-{
- xfs_mount_t *mp;
-
- ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
- ASSERT(ip->i_d.di_version == 1);
-
- ip->i_d.di_version = 2;
- ip->i_d.di_onlink = 0;
- memset(&(ip->i_d.di_pad[0]), 0, sizeof(ip->i_d.di_pad));
- mp = tp->t_mountp;
- if (!xfs_sb_version_hasnlink(&mp->m_sb)) {
- spin_lock(&mp->m_sb_lock);
- if (!xfs_sb_version_hasnlink(&mp->m_sb)) {
- xfs_sb_version_addnlink(&mp->m_sb);
- spin_unlock(&mp->m_sb_lock);
- xfs_mod_sb(tp, XFS_SB_VERSIONNUM);
- } else {
- spin_unlock(&mp->m_sb_lock);
- }
- }
- /* Caller must log the inode */
-}
-
-/*
- * Increment the link count on an inode & log the change.
- */
-int
-xfs_bumplink(
- xfs_trans_t *tp,
- xfs_inode_t *ip)
-{
- xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
-
- ASSERT(ip->i_d.di_nlink > 0);
- ip->i_d.di_nlink++;
- inc_nlink(VFS_I(ip));
- if ((ip->i_d.di_version == 1) &&
- (ip->i_d.di_nlink > XFS_MAXLINK_1)) {
- /*
- * The inode has increased its number of links beyond
- * what can fit in an old format inode. It now needs
- * to be converted to a version 2 inode with a 32 bit
- * link count. If this is the first inode in the file
- * system to do this, then we need to bump the superblock
- * version number as well.
- */
- xfs_bump_ino_vers2(tp, ip);
- }
-
- xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
- return 0;
-}
diff --git a/fs/xfs/xfs_utils.h b/fs/xfs/xfs_utils.h
deleted file mode 100644
index 5eeab46..0000000
--- a/fs/xfs/xfs_utils.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/*
- * Copyright (c) 2000-2002,2005 Silicon Graphics, Inc.
- * All Rights Reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it would be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write the Free Software Foundation,
- * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-#ifndef __XFS_UTILS_H__
-#define __XFS_UTILS_H__
-
-extern int xfs_dir_ialloc(xfs_trans_t **, xfs_inode_t *, umode_t, xfs_nlink_t,
- xfs_dev_t, prid_t, int, xfs_inode_t **, int *);
-extern int xfs_droplink(xfs_trans_t *, xfs_inode_t *);
-extern int xfs_bumplink(xfs_trans_t *, xfs_inode_t *);
-extern void xfs_bump_ino_vers2(xfs_trans_t *, xfs_inode_t *);
-
-#endif /* __XFS_UTILS_H__ */
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
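For readers following the xfs_bumplink() logic being moved in this patch, the version 1 to version 2 upgrade can be sketched on its own. This is only an illustration under stated assumptions: demo_inode, demo_bumplink() and DEMO_MAXLINK_1 are hypothetical stand-ins, the limit of 65535 is assumed from the "old 16-bit value" wording in the comments, and the transaction logging and superblock update done by the real code are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limit: the largest link count an old 16-bit field can hold. */
#define DEMO_MAXLINK_1	65535u

/* Hypothetical, cut-down stand-in for the in-core inode fields involved. */
struct demo_inode {
	int		version;	/* inode format version (1 or 2) */
	uint32_t	nlink;		/* current link count */
	uint16_t	onlink;		/* old 16-bit link count, unused after upgrade */
};

/*
 * Increment the link count. If a version 1 inode grows past what the
 * old format can represent, convert it to version 2 and zero the old
 * field, mirroring the order of operations in the code above.
 */
static inline void demo_bumplink(struct demo_inode *ip)
{
	ip->nlink++;
	if (ip->version == 1 && ip->nlink > DEMO_MAXLINK_1) {
		ip->version = 2;
		ip->onlink = 0;
	}
}
```

Calling demo_bumplink() on a version 1 inode that already has 65535 links leaves it at version 2 with 65536 links; an inode with a small link count stays at version 1.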
* Re: [PATCH 29/60] xfs: consolidate xfs_utils.c
2013-06-19 4:50 ` [PATCH 29/60] xfs: consolidate xfs_utils.c Dave Chinner
@ 2013-06-19 9:40 ` Christoph Hellwig
0 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2013-06-19 9:40 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
> +/*
> + * Allocates a new inode from disk and return a pointer to the
> + * incore copy. This routine will internally commit the current
> + * transaction and allocate a new one if the Space Manager needed
> + * to do an allocation to replenish the inode free-list.
> + *
> + * This routine is designed to be called from xfs_create and
> + * xfs_create_dir.
xfs_create_dir is long gone. I suspect it's best to just kill that
sentence.
> +/*
> + * Decrement the link count on an inode & log the change.
> + * If this causes the link count to go to zero, initiate the
> + * logging activity required to truncate a file.
> + */
> +int /* error */
> +xfs_droplink(
Can be static now.
> +/*
> + * Increment the link count on an inode & log the change.
> + */
> +int
> +xfs_bumplink(
Same here.
* [PATCH 30/60] xfs: split out inode log item format definition
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (28 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 29/60] xfs: consolidate xfs_utils.c Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 31/60] xfs: split out buf log item format definitions Dave Chinner
` (31 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The log item format definitions are shared with userspace. Split them
out of header files that contain kernel-only definitions to make it
simple to share them.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.h | 63 +------------
fs/xfs/xfs_inode_item.h | 115 +----------------------
fs/xfs/xfs_inode_item_format.h | 198 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 202 insertions(+), 174 deletions(-)
create mode 100644 fs/xfs/xfs_inode_item_format.h
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 739a975..60d55de 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -115,67 +115,7 @@ struct xfs_imap {
* chain off the mount structure by xfs_sync calls.
*/
-typedef struct xfs_ictimestamp {
- __int32_t t_sec; /* timestamp seconds */
- __int32_t t_nsec; /* timestamp nanoseconds */
-} xfs_ictimestamp_t;
-
-/*
- * NOTE: This structure must be kept identical to struct xfs_dinode
- * in xfs_dinode.h except for the endianness annotations.
- */
-typedef struct xfs_icdinode {
- __uint16_t di_magic; /* inode magic # = XFS_DINODE_MAGIC */
- __uint16_t di_mode; /* mode and type of file */
- __int8_t di_version; /* inode version */
- __int8_t di_format; /* format of di_c data */
- __uint16_t di_onlink; /* old number of links to file */
- __uint32_t di_uid; /* owner's user id */
- __uint32_t di_gid; /* owner's group id */
- __uint32_t di_nlink; /* number of links to file */
- __uint16_t di_projid_lo; /* lower part of owner's project id */
- __uint16_t di_projid_hi; /* higher part of owner's project id */
- __uint8_t di_pad[6]; /* unused, zeroed space */
- __uint16_t di_flushiter; /* incremented on flush */
- xfs_ictimestamp_t di_atime; /* time last accessed */
- xfs_ictimestamp_t di_mtime; /* time last modified */
- xfs_ictimestamp_t di_ctime; /* time created/inode modified */
- xfs_fsize_t di_size; /* number of bytes in file */
- xfs_drfsbno_t di_nblocks; /* # of direct & btree blocks used */
- xfs_extlen_t di_extsize; /* basic/minimum extent size for file */
- xfs_extnum_t di_nextents; /* number of extents in data fork */
- xfs_aextnum_t di_anextents; /* number of extents in attribute fork*/
- __uint8_t di_forkoff; /* attr fork offs, <<3 for 64b align */
- __int8_t di_aformat; /* format of attr fork's data */
- __uint32_t di_dmevmask; /* DMIG event mask */
- __uint16_t di_dmstate; /* DMIG state info */
- __uint16_t di_flags; /* random flags, XFS_DIFLAG_... */
- __uint32_t di_gen; /* generation number */
-
- /* di_next_unlinked is the only non-core field in the old dinode */
- xfs_agino_t di_next_unlinked;/* agi unlinked list ptr */
-
- /* start of the extended dinode, writable fields */
- __uint32_t di_crc; /* CRC of the inode */
- __uint64_t di_changecount; /* number of attribute changes */
- xfs_lsn_t di_lsn; /* flush sequence */
- __uint64_t di_flags2; /* more random flags */
- __uint8_t di_pad2[16]; /* more padding for future expansion */
-
- /* fields only written to during inode creation */
- xfs_ictimestamp_t di_crtime; /* time created */
- xfs_ino_t di_ino; /* inode number */
- uuid_t di_uuid; /* UUID of the filesystem */
-
- /* structure must be padded to 64 bit alignment */
-} xfs_icdinode_t;
-
-static inline uint xfs_icdinode_size(int version)
-{
- if (version == 3)
- return sizeof(struct xfs_icdinode);
- return offsetof(struct xfs_icdinode, di_next_unlinked);
-}
+#include "xfs_inode_item_format.h"
/*
* Flags for xfs_ichgtime().
@@ -300,7 +240,6 @@ void xfs_inobp_check(struct xfs_mount *, struct xfs_buf *);
extern struct kmem_zone *xfs_ifork_zone;
extern struct kmem_zone *xfs_inode_zone;
-extern struct kmem_zone *xfs_ili_zone;
extern const struct xfs_buf_ops xfs_inode_buf_ops;
#endif /* __XFS_INODE_H__ */
diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h
index 779812f..65da466 100644
--- a/fs/xfs/xfs_inode_item.h
+++ b/fs/xfs/xfs_inode_item.h
@@ -18,123 +18,15 @@
#ifndef __XFS_INODE_ITEM_H__
#define __XFS_INODE_ITEM_H__
-/*
- * This is the structure used to lay out an inode log item in the
- * log. The size of the inline data/extents/b-tree root to be logged
- * (if any) is indicated in the ilf_dsize field. Changes to this structure
- * must be added on to the end.
- */
-typedef struct xfs_inode_log_format {
- __uint16_t ilf_type; /* inode log item type */
- __uint16_t ilf_size; /* size of this item */
- __uint32_t ilf_fields; /* flags for fields logged */
- __uint16_t ilf_asize; /* size of attr d/ext/root */
- __uint16_t ilf_dsize; /* size of data/ext/root */
- __uint64_t ilf_ino; /* inode number */
- union {
- __uint32_t ilfu_rdev; /* rdev value for dev inode*/
- uuid_t ilfu_uuid; /* mount point value */
- } ilf_u;
- __int64_t ilf_blkno; /* blkno of inode buffer */
- __int32_t ilf_len; /* len of inode buffer */
- __int32_t ilf_boffset; /* off of inode in buffer */
-} xfs_inode_log_format_t;
-
-typedef struct xfs_inode_log_format_32 {
- __uint16_t ilf_type; /* inode log item type */
- __uint16_t ilf_size; /* size of this item */
- __uint32_t ilf_fields; /* flags for fields logged */
- __uint16_t ilf_asize; /* size of attr d/ext/root */
- __uint16_t ilf_dsize; /* size of data/ext/root */
- __uint64_t ilf_ino; /* inode number */
- union {
- __uint32_t ilfu_rdev; /* rdev value for dev inode*/
- uuid_t ilfu_uuid; /* mount point value */
- } ilf_u;
- __int64_t ilf_blkno; /* blkno of inode buffer */
- __int32_t ilf_len; /* len of inode buffer */
- __int32_t ilf_boffset; /* off of inode in buffer */
-} __attribute__((packed)) xfs_inode_log_format_32_t;
-
-typedef struct xfs_inode_log_format_64 {
- __uint16_t ilf_type; /* inode log item type */
- __uint16_t ilf_size; /* size of this item */
- __uint32_t ilf_fields; /* flags for fields logged */
- __uint16_t ilf_asize; /* size of attr d/ext/root */
- __uint16_t ilf_dsize; /* size of data/ext/root */
- __uint32_t ilf_pad; /* pad for 64 bit boundary */
- __uint64_t ilf_ino; /* inode number */
- union {
- __uint32_t ilfu_rdev; /* rdev value for dev inode*/
- uuid_t ilfu_uuid; /* mount point value */
- } ilf_u;
- __int64_t ilf_blkno; /* blkno of inode buffer */
- __int32_t ilf_len; /* len of inode buffer */
- __int32_t ilf_boffset; /* off of inode in buffer */
-} xfs_inode_log_format_64_t;
-
-/*
- * Flags for xfs_trans_log_inode flags field.
- */
-#define XFS_ILOG_CORE 0x001 /* log standard inode fields */
-#define XFS_ILOG_DDATA 0x002 /* log i_df.if_data */
-#define XFS_ILOG_DEXT 0x004 /* log i_df.if_extents */
-#define XFS_ILOG_DBROOT 0x008 /* log i_df.i_broot */
-#define XFS_ILOG_DEV 0x010 /* log the dev field */
-#define XFS_ILOG_UUID 0x020 /* log the uuid field */
-#define XFS_ILOG_ADATA 0x040 /* log i_af.if_data */
-#define XFS_ILOG_AEXT 0x080 /* log i_af.if_extents */
-#define XFS_ILOG_ABROOT 0x100 /* log i_af.i_broot */
-
-
-/*
- * The timestamps are dirty, but not necessarily anything else in the inode
- * core. Unlike the other fields above this one must never make it to disk
- * in the ilf_fields of the inode_log_format, but is purely store in-memory in
- * ili_fields in the inode_log_item.
- */
-#define XFS_ILOG_TIMESTAMP 0x4000
-
-#define XFS_ILOG_NONCORE (XFS_ILOG_DDATA | XFS_ILOG_DEXT | \
- XFS_ILOG_DBROOT | XFS_ILOG_DEV | \
- XFS_ILOG_UUID | XFS_ILOG_ADATA | \
- XFS_ILOG_AEXT | XFS_ILOG_ABROOT)
-
-#define XFS_ILOG_DFORK (XFS_ILOG_DDATA | XFS_ILOG_DEXT | \
- XFS_ILOG_DBROOT)
+#include "xfs_inode_item_format.h"
-#define XFS_ILOG_AFORK (XFS_ILOG_ADATA | XFS_ILOG_AEXT | \
- XFS_ILOG_ABROOT)
-
-#define XFS_ILOG_ALL (XFS_ILOG_CORE | XFS_ILOG_DDATA | \
- XFS_ILOG_DEXT | XFS_ILOG_DBROOT | \
- XFS_ILOG_DEV | XFS_ILOG_UUID | \
- XFS_ILOG_ADATA | XFS_ILOG_AEXT | \
- XFS_ILOG_ABROOT | XFS_ILOG_TIMESTAMP)
-
-static inline int xfs_ilog_fbroot(int w)
-{
- return (w == XFS_DATA_FORK ? XFS_ILOG_DBROOT : XFS_ILOG_ABROOT);
-}
-
-static inline int xfs_ilog_fext(int w)
-{
- return (w == XFS_DATA_FORK ? XFS_ILOG_DEXT : XFS_ILOG_AEXT);
-}
-
-static inline int xfs_ilog_fdata(int w)
-{
- return (w == XFS_DATA_FORK ? XFS_ILOG_DDATA : XFS_ILOG_ADATA);
-}
-
-#ifdef __KERNEL__
+/* kernel only definitions */
struct xfs_buf;
struct xfs_bmbt_rec;
struct xfs_inode;
struct xfs_mount;
-
typedef struct xfs_inode_log_item {
xfs_log_item_t ili_item; /* common portion */
struct xfs_inode *ili_inode; /* inode ptr */
@@ -151,7 +43,6 @@ typedef struct xfs_inode_log_item {
xfs_inode_log_format_t ili_format; /* logged structure */
} xfs_inode_log_item_t;
-
static inline int xfs_inode_clean(xfs_inode_t *ip)
{
return !ip->i_itemp || !(ip->i_itemp->ili_fields & XFS_ILOG_ALL);
@@ -165,6 +56,6 @@ extern void xfs_iflush_abort(struct xfs_inode *, bool);
extern int xfs_inode_item_format_convert(xfs_log_iovec_t *,
xfs_inode_log_format_t *);
-#endif /* __KERNEL__ */
+extern struct kmem_zone *xfs_ili_zone;
#endif /* __XFS_INODE_ITEM_H__ */
diff --git a/fs/xfs/xfs_inode_item_format.h b/fs/xfs/xfs_inode_item_format.h
new file mode 100644
index 0000000..835121d
--- /dev/null
+++ b/fs/xfs/xfs_inode_item_format.h
@@ -0,0 +1,198 @@
+/*
+ * Copyright (c) 2000,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_INODE_ITEM_FORMAT_H__
+#define __XFS_INODE_ITEM_FORMAT_H__
+
+/*
+ * This is the structure used to lay out an inode log item in the
+ * log. The size of the inline data/extents/b-tree root to be logged
+ * (if any) is indicated in the ilf_dsize field. Changes to this structure
+ * must be added on to the end.
+ */
+typedef struct xfs_inode_log_format {
+ __uint16_t ilf_type; /* inode log item type */
+ __uint16_t ilf_size; /* size of this item */
+ __uint32_t ilf_fields; /* flags for fields logged */
+ __uint16_t ilf_asize; /* size of attr d/ext/root */
+ __uint16_t ilf_dsize; /* size of data/ext/root */
+ __uint64_t ilf_ino; /* inode number */
+ union {
+ __uint32_t ilfu_rdev; /* rdev value for dev inode*/
+ uuid_t ilfu_uuid; /* mount point value */
+ } ilf_u;
+ __int64_t ilf_blkno; /* blkno of inode buffer */
+ __int32_t ilf_len; /* len of inode buffer */
+ __int32_t ilf_boffset; /* off of inode in buffer */
+} xfs_inode_log_format_t;
+
+typedef struct xfs_inode_log_format_32 {
+ __uint16_t ilf_type; /* inode log item type */
+ __uint16_t ilf_size; /* size of this item */
+ __uint32_t ilf_fields; /* flags for fields logged */
+ __uint16_t ilf_asize; /* size of attr d/ext/root */
+ __uint16_t ilf_dsize; /* size of data/ext/root */
+ __uint64_t ilf_ino; /* inode number */
+ union {
+ __uint32_t ilfu_rdev; /* rdev value for dev inode*/
+ uuid_t ilfu_uuid; /* mount point value */
+ } ilf_u;
+ __int64_t ilf_blkno; /* blkno of inode buffer */
+ __int32_t ilf_len; /* len of inode buffer */
+ __int32_t ilf_boffset; /* off of inode in buffer */
+} __attribute__((packed)) xfs_inode_log_format_32_t;
+
+typedef struct xfs_inode_log_format_64 {
+ __uint16_t ilf_type; /* inode log item type */
+ __uint16_t ilf_size; /* size of this item */
+ __uint32_t ilf_fields; /* flags for fields logged */
+ __uint16_t ilf_asize; /* size of attr d/ext/root */
+ __uint16_t ilf_dsize; /* size of data/ext/root */
+ __uint32_t ilf_pad; /* pad for 64 bit boundary */
+ __uint64_t ilf_ino; /* inode number */
+ union {
+ __uint32_t ilfu_rdev; /* rdev value for dev inode*/
+ uuid_t ilfu_uuid; /* mount point value */
+ } ilf_u;
+ __int64_t ilf_blkno; /* blkno of inode buffer */
+ __int32_t ilf_len; /* len of inode buffer */
+ __int32_t ilf_boffset; /* off of inode in buffer */
+} xfs_inode_log_format_64_t;
+
+/*
+ * Flags for xfs_trans_log_inode flags field.
+ */
+#define XFS_ILOG_CORE 0x001 /* log standard inode fields */
+#define XFS_ILOG_DDATA 0x002 /* log i_df.if_data */
+#define XFS_ILOG_DEXT 0x004 /* log i_df.if_extents */
+#define XFS_ILOG_DBROOT 0x008 /* log i_df.i_broot */
+#define XFS_ILOG_DEV 0x010 /* log the dev field */
+#define XFS_ILOG_UUID 0x020 /* log the uuid field */
+#define XFS_ILOG_ADATA 0x040 /* log i_af.if_data */
+#define XFS_ILOG_AEXT 0x080 /* log i_af.if_extents */
+#define XFS_ILOG_ABROOT 0x100 /* log i_af.i_broot */
+
+
+/*
+ * The timestamps are dirty, but not necessarily anything else in the inode
+ * core. Unlike the other fields above this one must never make it to disk
+ * in the ilf_fields of the inode_log_format, but is purely store in-memory in
+ * ili_fields in the inode_log_item.
+ */
+#define XFS_ILOG_TIMESTAMP 0x4000
+
+#define XFS_ILOG_NONCORE (XFS_ILOG_DDATA | XFS_ILOG_DEXT | \
+ XFS_ILOG_DBROOT | XFS_ILOG_DEV | \
+ XFS_ILOG_UUID | XFS_ILOG_ADATA | \
+ XFS_ILOG_AEXT | XFS_ILOG_ABROOT)
+
+#define XFS_ILOG_DFORK (XFS_ILOG_DDATA | XFS_ILOG_DEXT | \
+ XFS_ILOG_DBROOT)
+
+#define XFS_ILOG_AFORK (XFS_ILOG_ADATA | XFS_ILOG_AEXT | \
+ XFS_ILOG_ABROOT)
+
+#define XFS_ILOG_ALL (XFS_ILOG_CORE | XFS_ILOG_DDATA | \
+ XFS_ILOG_DEXT | XFS_ILOG_DBROOT | \
+ XFS_ILOG_DEV | XFS_ILOG_UUID | \
+ XFS_ILOG_ADATA | XFS_ILOG_AEXT | \
+ XFS_ILOG_ABROOT | XFS_ILOG_TIMESTAMP)
+
+static inline int xfs_ilog_fbroot(int w)
+{
+ return (w == XFS_DATA_FORK ? XFS_ILOG_DBROOT : XFS_ILOG_ABROOT);
+}
+
+static inline int xfs_ilog_fext(int w)
+{
+ return (w == XFS_DATA_FORK ? XFS_ILOG_DEXT : XFS_ILOG_AEXT);
+}
+
+static inline int xfs_ilog_fdata(int w)
+{
+ return (w == XFS_DATA_FORK ? XFS_ILOG_DDATA : XFS_ILOG_ADATA);
+}
+
+/*
+ * Incore version of the on-disk inode core structures. We log this directly
+ * into the journal in host CPU format (for better or worse) and as such
+ * directly mirrors the xfs_dinode structure as it must contain all the same
+ * information.
+ */
+typedef struct xfs_ictimestamp {
+ __int32_t t_sec; /* timestamp seconds */
+ __int32_t t_nsec; /* timestamp nanoseconds */
+} xfs_ictimestamp_t;
+
+/*
+ * NOTE: This structure must be kept identical to struct xfs_dinode
+ * in xfs_dinode.h except for the endianness annotations.
+ */
+typedef struct xfs_icdinode {
+ __uint16_t di_magic; /* inode magic # = XFS_DINODE_MAGIC */
+ __uint16_t di_mode; /* mode and type of file */
+ __int8_t di_version; /* inode version */
+ __int8_t di_format; /* format of di_c data */
+ __uint16_t di_onlink; /* old number of links to file */
+ __uint32_t di_uid; /* owner's user id */
+ __uint32_t di_gid; /* owner's group id */
+ __uint32_t di_nlink; /* number of links to file */
+ __uint16_t di_projid_lo; /* lower part of owner's project id */
+ __uint16_t di_projid_hi; /* higher part of owner's project id */
+ __uint8_t di_pad[6]; /* unused, zeroed space */
+ __uint16_t di_flushiter; /* incremented on flush */
+ xfs_ictimestamp_t di_atime; /* time last accessed */
+ xfs_ictimestamp_t di_mtime; /* time last modified */
+ xfs_ictimestamp_t di_ctime; /* time created/inode modified */
+ xfs_fsize_t di_size; /* number of bytes in file */
+ xfs_drfsbno_t di_nblocks; /* # of direct & btree blocks used */
+ xfs_extlen_t di_extsize; /* basic/minimum extent size for file */
+ xfs_extnum_t di_nextents; /* number of extents in data fork */
+ xfs_aextnum_t di_anextents; /* number of extents in attribute fork*/
+ __uint8_t di_forkoff; /* attr fork offs, <<3 for 64b align */
+ __int8_t di_aformat; /* format of attr fork's data */
+ __uint32_t di_dmevmask; /* DMIG event mask */
+ __uint16_t di_dmstate; /* DMIG state info */
+ __uint16_t di_flags; /* random flags, XFS_DIFLAG_... */
+ __uint32_t di_gen; /* generation number */
+
+ /* di_next_unlinked is the only non-core field in the old dinode */
+ xfs_agino_t di_next_unlinked;/* agi unlinked list ptr */
+
+ /* start of the extended dinode, writable fields */
+ __uint32_t di_crc; /* CRC of the inode */
+ __uint64_t di_changecount; /* number of attribute changes */
+ xfs_lsn_t di_lsn; /* flush sequence */
+ __uint64_t di_flags2; /* more random flags */
+ __uint8_t di_pad2[16]; /* more padding for future expansion */
+
+ /* fields only written to during inode creation */
+ xfs_ictimestamp_t di_crtime; /* time created */
+ xfs_ino_t di_ino; /* inode number */
+ uuid_t di_uuid; /* UUID of the filesystem */
+
+ /* structure must be padded to 64 bit alignment */
+} xfs_icdinode_t;
+
+static inline uint xfs_icdinode_size(int version)
+{
+ if (version == 3)
+ return sizeof(struct xfs_icdinode);
+ return offsetof(struct xfs_icdinode, di_next_unlinked);
+}
+
+#endif /* __XFS_INODE_ITEM_FORMAT_H__ */
--
1.7.10.4
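As an aside on xfs_icdinode_size() above: the idea is that older inode versions only log the structure up to the first non-core field, while version 3 logs the whole thing. A minimal sketch of that pattern follows, assuming a much smaller made-up structure; demo_icdinode, its field names and demo_icdinode_size() are illustrative and not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xfs_icdinode. */
struct demo_icdinode {
	/* "core" fields present in every version */
	uint16_t	magic;
	uint16_t	mode;
	uint32_t	gen;

	/* first field that is not part of the old core */
	uint32_t	next_unlinked;

	/* fields that only exist in the extended (version 3) layout */
	uint32_t	crc;
	uint64_t	changecount;
};

/*
 * Version 3 uses the full structure; anything older stops at the
 * offset of the first non-core field.
 */
static inline size_t demo_icdinode_size(int version)
{
	if (version == 3)
		return sizeof(struct demo_icdinode);
	return offsetof(struct demo_icdinode, next_unlinked);
}
```

Using offsetof() rather than a hand-counted constant means the short size follows the structure layout automatically if core fields are ever rearranged.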
* [PATCH 31/60] xfs: split out buf log item format definitions
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (29 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 30/60] xfs: split out inode log item format definition Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 32/60] xfs: move inode fork definitions to a new header file Dave Chinner
` (30 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_buf_item.h | 100 ++-----------------------------------
fs/xfs/xfs_buf_item_format.h | 111 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 115 insertions(+), 96 deletions(-)
create mode 100644 fs/xfs/xfs_buf_item_format.h
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index 0f1c247..97cbb01 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -18,101 +18,11 @@
#ifndef __XFS_BUF_ITEM_H__
#define __XFS_BUF_ITEM_H__
-extern kmem_zone_t *xfs_buf_item_zone;
-
-/*
- * This flag indicates that the buffer contains on disk inodes
- * and requires special recovery handling.
- */
-#define XFS_BLF_INODE_BUF (1<<0)
-/*
- * This flag indicates that the buffer should not be replayed
- * during recovery because its blocks are being freed.
- */
-#define XFS_BLF_CANCEL (1<<1)
-
-/*
- * This flag indicates that the buffer contains on disk
- * user or group dquots and may require special recovery handling.
- */
-#define XFS_BLF_UDQUOT_BUF (1<<2)
-#define XFS_BLF_PDQUOT_BUF (1<<3)
-#define XFS_BLF_GDQUOT_BUF (1<<4)
-
-#define XFS_BLF_CHUNK 128
-#define XFS_BLF_SHIFT 7
-#define BIT_TO_WORD_SHIFT 5
-#define NBWORD (NBBY * sizeof(unsigned int))
-
-/*
- * This is the structure used to lay out a buf log item in the
- * log. The data map describes which 128 byte chunks of the buffer
- * have been logged.
- */
-#define XFS_BLF_DATAMAP_SIZE ((XFS_MAX_BLOCKSIZE / XFS_BLF_CHUNK) / NBWORD)
-
-typedef struct xfs_buf_log_format {
- unsigned short blf_type; /* buf log item type indicator */
- unsigned short blf_size; /* size of this item */
- ushort blf_flags; /* misc state */
- ushort blf_len; /* number of blocks in this buf */
- __int64_t blf_blkno; /* starting blkno of this buf */
- unsigned int blf_map_size; /* used size of data bitmap in words */
- unsigned int blf_data_map[XFS_BLF_DATAMAP_SIZE]; /* dirty bitmap */
-} xfs_buf_log_format_t;
-
-/*
- * All buffers now need to tell recovery where the magic number
- * is so that it can verify and calculate the CRCs on the buffer correctly
- * once the changes have been replayed into the buffer.
- *
- * The type value is held in the upper 5 bits of the blf_flags field, which is
- * an unsigned 16 bit field. Hence we need to shift it 11 bits up and down.
- */
-#define XFS_BLFT_BITS 5
-#define XFS_BLFT_SHIFT 11
-#define XFS_BLFT_MASK (((1 << XFS_BLFT_BITS) - 1) << XFS_BLFT_SHIFT)
+#include "xfs_buf_item_format.h"
-enum xfs_blft {
- XFS_BLFT_UNKNOWN_BUF = 0,
- XFS_BLFT_UDQUOT_BUF,
- XFS_BLFT_PDQUOT_BUF,
- XFS_BLFT_GDQUOT_BUF,
- XFS_BLFT_BTREE_BUF,
- XFS_BLFT_AGF_BUF,
- XFS_BLFT_AGFL_BUF,
- XFS_BLFT_AGI_BUF,
- XFS_BLFT_DINO_BUF,
- XFS_BLFT_SYMLINK_BUF,
- XFS_BLFT_DIR_BLOCK_BUF,
- XFS_BLFT_DIR_DATA_BUF,
- XFS_BLFT_DIR_FREE_BUF,
- XFS_BLFT_DIR_LEAF1_BUF,
- XFS_BLFT_DIR_LEAFN_BUF,
- XFS_BLFT_DA_NODE_BUF,
- XFS_BLFT_ATTR_LEAF_BUF,
- XFS_BLFT_ATTR_RMT_BUF,
- XFS_BLFT_SB_BUF,
- XFS_BLFT_MAX_BUF = (1 << XFS_BLFT_BITS),
-};
+/* kernel only definitions */
-static inline void
-xfs_blft_to_flags(struct xfs_buf_log_format *blf, enum xfs_blft type)
-{
- ASSERT(type > XFS_BLFT_UNKNOWN_BUF && type < XFS_BLFT_MAX_BUF);
- blf->blf_flags &= ~XFS_BLFT_MASK;
- blf->blf_flags |= ((type << XFS_BLFT_SHIFT) & XFS_BLFT_MASK);
-}
-
-static inline __uint16_t
-xfs_blft_from_flags(struct xfs_buf_log_format *blf)
-{
- return (blf->blf_flags & XFS_BLFT_MASK) >> XFS_BLFT_SHIFT;
-}
-
-/*
- * buf log item flags
- */
+/* buf log item flags */
#define XFS_BLI_HOLD 0x01
#define XFS_BLI_DIRTY 0x02
#define XFS_BLI_STALE 0x04
@@ -133,8 +43,6 @@ xfs_blft_from_flags(struct xfs_buf_log_format *blf)
{ XFS_BLI_ORDERED, "ORDERED" }
-#ifdef __KERNEL__
-
struct xfs_buf;
struct xfs_mount;
struct xfs_buf_log_item;
@@ -169,6 +77,6 @@ void xfs_trans_buf_set_type(struct xfs_trans *, struct xfs_buf *,
enum xfs_blft);
void xfs_trans_buf_copy_type(struct xfs_buf *dst_bp, struct xfs_buf *src_bp);
-#endif /* __KERNEL__ */
+extern kmem_zone_t *xfs_buf_item_zone;
#endif /* __XFS_BUF_ITEM_H__ */
diff --git a/fs/xfs/xfs_buf_item_format.h b/fs/xfs/xfs_buf_item_format.h
new file mode 100644
index 0000000..e2e9474
--- /dev/null
+++ b/fs/xfs/xfs_buf_item_format.h
@@ -0,0 +1,111 @@
+/*
+ * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_BUF_ITEM_FORMAT_H__
+#define __XFS_BUF_ITEM_FORMAT_H__
+
+/*
+ * This flag indicates that the buffer contains on disk inodes
+ * and requires special recovery handling.
+ */
+#define XFS_BLF_INODE_BUF (1<<0)
+/*
+ * This flag indicates that the buffer should not be replayed
+ * during recovery because its blocks are being freed.
+ */
+#define XFS_BLF_CANCEL (1<<1)
+
+/*
+ * This flag indicates that the buffer contains on disk
+ * user or group dquots and may require special recovery handling.
+ */
+#define XFS_BLF_UDQUOT_BUF (1<<2)
+#define XFS_BLF_PDQUOT_BUF (1<<3)
+#define XFS_BLF_GDQUOT_BUF (1<<4)
+
+#define XFS_BLF_CHUNK 128
+#define XFS_BLF_SHIFT 7
+#define BIT_TO_WORD_SHIFT 5
+#define NBWORD (NBBY * sizeof(unsigned int))
+
+/*
+ * This is the structure used to lay out a buf log item in the
+ * log. The data map describes which 128 byte chunks of the buffer
+ * have been logged.
+ */
+#define XFS_BLF_DATAMAP_SIZE ((XFS_MAX_BLOCKSIZE / XFS_BLF_CHUNK) / NBWORD)
+
+typedef struct xfs_buf_log_format {
+ unsigned short blf_type; /* buf log item type indicator */
+ unsigned short blf_size; /* size of this item */
+ ushort blf_flags; /* misc state */
+ ushort blf_len; /* number of blocks in this buf */
+ __int64_t blf_blkno; /* starting blkno of this buf */
+ unsigned int blf_map_size; /* used size of data bitmap in words */
+ unsigned int blf_data_map[XFS_BLF_DATAMAP_SIZE]; /* dirty bitmap */
+} xfs_buf_log_format_t;
+
+/*
+ * All buffers now need to tell recovery where the magic number
+ * is so that it can verify and calculate the CRCs on the buffer correctly
+ * once the changes have been replayed into the buffer.
+ *
+ * The type value is held in the upper 5 bits of the blf_flags field, which is
+ * an unsigned 16 bit field. Hence we need to shift it 11 bits up and down.
+ */
+#define XFS_BLFT_BITS 5
+#define XFS_BLFT_SHIFT 11
+#define XFS_BLFT_MASK (((1 << XFS_BLFT_BITS) - 1) << XFS_BLFT_SHIFT)
+
+enum xfs_blft {
+ XFS_BLFT_UNKNOWN_BUF = 0,
+ XFS_BLFT_UDQUOT_BUF,
+ XFS_BLFT_PDQUOT_BUF,
+ XFS_BLFT_GDQUOT_BUF,
+ XFS_BLFT_BTREE_BUF,
+ XFS_BLFT_AGF_BUF,
+ XFS_BLFT_AGFL_BUF,
+ XFS_BLFT_AGI_BUF,
+ XFS_BLFT_DINO_BUF,
+ XFS_BLFT_SYMLINK_BUF,
+ XFS_BLFT_DIR_BLOCK_BUF,
+ XFS_BLFT_DIR_DATA_BUF,
+ XFS_BLFT_DIR_FREE_BUF,
+ XFS_BLFT_DIR_LEAF1_BUF,
+ XFS_BLFT_DIR_LEAFN_BUF,
+ XFS_BLFT_DA_NODE_BUF,
+ XFS_BLFT_ATTR_LEAF_BUF,
+ XFS_BLFT_ATTR_RMT_BUF,
+ XFS_BLFT_SB_BUF,
+ XFS_BLFT_MAX_BUF = (1 << XFS_BLFT_BITS),
+};
+
+static inline void
+xfs_blft_to_flags(struct xfs_buf_log_format *blf, enum xfs_blft type)
+{
+ ASSERT(type > XFS_BLFT_UNKNOWN_BUF && type < XFS_BLFT_MAX_BUF);
+ blf->blf_flags &= ~XFS_BLFT_MASK;
+ blf->blf_flags |= ((type << XFS_BLFT_SHIFT) & XFS_BLFT_MASK);
+}
+
+static inline __uint16_t
+xfs_blft_from_flags(struct xfs_buf_log_format *blf)
+{
+ return (blf->blf_flags & XFS_BLFT_MASK) >> XFS_BLFT_SHIFT;
+}
+
+#endif /* __XFS_BUF_ITEM_FORMAT_H__ */
--
1.7.10.4
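
For readers who want to try the type-in-flags encoding from the patch above outside the kernel, here is a minimal userspace sketch. It assumes the same layout the patch describes (a 5-bit type in the upper bits of a 16-bit flags field, shifted by 11). The demo_/DEMO_ names are hypothetical stand-ins and are not part of XFS. The helpers also take and return the flags value rather than a struct pointer, purely to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for XFS_BLFT_BITS/SHIFT/MASK; not kernel names. */
#define DEMO_BLFT_BITS	5
#define DEMO_BLFT_SHIFT	11
#define DEMO_BLFT_MASK	(((1u << DEMO_BLFT_BITS) - 1) << DEMO_BLFT_SHIFT)

/*
 * Store a non-zero 5-bit type in the top bits of a 16-bit flags word,
 * leaving the low 11 flag bits untouched.
 */
static inline uint16_t
demo_type_to_flags(uint16_t flags, unsigned int type)
{
	assert(type > 0 && type < (1u << DEMO_BLFT_BITS));
	flags &= (uint16_t)~DEMO_BLFT_MASK;	/* drop any previous type */
	flags |= (uint16_t)((type << DEMO_BLFT_SHIFT) & DEMO_BLFT_MASK);
	return flags;
}

/* Recover the type by masking off the low bits and shifting back down. */
static inline unsigned int
demo_type_from_flags(uint16_t flags)
{
	return (flags & DEMO_BLFT_MASK) >> DEMO_BLFT_SHIFT;
}
```

Because the mask is cleared before the new type is ORed in, setting a type twice replaces the old value instead of mixing the two, and the low 11 bits survive both operations.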
* [PATCH 32/60] xfs: move inode fork definitions to a new header file
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (30 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 31/60] xfs: split out buf log item format definitions Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 33/60] xfs: move unrealted definitions out of xfs_inode.h Dave Chinner
` (29 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The inode fork definitions are a combination of on-disk format
definition and in-memory tracking and manipulation. They are both
shared with userspace, so move them all into their own file so
sharing is easy to do and track. This removes all inode fork
related information from xfs_inode.h.
Do the same for all the C code that currently resides in
xfs_inode.c for the same reason.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_inode.c | 1877 +--------------------------------------------
fs/xfs/xfs_inode.h | 148 +---
fs/xfs/xfs_inode_fork.c | 1921 +++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_inode_fork.h | 169 +++++
fs/xfs/xfs_inode_ops.h | 6 +
6 files changed, 2101 insertions(+), 2021 deletions(-)
create mode 100644 fs/xfs/xfs_inode_fork.c
create mode 100644 fs/xfs/xfs_inode_fork.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 66bbeb3..00723b7 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -77,6 +77,7 @@ xfs-y += xfs_alloc.o \
xfs_ialloc_btree.o \
xfs_icreate_item.o \
xfs_inode.o \
+ xfs_inode_fork.o \
xfs_log_recover.o \
xfs_sb.o \
xfs_symlink.o \
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 8740562..d9f8d37 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -46,47 +46,14 @@
#include "xfs_trace.h"
#include "xfs_icache.h"
-kmem_zone_t *xfs_ifork_zone;
kmem_zone_t *xfs_inode_zone;
-STATIC int xfs_iformat_local(xfs_inode_t *, xfs_dinode_t *, int, int);
-STATIC int xfs_iformat_extents(xfs_inode_t *, xfs_dinode_t *, int);
-STATIC int xfs_iformat_btree(xfs_inode_t *, xfs_dinode_t *, int);
-
-#ifdef DEBUG
-/*
- * Make sure that the extents in the given memory buffer
- * are valid.
- */
-STATIC void
-xfs_validate_extents(
- xfs_ifork_t *ifp,
- int nrecs,
- xfs_exntfmt_t fmt)
-{
- xfs_bmbt_irec_t irec;
- xfs_bmbt_rec_host_t rec;
- int i;
-
- for (i = 0; i < nrecs; i++) {
- xfs_bmbt_rec_host_t *ep = xfs_iext_get_ext(ifp, i);
- rec.l0 = get_unaligned(&ep->l0);
- rec.l1 = get_unaligned(&ep->l1);
- xfs_bmbt_get_all(&rec, &irec);
- if (fmt == XFS_EXTFMT_NOSTATE)
- ASSERT(irec.br_state == XFS_EXT_NORM);
- }
-}
-#else /* DEBUG */
-#define xfs_validate_extents(ifp, nrecs, fmt)
-#endif /* DEBUG */
-
/*
* Check that none of the inode's in the buffer have a next
* unlinked field of 0.
*/
#if defined(DEBUG)
-void
+STATIC void
xfs_inobp_check(
xfs_mount_t *mp,
xfs_buf_t *bp)
@@ -215,351 +182,6 @@ xfs_imap_to_bp(
return 0;
}
-/*
- * Move inode type and inode format specific information from the
- * on-disk inode to the in-core inode. For fifos, devs, and sockets
- * this means set if_rdev to the proper value. For files, directories,
- * and symlinks this means to bring in the in-line data or extent
- * pointers. For a file in B-tree format, only the root is immediately
- * brought in-core. The rest will be in-lined in if_extents when it
- * is first referenced (see xfs_iread_extents()).
- */
-STATIC int
-xfs_iformat(
- xfs_inode_t *ip,
- xfs_dinode_t *dip)
-{
- xfs_attr_shortform_t *atp;
- int size;
- int error = 0;
- xfs_fsize_t di_size;
-
- if (unlikely(be32_to_cpu(dip->di_nextents) +
- be16_to_cpu(dip->di_anextents) >
- be64_to_cpu(dip->di_nblocks))) {
- xfs_warn(ip->i_mount,
- "corrupt dinode %Lu, extent total = %d, nblocks = %Lu.",
- (unsigned long long)ip->i_ino,
- (int)(be32_to_cpu(dip->di_nextents) +
- be16_to_cpu(dip->di_anextents)),
- (unsigned long long)
- be64_to_cpu(dip->di_nblocks));
- XFS_CORRUPTION_ERROR("xfs_iformat(1)", XFS_ERRLEVEL_LOW,
- ip->i_mount, dip);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- if (unlikely(dip->di_forkoff > ip->i_mount->m_sb.sb_inodesize)) {
- xfs_warn(ip->i_mount, "corrupt dinode %Lu, forkoff = 0x%x.",
- (unsigned long long)ip->i_ino,
- dip->di_forkoff);
- XFS_CORRUPTION_ERROR("xfs_iformat(2)", XFS_ERRLEVEL_LOW,
- ip->i_mount, dip);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- if (unlikely((ip->i_d.di_flags & XFS_DIFLAG_REALTIME) &&
- !ip->i_mount->m_rtdev_targp)) {
- xfs_warn(ip->i_mount,
- "corrupt dinode %Lu, has realtime flag set.",
- ip->i_ino);
- XFS_CORRUPTION_ERROR("xfs_iformat(realtime)",
- XFS_ERRLEVEL_LOW, ip->i_mount, dip);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- switch (ip->i_d.di_mode & S_IFMT) {
- case S_IFIFO:
- case S_IFCHR:
- case S_IFBLK:
- case S_IFSOCK:
- if (unlikely(dip->di_format != XFS_DINODE_FMT_DEV)) {
- XFS_CORRUPTION_ERROR("xfs_iformat(3)", XFS_ERRLEVEL_LOW,
- ip->i_mount, dip);
- return XFS_ERROR(EFSCORRUPTED);
- }
- ip->i_d.di_size = 0;
- ip->i_df.if_u2.if_rdev = xfs_dinode_get_rdev(dip);
- break;
-
- case S_IFREG:
- case S_IFLNK:
- case S_IFDIR:
- switch (dip->di_format) {
- case XFS_DINODE_FMT_LOCAL:
- /*
- * no local regular files yet
- */
- if (unlikely(S_ISREG(be16_to_cpu(dip->di_mode)))) {
- xfs_warn(ip->i_mount,
- "corrupt inode %Lu (local format for regular file).",
- (unsigned long long) ip->i_ino);
- XFS_CORRUPTION_ERROR("xfs_iformat(4)",
- XFS_ERRLEVEL_LOW,
- ip->i_mount, dip);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- di_size = be64_to_cpu(dip->di_size);
- if (unlikely(di_size > XFS_DFORK_DSIZE(dip, ip->i_mount))) {
- xfs_warn(ip->i_mount,
- "corrupt inode %Lu (bad size %Ld for local inode).",
- (unsigned long long) ip->i_ino,
- (long long) di_size);
- XFS_CORRUPTION_ERROR("xfs_iformat(5)",
- XFS_ERRLEVEL_LOW,
- ip->i_mount, dip);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- size = (int)di_size;
- error = xfs_iformat_local(ip, dip, XFS_DATA_FORK, size);
- break;
- case XFS_DINODE_FMT_EXTENTS:
- error = xfs_iformat_extents(ip, dip, XFS_DATA_FORK);
- break;
- case XFS_DINODE_FMT_BTREE:
- error = xfs_iformat_btree(ip, dip, XFS_DATA_FORK);
- break;
- default:
- XFS_ERROR_REPORT("xfs_iformat(6)", XFS_ERRLEVEL_LOW,
- ip->i_mount);
- return XFS_ERROR(EFSCORRUPTED);
- }
- break;
-
- default:
- XFS_ERROR_REPORT("xfs_iformat(7)", XFS_ERRLEVEL_LOW, ip->i_mount);
- return XFS_ERROR(EFSCORRUPTED);
- }
- if (error) {
- return error;
- }
- if (!XFS_DFORK_Q(dip))
- return 0;
-
- ASSERT(ip->i_afp == NULL);
- ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, KM_SLEEP | KM_NOFS);
-
- switch (dip->di_aformat) {
- case XFS_DINODE_FMT_LOCAL:
- atp = (xfs_attr_shortform_t *)XFS_DFORK_APTR(dip);
- size = be16_to_cpu(atp->hdr.totsize);
-
- if (unlikely(size < sizeof(struct xfs_attr_sf_hdr))) {
- xfs_warn(ip->i_mount,
- "corrupt inode %Lu (bad attr fork size %Ld).",
- (unsigned long long) ip->i_ino,
- (long long) size);
- XFS_CORRUPTION_ERROR("xfs_iformat(8)",
- XFS_ERRLEVEL_LOW,
- ip->i_mount, dip);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- error = xfs_iformat_local(ip, dip, XFS_ATTR_FORK, size);
- break;
- case XFS_DINODE_FMT_EXTENTS:
- error = xfs_iformat_extents(ip, dip, XFS_ATTR_FORK);
- break;
- case XFS_DINODE_FMT_BTREE:
- error = xfs_iformat_btree(ip, dip, XFS_ATTR_FORK);
- break;
- default:
- error = XFS_ERROR(EFSCORRUPTED);
- break;
- }
- if (error) {
- kmem_zone_free(xfs_ifork_zone, ip->i_afp);
- ip->i_afp = NULL;
- xfs_idestroy_fork(ip, XFS_DATA_FORK);
- }
- return error;
-}
-
-/*
- * The file is in-lined in the on-disk inode.
- * If it fits into if_inline_data, then copy
- * it there, otherwise allocate a buffer for it
- * and copy the data there. Either way, set
- * if_data to point at the data.
- * If we allocate a buffer for the data, make
- * sure that its size is a multiple of 4 and
- * record the real size in i_real_bytes.
- */
-STATIC int
-xfs_iformat_local(
- xfs_inode_t *ip,
- xfs_dinode_t *dip,
- int whichfork,
- int size)
-{
- xfs_ifork_t *ifp;
- int real_size;
-
- /*
- * If the size is unreasonable, then something
- * is wrong and we just bail out rather than crash in
- * kmem_alloc() or memcpy() below.
- */
- if (unlikely(size > XFS_DFORK_SIZE(dip, ip->i_mount, whichfork))) {
- xfs_warn(ip->i_mount,
- "corrupt inode %Lu (bad size %d for local fork, size = %d).",
- (unsigned long long) ip->i_ino, size,
- XFS_DFORK_SIZE(dip, ip->i_mount, whichfork));
- XFS_CORRUPTION_ERROR("xfs_iformat_local", XFS_ERRLEVEL_LOW,
- ip->i_mount, dip);
- return XFS_ERROR(EFSCORRUPTED);
- }
- ifp = XFS_IFORK_PTR(ip, whichfork);
- real_size = 0;
- if (size == 0)
- ifp->if_u1.if_data = NULL;
- else if (size <= sizeof(ifp->if_u2.if_inline_data))
- ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
- else {
- real_size = roundup(size, 4);
- ifp->if_u1.if_data = kmem_alloc(real_size, KM_SLEEP | KM_NOFS);
- }
- ifp->if_bytes = size;
- ifp->if_real_bytes = real_size;
- if (size)
- memcpy(ifp->if_u1.if_data, XFS_DFORK_PTR(dip, whichfork), size);
- ifp->if_flags &= ~XFS_IFEXTENTS;
- ifp->if_flags |= XFS_IFINLINE;
- return 0;
-}
-
-/*
- * The file consists of a set of extents all
- * of which fit into the on-disk inode.
- * If there are few enough extents to fit into
- * the if_inline_ext, then copy them there.
- * Otherwise allocate a buffer for them and copy
- * them into it. Either way, set if_extents
- * to point at the extents.
- */
-STATIC int
-xfs_iformat_extents(
- xfs_inode_t *ip,
- xfs_dinode_t *dip,
- int whichfork)
-{
- xfs_bmbt_rec_t *dp;
- xfs_ifork_t *ifp;
- int nex;
- int size;
- int i;
-
- ifp = XFS_IFORK_PTR(ip, whichfork);
- nex = XFS_DFORK_NEXTENTS(dip, whichfork);
- size = nex * (uint)sizeof(xfs_bmbt_rec_t);
-
- /*
- * If the number of extents is unreasonable, then something
- * is wrong and we just bail out rather than crash in
- * kmem_alloc() or memcpy() below.
- */
- if (unlikely(size < 0 || size > XFS_DFORK_SIZE(dip, ip->i_mount, whichfork))) {
- xfs_warn(ip->i_mount, "corrupt inode %Lu ((a)extents = %d).",
- (unsigned long long) ip->i_ino, nex);
- XFS_CORRUPTION_ERROR("xfs_iformat_extents(1)", XFS_ERRLEVEL_LOW,
- ip->i_mount, dip);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- ifp->if_real_bytes = 0;
- if (nex == 0)
- ifp->if_u1.if_extents = NULL;
- else if (nex <= XFS_INLINE_EXTS)
- ifp->if_u1.if_extents = ifp->if_u2.if_inline_ext;
- else
- xfs_iext_add(ifp, 0, nex);
-
- ifp->if_bytes = size;
- if (size) {
- dp = (xfs_bmbt_rec_t *) XFS_DFORK_PTR(dip, whichfork);
- xfs_validate_extents(ifp, nex, XFS_EXTFMT_INODE(ip));
- for (i = 0; i < nex; i++, dp++) {
- xfs_bmbt_rec_host_t *ep = xfs_iext_get_ext(ifp, i);
- ep->l0 = get_unaligned_be64(&dp->l0);
- ep->l1 = get_unaligned_be64(&dp->l1);
- }
- XFS_BMAP_TRACE_EXLIST(ip, nex, whichfork);
- if (whichfork != XFS_DATA_FORK ||
- XFS_EXTFMT_INODE(ip) == XFS_EXTFMT_NOSTATE)
- if (unlikely(xfs_check_nostate_extents(
- ifp, 0, nex))) {
- XFS_ERROR_REPORT("xfs_iformat_extents(2)",
- XFS_ERRLEVEL_LOW,
- ip->i_mount);
- return XFS_ERROR(EFSCORRUPTED);
- }
- }
- ifp->if_flags |= XFS_IFEXTENTS;
- return 0;
-}
-
-/*
- * The file has too many extents to fit into
- * the inode, so they are in B-tree format.
- * Allocate a buffer for the root of the B-tree
- * and copy the root into it. The i_extents
- * field will remain NULL until all of the
- * extents are read in (when they are needed).
- */
-STATIC int
-xfs_iformat_btree(
- xfs_inode_t *ip,
- xfs_dinode_t *dip,
- int whichfork)
-{
- struct xfs_mount *mp = ip->i_mount;
- xfs_bmdr_block_t *dfp;
- xfs_ifork_t *ifp;
- /* REFERENCED */
- int nrecs;
- int size;
-
- ifp = XFS_IFORK_PTR(ip, whichfork);
- dfp = (xfs_bmdr_block_t *)XFS_DFORK_PTR(dip, whichfork);
- size = XFS_BMAP_BROOT_SPACE(mp, dfp);
- nrecs = be16_to_cpu(dfp->bb_numrecs);
-
- /*
- * blow out if -- fork has less extents than can fit in
- * fork (fork shouldn't be a btree format), root btree
- * block has more records than can fit into the fork,
- * or the number of extents is greater than the number of
- * blocks.
- */
- if (unlikely(XFS_IFORK_NEXTENTS(ip, whichfork) <=
- XFS_IFORK_MAXEXT(ip, whichfork) ||
- XFS_BMDR_SPACE_CALC(nrecs) >
- XFS_DFORK_SIZE(dip, mp, whichfork) ||
- XFS_IFORK_NEXTENTS(ip, whichfork) > ip->i_d.di_nblocks)) {
- xfs_warn(mp, "corrupt inode %Lu (btree).",
- (unsigned long long) ip->i_ino);
- XFS_CORRUPTION_ERROR("xfs_iformat_btree", XFS_ERRLEVEL_LOW,
- mp, dip);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- ifp->if_broot_bytes = size;
- ifp->if_broot = kmem_alloc(size, KM_SLEEP | KM_NOFS);
- ASSERT(ifp->if_broot != NULL);
- /*
- * Copy and convert from the on-disk structure
- * to the in-memory structure.
- */
- xfs_bmdr_to_bmbt(ip, dfp, XFS_DFORK_SIZE(dip, ip->i_mount, whichfork),
- ifp->if_broot, size);
- ifp->if_flags &= ~XFS_IFEXTENTS;
- ifp->if_flags |= XFS_IFBROOT;
-
- return 0;
-}
-
STATIC void
xfs_dinode_from_disk(
xfs_icdinode_t *to,
@@ -757,13 +379,13 @@ xfs_iread(
/*
* If the on-disk inode is already linked to a directory
* entry, copy all of the inode into the in-core inode.
- * xfs_iformat() handles copying in the inode format
+ * xfs_iformat_fork() handles copying in the inode format
* specific information.
* Otherwise, just get the truly permanent information.
*/
if (dip->di_mode) {
xfs_dinode_from_disk(&ip->i_d, dip);
- error = xfs_iformat(ip, dip);
+ error = xfs_iformat_fork(ip, dip);
if (error) {
#ifdef DEBUG
xfs_alert(mp, "%s: xfs_iformat() returned error %d",
@@ -838,1496 +460,3 @@ xfs_iread(
xfs_trans_brelse(tp, bp);
return error;
}
-
-/*
- * Read in extents from a btree-format inode.
- * Allocate and fill in if_extents. Real work is done in xfs_bmap.c.
- */
-int
-xfs_iread_extents(
- xfs_trans_t *tp,
- xfs_inode_t *ip,
- int whichfork)
-{
- int error;
- xfs_ifork_t *ifp;
- xfs_extnum_t nextents;
-
- if (unlikely(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE)) {
- XFS_ERROR_REPORT("xfs_iread_extents", XFS_ERRLEVEL_LOW,
- ip->i_mount);
- return XFS_ERROR(EFSCORRUPTED);
- }
- nextents = XFS_IFORK_NEXTENTS(ip, whichfork);
- ifp = XFS_IFORK_PTR(ip, whichfork);
-
- /*
- * We know that the size is valid (it's checked in iformat_btree)
- */
- ifp->if_bytes = ifp->if_real_bytes = 0;
- ifp->if_flags |= XFS_IFEXTENTS;
- xfs_iext_add(ifp, 0, nextents);
- error = xfs_bmap_read_extents(tp, ip, whichfork);
- if (error) {
- xfs_iext_destroy(ifp);
- ifp->if_flags &= ~XFS_IFEXTENTS;
- return error;
- }
- xfs_validate_extents(ifp, nextents, XFS_EXTFMT_INODE(ip));
- return 0;
-}
-
-/*
- * Reallocate the space for if_broot based on the number of records
- * being added or deleted as indicated in rec_diff. Move the records
- * and pointers in if_broot to fit the new size. When shrinking this
- * will eliminate holes between the records and pointers created by
- * the caller. When growing this will create holes to be filled in
- * by the caller.
- *
- * The caller must not request to add more records than would fit in
- * the on-disk inode root. If the if_broot is currently NULL, then
- * if we adding records one will be allocated. The caller must also
- * not request that the number of records go below zero, although
- * it can go to zero.
- *
- * ip -- the inode whose if_broot area is changing
- * ext_diff -- the change in the number of records, positive or negative,
- * requested for the if_broot array.
- */
-void
-xfs_iroot_realloc(
- xfs_inode_t *ip,
- int rec_diff,
- int whichfork)
-{
- struct xfs_mount *mp = ip->i_mount;
- int cur_max;
- xfs_ifork_t *ifp;
- struct xfs_btree_block *new_broot;
- int new_max;
- size_t new_size;
- char *np;
- char *op;
-
- /*
- * Handle the degenerate case quietly.
- */
- if (rec_diff == 0) {
- return;
- }
-
- ifp = XFS_IFORK_PTR(ip, whichfork);
- if (rec_diff > 0) {
- /*
- * If there wasn't any memory allocated before, just
- * allocate it now and get out.
- */
- if (ifp->if_broot_bytes == 0) {
- new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, rec_diff);
- ifp->if_broot = kmem_alloc(new_size, KM_SLEEP | KM_NOFS);
- ifp->if_broot_bytes = (int)new_size;
- return;
- }
-
- /*
- * If there is already an existing if_broot, then we need
- * to realloc() it and shift the pointers to their new
- * location. The records don't change location because
- * they are kept butted up against the btree block header.
- */
- cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0);
- new_max = cur_max + rec_diff;
- new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
- ifp->if_broot = kmem_realloc(ifp->if_broot, new_size,
- XFS_BMAP_BROOT_SPACE_CALC(mp, cur_max),
- KM_SLEEP | KM_NOFS);
- op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
- ifp->if_broot_bytes);
- np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
- (int)new_size);
- ifp->if_broot_bytes = (int)new_size;
- ASSERT(ifp->if_broot_bytes <=
- XFS_IFORK_SIZE(ip, whichfork) + XFS_BROOT_SIZE_ADJ(ip));
- memmove(np, op, cur_max * (uint)sizeof(xfs_dfsbno_t));
- return;
- }
-
- /*
- * rec_diff is less than 0. In this case, we are shrinking the
- * if_broot buffer. It must already exist. If we go to zero
- * records, just get rid of the root and clear the status bit.
- */
- ASSERT((ifp->if_broot != NULL) && (ifp->if_broot_bytes > 0));
- cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0);
- new_max = cur_max + rec_diff;
- ASSERT(new_max >= 0);
- if (new_max > 0)
- new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
- else
- new_size = 0;
- if (new_size > 0) {
- new_broot = kmem_alloc(new_size, KM_SLEEP | KM_NOFS);
- /*
- * First copy over the btree block header.
- */
- memcpy(new_broot, ifp->if_broot,
- XFS_BMBT_BLOCK_LEN(ip->i_mount));
- } else {
- new_broot = NULL;
- ifp->if_flags &= ~XFS_IFBROOT;
- }
-
- /*
- * Only copy the records and pointers if there are any.
- */
- if (new_max > 0) {
- /*
- * First copy the records.
- */
- op = (char *)XFS_BMBT_REC_ADDR(mp, ifp->if_broot, 1);
- np = (char *)XFS_BMBT_REC_ADDR(mp, new_broot, 1);
- memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_rec_t));
-
- /*
- * Then copy the pointers.
- */
- op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
- ifp->if_broot_bytes);
- np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, new_broot, 1,
- (int)new_size);
- memcpy(np, op, new_max * (uint)sizeof(xfs_dfsbno_t));
- }
- kmem_free(ifp->if_broot);
- ifp->if_broot = new_broot;
- ifp->if_broot_bytes = (int)new_size;
- ASSERT(ifp->if_broot_bytes <=
- XFS_IFORK_SIZE(ip, whichfork) + XFS_BROOT_SIZE_ADJ(ip));
- return;
-}
-
-
-/*
- * This is called when the amount of space needed for if_data
- * is increased or decreased. The change in size is indicated by
- * the number of bytes that need to be added or deleted in the
- * byte_diff parameter.
- *
- * If the amount of space needed has decreased below the size of the
- * inline buffer, then switch to using the inline buffer. Otherwise,
- * use kmem_realloc() or kmem_alloc() to adjust the size of the buffer
- * to what is needed.
- *
- * ip -- the inode whose if_data area is changing
- * byte_diff -- the change in the number of bytes, positive or negative,
- * requested for the if_data array.
- */
-void
-xfs_idata_realloc(
- xfs_inode_t *ip,
- int byte_diff,
- int whichfork)
-{
- xfs_ifork_t *ifp;
- int new_size;
- int real_size;
-
- if (byte_diff == 0) {
- return;
- }
-
- ifp = XFS_IFORK_PTR(ip, whichfork);
- new_size = (int)ifp->if_bytes + byte_diff;
- ASSERT(new_size >= 0);
-
- if (new_size == 0) {
- if (ifp->if_u1.if_data != ifp->if_u2.if_inline_data) {
- kmem_free(ifp->if_u1.if_data);
- }
- ifp->if_u1.if_data = NULL;
- real_size = 0;
- } else if (new_size <= sizeof(ifp->if_u2.if_inline_data)) {
- /*
- * If the valid extents/data can fit in if_inline_ext/data,
- * copy them from the malloc'd vector and free it.
- */
- if (ifp->if_u1.if_data == NULL) {
- ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
- } else if (ifp->if_u1.if_data != ifp->if_u2.if_inline_data) {
- ASSERT(ifp->if_real_bytes != 0);
- memcpy(ifp->if_u2.if_inline_data, ifp->if_u1.if_data,
- new_size);
- kmem_free(ifp->if_u1.if_data);
- ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
- }
- real_size = 0;
- } else {
- /*
- * Stuck with malloc/realloc.
- * For inline data, the underlying buffer must be
- * a multiple of 4 bytes in size so that it can be
- * logged and stay on word boundaries. We enforce
- * that here.
- */
- real_size = roundup(new_size, 4);
- if (ifp->if_u1.if_data == NULL) {
- ASSERT(ifp->if_real_bytes == 0);
- ifp->if_u1.if_data = kmem_alloc(real_size,
- KM_SLEEP | KM_NOFS);
- } else if (ifp->if_u1.if_data != ifp->if_u2.if_inline_data) {
- /*
- * Only do the realloc if the underlying size
- * is really changing.
- */
- if (ifp->if_real_bytes != real_size) {
- ifp->if_u1.if_data =
- kmem_realloc(ifp->if_u1.if_data,
- real_size,
- ifp->if_real_bytes,
- KM_SLEEP | KM_NOFS);
- }
- } else {
- ASSERT(ifp->if_real_bytes == 0);
- ifp->if_u1.if_data = kmem_alloc(real_size,
- KM_SLEEP | KM_NOFS);
- memcpy(ifp->if_u1.if_data, ifp->if_u2.if_inline_data,
- ifp->if_bytes);
- }
- }
- ifp->if_real_bytes = real_size;
- ifp->if_bytes = new_size;
- ASSERT(ifp->if_bytes <= XFS_IFORK_SIZE(ip, whichfork));
-}
-
-void
-xfs_idestroy_fork(
- xfs_inode_t *ip,
- int whichfork)
-{
- xfs_ifork_t *ifp;
-
- ifp = XFS_IFORK_PTR(ip, whichfork);
- if (ifp->if_broot != NULL) {
- kmem_free(ifp->if_broot);
- ifp->if_broot = NULL;
- }
-
- /*
- * If the format is local, then we can't have an extents
- * array so just look for an inline data array. If we're
- * not local then we may or may not have an extents list,
- * so check and free it up if we do.
- */
- if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
- if ((ifp->if_u1.if_data != ifp->if_u2.if_inline_data) &&
- (ifp->if_u1.if_data != NULL)) {
- ASSERT(ifp->if_real_bytes != 0);
- kmem_free(ifp->if_u1.if_data);
- ifp->if_u1.if_data = NULL;
- ifp->if_real_bytes = 0;
- }
- } else if ((ifp->if_flags & XFS_IFEXTENTS) &&
- ((ifp->if_flags & XFS_IFEXTIREC) ||
- ((ifp->if_u1.if_extents != NULL) &&
- (ifp->if_u1.if_extents != ifp->if_u2.if_inline_ext)))) {
- ASSERT(ifp->if_real_bytes != 0);
- xfs_iext_destroy(ifp);
- }
- ASSERT(ifp->if_u1.if_extents == NULL ||
- ifp->if_u1.if_extents == ifp->if_u2.if_inline_ext);
- ASSERT(ifp->if_real_bytes == 0);
- if (whichfork == XFS_ATTR_FORK) {
- kmem_zone_free(xfs_ifork_zone, ip->i_afp);
- ip->i_afp = NULL;
- }
-}
-
-/*
- * xfs_iextents_copy()
- *
- * This is called to copy the REAL extents (as opposed to the delayed
- * allocation extents) from the inode into the given buffer. It
- * returns the number of bytes copied into the buffer.
- *
- * If there are no delayed allocation extents, then we can just
- * memcpy() the extents into the buffer. Otherwise, we need to
- * examine each extent in turn and skip those which are delayed.
- */
-int
-xfs_iextents_copy(
- xfs_inode_t *ip,
- xfs_bmbt_rec_t *dp,
- int whichfork)
-{
- int copied;
- int i;
- xfs_ifork_t *ifp;
- int nrecs;
- xfs_fsblock_t start_block;
-
- ifp = XFS_IFORK_PTR(ip, whichfork);
- ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
- ASSERT(ifp->if_bytes > 0);
-
- nrecs = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
- XFS_BMAP_TRACE_EXLIST(ip, nrecs, whichfork);
- ASSERT(nrecs > 0);
-
- /*
- * There are some delayed allocation extents in the
- * inode, so copy the extents one at a time and skip
- * the delayed ones. There must be at least one
- * non-delayed extent.
- */
- copied = 0;
- for (i = 0; i < nrecs; i++) {
- xfs_bmbt_rec_host_t *ep = xfs_iext_get_ext(ifp, i);
- start_block = xfs_bmbt_get_startblock(ep);
- if (isnullstartblock(start_block)) {
- /*
- * It's a delayed allocation extent, so skip it.
- */
- continue;
- }
-
- /* Translate to on disk format */
- put_unaligned_be64(ep->l0, &dp->l0);
- put_unaligned_be64(ep->l1, &dp->l1);
- dp++;
- copied++;
- }
- ASSERT(copied != 0);
- xfs_validate_extents(ifp, copied, XFS_EXTFMT_INODE(ip));
-
- return (copied * (uint)sizeof(xfs_bmbt_rec_t));
-}
-
-/*
- * Each of the following cases stores data into the same region
- * of the on-disk inode, so only one of them can be valid at
- * any given time. While it is possible to have conflicting formats
- * and log flags, e.g. having XFS_ILOG_?DATA set when the fork is
- * in EXTENTS format, this can only happen when the fork has
- * changed formats after being modified but before being flushed.
- * In these cases, the format always takes precedence, because the
- * format indicates the current state of the fork.
- */
-void
-xfs_iflush_fork(
- xfs_inode_t *ip,
- xfs_dinode_t *dip,
- xfs_inode_log_item_t *iip,
- int whichfork,
- xfs_buf_t *bp)
-{
- char *cp;
- xfs_ifork_t *ifp;
- xfs_mount_t *mp;
- static const short brootflag[2] =
- { XFS_ILOG_DBROOT, XFS_ILOG_ABROOT };
- static const short dataflag[2] =
- { XFS_ILOG_DDATA, XFS_ILOG_ADATA };
- static const short extflag[2] =
- { XFS_ILOG_DEXT, XFS_ILOG_AEXT };
-
- if (!iip)
- return;
- ifp = XFS_IFORK_PTR(ip, whichfork);
- /*
- * This can happen if we gave up in iformat in an error path,
- * for the attribute fork.
- */
- if (!ifp) {
- ASSERT(whichfork == XFS_ATTR_FORK);
- return;
- }
- cp = XFS_DFORK_PTR(dip, whichfork);
- mp = ip->i_mount;
- switch (XFS_IFORK_FORMAT(ip, whichfork)) {
- case XFS_DINODE_FMT_LOCAL:
- if ((iip->ili_fields & dataflag[whichfork]) &&
- (ifp->if_bytes > 0)) {
- ASSERT(ifp->if_u1.if_data != NULL);
- ASSERT(ifp->if_bytes <= XFS_IFORK_SIZE(ip, whichfork));
- memcpy(cp, ifp->if_u1.if_data, ifp->if_bytes);
- }
- break;
-
- case XFS_DINODE_FMT_EXTENTS:
- ASSERT((ifp->if_flags & XFS_IFEXTENTS) ||
- !(iip->ili_fields & extflag[whichfork]));
- if ((iip->ili_fields & extflag[whichfork]) &&
- (ifp->if_bytes > 0)) {
- ASSERT(xfs_iext_get_ext(ifp, 0));
- ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) > 0);
- (void)xfs_iextents_copy(ip, (xfs_bmbt_rec_t *)cp,
- whichfork);
- }
- break;
-
- case XFS_DINODE_FMT_BTREE:
- if ((iip->ili_fields & brootflag[whichfork]) &&
- (ifp->if_broot_bytes > 0)) {
- ASSERT(ifp->if_broot != NULL);
- ASSERT(ifp->if_broot_bytes <=
- (XFS_IFORK_SIZE(ip, whichfork) +
- XFS_BROOT_SIZE_ADJ(ip)));
- xfs_bmbt_to_bmdr(mp, ifp->if_broot, ifp->if_broot_bytes,
- (xfs_bmdr_block_t *)cp,
- XFS_DFORK_SIZE(dip, mp, whichfork));
- }
- break;
-
- case XFS_DINODE_FMT_DEV:
- if (iip->ili_fields & XFS_ILOG_DEV) {
- ASSERT(whichfork == XFS_DATA_FORK);
- xfs_dinode_put_rdev(dip, ip->i_df.if_u2.if_rdev);
- }
- break;
-
- case XFS_DINODE_FMT_UUID:
- if (iip->ili_fields & XFS_ILOG_UUID) {
- ASSERT(whichfork == XFS_DATA_FORK);
- memcpy(XFS_DFORK_DPTR(dip),
- &ip->i_df.if_u2.if_uuid,
- sizeof(uuid_t));
- }
- break;
-
- default:
- ASSERT(0);
- break;
- }
-}
-
-/*
- * Return a pointer to the extent record at file index idx.
- */
-xfs_bmbt_rec_host_t *
-xfs_iext_get_ext(
- xfs_ifork_t *ifp, /* inode fork pointer */
- xfs_extnum_t idx) /* index of target extent */
-{
- ASSERT(idx >= 0);
- ASSERT(idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t));
-
- if ((ifp->if_flags & XFS_IFEXTIREC) && (idx == 0)) {
- return ifp->if_u1.if_ext_irec->er_extbuf;
- } else if (ifp->if_flags & XFS_IFEXTIREC) {
- xfs_ext_irec_t *erp; /* irec pointer */
- int erp_idx = 0; /* irec index */
- xfs_extnum_t page_idx = idx; /* ext index in target list */
-
- erp = xfs_iext_idx_to_irec(ifp, &page_idx, &erp_idx, 0);
- return &erp->er_extbuf[page_idx];
- } else if (ifp->if_bytes) {
- return &ifp->if_u1.if_extents[idx];
- } else {
- return NULL;
- }
-}
-
-/*
- * Create, destroy, or resize a linear (direct) block of extents.
- */
-STATIC void
-xfs_iext_realloc_direct(
- xfs_ifork_t *ifp, /* inode fork pointer */
- int new_size) /* new size of extents */
-{
- int rnew_size; /* real new size of extents */
-
- rnew_size = new_size;
-
- ASSERT(!(ifp->if_flags & XFS_IFEXTIREC) ||
- ((new_size >= 0) && (new_size <= XFS_IEXT_BUFSZ) &&
- (new_size != ifp->if_real_bytes)));
-
- /* Free extent records */
- if (new_size == 0) {
- xfs_iext_destroy(ifp);
- }
- /* Resize direct extent list and zero any new bytes */
- else if (ifp->if_real_bytes) {
- /* Check if extents will fit inside the inode */
- if (new_size <= XFS_INLINE_EXTS * sizeof(xfs_bmbt_rec_t)) {
- xfs_iext_direct_to_inline(ifp, new_size /
- (uint)sizeof(xfs_bmbt_rec_t));
- ifp->if_bytes = new_size;
- return;
- }
- if (!is_power_of_2(new_size)){
- rnew_size = roundup_pow_of_two(new_size);
- }
- if (rnew_size != ifp->if_real_bytes) {
- ifp->if_u1.if_extents =
- kmem_realloc(ifp->if_u1.if_extents,
- rnew_size,
- ifp->if_real_bytes, KM_NOFS);
- }
- if (rnew_size > ifp->if_real_bytes) {
- memset(&ifp->if_u1.if_extents[ifp->if_bytes /
- (uint)sizeof(xfs_bmbt_rec_t)], 0,
- rnew_size - ifp->if_real_bytes);
- }
- }
- /*
- * Switch from the inline extent buffer to a direct
- * extent list. Be sure to include the inline extent
- * bytes in new_size.
- */
- else {
- new_size += ifp->if_bytes;
- if (!is_power_of_2(new_size)) {
- rnew_size = roundup_pow_of_two(new_size);
- }
- xfs_iext_inline_to_direct(ifp, rnew_size);
- }
- ifp->if_real_bytes = rnew_size;
- ifp->if_bytes = new_size;
-}
-
-/*
- * Insert new item(s) into the extent records for incore inode
- * fork 'ifp'. 'count' new items are inserted at index 'idx'.
- */
-void
-xfs_iext_insert(
- xfs_inode_t *ip, /* incore inode pointer */
- xfs_extnum_t idx, /* starting index of new items */
- xfs_extnum_t count, /* number of inserted items */
- xfs_bmbt_irec_t *new, /* items to insert */
- int state) /* type of extent conversion */
-{
- xfs_ifork_t *ifp = (state & BMAP_ATTRFORK) ? ip->i_afp : &ip->i_df;
- xfs_extnum_t i; /* extent record index */
-
- trace_xfs_iext_insert(ip, idx, new, state, _RET_IP_);
-
- ASSERT(ifp->if_flags & XFS_IFEXTENTS);
- xfs_iext_add(ifp, idx, count);
- for (i = idx; i < idx + count; i++, new++)
- xfs_bmbt_set_all(xfs_iext_get_ext(ifp, i), new);
-}
-
-/*
- * This is called when the amount of space required for incore file
- * extents needs to be increased. The ext_diff parameter stores the
- * number of new extents being added and the idx parameter contains
- * the extent index where the new extents will be added. If the new
- * extents are being appended, then we just need to (re)allocate and
- * initialize the space. Otherwise, if the new extents are being
- * inserted into the middle of the existing entries, a bit more work
- * is required to make room for the new extents to be inserted. The
- * caller is responsible for filling in the new extent entries upon
- * return.
- */
-void
-xfs_iext_add(
- xfs_ifork_t *ifp, /* inode fork pointer */
- xfs_extnum_t idx, /* index to begin adding exts */
- int ext_diff) /* number of extents to add */
-{
- int byte_diff; /* new bytes being added */
- int new_size; /* size of extents after adding */
- xfs_extnum_t nextents; /* number of extents in file */
-
- nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
- ASSERT((idx >= 0) && (idx <= nextents));
- byte_diff = ext_diff * sizeof(xfs_bmbt_rec_t);
- new_size = ifp->if_bytes + byte_diff;
- /*
- * If the new number of extents (nextents + ext_diff)
- * fits inside the inode, then continue to use the inline
- * extent buffer.
- */
- if (nextents + ext_diff <= XFS_INLINE_EXTS) {
- if (idx < nextents) {
- memmove(&ifp->if_u2.if_inline_ext[idx + ext_diff],
- &ifp->if_u2.if_inline_ext[idx],
- (nextents - idx) * sizeof(xfs_bmbt_rec_t));
- memset(&ifp->if_u2.if_inline_ext[idx], 0, byte_diff);
- }
- ifp->if_u1.if_extents = ifp->if_u2.if_inline_ext;
- ifp->if_real_bytes = 0;
- }
- /*
- * Otherwise use a linear (direct) extent list.
- * If the extents are currently inside the inode,
- * xfs_iext_realloc_direct will switch us from
- * inline to direct extent allocation mode.
- */
- else if (nextents + ext_diff <= XFS_LINEAR_EXTS) {
- xfs_iext_realloc_direct(ifp, new_size);
- if (idx < nextents) {
- memmove(&ifp->if_u1.if_extents[idx + ext_diff],
- &ifp->if_u1.if_extents[idx],
- (nextents - idx) * sizeof(xfs_bmbt_rec_t));
- memset(&ifp->if_u1.if_extents[idx], 0, byte_diff);
- }
- }
- /* Indirection array */
- else {
- xfs_ext_irec_t *erp;
- int erp_idx = 0;
- int page_idx = idx;
-
- ASSERT(nextents + ext_diff > XFS_LINEAR_EXTS);
- if (ifp->if_flags & XFS_IFEXTIREC) {
- erp = xfs_iext_idx_to_irec(ifp, &page_idx, &erp_idx, 1);
- } else {
- xfs_iext_irec_init(ifp);
- ASSERT(ifp->if_flags & XFS_IFEXTIREC);
- erp = ifp->if_u1.if_ext_irec;
- }
- /* Extents fit in target extent page */
- if (erp && erp->er_extcount + ext_diff <= XFS_LINEAR_EXTS) {
- if (page_idx < erp->er_extcount) {
- memmove(&erp->er_extbuf[page_idx + ext_diff],
- &erp->er_extbuf[page_idx],
- (erp->er_extcount - page_idx) *
- sizeof(xfs_bmbt_rec_t));
- memset(&erp->er_extbuf[page_idx], 0, byte_diff);
- }
- erp->er_extcount += ext_diff;
- xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, ext_diff);
- }
- /* Insert a new extent page */
- else if (erp) {
- xfs_iext_add_indirect_multi(ifp,
- erp_idx, page_idx, ext_diff);
- }
- /*
- * If extent(s) are being appended to the last page in
- * the indirection array and the new extent(s) don't fit
- * in the page, then erp is NULL and erp_idx is set to
- * the next index needed in the indirection array.
- */
- else {
- int count = ext_diff;
-
- while (count) {
- erp = xfs_iext_irec_new(ifp, erp_idx);
- erp->er_extcount = count;
- count -= MIN(count, (int)XFS_LINEAR_EXTS);
- if (count) {
- erp_idx++;
- }
- }
- }
- }
- ifp->if_bytes = new_size;
-}
-
-/*
- * This is called when incore extents are being added to the indirection
- * array and the new extents do not fit in the target extent list. The
- * erp_idx parameter contains the irec index for the target extent list
- * in the indirection array, and the idx parameter contains the extent
- * index within the list. The number of extents being added is stored
- * in the count parameter.
- *
- * |-------| |-------|
- * | | | | idx - number of extents before idx
- * | idx | | count |
- * | | | | count - number of extents being inserted at idx
- * |-------| |-------|
- * | count | | nex2 | nex2 - number of extents after idx + count
- * |-------| |-------|
- */
-void
-xfs_iext_add_indirect_multi(
- xfs_ifork_t *ifp, /* inode fork pointer */
- int erp_idx, /* target extent irec index */
- xfs_extnum_t idx, /* index within target list */
- int count) /* new extents being added */
-{
- int byte_diff; /* new bytes being added */
- xfs_ext_irec_t *erp; /* pointer to irec entry */
- xfs_extnum_t ext_diff; /* number of extents to add */
- xfs_extnum_t ext_cnt; /* new extents still needed */
- xfs_extnum_t nex2; /* extents after idx + count */
- xfs_bmbt_rec_t *nex2_ep = NULL; /* temp list for nex2 extents */
- int nlists; /* number of irec's (lists) */
-
- ASSERT(ifp->if_flags & XFS_IFEXTIREC);
- erp = &ifp->if_u1.if_ext_irec[erp_idx];
- nex2 = erp->er_extcount - idx;
- nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
-
- /*
- * Save second part of target extent list
- * (all extents past */
- if (nex2) {
- byte_diff = nex2 * sizeof(xfs_bmbt_rec_t);
- nex2_ep = (xfs_bmbt_rec_t *) kmem_alloc(byte_diff, KM_NOFS);
- memmove(nex2_ep, &erp->er_extbuf[idx], byte_diff);
- erp->er_extcount -= nex2;
- xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, -nex2);
- memset(&erp->er_extbuf[idx], 0, byte_diff);
- }
-
- /*
- * Add the new extents to the end of the target
- * list, then allocate new irec record(s) and
- * extent buffer(s) as needed to store the rest
- * of the new extents.
- */
- ext_cnt = count;
- ext_diff = MIN(ext_cnt, (int)XFS_LINEAR_EXTS - erp->er_extcount);
- if (ext_diff) {
- erp->er_extcount += ext_diff;
- xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, ext_diff);
- ext_cnt -= ext_diff;
- }
- while (ext_cnt) {
- erp_idx++;
- erp = xfs_iext_irec_new(ifp, erp_idx);
- ext_diff = MIN(ext_cnt, (int)XFS_LINEAR_EXTS);
- erp->er_extcount = ext_diff;
- xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, ext_diff);
- ext_cnt -= ext_diff;
- }
-
- /* Add nex2 extents back to indirection array */
- if (nex2) {
- xfs_extnum_t ext_avail;
- int i;
-
- byte_diff = nex2 * sizeof(xfs_bmbt_rec_t);
- ext_avail = XFS_LINEAR_EXTS - erp->er_extcount;
- i = 0;
- /*
- * If nex2 extents fit in the current page, append
- * nex2_ep after the new extents.
- */
- if (nex2 <= ext_avail) {
- i = erp->er_extcount;
- }
- /*
- * Otherwise, check if space is available in the
- * next page.
- */
- else if ((erp_idx < nlists - 1) &&
- (nex2 <= (ext_avail = XFS_LINEAR_EXTS -
- ifp->if_u1.if_ext_irec[erp_idx+1].er_extcount))) {
- erp_idx++;
- erp++;
- /* Create a hole for nex2 extents */
- memmove(&erp->er_extbuf[nex2], erp->er_extbuf,
- erp->er_extcount * sizeof(xfs_bmbt_rec_t));
- }
- /*
- * Final choice, create a new extent page for
- * nex2 extents.
- */
- else {
- erp_idx++;
- erp = xfs_iext_irec_new(ifp, erp_idx);
- }
- memmove(&erp->er_extbuf[i], nex2_ep, byte_diff);
- kmem_free(nex2_ep);
- erp->er_extcount += nex2;
- xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, nex2);
- }
-}
-
-/*
- * This is called when the amount of space required for incore file
- * extents needs to be decreased. The ext_diff parameter stores the
- * number of extents to be removed and the idx parameter contains
- * the extent index where the extents will be removed from.
- *
- * If the amount of space needed has decreased below the linear
- * limit, XFS_IEXT_BUFSZ, then switch to using the contiguous
- * extent array. Otherwise, use kmem_realloc() to adjust the
- * size to what is needed.
- */
-void
-xfs_iext_remove(
- xfs_inode_t *ip, /* incore inode pointer */
- xfs_extnum_t idx, /* index to begin removing exts */
- int ext_diff, /* number of extents to remove */
- int state) /* type of extent conversion */
-{
- xfs_ifork_t *ifp = (state & BMAP_ATTRFORK) ? ip->i_afp : &ip->i_df;
- xfs_extnum_t nextents; /* number of extents in file */
- int new_size; /* size of extents after removal */
-
- trace_xfs_iext_remove(ip, idx, state, _RET_IP_);
-
- ASSERT(ext_diff > 0);
- nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
- new_size = (nextents - ext_diff) * sizeof(xfs_bmbt_rec_t);
-
- if (new_size == 0) {
- xfs_iext_destroy(ifp);
- } else if (ifp->if_flags & XFS_IFEXTIREC) {
- xfs_iext_remove_indirect(ifp, idx, ext_diff);
- } else if (ifp->if_real_bytes) {
- xfs_iext_remove_direct(ifp, idx, ext_diff);
- } else {
- xfs_iext_remove_inline(ifp, idx, ext_diff);
- }
- ifp->if_bytes = new_size;
-}
-
-/*
- * This removes ext_diff extents from the inline buffer, beginning
- * at extent index idx.
- */
-void
-xfs_iext_remove_inline(
- xfs_ifork_t *ifp, /* inode fork pointer */
- xfs_extnum_t idx, /* index to begin removing exts */
- int ext_diff) /* number of extents to remove */
-{
- int nextents; /* number of extents in file */
-
- ASSERT(!(ifp->if_flags & XFS_IFEXTIREC));
- ASSERT(idx < XFS_INLINE_EXTS);
- nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
- ASSERT(((nextents - ext_diff) > 0) &&
- (nextents - ext_diff) < XFS_INLINE_EXTS);
-
- if (idx + ext_diff < nextents) {
- memmove(&ifp->if_u2.if_inline_ext[idx],
- &ifp->if_u2.if_inline_ext[idx + ext_diff],
- (nextents - (idx + ext_diff)) *
- sizeof(xfs_bmbt_rec_t));
- memset(&ifp->if_u2.if_inline_ext[nextents - ext_diff],
- 0, ext_diff * sizeof(xfs_bmbt_rec_t));
- } else {
- memset(&ifp->if_u2.if_inline_ext[idx], 0,
- ext_diff * sizeof(xfs_bmbt_rec_t));
- }
-}
-
-/*
- * This removes ext_diff extents from a linear (direct) extent list,
- * beginning at extent index idx. If the extents are being removed
- * from the end of the list (ie. truncate) then we just need to re-
- * allocate the list to remove the extra space. Otherwise, if the
- * extents are being removed from the middle of the existing extent
- * entries, then we first need to move the extent records beginning
- * at idx + ext_diff up in the list to overwrite the records being
- * removed, then remove the extra space via kmem_realloc.
- */
-void
-xfs_iext_remove_direct(
- xfs_ifork_t *ifp, /* inode fork pointer */
- xfs_extnum_t idx, /* index to begin removing exts */
- int ext_diff) /* number of extents to remove */
-{
- xfs_extnum_t nextents; /* number of extents in file */
- int new_size; /* size of extents after removal */
-
- ASSERT(!(ifp->if_flags & XFS_IFEXTIREC));
- new_size = ifp->if_bytes -
- (ext_diff * sizeof(xfs_bmbt_rec_t));
- nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
-
- if (new_size == 0) {
- xfs_iext_destroy(ifp);
- return;
- }
- /* Move extents up in the list (if needed) */
- if (idx + ext_diff < nextents) {
- memmove(&ifp->if_u1.if_extents[idx],
- &ifp->if_u1.if_extents[idx + ext_diff],
- (nextents - (idx + ext_diff)) *
- sizeof(xfs_bmbt_rec_t));
- }
- memset(&ifp->if_u1.if_extents[nextents - ext_diff],
- 0, ext_diff * sizeof(xfs_bmbt_rec_t));
- /*
- * Reallocate the direct extent list. If the extents
- * will fit inside the inode then xfs_iext_realloc_direct
- * will switch from direct to inline extent allocation
- * mode for us.
- */
- xfs_iext_realloc_direct(ifp, new_size);
- ifp->if_bytes = new_size;
-}
-
-/*
- * This is called when incore extents are being removed from the
- * indirection array and the extents being removed span multiple extent
- * buffers. The idx parameter contains the file extent index where we
- * want to begin removing extents, and the count parameter contains
- * how many extents need to be removed.
- *
- * |-------| |-------|
- * | nex1 | | | nex1 - number of extents before idx
- * |-------| | count |
- * | | | | count - number of extents being removed at idx
- * | count | |-------|
- * | | | nex2 | nex2 - number of extents after idx + count
- * |-------| |-------|
- */
-void
-xfs_iext_remove_indirect(
- xfs_ifork_t *ifp, /* inode fork pointer */
- xfs_extnum_t idx, /* index to begin removing extents */
- int count) /* number of extents to remove */
-{
- xfs_ext_irec_t *erp; /* indirection array pointer */
- int erp_idx = 0; /* indirection array index */
- xfs_extnum_t ext_cnt; /* extents left to remove */
- xfs_extnum_t ext_diff; /* extents to remove in current list */
- xfs_extnum_t nex1; /* number of extents before idx */
- xfs_extnum_t nex2; /* extents after idx + count */
- int page_idx = idx; /* index in target extent list */
-
- ASSERT(ifp->if_flags & XFS_IFEXTIREC);
- erp = xfs_iext_idx_to_irec(ifp, &page_idx, &erp_idx, 0);
- ASSERT(erp != NULL);
- nex1 = page_idx;
- ext_cnt = count;
- while (ext_cnt) {
- nex2 = MAX((erp->er_extcount - (nex1 + ext_cnt)), 0);
- ext_diff = MIN(ext_cnt, (erp->er_extcount - nex1));
- /*
- * Check for deletion of entire list;
- * xfs_iext_irec_remove() updates extent offsets.
- */
- if (ext_diff == erp->er_extcount) {
- xfs_iext_irec_remove(ifp, erp_idx);
- ext_cnt -= ext_diff;
- nex1 = 0;
- if (ext_cnt) {
- ASSERT(erp_idx < ifp->if_real_bytes /
- XFS_IEXT_BUFSZ);
- erp = &ifp->if_u1.if_ext_irec[erp_idx];
- nex1 = 0;
- continue;
- } else {
- break;
- }
- }
- /* Move extents up (if needed) */
- if (nex2) {
- memmove(&erp->er_extbuf[nex1],
- &erp->er_extbuf[nex1 + ext_diff],
- nex2 * sizeof(xfs_bmbt_rec_t));
- }
- /* Zero out rest of page */
- memset(&erp->er_extbuf[nex1 + nex2], 0, (XFS_IEXT_BUFSZ -
- ((nex1 + nex2) * sizeof(xfs_bmbt_rec_t))));
- /* Update remaining counters */
- erp->er_extcount -= ext_diff;
- xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, -ext_diff);
- ext_cnt -= ext_diff;
- nex1 = 0;
- erp_idx++;
- erp++;
- }
- ifp->if_bytes -= count * sizeof(xfs_bmbt_rec_t);
- xfs_iext_irec_compact(ifp);
-}
-
-/*
- * Switch from linear (direct) extent records to inline buffer.
- */
-void
-xfs_iext_direct_to_inline(
- xfs_ifork_t *ifp, /* inode fork pointer */
- xfs_extnum_t nextents) /* number of extents in file */
-{
- ASSERT(ifp->if_flags & XFS_IFEXTENTS);
- ASSERT(nextents <= XFS_INLINE_EXTS);
- /*
- * The inline buffer was zeroed when we switched
- * from inline to direct extent allocation mode,
- * so we don't need to clear it here.
- */
- memcpy(ifp->if_u2.if_inline_ext, ifp->if_u1.if_extents,
- nextents * sizeof(xfs_bmbt_rec_t));
- kmem_free(ifp->if_u1.if_extents);
- ifp->if_u1.if_extents = ifp->if_u2.if_inline_ext;
- ifp->if_real_bytes = 0;
-}
-
-/*
- * Switch from inline buffer to linear (direct) extent records.
- * new_size should already be rounded up to the next power of 2
- * by the caller (when appropriate), so use new_size as it is.
- * However, since new_size may be rounded up, we can't update
- * if_bytes here. It is the caller's responsibility to update
- * if_bytes upon return.
- */
-void
-xfs_iext_inline_to_direct(
- xfs_ifork_t *ifp, /* inode fork pointer */
- int new_size) /* number of extents in file */
-{
- ifp->if_u1.if_extents = kmem_alloc(new_size, KM_NOFS);
- memset(ifp->if_u1.if_extents, 0, new_size);
- if (ifp->if_bytes) {
- memcpy(ifp->if_u1.if_extents, ifp->if_u2.if_inline_ext,
- ifp->if_bytes);
- memset(ifp->if_u2.if_inline_ext, 0, XFS_INLINE_EXTS *
- sizeof(xfs_bmbt_rec_t));
- }
- ifp->if_real_bytes = new_size;
-}
-
-/*
- * Resize an extent indirection array to new_size bytes.
- */
-STATIC void
-xfs_iext_realloc_indirect(
- xfs_ifork_t *ifp, /* inode fork pointer */
- int new_size) /* new indirection array size */
-{
- int nlists; /* number of irec's (ex lists) */
- int size; /* current indirection array size */
-
- ASSERT(ifp->if_flags & XFS_IFEXTIREC);
- nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
- size = nlists * sizeof(xfs_ext_irec_t);
- ASSERT(ifp->if_real_bytes);
- ASSERT((new_size >= 0) && (new_size != size));
- if (new_size == 0) {
- xfs_iext_destroy(ifp);
- } else {
- ifp->if_u1.if_ext_irec = (xfs_ext_irec_t *)
- kmem_realloc(ifp->if_u1.if_ext_irec,
- new_size, size, KM_NOFS);
- }
-}
-
-/*
- * Switch from indirection array to linear (direct) extent allocations.
- */
-STATIC void
-xfs_iext_indirect_to_direct(
- xfs_ifork_t *ifp) /* inode fork pointer */
-{
- xfs_bmbt_rec_host_t *ep; /* extent record pointer */
- xfs_extnum_t nextents; /* number of extents in file */
- int size; /* size of file extents */
-
- ASSERT(ifp->if_flags & XFS_IFEXTIREC);
- nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
- ASSERT(nextents <= XFS_LINEAR_EXTS);
- size = nextents * sizeof(xfs_bmbt_rec_t);
-
- xfs_iext_irec_compact_pages(ifp);
- ASSERT(ifp->if_real_bytes == XFS_IEXT_BUFSZ);
-
- ep = ifp->if_u1.if_ext_irec->er_extbuf;
- kmem_free(ifp->if_u1.if_ext_irec);
- ifp->if_flags &= ~XFS_IFEXTIREC;
- ifp->if_u1.if_extents = ep;
- ifp->if_bytes = size;
- if (nextents < XFS_LINEAR_EXTS) {
- xfs_iext_realloc_direct(ifp, size);
- }
-}
-
-/*
- * Free incore file extents.
- */
-void
-xfs_iext_destroy(
- xfs_ifork_t *ifp) /* inode fork pointer */
-{
- if (ifp->if_flags & XFS_IFEXTIREC) {
- int erp_idx;
- int nlists;
-
- nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
- for (erp_idx = nlists - 1; erp_idx >= 0 ; erp_idx--) {
- xfs_iext_irec_remove(ifp, erp_idx);
- }
- ifp->if_flags &= ~XFS_IFEXTIREC;
- } else if (ifp->if_real_bytes) {
- kmem_free(ifp->if_u1.if_extents);
- } else if (ifp->if_bytes) {
- memset(ifp->if_u2.if_inline_ext, 0, XFS_INLINE_EXTS *
- sizeof(xfs_bmbt_rec_t));
- }
- ifp->if_u1.if_extents = NULL;
- ifp->if_real_bytes = 0;
- ifp->if_bytes = 0;
-}
-
-/*
- * Return a pointer to the extent record for file system block bno.
- */
-xfs_bmbt_rec_host_t * /* pointer to found extent record */
-xfs_iext_bno_to_ext(
- xfs_ifork_t *ifp, /* inode fork pointer */
- xfs_fileoff_t bno, /* block number to search for */
- xfs_extnum_t *idxp) /* index of target extent */
-{
- xfs_bmbt_rec_host_t *base; /* pointer to first extent */
- xfs_filblks_t blockcount = 0; /* number of blocks in extent */
- xfs_bmbt_rec_host_t *ep = NULL; /* pointer to target extent */
- xfs_ext_irec_t *erp = NULL; /* indirection array pointer */
- int high; /* upper boundary in search */
- xfs_extnum_t idx = 0; /* index of target extent */
- int low; /* lower boundary in search */
- xfs_extnum_t nextents; /* number of file extents */
- xfs_fileoff_t startoff = 0; /* start offset of extent */
-
- nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
- if (nextents == 0) {
- *idxp = 0;
- return NULL;
- }
- low = 0;
- if (ifp->if_flags & XFS_IFEXTIREC) {
- /* Find target extent list */
- int erp_idx = 0;
- erp = xfs_iext_bno_to_irec(ifp, bno, &erp_idx);
- base = erp->er_extbuf;
- high = erp->er_extcount - 1;
- } else {
- base = ifp->if_u1.if_extents;
- high = nextents - 1;
- }
- /* Binary search extent records */
- while (low <= high) {
- idx = (low + high) >> 1;
- ep = base + idx;
- startoff = xfs_bmbt_get_startoff(ep);
- blockcount = xfs_bmbt_get_blockcount(ep);
- if (bno < startoff) {
- high = idx - 1;
- } else if (bno >= startoff + blockcount) {
- low = idx + 1;
- } else {
- /* Convert back to file-based extent index */
- if (ifp->if_flags & XFS_IFEXTIREC) {
- idx += erp->er_extoff;
- }
- *idxp = idx;
- return ep;
- }
- }
- /* Convert back to file-based extent index */
- if (ifp->if_flags & XFS_IFEXTIREC) {
- idx += erp->er_extoff;
- }
- if (bno >= startoff + blockcount) {
- if (++idx == nextents) {
- ep = NULL;
- } else {
- ep = xfs_iext_get_ext(ifp, idx);
- }
- }
- *idxp = idx;
- return ep;
-}
-
-/*
- * Return a pointer to the indirection array entry containing the
- * extent record for filesystem block bno. Store the index of the
- * target irec in *erp_idxp.
- */
-xfs_ext_irec_t * /* pointer to found extent record */
-xfs_iext_bno_to_irec(
- xfs_ifork_t *ifp, /* inode fork pointer */
- xfs_fileoff_t bno, /* block number to search for */
- int *erp_idxp) /* irec index of target ext list */
-{
- xfs_ext_irec_t *erp = NULL; /* indirection array pointer */
- xfs_ext_irec_t *erp_next; /* next indirection array entry */
- int erp_idx; /* indirection array index */
- int nlists; /* number of extent irec's (lists) */
- int high; /* binary search upper limit */
- int low; /* binary search lower limit */
-
- ASSERT(ifp->if_flags & XFS_IFEXTIREC);
- nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
- erp_idx = 0;
- low = 0;
- high = nlists - 1;
- while (low <= high) {
- erp_idx = (low + high) >> 1;
- erp = &ifp->if_u1.if_ext_irec[erp_idx];
- erp_next = erp_idx < nlists - 1 ? erp + 1 : NULL;
- if (bno < xfs_bmbt_get_startoff(erp->er_extbuf)) {
- high = erp_idx - 1;
- } else if (erp_next && bno >=
- xfs_bmbt_get_startoff(erp_next->er_extbuf)) {
- low = erp_idx + 1;
- } else {
- break;
- }
- }
- *erp_idxp = erp_idx;
- return erp;
-}
-
-/*
- * Return a pointer to the indirection array entry containing the
- * extent record at file extent index *idxp. Store the index of the
- * target irec in *erp_idxp and store the page index of the target
- * extent record in *idxp.
- */
-xfs_ext_irec_t *
-xfs_iext_idx_to_irec(
- xfs_ifork_t *ifp, /* inode fork pointer */
- xfs_extnum_t *idxp, /* extent index (file -> page) */
- int *erp_idxp, /* pointer to target irec */
- int realloc) /* new bytes were just added */
-{
- xfs_ext_irec_t *prev; /* pointer to previous irec */
- xfs_ext_irec_t *erp = NULL; /* pointer to current irec */
- int erp_idx; /* indirection array index */
- int nlists; /* number of irec's (ex lists) */
- int high; /* binary search upper limit */
- int low; /* binary search lower limit */
- xfs_extnum_t page_idx = *idxp; /* extent index in target list */
-
- ASSERT(ifp->if_flags & XFS_IFEXTIREC);
- ASSERT(page_idx >= 0);
- ASSERT(page_idx <= ifp->if_bytes / sizeof(xfs_bmbt_rec_t));
- ASSERT(page_idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t) || realloc);
-
- nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
- erp_idx = 0;
- low = 0;
- high = nlists - 1;
-
- /* Binary search extent irec's */
- while (low <= high) {
- erp_idx = (low + high) >> 1;
- erp = &ifp->if_u1.if_ext_irec[erp_idx];
- prev = erp_idx > 0 ? erp - 1 : NULL;
- if (page_idx < erp->er_extoff || (page_idx == erp->er_extoff &&
- realloc && prev && prev->er_extcount < XFS_LINEAR_EXTS)) {
- high = erp_idx - 1;
- } else if (page_idx > erp->er_extoff + erp->er_extcount ||
- (page_idx == erp->er_extoff + erp->er_extcount &&
- !realloc)) {
- low = erp_idx + 1;
- } else if (page_idx == erp->er_extoff + erp->er_extcount &&
- erp->er_extcount == XFS_LINEAR_EXTS) {
- ASSERT(realloc);
- page_idx = 0;
- erp_idx++;
- erp = erp_idx < nlists ? erp + 1 : NULL;
- break;
- } else {
- page_idx -= erp->er_extoff;
- break;
- }
- }
- *idxp = page_idx;
- *erp_idxp = erp_idx;
- return(erp);
-}
-
-/*
- * Allocate and initialize an indirection array once the space needed
- * for incore extents increases above XFS_IEXT_BUFSZ.
- */
-void
-xfs_iext_irec_init(
- xfs_ifork_t *ifp) /* inode fork pointer */
-{
- xfs_ext_irec_t *erp; /* indirection array pointer */
- xfs_extnum_t nextents; /* number of extents in file */
-
- ASSERT(!(ifp->if_flags & XFS_IFEXTIREC));
- nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
- ASSERT(nextents <= XFS_LINEAR_EXTS);
-
- erp = kmem_alloc(sizeof(xfs_ext_irec_t), KM_NOFS);
-
- if (nextents == 0) {
- ifp->if_u1.if_extents = kmem_alloc(XFS_IEXT_BUFSZ, KM_NOFS);
- } else if (!ifp->if_real_bytes) {
- xfs_iext_inline_to_direct(ifp, XFS_IEXT_BUFSZ);
- } else if (ifp->if_real_bytes < XFS_IEXT_BUFSZ) {
- xfs_iext_realloc_direct(ifp, XFS_IEXT_BUFSZ);
- }
- erp->er_extbuf = ifp->if_u1.if_extents;
- erp->er_extcount = nextents;
- erp->er_extoff = 0;
-
- ifp->if_flags |= XFS_IFEXTIREC;
- ifp->if_real_bytes = XFS_IEXT_BUFSZ;
- ifp->if_bytes = nextents * sizeof(xfs_bmbt_rec_t);
- ifp->if_u1.if_ext_irec = erp;
-
- return;
-}
-
-/*
- * Allocate and initialize a new entry in the indirection array.
- */
-xfs_ext_irec_t *
-xfs_iext_irec_new(
- xfs_ifork_t *ifp, /* inode fork pointer */
- int erp_idx) /* index for new irec */
-{
- xfs_ext_irec_t *erp; /* indirection array pointer */
- int i; /* loop counter */
- int nlists; /* number of irec's (ex lists) */
-
- ASSERT(ifp->if_flags & XFS_IFEXTIREC);
- nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
-
- /* Resize indirection array */
- xfs_iext_realloc_indirect(ifp, ++nlists *
- sizeof(xfs_ext_irec_t));
- /*
- * Move records down in the array so the
- * new page can use erp_idx.
- */
- erp = ifp->if_u1.if_ext_irec;
- for (i = nlists - 1; i > erp_idx; i--) {
- memmove(&erp[i], &erp[i-1], sizeof(xfs_ext_irec_t));
- }
- ASSERT(i == erp_idx);
-
- /* Initialize new extent record */
- erp = ifp->if_u1.if_ext_irec;
- erp[erp_idx].er_extbuf = kmem_alloc(XFS_IEXT_BUFSZ, KM_NOFS);
- ifp->if_real_bytes = nlists * XFS_IEXT_BUFSZ;
- memset(erp[erp_idx].er_extbuf, 0, XFS_IEXT_BUFSZ);
- erp[erp_idx].er_extcount = 0;
- erp[erp_idx].er_extoff = erp_idx > 0 ?
- erp[erp_idx-1].er_extoff + erp[erp_idx-1].er_extcount : 0;
- return (&erp[erp_idx]);
-}
-
-/*
- * Remove a record from the indirection array.
- */
-void
-xfs_iext_irec_remove(
- xfs_ifork_t *ifp, /* inode fork pointer */
- int erp_idx) /* irec index to remove */
-{
- xfs_ext_irec_t *erp; /* indirection array pointer */
- int i; /* loop counter */
- int nlists; /* number of irec's (ex lists) */
-
- ASSERT(ifp->if_flags & XFS_IFEXTIREC);
- nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
- erp = &ifp->if_u1.if_ext_irec[erp_idx];
- if (erp->er_extbuf) {
- xfs_iext_irec_update_extoffs(ifp, erp_idx + 1,
- -erp->er_extcount);
- kmem_free(erp->er_extbuf);
- }
- /* Compact extent records */
- erp = ifp->if_u1.if_ext_irec;
- for (i = erp_idx; i < nlists - 1; i++) {
- memmove(&erp[i], &erp[i+1], sizeof(xfs_ext_irec_t));
- }
- /*
- * Manually free the last extent record from the indirection
- * array. A call to xfs_iext_realloc_indirect() with a size
- * of zero would result in a call to xfs_iext_destroy() which
- * would in turn call this function again, creating a nasty
- * infinite loop.
- */
- if (--nlists) {
- xfs_iext_realloc_indirect(ifp,
- nlists * sizeof(xfs_ext_irec_t));
- } else {
- kmem_free(ifp->if_u1.if_ext_irec);
- }
- ifp->if_real_bytes = nlists * XFS_IEXT_BUFSZ;
-}
-
-/*
- * This is called to clean up large amounts of unused memory allocated
- * by the indirection array. Before compacting anything though, verify
- * that the indirection array is still needed and switch back to the
- * linear extent list (or even the inline buffer) if possible. The
- * compaction policy is as follows:
- *
- * Full Compaction: Extents fit into a single page (or inline buffer)
- * Partial Compaction: Extents occupy less than 50% of allocated space
- * No Compaction: Extents occupy at least 50% of allocated space
- */
-void
-xfs_iext_irec_compact(
- xfs_ifork_t *ifp) /* inode fork pointer */
-{
- xfs_extnum_t nextents; /* number of extents in file */
- int nlists; /* number of irec's (ex lists) */
-
- ASSERT(ifp->if_flags & XFS_IFEXTIREC);
- nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
- nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
-
- if (nextents == 0) {
- xfs_iext_destroy(ifp);
- } else if (nextents <= XFS_INLINE_EXTS) {
- xfs_iext_indirect_to_direct(ifp);
- xfs_iext_direct_to_inline(ifp, nextents);
- } else if (nextents <= XFS_LINEAR_EXTS) {
- xfs_iext_indirect_to_direct(ifp);
- } else if (nextents < (nlists * XFS_LINEAR_EXTS) >> 1) {
- xfs_iext_irec_compact_pages(ifp);
- }
-}
-
-/*
- * Combine extents from neighboring extent pages.
- */
-void
-xfs_iext_irec_compact_pages(
- xfs_ifork_t *ifp) /* inode fork pointer */
-{
- xfs_ext_irec_t *erp, *erp_next;/* pointers to irec entries */
- int erp_idx = 0; /* indirection array index */
- int nlists; /* number of irec's (ex lists) */
-
- ASSERT(ifp->if_flags & XFS_IFEXTIREC);
- nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
- while (erp_idx < nlists - 1) {
- erp = &ifp->if_u1.if_ext_irec[erp_idx];
- erp_next = erp + 1;
- if (erp_next->er_extcount <=
- (XFS_LINEAR_EXTS - erp->er_extcount)) {
- memcpy(&erp->er_extbuf[erp->er_extcount],
- erp_next->er_extbuf, erp_next->er_extcount *
- sizeof(xfs_bmbt_rec_t));
- erp->er_extcount += erp_next->er_extcount;
- /*
- * Free page before removing extent record
- * so er_extoffs don't get modified in
- * xfs_iext_irec_remove.
- */
- kmem_free(erp_next->er_extbuf);
- erp_next->er_extbuf = NULL;
- xfs_iext_irec_remove(ifp, erp_idx + 1);
- nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
- } else {
- erp_idx++;
- }
- }
-}
-
-/*
- * This is called to update the er_extoff field in the indirection
- * array when extents have been added or removed from one of the
- * extent lists. erp_idx contains the irec index to begin updating
- * at and ext_diff contains the number of extents that were added
- * or removed.
- */
-void
-xfs_iext_irec_update_extoffs(
- xfs_ifork_t *ifp, /* inode fork pointer */
- int erp_idx, /* irec index to update */
- int ext_diff) /* number of new extents */
-{
- int i; /* loop counter */
- int nlists; /* number of irec's (ex lists */
-
- ASSERT(ifp->if_flags & XFS_IFEXTIREC);
- nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
- for (i = erp_idx; i < nlists; i++) {
- ifp->if_u1.if_ext_irec[i].er_extoff += ext_diff;
- }
-}
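Note on the lookup being moved above: xfs_iext_bno_to_ext() returns the
extent containing bno, the next extent when bno falls in a hole, or NULL
when bno lies past the last extent. A minimal standalone sketch of that
contract follows. It assumes a simplified, already-unpacked record type
(struct ext) and a hypothetical helper name (ext_lookup); neither is part
of this patch, and the indirection-array case is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unpacked extent record; not xfs_bmbt_rec_host_t. */
struct ext {
	uint64_t startoff;	/* first file block covered */
	uint64_t blockcount;	/* number of blocks covered */
};

/*
 * Return the extent containing bno, or the first extent after bno if
 * bno falls in a hole, or NULL if bno lies beyond the last extent.
 * *idxp receives the index of the returned extent (n when NULL).
 * base[] must be sorted by startoff and non-overlapping.
 */
static const struct ext *
ext_lookup(const struct ext *base, int n, uint64_t bno, int *idxp)
{
	int low = 0, high = n - 1, idx = 0;

	assert(n >= 0);
	if (n == 0) {
		*idxp = 0;
		return NULL;
	}
	while (low <= high) {
		idx = low + (high - low) / 2;
		if (bno < base[idx].startoff) {
			high = idx - 1;
		} else if (bno >= base[idx].startoff + base[idx].blockcount) {
			low = idx + 1;
		} else {
			*idxp = idx;
			return &base[idx];
		}
	}
	/* Miss: idx is the last probe; step forward if bno is past it. */
	if (bno >= base[idx].startoff + base[idx].blockcount)
		idx++;
	*idxp = idx;
	return idx == n ? NULL : &base[idx];
}
```

With the indirection array in use, the original first narrows the search
to one extent page via xfs_iext_bno_to_irec() and then adds er_extoff to
convert the page-local index back to a file-based index.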
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 60d55de..a55d68b 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -21,66 +21,6 @@
struct posix_acl;
struct xfs_dinode;
struct xfs_inode;
-struct xfs_inode_log_item;
-
-/*
- * Fork identifiers.
- */
-#define XFS_DATA_FORK 0
-#define XFS_ATTR_FORK 1
-
-/*
- * The following xfs_ext_irec_t struct introduces a second (top) level
- * to the in-core extent allocation scheme. These structs are allocated
- * in a contiguous block, creating an indirection array where each entry
- * (irec) contains a pointer to a buffer of in-core extent records which
- * it manages. Each extent buffer is 4k in size, since 4k is the system
- * page size on Linux i386 and systems with larger page sizes don't seem
- * to gain much, if anything, by using their native page size as the
- * extent buffer size. Also, using 4k extent buffers everywhere provides
- * a consistent interface for CXFS across different platforms.
- *
- * There is currently no limit on the number of irec's (extent lists)
- * allowed, so heavily fragmented files may require an indirection array
- * which spans multiple system pages of memory. The number of extents
- * which would require this amount of contiguous memory is very large
- * and should not cause problems in the foreseeable future. However,
- * if the memory needed for the contiguous array ever becomes a problem,
- * it is possible that a third level of indirection may be required.
- */
-typedef struct xfs_ext_irec {
- xfs_bmbt_rec_host_t *er_extbuf; /* block of extent records */
- xfs_extnum_t er_extoff; /* extent offset in file */
- xfs_extnum_t er_extcount; /* number of extents in page/block */
-} xfs_ext_irec_t;
-
-/*
- * File incore extent information, present for each of data & attr forks.
- */
-#define XFS_IEXT_BUFSZ 4096
-#define XFS_LINEAR_EXTS (XFS_IEXT_BUFSZ / (uint)sizeof(xfs_bmbt_rec_t))
-#define XFS_INLINE_EXTS 2
-#define XFS_INLINE_DATA 32
-typedef struct xfs_ifork {
- int if_bytes; /* bytes in if_u1 */
- int if_real_bytes; /* bytes allocated in if_u1 */
- struct xfs_btree_block *if_broot; /* file's incore btree root */
- short if_broot_bytes; /* bytes allocated for root */
- unsigned char if_flags; /* per-fork flags */
- union {
- xfs_bmbt_rec_host_t *if_extents;/* linear map file exts */
- xfs_ext_irec_t *if_ext_irec; /* irec map file exts */
- char *if_data; /* inline file data */
- } if_u1;
- union {
- xfs_bmbt_rec_host_t if_inline_ext[XFS_INLINE_EXTS];
- /* very small file extents */
- char if_inline_data[XFS_INLINE_DATA];
- /* very small file data */
- xfs_dev_t if_rdev; /* dev number if special */
- uuid_t if_uuid; /* mount point value */
- } if_u2;
-} xfs_ifork_t;
/*
* Inode location information. Stored in the inode and passed to
@@ -115,6 +55,7 @@ struct xfs_imap {
* chain off the mount structure by xfs_sync calls.
*/
+#include "xfs_inode_fork.h"
#include "xfs_inode_item_format.h"
/*
@@ -124,57 +65,6 @@ struct xfs_imap {
#define XFS_ICHGTIME_CHG 0x2 /* inode field change timestamp */
#define XFS_ICHGTIME_CREATE 0x4 /* inode create timestamp */
-/*
- * Per-fork incore inode flags.
- */
-#define XFS_IFINLINE 0x01 /* Inline data is read in */
-#define XFS_IFEXTENTS 0x02 /* All extent pointers are read in */
-#define XFS_IFBROOT 0x04 /* i_broot points to the bmap b-tree root */
-#define XFS_IFEXTIREC 0x08 /* Indirection array of extent blocks */
-
-/*
- * Fork handling.
- */
-
-#define XFS_IFORK_Q(ip) ((ip)->i_d.di_forkoff != 0)
-#define XFS_IFORK_BOFF(ip) ((int)((ip)->i_d.di_forkoff << 3))
-
-#define XFS_IFORK_PTR(ip,w) \
- ((w) == XFS_DATA_FORK ? \
- &(ip)->i_df : \
- (ip)->i_afp)
-#define XFS_IFORK_DSIZE(ip) \
- (XFS_IFORK_Q(ip) ? \
- XFS_IFORK_BOFF(ip) : \
- XFS_LITINO((ip)->i_mount, (ip)->i_d.di_version))
-#define XFS_IFORK_ASIZE(ip) \
- (XFS_IFORK_Q(ip) ? \
- XFS_LITINO((ip)->i_mount, (ip)->i_d.di_version) - \
- XFS_IFORK_BOFF(ip) : \
- 0)
-#define XFS_IFORK_SIZE(ip,w) \
- ((w) == XFS_DATA_FORK ? \
- XFS_IFORK_DSIZE(ip) : \
- XFS_IFORK_ASIZE(ip))
-#define XFS_IFORK_FORMAT(ip,w) \
- ((w) == XFS_DATA_FORK ? \
- (ip)->i_d.di_format : \
- (ip)->i_d.di_aformat)
-#define XFS_IFORK_FMT_SET(ip,w,n) \
- ((w) == XFS_DATA_FORK ? \
- ((ip)->i_d.di_format = (n)) : \
- ((ip)->i_d.di_aformat = (n)))
-#define XFS_IFORK_NEXTENTS(ip,w) \
- ((w) == XFS_DATA_FORK ? \
- (ip)->i_d.di_nextents : \
- (ip)->i_d.di_anextents)
-#define XFS_IFORK_NEXT_SET(ip,w,n) \
- ((w) == XFS_DATA_FORK ? \
- ((ip)->i_d.di_nextents = (n)) : \
- ((ip)->i_d.di_anextents = (n)))
-#define XFS_IFORK_MAXEXT(ip, w) \
- (XFS_IFORK_SIZE(ip, w) / sizeof(xfs_bmbt_rec_t))
-
#ifdef __KERNEL__
@@ -197,40 +87,6 @@ int xfs_iread(struct xfs_mount *, struct xfs_trans *,
void xfs_dinode_calc_crc(struct xfs_mount *, struct xfs_dinode *);
void xfs_dinode_to_disk(struct xfs_dinode *,
struct xfs_icdinode *);
-void xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
- struct xfs_inode_log_item *, int,
- struct xfs_buf *);
-void xfs_idestroy_fork(struct xfs_inode *, int);
-void xfs_idata_realloc(struct xfs_inode *, int, int);
-void xfs_iroot_realloc(struct xfs_inode *, int, int);
-int xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int);
-int xfs_iextents_copy(struct xfs_inode *, xfs_bmbt_rec_t *, int);
-
-xfs_bmbt_rec_host_t *xfs_iext_get_ext(xfs_ifork_t *, xfs_extnum_t);
-void xfs_iext_insert(struct xfs_inode *, xfs_extnum_t, xfs_extnum_t,
- struct xfs_bmbt_irec *, int);
-void xfs_iext_add(xfs_ifork_t *, xfs_extnum_t, int);
-void xfs_iext_add_indirect_multi(xfs_ifork_t *, int, xfs_extnum_t, int);
-void xfs_iext_remove(struct xfs_inode *, xfs_extnum_t, int, int);
-void xfs_iext_remove_inline(xfs_ifork_t *, xfs_extnum_t, int);
-void xfs_iext_remove_direct(xfs_ifork_t *, xfs_extnum_t, int);
-void xfs_iext_remove_indirect(xfs_ifork_t *, xfs_extnum_t, int);
-void xfs_iext_direct_to_inline(xfs_ifork_t *, xfs_extnum_t);
-void xfs_iext_inline_to_direct(xfs_ifork_t *, int);
-void xfs_iext_destroy(xfs_ifork_t *);
-xfs_bmbt_rec_host_t *xfs_iext_bno_to_ext(xfs_ifork_t *, xfs_fileoff_t, int *);
-xfs_ext_irec_t *xfs_iext_bno_to_irec(xfs_ifork_t *, xfs_fileoff_t, int *);
-xfs_ext_irec_t *xfs_iext_idx_to_irec(xfs_ifork_t *, xfs_extnum_t *, int *, int);
-void xfs_iext_irec_init(xfs_ifork_t *);
-xfs_ext_irec_t *xfs_iext_irec_new(xfs_ifork_t *, int);
-void xfs_iext_irec_remove(xfs_ifork_t *, int);
-void xfs_iext_irec_compact(xfs_ifork_t *);
-void xfs_iext_irec_compact_pages(xfs_ifork_t *);
-void xfs_iext_irec_compact_full(xfs_ifork_t *);
-void xfs_iext_irec_update_extoffs(xfs_ifork_t *, int, int);
-bool xfs_can_free_eofblocks(struct xfs_inode *, bool);
-
-#define xfs_ipincount(ip) ((unsigned int) atomic_read(&ip->i_pincount))
#if defined(DEBUG)
void xfs_inobp_check(struct xfs_mount *, struct xfs_buf *);
@@ -238,8 +94,6 @@ void xfs_inobp_check(struct xfs_mount *, struct xfs_buf *);
#define xfs_inobp_check(mp, bp)
#endif /* DEBUG */
-extern struct kmem_zone *xfs_ifork_zone;
-extern struct kmem_zone *xfs_inode_zone;
extern const struct xfs_buf_ops xfs_inode_buf_ops;
#endif /* __XFS_INODE_H__ */
diff --git a/fs/xfs/xfs_inode_fork.c b/fs/xfs/xfs_inode_fork.c
new file mode 100644
index 0000000..2db4814
--- /dev/null
+++ b/fs/xfs/xfs_inode_fork.c
@@ -0,0 +1,1921 @@
+/*
+ * Copyright (c) 2000-2006 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include <linux/log2.h>
+
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_log.h"
+#include "xfs_inum.h"
+#include "xfs_trans.h"
+#include "xfs_trans_priv.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_attr_sf.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_buf_item.h"
+#include "xfs_inode_item.h"
+#include "xfs_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_ialloc.h"
+#include "xfs_bmap.h"
+#include "xfs_error.h"
+#include "xfs_quota.h"
+#include "xfs_filestream.h"
+#include "xfs_cksum.h"
+#include "xfs_trace.h"
+#include "xfs_icache.h"
+
+kmem_zone_t *xfs_ifork_zone;
+
+STATIC int xfs_iformat_local(xfs_inode_t *, xfs_dinode_t *, int, int);
+STATIC int xfs_iformat_extents(xfs_inode_t *, xfs_dinode_t *, int);
+STATIC int xfs_iformat_btree(xfs_inode_t *, xfs_dinode_t *, int);
+
+#ifdef DEBUG
+/*
+ * Make sure that the extents in the given memory buffer
+ * are valid.
+ */
+void
+xfs_validate_extents(
+ xfs_ifork_t *ifp,
+ int nrecs,
+ xfs_exntfmt_t fmt)
+{
+ xfs_bmbt_irec_t irec;
+ xfs_bmbt_rec_host_t rec;
+ int i;
+
+ for (i = 0; i < nrecs; i++) {
+ xfs_bmbt_rec_host_t *ep = xfs_iext_get_ext(ifp, i);
+ rec.l0 = get_unaligned(&ep->l0);
+ rec.l1 = get_unaligned(&ep->l1);
+ xfs_bmbt_get_all(&rec, &irec);
+ if (fmt == XFS_EXTFMT_NOSTATE)
+ ASSERT(irec.br_state == XFS_EXT_NORM);
+ }
+}
+#else /* DEBUG */
+#define xfs_validate_extents(ifp, nrecs, fmt)
+#endif /* DEBUG */
+
+
+/*
+ * Move inode type and inode format specific information from the
+ * on-disk inode to the in-core inode. For fifos, devs, and sockets
+ * this means set if_rdev to the proper value. For files, directories,
+ * and symlinks this means to bring in the in-line data or extent
+ * pointers. For a file in B-tree format, only the root is immediately
+ * brought in-core. The rest will be in-lined in if_extents when it
+ * is first referenced (see xfs_iread_extents()).
+ */
+int
+xfs_iformat_fork(
+ xfs_inode_t *ip,
+ xfs_dinode_t *dip)
+{
+ xfs_attr_shortform_t *atp;
+ int size;
+ int error = 0;
+ xfs_fsize_t di_size;
+
+ if (unlikely(be32_to_cpu(dip->di_nextents) +
+ be16_to_cpu(dip->di_anextents) >
+ be64_to_cpu(dip->di_nblocks))) {
+ xfs_warn(ip->i_mount,
+ "corrupt dinode %Lu, extent total = %d, nblocks = %Lu.",
+ (unsigned long long)ip->i_ino,
+ (int)(be32_to_cpu(dip->di_nextents) +
+ be16_to_cpu(dip->di_anextents)),
+ (unsigned long long)
+ be64_to_cpu(dip->di_nblocks));
+ XFS_CORRUPTION_ERROR("xfs_iformat(1)", XFS_ERRLEVEL_LOW,
+ ip->i_mount, dip);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ if (unlikely(dip->di_forkoff > ip->i_mount->m_sb.sb_inodesize)) {
+ xfs_warn(ip->i_mount, "corrupt dinode %Lu, forkoff = 0x%x.",
+ (unsigned long long)ip->i_ino,
+ dip->di_forkoff);
+ XFS_CORRUPTION_ERROR("xfs_iformat(2)", XFS_ERRLEVEL_LOW,
+ ip->i_mount, dip);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ if (unlikely((ip->i_d.di_flags & XFS_DIFLAG_REALTIME) &&
+ !ip->i_mount->m_rtdev_targp)) {
+ xfs_warn(ip->i_mount,
+ "corrupt dinode %Lu, has realtime flag set.",
+ ip->i_ino);
+ XFS_CORRUPTION_ERROR("xfs_iformat(realtime)",
+ XFS_ERRLEVEL_LOW, ip->i_mount, dip);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ switch (ip->i_d.di_mode & S_IFMT) {
+ case S_IFIFO:
+ case S_IFCHR:
+ case S_IFBLK:
+ case S_IFSOCK:
+ if (unlikely(dip->di_format != XFS_DINODE_FMT_DEV)) {
+ XFS_CORRUPTION_ERROR("xfs_iformat(3)", XFS_ERRLEVEL_LOW,
+ ip->i_mount, dip);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+ ip->i_d.di_size = 0;
+ ip->i_df.if_u2.if_rdev = xfs_dinode_get_rdev(dip);
+ break;
+
+ case S_IFREG:
+ case S_IFLNK:
+ case S_IFDIR:
+ switch (dip->di_format) {
+ case XFS_DINODE_FMT_LOCAL:
+ /*
+ * no local regular files yet
+ */
+ if (unlikely(S_ISREG(be16_to_cpu(dip->di_mode)))) {
+ xfs_warn(ip->i_mount,
+ "corrupt inode %Lu (local format for regular file).",
+ (unsigned long long) ip->i_ino);
+ XFS_CORRUPTION_ERROR("xfs_iformat(4)",
+ XFS_ERRLEVEL_LOW,
+ ip->i_mount, dip);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ di_size = be64_to_cpu(dip->di_size);
+ if (unlikely(di_size > XFS_DFORK_DSIZE(dip, ip->i_mount))) {
+ xfs_warn(ip->i_mount,
+ "corrupt inode %Lu (bad size %Ld for local inode).",
+ (unsigned long long) ip->i_ino,
+ (long long) di_size);
+ XFS_CORRUPTION_ERROR("xfs_iformat(5)",
+ XFS_ERRLEVEL_LOW,
+ ip->i_mount, dip);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ size = (int)di_size;
+ error = xfs_iformat_local(ip, dip, XFS_DATA_FORK, size);
+ break;
+ case XFS_DINODE_FMT_EXTENTS:
+ error = xfs_iformat_extents(ip, dip, XFS_DATA_FORK);
+ break;
+ case XFS_DINODE_FMT_BTREE:
+ error = xfs_iformat_btree(ip, dip, XFS_DATA_FORK);
+ break;
+ default:
+ XFS_ERROR_REPORT("xfs_iformat(6)", XFS_ERRLEVEL_LOW,
+ ip->i_mount);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+ break;
+
+ default:
+ XFS_ERROR_REPORT("xfs_iformat(7)", XFS_ERRLEVEL_LOW, ip->i_mount);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+ if (error) {
+ return error;
+ }
+ if (!XFS_DFORK_Q(dip))
+ return 0;
+
+ ASSERT(ip->i_afp == NULL);
+ ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, KM_SLEEP | KM_NOFS);
+
+ switch (dip->di_aformat) {
+ case XFS_DINODE_FMT_LOCAL:
+ atp = (xfs_attr_shortform_t *)XFS_DFORK_APTR(dip);
+ size = be16_to_cpu(atp->hdr.totsize);
+
+ if (unlikely(size < sizeof(struct xfs_attr_sf_hdr))) {
+ xfs_warn(ip->i_mount,
+ "corrupt inode %Lu (bad attr fork size %Ld).",
+ (unsigned long long) ip->i_ino,
+ (long long) size);
+ XFS_CORRUPTION_ERROR("xfs_iformat(8)",
+ XFS_ERRLEVEL_LOW,
+ ip->i_mount, dip);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ error = xfs_iformat_local(ip, dip, XFS_ATTR_FORK, size);
+ break;
+ case XFS_DINODE_FMT_EXTENTS:
+ error = xfs_iformat_extents(ip, dip, XFS_ATTR_FORK);
+ break;
+ case XFS_DINODE_FMT_BTREE:
+ error = xfs_iformat_btree(ip, dip, XFS_ATTR_FORK);
+ break;
+ default:
+ error = XFS_ERROR(EFSCORRUPTED);
+ break;
+ }
+ if (error) {
+ kmem_zone_free(xfs_ifork_zone, ip->i_afp);
+ ip->i_afp = NULL;
+ xfs_idestroy_fork(ip, XFS_DATA_FORK);
+ }
+ return error;
+}
+
+/*
+ * The file is in-lined in the on-disk inode.
+ * If it fits into if_inline_data, then copy
+ * it there, otherwise allocate a buffer for it
+ * and copy the data there. Either way, set
+ * if_data to point at the data.
+ * If we allocate a buffer for the data, make
+ * sure that its size is a multiple of 4 and
+ * record the real size in if_real_bytes.
+ */
+STATIC int
+xfs_iformat_local(
+ xfs_inode_t *ip,
+ xfs_dinode_t *dip,
+ int whichfork,
+ int size)
+{
+ xfs_ifork_t *ifp;
+ int real_size;
+
+ /*
+ * If the size is unreasonable, then something
+ * is wrong and we just bail out rather than crash in
+ * kmem_alloc() or memcpy() below.
+ */
+ if (unlikely(size > XFS_DFORK_SIZE(dip, ip->i_mount, whichfork))) {
+ xfs_warn(ip->i_mount,
+ "corrupt inode %Lu (bad size %d for local fork, size = %d).",
+ (unsigned long long) ip->i_ino, size,
+ XFS_DFORK_SIZE(dip, ip->i_mount, whichfork));
+ XFS_CORRUPTION_ERROR("xfs_iformat_local", XFS_ERRLEVEL_LOW,
+ ip->i_mount, dip);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+ real_size = 0;
+ if (size == 0)
+ ifp->if_u1.if_data = NULL;
+ else if (size <= sizeof(ifp->if_u2.if_inline_data))
+ ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
+ else {
+ real_size = roundup(size, 4);
+ ifp->if_u1.if_data = kmem_alloc(real_size, KM_SLEEP | KM_NOFS);
+ }
+ ifp->if_bytes = size;
+ ifp->if_real_bytes = real_size;
+ if (size)
+ memcpy(ifp->if_u1.if_data, XFS_DFORK_PTR(dip, whichfork), size);
+ ifp->if_flags &= ~XFS_IFEXTENTS;
+ ifp->if_flags |= XFS_IFINLINE;
+ return 0;
+}
+
+/*
+ * The file consists of a set of extents all
+ * of which fit into the on-disk inode.
+ * If there are few enough extents to fit into
+ * the if_inline_ext, then copy them there.
+ * Otherwise allocate a buffer for them and copy
+ * them into it. Either way, set if_extents
+ * to point at the extents.
+ */
+STATIC int
+xfs_iformat_extents(
+ xfs_inode_t *ip,
+ xfs_dinode_t *dip,
+ int whichfork)
+{
+ xfs_bmbt_rec_t *dp;
+ xfs_ifork_t *ifp;
+ int nex;
+ int size;
+ int i;
+
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+ nex = XFS_DFORK_NEXTENTS(dip, whichfork);
+ size = nex * (uint)sizeof(xfs_bmbt_rec_t);
+
+ /*
+ * If the number of extents is unreasonable, then something
+ * is wrong and we just bail out rather than crash in
+ * kmem_alloc() or memcpy() below.
+ */
+ if (unlikely(size < 0 || size > XFS_DFORK_SIZE(dip, ip->i_mount, whichfork))) {
+ xfs_warn(ip->i_mount, "corrupt inode %Lu ((a)extents = %d).",
+ (unsigned long long) ip->i_ino, nex);
+ XFS_CORRUPTION_ERROR("xfs_iformat_extents(1)", XFS_ERRLEVEL_LOW,
+ ip->i_mount, dip);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ ifp->if_real_bytes = 0;
+ if (nex == 0)
+ ifp->if_u1.if_extents = NULL;
+ else if (nex <= XFS_INLINE_EXTS)
+ ifp->if_u1.if_extents = ifp->if_u2.if_inline_ext;
+ else
+ xfs_iext_add(ifp, 0, nex);
+
+ ifp->if_bytes = size;
+ if (size) {
+ dp = (xfs_bmbt_rec_t *) XFS_DFORK_PTR(dip, whichfork);
+ xfs_validate_extents(ifp, nex, XFS_EXTFMT_INODE(ip));
+ for (i = 0; i < nex; i++, dp++) {
+ xfs_bmbt_rec_host_t *ep = xfs_iext_get_ext(ifp, i);
+ ep->l0 = get_unaligned_be64(&dp->l0);
+ ep->l1 = get_unaligned_be64(&dp->l1);
+ }
+ XFS_BMAP_TRACE_EXLIST(ip, nex, whichfork);
+ if (whichfork != XFS_DATA_FORK ||
+ XFS_EXTFMT_INODE(ip) == XFS_EXTFMT_NOSTATE)
+ if (unlikely(xfs_check_nostate_extents(
+ ifp, 0, nex))) {
+ XFS_ERROR_REPORT("xfs_iformat_extents(2)",
+ XFS_ERRLEVEL_LOW,
+ ip->i_mount);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+ }
+ ifp->if_flags |= XFS_IFEXTENTS;
+ return 0;
+}
+
+/*
+ * The file has too many extents to fit into
+ * the inode, so they are in B-tree format.
+ * Allocate a buffer for the root of the B-tree
+ * and copy the root into it. The if_extents
+ * field will remain NULL until all of the
+ * extents are read in (when they are needed).
+ */
+STATIC int
+xfs_iformat_btree(
+ xfs_inode_t *ip,
+ xfs_dinode_t *dip,
+ int whichfork)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ xfs_bmdr_block_t *dfp;
+ xfs_ifork_t *ifp;
+ /* REFERENCED */
+ int nrecs;
+ int size;
+
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+ dfp = (xfs_bmdr_block_t *)XFS_DFORK_PTR(dip, whichfork);
+ size = XFS_BMAP_BROOT_SPACE(mp, dfp);
+ nrecs = be16_to_cpu(dfp->bb_numrecs);
+
+ /*
+ * blow out if -- fork has fewer extents than can fit in
+ * fork (fork shouldn't be a btree format), root btree
+ * block has more records than can fit into the fork,
+ * or the number of extents is greater than the number of
+ * blocks.
+ */
+ if (unlikely(XFS_IFORK_NEXTENTS(ip, whichfork) <=
+ XFS_IFORK_MAXEXT(ip, whichfork) ||
+ XFS_BMDR_SPACE_CALC(nrecs) >
+ XFS_DFORK_SIZE(dip, mp, whichfork) ||
+ XFS_IFORK_NEXTENTS(ip, whichfork) > ip->i_d.di_nblocks)) {
+ xfs_warn(mp, "corrupt inode %Lu (btree).",
+ (unsigned long long) ip->i_ino);
+ XFS_CORRUPTION_ERROR("xfs_iformat_btree", XFS_ERRLEVEL_LOW,
+ mp, dip);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ ifp->if_broot_bytes = size;
+ ifp->if_broot = kmem_alloc(size, KM_SLEEP | KM_NOFS);
+ ASSERT(ifp->if_broot != NULL);
+ /*
+ * Copy and convert from the on-disk structure
+ * to the in-memory structure.
+ */
+ xfs_bmdr_to_bmbt(ip, dfp, XFS_DFORK_SIZE(dip, ip->i_mount, whichfork),
+ ifp->if_broot, size);
+ ifp->if_flags &= ~XFS_IFEXTENTS;
+ ifp->if_flags |= XFS_IFBROOT;
+
+ return 0;
+}
+
+/*
+ * Read in extents from a btree-format inode.
+ * Allocate and fill in if_extents. Real work is done in xfs_bmap.c.
+ */
+int
+xfs_iread_extents(
+ xfs_trans_t *tp,
+ xfs_inode_t *ip,
+ int whichfork)
+{
+ int error;
+ xfs_ifork_t *ifp;
+ xfs_extnum_t nextents;
+
+ if (unlikely(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE)) {
+ XFS_ERROR_REPORT("xfs_iread_extents", XFS_ERRLEVEL_LOW,
+ ip->i_mount);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+ nextents = XFS_IFORK_NEXTENTS(ip, whichfork);
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+
+ /*
+ * We know that the size is valid (it's checked in iformat_btree)
+ */
+ ifp->if_bytes = ifp->if_real_bytes = 0;
+ ifp->if_flags |= XFS_IFEXTENTS;
+ xfs_iext_add(ifp, 0, nextents);
+ error = xfs_bmap_read_extents(tp, ip, whichfork);
+ if (error) {
+ xfs_iext_destroy(ifp);
+ ifp->if_flags &= ~XFS_IFEXTENTS;
+ return error;
+ }
+ xfs_validate_extents(ifp, nextents, XFS_EXTFMT_INODE(ip));
+ return 0;
+}
+
+/*
+ * Reallocate the space for if_broot based on the number of records
+ * being added or deleted as indicated in rec_diff. Move the records
+ * and pointers in if_broot to fit the new size. When shrinking this
+ * will eliminate holes between the records and pointers created by
+ * the caller. When growing this will create holes to be filled in
+ * by the caller.
+ *
+ * The caller must not request to add more records than would fit in
+ * the on-disk inode root. If the if_broot is currently NULL, then
+ * if we are adding records, one will be allocated. The caller must also
+ * not request that the number of records go below zero, although
+ * it can go to zero.
+ *
+ * ip -- the inode whose if_broot area is changing
+ * rec_diff -- the change in the number of records, positive or negative,
+ * requested for the if_broot array.
+ */
+void
+xfs_iroot_realloc(
+ xfs_inode_t *ip,
+ int rec_diff,
+ int whichfork)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ int cur_max;
+ xfs_ifork_t *ifp;
+ struct xfs_btree_block *new_broot;
+ int new_max;
+ size_t new_size;
+ char *np;
+ char *op;
+
+ /*
+ * Handle the degenerate case quietly.
+ */
+ if (rec_diff == 0) {
+ return;
+ }
+
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+ if (rec_diff > 0) {
+ /*
+ * If there wasn't any memory allocated before, just
+ * allocate it now and get out.
+ */
+ if (ifp->if_broot_bytes == 0) {
+ new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, rec_diff);
+ ifp->if_broot = kmem_alloc(new_size, KM_SLEEP | KM_NOFS);
+ ifp->if_broot_bytes = (int)new_size;
+ return;
+ }
+
+ /*
+ * If there is already an existing if_broot, then we need
+ * to realloc() it and shift the pointers to their new
+ * location. The records don't change location because
+ * they are kept butted up against the btree block header.
+ */
+ cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0);
+ new_max = cur_max + rec_diff;
+ new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
+ ifp->if_broot = kmem_realloc(ifp->if_broot, new_size,
+ XFS_BMAP_BROOT_SPACE_CALC(mp, cur_max),
+ KM_SLEEP | KM_NOFS);
+ op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
+ ifp->if_broot_bytes);
+ np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
+ (int)new_size);
+ ifp->if_broot_bytes = (int)new_size;
+ ASSERT(ifp->if_broot_bytes <=
+ XFS_IFORK_SIZE(ip, whichfork) + XFS_BROOT_SIZE_ADJ(ip));
+ memmove(np, op, cur_max * (uint)sizeof(xfs_dfsbno_t));
+ return;
+ }
+
+ /*
+ * rec_diff is less than 0. In this case, we are shrinking the
+ * if_broot buffer. It must already exist. If we go to zero
+ * records, just get rid of the root and clear the status bit.
+ */
+ ASSERT((ifp->if_broot != NULL) && (ifp->if_broot_bytes > 0));
+ cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0);
+ new_max = cur_max + rec_diff;
+ ASSERT(new_max >= 0);
+ if (new_max > 0)
+ new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
+ else
+ new_size = 0;
+ if (new_size > 0) {
+ new_broot = kmem_alloc(new_size, KM_SLEEP | KM_NOFS);
+ /*
+ * First copy over the btree block header.
+ */
+ memcpy(new_broot, ifp->if_broot,
+ XFS_BMBT_BLOCK_LEN(ip->i_mount));
+ } else {
+ new_broot = NULL;
+ ifp->if_flags &= ~XFS_IFBROOT;
+ }
+
+ /*
+ * Only copy the records and pointers if there are any.
+ */
+ if (new_max > 0) {
+ /*
+ * First copy the records.
+ */
+ op = (char *)XFS_BMBT_REC_ADDR(mp, ifp->if_broot, 1);
+ np = (char *)XFS_BMBT_REC_ADDR(mp, new_broot, 1);
+ memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_rec_t));
+
+ /*
+ * Then copy the pointers.
+ */
+ op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
+ ifp->if_broot_bytes);
+ np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, new_broot, 1,
+ (int)new_size);
+ memcpy(np, op, new_max * (uint)sizeof(xfs_dfsbno_t));
+ }
+ kmem_free(ifp->if_broot);
+ ifp->if_broot = new_broot;
+ ifp->if_broot_bytes = (int)new_size;
+ ASSERT(ifp->if_broot_bytes <=
+ XFS_IFORK_SIZE(ip, whichfork) + XFS_BROOT_SIZE_ADJ(ip));
+ return;
+}
+
+
+/*
+ * This is called when the amount of space needed for if_data
+ * is increased or decreased. The change in size is indicated by
+ * the number of bytes that need to be added or deleted in the
+ * byte_diff parameter.
+ *
+ * If the amount of space needed has decreased below the size of the
+ * inline buffer, then switch to using the inline buffer. Otherwise,
+ * use kmem_realloc() or kmem_alloc() to adjust the size of the buffer
+ * to what is needed.
+ *
+ * ip -- the inode whose if_data area is changing
+ * byte_diff -- the change in the number of bytes, positive or negative,
+ * requested for the if_data array.
+ */
+void
+xfs_idata_realloc(
+ xfs_inode_t *ip,
+ int byte_diff,
+ int whichfork)
+{
+ xfs_ifork_t *ifp;
+ int new_size;
+ int real_size;
+
+ if (byte_diff == 0) {
+ return;
+ }
+
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+ new_size = (int)ifp->if_bytes + byte_diff;
+ ASSERT(new_size >= 0);
+
+ if (new_size == 0) {
+ if (ifp->if_u1.if_data != ifp->if_u2.if_inline_data) {
+ kmem_free(ifp->if_u1.if_data);
+ }
+ ifp->if_u1.if_data = NULL;
+ real_size = 0;
+ } else if (new_size <= sizeof(ifp->if_u2.if_inline_data)) {
+ /*
+ * If the valid extents/data can fit in if_inline_ext/data,
+ * copy them from the malloc'd vector and free it.
+ */
+ if (ifp->if_u1.if_data == NULL) {
+ ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
+ } else if (ifp->if_u1.if_data != ifp->if_u2.if_inline_data) {
+ ASSERT(ifp->if_real_bytes != 0);
+ memcpy(ifp->if_u2.if_inline_data, ifp->if_u1.if_data,
+ new_size);
+ kmem_free(ifp->if_u1.if_data);
+ ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
+ }
+ real_size = 0;
+ } else {
+ /*
+ * Stuck with malloc/realloc.
+ * For inline data, the underlying buffer must be
+ * a multiple of 4 bytes in size so that it can be
+ * logged and stay on word boundaries. We enforce
+ * that here.
+ */
+ real_size = roundup(new_size, 4);
+ if (ifp->if_u1.if_data == NULL) {
+ ASSERT(ifp->if_real_bytes == 0);
+ ifp->if_u1.if_data = kmem_alloc(real_size,
+ KM_SLEEP | KM_NOFS);
+ } else if (ifp->if_u1.if_data != ifp->if_u2.if_inline_data) {
+ /*
+ * Only do the realloc if the underlying size
+ * is really changing.
+ */
+ if (ifp->if_real_bytes != real_size) {
+ ifp->if_u1.if_data =
+ kmem_realloc(ifp->if_u1.if_data,
+ real_size,
+ ifp->if_real_bytes,
+ KM_SLEEP | KM_NOFS);
+ }
+ } else {
+ ASSERT(ifp->if_real_bytes == 0);
+ ifp->if_u1.if_data = kmem_alloc(real_size,
+ KM_SLEEP | KM_NOFS);
+ memcpy(ifp->if_u1.if_data, ifp->if_u2.if_inline_data,
+ ifp->if_bytes);
+ }
+ }
+ ifp->if_real_bytes = real_size;
+ ifp->if_bytes = new_size;
+ ASSERT(ifp->if_bytes <= XFS_IFORK_SIZE(ip, whichfork));
+}
+
+void
+xfs_idestroy_fork(
+ xfs_inode_t *ip,
+ int whichfork)
+{
+ xfs_ifork_t *ifp;
+
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+ if (ifp->if_broot != NULL) {
+ kmem_free(ifp->if_broot);
+ ifp->if_broot = NULL;
+ }
+
+ /*
+ * If the format is local, then we can't have an extents
+ * array so just look for an inline data array. If we're
+ * not local then we may or may not have an extents list,
+ * so check and free it up if we do.
+ */
+ if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
+ if ((ifp->if_u1.if_data != ifp->if_u2.if_inline_data) &&
+ (ifp->if_u1.if_data != NULL)) {
+ ASSERT(ifp->if_real_bytes != 0);
+ kmem_free(ifp->if_u1.if_data);
+ ifp->if_u1.if_data = NULL;
+ ifp->if_real_bytes = 0;
+ }
+ } else if ((ifp->if_flags & XFS_IFEXTENTS) &&
+ ((ifp->if_flags & XFS_IFEXTIREC) ||
+ ((ifp->if_u1.if_extents != NULL) &&
+ (ifp->if_u1.if_extents != ifp->if_u2.if_inline_ext)))) {
+ ASSERT(ifp->if_real_bytes != 0);
+ xfs_iext_destroy(ifp);
+ }
+ ASSERT(ifp->if_u1.if_extents == NULL ||
+ ifp->if_u1.if_extents == ifp->if_u2.if_inline_ext);
+ ASSERT(ifp->if_real_bytes == 0);
+ if (whichfork == XFS_ATTR_FORK) {
+ kmem_zone_free(xfs_ifork_zone, ip->i_afp);
+ ip->i_afp = NULL;
+ }
+}
+
+/*
+ * xfs_iextents_copy()
+ *
+ * This is called to copy the REAL extents (as opposed to the delayed
+ * allocation extents) from the inode into the given buffer. It
+ * returns the number of bytes copied into the buffer.
+ *
+ * If there are no delayed allocation extents, then we can just
+ * memcpy() the extents into the buffer. Otherwise, we need to
+ * examine each extent in turn and skip those which are delayed.
+ */
+int
+xfs_iextents_copy(
+ xfs_inode_t *ip,
+ xfs_bmbt_rec_t *dp,
+ int whichfork)
+{
+ int copied;
+ int i;
+ xfs_ifork_t *ifp;
+ int nrecs;
+ xfs_fsblock_t start_block;
+
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+ ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
+ ASSERT(ifp->if_bytes > 0);
+
+ nrecs = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
+ XFS_BMAP_TRACE_EXLIST(ip, nrecs, whichfork);
+ ASSERT(nrecs > 0);
+
+ /*
+ * There are some delayed allocation extents in the
+ * inode, so copy the extents one at a time and skip
+ * the delayed ones. There must be at least one
+ * non-delayed extent.
+ */
+ copied = 0;
+ for (i = 0; i < nrecs; i++) {
+ xfs_bmbt_rec_host_t *ep = xfs_iext_get_ext(ifp, i);
+ start_block = xfs_bmbt_get_startblock(ep);
+ if (isnullstartblock(start_block)) {
+ /*
+ * It's a delayed allocation extent, so skip it.
+ */
+ continue;
+ }
+
+ /* Translate to on disk format */
+ put_unaligned_be64(ep->l0, &dp->l0);
+ put_unaligned_be64(ep->l1, &dp->l1);
+ dp++;
+ copied++;
+ }
+ ASSERT(copied != 0);
+ xfs_validate_extents(ifp, copied, XFS_EXTFMT_INODE(ip));
+
+ return (copied * (uint)sizeof(xfs_bmbt_rec_t));
+}
+
+/*
+ * Each of the following cases stores data into the same region
+ * of the on-disk inode, so only one of them can be valid at
+ * any given time. While it is possible to have conflicting formats
+ * and log flags, e.g. having XFS_ILOG_?DATA set when the fork is
+ * in EXTENTS format, this can only happen when the fork has
+ * changed formats after being modified but before being flushed.
+ * In these cases, the format always takes precedence, because the
+ * format indicates the current state of the fork.
+ */
+void
+xfs_iflush_fork(
+ xfs_inode_t *ip,
+ xfs_dinode_t *dip,
+ xfs_inode_log_item_t *iip,
+ int whichfork,
+ xfs_buf_t *bp)
+{
+ char *cp;
+ xfs_ifork_t *ifp;
+ xfs_mount_t *mp;
+ static const short brootflag[2] =
+ { XFS_ILOG_DBROOT, XFS_ILOG_ABROOT };
+ static const short dataflag[2] =
+ { XFS_ILOG_DDATA, XFS_ILOG_ADATA };
+ static const short extflag[2] =
+ { XFS_ILOG_DEXT, XFS_ILOG_AEXT };
+
+ if (!iip)
+ return;
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+ /*
+ * This can happen if we gave up in iformat in an error path,
+ * for the attribute fork.
+ */
+ if (!ifp) {
+ ASSERT(whichfork == XFS_ATTR_FORK);
+ return;
+ }
+ cp = XFS_DFORK_PTR(dip, whichfork);
+ mp = ip->i_mount;
+ switch (XFS_IFORK_FORMAT(ip, whichfork)) {
+ case XFS_DINODE_FMT_LOCAL:
+ if ((iip->ili_fields & dataflag[whichfork]) &&
+ (ifp->if_bytes > 0)) {
+ ASSERT(ifp->if_u1.if_data != NULL);
+ ASSERT(ifp->if_bytes <= XFS_IFORK_SIZE(ip, whichfork));
+ memcpy(cp, ifp->if_u1.if_data, ifp->if_bytes);
+ }
+ break;
+
+ case XFS_DINODE_FMT_EXTENTS:
+ ASSERT((ifp->if_flags & XFS_IFEXTENTS) ||
+ !(iip->ili_fields & extflag[whichfork]));
+ if ((iip->ili_fields & extflag[whichfork]) &&
+ (ifp->if_bytes > 0)) {
+ ASSERT(xfs_iext_get_ext(ifp, 0));
+ ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) > 0);
+ (void)xfs_iextents_copy(ip, (xfs_bmbt_rec_t *)cp,
+ whichfork);
+ }
+ break;
+
+ case XFS_DINODE_FMT_BTREE:
+ if ((iip->ili_fields & brootflag[whichfork]) &&
+ (ifp->if_broot_bytes > 0)) {
+ ASSERT(ifp->if_broot != NULL);
+ ASSERT(ifp->if_broot_bytes <=
+ (XFS_IFORK_SIZE(ip, whichfork) +
+ XFS_BROOT_SIZE_ADJ(ip)));
+ xfs_bmbt_to_bmdr(mp, ifp->if_broot, ifp->if_broot_bytes,
+ (xfs_bmdr_block_t *)cp,
+ XFS_DFORK_SIZE(dip, mp, whichfork));
+ }
+ break;
+
+ case XFS_DINODE_FMT_DEV:
+ if (iip->ili_fields & XFS_ILOG_DEV) {
+ ASSERT(whichfork == XFS_DATA_FORK);
+ xfs_dinode_put_rdev(dip, ip->i_df.if_u2.if_rdev);
+ }
+ break;
+
+ case XFS_DINODE_FMT_UUID:
+ if (iip->ili_fields & XFS_ILOG_UUID) {
+ ASSERT(whichfork == XFS_DATA_FORK);
+ memcpy(XFS_DFORK_DPTR(dip),
+ &ip->i_df.if_u2.if_uuid,
+ sizeof(uuid_t));
+ }
+ break;
+
+ default:
+ ASSERT(0);
+ break;
+ }
+}
+
+/*
+ * Return a pointer to the extent record at file index idx.
+ */
+xfs_bmbt_rec_host_t *
+xfs_iext_get_ext(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ xfs_extnum_t idx) /* index of target extent */
+{
+ ASSERT(idx >= 0);
+ ASSERT(idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t));
+
+ if ((ifp->if_flags & XFS_IFEXTIREC) && (idx == 0)) {
+ return ifp->if_u1.if_ext_irec->er_extbuf;
+ } else if (ifp->if_flags & XFS_IFEXTIREC) {
+ xfs_ext_irec_t *erp; /* irec pointer */
+ int erp_idx = 0; /* irec index */
+ xfs_extnum_t page_idx = idx; /* ext index in target list */
+
+ erp = xfs_iext_idx_to_irec(ifp, &page_idx, &erp_idx, 0);
+ return &erp->er_extbuf[page_idx];
+ } else if (ifp->if_bytes) {
+ return &ifp->if_u1.if_extents[idx];
+ } else {
+ return NULL;
+ }
+}
+
+/*
+ * Create, destroy, or resize a linear (direct) block of extents.
+ */
+STATIC void
+xfs_iext_realloc_direct(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ int new_size) /* new size of extents */
+{
+ int rnew_size; /* real new size of extents */
+
+ rnew_size = new_size;
+
+ ASSERT(!(ifp->if_flags & XFS_IFEXTIREC) ||
+ ((new_size >= 0) && (new_size <= XFS_IEXT_BUFSZ) &&
+ (new_size != ifp->if_real_bytes)));
+
+ /* Free extent records */
+ if (new_size == 0) {
+ xfs_iext_destroy(ifp);
+ }
+ /* Resize direct extent list and zero any new bytes */
+ else if (ifp->if_real_bytes) {
+ /* Check if extents will fit inside the inode */
+ if (new_size <= XFS_INLINE_EXTS * sizeof(xfs_bmbt_rec_t)) {
+ xfs_iext_direct_to_inline(ifp, new_size /
+ (uint)sizeof(xfs_bmbt_rec_t));
+ ifp->if_bytes = new_size;
+ return;
+ }
+ if (!is_power_of_2(new_size)){
+ rnew_size = roundup_pow_of_two(new_size);
+ }
+ if (rnew_size != ifp->if_real_bytes) {
+ ifp->if_u1.if_extents =
+ kmem_realloc(ifp->if_u1.if_extents,
+ rnew_size,
+ ifp->if_real_bytes, KM_NOFS);
+ }
+ if (rnew_size > ifp->if_real_bytes) {
+ memset(&ifp->if_u1.if_extents[ifp->if_bytes /
+ (uint)sizeof(xfs_bmbt_rec_t)], 0,
+ rnew_size - ifp->if_real_bytes);
+ }
+ }
+ /*
+ * Switch from the inline extent buffer to a direct
+ * extent list. Be sure to include the inline extent
+ * bytes in new_size.
+ */
+ else {
+ new_size += ifp->if_bytes;
+ if (!is_power_of_2(new_size)) {
+ rnew_size = roundup_pow_of_two(new_size);
+ }
+ xfs_iext_inline_to_direct(ifp, rnew_size);
+ }
+ ifp->if_real_bytes = rnew_size;
+ ifp->if_bytes = new_size;
+}
+
+/*
+ * Insert new item(s) into the extent records for incore inode
+ * fork 'ifp'. 'count' new items are inserted at index 'idx'.
+ */
+void
+xfs_iext_insert(
+ xfs_inode_t *ip, /* incore inode pointer */
+ xfs_extnum_t idx, /* starting index of new items */
+ xfs_extnum_t count, /* number of inserted items */
+ xfs_bmbt_irec_t *new, /* items to insert */
+ int state) /* type of extent conversion */
+{
+ xfs_ifork_t *ifp = (state & BMAP_ATTRFORK) ? ip->i_afp : &ip->i_df;
+ xfs_extnum_t i; /* extent record index */
+
+ trace_xfs_iext_insert(ip, idx, new, state, _RET_IP_);
+
+ ASSERT(ifp->if_flags & XFS_IFEXTENTS);
+ xfs_iext_add(ifp, idx, count);
+ for (i = idx; i < idx + count; i++, new++)
+ xfs_bmbt_set_all(xfs_iext_get_ext(ifp, i), new);
+}
+
+/*
+ * This is called when the amount of space required for incore file
+ * extents needs to be increased. The ext_diff parameter stores the
+ * number of new extents being added and the idx parameter contains
+ * the extent index where the new extents will be added. If the new
+ * extents are being appended, then we just need to (re)allocate and
+ * initialize the space. Otherwise, if the new extents are being
+ * inserted into the middle of the existing entries, a bit more work
+ * is required to make room for the new extents to be inserted. The
+ * caller is responsible for filling in the new extent entries upon
+ * return.
+ */
+void
+xfs_iext_add(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ xfs_extnum_t idx, /* index to begin adding exts */
+ int ext_diff) /* number of extents to add */
+{
+ int byte_diff; /* new bytes being added */
+ int new_size; /* size of extents after adding */
+ xfs_extnum_t nextents; /* number of extents in file */
+
+ nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
+ ASSERT((idx >= 0) && (idx <= nextents));
+ byte_diff = ext_diff * sizeof(xfs_bmbt_rec_t);
+ new_size = ifp->if_bytes + byte_diff;
+ /*
+ * If the new number of extents (nextents + ext_diff)
+ * fits inside the inode, then continue to use the inline
+ * extent buffer.
+ */
+ if (nextents + ext_diff <= XFS_INLINE_EXTS) {
+ if (idx < nextents) {
+ memmove(&ifp->if_u2.if_inline_ext[idx + ext_diff],
+ &ifp->if_u2.if_inline_ext[idx],
+ (nextents - idx) * sizeof(xfs_bmbt_rec_t));
+ memset(&ifp->if_u2.if_inline_ext[idx], 0, byte_diff);
+ }
+ ifp->if_u1.if_extents = ifp->if_u2.if_inline_ext;
+ ifp->if_real_bytes = 0;
+ }
+ /*
+ * Otherwise use a linear (direct) extent list.
+ * If the extents are currently inside the inode,
+ * xfs_iext_realloc_direct will switch us from
+ * inline to direct extent allocation mode.
+ */
+ else if (nextents + ext_diff <= XFS_LINEAR_EXTS) {
+ xfs_iext_realloc_direct(ifp, new_size);
+ if (idx < nextents) {
+ memmove(&ifp->if_u1.if_extents[idx + ext_diff],
+ &ifp->if_u1.if_extents[idx],
+ (nextents - idx) * sizeof(xfs_bmbt_rec_t));
+ memset(&ifp->if_u1.if_extents[idx], 0, byte_diff);
+ }
+ }
+ /* Indirection array */
+ else {
+ xfs_ext_irec_t *erp;
+ int erp_idx = 0;
+ int page_idx = idx;
+
+ ASSERT(nextents + ext_diff > XFS_LINEAR_EXTS);
+ if (ifp->if_flags & XFS_IFEXTIREC) {
+ erp = xfs_iext_idx_to_irec(ifp, &page_idx, &erp_idx, 1);
+ } else {
+ xfs_iext_irec_init(ifp);
+ ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+ erp = ifp->if_u1.if_ext_irec;
+ }
+ /* Extents fit in target extent page */
+ if (erp && erp->er_extcount + ext_diff <= XFS_LINEAR_EXTS) {
+ if (page_idx < erp->er_extcount) {
+ memmove(&erp->er_extbuf[page_idx + ext_diff],
+ &erp->er_extbuf[page_idx],
+ (erp->er_extcount - page_idx) *
+ sizeof(xfs_bmbt_rec_t));
+ memset(&erp->er_extbuf[page_idx], 0, byte_diff);
+ }
+ erp->er_extcount += ext_diff;
+ xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, ext_diff);
+ }
+ /* Insert a new extent page */
+ else if (erp) {
+ xfs_iext_add_indirect_multi(ifp,
+ erp_idx, page_idx, ext_diff);
+ }
+ /*
+ * If extent(s) are being appended to the last page in
+ * the indirection array and the new extent(s) don't fit
+ * in the page, then erp is NULL and erp_idx is set to
+ * the next index needed in the indirection array.
+ */
+ else {
+ int count = ext_diff;
+
+ while (count) {
+ erp = xfs_iext_irec_new(ifp, erp_idx);
+ erp->er_extcount = count;
+ count -= MIN(count, (int)XFS_LINEAR_EXTS);
+ if (count) {
+ erp_idx++;
+ }
+ }
+ }
+ }
+ ifp->if_bytes = new_size;
+}
+
+/*
+ * This is called when incore extents are being added to the indirection
+ * array and the new extents do not fit in the target extent list. The
+ * erp_idx parameter contains the irec index for the target extent list
+ * in the indirection array, and the idx parameter contains the extent
+ * index within the list. The number of extents being added is stored
+ * in the count parameter.
+ *
+ * |-------| |-------|
+ * | | | | idx - number of extents before idx
+ * | idx | | count |
+ * | | | | count - number of extents being inserted at idx
+ * |-------| |-------|
+ * | count | | nex2 | nex2 - number of extents after idx + count
+ * |-------| |-------|
+ */
+void
+xfs_iext_add_indirect_multi(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ int erp_idx, /* target extent irec index */
+ xfs_extnum_t idx, /* index within target list */
+ int count) /* new extents being added */
+{
+ int byte_diff; /* new bytes being added */
+ xfs_ext_irec_t *erp; /* pointer to irec entry */
+ xfs_extnum_t ext_diff; /* number of extents to add */
+ xfs_extnum_t ext_cnt; /* new extents still needed */
+ xfs_extnum_t nex2; /* extents after idx + count */
+ xfs_bmbt_rec_t *nex2_ep = NULL; /* temp list for nex2 extents */
+ int nlists; /* number of irec's (lists) */
+
+ ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+ erp = &ifp->if_u1.if_ext_irec[erp_idx];
+ nex2 = erp->er_extcount - idx;
+ nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+
+ /*
+ * Save second part of target extent list
+ * (all extents from idx onward) */
+ if (nex2) {
+ byte_diff = nex2 * sizeof(xfs_bmbt_rec_t);
+ nex2_ep = (xfs_bmbt_rec_t *) kmem_alloc(byte_diff, KM_NOFS);
+ memmove(nex2_ep, &erp->er_extbuf[idx], byte_diff);
+ erp->er_extcount -= nex2;
+ xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, -nex2);
+ memset(&erp->er_extbuf[idx], 0, byte_diff);
+ }
+
+ /*
+ * Add the new extents to the end of the target
+ * list, then allocate new irec record(s) and
+ * extent buffer(s) as needed to store the rest
+ * of the new extents.
+ */
+ ext_cnt = count;
+ ext_diff = MIN(ext_cnt, (int)XFS_LINEAR_EXTS - erp->er_extcount);
+ if (ext_diff) {
+ erp->er_extcount += ext_diff;
+ xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, ext_diff);
+ ext_cnt -= ext_diff;
+ }
+ while (ext_cnt) {
+ erp_idx++;
+ erp = xfs_iext_irec_new(ifp, erp_idx);
+ ext_diff = MIN(ext_cnt, (int)XFS_LINEAR_EXTS);
+ erp->er_extcount = ext_diff;
+ xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, ext_diff);
+ ext_cnt -= ext_diff;
+ }
+
+ /* Add nex2 extents back to indirection array */
+ if (nex2) {
+ xfs_extnum_t ext_avail;
+ int i;
+
+ byte_diff = nex2 * sizeof(xfs_bmbt_rec_t);
+ ext_avail = XFS_LINEAR_EXTS - erp->er_extcount;
+ i = 0;
+ /*
+ * If nex2 extents fit in the current page, append
+ * nex2_ep after the new extents.
+ */
+ if (nex2 <= ext_avail) {
+ i = erp->er_extcount;
+ }
+ /*
+ * Otherwise, check if space is available in the
+ * next page.
+ */
+ else if ((erp_idx < nlists - 1) &&
+ (nex2 <= (ext_avail = XFS_LINEAR_EXTS -
+ ifp->if_u1.if_ext_irec[erp_idx+1].er_extcount))) {
+ erp_idx++;
+ erp++;
+ /* Create a hole for nex2 extents */
+ memmove(&erp->er_extbuf[nex2], erp->er_extbuf,
+ erp->er_extcount * sizeof(xfs_bmbt_rec_t));
+ }
+ /*
+ * Final choice, create a new extent page for
+ * nex2 extents.
+ */
+ else {
+ erp_idx++;
+ erp = xfs_iext_irec_new(ifp, erp_idx);
+ }
+ memmove(&erp->er_extbuf[i], nex2_ep, byte_diff);
+ kmem_free(nex2_ep);
+ erp->er_extcount += nex2;
+ xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, nex2);
+ }
+}
+
+/*
+ * This is called when the amount of space required for incore file
+ * extents needs to be decreased. The ext_diff parameter stores the
+ * number of extents to be removed and the idx parameter contains
+ * the extent index where the extents will be removed from.
+ *
+ * If the amount of space needed has decreased below the linear
+ * limit, XFS_IEXT_BUFSZ, then switch to using the contiguous
+ * extent array. Otherwise, use kmem_realloc() to adjust the
+ * size to what is needed.
+ */
+void
+xfs_iext_remove(
+ xfs_inode_t *ip, /* incore inode pointer */
+ xfs_extnum_t idx, /* index to begin removing exts */
+ int ext_diff, /* number of extents to remove */
+ int state) /* type of extent conversion */
+{
+ xfs_ifork_t *ifp = (state & BMAP_ATTRFORK) ? ip->i_afp : &ip->i_df;
+ xfs_extnum_t nextents; /* number of extents in file */
+ int new_size; /* size of extents after removal */
+
+ trace_xfs_iext_remove(ip, idx, state, _RET_IP_);
+
+ ASSERT(ext_diff > 0);
+ nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
+ new_size = (nextents - ext_diff) * sizeof(xfs_bmbt_rec_t);
+
+ if (new_size == 0) {
+ xfs_iext_destroy(ifp);
+ } else if (ifp->if_flags & XFS_IFEXTIREC) {
+ xfs_iext_remove_indirect(ifp, idx, ext_diff);
+ } else if (ifp->if_real_bytes) {
+ xfs_iext_remove_direct(ifp, idx, ext_diff);
+ } else {
+ xfs_iext_remove_inline(ifp, idx, ext_diff);
+ }
+ ifp->if_bytes = new_size;
+}
+
+/*
+ * This removes ext_diff extents from the inline buffer, beginning
+ * at extent index idx.
+ */
+void
+xfs_iext_remove_inline(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ xfs_extnum_t idx, /* index to begin removing exts */
+ int ext_diff) /* number of extents to remove */
+{
+ int nextents; /* number of extents in file */
+
+ ASSERT(!(ifp->if_flags & XFS_IFEXTIREC));
+ ASSERT(idx < XFS_INLINE_EXTS);
+ nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
+ ASSERT(((nextents - ext_diff) > 0) &&
+ (nextents - ext_diff) < XFS_INLINE_EXTS);
+
+ if (idx + ext_diff < nextents) {
+ memmove(&ifp->if_u2.if_inline_ext[idx],
+ &ifp->if_u2.if_inline_ext[idx + ext_diff],
+ (nextents - (idx + ext_diff)) *
+ sizeof(xfs_bmbt_rec_t));
+ memset(&ifp->if_u2.if_inline_ext[nextents - ext_diff],
+ 0, ext_diff * sizeof(xfs_bmbt_rec_t));
+ } else {
+ memset(&ifp->if_u2.if_inline_ext[idx], 0,
+ ext_diff * sizeof(xfs_bmbt_rec_t));
+ }
+}
+
+/*
+ * This removes ext_diff extents from a linear (direct) extent list,
+ * beginning at extent index idx. If the extents are being removed
+ * from the end of the list (i.e. truncate) then we just need to
+ * reallocate the list to remove the extra space. Otherwise, if the
+ * extents are being removed from the middle of the existing extent
+ * entries, then we first need to move the extent records beginning
+ * at idx + ext_diff up in the list to overwrite the records being
+ * removed, then remove the extra space via kmem_realloc.
+ */
+void
+xfs_iext_remove_direct(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ xfs_extnum_t idx, /* index to begin removing exts */
+ int ext_diff) /* number of extents to remove */
+{
+ xfs_extnum_t nextents; /* number of extents in file */
+ int new_size; /* size of extents after removal */
+
+ ASSERT(!(ifp->if_flags & XFS_IFEXTIREC));
+ new_size = ifp->if_bytes -
+ (ext_diff * sizeof(xfs_bmbt_rec_t));
+ nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
+
+ if (new_size == 0) {
+ xfs_iext_destroy(ifp);
+ return;
+ }
+ /* Move extents up in the list (if needed) */
+ if (idx + ext_diff < nextents) {
+ memmove(&ifp->if_u1.if_extents[idx],
+ &ifp->if_u1.if_extents[idx + ext_diff],
+ (nextents - (idx + ext_diff)) *
+ sizeof(xfs_bmbt_rec_t));
+ }
+ memset(&ifp->if_u1.if_extents[nextents - ext_diff],
+ 0, ext_diff * sizeof(xfs_bmbt_rec_t));
+ /*
+ * Reallocate the direct extent list. If the extents
+ * will fit inside the inode then xfs_iext_realloc_direct
+ * will switch from direct to inline extent allocation
+ * mode for us.
+ */
+ xfs_iext_realloc_direct(ifp, new_size);
+ ifp->if_bytes = new_size;
+}
+
+/*
+ * This is called when incore extents are being removed from the
+ * indirection array and the extents being removed span multiple extent
+ * buffers. The idx parameter contains the file extent index where we
+ * want to begin removing extents, and the count parameter contains
+ * how many extents need to be removed.
+ *
+ * |-------| |-------|
+ * | nex1 | | | nex1 - number of extents before idx
+ * |-------| | count |
+ * | | | | count - number of extents being removed at idx
+ * | count | |-------|
+ * | | | nex2 | nex2 - number of extents after idx + count
+ * |-------| |-------|
+ */
+void
+xfs_iext_remove_indirect(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ xfs_extnum_t idx, /* index to begin removing extents */
+ int count) /* number of extents to remove */
+{
+ xfs_ext_irec_t *erp; /* indirection array pointer */
+ int erp_idx = 0; /* indirection array index */
+ xfs_extnum_t ext_cnt; /* extents left to remove */
+ xfs_extnum_t ext_diff; /* extents to remove in current list */
+ xfs_extnum_t nex1; /* number of extents before idx */
+ xfs_extnum_t nex2; /* extents after idx + count */
+ int page_idx = idx; /* index in target extent list */
+
+ ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+ erp = xfs_iext_idx_to_irec(ifp, &page_idx, &erp_idx, 0);
+ ASSERT(erp != NULL);
+ nex1 = page_idx;
+ ext_cnt = count;
+ while (ext_cnt) {
+ nex2 = MAX((erp->er_extcount - (nex1 + ext_cnt)), 0);
+ ext_diff = MIN(ext_cnt, (erp->er_extcount - nex1));
+ /*
+ * Check for deletion of entire list;
+ * xfs_iext_irec_remove() updates extent offsets.
+ */
+ if (ext_diff == erp->er_extcount) {
+ xfs_iext_irec_remove(ifp, erp_idx);
+ ext_cnt -= ext_diff;
+ nex1 = 0;
+ if (ext_cnt) {
+ ASSERT(erp_idx < ifp->if_real_bytes /
+ XFS_IEXT_BUFSZ);
+ erp = &ifp->if_u1.if_ext_irec[erp_idx];
+ nex1 = 0;
+ continue;
+ } else {
+ break;
+ }
+ }
+ /* Move extents up (if needed) */
+ if (nex2) {
+ memmove(&erp->er_extbuf[nex1],
+ &erp->er_extbuf[nex1 + ext_diff],
+ nex2 * sizeof(xfs_bmbt_rec_t));
+ }
+ /* Zero out rest of page */
+ memset(&erp->er_extbuf[nex1 + nex2], 0, (XFS_IEXT_BUFSZ -
+ ((nex1 + nex2) * sizeof(xfs_bmbt_rec_t))));
+ /* Update remaining counters */
+ erp->er_extcount -= ext_diff;
+ xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, -ext_diff);
+ ext_cnt -= ext_diff;
+ nex1 = 0;
+ erp_idx++;
+ erp++;
+ }
+ ifp->if_bytes -= count * sizeof(xfs_bmbt_rec_t);
+ xfs_iext_irec_compact(ifp);
+}
+
+/*
+ * Switch from linear (direct) extent records to inline buffer.
+ */
+void
+xfs_iext_direct_to_inline(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ xfs_extnum_t nextents) /* number of extents in file */
+{
+ ASSERT(ifp->if_flags & XFS_IFEXTENTS);
+ ASSERT(nextents <= XFS_INLINE_EXTS);
+ /*
+ * The inline buffer was zeroed when we switched
+ * from inline to direct extent allocation mode,
+ * so we don't need to clear it here.
+ */
+ memcpy(ifp->if_u2.if_inline_ext, ifp->if_u1.if_extents,
+ nextents * sizeof(xfs_bmbt_rec_t));
+ kmem_free(ifp->if_u1.if_extents);
+ ifp->if_u1.if_extents = ifp->if_u2.if_inline_ext;
+ ifp->if_real_bytes = 0;
+}
+
+/*
+ * Switch from inline buffer to linear (direct) extent records.
+ * new_size should already be rounded up to the next power of 2
+ * by the caller (when appropriate), so use new_size as it is.
+ * However, since new_size may be rounded up, we can't update
+ * if_bytes here. It is the caller's responsibility to update
+ * if_bytes upon return.
+ */
+void
+xfs_iext_inline_to_direct(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ int new_size) /* new size of extent list, in bytes */
+{
+ ifp->if_u1.if_extents = kmem_alloc(new_size, KM_NOFS);
+ memset(ifp->if_u1.if_extents, 0, new_size);
+ if (ifp->if_bytes) {
+ memcpy(ifp->if_u1.if_extents, ifp->if_u2.if_inline_ext,
+ ifp->if_bytes);
+ memset(ifp->if_u2.if_inline_ext, 0, XFS_INLINE_EXTS *
+ sizeof(xfs_bmbt_rec_t));
+ }
+ ifp->if_real_bytes = new_size;
+}
+
+/*
+ * Resize an extent indirection array to new_size bytes.
+ */
+STATIC void
+xfs_iext_realloc_indirect(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ int new_size) /* new indirection array size */
+{
+ int nlists; /* number of irec's (ex lists) */
+ int size; /* current indirection array size */
+
+ ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+ nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+ size = nlists * sizeof(xfs_ext_irec_t);
+ ASSERT(ifp->if_real_bytes);
+ ASSERT((new_size >= 0) && (new_size != size));
+ if (new_size == 0) {
+ xfs_iext_destroy(ifp);
+ } else {
+ ifp->if_u1.if_ext_irec = (xfs_ext_irec_t *)
+ kmem_realloc(ifp->if_u1.if_ext_irec,
+ new_size, size, KM_NOFS);
+ }
+}
+
+/*
+ * Switch from indirection array to linear (direct) extent allocations.
+ */
+STATIC void
+xfs_iext_indirect_to_direct(
+ xfs_ifork_t *ifp) /* inode fork pointer */
+{
+ xfs_bmbt_rec_host_t *ep; /* extent record pointer */
+ xfs_extnum_t nextents; /* number of extents in file */
+ int size; /* size of file extents */
+
+ ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+ nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
+ ASSERT(nextents <= XFS_LINEAR_EXTS);
+ size = nextents * sizeof(xfs_bmbt_rec_t);
+
+ xfs_iext_irec_compact_pages(ifp);
+ ASSERT(ifp->if_real_bytes == XFS_IEXT_BUFSZ);
+
+ ep = ifp->if_u1.if_ext_irec->er_extbuf;
+ kmem_free(ifp->if_u1.if_ext_irec);
+ ifp->if_flags &= ~XFS_IFEXTIREC;
+ ifp->if_u1.if_extents = ep;
+ ifp->if_bytes = size;
+ if (nextents < XFS_LINEAR_EXTS) {
+ xfs_iext_realloc_direct(ifp, size);
+ }
+}
+
+/*
+ * Free incore file extents.
+ */
+void
+xfs_iext_destroy(
+ xfs_ifork_t *ifp) /* inode fork pointer */
+{
+ if (ifp->if_flags & XFS_IFEXTIREC) {
+ int erp_idx;
+ int nlists;
+
+ nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+ for (erp_idx = nlists - 1; erp_idx >= 0 ; erp_idx--) {
+ xfs_iext_irec_remove(ifp, erp_idx);
+ }
+ ifp->if_flags &= ~XFS_IFEXTIREC;
+ } else if (ifp->if_real_bytes) {
+ kmem_free(ifp->if_u1.if_extents);
+ } else if (ifp->if_bytes) {
+ memset(ifp->if_u2.if_inline_ext, 0, XFS_INLINE_EXTS *
+ sizeof(xfs_bmbt_rec_t));
+ }
+ ifp->if_u1.if_extents = NULL;
+ ifp->if_real_bytes = 0;
+ ifp->if_bytes = 0;
+}
+
+/*
+ * Return a pointer to the extent record for file system block bno.
+ */
+xfs_bmbt_rec_host_t * /* pointer to found extent record */
+xfs_iext_bno_to_ext(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ xfs_fileoff_t bno, /* block number to search for */
+ xfs_extnum_t *idxp) /* index of target extent */
+{
+ xfs_bmbt_rec_host_t *base; /* pointer to first extent */
+ xfs_filblks_t blockcount = 0; /* number of blocks in extent */
+ xfs_bmbt_rec_host_t *ep = NULL; /* pointer to target extent */
+ xfs_ext_irec_t *erp = NULL; /* indirection array pointer */
+ int high; /* upper boundary in search */
+ xfs_extnum_t idx = 0; /* index of target extent */
+ int low; /* lower boundary in search */
+ xfs_extnum_t nextents; /* number of file extents */
+ xfs_fileoff_t startoff = 0; /* start offset of extent */
+
+ nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
+ if (nextents == 0) {
+ *idxp = 0;
+ return NULL;
+ }
+ low = 0;
+ if (ifp->if_flags & XFS_IFEXTIREC) {
+ /* Find target extent list */
+ int erp_idx = 0;
+ erp = xfs_iext_bno_to_irec(ifp, bno, &erp_idx);
+ base = erp->er_extbuf;
+ high = erp->er_extcount - 1;
+ } else {
+ base = ifp->if_u1.if_extents;
+ high = nextents - 1;
+ }
+ /* Binary search extent records */
+ while (low <= high) {
+ idx = (low + high) >> 1;
+ ep = base + idx;
+ startoff = xfs_bmbt_get_startoff(ep);
+ blockcount = xfs_bmbt_get_blockcount(ep);
+ if (bno < startoff) {
+ high = idx - 1;
+ } else if (bno >= startoff + blockcount) {
+ low = idx + 1;
+ } else {
+ /* Convert back to file-based extent index */
+ if (ifp->if_flags & XFS_IFEXTIREC) {
+ idx += erp->er_extoff;
+ }
+ *idxp = idx;
+ return ep;
+ }
+ }
+ /* Convert back to file-based extent index */
+ if (ifp->if_flags & XFS_IFEXTIREC) {
+ idx += erp->er_extoff;
+ }
+ if (bno >= startoff + blockcount) {
+ if (++idx == nextents) {
+ ep = NULL;
+ } else {
+ ep = xfs_iext_get_ext(ifp, idx);
+ }
+ }
+ *idxp = idx;
+ return ep;
+}
+
+/*
+ * Return a pointer to the indirection array entry containing the
+ * extent record for filesystem block bno. Store the index of the
+ * target irec in *erp_idxp.
+ */
+xfs_ext_irec_t * /* pointer to found irec entry */
+xfs_iext_bno_to_irec(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ xfs_fileoff_t bno, /* block number to search for */
+ int *erp_idxp) /* irec index of target ext list */
+{
+ xfs_ext_irec_t *erp = NULL; /* indirection array pointer */
+ xfs_ext_irec_t *erp_next; /* next indirection array entry */
+ int erp_idx; /* indirection array index */
+ int nlists; /* number of extent irec's (lists) */
+ int high; /* binary search upper limit */
+ int low; /* binary search lower limit */
+
+ ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+ nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+ erp_idx = 0;
+ low = 0;
+ high = nlists - 1;
+ while (low <= high) {
+ erp_idx = (low + high) >> 1;
+ erp = &ifp->if_u1.if_ext_irec[erp_idx];
+ erp_next = erp_idx < nlists - 1 ? erp + 1 : NULL;
+ if (bno < xfs_bmbt_get_startoff(erp->er_extbuf)) {
+ high = erp_idx - 1;
+ } else if (erp_next && bno >=
+ xfs_bmbt_get_startoff(erp_next->er_extbuf)) {
+ low = erp_idx + 1;
+ } else {
+ break;
+ }
+ }
+ *erp_idxp = erp_idx;
+ return erp;
+}
+
+/*
+ * Return a pointer to the indirection array entry containing the
+ * extent record at file extent index *idxp. Store the index of the
+ * target irec in *erp_idxp and store the page index of the target
+ * extent record in *idxp.
+ */
+xfs_ext_irec_t *
+xfs_iext_idx_to_irec(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ xfs_extnum_t *idxp, /* extent index (file -> page) */
+ int *erp_idxp, /* index of target irec */
+ int realloc) /* new bytes were just added */
+{
+ xfs_ext_irec_t *prev; /* pointer to previous irec */
+ xfs_ext_irec_t *erp = NULL; /* pointer to current irec */
+ int erp_idx; /* indirection array index */
+ int nlists; /* number of irec's (ex lists) */
+ int high; /* binary search upper limit */
+ int low; /* binary search lower limit */
+ xfs_extnum_t page_idx = *idxp; /* extent index in target list */
+
+ ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+ ASSERT(page_idx >= 0);
+ ASSERT(page_idx <= ifp->if_bytes / sizeof(xfs_bmbt_rec_t));
+ ASSERT(page_idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t) || realloc);
+
+ nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+ erp_idx = 0;
+ low = 0;
+ high = nlists - 1;
+
+ /* Binary search extent irec's */
+ while (low <= high) {
+ erp_idx = (low + high) >> 1;
+ erp = &ifp->if_u1.if_ext_irec[erp_idx];
+ prev = erp_idx > 0 ? erp - 1 : NULL;
+ if (page_idx < erp->er_extoff || (page_idx == erp->er_extoff &&
+ realloc && prev && prev->er_extcount < XFS_LINEAR_EXTS)) {
+ high = erp_idx - 1;
+ } else if (page_idx > erp->er_extoff + erp->er_extcount ||
+ (page_idx == erp->er_extoff + erp->er_extcount &&
+ !realloc)) {
+ low = erp_idx + 1;
+ } else if (page_idx == erp->er_extoff + erp->er_extcount &&
+ erp->er_extcount == XFS_LINEAR_EXTS) {
+ ASSERT(realloc);
+ page_idx = 0;
+ erp_idx++;
+ erp = erp_idx < nlists ? erp + 1 : NULL;
+ break;
+ } else {
+ page_idx -= erp->er_extoff;
+ break;
+ }
+ }
+ *idxp = page_idx;
+ *erp_idxp = erp_idx;
+ return(erp);
+}
+
+/*
+ * Allocate and initialize an indirection array once the space needed
+ * for incore extents increases above XFS_IEXT_BUFSZ.
+ */
+void
+xfs_iext_irec_init(
+ xfs_ifork_t *ifp) /* inode fork pointer */
+{
+ xfs_ext_irec_t *erp; /* indirection array pointer */
+ xfs_extnum_t nextents; /* number of extents in file */
+
+ ASSERT(!(ifp->if_flags & XFS_IFEXTIREC));
+ nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
+ ASSERT(nextents <= XFS_LINEAR_EXTS);
+
+ erp = kmem_alloc(sizeof(xfs_ext_irec_t), KM_NOFS);
+
+ if (nextents == 0) {
+ ifp->if_u1.if_extents = kmem_alloc(XFS_IEXT_BUFSZ, KM_NOFS);
+ } else if (!ifp->if_real_bytes) {
+ xfs_iext_inline_to_direct(ifp, XFS_IEXT_BUFSZ);
+ } else if (ifp->if_real_bytes < XFS_IEXT_BUFSZ) {
+ xfs_iext_realloc_direct(ifp, XFS_IEXT_BUFSZ);
+ }
+ erp->er_extbuf = ifp->if_u1.if_extents;
+ erp->er_extcount = nextents;
+ erp->er_extoff = 0;
+
+ ifp->if_flags |= XFS_IFEXTIREC;
+ ifp->if_real_bytes = XFS_IEXT_BUFSZ;
+ ifp->if_bytes = nextents * sizeof(xfs_bmbt_rec_t);
+ ifp->if_u1.if_ext_irec = erp;
+
+ return;
+}
+
+/*
+ * Allocate and initialize a new entry in the indirection array.
+ */
+xfs_ext_irec_t *
+xfs_iext_irec_new(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ int erp_idx) /* index for new irec */
+{
+ xfs_ext_irec_t *erp; /* indirection array pointer */
+ int i; /* loop counter */
+ int nlists; /* number of irec's (ex lists) */
+
+ ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+ nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+
+ /* Resize indirection array */
+ xfs_iext_realloc_indirect(ifp, ++nlists *
+ sizeof(xfs_ext_irec_t));
+ /*
+ * Move records down in the array so the
+ * new page can use erp_idx.
+ */
+ erp = ifp->if_u1.if_ext_irec;
+ for (i = nlists - 1; i > erp_idx; i--) {
+ memmove(&erp[i], &erp[i-1], sizeof(xfs_ext_irec_t));
+ }
+ ASSERT(i == erp_idx);
+
+ /* Initialize new extent record */
+ erp = ifp->if_u1.if_ext_irec;
+ erp[erp_idx].er_extbuf = kmem_alloc(XFS_IEXT_BUFSZ, KM_NOFS);
+ ifp->if_real_bytes = nlists * XFS_IEXT_BUFSZ;
+ memset(erp[erp_idx].er_extbuf, 0, XFS_IEXT_BUFSZ);
+ erp[erp_idx].er_extcount = 0;
+ erp[erp_idx].er_extoff = erp_idx > 0 ?
+ erp[erp_idx-1].er_extoff + erp[erp_idx-1].er_extcount : 0;
+ return (&erp[erp_idx]);
+}
+
+/*
+ * Remove a record from the indirection array.
+ */
+void
+xfs_iext_irec_remove(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ int erp_idx) /* irec index to remove */
+{
+ xfs_ext_irec_t *erp; /* indirection array pointer */
+ int i; /* loop counter */
+ int nlists; /* number of irec's (ex lists) */
+
+ ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+ nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+ erp = &ifp->if_u1.if_ext_irec[erp_idx];
+ if (erp->er_extbuf) {
+ xfs_iext_irec_update_extoffs(ifp, erp_idx + 1,
+ -erp->er_extcount);
+ kmem_free(erp->er_extbuf);
+ }
+ /* Compact extent records */
+ erp = ifp->if_u1.if_ext_irec;
+ for (i = erp_idx; i < nlists - 1; i++) {
+ memmove(&erp[i], &erp[i+1], sizeof(xfs_ext_irec_t));
+ }
+ /*
+ * Manually free the last extent record from the indirection
+ * array. A call to xfs_iext_realloc_indirect() with a size
+ * of zero would result in a call to xfs_iext_destroy() which
+ * would in turn call this function again, creating a nasty
+ * infinite loop.
+ */
+ if (--nlists) {
+ xfs_iext_realloc_indirect(ifp,
+ nlists * sizeof(xfs_ext_irec_t));
+ } else {
+ kmem_free(ifp->if_u1.if_ext_irec);
+ }
+ ifp->if_real_bytes = nlists * XFS_IEXT_BUFSZ;
+}
+
+/*
+ * This is called to clean up large amounts of unused memory allocated
+ * by the indirection array. Before compacting anything though, verify
+ * that the indirection array is still needed and switch back to the
+ * linear extent list (or even the inline buffer) if possible. The
+ * compaction policy is as follows:
+ *
+ * Full Compaction: Extents fit into a single page (or inline buffer)
+ * Partial Compaction: Extents occupy less than 50% of allocated space
+ * No Compaction: Extents occupy at least 50% of allocated space
+ */
+void
+xfs_iext_irec_compact(
+ xfs_ifork_t *ifp) /* inode fork pointer */
+{
+ xfs_extnum_t nextents; /* number of extents in file */
+ int nlists; /* number of irec's (ex lists) */
+
+ ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+ nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+ nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
+
+ if (nextents == 0) {
+ xfs_iext_destroy(ifp);
+ } else if (nextents <= XFS_INLINE_EXTS) {
+ xfs_iext_indirect_to_direct(ifp);
+ xfs_iext_direct_to_inline(ifp, nextents);
+ } else if (nextents <= XFS_LINEAR_EXTS) {
+ xfs_iext_indirect_to_direct(ifp);
+ } else if (nextents < (nlists * XFS_LINEAR_EXTS) >> 1) {
+ xfs_iext_irec_compact_pages(ifp);
+ }
+}
+
+/*
+ * Combine extents from neighboring extent pages.
+ */
+void
+xfs_iext_irec_compact_pages(
+ xfs_ifork_t *ifp) /* inode fork pointer */
+{
+ xfs_ext_irec_t *erp, *erp_next;/* pointers to irec entries */
+ int erp_idx = 0; /* indirection array index */
+ int nlists; /* number of irec's (ex lists) */
+
+ ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+ nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+ while (erp_idx < nlists - 1) {
+ erp = &ifp->if_u1.if_ext_irec[erp_idx];
+ erp_next = erp + 1;
+ if (erp_next->er_extcount <=
+ (XFS_LINEAR_EXTS - erp->er_extcount)) {
+ memcpy(&erp->er_extbuf[erp->er_extcount],
+ erp_next->er_extbuf, erp_next->er_extcount *
+ sizeof(xfs_bmbt_rec_t));
+ erp->er_extcount += erp_next->er_extcount;
+ /*
+ * Free page before removing extent record
+ * so er_extoffs don't get modified in
+ * xfs_iext_irec_remove.
+ */
+ kmem_free(erp_next->er_extbuf);
+ erp_next->er_extbuf = NULL;
+ xfs_iext_irec_remove(ifp, erp_idx + 1);
+ nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+ } else {
+ erp_idx++;
+ }
+ }
+}
+
+/*
+ * This is called to update the er_extoff field in the indirection
+ * array when extents have been added or removed from one of the
+ * extent lists. erp_idx contains the irec index to begin updating
+ * at and ext_diff contains the number of extents that were added
+ * or removed.
+ */
+void
+xfs_iext_irec_update_extoffs(
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ int erp_idx, /* irec index to update */
+ int ext_diff) /* number of new extents */
+{
+ int i; /* loop counter */
+ int nlists; /* number of irec's (ex lists) */
+
+ ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+ nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+ for (i = erp_idx; i < nlists; i++) {
+ ifp->if_u1.if_ext_irec[i].er_extoff += ext_diff;
+ }
+}
+
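The file-index to page-index translation in xfs_iext_idx_to_irec() can
be hard to follow with the realloc special cases mixed in. As a rough
illustration only, here is a stand-alone sketch of the plain lookup
(the realloc == 0 case). struct demo_irec and demo_idx_to_page() are
hypothetical names invented for this sketch and are not part of the
patch; the extent buffer itself is not modelled, only the er_extoff
and er_extcount bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for xfs_ext_irec_t; no extent buffer is modelled. */
struct demo_irec {
	int	extoff;		/* file index of the first extent in this page */
	int	extcount;	/* number of extents stored in this page */
};

/*
 * Map a file-based extent index to a page number and an index within
 * that page, by binary search over the per-page offsets.
 * Returns the page number, or -1 if idx is past the last extent.
 */
static int
demo_idx_to_page(const struct demo_irec *pages, int nlists, int idx,
		 int *page_idx)
{
	int	low = 0;
	int	high = nlists - 1;

	while (low <= high) {
		int	mid = (low + high) >> 1;

		if (idx < pages[mid].extoff) {
			high = mid - 1;		/* target is in an earlier page */
		} else if (idx >= pages[mid].extoff + pages[mid].extcount) {
			low = mid + 1;		/* target is in a later page */
		} else {
			*page_idx = idx - pages[mid].extoff;
			return mid;
		}
	}
	return -1;
}
```

The real function additionally accepts idx == number of extents when
realloc is set, and may return a NULL irec when an append does not fit
in the last page; the sketch deliberately leaves both cases out.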
diff --git a/fs/xfs/xfs_inode_fork.h b/fs/xfs/xfs_inode_fork.h
new file mode 100644
index 0000000..e70eaf4
--- /dev/null
+++ b/fs/xfs/xfs_inode_fork.h
@@ -0,0 +1,169 @@
+/*
+ * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_INODE_FORK_H__
+#define __XFS_INODE_FORK_H__
+
+struct xfs_inode_log_item;
+
+/*
+ * Fork identifiers.
+ */
+#define XFS_DATA_FORK 0
+#define XFS_ATTR_FORK 1
+
+/*
+ * The following xfs_ext_irec_t struct introduces a second (top) level
+ * to the in-core extent allocation scheme. These structs are allocated
+ * in a contiguous block, creating an indirection array where each entry
+ * (irec) contains a pointer to a buffer of in-core extent records which
+ * it manages. Each extent buffer is 4k in size, since 4k is the system
+ * page size on Linux i386 and systems with larger page sizes don't seem
+ * to gain much, if anything, by using their native page size as the
+ * extent buffer size. Also, using 4k extent buffers everywhere provides
+ * a consistent interface for CXFS across different platforms.
+ *
+ * There is currently no limit on the number of irec's (extent lists)
+ * allowed, so heavily fragmented files may require an indirection array
+ * which spans multiple system pages of memory. The number of extents
+ * which would require this amount of contiguous memory is very large
+ * and should not cause problems in the foreseeable future. However,
+ * if the memory needed for the contiguous array ever becomes a problem,
+ * it is possible that a third level of indirection may be required.
+ */
+typedef struct xfs_ext_irec {
+ xfs_bmbt_rec_host_t *er_extbuf; /* block of extent records */
+ xfs_extnum_t er_extoff; /* extent offset in file */
+ xfs_extnum_t er_extcount; /* number of extents in page/block */
+} xfs_ext_irec_t;
+
+/*
+ * File incore extent information, present for each of data & attr forks.
+ */
+#define XFS_IEXT_BUFSZ 4096
+#define XFS_LINEAR_EXTS (XFS_IEXT_BUFSZ / (uint)sizeof(xfs_bmbt_rec_t))
+#define XFS_INLINE_EXTS 2
+#define XFS_INLINE_DATA 32
+typedef struct xfs_ifork {
+ int if_bytes; /* bytes in if_u1 */
+ int if_real_bytes; /* bytes allocated in if_u1 */
+ struct xfs_btree_block *if_broot; /* file's incore btree root */
+ short if_broot_bytes; /* bytes allocated for root */
+ unsigned char if_flags; /* per-fork flags */
+ union {
+ xfs_bmbt_rec_host_t *if_extents;/* linear map file exts */
+ xfs_ext_irec_t *if_ext_irec; /* irec map file exts */
+ char *if_data; /* inline file data */
+ } if_u1;
+ union {
+ xfs_bmbt_rec_host_t if_inline_ext[XFS_INLINE_EXTS];
+ /* very small file extents */
+ char if_inline_data[XFS_INLINE_DATA];
+ /* very small file data */
+ xfs_dev_t if_rdev; /* dev number if special */
+ uuid_t if_uuid; /* mount point value */
+ } if_u2;
+} xfs_ifork_t;
+
+/*
+ * Per-fork incore inode flags.
+ */
+#define XFS_IFINLINE 0x01 /* Inline data is read in */
+#define XFS_IFEXTENTS 0x02 /* All extent pointers are read in */
+#define XFS_IFBROOT 0x04 /* if_broot points to the bmap b-tree root */
+#define XFS_IFEXTIREC 0x08 /* Indirection array of extent blocks */
+
+/*
+ * Fork handling.
+ */
+
+#define XFS_IFORK_Q(ip) ((ip)->i_d.di_forkoff != 0)
+#define XFS_IFORK_BOFF(ip) ((int)((ip)->i_d.di_forkoff << 3))
+
+#define XFS_IFORK_PTR(ip,w) \
+ ((w) == XFS_DATA_FORK ? \
+ &(ip)->i_df : \
+ (ip)->i_afp)
+#define XFS_IFORK_DSIZE(ip) \
+ (XFS_IFORK_Q(ip) ? \
+ XFS_IFORK_BOFF(ip) : \
+ XFS_LITINO((ip)->i_mount, (ip)->i_d.di_version))
+#define XFS_IFORK_ASIZE(ip) \
+ (XFS_IFORK_Q(ip) ? \
+ XFS_LITINO((ip)->i_mount, (ip)->i_d.di_version) - \
+ XFS_IFORK_BOFF(ip) : \
+ 0)
+#define XFS_IFORK_SIZE(ip,w) \
+ ((w) == XFS_DATA_FORK ? \
+ XFS_IFORK_DSIZE(ip) : \
+ XFS_IFORK_ASIZE(ip))
+#define XFS_IFORK_FORMAT(ip,w) \
+ ((w) == XFS_DATA_FORK ? \
+ (ip)->i_d.di_format : \
+ (ip)->i_d.di_aformat)
+#define XFS_IFORK_FMT_SET(ip,w,n) \
+ ((w) == XFS_DATA_FORK ? \
+ ((ip)->i_d.di_format = (n)) : \
+ ((ip)->i_d.di_aformat = (n)))
+#define XFS_IFORK_NEXTENTS(ip,w) \
+ ((w) == XFS_DATA_FORK ? \
+ (ip)->i_d.di_nextents : \
+ (ip)->i_d.di_anextents)
+#define XFS_IFORK_NEXT_SET(ip,w,n) \
+ ((w) == XFS_DATA_FORK ? \
+ ((ip)->i_d.di_nextents = (n)) : \
+ ((ip)->i_d.di_anextents = (n)))
+#define XFS_IFORK_MAXEXT(ip, w) \
+ (XFS_IFORK_SIZE(ip, w) / sizeof(xfs_bmbt_rec_t))
+
+
+int xfs_iformat_fork(struct xfs_inode *, struct xfs_dinode *);
+void xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
+ struct xfs_inode_log_item *, int,
+ struct xfs_buf *);
+void xfs_idestroy_fork(struct xfs_inode *, int);
+void xfs_idata_realloc(struct xfs_inode *, int, int);
+void xfs_iroot_realloc(struct xfs_inode *, int, int);
+int xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int);
+int xfs_iextents_copy(struct xfs_inode *, xfs_bmbt_rec_t *, int);
+
+xfs_bmbt_rec_host_t *xfs_iext_get_ext(xfs_ifork_t *, xfs_extnum_t);
+void xfs_iext_insert(struct xfs_inode *, xfs_extnum_t, xfs_extnum_t,
+ struct xfs_bmbt_irec *, int);
+void xfs_iext_add(xfs_ifork_t *, xfs_extnum_t, int);
+void xfs_iext_add_indirect_multi(xfs_ifork_t *, int, xfs_extnum_t, int);
+void xfs_iext_remove(struct xfs_inode *, xfs_extnum_t, int, int);
+void xfs_iext_remove_inline(xfs_ifork_t *, xfs_extnum_t, int);
+void xfs_iext_remove_direct(xfs_ifork_t *, xfs_extnum_t, int);
+void xfs_iext_remove_indirect(xfs_ifork_t *, xfs_extnum_t, int);
+void xfs_iext_direct_to_inline(xfs_ifork_t *, xfs_extnum_t);
+void xfs_iext_inline_to_direct(xfs_ifork_t *, int);
+void xfs_iext_destroy(xfs_ifork_t *);
+xfs_bmbt_rec_host_t *xfs_iext_bno_to_ext(xfs_ifork_t *, xfs_fileoff_t, int *);
+xfs_ext_irec_t *xfs_iext_bno_to_irec(xfs_ifork_t *, xfs_fileoff_t, int *);
+xfs_ext_irec_t *xfs_iext_idx_to_irec(xfs_ifork_t *, xfs_extnum_t *, int *, int);
+void xfs_iext_irec_init(xfs_ifork_t *);
+xfs_ext_irec_t *xfs_iext_irec_new(xfs_ifork_t *, int);
+void xfs_iext_irec_remove(xfs_ifork_t *, int);
+void xfs_iext_irec_compact(xfs_ifork_t *);
+void xfs_iext_irec_compact_pages(xfs_ifork_t *);
+void xfs_iext_irec_compact_full(xfs_ifork_t *);
+void xfs_iext_irec_update_extoffs(xfs_ifork_t *, int, int);
+
+extern struct kmem_zone *xfs_ifork_zone;
+
+#endif /* __XFS_INODE_FORK_H__ */
diff --git a/fs/xfs/xfs_inode_ops.h b/fs/xfs/xfs_inode_ops.h
index 0c34561..ecaa019 100644
--- a/fs/xfs/xfs_inode_ops.h
+++ b/fs/xfs/xfs_inode_ops.h
@@ -322,6 +322,7 @@ int xfs_rename(struct xfs_inode *src_dp, struct xfs_name *src_name,
struct xfs_name *target_name,
struct xfs_inode *target_ip);
int xfs_free_eofblocks(struct xfs_mount *, struct xfs_inode *, bool);
+bool xfs_can_free_eofblocks(struct xfs_inode *, bool);
void xfs_ilock(xfs_inode_t *, uint);
int xfs_ilock_nowait(xfs_inode_t *, uint);
@@ -344,6 +345,9 @@ int xfs_iunlink(struct xfs_trans *, xfs_inode_t *);
void xfs_iext_realloc(xfs_inode_t *, int, int);
void xfs_iunpin_wait(xfs_inode_t *);
+
+#define xfs_ipincount(ip) ((unsigned int) atomic_read(&ip->i_pincount))
+
int xfs_iflush(struct xfs_inode *, struct xfs_buf **);
void xfs_lock_inodes(xfs_inode_t **, int, uint);
void xfs_lock_two_inodes(xfs_inode_t *, xfs_inode_t *, uint);
@@ -375,4 +379,6 @@ int xfs_droplink(struct xfs_trans *, struct xfs_inode *);
int xfs_bumplink(struct xfs_trans *, struct xfs_inode *);
void xfs_bump_ino_vers2(struct xfs_trans *, struct xfs_inode *);
+extern struct kmem_zone *xfs_inode_zone;
+
#endif /* __XFS_INODE_OPS_H__ */
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
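
For anyone reading the fork macros that patch 32 moves into xfs_inode_fork.h, the following is a minimal userspace sketch of how XFS_IFORK_BOFF, XFS_IFORK_DSIZE and XFS_IFORK_ASIZE split the inode literal area between the data and attribute forks. It is not kernel code: struct demo_inode, its litino field (a stand-in for whatever XFS_LITINO() returns) and every DEMO_* name are hypothetical, and the sizes used are illustrative only.

```c
#include <assert.h>

/* Hypothetical stand-in for the in-core inode; NOT the kernel's xfs_inode_t. */
struct demo_inode {
	unsigned char	forkoff;	/* like di_forkoff: attr fork offset, 8-byte units */
	int		litino;		/* assumed literal area size in bytes */
};

#define DEMO_DATA_FORK	0
#define DEMO_ATTR_FORK	1

/* Mirrors XFS_IFORK_Q: is there an attribute fork at all? */
#define DEMO_IFORK_Q(ip)	((ip)->forkoff != 0)
/* Mirrors XFS_IFORK_BOFF: byte offset of the attribute fork. */
#define DEMO_IFORK_BOFF(ip)	((int)((ip)->forkoff << 3))
/* Data fork gets everything up to the attr fork, or the whole area. */
#define DEMO_IFORK_DSIZE(ip) \
	(DEMO_IFORK_Q(ip) ? DEMO_IFORK_BOFF(ip) : (ip)->litino)
/* Attr fork gets the remainder, or nothing if it does not exist. */
#define DEMO_IFORK_ASIZE(ip) \
	(DEMO_IFORK_Q(ip) ? (ip)->litino - DEMO_IFORK_BOFF(ip) : 0)
#define DEMO_IFORK_SIZE(ip, w) \
	((w) == DEMO_DATA_FORK ? DEMO_IFORK_DSIZE(ip) : DEMO_IFORK_ASIZE(ip))

/* The two fork sizes always add up to the literal area. */
static int demo_fork_total(const struct demo_inode *ip)
{
	int total = DEMO_IFORK_DSIZE(ip) + DEMO_IFORK_ASIZE(ip);

	assert(total == ip->litino);
	return total;
}
```

With an assumed 156-byte literal area, forkoff = 15 gives a 120-byte data fork and a 36-byte attribute fork, while forkoff = 0 leaves all 156 bytes to the data fork.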
* [PATCH 33/60] xfs: move unrealted definitions out of xfs_inode.h
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (31 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 32/60] xfs: move inode fork definitions to a new header file Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 34/60] xfs: introduce xfs_inode_buf.c for inode buffer operations Dave Chinner
` (28 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_ialloc.c | 1 +
fs/xfs/xfs_icache.h | 7 +++++++
fs/xfs/xfs_inode.h | 38 --------------------------------------
fs/xfs/xfs_trans.h | 7 +++++++
4 files changed, 15 insertions(+), 38 deletions(-)
diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index 96a53bd..2de4b4c 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -39,6 +39,7 @@
#include "xfs_cksum.h"
#include "xfs_buf_item.h"
#include "xfs_icreate_item.h"
+#include "xfs_icache.h"
/*
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index e0f138c..20179ae 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -24,6 +24,13 @@ struct xfs_perag;
#define SYNC_WAIT 0x0001 /* wait for i/o to complete */
#define SYNC_TRYLOCK 0x0002 /* only try to lock inodes */
+/*
+ * Flags for xfs_iget()
+ */
+#define XFS_IGET_CREATE 0x1
+#define XFS_IGET_UNTRUSTED 0x2
+#define XFS_IGET_DONTCACHE 0x4
+
int xfs_iget(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino,
uint flags, uint lock_flags, xfs_inode_t **ipp);
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index a55d68b..3f8c56a 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -32,53 +32,15 @@ struct xfs_imap {
ushort im_boffset; /* inode offset in block in bytes */
};
-/*
- * This is the xfs in-core inode structure.
- * Most of the on-disk inode is embedded in the i_d field.
- *
- * The extent pointers/inline file space, however, are managed
- * separately. The memory for this information is pointed to by
- * the if_u1 unions depending on the type of the data.
- * This is used to linearize the array of extents for fast in-core
- * access. This is used until the file's number of extents
- * surpasses XFS_MAX_INCORE_EXTENTS, at which point all extent pointers
- * are accessed through the buffer cache.
- *
- * Other state kept in the in-core inode is used for identification,
- * locking, transactional updating, etc of the inode.
- *
- * Generally, we do not want to hold the i_rlock while holding the
- * i_ilock. Hierarchy is i_iolock followed by i_rlock.
- *
- * xfs_iptr_t contains all the inode fields up to and including the
- * i_mnext and i_mprev fields, it is used as a marker in the inode
- * chain off the mount structure by xfs_sync calls.
- */
-
#include "xfs_inode_fork.h"
#include "xfs_inode_item_format.h"
-/*
- * Flags for xfs_ichgtime().
- */
-#define XFS_ICHGTIME_MOD 0x1 /* data fork modification timestamp */
-#define XFS_ICHGTIME_CHG 0x2 /* inode field change timestamp */
-#define XFS_ICHGTIME_CREATE 0x4 /* inode create timestamp */
-
-
#ifdef __KERNEL__
#include "xfs_inode_ops.h"
#endif /* __KERNEL__ */
-/*
- * Flags for xfs_iget()
- */
-#define XFS_IGET_CREATE 0x1
-#define XFS_IGET_UNTRUSTED 0x2
-#define XFS_IGET_DONTCACHE 0x4
-
int xfs_imap_to_bp(struct xfs_mount *, struct xfs_trans *,
struct xfs_imap *, struct xfs_dinode **,
struct xfs_buf **, uint, uint);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 21cafaa..4654b58 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -227,6 +227,13 @@ struct xfs_log_item_desc {
#define XFS_ATTR_BTREE_REF 1
#define XFS_DQUOT_REF 1
+/*
+ * Flags for xfs_trans_ichgtime().
+ */
+#define XFS_ICHGTIME_MOD 0x1 /* data fork modification timestamp */
+#define XFS_ICHGTIME_CHG 0x2 /* inode field change timestamp */
+#define XFS_ICHGTIME_CREATE 0x4 /* inode create timestamp */
+
#ifdef __KERNEL__
struct xfs_buf;
--
1.7.10.4
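
The XFS_ICHGTIME_* values moved to xfs_trans.h above are independent bits, so callers can OR them together in a single argument. A minimal userspace sketch of that pattern follows, assuming only the flag values and comments shown in the patch. struct demo_times and demo_ichgtime() are hypothetical and are not the kernel's xfs_trans_ichgtime().

```c
#include <assert.h>

/* Flag values as defined in the patch. */
#define XFS_ICHGTIME_MOD	0x1	/* data fork modification timestamp */
#define XFS_ICHGTIME_CHG	0x2	/* inode field change timestamp */
#define XFS_ICHGTIME_CREATE	0x4	/* inode create timestamp */

/* Hypothetical mask of all known bits, used only for the sanity check. */
#define DEMO_ICHGTIME_ALL \
	(XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG | XFS_ICHGTIME_CREATE)

/* Hypothetical timestamp holder; stands in for the inode core fields. */
struct demo_times {
	long	mtime;
	long	ctime;
	long	crtime;
};

/* Update only the timestamps selected by the flag bits. */
static void demo_ichgtime(struct demo_times *t, long now, int flags)
{
	assert((flags & ~DEMO_ICHGTIME_ALL) == 0);

	if (flags & XFS_ICHGTIME_MOD)
		t->mtime = now;
	if (flags & XFS_ICHGTIME_CHG)
		t->ctime = now;
	if (flags & XFS_ICHGTIME_CREATE)
		t->crtime = now;
}
```

Passing XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG updates the modification and change times in one call and leaves the create time untouched.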
* [PATCH 34/60] xfs: introduce xfs_inode_buf.c for inode buffer operations
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (32 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 33/60] xfs: move unrealted definitions out of xfs_inode.h Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 35/60] xfs: start repopulating xfs_inode.[ch] with kernel code Dave Chinner
` (27 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The only things remaining in xfs_inode.[ch] are the operations that
read, write or verify physical inodes in their underlying buffers.
Move all this code to xfs_inode_buf.[ch] so that we can stop sharing
xfs_inode.[ch] with userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_inode.c | 412 --------------------------------------------
fs/xfs/xfs_inode.h | 29 +---
fs/xfs/xfs_inode_buf.c | 444 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_inode_buf.h | 52 ++++++
5 files changed, 498 insertions(+), 440 deletions(-)
create mode 100644 fs/xfs/xfs_inode_buf.c
create mode 100644 fs/xfs/xfs_inode_buf.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 00723b7..d545286 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -78,6 +78,7 @@ xfs-y += xfs_alloc.o \
xfs_icreate_item.o \
xfs_inode.o \
xfs_inode_fork.o \
+ xfs_inode_buf.o \
xfs_log_recover.o \
xfs_sb.o \
xfs_symlink.o \
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index d9f8d37..c032da6 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -48,415 +48,3 @@
kmem_zone_t *xfs_inode_zone;
-/*
- * Check that none of the inode's in the buffer have a next
- * unlinked field of 0.
- */
-#if defined(DEBUG)
-STATIC void
-xfs_inobp_check(
- xfs_mount_t *mp,
- xfs_buf_t *bp)
-{
- int i;
- int j;
- xfs_dinode_t *dip;
-
- j = mp->m_inode_cluster_size >> mp->m_sb.sb_inodelog;
-
- for (i = 0; i < j; i++) {
- dip = (xfs_dinode_t *)xfs_buf_offset(bp,
- i * mp->m_sb.sb_inodesize);
- if (!dip->di_next_unlinked) {
- xfs_alert(mp,
- "Detected bogus zero next_unlinked field in incore inode buffer 0x%p.",
- bp);
- ASSERT(dip->di_next_unlinked);
- }
- }
-}
-#endif
-
-static void
-xfs_inode_buf_verify(
- struct xfs_buf *bp)
-{
- struct xfs_mount *mp = bp->b_target->bt_mount;
- int i;
- int ni;
-
- /*
- * Validate the magic number and version of every inode in the buffer
- */
- ni = XFS_BB_TO_FSB(mp, bp->b_length) * mp->m_sb.sb_inopblock;
- for (i = 0; i < ni; i++) {
- int di_ok;
- xfs_dinode_t *dip;
-
- dip = (struct xfs_dinode *)xfs_buf_offset(bp,
- (i << mp->m_sb.sb_inodelog));
- di_ok = dip->di_magic == cpu_to_be16(XFS_DINODE_MAGIC) &&
- XFS_DINODE_GOOD_VERSION(dip->di_version);
- if (unlikely(XFS_TEST_ERROR(!di_ok, mp,
- XFS_ERRTAG_ITOBP_INOTOBP,
- XFS_RANDOM_ITOBP_INOTOBP))) {
- xfs_buf_ioerror(bp, EFSCORRUPTED);
- XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_HIGH,
- mp, dip);
-#ifdef DEBUG
- xfs_emerg(mp,
- "bad inode magic/vsn daddr %lld #%d (magic=%x)",
- (unsigned long long)bp->b_bn, i,
- be16_to_cpu(dip->di_magic));
- ASSERT(0);
-#endif
- }
- }
- xfs_inobp_check(mp, bp);
-}
-
-
-static void
-xfs_inode_buf_read_verify(
- struct xfs_buf *bp)
-{
- xfs_inode_buf_verify(bp);
-}
-
-static void
-xfs_inode_buf_write_verify(
- struct xfs_buf *bp)
-{
- xfs_inode_buf_verify(bp);
-}
-
-const struct xfs_buf_ops xfs_inode_buf_ops = {
- .verify_read = xfs_inode_buf_read_verify,
- .verify_write = xfs_inode_buf_write_verify,
-};
-
-
-/*
- * This routine is called to map an inode to the buffer containing the on-disk
- * version of the inode. It returns a pointer to the buffer containing the
- * on-disk inode in the bpp parameter, and in the dipp parameter it returns a
- * pointer to the on-disk inode within that buffer.
- *
- * If a non-zero error is returned, then the contents of bpp and dipp are
- * undefined.
- */
-int
-xfs_imap_to_bp(
- struct xfs_mount *mp,
- struct xfs_trans *tp,
- struct xfs_imap *imap,
- struct xfs_dinode **dipp,
- struct xfs_buf **bpp,
- uint buf_flags,
- uint iget_flags)
-{
- struct xfs_buf *bp;
- int error;
-
- buf_flags |= XBF_UNMAPPED;
- error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, imap->im_blkno,
- (int)imap->im_len, buf_flags, &bp,
- &xfs_inode_buf_ops);
- if (error) {
- if (error == EAGAIN) {
- ASSERT(buf_flags & XBF_TRYLOCK);
- return error;
- }
-
- if (error == EFSCORRUPTED &&
- (iget_flags & XFS_IGET_UNTRUSTED))
- return XFS_ERROR(EINVAL);
-
- xfs_warn(mp, "%s: xfs_trans_read_buf() returned error %d.",
- __func__, error);
- return error;
- }
-
- *bpp = bp;
- *dipp = (struct xfs_dinode *)xfs_buf_offset(bp, imap->im_boffset);
- return 0;
-}
-
-STATIC void
-xfs_dinode_from_disk(
- xfs_icdinode_t *to,
- xfs_dinode_t *from)
-{
- to->di_magic = be16_to_cpu(from->di_magic);
- to->di_mode = be16_to_cpu(from->di_mode);
- to->di_version = from ->di_version;
- to->di_format = from->di_format;
- to->di_onlink = be16_to_cpu(from->di_onlink);
- to->di_uid = be32_to_cpu(from->di_uid);
- to->di_gid = be32_to_cpu(from->di_gid);
- to->di_nlink = be32_to_cpu(from->di_nlink);
- to->di_projid_lo = be16_to_cpu(from->di_projid_lo);
- to->di_projid_hi = be16_to_cpu(from->di_projid_hi);
- memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
- to->di_flushiter = be16_to_cpu(from->di_flushiter);
- to->di_atime.t_sec = be32_to_cpu(from->di_atime.t_sec);
- to->di_atime.t_nsec = be32_to_cpu(from->di_atime.t_nsec);
- to->di_mtime.t_sec = be32_to_cpu(from->di_mtime.t_sec);
- to->di_mtime.t_nsec = be32_to_cpu(from->di_mtime.t_nsec);
- to->di_ctime.t_sec = be32_to_cpu(from->di_ctime.t_sec);
- to->di_ctime.t_nsec = be32_to_cpu(from->di_ctime.t_nsec);
- to->di_size = be64_to_cpu(from->di_size);
- to->di_nblocks = be64_to_cpu(from->di_nblocks);
- to->di_extsize = be32_to_cpu(from->di_extsize);
- to->di_nextents = be32_to_cpu(from->di_nextents);
- to->di_anextents = be16_to_cpu(from->di_anextents);
- to->di_forkoff = from->di_forkoff;
- to->di_aformat = from->di_aformat;
- to->di_dmevmask = be32_to_cpu(from->di_dmevmask);
- to->di_dmstate = be16_to_cpu(from->di_dmstate);
- to->di_flags = be16_to_cpu(from->di_flags);
- to->di_gen = be32_to_cpu(from->di_gen);
-
- if (to->di_version == 3) {
- to->di_changecount = be64_to_cpu(from->di_changecount);
- to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec);
- to->di_crtime.t_nsec = be32_to_cpu(from->di_crtime.t_nsec);
- to->di_flags2 = be64_to_cpu(from->di_flags2);
- to->di_ino = be64_to_cpu(from->di_ino);
- to->di_lsn = be64_to_cpu(from->di_lsn);
- memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
- uuid_copy(&to->di_uuid, &from->di_uuid);
- }
-}
-
-void
-xfs_dinode_to_disk(
- xfs_dinode_t *to,
- xfs_icdinode_t *from)
-{
- to->di_magic = cpu_to_be16(from->di_magic);
- to->di_mode = cpu_to_be16(from->di_mode);
- to->di_version = from ->di_version;
- to->di_format = from->di_format;
- to->di_onlink = cpu_to_be16(from->di_onlink);
- to->di_uid = cpu_to_be32(from->di_uid);
- to->di_gid = cpu_to_be32(from->di_gid);
- to->di_nlink = cpu_to_be32(from->di_nlink);
- to->di_projid_lo = cpu_to_be16(from->di_projid_lo);
- to->di_projid_hi = cpu_to_be16(from->di_projid_hi);
- memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
- to->di_flushiter = cpu_to_be16(from->di_flushiter);
- to->di_atime.t_sec = cpu_to_be32(from->di_atime.t_sec);
- to->di_atime.t_nsec = cpu_to_be32(from->di_atime.t_nsec);
- to->di_mtime.t_sec = cpu_to_be32(from->di_mtime.t_sec);
- to->di_mtime.t_nsec = cpu_to_be32(from->di_mtime.t_nsec);
- to->di_ctime.t_sec = cpu_to_be32(from->di_ctime.t_sec);
- to->di_ctime.t_nsec = cpu_to_be32(from->di_ctime.t_nsec);
- to->di_size = cpu_to_be64(from->di_size);
- to->di_nblocks = cpu_to_be64(from->di_nblocks);
- to->di_extsize = cpu_to_be32(from->di_extsize);
- to->di_nextents = cpu_to_be32(from->di_nextents);
- to->di_anextents = cpu_to_be16(from->di_anextents);
- to->di_forkoff = from->di_forkoff;
- to->di_aformat = from->di_aformat;
- to->di_dmevmask = cpu_to_be32(from->di_dmevmask);
- to->di_dmstate = cpu_to_be16(from->di_dmstate);
- to->di_flags = cpu_to_be16(from->di_flags);
- to->di_gen = cpu_to_be32(from->di_gen);
-
- if (from->di_version == 3) {
- to->di_changecount = cpu_to_be64(from->di_changecount);
- to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
- to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
- to->di_flags2 = cpu_to_be64(from->di_flags2);
- to->di_ino = cpu_to_be64(from->di_ino);
- to->di_lsn = cpu_to_be64(from->di_lsn);
- memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
- uuid_copy(&to->di_uuid, &from->di_uuid);
- }
-}
-
-static bool
-xfs_dinode_verify(
- struct xfs_mount *mp,
- struct xfs_inode *ip,
- struct xfs_dinode *dip)
-{
- if (dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC))
- return false;
-
- /* only version 3 or greater inodes are extensively verified here */
- if (dip->di_version < 3)
- return true;
-
- if (!xfs_sb_version_hascrc(&mp->m_sb))
- return false;
- if (!xfs_verify_cksum((char *)dip, mp->m_sb.sb_inodesize,
- offsetof(struct xfs_dinode, di_crc)))
- return false;
- if (be64_to_cpu(dip->di_ino) != ip->i_ino)
- return false;
- if (!uuid_equal(&dip->di_uuid, &mp->m_sb.sb_uuid))
- return false;
- return true;
-}
-
-void
-xfs_dinode_calc_crc(
- struct xfs_mount *mp,
- struct xfs_dinode *dip)
-{
- __uint32_t crc;
-
- if (dip->di_version < 3)
- return;
-
- ASSERT(xfs_sb_version_hascrc(&mp->m_sb));
- crc = xfs_start_cksum((char *)dip, mp->m_sb.sb_inodesize,
- offsetof(struct xfs_dinode, di_crc));
- dip->di_crc = xfs_end_cksum(crc);
-}
-
-/*
- * Read the disk inode attributes into the in-core inode structure.
- *
- * If we are initialising a new inode and we are not utilising the
- * XFS_MOUNT_IKEEP inode cluster mode, we can simple build the new inode core
- * with a random generation number. If we are keeping inodes around, we need to
- * read the inode cluster to get the existing generation number off disk.
- */
-int
-xfs_iread(
- xfs_mount_t *mp,
- xfs_trans_t *tp,
- xfs_inode_t *ip,
- uint iget_flags)
-{
- xfs_buf_t *bp;
- xfs_dinode_t *dip;
- int error;
-
- /*
- * Fill in the location information in the in-core inode.
- */
- error = xfs_imap(mp, tp, ip->i_ino, &ip->i_imap, iget_flags);
- if (error)
- return error;
-
- /* shortcut IO on inode allocation if possible */
- if ((iget_flags & XFS_IGET_CREATE) &&
- !(mp->m_flags & XFS_MOUNT_IKEEP)) {
- /* initialise the on-disk inode core */
- memset(&ip->i_d, 0, sizeof(ip->i_d));
- ip->i_d.di_magic = XFS_DINODE_MAGIC;
- ip->i_d.di_gen = prandom_u32();
- if (xfs_sb_version_hascrc(&mp->m_sb)) {
- ip->i_d.di_version = 3;
- ip->i_d.di_ino = ip->i_ino;
- uuid_copy(&ip->i_d.di_uuid, &mp->m_sb.sb_uuid);
- } else
- ip->i_d.di_version = 2;
- return 0;
- }
-
- /*
- * Get pointers to the on-disk inode and the buffer containing it.
- */
- error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &dip, &bp, 0, iget_flags);
- if (error)
- return error;
-
- /* even unallocated inodes are verified */
- if (!xfs_dinode_verify(mp, ip, dip)) {
- xfs_alert(mp, "%s: validation failed for inode %lld failed",
- __func__, ip->i_ino);
-
- XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, dip);
- error = XFS_ERROR(EFSCORRUPTED);
- goto out_brelse;
- }
-
- /*
- * If the on-disk inode is already linked to a directory
- * entry, copy all of the inode into the in-core inode.
- * xfs_iformat_fork() handles copying in the inode format
- * specific information.
- * Otherwise, just get the truly permanent information.
- */
- if (dip->di_mode) {
- xfs_dinode_from_disk(&ip->i_d, dip);
- error = xfs_iformat_fork(ip, dip);
- if (error) {
-#ifdef DEBUG
- xfs_alert(mp, "%s: xfs_iformat() returned error %d",
- __func__, error);
-#endif /* DEBUG */
- goto out_brelse;
- }
- } else {
- /*
- * Partial initialisation of the in-core inode. Just the bits
- * that xfs_ialloc won't overwrite or relies on being correct.
- */
- ip->i_d.di_magic = be16_to_cpu(dip->di_magic);
- ip->i_d.di_version = dip->di_version;
- ip->i_d.di_gen = be32_to_cpu(dip->di_gen);
- ip->i_d.di_flushiter = be16_to_cpu(dip->di_flushiter);
-
- if (dip->di_version == 3) {
- ip->i_d.di_ino = be64_to_cpu(dip->di_ino);
- uuid_copy(&ip->i_d.di_uuid, &dip->di_uuid);
- }
-
- /*
- * Make sure to pull in the mode here as well in
- * case the inode is released without being used.
- * This ensures that xfs_inactive() will see that
- * the inode is already free and not try to mess
- * with the uninitialized part of it.
- */
- ip->i_d.di_mode = 0;
- }
-
- /*
- * The inode format changed when we moved the link count and
- * made it 32 bits long. If this is an old format inode,
- * convert it in memory to look like a new one. If it gets
- * flushed to disk we will convert back before flushing or
- * logging it. We zero out the new projid field and the old link
- * count field. We'll handle clearing the pad field (the remains
- * of the old uuid field) when we actually convert the inode to
- * the new format. We don't change the version number so that we
- * can distinguish this from a real new format inode.
- */
- if (ip->i_d.di_version == 1) {
- ip->i_d.di_nlink = ip->i_d.di_onlink;
- ip->i_d.di_onlink = 0;
- xfs_set_projid(ip, 0);
- }
-
- ip->i_delayed_blks = 0;
-
- /*
- * Mark the buffer containing the inode as something to keep
- * around for a while. This helps to keep recently accessed
- * meta-data in-core longer.
- */
- xfs_buf_set_ref(bp, XFS_INO_REF);
-
- /*
- * Use xfs_trans_brelse() to release the buffer containing the on-disk
- * inode, because it was acquired with xfs_trans_read_buf() in
- * xfs_imap_to_bp() above. If tp is NULL, this is just a normal
- * brelse(). If we're within a transaction, then xfs_trans_brelse()
- * will only release the buffer if it is not dirty within the
- * transaction. It will be OK to release the buffer in this case,
- * because inodes on disk are never destroyed and we will be locking the
- * new in-core inode before putting it in the cache where other
- * processes can find it. Thus we don't have to worry about the inode
- * being changed just because we released the buffer.
- */
- out_brelse:
- xfs_trans_brelse(tp, bp);
- return error;
-}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 3f8c56a..83a8ce3 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -18,20 +18,10 @@
#ifndef __XFS_INODE_H__
#define __XFS_INODE_H__
-struct posix_acl;
struct xfs_dinode;
struct xfs_inode;
-/*
- * Inode location information. Stored in the inode and passed to
- * xfs_imap_to_bp() to get a buffer and dinode for a given inode.
- */
-struct xfs_imap {
- xfs_daddr_t im_blkno; /* starting BB of inode chunk */
- ushort im_len; /* length in BBs of inode chunk */
- ushort im_boffset; /* inode offset in block in bytes */
-};
-
+#include "xfs_inode_buf.h"
#include "xfs_inode_fork.h"
#include "xfs_inode_item_format.h"
@@ -41,21 +31,4 @@ struct xfs_imap {
#endif /* __KERNEL__ */
-int xfs_imap_to_bp(struct xfs_mount *, struct xfs_trans *,
- struct xfs_imap *, struct xfs_dinode **,
- struct xfs_buf **, uint, uint);
-int xfs_iread(struct xfs_mount *, struct xfs_trans *,
- struct xfs_inode *, uint);
-void xfs_dinode_calc_crc(struct xfs_mount *, struct xfs_dinode *);
-void xfs_dinode_to_disk(struct xfs_dinode *,
- struct xfs_icdinode *);
-
-#if defined(DEBUG)
-void xfs_inobp_check(struct xfs_mount *, struct xfs_buf *);
-#else
-#define xfs_inobp_check(mp, bp)
-#endif /* DEBUG */
-
-extern const struct xfs_buf_ops xfs_inode_buf_ops;
-
#endif /* __XFS_INODE_H__ */
diff --git a/fs/xfs/xfs_inode_buf.c b/fs/xfs/xfs_inode_buf.c
new file mode 100644
index 0000000..99c6acb
--- /dev/null
+++ b/fs/xfs/xfs_inode_buf.c
@@ -0,0 +1,444 @@
+/*
+ * Copyright (c) 2000-2006 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_log.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_error.h"
+#include "xfs_cksum.h"
+#include "xfs_icache.h"
+#include "xfs_ialloc.h"
+
+/*
+ * Check that none of the inode's in the buffer have a next
+ * unlinked field of 0.
+ */
+#if defined(DEBUG)
+STATIC void
+xfs_inobp_check(
+ xfs_mount_t *mp,
+ xfs_buf_t *bp)
+{
+ int i;
+ int j;
+ xfs_dinode_t *dip;
+
+ j = mp->m_inode_cluster_size >> mp->m_sb.sb_inodelog;
+
+ for (i = 0; i < j; i++) {
+ dip = (xfs_dinode_t *)xfs_buf_offset(bp,
+ i * mp->m_sb.sb_inodesize);
+ if (!dip->di_next_unlinked) {
+ xfs_alert(mp,
+ "Detected bogus zero next_unlinked field in incore inode buffer 0x%p.",
+ bp);
+ ASSERT(dip->di_next_unlinked);
+ }
+ }
+}
+#endif
+
+static void
+xfs_inode_buf_verify(
+ struct xfs_buf *bp)
+{
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+ int i;
+ int ni;
+
+ /*
+ * Validate the magic number and version of every inode in the buffer
+ */
+ ni = XFS_BB_TO_FSB(mp, bp->b_length) * mp->m_sb.sb_inopblock;
+ for (i = 0; i < ni; i++) {
+ int di_ok;
+ xfs_dinode_t *dip;
+
+ dip = (struct xfs_dinode *)xfs_buf_offset(bp,
+ (i << mp->m_sb.sb_inodelog));
+ di_ok = dip->di_magic == cpu_to_be16(XFS_DINODE_MAGIC) &&
+ XFS_DINODE_GOOD_VERSION(dip->di_version);
+ if (unlikely(XFS_TEST_ERROR(!di_ok, mp,
+ XFS_ERRTAG_ITOBP_INOTOBP,
+ XFS_RANDOM_ITOBP_INOTOBP))) {
+ xfs_buf_ioerror(bp, EFSCORRUPTED);
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_HIGH,
+ mp, dip);
+#ifdef DEBUG
+ xfs_emerg(mp,
+ "bad inode magic/vsn daddr %lld #%d (magic=%x)",
+ (unsigned long long)bp->b_bn, i,
+ be16_to_cpu(dip->di_magic));
+ ASSERT(0);
+#endif
+ }
+ }
+ xfs_inobp_check(mp, bp);
+}
+
+static void
+xfs_inode_buf_read_verify(
+ struct xfs_buf *bp)
+{
+ xfs_inode_buf_verify(bp);
+}
+
+static void
+xfs_inode_buf_write_verify(
+ struct xfs_buf *bp)
+{
+ xfs_inode_buf_verify(bp);
+}
+
+const struct xfs_buf_ops xfs_inode_buf_ops = {
+ .verify_read = xfs_inode_buf_read_verify,
+ .verify_write = xfs_inode_buf_write_verify,
+};
+
+/*
+ * This routine is called to map an inode to the buffer containing the on-disk
+ * version of the inode. It returns a pointer to the buffer containing the
+ * on-disk inode in the bpp parameter, and in the dipp parameter it returns a
+ * pointer to the on-disk inode within that buffer.
+ *
+ * If a non-zero error is returned, then the contents of bpp and dipp are
+ * undefined.
+ */
+int
+xfs_imap_to_bp(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ struct xfs_imap *imap,
+ struct xfs_dinode **dipp,
+ struct xfs_buf **bpp,
+ uint buf_flags,
+ uint iget_flags)
+{
+ struct xfs_buf *bp;
+ int error;
+
+ buf_flags |= XBF_UNMAPPED;
+ error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, imap->im_blkno,
+ (int)imap->im_len, buf_flags, &bp,
+ &xfs_inode_buf_ops);
+ if (error) {
+ if (error == EAGAIN) {
+ ASSERT(buf_flags & XBF_TRYLOCK);
+ return error;
+ }
+
+ if (error == EFSCORRUPTED &&
+ (iget_flags & XFS_IGET_UNTRUSTED))
+ return XFS_ERROR(EINVAL);
+
+ xfs_warn(mp, "%s: xfs_trans_read_buf() returned error %d.",
+ __func__, error);
+ return error;
+ }
+
+ *bpp = bp;
+ *dipp = (struct xfs_dinode *)xfs_buf_offset(bp, imap->im_boffset);
+ return 0;
+}
+
+STATIC void
+xfs_dinode_from_disk(
+ xfs_icdinode_t *to,
+ xfs_dinode_t *from)
+{
+ to->di_magic = be16_to_cpu(from->di_magic);
+ to->di_mode = be16_to_cpu(from->di_mode);
+ to->di_version = from ->di_version;
+ to->di_format = from->di_format;
+ to->di_onlink = be16_to_cpu(from->di_onlink);
+ to->di_uid = be32_to_cpu(from->di_uid);
+ to->di_gid = be32_to_cpu(from->di_gid);
+ to->di_nlink = be32_to_cpu(from->di_nlink);
+ to->di_projid_lo = be16_to_cpu(from->di_projid_lo);
+ to->di_projid_hi = be16_to_cpu(from->di_projid_hi);
+ memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
+ to->di_flushiter = be16_to_cpu(from->di_flushiter);
+ to->di_atime.t_sec = be32_to_cpu(from->di_atime.t_sec);
+ to->di_atime.t_nsec = be32_to_cpu(from->di_atime.t_nsec);
+ to->di_mtime.t_sec = be32_to_cpu(from->di_mtime.t_sec);
+ to->di_mtime.t_nsec = be32_to_cpu(from->di_mtime.t_nsec);
+ to->di_ctime.t_sec = be32_to_cpu(from->di_ctime.t_sec);
+ to->di_ctime.t_nsec = be32_to_cpu(from->di_ctime.t_nsec);
+ to->di_size = be64_to_cpu(from->di_size);
+ to->di_nblocks = be64_to_cpu(from->di_nblocks);
+ to->di_extsize = be32_to_cpu(from->di_extsize);
+ to->di_nextents = be32_to_cpu(from->di_nextents);
+ to->di_anextents = be16_to_cpu(from->di_anextents);
+ to->di_forkoff = from->di_forkoff;
+ to->di_aformat = from->di_aformat;
+ to->di_dmevmask = be32_to_cpu(from->di_dmevmask);
+ to->di_dmstate = be16_to_cpu(from->di_dmstate);
+ to->di_flags = be16_to_cpu(from->di_flags);
+ to->di_gen = be32_to_cpu(from->di_gen);
+
+ if (to->di_version == 3) {
+ to->di_changecount = be64_to_cpu(from->di_changecount);
+ to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec);
+ to->di_crtime.t_nsec = be32_to_cpu(from->di_crtime.t_nsec);
+ to->di_flags2 = be64_to_cpu(from->di_flags2);
+ to->di_ino = be64_to_cpu(from->di_ino);
+ to->di_lsn = be64_to_cpu(from->di_lsn);
+ memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
+ uuid_copy(&to->di_uuid, &from->di_uuid);
+ }
+}
+
+void
+xfs_dinode_to_disk(
+ xfs_dinode_t *to,
+ xfs_icdinode_t *from)
+{
+ to->di_magic = cpu_to_be16(from->di_magic);
+ to->di_mode = cpu_to_be16(from->di_mode);
+ to->di_version = from ->di_version;
+ to->di_format = from->di_format;
+ to->di_onlink = cpu_to_be16(from->di_onlink);
+ to->di_uid = cpu_to_be32(from->di_uid);
+ to->di_gid = cpu_to_be32(from->di_gid);
+ to->di_nlink = cpu_to_be32(from->di_nlink);
+ to->di_projid_lo = cpu_to_be16(from->di_projid_lo);
+ to->di_projid_hi = cpu_to_be16(from->di_projid_hi);
+ memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
+ to->di_flushiter = cpu_to_be16(from->di_flushiter);
+ to->di_atime.t_sec = cpu_to_be32(from->di_atime.t_sec);
+ to->di_atime.t_nsec = cpu_to_be32(from->di_atime.t_nsec);
+ to->di_mtime.t_sec = cpu_to_be32(from->di_mtime.t_sec);
+ to->di_mtime.t_nsec = cpu_to_be32(from->di_mtime.t_nsec);
+ to->di_ctime.t_sec = cpu_to_be32(from->di_ctime.t_sec);
+ to->di_ctime.t_nsec = cpu_to_be32(from->di_ctime.t_nsec);
+ to->di_size = cpu_to_be64(from->di_size);
+ to->di_nblocks = cpu_to_be64(from->di_nblocks);
+ to->di_extsize = cpu_to_be32(from->di_extsize);
+ to->di_nextents = cpu_to_be32(from->di_nextents);
+ to->di_anextents = cpu_to_be16(from->di_anextents);
+ to->di_forkoff = from->di_forkoff;
+ to->di_aformat = from->di_aformat;
+ to->di_dmevmask = cpu_to_be32(from->di_dmevmask);
+ to->di_dmstate = cpu_to_be16(from->di_dmstate);
+ to->di_flags = cpu_to_be16(from->di_flags);
+ to->di_gen = cpu_to_be32(from->di_gen);
+
+ if (from->di_version == 3) {
+ to->di_changecount = cpu_to_be64(from->di_changecount);
+ to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
+ to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
+ to->di_flags2 = cpu_to_be64(from->di_flags2);
+ to->di_ino = cpu_to_be64(from->di_ino);
+ to->di_lsn = cpu_to_be64(from->di_lsn);
+ memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
+ uuid_copy(&to->di_uuid, &from->di_uuid);
+ }
+}
+
+static bool
+xfs_dinode_verify(
+ struct xfs_mount *mp,
+ struct xfs_inode *ip,
+ struct xfs_dinode *dip)
+{
+ if (dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC))
+ return false;
+
+ /* only version 3 or greater inodes are extensively verified here */
+ if (dip->di_version < 3)
+ return true;
+
+ if (!xfs_sb_version_hascrc(&mp->m_sb))
+ return false;
+ if (!xfs_verify_cksum((char *)dip, mp->m_sb.sb_inodesize,
+ offsetof(struct xfs_dinode, di_crc)))
+ return false;
+ if (be64_to_cpu(dip->di_ino) != ip->i_ino)
+ return false;
+ if (!uuid_equal(&dip->di_uuid, &mp->m_sb.sb_uuid))
+ return false;
+ return true;
+}
+
+void
+xfs_dinode_calc_crc(
+ struct xfs_mount *mp,
+ struct xfs_dinode *dip)
+{
+ __uint32_t crc;
+
+ if (dip->di_version < 3)
+ return;
+
+ ASSERT(xfs_sb_version_hascrc(&mp->m_sb));
+ crc = xfs_start_cksum((char *)dip, mp->m_sb.sb_inodesize,
+ offsetof(struct xfs_dinode, di_crc));
+ dip->di_crc = xfs_end_cksum(crc);
+}
+
+/*
+ * Read the disk inode attributes into the in-core inode structure.
+ *
+ * If we are initialising a new inode and we are not utilising the
+ * XFS_MOUNT_IKEEP inode cluster mode, we can simple build the new inode core
+ * with a random generation number. If we are keeping inodes around, we need to
+ * read the inode cluster to get the existing generation number off disk.
+ */
+int
+xfs_iread(
+ xfs_mount_t *mp,
+ xfs_trans_t *tp,
+ xfs_inode_t *ip,
+ uint iget_flags)
+{
+ xfs_buf_t *bp;
+ xfs_dinode_t *dip;
+ int error;
+
+ /*
+ * Fill in the location information in the in-core inode.
+ */
+ error = xfs_imap(mp, tp, ip->i_ino, &ip->i_imap, iget_flags);
+ if (error)
+ return error;
+
+ /* shortcut IO on inode allocation if possible */
+ if ((iget_flags & XFS_IGET_CREATE) &&
+ !(mp->m_flags & XFS_MOUNT_IKEEP)) {
+ /* initialise the on-disk inode core */
+ memset(&ip->i_d, 0, sizeof(ip->i_d));
+ ip->i_d.di_magic = XFS_DINODE_MAGIC;
+ ip->i_d.di_gen = prandom_u32();
+ if (xfs_sb_version_hascrc(&mp->m_sb)) {
+ ip->i_d.di_version = 3;
+ ip->i_d.di_ino = ip->i_ino;
+ uuid_copy(&ip->i_d.di_uuid, &mp->m_sb.sb_uuid);
+ } else
+ ip->i_d.di_version = 2;
+ return 0;
+ }
+
+ /*
+ * Get pointers to the on-disk inode and the buffer containing it.
+ */
+ error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &dip, &bp, 0, iget_flags);
+ if (error)
+ return error;
+
+ /* even unallocated inodes are verified */
+ if (!xfs_dinode_verify(mp, ip, dip)) {
+ xfs_alert(mp, "%s: validation failed for inode %lld",
+ __func__, ip->i_ino);
+
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, dip);
+ error = XFS_ERROR(EFSCORRUPTED);
+ goto out_brelse;
+ }
+
+ /*
+ * If the on-disk inode is already linked to a directory
+ * entry, copy all of the inode into the in-core inode.
+ * xfs_iformat_fork() handles copying in the inode format
+ * specific information.
+ * Otherwise, just get the truly permanent information.
+ */
+ if (dip->di_mode) {
+ xfs_dinode_from_disk(&ip->i_d, dip);
+ error = xfs_iformat_fork(ip, dip);
+ if (error) {
+#ifdef DEBUG
+ xfs_alert(mp, "%s: xfs_iformat_fork() returned error %d",
+ __func__, error);
+#endif /* DEBUG */
+ goto out_brelse;
+ }
+ } else {
+ /*
+ * Partial initialisation of the in-core inode. Just the bits
+ * that xfs_ialloc won't overwrite or relies on being correct.
+ */
+ ip->i_d.di_magic = be16_to_cpu(dip->di_magic);
+ ip->i_d.di_version = dip->di_version;
+ ip->i_d.di_gen = be32_to_cpu(dip->di_gen);
+ ip->i_d.di_flushiter = be16_to_cpu(dip->di_flushiter);
+
+ if (dip->di_version == 3) {
+ ip->i_d.di_ino = be64_to_cpu(dip->di_ino);
+ uuid_copy(&ip->i_d.di_uuid, &dip->di_uuid);
+ }
+
+ /*
+ * Make sure to pull in the mode here as well in
+ * case the inode is released without being used.
+ * This ensures that xfs_inactive() will see that
+ * the inode is already free and not try to mess
+ * with the uninitialized part of it.
+ */
+ ip->i_d.di_mode = 0;
+ }
+
+ /*
+ * The inode format changed when we moved the link count and
+ * made it 32 bits long. If this is an old format inode,
+ * convert it in memory to look like a new one. If it gets
+ * flushed to disk we will convert back before flushing or
+ * logging it. We zero out the new projid field and the old link
+ * count field. We'll handle clearing the pad field (the remains
+ * of the old uuid field) when we actually convert the inode to
+ * the new format. We don't change the version number so that we
+ * can distinguish this from a real new format inode.
+ */
+ if (ip->i_d.di_version == 1) {
+ ip->i_d.di_nlink = ip->i_d.di_onlink;
+ ip->i_d.di_onlink = 0;
+ xfs_set_projid(ip, 0);
+ }
+
+ ip->i_delayed_blks = 0;
+
+ /*
+ * Mark the buffer containing the inode as something to keep
+ * around for a while. This helps to keep recently accessed
+ * meta-data in-core longer.
+ */
+ xfs_buf_set_ref(bp, XFS_INO_REF);
+
+ /*
+ * Use xfs_trans_brelse() to release the buffer containing the on-disk
+ * inode, because it was acquired with xfs_trans_read_buf() in
+ * xfs_imap_to_bp() above. If tp is NULL, this is just a normal
+ * brelse(). If we're within a transaction, then xfs_trans_brelse()
+ * will only release the buffer if it is not dirty within the
+ * transaction. It will be OK to release the buffer in this case,
+ * because inodes on disk are never destroyed and we will be locking the
+ * new in-core inode before putting it in the cache where other
+ * processes can find it. Thus we don't have to worry about the inode
+ * being changed just because we released the buffer.
+ */
+ out_brelse:
+ xfs_trans_brelse(tp, bp);
+ return error;
+}
diff --git a/fs/xfs/xfs_inode_buf.h b/fs/xfs/xfs_inode_buf.h
new file mode 100644
index 0000000..aae9fc4
--- /dev/null
+++ b/fs/xfs/xfs_inode_buf.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_INODE_BUF_H__
+#define __XFS_INODE_BUF_H__
+
+struct xfs_inode;
+struct xfs_dinode;
+struct xfs_icdinode;
+
+/*
+ * Inode location information. Stored in the inode and passed to
+ * xfs_imap_to_bp() to get a buffer and dinode for a given inode.
+ */
+struct xfs_imap {
+ xfs_daddr_t im_blkno; /* starting BB of inode chunk */
+ ushort im_len; /* length in BBs of inode chunk */
+ ushort im_boffset; /* inode offset in block in bytes */
+};
+
+int xfs_imap_to_bp(struct xfs_mount *, struct xfs_trans *,
+ struct xfs_imap *, struct xfs_dinode **,
+ struct xfs_buf **, uint, uint);
+int xfs_iread(struct xfs_mount *, struct xfs_trans *,
+ struct xfs_inode *, uint);
+void xfs_dinode_calc_crc(struct xfs_mount *, struct xfs_dinode *);
+void xfs_dinode_to_disk(struct xfs_dinode *,
+ struct xfs_icdinode *);
+
+#if defined(DEBUG)
+void xfs_inobp_check(struct xfs_mount *, struct xfs_buf *);
+#else
+#define xfs_inobp_check(mp, bp)
+#endif /* DEBUG */
+
+extern const struct xfs_buf_ops xfs_inode_buf_ops;
+
+#endif /* __XFS_INODE_BUF_H__ */
--
1.7.10.4
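Note on the CRC handling in xfs_dinode_calc_crc() and xfs_dinode_verify()
above: both pass offsetof(struct xfs_dinode, di_crc) so the checksum can
cover the whole inode while leaving out the stored CRC itself. Below is a
minimal userspace sketch of that pattern, not the kernel helpers. It
assumes a bitwise CRC-32C, a flat byte record, and host byte order for the
stored value; the toy_* names, the seed handling and the record layout are
hypothetical, and xfs_start_cksum()/xfs_verify_cksum() are not shown in
this patch and may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Checksum the whole record, feeding zeros in place of the stored CRC. */
static uint32_t toy_cksum(const uint8_t *rec, size_t len, size_t crc_off)
{
    static const uint8_t zero[4];
    uint32_t crc = 0xFFFFFFFFu;

    assert(crc_off + sizeof(zero) <= len);
    crc = crc32c_update(crc, rec, crc_off);
    crc = crc32c_update(crc, zero, sizeof(zero));
    crc = crc32c_update(crc, rec + crc_off + sizeof(zero),
                        len - crc_off - sizeof(zero));
    return ~crc;
}

/* Hypothetical stand-in for the "calc" side: compute and store the CRC. */
static void toy_calc_crc(uint8_t *rec, size_t len, size_t crc_off)
{
    uint32_t crc = toy_cksum(rec, len, crc_off);

    memcpy(rec + crc_off, &crc, sizeof(crc)); /* host byte order: sketch only */
}

/* Hypothetical stand-in for the "verify" side: recompute and compare. */
static int toy_verify(const uint8_t *rec, size_t len, size_t crc_off)
{
    uint32_t stored;

    memcpy(&stored, rec + crc_off, sizeof(stored));
    return stored == toy_cksum(rec, len, crc_off);
}
```

Because the CRC field is treated as zero on both sides, the stored value
never feeds into its own checksum, so the verifier can run directly on the
buffer as read from disk without copying or clearing anything first.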
* [PATCH 35/60] xfs: start repopulating xfs_inode.[ch] with kernel code
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (33 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 34/60] xfs: introduce xfs_inode_buf.c for inode buffer operations Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 36/60] xfs: move swap extent code to xfs_extent_ops Dave Chinner
` (26 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Now that xfs_inode.[ch] is no longer shared with userspace, we can
start repopulating them with core kernel inode code such as locking
functions and the kernel struct xfs_inode definitions. From this
point onwards, xfs_inode.[ch] is purely for in-kernel, inode-specific
infrastructure. High-level, kernel-only operations will slowly
migrate to and merge with xfs_iops.c, so xfs_inode.[ch] should really
only gather core functions that are shared by other kernel-only
code.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
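Note for readers of the locking code moved below: the lock_flags word
carries two things, the lock mode bits in the low 16 bits and a lockdep
subclass in bits 16-23 (iolock) or 24-31 (ilock). The following is a
minimal userspace sketch of how xfs_lock_inumorder() packs the subclass.
The constant values are copied from the xfs_inode.h hunk in this patch;
the lock_inumorder() name, the unsigned types and the range assert are my
own assumptions for illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the xfs_inode.h hunk in this patch. */
#define XFS_IOLOCK_EXCL     (1u << 0)
#define XFS_IOLOCK_SHARED   (1u << 1)
#define XFS_ILOCK_EXCL      (1u << 2)
#define XFS_ILOCK_SHARED    (1u << 3)
#define XFS_LOCK_INUMORDER  4u
#define XFS_IOLOCK_SHIFT    16
#define XFS_ILOCK_SHIFT     24
#define XFS_IOLOCK_DEP_MASK 0x00ff0000u
#define XFS_ILOCK_DEP_MASK  0xff000000u
#define XFS_IOLOCK_DEP(f)   (((f) & XFS_IOLOCK_DEP_MASK) >> XFS_IOLOCK_SHIFT)
#define XFS_ILOCK_DEP(f)    (((f) & XFS_ILOCK_DEP_MASK) >> XFS_ILOCK_SHIFT)

/*
 * Userspace stand-in for xfs_lock_inumorder(): tag the lock mode with a
 * lockdep subclass derived from the inode's position in the lock order.
 */
static uint32_t lock_inumorder(uint32_t lock_mode, uint32_t subclass)
{
    /* Assumption for this sketch: the subclass must fit in 8 bits. */
    assert(subclass + XFS_LOCK_INUMORDER <= 0xffu);

    if (lock_mode & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL))
        lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_IOLOCK_SHIFT;
    if (lock_mode & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL))
        lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_ILOCK_SHIFT;

    return lock_mode;
}
```

With these values the first inode locked gets subclass 4 and the second
gets 5, matching the XFS_LOCK_INUMORDER comment in the header. The unlock
paths pass the plain lock_mode without subclass bits, which the
XFS_LOCK_MASK | XFS_LOCK_DEP_MASK assertion in xfs_iunlock() permits.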
fs/xfs/xfs_inode.c | 434 ++++++++++++++++++++++++++++++++++++++++++++++--
fs/xfs/xfs_inode.h | 134 ++++++++++++++-
fs/xfs/xfs_inode_ops.c | 415 ---------------------------------------------
fs/xfs/xfs_inode_ops.h | 64 -------
4 files changed, 545 insertions(+), 502 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c032da6..25706c7 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -15,36 +15,434 @@
* along with this program; if not, write the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
-#include <linux/log2.h>
-
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_types.h"
#include "xfs_log.h"
-#include "xfs_inum.h"
#include "xfs_trans.h"
-#include "xfs_trans_priv.h"
#include "xfs_sb.h"
#include "xfs_ag.h"
#include "xfs_mount.h"
#include "xfs_bmap_btree.h"
-#include "xfs_alloc_btree.h"
-#include "xfs_ialloc_btree.h"
-#include "xfs_attr_sf.h"
#include "xfs_dinode.h"
#include "xfs_inode.h"
-#include "xfs_buf_item.h"
-#include "xfs_inode_item.h"
-#include "xfs_btree.h"
-#include "xfs_alloc.h"
-#include "xfs_ialloc.h"
-#include "xfs_bmap.h"
-#include "xfs_error.h"
-#include "xfs_quota.h"
-#include "xfs_filestream.h"
-#include "xfs_cksum.h"
#include "xfs_trace.h"
-#include "xfs_icache.h"
+
+/* Kernel only core inode infrastructure */
kmem_zone_t *xfs_inode_zone;
+/*
+ * This is a wrapper routine around the xfs_ilock() routine used to centralize
+ * some grungy code. It is used in places that wish to lock the inode solely
+ * for reading the extents. The reason these places can't just call
+ * xfs_ilock(SHARED) is that the inode lock also guards the bringing in of the
+ * extents from disk for a file in b-tree format. If the inode is in b-tree
+ * format, then we need to lock the inode exclusively until the extents are read
+ * in. Locking it exclusively all the time would limit our parallelism
+ * unnecessarily, though. What we do instead is check to see if the extents
+ * have been read in yet, and only lock the inode exclusively if they have not.
+ *
+ * The function returns a value which should be given to the corresponding
+ * xfs_iunlock_map_shared(). This value is the mode in which the lock was
+ * actually taken.
+ */
+uint
+xfs_ilock_map_shared(
+ xfs_inode_t *ip)
+{
+ uint lock_mode;
+
+ if ((ip->i_d.di_format == XFS_DINODE_FMT_BTREE) &&
+ ((ip->i_df.if_flags & XFS_IFEXTENTS) == 0)) {
+ lock_mode = XFS_ILOCK_EXCL;
+ } else {
+ lock_mode = XFS_ILOCK_SHARED;
+ }
+
+ xfs_ilock(ip, lock_mode);
+
+ return lock_mode;
+}
+
+/*
+ * This is simply the unlock routine to go with xfs_ilock_map_shared().
+ * All it does is call xfs_iunlock() with the given lock_mode.
+ */
+void
+xfs_iunlock_map_shared(
+ xfs_inode_t *ip,
+ unsigned int lock_mode)
+{
+ xfs_iunlock(ip, lock_mode);
+}
+
+/*
+ * The xfs inode contains 2 locks: a multi-reader lock called the
+ * i_iolock and a multi-reader lock called the i_lock. This routine
+ * allows either or both of the locks to be obtained.
+ *
+ * The 2 locks should always be ordered so that the IO lock is
+ * obtained first in order to prevent deadlock.
+ *
+ * ip -- the inode being locked
+ * lock_flags -- this parameter indicates the inode's locks
+ * to be locked. It can be:
+ * XFS_IOLOCK_SHARED,
+ * XFS_IOLOCK_EXCL,
+ * XFS_ILOCK_SHARED,
+ * XFS_ILOCK_EXCL,
+ * XFS_IOLOCK_SHARED | XFS_ILOCK_SHARED,
+ * XFS_IOLOCK_SHARED | XFS_ILOCK_EXCL,
+ * XFS_IOLOCK_EXCL | XFS_ILOCK_SHARED,
+ * XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL
+ */
+void
+xfs_ilock(
+ xfs_inode_t *ip,
+ uint lock_flags)
+{
+ trace_xfs_ilock(ip, lock_flags, _RET_IP_);
+
+ /*
+ * You can't set both SHARED and EXCL for the same lock,
+ * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED,
+ * and XFS_ILOCK_EXCL are valid values to set in lock_flags.
+ */
+ ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) !=
+ (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
+ ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) !=
+ (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
+ ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0);
+
+ if (lock_flags & XFS_IOLOCK_EXCL)
+ mrupdate_nested(&ip->i_iolock, XFS_IOLOCK_DEP(lock_flags));
+ else if (lock_flags & XFS_IOLOCK_SHARED)
+ mraccess_nested(&ip->i_iolock, XFS_IOLOCK_DEP(lock_flags));
+
+ if (lock_flags & XFS_ILOCK_EXCL)
+ mrupdate_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
+ else if (lock_flags & XFS_ILOCK_SHARED)
+ mraccess_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
+}
+
+/*
+ * This is just like xfs_ilock(), except that the caller
+ * is guaranteed not to sleep. It returns 1 if it gets
+ * the requested locks and 0 otherwise. If the IO lock is
+ * obtained but the inode lock cannot be, then the IO lock
+ * is dropped before returning.
+ *
+ * ip -- the inode being locked
+ * lock_flags -- this parameter indicates the inode's locks to be
+ * locked. See the comment for xfs_ilock() for a list
+ * of valid values.
+ */
+int
+xfs_ilock_nowait(
+ xfs_inode_t *ip,
+ uint lock_flags)
+{
+ trace_xfs_ilock_nowait(ip, lock_flags, _RET_IP_);
+
+ /*
+ * You can't set both SHARED and EXCL for the same lock,
+ * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED,
+ * and XFS_ILOCK_EXCL are valid values to set in lock_flags.
+ */
+ ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) !=
+ (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
+ ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) !=
+ (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
+ ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0);
+
+ if (lock_flags & XFS_IOLOCK_EXCL) {
+ if (!mrtryupdate(&ip->i_iolock))
+ goto out;
+ } else if (lock_flags & XFS_IOLOCK_SHARED) {
+ if (!mrtryaccess(&ip->i_iolock))
+ goto out;
+ }
+ if (lock_flags & XFS_ILOCK_EXCL) {
+ if (!mrtryupdate(&ip->i_lock))
+ goto out_undo_iolock;
+ } else if (lock_flags & XFS_ILOCK_SHARED) {
+ if (!mrtryaccess(&ip->i_lock))
+ goto out_undo_iolock;
+ }
+ return 1;
+
+ out_undo_iolock:
+ if (lock_flags & XFS_IOLOCK_EXCL)
+ mrunlock_excl(&ip->i_iolock);
+ else if (lock_flags & XFS_IOLOCK_SHARED)
+ mrunlock_shared(&ip->i_iolock);
+ out:
+ return 0;
+}
+
+/*
+ * xfs_iunlock() is used to drop the inode locks acquired with
+ * xfs_ilock() and xfs_ilock_nowait(). The caller must pass
+ * in the flags given to xfs_ilock() or xfs_ilock_nowait() so
+ * that we know which locks to drop.
+ *
+ * ip -- the inode being unlocked
+ * lock_flags -- this parameter indicates the inode's locks to be
+ * unlocked. See the comment for xfs_ilock() for a list
+ * of valid values for this parameter.
+ *
+ */
+void
+xfs_iunlock(
+ xfs_inode_t *ip,
+ uint lock_flags)
+{
+ /*
+ * You can't set both SHARED and EXCL for the same lock,
+ * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED,
+ * and XFS_ILOCK_EXCL are valid values to set in lock_flags.
+ */
+ ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) !=
+ (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
+ ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) !=
+ (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
+ ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0);
+ ASSERT(lock_flags != 0);
+
+ if (lock_flags & XFS_IOLOCK_EXCL)
+ mrunlock_excl(&ip->i_iolock);
+ else if (lock_flags & XFS_IOLOCK_SHARED)
+ mrunlock_shared(&ip->i_iolock);
+
+ if (lock_flags & XFS_ILOCK_EXCL)
+ mrunlock_excl(&ip->i_lock);
+ else if (lock_flags & XFS_ILOCK_SHARED)
+ mrunlock_shared(&ip->i_lock);
+
+ trace_xfs_iunlock(ip, lock_flags, _RET_IP_);
+}
+
+/*
+ * Give up write locks. The I/O lock cannot be held nested
+ * if it is being demoted.
+ */
+void
+xfs_ilock_demote(
+ xfs_inode_t *ip,
+ uint lock_flags)
+{
+ ASSERT(lock_flags & (XFS_IOLOCK_EXCL|XFS_ILOCK_EXCL));
+ ASSERT((lock_flags & ~(XFS_IOLOCK_EXCL|XFS_ILOCK_EXCL)) == 0);
+
+ if (lock_flags & XFS_ILOCK_EXCL)
+ mrdemote(&ip->i_lock);
+ if (lock_flags & XFS_IOLOCK_EXCL)
+ mrdemote(&ip->i_iolock);
+
+ trace_xfs_ilock_demote(ip, lock_flags, _RET_IP_);
+}
+
+#if defined(DEBUG) || defined(XFS_WARN)
+int
+xfs_isilocked(
+ xfs_inode_t *ip,
+ uint lock_flags)
+{
+ if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
+ if (!(lock_flags & XFS_ILOCK_SHARED))
+ return !!ip->i_lock.mr_writer;
+ return rwsem_is_locked(&ip->i_lock.mr_lock);
+ }
+
+ if (lock_flags & (XFS_IOLOCK_EXCL|XFS_IOLOCK_SHARED)) {
+ if (!(lock_flags & XFS_IOLOCK_SHARED))
+ return !!ip->i_iolock.mr_writer;
+ return rwsem_is_locked(&ip->i_iolock.mr_lock);
+ }
+
+ ASSERT(0);
+ return 0;
+}
+#endif
+
+#ifdef DEBUG
+int xfs_locked_n;
+int xfs_small_retries;
+int xfs_middle_retries;
+int xfs_lots_retries;
+int xfs_lock_delays;
+#endif
+
+/*
+ * Bump the subclass so xfs_lock_inodes() acquires each lock with
+ * a different value
+ */
+static inline int
+xfs_lock_inumorder(int lock_mode, int subclass)
+{
+ if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))
+ lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_IOLOCK_SHIFT;
+ if (lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))
+ lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_ILOCK_SHIFT;
+
+ return lock_mode;
+}
+
+/*
+ * The following routine will lock n inodes in exclusive mode.
+ * We assume the caller calls us with the inodes in i_ino order.
+ *
+ * We need to detect deadlock where an inode that we lock
+ * is in the AIL and we start waiting for another inode that is locked
+ * by a thread in a long running transaction (such as truncate). This can
+ * result in deadlock since the long running trans might need to wait
+ * for the inode we just locked in order to push the tail and free space
+ * in the log.
+ */
+void
+xfs_lock_inodes(
+ xfs_inode_t **ips,
+ int inodes,
+ uint lock_mode)
+{
+ int attempts = 0, i, j, try_lock;
+ xfs_log_item_t *lp;
+
+ ASSERT(ips && (inodes >= 2)); /* we need at least two */
+
+ try_lock = 0;
+ i = 0;
+
+again:
+ for (; i < inodes; i++) {
+ ASSERT(ips[i]);
+
+ if (i && (ips[i] == ips[i-1])) /* Already locked */
+ continue;
+
+ /*
+ * If try_lock is not set yet, make sure all locked inodes
+ * are not in the AIL.
+ * If any are, set try_lock to be used later.
+ */
+
+ if (!try_lock) {
+ for (j = (i - 1); j >= 0 && !try_lock; j--) {
+ lp = (xfs_log_item_t *)ips[j]->i_itemp;
+ if (lp && (lp->li_flags & XFS_LI_IN_AIL)) {
+ try_lock++;
+ }
+ }
+ }
+
+ /*
+ * If any of the previous locks we have locked is in the AIL,
+ * we must TRY to get the second and subsequent locks. If
+ * we can't get any, we must release all we have
+ * and try again.
+ */
+
+ if (try_lock) {
+ /* try_lock must be 0 if i is 0. */
+ /*
+ * try_lock means we have an inode locked
+ * that is in the AIL.
+ */
+ ASSERT(i != 0);
+ if (!xfs_ilock_nowait(ips[i], xfs_lock_inumorder(lock_mode, i))) {
+ attempts++;
+
+ /*
+ * Unlock all previous guys and try again.
+ * xfs_iunlock will try to push the tail
+ * if the inode is in the AIL.
+ */
+
+ for(j = i - 1; j >= 0; j--) {
+
+ /*
+ * Check to see if we've already
+ * unlocked this one.
+ * Not the first one going back,
+ * and the inode ptr is the same.
+ */
+ if ((j != (i - 1)) && ips[j] ==
+ ips[j+1])
+ continue;
+
+ xfs_iunlock(ips[j], lock_mode);
+ }
+
+ if ((attempts % 5) == 0) {
+ delay(1); /* Don't just spin the CPU */
+#ifdef DEBUG
+ xfs_lock_delays++;
+#endif
+ }
+ i = 0;
+ try_lock = 0;
+ goto again;
+ }
+ } else {
+ xfs_ilock(ips[i], xfs_lock_inumorder(lock_mode, i));
+ }
+ }
+
+#ifdef DEBUG
+ if (attempts) {
+ if (attempts < 5) xfs_small_retries++;
+ else if (attempts < 100) xfs_middle_retries++;
+ else xfs_lots_retries++;
+ } else {
+ xfs_locked_n++;
+ }
+#endif
+}
+
+/*
+ * xfs_lock_two_inodes() can only be used to lock one type of lock
+ * at a time - the iolock or the ilock, but not both at once. If
+ * we lock both at once, lockdep will report false positives saying
+ * we have violated locking orders.
+ */
+void
+xfs_lock_two_inodes(
+ xfs_inode_t *ip0,
+ xfs_inode_t *ip1,
+ uint lock_mode)
+{
+ xfs_inode_t *temp;
+ int attempts = 0;
+ xfs_log_item_t *lp;
+
+ if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))
+ ASSERT((lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL)) == 0);
+ ASSERT(ip0->i_ino != ip1->i_ino);
+
+ if (ip0->i_ino > ip1->i_ino) {
+ temp = ip0;
+ ip0 = ip1;
+ ip1 = temp;
+ }
+
+ again:
+ xfs_ilock(ip0, xfs_lock_inumorder(lock_mode, 0));
+
+ /*
+ * If the first lock we have locked is in the AIL, we must TRY to get
+ * the second lock. If we can't get it, we must release the first one
+ * and try again.
+ */
+ lp = (xfs_log_item_t *)ip0->i_itemp;
+ if (lp && (lp->li_flags & XFS_LI_IN_AIL)) {
+ if (!xfs_ilock_nowait(ip1, xfs_lock_inumorder(lock_mode, 1))) {
+ xfs_iunlock(ip0, lock_mode);
+ if ((++attempts % 5) == 0)
+ delay(1); /* Don't just spin the CPU */
+ goto again;
+ }
+ } else {
+ xfs_ilock(ip1, xfs_lock_inumorder(lock_mode, 1));
+ }
+}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 83a8ce3..6afcaa3 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -18,17 +18,141 @@
#ifndef __XFS_INODE_H__
#define __XFS_INODE_H__
-struct xfs_dinode;
-struct xfs_inode;
+//struct xfs_dinode;
+//struct xfs_inode;
#include "xfs_inode_buf.h"
#include "xfs_inode_fork.h"
#include "xfs_inode_item_format.h"
-#ifdef __KERNEL__
+/* kernel only inode definitions */
-#include "xfs_inode_ops.h"
+typedef struct xfs_inode {
+ /* Inode linking and identification information. */
+ struct xfs_mount *i_mount; /* fs mount struct ptr */
+ struct xfs_dquot *i_udquot; /* user dquot */
+ struct xfs_dquot *i_gdquot; /* group dquot */
+
+ /* Inode location stuff */
+ xfs_ino_t i_ino; /* inode number (agno/agino)*/
+ struct xfs_imap i_imap; /* location for xfs_imap() */
+
+ /* Extent information. */
+ struct xfs_ifork *i_afp; /* attribute fork pointer */
+ struct xfs_ifork i_df; /* data fork */
+
+ /* Transaction and locking information. */
+ struct xfs_inode_log_item *i_itemp; /* logging information */
+ mrlock_t i_lock; /* inode lock */
+ mrlock_t i_iolock; /* inode IO lock */
+ atomic_t i_pincount; /* inode pin count */
+ spinlock_t i_flags_lock; /* inode i_flags lock */
+ /* Miscellaneous state. */
+ unsigned long i_flags; /* see defined flags below */
+ unsigned int i_delayed_blks; /* count of delay alloc blks */
+
+ struct xfs_icdinode i_d; /* most of ondisk inode */
+
+ /* VFS inode */
+ struct inode i_vnode; /* embedded VFS inode */
+} xfs_inode_t;
+
+/* Convert from vfs inode to xfs inode */
+static inline struct xfs_inode *XFS_I(struct inode *inode)
+{
+ return container_of(inode, struct xfs_inode, i_vnode);
+}
+
+/* convert from xfs inode to vfs inode */
+static inline struct inode *VFS_I(struct xfs_inode *ip)
+{
+ return &ip->i_vnode;
+}
+
+/*
+ * For regular files we only update the on-disk filesize when actually
+ * writing data back to disk. Until then only the copy in the VFS inode
+ * is uptodate.
+ */
+static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
+{
+ if (S_ISREG(ip->i_d.di_mode))
+ return i_size_read(VFS_I(ip));
+ return ip->i_d.di_size;
+}
+
+/*
+ * Flags for inode locking.
+ * Bit ranges: 1<<0 - 1<<16-1 -- iolock/ilock modes (bitfield)
+ * 1<<16 - 1<<32-1 -- lockdep annotation (integers)
+ */
+#define XFS_IOLOCK_EXCL (1<<0)
+#define XFS_IOLOCK_SHARED (1<<1)
+#define XFS_ILOCK_EXCL (1<<2)
+#define XFS_ILOCK_SHARED (1<<3)
+
+#define XFS_LOCK_MASK (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED \
+ | XFS_ILOCK_EXCL | XFS_ILOCK_SHARED)
-#endif /* __KERNEL__ */
+#define XFS_LOCK_FLAGS \
+ { XFS_IOLOCK_EXCL, "IOLOCK_EXCL" }, \
+ { XFS_IOLOCK_SHARED, "IOLOCK_SHARED" }, \
+ { XFS_ILOCK_EXCL, "ILOCK_EXCL" }, \
+ { XFS_ILOCK_SHARED, "ILOCK_SHARED" }
+
+
+/*
+ * Flags for lockdep annotations.
+ *
+ * XFS_LOCK_PARENT - for directory operations that require locking a
+ * parent directory inode and a child entry inode. The parent gets locked
+ * with this flag so it gets a lockdep subclass of 1 and the child entry
+ * lock will have a lockdep subclass of 0.
+ *
+ * XFS_LOCK_RTBITMAP/XFS_LOCK_RTSUM - the realtime device bitmap and summary
+ * inodes do not participate in the normal lock order, and thus have their
+ * own subclasses.
+ *
+ * XFS_LOCK_INUMORDER - for locking several inodes at the same time
+ * with xfs_lock_inodes(). This flag is used as the starting subclass
+ * and each subsequent lock acquired will increment the subclass by one.
+ * So the first lock acquired will have a lockdep subclass of 4, the
+ * second lock will have a lockdep subclass of 5, and so on. It is
+ * the responsibility of the class builder to shift this to the correct
+ * portion of the lock_mode lockdep mask.
+ */
+#define XFS_LOCK_PARENT 1
+#define XFS_LOCK_RTBITMAP 2
+#define XFS_LOCK_RTSUM 3
+#define XFS_LOCK_INUMORDER 4
+
+#define XFS_IOLOCK_SHIFT 16
+#define XFS_IOLOCK_PARENT (XFS_LOCK_PARENT << XFS_IOLOCK_SHIFT)
+
+#define XFS_ILOCK_SHIFT 24
+#define XFS_ILOCK_PARENT (XFS_LOCK_PARENT << XFS_ILOCK_SHIFT)
+#define XFS_ILOCK_RTBITMAP (XFS_LOCK_RTBITMAP << XFS_ILOCK_SHIFT)
+#define XFS_ILOCK_RTSUM (XFS_LOCK_RTSUM << XFS_ILOCK_SHIFT)
+
+#define XFS_IOLOCK_DEP_MASK 0x00ff0000
+#define XFS_ILOCK_DEP_MASK 0xff000000
+#define XFS_LOCK_DEP_MASK (XFS_IOLOCK_DEP_MASK | XFS_ILOCK_DEP_MASK)
+
+#define XFS_IOLOCK_DEP(flags) (((flags) & XFS_IOLOCK_DEP_MASK) >> XFS_IOLOCK_SHIFT)
+#define XFS_ILOCK_DEP(flags) (((flags) & XFS_ILOCK_DEP_MASK) >> XFS_ILOCK_SHIFT)
+
+void xfs_ilock(xfs_inode_t *, uint);
+int xfs_ilock_nowait(xfs_inode_t *, uint);
+void xfs_iunlock(xfs_inode_t *, uint);
+void xfs_ilock_demote(xfs_inode_t *, uint);
+int xfs_isilocked(xfs_inode_t *, uint);
+uint xfs_ilock_map_shared(xfs_inode_t *);
+void xfs_iunlock_map_shared(xfs_inode_t *, uint);
+void xfs_lock_inodes(xfs_inode_t **, int, uint);
+void xfs_lock_two_inodes(xfs_inode_t *, xfs_inode_t *, uint);
+
+extern struct kmem_zone *xfs_inode_zone;
+
+#include "xfs_inode_ops.h"
#endif /* __XFS_INODE_H__ */
diff --git a/fs/xfs/xfs_inode_ops.c b/fs/xfs/xfs_inode_ops.c
index 7a7f19d..b4d6948 100644
--- a/fs/xfs/xfs_inode_ops.c
+++ b/fs/xfs/xfs_inode_ops.c
@@ -72,421 +72,6 @@ xfs_get_extsz_hint(
return 0;
}
-/*
- * This is a wrapper routine around the xfs_ilock() routine used to centralize
- * some grungy code. It is used in places that wish to lock the inode solely
- * for reading the extents. The reason these places can't just call
- * xfs_ilock(SHARED) is that the inode lock also guards to bringing in of the
- * extents from disk for a file in b-tree format. If the inode is in b-tree
- * format, then we need to lock the inode exclusively until the extents are read
- * in. Locking it exclusively all the time would limit our parallelism
- * unnecessarily, though. What we do instead is check to see if the extents
- * have been read in yet, and only lock the inode exclusively if they have not.
- *
- * The function returns a value which should be given to the corresponding
- * xfs_iunlock_map_shared(). This value is the mode in which the lock was
- * actually taken.
- */
-uint
-xfs_ilock_map_shared(
- xfs_inode_t *ip)
-{
- uint lock_mode;
-
- if ((ip->i_d.di_format == XFS_DINODE_FMT_BTREE) &&
- ((ip->i_df.if_flags & XFS_IFEXTENTS) == 0)) {
- lock_mode = XFS_ILOCK_EXCL;
- } else {
- lock_mode = XFS_ILOCK_SHARED;
- }
-
- xfs_ilock(ip, lock_mode);
-
- return lock_mode;
-}
-
-/*
- * This is simply the unlock routine to go with xfs_ilock_map_shared().
- * All it does is call xfs_iunlock() with the given lock_mode.
- */
-void
-xfs_iunlock_map_shared(
- xfs_inode_t *ip,
- unsigned int lock_mode)
-{
- xfs_iunlock(ip, lock_mode);
-}
-
-/*
- * The xfs inode contains 2 locks: a multi-reader lock called the
- * i_iolock and a multi-reader lock called the i_lock. This routine
- * allows either or both of the locks to be obtained.
- *
- * The 2 locks should always be ordered so that the IO lock is
- * obtained first in order to prevent deadlock.
- *
- * ip -- the inode being locked
- * lock_flags -- this parameter indicates the inode's locks
- * to be locked. It can be:
- * XFS_IOLOCK_SHARED,
- * XFS_IOLOCK_EXCL,
- * XFS_ILOCK_SHARED,
- * XFS_ILOCK_EXCL,
- * XFS_IOLOCK_SHARED | XFS_ILOCK_SHARED,
- * XFS_IOLOCK_SHARED | XFS_ILOCK_EXCL,
- * XFS_IOLOCK_EXCL | XFS_ILOCK_SHARED,
- * XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL
- */
-void
-xfs_ilock(
- xfs_inode_t *ip,
- uint lock_flags)
-{
- trace_xfs_ilock(ip, lock_flags, _RET_IP_);
-
- /*
- * You can't set both SHARED and EXCL for the same lock,
- * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED,
- * and XFS_ILOCK_EXCL are valid values to set in lock_flags.
- */
- ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) !=
- (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
- ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) !=
- (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
- ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0);
-
- if (lock_flags & XFS_IOLOCK_EXCL)
- mrupdate_nested(&ip->i_iolock, XFS_IOLOCK_DEP(lock_flags));
- else if (lock_flags & XFS_IOLOCK_SHARED)
- mraccess_nested(&ip->i_iolock, XFS_IOLOCK_DEP(lock_flags));
-
- if (lock_flags & XFS_ILOCK_EXCL)
- mrupdate_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
- else if (lock_flags & XFS_ILOCK_SHARED)
- mraccess_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
-}
-
-/*
- * This is just like xfs_ilock(), except that the caller
- * is guaranteed not to sleep. It returns 1 if it gets
- * the requested locks and 0 otherwise. If the IO lock is
- * obtained but the inode lock cannot be, then the IO lock
- * is dropped before returning.
- *
- * ip -- the inode being locked
- * lock_flags -- this parameter indicates the inode's locks to be
- * to be locked. See the comment for xfs_ilock() for a list
- * of valid values.
- */
-int
-xfs_ilock_nowait(
- xfs_inode_t *ip,
- uint lock_flags)
-{
- trace_xfs_ilock_nowait(ip, lock_flags, _RET_IP_);
-
- /*
- * You can't set both SHARED and EXCL for the same lock,
- * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED,
- * and XFS_ILOCK_EXCL are valid values to set in lock_flags.
- */
- ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) !=
- (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
- ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) !=
- (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
- ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0);
-
- if (lock_flags & XFS_IOLOCK_EXCL) {
- if (!mrtryupdate(&ip->i_iolock))
- goto out;
- } else if (lock_flags & XFS_IOLOCK_SHARED) {
- if (!mrtryaccess(&ip->i_iolock))
- goto out;
- }
- if (lock_flags & XFS_ILOCK_EXCL) {
- if (!mrtryupdate(&ip->i_lock))
- goto out_undo_iolock;
- } else if (lock_flags & XFS_ILOCK_SHARED) {
- if (!mrtryaccess(&ip->i_lock))
- goto out_undo_iolock;
- }
- return 1;
-
- out_undo_iolock:
- if (lock_flags & XFS_IOLOCK_EXCL)
- mrunlock_excl(&ip->i_iolock);
- else if (lock_flags & XFS_IOLOCK_SHARED)
- mrunlock_shared(&ip->i_iolock);
- out:
- return 0;
-}
-
-/*
- * xfs_iunlock() is used to drop the inode locks acquired with
- * xfs_ilock() and xfs_ilock_nowait(). The caller must pass
- * in the flags given to xfs_ilock() or xfs_ilock_nowait() so
- * that we know which locks to drop.
- *
- * ip -- the inode being unlocked
- * lock_flags -- this parameter indicates the inode's locks to be
- * to be unlocked. See the comment for xfs_ilock() for a list
- * of valid values for this parameter.
- *
- */
-void
-xfs_iunlock(
- xfs_inode_t *ip,
- uint lock_flags)
-{
- /*
- * You can't set both SHARED and EXCL for the same lock,
- * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED,
- * and XFS_ILOCK_EXCL are valid values to set in lock_flags.
- */
- ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) !=
- (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
- ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) !=
- (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
- ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0);
- ASSERT(lock_flags != 0);
-
- if (lock_flags & XFS_IOLOCK_EXCL)
- mrunlock_excl(&ip->i_iolock);
- else if (lock_flags & XFS_IOLOCK_SHARED)
- mrunlock_shared(&ip->i_iolock);
-
- if (lock_flags & XFS_ILOCK_EXCL)
- mrunlock_excl(&ip->i_lock);
- else if (lock_flags & XFS_ILOCK_SHARED)
- mrunlock_shared(&ip->i_lock);
-
- trace_xfs_iunlock(ip, lock_flags, _RET_IP_);
-}
-
-/*
- * give up write locks. the i/o lock cannot be held nested
- * if it is being demoted.
- */
-void
-xfs_ilock_demote(
- xfs_inode_t *ip,
- uint lock_flags)
-{
- ASSERT(lock_flags & (XFS_IOLOCK_EXCL|XFS_ILOCK_EXCL));
- ASSERT((lock_flags & ~(XFS_IOLOCK_EXCL|XFS_ILOCK_EXCL)) == 0);
-
- if (lock_flags & XFS_ILOCK_EXCL)
- mrdemote(&ip->i_lock);
- if (lock_flags & XFS_IOLOCK_EXCL)
- mrdemote(&ip->i_iolock);
-
- trace_xfs_ilock_demote(ip, lock_flags, _RET_IP_);
-}
-
-#if defined(DEBUG) || defined(XFS_WARN)
-int
-xfs_isilocked(
- xfs_inode_t *ip,
- uint lock_flags)
-{
- if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
- if (!(lock_flags & XFS_ILOCK_SHARED))
- return !!ip->i_lock.mr_writer;
- return rwsem_is_locked(&ip->i_lock.mr_lock);
- }
-
- if (lock_flags & (XFS_IOLOCK_EXCL|XFS_IOLOCK_SHARED)) {
- if (!(lock_flags & XFS_IOLOCK_SHARED))
- return !!ip->i_iolock.mr_writer;
- return rwsem_is_locked(&ip->i_iolock.mr_lock);
- }
-
- ASSERT(0);
- return 0;
-}
-#endif
-
-#ifdef DEBUG
-int xfs_locked_n;
-int xfs_small_retries;
-int xfs_middle_retries;
-int xfs_lots_retries;
-int xfs_lock_delays;
-#endif
-
-/*
- * Bump the subclass so xfs_lock_inodes() acquires each lock with
- * a different value
- */
-static inline int
-xfs_lock_inumorder(int lock_mode, int subclass)
-{
- if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))
- lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_IOLOCK_SHIFT;
- if (lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))
- lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_ILOCK_SHIFT;
-
- return lock_mode;
-}
-
-/*
- * The following routine will lock n inodes in exclusive mode.
- * We assume the caller calls us with the inodes in i_ino order.
- *
- * We need to detect deadlock where an inode that we lock
- * is in the AIL and we start waiting for another inode that is locked
- * by a thread in a long running transaction (such as truncate). This can
- * result in deadlock since the long running trans might need to wait
- * for the inode we just locked in order to push the tail and free space
- * in the log.
- */
-void
-xfs_lock_inodes(
- xfs_inode_t **ips,
- int inodes,
- uint lock_mode)
-{
- int attempts = 0, i, j, try_lock;
- xfs_log_item_t *lp;
-
- ASSERT(ips && (inodes >= 2)); /* we need at least two */
-
- try_lock = 0;
- i = 0;
-
-again:
- for (; i < inodes; i++) {
- ASSERT(ips[i]);
-
- if (i && (ips[i] == ips[i-1])) /* Already locked */
- continue;
-
- /*
- * If try_lock is not set yet, make sure all locked inodes
- * are not in the AIL.
- * If any are, set try_lock to be used later.
- */
-
- if (!try_lock) {
- for (j = (i - 1); j >= 0 && !try_lock; j--) {
- lp = (xfs_log_item_t *)ips[j]->i_itemp;
- if (lp && (lp->li_flags & XFS_LI_IN_AIL)) {
- try_lock++;
- }
- }
- }
-
- /*
- * If any of the previous locks we have locked is in the AIL,
- * we must TRY to get the second and subsequent locks. If
- * we can't get any, we must release all we have
- * and try again.
- */
-
- if (try_lock) {
- /* try_lock must be 0 if i is 0. */
- /*
- * try_lock means we have an inode locked
- * that is in the AIL.
- */
- ASSERT(i != 0);
- if (!xfs_ilock_nowait(ips[i], xfs_lock_inumorder(lock_mode, i))) {
- attempts++;
-
- /*
- * Unlock all previous guys and try again.
- * xfs_iunlock will try to push the tail
- * if the inode is in the AIL.
- */
-
- for(j = i - 1; j >= 0; j--) {
-
- /*
- * Check to see if we've already
- * unlocked this one.
- * Not the first one going back,
- * and the inode ptr is the same.
- */
- if ((j != (i - 1)) && ips[j] ==
- ips[j+1])
- continue;
-
- xfs_iunlock(ips[j], lock_mode);
- }
-
- if ((attempts % 5) == 0) {
- delay(1); /* Don't just spin the CPU */
-#ifdef DEBUG
- xfs_lock_delays++;
-#endif
- }
- i = 0;
- try_lock = 0;
- goto again;
- }
- } else {
- xfs_ilock(ips[i], xfs_lock_inumorder(lock_mode, i));
- }
- }
-
-#ifdef DEBUG
- if (attempts) {
- if (attempts < 5) xfs_small_retries++;
- else if (attempts < 100) xfs_middle_retries++;
- else xfs_lots_retries++;
- } else {
- xfs_locked_n++;
- }
-#endif
-}
-
-/*
- * xfs_lock_two_inodes() can only be used to lock one type of lock
- * at a time - the iolock or the ilock, but not both at once. If
- * we lock both at once, lockdep will report false positives saying
- * we have violated locking orders.
- */
-void
-xfs_lock_two_inodes(
- xfs_inode_t *ip0,
- xfs_inode_t *ip1,
- uint lock_mode)
-{
- xfs_inode_t *temp;
- int attempts = 0;
- xfs_log_item_t *lp;
-
- if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))
- ASSERT((lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL)) == 0);
- ASSERT(ip0->i_ino != ip1->i_ino);
-
- if (ip0->i_ino > ip1->i_ino) {
- temp = ip0;
- ip0 = ip1;
- ip1 = temp;
- }
-
- again:
- xfs_ilock(ip0, xfs_lock_inumorder(lock_mode, 0));
-
- /*
- * If the first lock we have locked is in the AIL, we must TRY to get
- * the second lock. If we can't get it, we must release the first one
- * and try again.
- */
- lp = (xfs_log_item_t *)ip0->i_itemp;
- if (lp && (lp->li_flags & XFS_LI_IN_AIL)) {
- if (!xfs_ilock_nowait(ip1, xfs_lock_inumorder(lock_mode, 1))) {
- xfs_iunlock(ip0, lock_mode);
- if ((++attempts % 5) == 0)
- delay(1); /* Don't just spin the CPU */
- goto again;
- }
- } else {
- xfs_ilock(ip1, xfs_lock_inumorder(lock_mode, 1));
- }
-}
-
void
__xfs_iflock(
diff --git a/fs/xfs/xfs_inode_ops.h b/fs/xfs/xfs_inode_ops.h
index ecaa019..b74e544 100644
--- a/fs/xfs/xfs_inode_ops.h
+++ b/fs/xfs/xfs_inode_ops.h
@@ -28,59 +28,6 @@ struct xfs_mount;
struct xfs_trans;
struct xfs_dquot;
-typedef struct xfs_inode {
- /* Inode linking and identification information. */
- struct xfs_mount *i_mount; /* fs mount struct ptr */
- struct xfs_dquot *i_udquot; /* user dquot */
- struct xfs_dquot *i_gdquot; /* group dquot */
-
- /* Inode location stuff */
- xfs_ino_t i_ino; /* inode number (agno/agino)*/
- struct xfs_imap i_imap; /* location for xfs_imap() */
-
- /* Extent information. */
- xfs_ifork_t *i_afp; /* attribute fork pointer */
- xfs_ifork_t i_df; /* data fork */
-
- /* Transaction and locking information. */
- struct xfs_inode_log_item *i_itemp; /* logging information */
- mrlock_t i_lock; /* inode lock */
- mrlock_t i_iolock; /* inode IO lock */
- atomic_t i_pincount; /* inode pin count */
- spinlock_t i_flags_lock; /* inode i_flags lock */
- /* Miscellaneous state. */
- unsigned long i_flags; /* see defined flags below */
- unsigned int i_delayed_blks; /* count of delay alloc blks */
-
- xfs_icdinode_t i_d; /* most of ondisk inode */
-
- /* VFS inode */
- struct inode i_vnode; /* embedded VFS inode */
-} xfs_inode_t;
-
-/* Convert from vfs inode to xfs inode */
-static inline struct xfs_inode *XFS_I(struct inode *inode)
-{
- return container_of(inode, struct xfs_inode, i_vnode);
-}
-
-/* convert from xfs inode to vfs inode */
-static inline struct inode *VFS_I(struct xfs_inode *ip)
-{
- return &ip->i_vnode;
-}
-
-/*
- * For regular files we only update the on-disk filesize when actually
- * writing data back to disk. Until then only the copy in the VFS inode
- * is uptodate.
- */
-static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
-{
- if (S_ISREG(ip->i_d.di_mode))
- return i_size_read(VFS_I(ip));
- return ip->i_d.di_size;
-}
/*
* If this I/O goes past the on-disk inode size update it unless it would
@@ -324,13 +271,6 @@ int xfs_rename(struct xfs_inode *src_dp, struct xfs_name *src_name,
int xfs_free_eofblocks(struct xfs_mount *, struct xfs_inode *, bool);
bool xfs_can_free_eofblocks(struct xfs_inode *, bool);
-void xfs_ilock(xfs_inode_t *, uint);
-int xfs_ilock_nowait(xfs_inode_t *, uint);
-void xfs_iunlock(xfs_inode_t *, uint);
-void xfs_ilock_demote(xfs_inode_t *, uint);
-int xfs_isilocked(xfs_inode_t *, uint);
-uint xfs_ilock_map_shared(xfs_inode_t *);
-void xfs_iunlock_map_shared(xfs_inode_t *, uint);
int xfs_ialloc(struct xfs_trans *, xfs_inode_t *, umode_t,
xfs_nlink_t, xfs_dev_t, prid_t, int,
struct xfs_buf **, xfs_inode_t **);
@@ -349,8 +289,6 @@ void xfs_iunpin_wait(xfs_inode_t *);
#define xfs_ipincount(ip) ((unsigned int) atomic_read(&ip->i_pincount))
int xfs_iflush(struct xfs_inode *, struct xfs_buf **);
-void xfs_lock_inodes(xfs_inode_t **, int, uint);
-void xfs_lock_two_inodes(xfs_inode_t *, xfs_inode_t *, uint);
xfs_extlen_t xfs_get_extsz_hint(struct xfs_inode *ip);
@@ -379,6 +317,4 @@ int xfs_droplink(struct xfs_trans *, struct xfs_inode *);
int xfs_bumplink(struct xfs_trans *, struct xfs_inode *);
void xfs_bump_ino_vers2(struct xfs_trans *, struct xfs_inode *);
-extern struct kmem_zone *xfs_inode_zone;
-
#endif /* __XFS_INODE_OPS_H__ */
--
1.7.10.4
* [PATCH 36/60] xfs: move swap extent code to xfs_extent_ops
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (34 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 35/60] xfs: start repopulating xfs_inode.[ch] with kernel code Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 37/60] xfs: split out inode log item format definition Dave Chinner
` (25 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Swapping extents is clearly an extent operation, and it is not
shared with userspace. Move the code to xfs_extent_ops.[ch], and
the userspace ioctl structure definition to xfs_fs.h where most of
the other ioctl structure definitions are. This means xfs_dfrag.h is
no longer needed in userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1 -
fs/xfs/xfs_dfrag.c | 456 -----------------------------------------------
fs/xfs/xfs_dfrag.h | 53 ------
fs/xfs/xfs_extent_ops.c | 416 ++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_extent_ops.h | 2 +
fs/xfs/xfs_fs.h | 15 ++
fs/xfs/xfs_ioctl.c | 1 -
fs/xfs/xfs_ioctl32.c | 2 +-
8 files changed, 434 insertions(+), 512 deletions(-)
delete mode 100644 fs/xfs/xfs_dfrag.c
delete mode 100644 fs/xfs/xfs_dfrag.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index d545286..347d27c 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -31,7 +31,6 @@ xfs-y += xfs_aops.o \
xfs_attr_list.o \
xfs_bit.o \
xfs_buf.o \
- xfs_dfrag.o \
xfs_dir2_readdir.o \
xfs_discard.o \
xfs_error.o \
diff --git a/fs/xfs/xfs_dfrag.c b/fs/xfs/xfs_dfrag.c
deleted file mode 100644
index a76356a..0000000
--- a/fs/xfs/xfs_dfrag.c
+++ /dev/null
@@ -1,456 +0,0 @@
-/*
- * Copyright (c) 2000-2006 Silicon Graphics, Inc.
- * All Rights Reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it would be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write the Free Software Foundation,
- * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-#include "xfs.h"
-#include "xfs_fs.h"
-#include "xfs_types.h"
-#include "xfs_log.h"
-#include "xfs_trans.h"
-#include "xfs_sb.h"
-#include "xfs_ag.h"
-#include "xfs_mount.h"
-#include "xfs_bmap_btree.h"
-#include "xfs_dinode.h"
-#include "xfs_inode.h"
-#include "xfs_inode_item.h"
-#include "xfs_bmap.h"
-#include "xfs_itable.h"
-#include "xfs_dfrag.h"
-#include "xfs_error.h"
-#include "xfs_trace.h"
-
-
-static int xfs_swap_extents(
- xfs_inode_t *ip, /* target inode */
- xfs_inode_t *tip, /* tmp inode */
- xfs_swapext_t *sxp);
-
-/*
- * ioctl interface for swapext
- */
-int
-xfs_swapext(
- xfs_swapext_t *sxp)
-{
- xfs_inode_t *ip, *tip;
- struct fd f, tmp;
- int error = 0;
-
- /* Pull information for the target fd */
- f = fdget((int)sxp->sx_fdtarget);
- if (!f.file) {
- error = XFS_ERROR(EINVAL);
- goto out;
- }
-
- if (!(f.file->f_mode & FMODE_WRITE) ||
- !(f.file->f_mode & FMODE_READ) ||
- (f.file->f_flags & O_APPEND)) {
- error = XFS_ERROR(EBADF);
- goto out_put_file;
- }
-
- tmp = fdget((int)sxp->sx_fdtmp);
- if (!tmp.file) {
- error = XFS_ERROR(EINVAL);
- goto out_put_file;
- }
-
- if (!(tmp.file->f_mode & FMODE_WRITE) ||
- !(tmp.file->f_mode & FMODE_READ) ||
- (tmp.file->f_flags & O_APPEND)) {
- error = XFS_ERROR(EBADF);
- goto out_put_tmp_file;
- }
-
- if (IS_SWAPFILE(file_inode(f.file)) ||
- IS_SWAPFILE(file_inode(tmp.file))) {
- error = XFS_ERROR(EINVAL);
- goto out_put_tmp_file;
- }
-
- ip = XFS_I(file_inode(f.file));
- tip = XFS_I(file_inode(tmp.file));
-
- if (ip->i_mount != tip->i_mount) {
- error = XFS_ERROR(EINVAL);
- goto out_put_tmp_file;
- }
-
- if (ip->i_ino == tip->i_ino) {
- error = XFS_ERROR(EINVAL);
- goto out_put_tmp_file;
- }
-
- if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
- error = XFS_ERROR(EIO);
- goto out_put_tmp_file;
- }
-
- error = xfs_swap_extents(ip, tip, sxp);
-
- out_put_tmp_file:
- fdput(tmp);
- out_put_file:
- fdput(f);
- out:
- return error;
-}
-
-/*
- * We need to check that the format of the data fork in the temporary inode is
- * valid for the target inode before doing the swap. This is not a problem with
- * attr1 because of the fixed fork offset, but attr2 has a dynamically sized
- * data fork depending on the space the attribute fork is taking so we can get
- * invalid formats on the target inode.
- *
- * E.g. target has space for 7 extents in extent format, temp inode only has
- * space for 6. If we defragment down to 7 extents, then the tmp format is a
- * btree, but when swapped it needs to be in extent format. Hence we can't just
- * blindly swap data forks on attr2 filesystems.
- *
- * Note that we check the swap in both directions so that we don't end up with
- * a corrupt temporary inode, either.
- *
- * Note that fixing the way xfs_fsr sets up the attribute fork in the source
- * inode will prevent this situation from occurring, so all we do here is
- * reject and log the attempt. basically we are putting the responsibility on
- * userspace to get this right.
- */
-static int
-xfs_swap_extents_check_format(
- xfs_inode_t *ip, /* target inode */
- xfs_inode_t *tip) /* tmp inode */
-{
-
- /* Should never get a local format */
- if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL ||
- tip->i_d.di_format == XFS_DINODE_FMT_LOCAL)
- return EINVAL;
-
- /*
- * if the target inode has less extents that then temporary inode then
- * why did userspace call us?
- */
- if (ip->i_d.di_nextents < tip->i_d.di_nextents)
- return EINVAL;
-
- /*
- * if the target inode is in extent form and the temp inode is in btree
- * form then we will end up with the target inode in the wrong format
- * as we already know there are less extents in the temp inode.
- */
- if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
- tip->i_d.di_format == XFS_DINODE_FMT_BTREE)
- return EINVAL;
-
- /* Check temp in extent form to max in target */
- if (tip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) >
- XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
- return EINVAL;
-
- /* Check target in extent form to max in temp */
- if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) >
- XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK))
- return EINVAL;
-
- /*
- * If we are in a btree format, check that the temp root block will fit
- * in the target and that it has enough extents to be in btree format
- * in the target.
- *
- * Note that we have to be careful to allow btree->extent conversions
- * (a common defrag case) which will occur when the temp inode is in
- * extent format...
- */
- if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
- if (XFS_IFORK_BOFF(ip) &&
- tip->i_df.if_broot_bytes > XFS_IFORK_BOFF(ip))
- return EINVAL;
- if (XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) <=
- XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
- return EINVAL;
- }
-
- /* Reciprocal target->temp btree format checks */
- if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
- if (XFS_IFORK_BOFF(tip) &&
- ip->i_df.if_broot_bytes > XFS_IFORK_BOFF(tip))
- return EINVAL;
-
- if (XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) <=
- XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK))
- return EINVAL;
- }
-
- return 0;
-}
-
-static int
-xfs_swap_extents(
- xfs_inode_t *ip, /* target inode */
- xfs_inode_t *tip, /* tmp inode */
- xfs_swapext_t *sxp)
-{
- xfs_mount_t *mp = ip->i_mount;
- xfs_trans_t *tp;
- xfs_bstat_t *sbp = &sxp->sx_stat;
- xfs_ifork_t *tempifp, *ifp, *tifp;
- int src_log_flags, target_log_flags;
- int error = 0;
- int aforkblks = 0;
- int taforkblks = 0;
- __uint64_t tmp;
-
- /*
- * We have no way of updating owner information in the BMBT blocks for
- * each inode on CRC enabled filesystems, so to avoid corrupting the
- * this metadata we simply don't allow extent swaps to occur.
- */
- if (xfs_sb_version_hascrc(&mp->m_sb))
- return XFS_ERROR(EINVAL);
-
- tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
- if (!tempifp) {
- error = XFS_ERROR(ENOMEM);
- goto out;
- }
-
- /*
- * we have to do two separate lock calls here to keep lockdep
- * happy. If we try to get all the locks in one call, lock will
- * report false positives when we drop the ILOCK and regain them
- * below.
- */
- xfs_lock_two_inodes(ip, tip, XFS_IOLOCK_EXCL);
- xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL);
-
- /* Verify that both files have the same format */
- if ((ip->i_d.di_mode & S_IFMT) != (tip->i_d.di_mode & S_IFMT)) {
- error = XFS_ERROR(EINVAL);
- goto out_unlock;
- }
-
- /* Verify both files are either real-time or non-realtime */
- if (XFS_IS_REALTIME_INODE(ip) != XFS_IS_REALTIME_INODE(tip)) {
- error = XFS_ERROR(EINVAL);
- goto out_unlock;
- }
-
- error = -filemap_write_and_wait(VFS_I(tip)->i_mapping);
- if (error)
- goto out_unlock;
- truncate_pagecache_range(VFS_I(tip), 0, -1);
-
- /* Verify O_DIRECT for ftmp */
- if (VN_CACHED(VFS_I(tip)) != 0) {
- error = XFS_ERROR(EINVAL);
- goto out_unlock;
- }
-
- /* Verify all data are being swapped */
- if (sxp->sx_offset != 0 ||
- sxp->sx_length != ip->i_d.di_size ||
- sxp->sx_length != tip->i_d.di_size) {
- error = XFS_ERROR(EFAULT);
- goto out_unlock;
- }
-
- trace_xfs_swap_extent_before(ip, 0);
- trace_xfs_swap_extent_before(tip, 1);
-
- /* check inode formats now that data is flushed */
- error = xfs_swap_extents_check_format(ip, tip);
- if (error) {
- xfs_notice(mp,
- "%s: inode 0x%llx format is incompatible for exchanging.",
- __func__, ip->i_ino);
- goto out_unlock;
- }
-
- /*
- * Compare the current change & modify times with that
- * passed in. If they differ, we abort this swap.
- * This is the mechanism used to ensure the calling
- * process that the file was not changed out from
- * under it.
- */
- if ((sbp->bs_ctime.tv_sec != VFS_I(ip)->i_ctime.tv_sec) ||
- (sbp->bs_ctime.tv_nsec != VFS_I(ip)->i_ctime.tv_nsec) ||
- (sbp->bs_mtime.tv_sec != VFS_I(ip)->i_mtime.tv_sec) ||
- (sbp->bs_mtime.tv_nsec != VFS_I(ip)->i_mtime.tv_nsec)) {
- error = XFS_ERROR(EBUSY);
- goto out_unlock;
- }
-
- /* We need to fail if the file is memory mapped. Once we have tossed
- * all existing pages, the page fault will have no option
- * but to go to the filesystem for pages. By making the page fault call
- * vop_read (or write in the case of autogrow) they block on the iolock
- * until we have switched the extents.
- */
- if (VN_MAPPED(VFS_I(ip))) {
- error = XFS_ERROR(EBUSY);
- goto out_unlock;
- }
-
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- xfs_iunlock(tip, XFS_ILOCK_EXCL);
-
- /*
- * There is a race condition here since we gave up the
- * ilock. However, the data fork will not change since
- * we have the iolock (locked for truncation too) so we
- * are safe. We don't really care if non-io related
- * fields change.
- */
- truncate_pagecache_range(VFS_I(ip), 0, -1);
-
- tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPEXT);
- if ((error = xfs_trans_reserve(tp, 0,
- XFS_ICHANGE_LOG_RES(mp), 0,
- 0, 0))) {
- xfs_iunlock(ip, XFS_IOLOCK_EXCL);
- xfs_iunlock(tip, XFS_IOLOCK_EXCL);
- xfs_trans_cancel(tp, 0);
- goto out;
- }
- xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL);
-
- /*
- * Count the number of extended attribute blocks
- */
- if ( ((XFS_IFORK_Q(ip) != 0) && (ip->i_d.di_anextents > 0)) &&
- (ip->i_d.di_aformat != XFS_DINODE_FMT_LOCAL)) {
- error = xfs_bmap_count_blocks(tp, ip, XFS_ATTR_FORK, &aforkblks);
- if (error)
- goto out_trans_cancel;
- }
- if ( ((XFS_IFORK_Q(tip) != 0) && (tip->i_d.di_anextents > 0)) &&
- (tip->i_d.di_aformat != XFS_DINODE_FMT_LOCAL)) {
- error = xfs_bmap_count_blocks(tp, tip, XFS_ATTR_FORK,
- &taforkblks);
- if (error)
- goto out_trans_cancel;
- }
-
- /*
- * Swap the data forks of the inodes
- */
- ifp = &ip->i_df;
- tifp = &tip->i_df;
- *tempifp = *ifp; /* struct copy */
- *ifp = *tifp; /* struct copy */
- *tifp = *tempifp; /* struct copy */
-
- /*
- * Fix the on-disk inode values
- */
- tmp = (__uint64_t)ip->i_d.di_nblocks;
- ip->i_d.di_nblocks = tip->i_d.di_nblocks - taforkblks + aforkblks;
- tip->i_d.di_nblocks = tmp + taforkblks - aforkblks;
-
- tmp = (__uint64_t) ip->i_d.di_nextents;
- ip->i_d.di_nextents = tip->i_d.di_nextents;
- tip->i_d.di_nextents = tmp;
-
- tmp = (__uint64_t) ip->i_d.di_format;
- ip->i_d.di_format = tip->i_d.di_format;
- tip->i_d.di_format = tmp;
-
- /*
- * The extents in the source inode could still contain speculative
- * preallocation beyond EOF (e.g. the file is open but not modified
- * while defrag is in progress). In that case, we need to copy over the
- * number of delalloc blocks the data fork in the source inode is
- * tracking beyond EOF so that when the fork is truncated away when the
- * temporary inode is unlinked we don't underrun the i_delayed_blks
- * counter on that inode.
- */
- ASSERT(tip->i_delayed_blks == 0);
- tip->i_delayed_blks = ip->i_delayed_blks;
- ip->i_delayed_blks = 0;
-
- src_log_flags = XFS_ILOG_CORE;
- switch (ip->i_d.di_format) {
- case XFS_DINODE_FMT_EXTENTS:
- /* If the extents fit in the inode, fix the
- * pointer. Otherwise it's already NULL or
- * pointing to the extent.
- */
- if (ip->i_d.di_nextents <= XFS_INLINE_EXTS) {
- ifp->if_u1.if_extents =
- ifp->if_u2.if_inline_ext;
- }
- src_log_flags |= XFS_ILOG_DEXT;
- break;
- case XFS_DINODE_FMT_BTREE:
- src_log_flags |= XFS_ILOG_DBROOT;
- break;
- }
-
- target_log_flags = XFS_ILOG_CORE;
- switch (tip->i_d.di_format) {
- case XFS_DINODE_FMT_EXTENTS:
- /* If the extents fit in the inode, fix the
- * pointer. Otherwise it's already NULL or
- * pointing to the extent.
- */
- if (tip->i_d.di_nextents <= XFS_INLINE_EXTS) {
- tifp->if_u1.if_extents =
- tifp->if_u2.if_inline_ext;
- }
- target_log_flags |= XFS_ILOG_DEXT;
- break;
- case XFS_DINODE_FMT_BTREE:
- target_log_flags |= XFS_ILOG_DBROOT;
- break;
- }
-
-
- xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
- xfs_trans_ijoin(tp, tip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
-
- xfs_trans_log_inode(tp, ip, src_log_flags);
- xfs_trans_log_inode(tp, tip, target_log_flags);
-
- /*
- * If this is a synchronous mount, make sure that the
- * transaction goes to disk before returning to the user.
- */
- if (mp->m_flags & XFS_MOUNT_WSYNC)
- xfs_trans_set_sync(tp);
-
- error = xfs_trans_commit(tp, 0);
-
- trace_xfs_swap_extent_after(ip, 0);
- trace_xfs_swap_extent_after(tip, 1);
-out:
- kmem_free(tempifp);
- return error;
-
-out_unlock:
- xfs_iunlock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
- xfs_iunlock(tip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
- goto out;
-
-out_trans_cancel:
- xfs_trans_cancel(tp, 0);
- goto out_unlock;
-}
diff --git a/fs/xfs/xfs_dfrag.h b/fs/xfs/xfs_dfrag.h
deleted file mode 100644
index 20bdd93..0000000
--- a/fs/xfs/xfs_dfrag.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/*
- * Copyright (c) 2000,2005 Silicon Graphics, Inc.
- * All Rights Reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it would be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write the Free Software Foundation,
- * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-#ifndef __XFS_DFRAG_H__
-#define __XFS_DFRAG_H__
-
-/*
- * Structure passed to xfs_swapext
- */
-
-typedef struct xfs_swapext
-{
- __int64_t sx_version; /* version */
- __int64_t sx_fdtarget; /* fd of target file */
- __int64_t sx_fdtmp; /* fd of tmp file */
- xfs_off_t sx_offset; /* offset into file */
- xfs_off_t sx_length; /* leng from offset */
- char sx_pad[16]; /* pad space, unused */
- xfs_bstat_t sx_stat; /* stat of target b4 copy */
-} xfs_swapext_t;
-
-/*
- * Version flag
- */
-#define XFS_SX_VERSION 0
-
-#ifdef __KERNEL__
-/*
- * Prototypes for visible xfs_dfrag.c routines.
- */
-
-/*
- * Syscall interface for xfs_swapext
- */
-int xfs_swapext(struct xfs_swapext *sx);
-
-#endif /* __KERNEL__ */
-
-#endif /* __XFS_DFRAG_H__ */
diff --git a/fs/xfs/xfs_extent_ops.c b/fs/xfs/xfs_extent_ops.c
index 527af72..170b10b 100644
--- a/fs/xfs/xfs_extent_ops.c
+++ b/fs/xfs/xfs_extent_ops.c
@@ -1052,3 +1052,419 @@ xfs_getbmap(
return error;
}
+/*
+ * We need to check that the format of the data fork in the temporary inode is
+ * valid for the target inode before doing the swap. This is not a problem with
+ * attr1 because of the fixed fork offset, but attr2 has a dynamically sized
+ * data fork depending on the space the attribute fork is taking so we can get
+ * invalid formats on the target inode.
+ *
+ * E.g. target has space for 7 extents in extent format, temp inode only has
+ * space for 6. If we defragment down to 7 extents, then the tmp format is a
+ * btree, but when swapped it needs to be in extent format. Hence we can't just
+ * blindly swap data forks on attr2 filesystems.
+ *
+ * Note that we check the swap in both directions so that we don't end up with
+ * a corrupt temporary inode, either.
+ *
+ * Note that fixing the way xfs_fsr sets up the attribute fork in the source
+ * inode will prevent this situation from occurring, so all we do here is
+ * reject and log the attempt. basically we are putting the responsibility on
+ * userspace to get this right.
+ */
+static int
+xfs_swap_extents_check_format(
+ xfs_inode_t *ip, /* target inode */
+ xfs_inode_t *tip) /* tmp inode */
+{
+
+ /* Should never get a local format */
+ if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL ||
+ tip->i_d.di_format == XFS_DINODE_FMT_LOCAL)
+ return EINVAL;
+
+ /*
+ * if the target inode has less extents that then temporary inode then
+ * why did userspace call us?
+ */
+ if (ip->i_d.di_nextents < tip->i_d.di_nextents)
+ return EINVAL;
+
+ /*
+ * if the target inode is in extent form and the temp inode is in btree
+ * form then we will end up with the target inode in the wrong format
+ * as we already know there are less extents in the temp inode.
+ */
+ if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
+ tip->i_d.di_format == XFS_DINODE_FMT_BTREE)
+ return EINVAL;
+
+ /* Check temp in extent form to max in target */
+ if (tip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
+ XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) >
+ XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
+ return EINVAL;
+
+ /* Check target in extent form to max in temp */
+ if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
+ XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) >
+ XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK))
+ return EINVAL;
+
+ /*
+ * If we are in a btree format, check that the temp root block will fit
+ * in the target and that it has enough extents to be in btree format
+ * in the target.
+ *
+ * Note that we have to be careful to allow btree->extent conversions
+ * (a common defrag case) which will occur when the temp inode is in
+ * extent format...
+ */
+ if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+ if (XFS_IFORK_BOFF(ip) &&
+ tip->i_df.if_broot_bytes > XFS_IFORK_BOFF(ip))
+ return EINVAL;
+ if (XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) <=
+ XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
+ return EINVAL;
+ }
+
+ /* Reciprocal target->temp btree format checks */
+ if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+ if (XFS_IFORK_BOFF(tip) &&
+ ip->i_df.if_broot_bytes > XFS_IFORK_BOFF(tip))
+ return EINVAL;
+
+ if (XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) <=
+ XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK))
+ return EINVAL;
+ }
+
+ return 0;
+}
+
+static int
+xfs_swap_extents(
+ xfs_inode_t *ip, /* target inode */
+ xfs_inode_t *tip, /* tmp inode */
+ xfs_swapext_t *sxp)
+{
+ xfs_mount_t *mp = ip->i_mount;
+ xfs_trans_t *tp;
+ xfs_bstat_t *sbp = &sxp->sx_stat;
+ xfs_ifork_t *tempifp, *ifp, *tifp;
+ int src_log_flags, target_log_flags;
+ int error = 0;
+ int aforkblks = 0;
+ int taforkblks = 0;
+ __uint64_t tmp;
+
+ /*
+ * We have no way of updating owner information in the BMBT blocks for
+ * each inode on CRC enabled filesystems, so to avoid corrupting the
+ * this metadata we simply don't allow extent swaps to occur.
+ */
+ if (xfs_sb_version_hascrc(&mp->m_sb))
+ return XFS_ERROR(EINVAL);
+
+ tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
+ if (!tempifp) {
+ error = XFS_ERROR(ENOMEM);
+ goto out;
+ }
+
+ /*
+ * we have to do two separate lock calls here to keep lockdep
+ * happy. If we try to get all the locks in one call, lock will
+ * report false positives when we drop the ILOCK and regain them
+ * below.
+ */
+ xfs_lock_two_inodes(ip, tip, XFS_IOLOCK_EXCL);
+ xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL);
+
+ /* Verify that both files have the same format */
+ if ((ip->i_d.di_mode & S_IFMT) != (tip->i_d.di_mode & S_IFMT)) {
+ error = XFS_ERROR(EINVAL);
+ goto out_unlock;
+ }
+
+ /* Verify both files are either real-time or non-realtime */
+ if (XFS_IS_REALTIME_INODE(ip) != XFS_IS_REALTIME_INODE(tip)) {
+ error = XFS_ERROR(EINVAL);
+ goto out_unlock;
+ }
+
+ error = -filemap_write_and_wait(VFS_I(tip)->i_mapping);
+ if (error)
+ goto out_unlock;
+ truncate_pagecache_range(VFS_I(tip), 0, -1);
+
+ /* Verify O_DIRECT for ftmp */
+ if (VN_CACHED(VFS_I(tip)) != 0) {
+ error = XFS_ERROR(EINVAL);
+ goto out_unlock;
+ }
+
+ /* Verify all data are being swapped */
+ if (sxp->sx_offset != 0 ||
+ sxp->sx_length != ip->i_d.di_size ||
+ sxp->sx_length != tip->i_d.di_size) {
+ error = XFS_ERROR(EFAULT);
+ goto out_unlock;
+ }
+
+ trace_xfs_swap_extent_before(ip, 0);
+ trace_xfs_swap_extent_before(tip, 1);
+
+ /* check inode formats now that data is flushed */
+ error = xfs_swap_extents_check_format(ip, tip);
+ if (error) {
+ xfs_notice(mp,
+ "%s: inode 0x%llx format is incompatible for exchanging.",
+ __func__, ip->i_ino);
+ goto out_unlock;
+ }
+
+ /*
+ * Compare the current change & modify times with that
+ * passed in. If they differ, we abort this swap.
+ * This is the mechanism used to ensure the calling
+ * process that the file was not changed out from
+ * under it.
+ */
+ if ((sbp->bs_ctime.tv_sec != VFS_I(ip)->i_ctime.tv_sec) ||
+ (sbp->bs_ctime.tv_nsec != VFS_I(ip)->i_ctime.tv_nsec) ||
+ (sbp->bs_mtime.tv_sec != VFS_I(ip)->i_mtime.tv_sec) ||
+ (sbp->bs_mtime.tv_nsec != VFS_I(ip)->i_mtime.tv_nsec)) {
+ error = XFS_ERROR(EBUSY);
+ goto out_unlock;
+ }
+
+ /* We need to fail if the file is memory mapped. Once we have tossed
+ * all existing pages, the page fault will have no option
+ * but to go to the filesystem for pages. By making the page fault call
+ * vop_read (or write in the case of autogrow) they block on the iolock
+ * until we have switched the extents.
+ */
+ if (VN_MAPPED(VFS_I(ip))) {
+ error = XFS_ERROR(EBUSY);
+ goto out_unlock;
+ }
+
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ xfs_iunlock(tip, XFS_ILOCK_EXCL);
+
+ /*
+ * There is a race condition here since we gave up the
+ * ilock. However, the data fork will not change since
+ * we have the iolock (locked for truncation too) so we
+ * are safe. We don't really care if non-io related
+ * fields change.
+ */
+ truncate_pagecache_range(VFS_I(ip), 0, -1);
+
+ tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPEXT);
+ if ((error = xfs_trans_reserve(tp, 0,
+ XFS_ICHANGE_LOG_RES(mp), 0,
+ 0, 0))) {
+ xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+ xfs_iunlock(tip, XFS_IOLOCK_EXCL);
+ xfs_trans_cancel(tp, 0);
+ goto out;
+ }
+ xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL);
+
+ /*
+ * Count the number of extended attribute blocks
+ */
+ if ( ((XFS_IFORK_Q(ip) != 0) && (ip->i_d.di_anextents > 0)) &&
+ (ip->i_d.di_aformat != XFS_DINODE_FMT_LOCAL)) {
+ error = xfs_bmap_count_blocks(tp, ip, XFS_ATTR_FORK, &aforkblks);
+ if (error)
+ goto out_trans_cancel;
+ }
+ if ( ((XFS_IFORK_Q(tip) != 0) && (tip->i_d.di_anextents > 0)) &&
+ (tip->i_d.di_aformat != XFS_DINODE_FMT_LOCAL)) {
+ error = xfs_bmap_count_blocks(tp, tip, XFS_ATTR_FORK,
+ &taforkblks);
+ if (error)
+ goto out_trans_cancel;
+ }
+
+ /*
+ * Swap the data forks of the inodes
+ */
+ ifp = &ip->i_df;
+ tifp = &tip->i_df;
+ *tempifp = *ifp; /* struct copy */
+ *ifp = *tifp; /* struct copy */
+ *tifp = *tempifp; /* struct copy */
+
+ /*
+ * Fix the on-disk inode values
+ */
+ tmp = (__uint64_t)ip->i_d.di_nblocks;
+ ip->i_d.di_nblocks = tip->i_d.di_nblocks - taforkblks + aforkblks;
+ tip->i_d.di_nblocks = tmp + taforkblks - aforkblks;
+
+ tmp = (__uint64_t) ip->i_d.di_nextents;
+ ip->i_d.di_nextents = tip->i_d.di_nextents;
+ tip->i_d.di_nextents = tmp;
+
+ tmp = (__uint64_t) ip->i_d.di_format;
+ ip->i_d.di_format = tip->i_d.di_format;
+ tip->i_d.di_format = tmp;
+
+ /*
+ * The extents in the source inode could still contain speculative
+ * preallocation beyond EOF (e.g. the file is open but not modified
+ * while defrag is in progress). In that case, we need to copy over the
+ * number of delalloc blocks the data fork in the source inode is
+ * tracking beyond EOF so that when the fork is truncated away when the
+ * temporary inode is unlinked we don't underrun the i_delayed_blks
+ * counter on that inode.
+ */
+ ASSERT(tip->i_delayed_blks == 0);
+ tip->i_delayed_blks = ip->i_delayed_blks;
+ ip->i_delayed_blks = 0;
+
+ src_log_flags = XFS_ILOG_CORE;
+ switch (ip->i_d.di_format) {
+ case XFS_DINODE_FMT_EXTENTS:
+ /* If the extents fit in the inode, fix the
+ * pointer. Otherwise it's already NULL or
+ * pointing to the extent.
+ */
+ if (ip->i_d.di_nextents <= XFS_INLINE_EXTS) {
+ ifp->if_u1.if_extents =
+ ifp->if_u2.if_inline_ext;
+ }
+ src_log_flags |= XFS_ILOG_DEXT;
+ break;
+ case XFS_DINODE_FMT_BTREE:
+ src_log_flags |= XFS_ILOG_DBROOT;
+ break;
+ }
+
+ target_log_flags = XFS_ILOG_CORE;
+ switch (tip->i_d.di_format) {
+ case XFS_DINODE_FMT_EXTENTS:
+ /* If the extents fit in the inode, fix the
+ * pointer. Otherwise it's already NULL or
+ * pointing to the extent.
+ */
+ if (tip->i_d.di_nextents <= XFS_INLINE_EXTS) {
+ tifp->if_u1.if_extents =
+ tifp->if_u2.if_inline_ext;
+ }
+ target_log_flags |= XFS_ILOG_DEXT;
+ break;
+ case XFS_DINODE_FMT_BTREE:
+ target_log_flags |= XFS_ILOG_DBROOT;
+ break;
+ }
+
+
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
+ xfs_trans_ijoin(tp, tip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
+
+ xfs_trans_log_inode(tp, ip, src_log_flags);
+ xfs_trans_log_inode(tp, tip, target_log_flags);
+
+ /*
+ * If this is a synchronous mount, make sure that the
+ * transaction goes to disk before returning to the user.
+ */
+ if (mp->m_flags & XFS_MOUNT_WSYNC)
+ xfs_trans_set_sync(tp);
+
+ error = xfs_trans_commit(tp, 0);
+
+ trace_xfs_swap_extent_after(ip, 0);
+ trace_xfs_swap_extent_after(tip, 1);
+out:
+ kmem_free(tempifp);
+ return error;
+
+out_unlock:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
+ xfs_iunlock(tip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
+ goto out;
+
+out_trans_cancel:
+ xfs_trans_cancel(tp, 0);
+ goto out_unlock;
+}
+
+/*
+ * ioctl interface for swapext
+ */
+int
+xfs_swapext(
+ xfs_swapext_t *sxp)
+{
+ xfs_inode_t *ip, *tip;
+ struct fd f, tmp;
+ int error = 0;
+
+ /* Pull information for the target fd */
+ f = fdget((int)sxp->sx_fdtarget);
+ if (!f.file) {
+ error = XFS_ERROR(EINVAL);
+ goto out;
+ }
+
+ if (!(f.file->f_mode & FMODE_WRITE) ||
+ !(f.file->f_mode & FMODE_READ) ||
+ (f.file->f_flags & O_APPEND)) {
+ error = XFS_ERROR(EBADF);
+ goto out_put_file;
+ }
+
+ tmp = fdget((int)sxp->sx_fdtmp);
+ if (!tmp.file) {
+ error = XFS_ERROR(EINVAL);
+ goto out_put_file;
+ }
+
+ if (!(tmp.file->f_mode & FMODE_WRITE) ||
+ !(tmp.file->f_mode & FMODE_READ) ||
+ (tmp.file->f_flags & O_APPEND)) {
+ error = XFS_ERROR(EBADF);
+ goto out_put_tmp_file;
+ }
+
+ if (IS_SWAPFILE(file_inode(f.file)) ||
+ IS_SWAPFILE(file_inode(tmp.file))) {
+ error = XFS_ERROR(EINVAL);
+ goto out_put_tmp_file;
+ }
+
+ ip = XFS_I(file_inode(f.file));
+ tip = XFS_I(file_inode(tmp.file));
+
+ if (ip->i_mount != tip->i_mount) {
+ error = XFS_ERROR(EINVAL);
+ goto out_put_tmp_file;
+ }
+
+ if (ip->i_ino == tip->i_ino) {
+ error = XFS_ERROR(EINVAL);
+ goto out_put_tmp_file;
+ }
+
+ if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
+ error = XFS_ERROR(EIO);
+ goto out_put_tmp_file;
+ }
+
+ error = xfs_swap_extents(ip, tip, sxp);
+
+ out_put_tmp_file:
+ fdput(tmp);
+ out_put_file:
+ fdput(f);
+ out:
+ return error;
+}
+
diff --git a/fs/xfs/xfs_extent_ops.h b/fs/xfs/xfs_extent_ops.h
index 550a37d..e579ec8 100644
--- a/fs/xfs/xfs_extent_ops.h
+++ b/fs/xfs/xfs_extent_ops.h
@@ -27,4 +27,6 @@ typedef int (*xfs_bmap_format_t)(void **, struct getbmapx *, int *);
int xfs_getbmap(struct xfs_inode *ip, struct getbmapx *bmv,
xfs_bmap_format_t formatter, void *arg);
+int xfs_swapext(struct xfs_swapext *sx);
+
#endif /* _XFS_EXTENT_OPS_H */
diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
index 68c2e18..74b24b2 100644
--- a/fs/xfs/xfs_fs.h
+++ b/fs/xfs/xfs_fs.h
@@ -461,6 +461,21 @@ typedef struct xfs_handle {
+ (handle).ha_fid.fid_len)
/*
+ * Structure passed to XFS_IOC_SWAPEXT
+ */
+typedef struct xfs_swapext
+{
+ __int64_t sx_version; /* version */
+#define XFS_SX_VERSION 0
+ __int64_t sx_fdtarget; /* fd of target file */
+ __int64_t sx_fdtmp; /* fd of tmp file */
+ xfs_off_t sx_offset; /* offset into file */
+ xfs_off_t sx_length; /* leng from offset */
+ char sx_pad[16]; /* pad space, unused */
+ xfs_bstat_t sx_stat; /* stat of target b4 copy */
+} xfs_swapext_t;
+
+/*
* Flags for going down operation
*/
#define XFS_FSOP_GOING_FLAGS_DEFAULT 0x0 /* going down */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 964a6dd..6ddab0e 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -33,7 +33,6 @@
#include "xfs_attr.h"
#include "xfs_bmap.h"
#include "xfs_buf_item.h"
-#include "xfs_dfrag.h"
#include "xfs_fsops.h"
#include "xfs_discard.h"
#include "xfs_quota.h"
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 55a3072..e82f890 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -33,7 +33,6 @@
#include "xfs_inode.h"
#include "xfs_itable.h"
#include "xfs_error.h"
-#include "xfs_dfrag.h"
#include "xfs_fsops.h"
#include "xfs_alloc.h"
#include "xfs_rtalloc.h"
@@ -41,6 +40,7 @@
#include "xfs_ioctl.h"
#include "xfs_ioctl32.h"
#include "xfs_trace.h"
+#include "xfs_extent_ops.h"
#define _NATIVE_IOC(cmd, type) \
_IOC(_IOC_DIR(cmd), _IOC_TYPE(cmd), _IOC_NR(cmd), sizeof(type))
--
1.7.10.4
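The di_nblocks fix-up in xfs_swap_extents() is easy to misread, so here is a minimal standalone sketch of the arithmetic. It assumes a hypothetical struct toy_inode that tracks only the total block count and the attribute fork block count; neither the struct nor toy_swap_nblocks() exists in the kernel. The point is that the data fork blocks change owner while each inode keeps its own attribute fork blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode fields used by the fix-up. */
struct toy_inode {
	uint64_t nblocks;	/* data fork + attr fork blocks (di_nblocks) */
	uint64_t aforkblks;	/* blocks owned by the attr fork */
};

/*
 * Swap data fork block ownership between ip and tip. Each inode keeps
 * its own attribute fork, so only the data fork portion of the block
 * count moves across. Mirrors the arithmetic in xfs_swap_extents().
 */
static void toy_swap_nblocks(struct toy_inode *ip, struct toy_inode *tip)
{
	uint64_t tmp = ip->nblocks;

	/* a block count must at least cover the inode's own attr fork */
	assert(ip->nblocks >= ip->aforkblks);
	assert(tip->nblocks >= tip->aforkblks);

	ip->nblocks = tip->nblocks - tip->aforkblks + ip->aforkblks;
	tip->nblocks = tmp + tip->aforkblks - ip->aforkblks;
}
```

With ip holding 10 blocks (2 of them attr) and tip holding 5 blocks (1 of them attr), the swap leaves ip with 4 data + 2 attr blocks and tip with 8 data + 1 attr blocks.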
* [PATCH 37/60] xfs: split out inode log item format definition
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (35 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 36/60] xfs: move swap extent code to xfs_extent_ops Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 38/60] xfs: separate dquot on disk format definitions out of xfs_quota.h Dave Chinner
` (24 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The EFI/EFD item format definitions are shared with userspace. Split
them out of header files that contain kernel-only definitions to make
it simple to share them.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_extfree_item.h | 90 ++-------------------------------
fs/xfs/xfs_extfree_item_format.h | 101 ++++++++++++++++++++++++++++++++++++++
2 files changed, 105 insertions(+), 86 deletions(-)
create mode 100644 fs/xfs/xfs_extfree_item_format.h
diff --git a/fs/xfs/xfs_extfree_item.h b/fs/xfs/xfs_extfree_item.h
index 4322224..0b49c87f 100644
--- a/fs/xfs/xfs_extfree_item.h
+++ b/fs/xfs/xfs_extfree_item.h
@@ -18,92 +18,12 @@
#ifndef __XFS_EXTFREE_ITEM_H__
#define __XFS_EXTFREE_ITEM_H__
-struct xfs_mount;
-struct kmem_zone;
-
-typedef struct xfs_extent {
- xfs_dfsbno_t ext_start;
- xfs_extlen_t ext_len;
-} xfs_extent_t;
-
-/*
- * Since an xfs_extent_t has types (start:64, len: 32)
- * there are different alignments on 32 bit and 64 bit kernels.
- * So we provide the different variants for use by a
- * conversion routine.
- */
-
-typedef struct xfs_extent_32 {
- __uint64_t ext_start;
- __uint32_t ext_len;
-} __attribute__((packed)) xfs_extent_32_t;
-
-typedef struct xfs_extent_64 {
- __uint64_t ext_start;
- __uint32_t ext_len;
- __uint32_t ext_pad;
-} xfs_extent_64_t;
-
-/*
- * This is the structure used to lay out an efi log item in the
- * log. The efi_extents field is a variable size array whose
- * size is given by efi_nextents.
- */
-typedef struct xfs_efi_log_format {
- __uint16_t efi_type; /* efi log item type */
- __uint16_t efi_size; /* size of this item */
- __uint32_t efi_nextents; /* # extents to free */
- __uint64_t efi_id; /* efi identifier */
- xfs_extent_t efi_extents[1]; /* array of extents to free */
-} xfs_efi_log_format_t;
+#include "xfs_extfree_item_format.h"
-typedef struct xfs_efi_log_format_32 {
- __uint16_t efi_type; /* efi log item type */
- __uint16_t efi_size; /* size of this item */
- __uint32_t efi_nextents; /* # extents to free */
- __uint64_t efi_id; /* efi identifier */
- xfs_extent_32_t efi_extents[1]; /* array of extents to free */
-} __attribute__((packed)) xfs_efi_log_format_32_t;
+/* kernel only EFI/EFD definitions */
-typedef struct xfs_efi_log_format_64 {
- __uint16_t efi_type; /* efi log item type */
- __uint16_t efi_size; /* size of this item */
- __uint32_t efi_nextents; /* # extents to free */
- __uint64_t efi_id; /* efi identifier */
- xfs_extent_64_t efi_extents[1]; /* array of extents to free */
-} xfs_efi_log_format_64_t;
-
-/*
- * This is the structure used to lay out an efd log item in the
- * log. The efd_extents array is a variable size array whose
- * size is given by efd_nextents;
- */
-typedef struct xfs_efd_log_format {
- __uint16_t efd_type; /* efd log item type */
- __uint16_t efd_size; /* size of this item */
- __uint32_t efd_nextents; /* # of extents freed */
- __uint64_t efd_efi_id; /* id of corresponding efi */
- xfs_extent_t efd_extents[1]; /* array of extents freed */
-} xfs_efd_log_format_t;
-
-typedef struct xfs_efd_log_format_32 {
- __uint16_t efd_type; /* efd log item type */
- __uint16_t efd_size; /* size of this item */
- __uint32_t efd_nextents; /* # of extents freed */
- __uint64_t efd_efi_id; /* id of corresponding efi */
- xfs_extent_32_t efd_extents[1]; /* array of extents freed */
-} __attribute__((packed)) xfs_efd_log_format_32_t;
-
-typedef struct xfs_efd_log_format_64 {
- __uint16_t efd_type; /* efd log item type */
- __uint16_t efd_size; /* size of this item */
- __uint32_t efd_nextents; /* # of extents freed */
- __uint64_t efd_efi_id; /* id of corresponding efi */
- xfs_extent_64_t efd_extents[1]; /* array of extents freed */
-} xfs_efd_log_format_64_t;
-
-
-#ifdef __KERNEL__
+struct xfs_mount;
+struct kmem_zone;
/*
* Max number of extents in fast allocation path.
@@ -160,6 +80,4 @@ int xfs_efi_copy_format(xfs_log_iovec_t *buf,
xfs_efi_log_format_t *dst_efi_fmt);
void xfs_efi_item_free(xfs_efi_log_item_t *);
-#endif /* __KERNEL__ */
-
#endif /* __XFS_EXTFREE_ITEM_H__ */
diff --git a/fs/xfs/xfs_extfree_item_format.h b/fs/xfs/xfs_extfree_item_format.h
new file mode 100644
index 0000000..ddd1fbb
--- /dev/null
+++ b/fs/xfs/xfs_extfree_item_format.h
@@ -0,0 +1,101 @@
+/*
+ * Copyright (c) 2000,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_EXTFREE_ITEM_FORMAT_H__
+#define __XFS_EXTFREE_ITEM_FORMAT_H__
+
+typedef struct xfs_extent {
+ xfs_dfsbno_t ext_start;
+ xfs_extlen_t ext_len;
+} xfs_extent_t;
+
+/*
+ * Since an xfs_extent_t has types (start:64, len: 32)
+ * there are different alignments on 32 bit and 64 bit kernels.
+ * So we provide the different variants for use by a
+ * conversion routine.
+ */
+typedef struct xfs_extent_32 {
+ __uint64_t ext_start;
+ __uint32_t ext_len;
+} __attribute__((packed)) xfs_extent_32_t;
+
+typedef struct xfs_extent_64 {
+ __uint64_t ext_start;
+ __uint32_t ext_len;
+ __uint32_t ext_pad;
+} xfs_extent_64_t;
+
+/*
+ * This is the structure used to lay out an efi log item in the
+ * log. The efi_extents field is a variable size array whose
+ * size is given by efi_nextents.
+ */
+typedef struct xfs_efi_log_format {
+ __uint16_t efi_type; /* efi log item type */
+ __uint16_t efi_size; /* size of this item */
+ __uint32_t efi_nextents; /* # extents to free */
+ __uint64_t efi_id; /* efi identifier */
+ xfs_extent_t efi_extents[1]; /* array of extents to free */
+} xfs_efi_log_format_t;
+
+typedef struct xfs_efi_log_format_32 {
+ __uint16_t efi_type; /* efi log item type */
+ __uint16_t efi_size; /* size of this item */
+ __uint32_t efi_nextents; /* # extents to free */
+ __uint64_t efi_id; /* efi identifier */
+ xfs_extent_32_t efi_extents[1]; /* array of extents to free */
+} __attribute__((packed)) xfs_efi_log_format_32_t;
+
+typedef struct xfs_efi_log_format_64 {
+ __uint16_t efi_type; /* efi log item type */
+ __uint16_t efi_size; /* size of this item */
+ __uint32_t efi_nextents; /* # extents to free */
+ __uint64_t efi_id; /* efi identifier */
+ xfs_extent_64_t efi_extents[1]; /* array of extents to free */
+} xfs_efi_log_format_64_t;
+
+/*
+ * This is the structure used to lay out an efd log item in the
+ * log. The efd_extents array is a variable size array whose
+ * size is given by efd_nextents;
+ */
+typedef struct xfs_efd_log_format {
+ __uint16_t efd_type; /* efd log item type */
+ __uint16_t efd_size; /* size of this item */
+ __uint32_t efd_nextents; /* # of extents freed */
+ __uint64_t efd_efi_id; /* id of corresponding efi */
+ xfs_extent_t efd_extents[1]; /* array of extents freed */
+} xfs_efd_log_format_t;
+
+typedef struct xfs_efd_log_format_32 {
+ __uint16_t efd_type; /* efd log item type */
+ __uint16_t efd_size; /* size of this item */
+ __uint32_t efd_nextents; /* # of extents freed */
+ __uint64_t efd_efi_id; /* id of corresponding efi */
+ xfs_extent_32_t efd_extents[1]; /* array of extents freed */
+} __attribute__((packed)) xfs_efd_log_format_32_t;
+
+typedef struct xfs_efd_log_format_64 {
+ __uint16_t efd_type; /* efd log item type */
+ __uint16_t efd_size; /* size of this item */
+ __uint32_t efd_nextents; /* # of extents freed */
+ __uint64_t efd_efi_id; /* id of corresponding efi */
+ xfs_extent_64_t efd_extents[1]; /* array of extents freed */
+} xfs_efd_log_format_64_t;
+
+#endif /* __XFS_EXTFREE_ITEM_FORMAT_H__ */
--
1.7.10.4
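The reason the format header carries _32 and _64 variants of the extent record can be shown with a small standalone sketch. The toy_ names below are hypothetical copies of the two layouts using <stdint.h> types, and toy_extent_32_to_64() is only an illustration of what a conversion routine has to do, not the kernel's xfs_efi_copy_format().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as laid out without tail padding: 12 bytes per record. */
struct toy_extent_32 {
	uint64_t ext_start;
	uint32_t ext_len;
} __attribute__((packed));

/* Layout with the padding made explicit: 16 bytes per record. */
struct toy_extent_64 {
	uint64_t ext_start;
	uint32_t ext_len;
	uint32_t ext_pad;
};

_Static_assert(sizeof(struct toy_extent_32) == 12, "packed record is 12 bytes");
_Static_assert(sizeof(struct toy_extent_64) == 16, "padded record is 16 bytes");

/* Convert one packed record to the padded form, field by field. */
static void toy_extent_32_to_64(const struct toy_extent_32 *src,
				struct toy_extent_64 *dst)
{
	dst->ext_start = src->ext_start;
	dst->ext_len = src->ext_len;
	dst->ext_pad = 0;
}

/* Bytes occupied by an array of n records in each layout. */
static size_t toy_extents_size_32(size_t n)
{
	return n * sizeof(struct toy_extent_32);
}

static size_t toy_extents_size_64(size_t n)
{
	return n * sizeof(struct toy_extent_64);
}
```

Because the array stride differs, a log item with more than one extent cannot simply be cast from one layout to the other; each record has to be copied individually.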
* [PATCH 38/60] xfs: separate dquot on disk format definitions out of xfs_quota.h
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (36 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 37/60] xfs: split out inode log item format definition Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 39/60] xfs: separate icreate log format definitions from xfs_icreate_item.h Dave Chinner
` (23 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The on disk format definitions of the on-disk dquot, log formats and
quota off log formats are all intertwined with other definitions for
quotas. Separate them out into their own header file so they can
easily be shared with userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_dquot_format.h | 132 +++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_quota.h | 114 +--------------------------------------
2 files changed, 134 insertions(+), 112 deletions(-)
create mode 100644 fs/xfs/xfs_dquot_format.h
diff --git a/fs/xfs/xfs_dquot_format.h b/fs/xfs/xfs_dquot_format.h
new file mode 100644
index 0000000..f34615b
--- /dev/null
+++ b/fs/xfs/xfs_dquot_format.h
@@ -0,0 +1,132 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_DQUOT_FORMAT_H__
+#define __XFS_DQUOT_FORMAT_H__
+
+/*
+ * The ondisk form of a dquot structure.
+ */
+#define XFS_DQUOT_MAGIC 0x4451 /* 'DQ' */
+#define XFS_DQUOT_VERSION (u_int8_t)0x01 /* latest version number */
+
+/*
+ * uid_t and gid_t are hard-coded to 32 bits in the inode.
+ * Hence, an 'id' in a dquot is 32 bits..
+ */
+typedef __uint32_t xfs_dqid_t;
+
+/*
+ * This is the main portion of the on-disk representation of quota
+ * information for a user. This is the q_core of the xfs_dquot_t that
+ * is kept in kernel memory. We pad this with some more expansion room
+ * to construct the on disk structure.
+ */
+typedef struct xfs_disk_dquot {
+ __be16 d_magic; /* dquot magic = XFS_DQUOT_MAGIC */
+ __u8 d_version; /* dquot version */
+ __u8 d_flags; /* XFS_DQ_USER/PROJ/GROUP */
+ __be32 d_id; /* user,project,group id */
+ __be64 d_blk_hardlimit;/* absolute limit on disk blks */
+ __be64 d_blk_softlimit;/* preferred limit on disk blks */
+ __be64 d_ino_hardlimit;/* maximum # allocated inodes */
+ __be64 d_ino_softlimit;/* preferred inode limit */
+ __be64 d_bcount; /* disk blocks owned by the user */
+ __be64 d_icount; /* inodes owned by the user */
+ __be32 d_itimer; /* zero if within inode limits if not,
+ this is when we refuse service */
+ __be32 d_btimer; /* similar to above; for disk blocks */
+ __be16 d_iwarns; /* warnings issued wrt num inodes */
+ __be16 d_bwarns; /* warnings issued wrt disk blocks */
+ __be32 d_pad0; /* 64 bit align */
+ __be64 d_rtb_hardlimit;/* absolute limit on realtime blks */
+ __be64 d_rtb_softlimit;/* preferred limit on RT disk blks */
+ __be64 d_rtbcount; /* realtime blocks owned */
+ __be32 d_rtbtimer; /* similar to above; for RT disk blocks */
+ __be16 d_rtbwarns; /* warnings issued wrt RT disk blocks */
+ __be16 d_pad;
+} xfs_disk_dquot_t;
+
+/*
+ * This is what goes on disk. This is separated from the xfs_disk_dquot because
+ * carrying the unnecessary padding would be a waste of memory.
+ */
+typedef struct xfs_dqblk {
+ xfs_disk_dquot_t dd_diskdq; /* portion that lives incore as well */
+ char dd_fill[4]; /* filling for posterity */
+
+ /*
+ * These two are only present on filesystems with the CRC bits set.
+ */
+ __be32 dd_crc; /* checksum */
+ __be64 dd_lsn; /* last modification in log */
+ uuid_t dd_uuid; /* location information */
+} xfs_dqblk_t;
+
+#define XFS_DQUOT_CRC_OFF offsetof(struct xfs_dqblk, dd_crc)
+
+/*
+ * These are the structures used to lay out dquots and quotaoff
+ * records on the log. Quite similar to those of inodes.
+ */
+
+/*
+ * log format struct for dquots.
+ * The first two fields must be the type and size fitting into
+ * 32 bits : log_recovery code assumes that.
+ */
+typedef struct xfs_dq_logformat {
+ __uint16_t qlf_type; /* dquot log item type */
+ __uint16_t qlf_size; /* size of this item */
+ xfs_dqid_t qlf_id; /* usr/grp/proj id : 32 bits */
+ __int64_t qlf_blkno; /* blkno of dquot buffer */
+ __int32_t qlf_len; /* len of dquot buffer */
+ __uint32_t qlf_boffset; /* off of dquot in buffer */
+} xfs_dq_logformat_t;
+
+/*
+ * log format struct for QUOTAOFF records.
+ * The first two fields must be the type and size fitting into
+ * 32 bits : log_recovery code assumes that.
+ * We write two LI_QUOTAOFF logitems per quotaoff, the last one keeps a pointer
+ * to the first and ensures that the first logitem is taken out of the AIL
+ * only when the last one is securely committed.
+ */
+typedef struct xfs_qoff_logformat {
+ unsigned short qf_type; /* quotaoff log item type */
+ unsigned short qf_size; /* size of this item */
+ unsigned int qf_flags; /* USR and/or GRP */
+ char qf_pad[12]; /* padding for future */
+} xfs_qoff_logformat_t;
+
+/*
+ * Disk quotas status in m_qflags, sb_qflags and xfs_qoff_logformat. 16 bits.
+ */
+#define XFS_UQUOTA_ACCT 0x0001 /* user quota accounting ON */
+#define XFS_UQUOTA_ENFD 0x0002 /* user quota limits enforced */
+#define XFS_UQUOTA_CHKD 0x0004 /* quotacheck run on usr quotas */
+#define XFS_PQUOTA_ACCT 0x0008 /* project quota accounting ON */
+#define XFS_OQUOTA_ENFD 0x0010 /* other (grp/prj) quota limits enforced */
+#define XFS_OQUOTA_CHKD 0x0020 /* quotacheck run on other (grp/prj) quotas */
+#define XFS_GQUOTA_ACCT 0x0040 /* group quota accounting ON */
+
+#define XFS_ALL_QUOTA_ACCT \
+ (XFS_UQUOTA_ACCT | XFS_GQUOTA_ACCT | XFS_PQUOTA_ACCT)
+#define XFS_ALL_QUOTA_ENFD (XFS_UQUOTA_ENFD | XFS_OQUOTA_ENFD)
+#define XFS_ALL_QUOTA_CHKD (XFS_UQUOTA_CHKD | XFS_OQUOTA_CHKD)
+
+#endif /* __XFS_DQUOT_FORMAT_H__ */
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index c38068f..634132d 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -18,19 +18,9 @@
#ifndef __XFS_QUOTA_H__
#define __XFS_QUOTA_H__
-struct xfs_trans;
-
-/*
- * The ondisk form of a dquot structure.
- */
-#define XFS_DQUOT_MAGIC 0x4451 /* 'DQ' */
-#define XFS_DQUOT_VERSION (u_int8_t)0x01 /* latest version number */
+#include "xfs_dquot_format.h"
-/*
- * uid_t and gid_t are hard-coded to 32 bits in the inode.
- * Hence, an 'id' in a dquot is 32 bits..
- */
-typedef __uint32_t xfs_dqid_t;
+struct xfs_trans;
/*
* Even though users may not have quota limits occupying all 64-bits,
@@ -41,55 +31,6 @@ typedef __uint64_t xfs_qcnt_t;
typedef __uint16_t xfs_qwarncnt_t;
/*
- * This is the main portion of the on-disk representation of quota
- * information for a user. This is the q_core of the xfs_dquot_t that
- * is kept in kernel memory. We pad this with some more expansion room
- * to construct the on disk structure.
- */
-typedef struct xfs_disk_dquot {
- __be16 d_magic; /* dquot magic = XFS_DQUOT_MAGIC */
- __u8 d_version; /* dquot version */
- __u8 d_flags; /* XFS_DQ_USER/PROJ/GROUP */
- __be32 d_id; /* user,project,group id */
- __be64 d_blk_hardlimit;/* absolute limit on disk blks */
- __be64 d_blk_softlimit;/* preferred limit on disk blks */
- __be64 d_ino_hardlimit;/* maximum # allocated inodes */
- __be64 d_ino_softlimit;/* preferred inode limit */
- __be64 d_bcount; /* disk blocks owned by the user */
- __be64 d_icount; /* inodes owned by the user */
- __be32 d_itimer; /* zero if within inode limits if not,
- this is when we refuse service */
- __be32 d_btimer; /* similar to above; for disk blocks */
- __be16 d_iwarns; /* warnings issued wrt num inodes */
- __be16 d_bwarns; /* warnings issued wrt disk blocks */
- __be32 d_pad0; /* 64 bit align */
- __be64 d_rtb_hardlimit;/* absolute limit on realtime blks */
- __be64 d_rtb_softlimit;/* preferred limit on RT disk blks */
- __be64 d_rtbcount; /* realtime blocks owned */
- __be32 d_rtbtimer; /* similar to above; for RT disk blocks */
- __be16 d_rtbwarns; /* warnings issued wrt RT disk blocks */
- __be16 d_pad;
-} xfs_disk_dquot_t;
-
-/*
- * This is what goes on disk. This is separated from the xfs_disk_dquot because
- * carrying the unnecessary padding would be a waste of memory.
- */
-typedef struct xfs_dqblk {
- xfs_disk_dquot_t dd_diskdq; /* portion that lives incore as well */
- char dd_fill[4]; /* filling for posterity */
-
- /*
- * These two are only present on filesystems with the CRC bits set.
- */
- __be32 dd_crc; /* checksum */
- __be64 dd_lsn; /* last modification in log */
- uuid_t dd_uuid; /* location information */
-} xfs_dqblk_t;
-
-#define XFS_DQUOT_CRC_OFF offsetof(struct xfs_dqblk, dd_crc)
-
-/*
* flags for q_flags field in the dquot.
*/
#define XFS_DQ_USER 0x0001 /* a user quota */
@@ -113,60 +54,9 @@ typedef struct xfs_dqblk {
*/
#define XFS_DQUOT_LOGRES(mp) (sizeof(xfs_disk_dquot_t) * 3)
-
-/*
- * These are the structures used to lay out dquots and quotaoff
- * records on the log. Quite similar to those of inodes.
- */
-
-/*
- * log format struct for dquots.
- * The first two fields must be the type and size fitting into
- * 32 bits : log_recovery code assumes that.
- */
-typedef struct xfs_dq_logformat {
- __uint16_t qlf_type; /* dquot log item type */
- __uint16_t qlf_size; /* size of this item */
- xfs_dqid_t qlf_id; /* usr/grp/proj id : 32 bits */
- __int64_t qlf_blkno; /* blkno of dquot buffer */
- __int32_t qlf_len; /* len of dquot buffer */
- __uint32_t qlf_boffset; /* off of dquot in buffer */
-} xfs_dq_logformat_t;
-
-/*
- * log format struct for QUOTAOFF records.
- * The first two fields must be the type and size fitting into
- * 32 bits : log_recovery code assumes that.
- * We write two LI_QUOTAOFF logitems per quotaoff, the last one keeps a pointer
- * to the first and ensures that the first logitem is taken out of the AIL
- * only when the last one is securely committed.
- */
-typedef struct xfs_qoff_logformat {
- unsigned short qf_type; /* quotaoff log item type */
- unsigned short qf_size; /* size of this item */
- unsigned int qf_flags; /* USR and/or GRP */
- char qf_pad[12]; /* padding for future */
-} xfs_qoff_logformat_t;
-
-
-/*
- * Disk quotas status in m_qflags, and also sb_qflags. 16 bits.
- */
-#define XFS_UQUOTA_ACCT 0x0001 /* user quota accounting ON */
-#define XFS_UQUOTA_ENFD 0x0002 /* user quota limits enforced */
-#define XFS_UQUOTA_CHKD 0x0004 /* quotacheck run on usr quotas */
-#define XFS_PQUOTA_ACCT 0x0008 /* project quota accounting ON */
-#define XFS_OQUOTA_ENFD 0x0010 /* other (grp/prj) quota limits enforced */
-#define XFS_OQUOTA_CHKD 0x0020 /* quotacheck run on other (grp/prj) quotas */
-#define XFS_GQUOTA_ACCT 0x0040 /* group quota accounting ON */
-
/*
* Quota Accounting/Enforcement flags
*/
-#define XFS_ALL_QUOTA_ACCT \
- (XFS_UQUOTA_ACCT | XFS_GQUOTA_ACCT | XFS_PQUOTA_ACCT)
-#define XFS_ALL_QUOTA_ENFD (XFS_UQUOTA_ENFD | XFS_OQUOTA_ENFD)
-#define XFS_ALL_QUOTA_CHKD (XFS_UQUOTA_CHKD | XFS_OQUOTA_CHKD)
#define XFS_IS_QUOTA_RUNNING(mp) ((mp)->m_qflags & XFS_ALL_QUOTA_ACCT)
#define XFS_IS_UQUOTA_RUNNING(mp) ((mp)->m_qflags & XFS_UQUOTA_ACCT)
--
1.7.10.4
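As a quick illustration of how the quota status flags moved by this patch combine, the following standalone sketch copies the flag values from the patch and tests a flags word against the composite masks. The toy_quota_running() and toy_quota_enforced() helpers are hypothetical; the kernel uses macros such as XFS_IS_QUOTA_RUNNING(mp) on mp->m_qflags instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the patch. */
#define XFS_UQUOTA_ACCT	0x0001	/* user quota accounting ON */
#define XFS_UQUOTA_ENFD	0x0002	/* user quota limits enforced */
#define XFS_UQUOTA_CHKD	0x0004	/* quotacheck run on usr quotas */
#define XFS_PQUOTA_ACCT	0x0008	/* project quota accounting ON */
#define XFS_OQUOTA_ENFD	0x0010	/* other (grp/prj) limits enforced */
#define XFS_OQUOTA_CHKD	0x0020	/* quotacheck run on other quotas */
#define XFS_GQUOTA_ACCT	0x0040	/* group quota accounting ON */

#define XFS_ALL_QUOTA_ACCT \
	(XFS_UQUOTA_ACCT | XFS_GQUOTA_ACCT | XFS_PQUOTA_ACCT)
#define XFS_ALL_QUOTA_ENFD	(XFS_UQUOTA_ENFD | XFS_OQUOTA_ENFD)
#define XFS_ALL_QUOTA_CHKD	(XFS_UQUOTA_CHKD | XFS_OQUOTA_CHKD)

/* The three composite masks must not share any bit. */
_Static_assert((XFS_ALL_QUOTA_ACCT & XFS_ALL_QUOTA_ENFD) == 0, "ACCT/ENFD overlap");
_Static_assert((XFS_ALL_QUOTA_ACCT & XFS_ALL_QUOTA_CHKD) == 0, "ACCT/CHKD overlap");
_Static_assert((XFS_ALL_QUOTA_ENFD & XFS_ALL_QUOTA_CHKD) == 0, "ENFD/CHKD overlap");

/* Hypothetical helper: is any kind of quota accounting switched on? */
static bool toy_quota_running(uint16_t qflags)
{
	return (qflags & XFS_ALL_QUOTA_ACCT) != 0;
}

/* Hypothetical helper: are any quota limits being enforced? */
static bool toy_quota_enforced(uint16_t qflags)
{
	return (qflags & XFS_ALL_QUOTA_ENFD) != 0;
}
```

Note that accounting and enforcement are independent bits: a flags word can have limits marked as enforced or checked without any accounting bit set, and the helpers report exactly what the masks say.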
* [PATCH 39/60] xfs: separate icreate log format definitions from xfs_icreate_item.h
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (37 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 38/60] xfs: separate dquot on disk format definitions out of xfs_quota.h Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 40/60] xfs: don't special case shared superblock mounts Dave Chinner
` (22 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The on disk log format definitions for the icreate log item are
intertwined with the kernel-only in-memory log item definitions.
Separate the log format definitions out into their own header file
so they can easily be shared with userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_icreate_item.h | 22 +--------------------
fs/xfs/xfs_icreate_item_format.h | 39 ++++++++++++++++++++++++++++++++++++++
2 files changed, 40 insertions(+), 21 deletions(-)
create mode 100644 fs/xfs/xfs_icreate_item_format.h
diff --git a/fs/xfs/xfs_icreate_item.h b/fs/xfs/xfs_icreate_item.h
index 79df981..e796608 100644
--- a/fs/xfs/xfs_icreate_item.h
+++ b/fs/xfs/xfs_icreate_item.h
@@ -18,25 +18,7 @@
#ifndef XFS_ICREATE_ITEM_H
#define XFS_ICREATE_ITEM_H 1
-/*
- * on disk log item structure
- *
- * Log recovery assumes the first two entries are the type and size and they fit
- * in 32 bits. Also in host order (ugh) so they have to be 32 bit aligned so
- * decoding can be done correctly.
- */
-struct xfs_icreate_log {
- __uint16_t icl_type; /* type of log format structure */
- __uint16_t icl_size; /* size of log format structure */
- __be32 icl_ag; /* ag being allocated in */
- __be32 icl_agbno; /* start block of inode range */
- __be32 icl_count; /* number of inodes to initialise */
- __be32 icl_isize; /* size of inodes */
- __be32 icl_length; /* length of extent to initialise */
- __be32 icl_gen; /* inode generation number to use */
-};
-
-#ifdef __KERNEL__
+#include "xfs_icreate_item_format.h"
/* in memory log item structure */
struct xfs_icreate_item {
@@ -51,6 +33,4 @@ void xfs_icreate_log(struct xfs_trans *tp, xfs_agnumber_t agno,
unsigned int inode_size, xfs_agblock_t length,
unsigned int generation);
-#endif /* __KERNEL__ */
-
#endif /* XFS_ICREATE_ITEM_H */
diff --git a/fs/xfs/xfs_icreate_item_format.h b/fs/xfs/xfs_icreate_item_format.h
new file mode 100644
index 0000000..0299652
--- /dev/null
+++ b/fs/xfs/xfs_icreate_item_format.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (c) 2008-2010, Dave Chinner
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_ICREATE_ITEM_FORMAT_H__
+#define __XFS_ICREATE_ITEM_FORMAT_H__ 1
+
+/*
+ * on disk log item structure
+ *
+ * Log recovery assumes the first two entries are the type and size and they fit
+ * in 32 bits. Also in host order (ugh) so they have to be 32 bit aligned so
+ * decoding can be done correctly.
+ */
+struct xfs_icreate_log {
+ __uint16_t icl_type; /* type of log format structure */
+ __uint16_t icl_size; /* size of log format structure */
+ __be32 icl_ag; /* ag being allocated in */
+ __be32 icl_agbno; /* start block of inode range */
+ __be32 icl_count; /* number of inodes to initialise */
+ __be32 icl_isize; /* size of inodes */
+ __be32 icl_length; /* length of extent to initialise */
+ __be32 icl_gen; /* inode generation number to use */
+};
+
+#endif /* __XFS_ICREATE_ITEM_FORMAT_H__ */
--
1.7.10.4
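The comment on struct xfs_icreate_log says the type and size fields are in host order and must fit in 32 bits, while the remaining fields are big-endian. A minimal userspace sketch of that layout follows. It assumes a hypothetical struct toy_icreate_log built from <stdint.h> types, uses htonl()/ntohl() from <arpa/inet.h> in place of the kernel's cpu_to_be32()/be32_to_cpu(), and the fill/read helpers are illustrative only.

```c
#include <arpa/inet.h>	/* htonl/ntohl: host <-> big-endian 32-bit */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace copy of the icreate log format layout. */
struct toy_icreate_log {
	uint16_t icl_type;	/* host order */
	uint16_t icl_size;	/* host order */
	uint32_t icl_ag;	/* big-endian from here on */
	uint32_t icl_agbno;
	uint32_t icl_count;
	uint32_t icl_isize;
	uint32_t icl_length;
	uint32_t icl_gen;
};

/* Type and size together occupy exactly the first 32 bits. */
_Static_assert(offsetof(struct toy_icreate_log, icl_ag) == 4,
	       "type + size must fit in 32 bits");
_Static_assert(sizeof(struct toy_icreate_log) == 28,
	       "no hidden padding in the log format");

/* Fill a record: header fields stay in host order, the rest is converted. */
static void toy_icreate_fill(struct toy_icreate_log *icl, uint16_t type,
			     uint16_t size, uint32_t ag, uint32_t count)
{
	memset(icl, 0, sizeof(*icl));
	icl->icl_type = type;
	icl->icl_size = size;
	icl->icl_ag = htonl(ag);
	icl->icl_count = htonl(count);
}

/* Read the inode count back in host order. */
static uint32_t toy_icreate_count(const struct toy_icreate_log *icl)
{
	return ntohl(icl->icl_count);
}
```

Keeping the header fields in host order means a decoder can read the type and size before it knows anything else about the record, which is what the comment about log recovery relies on.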
* [PATCH 40/60] xfs: don't special case shared superblock mounts
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (38 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 39/60] xfs: separate icreate log format definitions from xfs_icreate_item.h Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 41/60] xfs: kill __KERNEL__ check for debug code in allocation code Dave Chinner
` (21 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Neither kernel nor userspace supports shared read-only mounts, so
don't bother special casing the support check to be different
between kernel and userspace. The same check can be used as neither
likes it...
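As a rough illustration of the unified check, the sketch below contrasts the old userspace-only form with the form both sides now share. Everything in it is a stand-in: `sb_sketch`, the `SKETCH_*` constants and both helper names are hypothetical, and the value 0 for the maximum shared version number is an assumption rather than something this patch shows.

```c
#include <stdint.h>

/* Assumed stand-ins for XFS_SB_MAX_SHARED_VN and XFS_SB_VERSION_SHAREDBIT. */
#define SKETCH_MAX_SHARED_VN	0
#define SKETCH_SHAREDBIT	0x0200

/* Hypothetical cut-down superblock carrying only the fields the check reads. */
struct sb_sketch {
	uint16_t	versionnum;
	uint8_t		shared_vn;
};

/* Old userspace form: only reject when the shared bit is also set. */
static inline int shared_vn_ok_old_userspace(const struct sb_sketch *sbp)
{
	if ((sbp->versionnum & SKETCH_SHAREDBIT) &&
	    sbp->shared_vn > SKETCH_MAX_SHARED_VN)
		return 0;
	return 1;
}

/* Unified form: reject any out-of-range shared version number. */
static inline int shared_vn_ok(const struct sb_sketch *sbp)
{
	if (sbp->shared_vn > SKETCH_MAX_SHARED_VN)
		return 0;
	return 1;
}
```

The only behavioural difference is a superblock with a non-zero shared version number but without the shared bit set: the old userspace form accepted it, the unified form rejects it.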
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_sb.h | 7 -------
1 file changed, 7 deletions(-)
diff --git a/fs/xfs/xfs_sb.h b/fs/xfs/xfs_sb.h
index 8f78df1..677a480 100644
--- a/fs/xfs/xfs_sb.h
+++ b/fs/xfs/xfs_sb.h
@@ -355,15 +355,8 @@ static inline int xfs_sb_good_version(xfs_sb_t *sbp)
(sbp->sb_features2 & ~XFS_SB_VERSION2_OKREALBITS)))
return 0;
-#ifdef __KERNEL__
if (sbp->sb_shared_vn > XFS_SB_MAX_SHARED_VN)
return 0;
-#else
- if ((sbp->sb_versionnum & XFS_SB_VERSION_SHAREDBIT) &&
- sbp->sb_shared_vn > XFS_SB_MAX_SHARED_VN)
- return 0;
-#endif
-
return 1;
}
if (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5)
--
1.7.10.4
* [PATCH 41/60] xfs: kill __KERNEL__ check for debug code in allocation code
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (39 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 40/60] xfs: don't special case shared superblock mounts Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 42/60] xfs: split out on-disk transaction definitions Dave Chinner
` (20 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Userspace running debug builds is relatively rare, so there's no need
to special case the allocation algorithm code coverage debug switch.
As it is, userspace defines random numbers to 0, so invert the
logic of the switch so it is effectively a no-op in userspace.
This kills another couple of __KERNEL__ users.
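A small sketch of why inverting the test makes the switch a no-op in userspace. It assumes `dofirst` is taken from the low bit of a random number, which the hunks in this patch do not show; the helper names are hypothetical.

```c
/*
 * Each helper returns 1 when the first allocation algorithm would be
 * skipped for the given random number. The derivation of dofirst from
 * the low bit of rnd is an assumption made for illustration.
 */

/* Before the patch: skip the first algorithm when dofirst is clear. */
static inline int skips_first_old(unsigned int rnd)
{
	int dofirst = rnd & 1;

	return !dofirst;
}

/* After the patch: skip the first algorithm when dofirst is set. */
static inline int skips_first_new(unsigned int rnd)
{
	int dofirst = rnd & 1;

	return dofirst;
}
```

With a random source that always yields 0, as the commit message says userspace has, the old test would skip the first algorithm on every call, while the inverted test never skips it.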
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_alloc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
index 71596e5..5a1393f 100644
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -878,7 +878,7 @@ xfs_alloc_ag_vextent_near(
xfs_agblock_t ltnew; /* useful start bno of left side */
xfs_extlen_t rlen; /* length of returned extent */
int forced = 0;
-#if defined(DEBUG) && defined(__KERNEL__)
+#ifdef DEBUG
/*
* Randomly don't execute the first algorithm.
*/
@@ -938,8 +938,8 @@ restart:
xfs_extlen_t blen=0;
xfs_agblock_t bnew=0;
-#if defined(DEBUG) && defined(__KERNEL__)
- if (!dofirst)
+#ifdef DEBUG
+ if (dofirst)
break;
#endif
/*
--
1.7.10.4
* [PATCH 42/60] xfs: split out on-disk transaction definitions
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (40 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 41/60] xfs: kill __KERNEL__ check for debug code in allocation code Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 43/60] xfs: remove __KERNEL__ from debug code Dave Chinner
` (19 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
There's a bunch of definitions in xfs_trans.h that define on-disk
formats - transaction headers that get written into the log, log
item type definitions, etc. Split out everything into a separate
file so that all which remains in xfs_trans.h are kernel only
definitions.
Also, remove the duplicate magic number definitions for
XFS_TRANS_HEADER_MAGIC...
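The two magic numbers being merged are the same 32-bit value written with different hex case (0x5452414E and 0x5452414e), and both spell "TRAN" in ASCII. A minimal sketch to confirm that, where `magic_to_ascii()` is a hypothetical helper and not part of the patch:

```c
#include <stdint.h>

/*
 * Decode a 32-bit magic number into its four ASCII characters, most
 * significant byte first. out must have room for 5 bytes.
 */
static inline void magic_to_ascii(uint32_t magic, char out[5])
{
	out[0] = (char)((magic >> 24) & 0xff);
	out[1] = (char)((magic >> 16) & 0xff);
	out[2] = (char)((magic >> 8) & 0xff);
	out[3] = (char)(magic & 0xff);
	out[4] = '\0';
}
```

Calling `magic_to_ascii(0x5452414e, buf)` leaves the string "TRAN" in `buf`, which is why switching `t_magic` over to XFS_TRANS_HEADER_MAGIC does not change the stored value.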
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_trans.c | 4 +-
fs/xfs/xfs_trans.h | 217 +---------------------------------------
fs/xfs/xfs_trans_format.h | 240 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 243 insertions(+), 218 deletions(-)
create mode 100644 fs/xfs/xfs_trans_format.h
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index ed24650..0228f74 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -92,7 +92,7 @@ _xfs_trans_alloc(
atomic_inc(&mp->m_active_trans);
tp = kmem_zone_zalloc(xfs_trans_zone, memflags);
- tp->t_magic = XFS_TRANS_MAGIC;
+ tp->t_magic = XFS_TRANS_HEADER_MAGIC;
tp->t_type = type;
tp->t_mountp = mp;
INIT_LIST_HEAD(&tp->t_items);
@@ -137,7 +137,7 @@ xfs_trans_dup(
/*
* Initialize the new transaction structure.
*/
- ntp->t_magic = XFS_TRANS_MAGIC;
+ ntp->t_magic = XFS_TRANS_HEADER_MAGIC;
ntp->t_type = tp->t_type;
ntp->t_mountp = tp->t_mountp;
INIT_LIST_HEAD(&ntp->t_items);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 4654b58..21c3e07 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -21,220 +21,7 @@
struct xfs_log_item;
#include "xfs_trans_resv.h"
-
-/*
- * This is the structure written in the log at the head of
- * every transaction. It identifies the type and id of the
- * transaction, and contains the number of items logged by
- * the transaction so we know how many to expect during recovery.
- *
- * Do not change the below structure without redoing the code in
- * xlog_recover_add_to_trans() and xlog_recover_add_to_cont_trans().
- */
-typedef struct xfs_trans_header {
- uint th_magic; /* magic number */
- uint th_type; /* transaction type */
- __int32_t th_tid; /* transaction id (unused) */
- uint th_num_items; /* num items logged by trans */
-} xfs_trans_header_t;
-
-#define XFS_TRANS_HEADER_MAGIC 0x5452414e /* TRAN */
-
-/*
- * Log item types.
- */
-#define XFS_LI_EFI 0x1236
-#define XFS_LI_EFD 0x1237
-#define XFS_LI_IUNLINK 0x1238
-#define XFS_LI_INODE 0x123b /* aligned ino chunks, var-size ibufs */
-#define XFS_LI_BUF 0x123c /* v2 bufs, variable sized inode bufs */
-#define XFS_LI_DQUOT 0x123d
-#define XFS_LI_QUOTAOFF 0x123e
-#define XFS_LI_ICREATE 0x123f
-
-#define XFS_LI_TYPE_DESC \
- { XFS_LI_EFI, "XFS_LI_EFI" }, \
- { XFS_LI_EFD, "XFS_LI_EFD" }, \
- { XFS_LI_IUNLINK, "XFS_LI_IUNLINK" }, \
- { XFS_LI_INODE, "XFS_LI_INODE" }, \
- { XFS_LI_BUF, "XFS_LI_BUF" }, \
- { XFS_LI_DQUOT, "XFS_LI_DQUOT" }, \
- { XFS_LI_QUOTAOFF, "XFS_LI_QUOTAOFF" }, \
- { XFS_LI_ICREATE, "XFS_LI_ICREATE" }
-
-/*
- * Transaction types. Used to distinguish types of buffers.
- */
-#define XFS_TRANS_SETATTR_NOT_SIZE 1
-#define XFS_TRANS_SETATTR_SIZE 2
-#define XFS_TRANS_INACTIVE 3
-#define XFS_TRANS_CREATE 4
-#define XFS_TRANS_CREATE_TRUNC 5
-#define XFS_TRANS_TRUNCATE_FILE 6
-#define XFS_TRANS_REMOVE 7
-#define XFS_TRANS_LINK 8
-#define XFS_TRANS_RENAME 9
-#define XFS_TRANS_MKDIR 10
-#define XFS_TRANS_RMDIR 11
-#define XFS_TRANS_SYMLINK 12
-#define XFS_TRANS_SET_DMATTRS 13
-#define XFS_TRANS_GROWFS 14
-#define XFS_TRANS_STRAT_WRITE 15
-#define XFS_TRANS_DIOSTRAT 16
-/* 17 was XFS_TRANS_WRITE_SYNC */
-#define XFS_TRANS_WRITEID 18
-#define XFS_TRANS_ADDAFORK 19
-#define XFS_TRANS_ATTRINVAL 20
-#define XFS_TRANS_ATRUNCATE 21
-#define XFS_TRANS_ATTR_SET 22
-#define XFS_TRANS_ATTR_RM 23
-#define XFS_TRANS_ATTR_FLAG 24
-#define XFS_TRANS_CLEAR_AGI_BUCKET 25
-#define XFS_TRANS_QM_SBCHANGE 26
-/*
- * Dummy entries since we use the transaction type to index into the
- * trans_type[] in xlog_recover_print_trans_head()
- */
-#define XFS_TRANS_DUMMY1 27
-#define XFS_TRANS_DUMMY2 28
-#define XFS_TRANS_QM_QUOTAOFF 29
-#define XFS_TRANS_QM_DQALLOC 30
-#define XFS_TRANS_QM_SETQLIM 31
-#define XFS_TRANS_QM_DQCLUSTER 32
-#define XFS_TRANS_QM_QINOCREATE 33
-#define XFS_TRANS_QM_QUOTAOFF_END 34
-#define XFS_TRANS_SB_UNIT 35
-#define XFS_TRANS_FSYNC_TS 36
-#define XFS_TRANS_GROWFSRT_ALLOC 37
-#define XFS_TRANS_GROWFSRT_ZERO 38
-#define XFS_TRANS_GROWFSRT_FREE 39
-#define XFS_TRANS_SWAPEXT 40
-#define XFS_TRANS_SB_COUNT 41
-#define XFS_TRANS_CHECKPOINT 42
-#define XFS_TRANS_ICREATE 43
-#define XFS_TRANS_TYPE_MAX 43
-/* new transaction types need to be reflected in xfs_logprint(8) */
-
-#define XFS_TRANS_TYPES \
- { XFS_TRANS_SETATTR_NOT_SIZE, "SETATTR_NOT_SIZE" }, \
- { XFS_TRANS_SETATTR_SIZE, "SETATTR_SIZE" }, \
- { XFS_TRANS_INACTIVE, "INACTIVE" }, \
- { XFS_TRANS_CREATE, "CREATE" }, \
- { XFS_TRANS_CREATE_TRUNC, "CREATE_TRUNC" }, \
- { XFS_TRANS_TRUNCATE_FILE, "TRUNCATE_FILE" }, \
- { XFS_TRANS_REMOVE, "REMOVE" }, \
- { XFS_TRANS_LINK, "LINK" }, \
- { XFS_TRANS_RENAME, "RENAME" }, \
- { XFS_TRANS_MKDIR, "MKDIR" }, \
- { XFS_TRANS_RMDIR, "RMDIR" }, \
- { XFS_TRANS_SYMLINK, "SYMLINK" }, \
- { XFS_TRANS_SET_DMATTRS, "SET_DMATTRS" }, \
- { XFS_TRANS_GROWFS, "GROWFS" }, \
- { XFS_TRANS_STRAT_WRITE, "STRAT_WRITE" }, \
- { XFS_TRANS_DIOSTRAT, "DIOSTRAT" }, \
- { XFS_TRANS_WRITEID, "WRITEID" }, \
- { XFS_TRANS_ADDAFORK, "ADDAFORK" }, \
- { XFS_TRANS_ATTRINVAL, "ATTRINVAL" }, \
- { XFS_TRANS_ATRUNCATE, "ATRUNCATE" }, \
- { XFS_TRANS_ATTR_SET, "ATTR_SET" }, \
- { XFS_TRANS_ATTR_RM, "ATTR_RM" }, \
- { XFS_TRANS_ATTR_FLAG, "ATTR_FLAG" }, \
- { XFS_TRANS_CLEAR_AGI_BUCKET, "CLEAR_AGI_BUCKET" }, \
- { XFS_TRANS_QM_SBCHANGE, "QM_SBCHANGE" }, \
- { XFS_TRANS_QM_QUOTAOFF, "QM_QUOTAOFF" }, \
- { XFS_TRANS_QM_DQALLOC, "QM_DQALLOC" }, \
- { XFS_TRANS_QM_SETQLIM, "QM_SETQLIM" }, \
- { XFS_TRANS_QM_DQCLUSTER, "QM_DQCLUSTER" }, \
- { XFS_TRANS_QM_QINOCREATE, "QM_QINOCREATE" }, \
- { XFS_TRANS_QM_QUOTAOFF_END, "QM_QOFF_END" }, \
- { XFS_TRANS_SB_UNIT, "SB_UNIT" }, \
- { XFS_TRANS_FSYNC_TS, "FSYNC_TS" }, \
- { XFS_TRANS_GROWFSRT_ALLOC, "GROWFSRT_ALLOC" }, \
- { XFS_TRANS_GROWFSRT_ZERO, "GROWFSRT_ZERO" }, \
- { XFS_TRANS_GROWFSRT_FREE, "GROWFSRT_FREE" }, \
- { XFS_TRANS_SWAPEXT, "SWAPEXT" }, \
- { XFS_TRANS_SB_COUNT, "SB_COUNT" }, \
- { XFS_TRANS_CHECKPOINT, "CHECKPOINT" }, \
- { XFS_TRANS_DUMMY1, "DUMMY1" }, \
- { XFS_TRANS_DUMMY2, "DUMMY2" }, \
- { XLOG_UNMOUNT_REC_TYPE, "UNMOUNT" }
-
-/*
- * This structure is used to track log items associated with
- * a transaction. It points to the log item and keeps some
- * flags to track the state of the log item. It also tracks
- * the amount of space needed to log the item it describes
- * once we get to commit processing (see xfs_trans_commit()).
- */
-struct xfs_log_item_desc {
- struct xfs_log_item *lid_item;
- struct list_head lid_trans;
- unsigned char lid_flags;
-};
-
-#define XFS_LID_DIRTY 0x1
-
-#define XFS_TRANS_MAGIC 0x5452414E /* 'TRAN' */
-/*
- * Values for t_flags.
- */
-#define XFS_TRANS_DIRTY 0x01 /* something needs to be logged */
-#define XFS_TRANS_SB_DIRTY 0x02 /* superblock is modified */
-#define XFS_TRANS_PERM_LOG_RES 0x04 /* xact took a permanent log res */
-#define XFS_TRANS_SYNC 0x08 /* make commit synchronous */
-#define XFS_TRANS_DQ_DIRTY 0x10 /* at least one dquot in trx dirty */
-#define XFS_TRANS_RESERVE 0x20 /* OK to use reserved data blocks */
-#define XFS_TRANS_FREEZE_PROT 0x40 /* Transaction has elevated writer
- count in superblock */
-
-/*
- * Values for call flags parameter.
- */
-#define XFS_TRANS_RELEASE_LOG_RES 0x4
-#define XFS_TRANS_ABORT 0x8
-
-/*
- * Field values for xfs_trans_mod_sb.
- */
-#define XFS_TRANS_SB_ICOUNT 0x00000001
-#define XFS_TRANS_SB_IFREE 0x00000002
-#define XFS_TRANS_SB_FDBLOCKS 0x00000004
-#define XFS_TRANS_SB_RES_FDBLOCKS 0x00000008
-#define XFS_TRANS_SB_FREXTENTS 0x00000010
-#define XFS_TRANS_SB_RES_FREXTENTS 0x00000020
-#define XFS_TRANS_SB_DBLOCKS 0x00000040
-#define XFS_TRANS_SB_AGCOUNT 0x00000080
-#define XFS_TRANS_SB_IMAXPCT 0x00000100
-#define XFS_TRANS_SB_REXTSIZE 0x00000200
-#define XFS_TRANS_SB_RBMBLOCKS 0x00000400
-#define XFS_TRANS_SB_RBLOCKS 0x00000800
-#define XFS_TRANS_SB_REXTENTS 0x00001000
-#define XFS_TRANS_SB_REXTSLOG 0x00002000
-
-/*
- * Here we centralize the specification of XFS meta-data buffer
- * reference count values. This determine how hard the buffer
- * cache tries to hold onto the buffer.
- */
-#define XFS_AGF_REF 4
-#define XFS_AGI_REF 4
-#define XFS_AGFL_REF 3
-#define XFS_INO_BTREE_REF 3
-#define XFS_ALLOC_BTREE_REF 2
-#define XFS_BMAP_BTREE_REF 2
-#define XFS_DIR_BTREE_REF 2
-#define XFS_INO_REF 2
-#define XFS_ATTR_BTREE_REF 1
-#define XFS_DQUOT_REF 1
-
-/*
- * Flags for xfs_trans_ichgtime().
- */
-#define XFS_ICHGTIME_MOD 0x1 /* data fork modification timestamp */
-#define XFS_ICHGTIME_CHG 0x2 /* inode field change timestamp */
-#define XFS_ICHGTIME_CREATE 0x4 /* inode create timestamp */
-
-#ifdef __KERNEL__
+#include "xfs_trans_format.h"
struct xfs_buf;
struct xfs_buftarg;
@@ -464,6 +251,4 @@ void xfs_trans_ail_destroy(struct xfs_mount *);
extern kmem_zone_t *xfs_trans_zone;
extern kmem_zone_t *xfs_log_item_desc_zone;
-#endif /* __KERNEL__ */
-
#endif /* __XFS_TRANS_H__ */
diff --git a/fs/xfs/xfs_trans_format.h b/fs/xfs/xfs_trans_format.h
new file mode 100644
index 0000000..a4f5d1d
--- /dev/null
+++ b/fs/xfs/xfs_trans_format.h
@@ -0,0 +1,240 @@
+/*
+ * Copyright (c) 2000-2002,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_TRANS_FORMAT_H__
+#define __XFS_TRANS_FORMAT_H__
+
+/*
+ * This file contains mostly on-disk format definitions for the transaction
+ * subsystem. There are some definitions that are in-memory, but are included
+ * here because they are shared with userspace simply for convenience.
+ */
+
+/*
+ * This is the structure written in the log at the head of
+ * every transaction. It identifies the type and id of the
+ * transaction, and contains the number of items logged by
+ * the transaction so we know how many to expect during recovery.
+ *
+ * Do not change the below structure without redoing the code in
+ * xlog_recover_add_to_trans() and xlog_recover_add_to_cont_trans().
+ */
+typedef struct xfs_trans_header {
+ uint th_magic; /* magic number */
+ uint th_type; /* transaction type */
+ __int32_t th_tid; /* transaction id (unused) */
+ uint th_num_items; /* num items logged by trans */
+} xfs_trans_header_t;
+
+#define XFS_TRANS_HEADER_MAGIC 0x5452414e /* TRAN */
+
+/*
+ * Log item types.
+ */
+#define XFS_LI_EFI 0x1236
+#define XFS_LI_EFD 0x1237
+#define XFS_LI_IUNLINK 0x1238
+#define XFS_LI_INODE 0x123b /* aligned ino chunks, var-size ibufs */
+#define XFS_LI_BUF 0x123c /* v2 bufs, variable sized inode bufs */
+#define XFS_LI_DQUOT 0x123d
+#define XFS_LI_QUOTAOFF 0x123e
+#define XFS_LI_ICREATE 0x123f
+
+#define XFS_LI_TYPE_DESC \
+ { XFS_LI_EFI, "XFS_LI_EFI" }, \
+ { XFS_LI_EFD, "XFS_LI_EFD" }, \
+ { XFS_LI_IUNLINK, "XFS_LI_IUNLINK" }, \
+ { XFS_LI_INODE, "XFS_LI_INODE" }, \
+ { XFS_LI_BUF, "XFS_LI_BUF" }, \
+ { XFS_LI_DQUOT, "XFS_LI_DQUOT" }, \
+ { XFS_LI_QUOTAOFF, "XFS_LI_QUOTAOFF" }, \
+ { XFS_LI_ICREATE, "XFS_LI_ICREATE" }
+
+/*
+ * Transaction types. Used to distinguish types of buffers.
+ */
+#define XFS_TRANS_SETATTR_NOT_SIZE 1
+#define XFS_TRANS_SETATTR_SIZE 2
+#define XFS_TRANS_INACTIVE 3
+#define XFS_TRANS_CREATE 4
+#define XFS_TRANS_CREATE_TRUNC 5
+#define XFS_TRANS_TRUNCATE_FILE 6
+#define XFS_TRANS_REMOVE 7
+#define XFS_TRANS_LINK 8
+#define XFS_TRANS_RENAME 9
+#define XFS_TRANS_MKDIR 10
+#define XFS_TRANS_RMDIR 11
+#define XFS_TRANS_SYMLINK 12
+#define XFS_TRANS_SET_DMATTRS 13
+#define XFS_TRANS_GROWFS 14
+#define XFS_TRANS_STRAT_WRITE 15
+#define XFS_TRANS_DIOSTRAT 16
+/* 17 was XFS_TRANS_WRITE_SYNC */
+#define XFS_TRANS_WRITEID 18
+#define XFS_TRANS_ADDAFORK 19
+#define XFS_TRANS_ATTRINVAL 20
+#define XFS_TRANS_ATRUNCATE 21
+#define XFS_TRANS_ATTR_SET 22
+#define XFS_TRANS_ATTR_RM 23
+#define XFS_TRANS_ATTR_FLAG 24
+#define XFS_TRANS_CLEAR_AGI_BUCKET 25
+#define XFS_TRANS_QM_SBCHANGE 26
+/*
+ * Dummy entries since we use the transaction type to index into the
+ * trans_type[] in xlog_recover_print_trans_head()
+ */
+#define XFS_TRANS_DUMMY1 27
+#define XFS_TRANS_DUMMY2 28
+#define XFS_TRANS_QM_QUOTAOFF 29
+#define XFS_TRANS_QM_DQALLOC 30
+#define XFS_TRANS_QM_SETQLIM 31
+#define XFS_TRANS_QM_DQCLUSTER 32
+#define XFS_TRANS_QM_QINOCREATE 33
+#define XFS_TRANS_QM_QUOTAOFF_END 34
+#define XFS_TRANS_SB_UNIT 35
+#define XFS_TRANS_FSYNC_TS 36
+#define XFS_TRANS_GROWFSRT_ALLOC 37
+#define XFS_TRANS_GROWFSRT_ZERO 38
+#define XFS_TRANS_GROWFSRT_FREE 39
+#define XFS_TRANS_SWAPEXT 40
+#define XFS_TRANS_SB_COUNT 41
+#define XFS_TRANS_CHECKPOINT 42
+#define XFS_TRANS_ICREATE 43
+#define XFS_TRANS_TYPE_MAX 43
+/* new transaction types need to be reflected in xfs_logprint(8) */
+
+#define XFS_TRANS_TYPES \
+ { XFS_TRANS_SETATTR_NOT_SIZE, "SETATTR_NOT_SIZE" }, \
+ { XFS_TRANS_SETATTR_SIZE, "SETATTR_SIZE" }, \
+ { XFS_TRANS_INACTIVE, "INACTIVE" }, \
+ { XFS_TRANS_CREATE, "CREATE" }, \
+ { XFS_TRANS_CREATE_TRUNC, "CREATE_TRUNC" }, \
+ { XFS_TRANS_TRUNCATE_FILE, "TRUNCATE_FILE" }, \
+ { XFS_TRANS_REMOVE, "REMOVE" }, \
+ { XFS_TRANS_LINK, "LINK" }, \
+ { XFS_TRANS_RENAME, "RENAME" }, \
+ { XFS_TRANS_MKDIR, "MKDIR" }, \
+ { XFS_TRANS_RMDIR, "RMDIR" }, \
+ { XFS_TRANS_SYMLINK, "SYMLINK" }, \
+ { XFS_TRANS_SET_DMATTRS, "SET_DMATTRS" }, \
+ { XFS_TRANS_GROWFS, "GROWFS" }, \
+ { XFS_TRANS_STRAT_WRITE, "STRAT_WRITE" }, \
+ { XFS_TRANS_DIOSTRAT, "DIOSTRAT" }, \
+ { XFS_TRANS_WRITEID, "WRITEID" }, \
+ { XFS_TRANS_ADDAFORK, "ADDAFORK" }, \
+ { XFS_TRANS_ATTRINVAL, "ATTRINVAL" }, \
+ { XFS_TRANS_ATRUNCATE, "ATRUNCATE" }, \
+ { XFS_TRANS_ATTR_SET, "ATTR_SET" }, \
+ { XFS_TRANS_ATTR_RM, "ATTR_RM" }, \
+ { XFS_TRANS_ATTR_FLAG, "ATTR_FLAG" }, \
+ { XFS_TRANS_CLEAR_AGI_BUCKET, "CLEAR_AGI_BUCKET" }, \
+ { XFS_TRANS_QM_SBCHANGE, "QM_SBCHANGE" }, \
+ { XFS_TRANS_QM_QUOTAOFF, "QM_QUOTAOFF" }, \
+ { XFS_TRANS_QM_DQALLOC, "QM_DQALLOC" }, \
+ { XFS_TRANS_QM_SETQLIM, "QM_SETQLIM" }, \
+ { XFS_TRANS_QM_DQCLUSTER, "QM_DQCLUSTER" }, \
+ { XFS_TRANS_QM_QINOCREATE, "QM_QINOCREATE" }, \
+ { XFS_TRANS_QM_QUOTAOFF_END, "QM_QOFF_END" }, \
+ { XFS_TRANS_SB_UNIT, "SB_UNIT" }, \
+ { XFS_TRANS_FSYNC_TS, "FSYNC_TS" }, \
+ { XFS_TRANS_GROWFSRT_ALLOC, "GROWFSRT_ALLOC" }, \
+ { XFS_TRANS_GROWFSRT_ZERO, "GROWFSRT_ZERO" }, \
+ { XFS_TRANS_GROWFSRT_FREE, "GROWFSRT_FREE" }, \
+ { XFS_TRANS_SWAPEXT, "SWAPEXT" }, \
+ { XFS_TRANS_SB_COUNT, "SB_COUNT" }, \
+ { XFS_TRANS_CHECKPOINT, "CHECKPOINT" }, \
+ { XFS_TRANS_DUMMY1, "DUMMY1" }, \
+ { XFS_TRANS_DUMMY2, "DUMMY2" }, \
+ { XLOG_UNMOUNT_REC_TYPE, "UNMOUNT" }
+
+
+/* --- Information from here down is in-memory --- */
+
+/*
+ * This structure is used to track log items associated with
+ * a transaction. It points to the log item and keeps some
+ * flags to track the state of the log item. It also tracks
+ * the amount of space needed to log the item it describes
+ * once we get to commit processing (see xfs_trans_commit()).
+ */
+struct xfs_log_item_desc {
+ struct xfs_log_item *lid_item;
+ struct list_head lid_trans;
+ unsigned char lid_flags;
+};
+
+#define XFS_LID_DIRTY 0x1
+
+/*
+ * Values for t_flags.
+ */
+#define XFS_TRANS_DIRTY 0x01 /* something needs to be logged */
+#define XFS_TRANS_SB_DIRTY 0x02 /* superblock is modified */
+#define XFS_TRANS_PERM_LOG_RES 0x04 /* xact took a permanent log res */
+#define XFS_TRANS_SYNC 0x08 /* make commit synchronous */
+#define XFS_TRANS_DQ_DIRTY 0x10 /* at least one dquot in trx dirty */
+#define XFS_TRANS_RESERVE 0x20 /* OK to use reserved data blocks */
+#define XFS_TRANS_FREEZE_PROT 0x40 /* Transaction has elevated writer
+ count in superblock */
+/*
+ * Values for call flags parameter.
+ */
+#define XFS_TRANS_RELEASE_LOG_RES 0x4
+#define XFS_TRANS_ABORT 0x8
+
+/*
+ * Field values for xfs_trans_mod_sb.
+ */
+#define XFS_TRANS_SB_ICOUNT 0x00000001
+#define XFS_TRANS_SB_IFREE 0x00000002
+#define XFS_TRANS_SB_FDBLOCKS 0x00000004
+#define XFS_TRANS_SB_RES_FDBLOCKS 0x00000008
+#define XFS_TRANS_SB_FREXTENTS 0x00000010
+#define XFS_TRANS_SB_RES_FREXTENTS 0x00000020
+#define XFS_TRANS_SB_DBLOCKS 0x00000040
+#define XFS_TRANS_SB_AGCOUNT 0x00000080
+#define XFS_TRANS_SB_IMAXPCT 0x00000100
+#define XFS_TRANS_SB_REXTSIZE 0x00000200
+#define XFS_TRANS_SB_RBMBLOCKS 0x00000400
+#define XFS_TRANS_SB_RBLOCKS 0x00000800
+#define XFS_TRANS_SB_REXTENTS 0x00001000
+#define XFS_TRANS_SB_REXTSLOG 0x00002000
+
+/*
+ * Here we centralize the specification of XFS meta-data buffer
+ * reference count values. This determine how hard the buffer
+ * cache tries to hold onto the buffer.
+ */
+#define XFS_AGF_REF 4
+#define XFS_AGI_REF 4
+#define XFS_AGFL_REF 3
+#define XFS_INO_BTREE_REF 3
+#define XFS_ALLOC_BTREE_REF 2
+#define XFS_BMAP_BTREE_REF 2
+#define XFS_DIR_BTREE_REF 2
+#define XFS_INO_REF 2
+#define XFS_ATTR_BTREE_REF 1
+#define XFS_DQUOT_REF 1
+
+/*
+ * Flags for xfs_trans_ichgtime().
+ */
+#define XFS_ICHGTIME_MOD 0x1 /* data fork modification timestamp */
+#define XFS_ICHGTIME_CHG 0x2 /* inode field change timestamp */
+#define XFS_ICHGTIME_CREATE 0x4 /* inode create timestamp */
+
+#endif /* __XFS_TRANS_FORMAT_H__ */
--
1.7.10.4
* [PATCH 43/60] xfs: remove __KERNEL__ from debug code
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (41 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 42/60] xfs: split out on-disk transaction definitions Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 44/60] xfs: remove __KERNEL__ check from xfs_dir2_leaf.c Dave Chinner
` (18 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
There is no reason the remaining kernel-only debug code needs to
remain kernel-only. Kill the __KERNEL__ part of the defines, and let
userspace handle the debug code appropriately.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_bmap.h | 2 +-
fs/xfs/xfs_dir2_data.c | 4 ++--
fs/xfs/xfs_rtalloc.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
index 262f760..a2e50c6 100644
--- a/fs/xfs/xfs_bmap.h
+++ b/fs/xfs/xfs_bmap.h
@@ -163,7 +163,7 @@ typedef struct xfs_bmalloca {
{ BMAP_RIGHT_FILLING, "RF" }, \
{ BMAP_ATTRFORK, "ATTR" }
-#if defined(__KERNEL) && defined(DEBUG)
+#ifdef DEBUG
void xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
int whichfork, unsigned long caller_ip);
#define XFS_BMAP_TRACE_EXLIST(ip,c,w) \
diff --git a/fs/xfs/xfs_dir2_data.c b/fs/xfs/xfs_dir2_data.c
index 4e1917d..98c23fa 100644
--- a/fs/xfs/xfs_dir2_data.c
+++ b/fs/xfs/xfs_dir2_data.c
@@ -331,7 +331,7 @@ xfs_dir2_data_freefind(
xfs_dir2_data_free_t *dfp; /* bestfree entry */
xfs_dir2_data_aoff_t off; /* offset value needed */
struct xfs_dir2_data_free *bf;
-#if defined(DEBUG) && defined(__KERNEL__)
+#ifdef DEBUG
int matched; /* matched the value */
int seenzero; /* saw a 0 bestfree entry */
#endif
@@ -339,7 +339,7 @@ xfs_dir2_data_freefind(
off = (xfs_dir2_data_aoff_t)((char *)dup - (char *)hdr);
bf = xfs_dir3_data_bestfree_p(hdr);
-#if defined(DEBUG) && defined(__KERNEL__)
+#ifdef DEBUG
/*
* Validate some consistency in the bestfree table.
* Check order, non-overlapping entries, and if we find the
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 490fc05..19b738f 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -2146,7 +2146,7 @@ xfs_rtfree_extent(
ASSERT(mp->m_rbmip->i_itemp != NULL);
ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL));
-#if defined(__KERNEL__) && defined(DEBUG)
+#ifdef DEBUG
/*
* Check to see that this whole range is currently allocated.
*/
--
1.7.10.4
* [PATCH 44/60] xfs: remove __KERNEL__ check from xfs_dir2_leaf.c
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (42 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 43/60] xfs: remove __KERNEL__ from debug code Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 45/60] xfs: xfs_filestreams.h doesn't need __KERNEL__ Dave Chinner
` (17 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
It's actually an ifndef section, which means it is only included in
userspace. However, it's deep within the libxfs code, so it's
unlikely that the condition checked in userspace can actually occur
(search an empty leaf) through the libxfs interfaces. i.e. if it can
happen in userspace, it can happen in the kernel, so remove it from
userspace too....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_dir2_leaf.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/fs/xfs/xfs_dir2_leaf.c b/fs/xfs/xfs_dir2_leaf.c
index 005a6b3..853ebdf 100644
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -1587,10 +1587,6 @@ xfs_dir2_leaf_search_hash(
ents = xfs_dir3_leaf_ents_p(leaf);
xfs_dir3_leaf_hdr_from_disk(&leafhdr, leaf);
-#ifndef __KERNEL__
- if (!leafhdr.count)
- return 0;
-#endif
/*
* Note, the table cannot be empty, so we have to go through the loop.
* Binary search the leaf entries looking for our hash value.
--
1.7.10.4
* [PATCH 45/60] xfs: xfs_filestreams.h doesn't need __KERNEL__
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (43 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 44/60] xfs: remove __KERNEL__ check from xfs_dir2_leaf.c Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 46/60] xfs: split out the remote symlink handling Dave Chinner
` (16 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Because it is only used within the kernel.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_filestream.h | 4 ----
1 file changed, 4 deletions(-)
diff --git a/fs/xfs/xfs_filestream.h b/fs/xfs/xfs_filestream.h
index 09dd9af..6d61dbe 100644
--- a/fs/xfs/xfs_filestream.h
+++ b/fs/xfs/xfs_filestream.h
@@ -18,8 +18,6 @@
#ifndef __XFS_FILESTREAM_H__
#define __XFS_FILESTREAM_H__
-#ifdef __KERNEL__
-
struct xfs_mount;
struct xfs_inode;
struct xfs_perag;
@@ -69,6 +67,4 @@ xfs_inode_is_filestream(
(ip->i_d.di_flags & XFS_DIFLAG_FILESTREAM);
}
-#endif /* __KERNEL__ */
-
#endif /* __XFS_FILESTREAM_H__ */
--
1.7.10.4
* [PATCH 46/60] xfs: split out the remote symlink handling
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (44 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 45/60] xfs: xfs_filestreams.h doesn't need __KERNEL__ Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 47/60] xfs: separate out log format definitions Dave Chinner
` (15 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The remote symlink format definition and manipulation needs to be
shared with userspace, but the in-kernel interfaces do not. Split
the remote symlink format handling out into xfs_symlink_remote.[ch]
so it can easily be shared with userspace.
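One of the format helpers taken out of xfs_symlink.c by this patch is xfs_symlink_blocks(), which rounds the path length up to a whole number of buffers. The sketch below shows that rounding in isolation. `symlink_blocks_sketch()` is a hypothetical name, and `buflen` is passed in directly as a stand-in for the per-block space that XFS_SYMLINK_BUF_SPACE() would return, since the real value depends on the mount.

```c
/*
 * Number of blocks needed to hold a symlink target of pathlen bytes when
 * each block can carry buflen bytes of path data. Because each block may
 * lose some space to a header, buflen can be smaller than the block size,
 * so this is a ceiling division on the usable space, not on the block size.
 */
static inline int symlink_blocks_sketch(int pathlen, int buflen)
{
	return (pathlen + buflen - 1) / buflen;
}
```

For example, with 4040 usable bytes per block (an illustrative figure), a 4040-byte path fits in one block while a 4041-byte path needs two.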
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 3 +-
fs/xfs/xfs_symlink.c | 174 +------------------------------------
fs/xfs/xfs_symlink.h | 41 +--------
fs/xfs/xfs_symlink_remote.c | 200 +++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_symlink_remote.h | 60 +++++++++++++
5 files changed, 267 insertions(+), 211 deletions(-)
create mode 100644 fs/xfs/xfs_symlink_remote.c
create mode 100644 fs/xfs/xfs_symlink_remote.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 347d27c..5fbd5a1 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -51,6 +51,7 @@ xfs-y += xfs_aops.o \
xfs_mount.o \
xfs_mru_cache.o \
xfs_super.o \
+ xfs_symlink.o \
xfs_trans.o \
xfs_xattr.o \
kmem.o \
@@ -80,7 +81,7 @@ xfs-y += xfs_alloc.o \
xfs_inode_buf.o \
xfs_log_recover.o \
xfs_sb.o \
- xfs_symlink.o \
+ xfs_symlink_remote.o \
xfs_trans_resv.o
# low-level transaction/log code
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 69ec1e6..bed61a5 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -18,7 +18,6 @@
*/
#include "xfs.h"
#include "xfs_fs.h"
-#include "xfs_types.h"
#include "xfs_bit.h"
#include "xfs_log.h"
#include "xfs_trans.h"
@@ -32,188 +31,17 @@
#include "xfs_ialloc_btree.h"
#include "xfs_dinode.h"
#include "xfs_inode.h"
-#include "xfs_inode_item.h"
-#include "xfs_itable.h"
#include "xfs_ialloc.h"
#include "xfs_alloc.h"
#include "xfs_bmap.h"
#include "xfs_error.h"
#include "xfs_quota.h"
#include "xfs_trans_space.h"
-#include "xfs_log_priv.h"
#include "xfs_trace.h"
#include "xfs_symlink.h"
-#include "xfs_cksum.h"
-#include "xfs_buf_item.h"
+/* ----- Kernel only symlink functions ----- */
-/*
- * Each contiguous block has a header, so it is not just a simple pathlen
- * to FSB conversion.
- */
-int
-xfs_symlink_blocks(
- struct xfs_mount *mp,
- int pathlen)
-{
- int buflen = XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
-
- return (pathlen + buflen - 1) / buflen;
-}
-
-static int
-xfs_symlink_hdr_set(
- struct xfs_mount *mp,
- xfs_ino_t ino,
- uint32_t offset,
- uint32_t size,
- struct xfs_buf *bp)
-{
- struct xfs_dsymlink_hdr *dsl = bp->b_addr;
-
- if (!xfs_sb_version_hascrc(&mp->m_sb))
- return 0;
-
- dsl->sl_magic = cpu_to_be32(XFS_SYMLINK_MAGIC);
- dsl->sl_offset = cpu_to_be32(offset);
- dsl->sl_bytes = cpu_to_be32(size);
- uuid_copy(&dsl->sl_uuid, &mp->m_sb.sb_uuid);
- dsl->sl_owner = cpu_to_be64(ino);
- dsl->sl_blkno = cpu_to_be64(bp->b_bn);
- bp->b_ops = &xfs_symlink_buf_ops;
-
- return sizeof(struct xfs_dsymlink_hdr);
-}
-
-/*
- * Checking of the symlink header is split into two parts. the verifier does
- * CRC, location and bounds checking, the unpacking function checks the path
- * parameters and owner.
- */
-bool
-xfs_symlink_hdr_ok(
- struct xfs_mount *mp,
- xfs_ino_t ino,
- uint32_t offset,
- uint32_t size,
- struct xfs_buf *bp)
-{
- struct xfs_dsymlink_hdr *dsl = bp->b_addr;
-
- if (offset != be32_to_cpu(dsl->sl_offset))
- return false;
- if (size != be32_to_cpu(dsl->sl_bytes))
- return false;
- if (ino != be64_to_cpu(dsl->sl_owner))
- return false;
-
- /* ok */
- return true;
-}
-
-static bool
-xfs_symlink_verify(
- struct xfs_buf *bp)
-{
- struct xfs_mount *mp = bp->b_target->bt_mount;
- struct xfs_dsymlink_hdr *dsl = bp->b_addr;
-
- if (!xfs_sb_version_hascrc(&mp->m_sb))
- return false;
- if (dsl->sl_magic != cpu_to_be32(XFS_SYMLINK_MAGIC))
- return false;
- if (!uuid_equal(&dsl->sl_uuid, &mp->m_sb.sb_uuid))
- return false;
- if (bp->b_bn != be64_to_cpu(dsl->sl_blkno))
- return false;
- if (be32_to_cpu(dsl->sl_offset) +
- be32_to_cpu(dsl->sl_bytes) >= MAXPATHLEN)
- return false;
- if (dsl->sl_owner == 0)
- return false;
-
- return true;
-}
-
-static void
-xfs_symlink_read_verify(
- struct xfs_buf *bp)
-{
- struct xfs_mount *mp = bp->b_target->bt_mount;
-
- /* no verification of non-crc buffers */
- if (!xfs_sb_version_hascrc(&mp->m_sb))
- return;
-
- if (!xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
- offsetof(struct xfs_dsymlink_hdr, sl_crc)) ||
- !xfs_symlink_verify(bp)) {
- XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
- xfs_buf_ioerror(bp, EFSCORRUPTED);
- }
-}
-
-static void
-xfs_symlink_write_verify(
- struct xfs_buf *bp)
-{
- struct xfs_mount *mp = bp->b_target->bt_mount;
- struct xfs_buf_log_item *bip = bp->b_fspriv;
-
- /* no verification of non-crc buffers */
- if (!xfs_sb_version_hascrc(&mp->m_sb))
- return;
-
- if (!xfs_symlink_verify(bp)) {
- XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
- xfs_buf_ioerror(bp, EFSCORRUPTED);
- return;
- }
-
- if (bip) {
- struct xfs_dsymlink_hdr *dsl = bp->b_addr;
- dsl->sl_lsn = cpu_to_be64(bip->bli_item.li_lsn);
- }
- xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length),
- offsetof(struct xfs_dsymlink_hdr, sl_crc));
-}
-
-const struct xfs_buf_ops xfs_symlink_buf_ops = {
- .verify_read = xfs_symlink_read_verify,
- .verify_write = xfs_symlink_write_verify,
-};
-
-void
-xfs_symlink_local_to_remote(
- struct xfs_trans *tp,
- struct xfs_buf *bp,
- struct xfs_inode *ip,
- struct xfs_ifork *ifp)
-{
- struct xfs_mount *mp = ip->i_mount;
- char *buf;
-
- if (!xfs_sb_version_hascrc(&mp->m_sb)) {
- bp->b_ops = NULL;
- memcpy(bp->b_addr, ifp->if_u1.if_data, ifp->if_bytes);
- return;
- }
-
- /*
- * As this symlink fits in an inode literal area, it must also fit in
- * the smallest buffer the filesystem supports.
- */
- ASSERT(BBTOB(bp->b_length) >=
- ifp->if_bytes + sizeof(struct xfs_dsymlink_hdr));
-
- bp->b_ops = &xfs_symlink_buf_ops;
-
- buf = bp->b_addr;
- buf += xfs_symlink_hdr_set(mp, ip->i_ino, 0, ifp->if_bytes, bp);
- memcpy(buf, ifp->if_u1.if_data, ifp->if_bytes);
-}
-
-/* ----- Kernel only functions below ----- */
STATIC int
xfs_readlink_bmap(
struct xfs_inode *ip,
diff --git a/fs/xfs/xfs_symlink.h b/fs/xfs/xfs_symlink.h
index 4818edf..b726102 100644
--- a/fs/xfs/xfs_symlink.h
+++ b/fs/xfs/xfs_symlink.h
@@ -17,51 +17,18 @@
#ifndef __XFS_SYMLINK_H
#define __XFS_SYMLINK_H 1
-struct xfs_mount;
+#include "xfs_symlink_remote.h"
+
+/* Kernel only symlink definitions */
+
struct xfs_trans;
struct xfs_inode;
-struct xfs_buf;
struct xfs_ifork;
struct xfs_name;
-#define XFS_SYMLINK_MAGIC 0x58534c4d /* XSLM */
-
-struct xfs_dsymlink_hdr {
- __be32 sl_magic;
- __be32 sl_offset;
- __be32 sl_bytes;
- __be32 sl_crc;
- uuid_t sl_uuid;
- __be64 sl_owner;
- __be64 sl_blkno;
- __be64 sl_lsn;
-};
-
-/*
- * The maximum pathlen is 1024 bytes. Since the minimum file system
- * blocksize is 512 bytes, we can get a max of 3 extents back from
- * bmapi when crc headers are taken into account.
- */
-#define XFS_SYMLINK_MAPS 3
-
-#define XFS_SYMLINK_BUF_SPACE(mp, bufsize) \
- ((bufsize) - (xfs_sb_version_hascrc(&(mp)->m_sb) ? \
- sizeof(struct xfs_dsymlink_hdr) : 0))
-
-int xfs_symlink_blocks(struct xfs_mount *mp, int pathlen);
-bool xfs_symlink_hdr_ok(struct xfs_mount *mp, xfs_ino_t ino, uint32_t offset,
- uint32_t size, struct xfs_buf *bp);
-void xfs_symlink_local_to_remote(struct xfs_trans *tp, struct xfs_buf *bp,
- struct xfs_inode *ip, struct xfs_ifork *ifp);
-
-extern const struct xfs_buf_ops xfs_symlink_buf_ops;
-
-#ifdef __KERNEL__
-
int xfs_symlink(struct xfs_inode *dp, struct xfs_name *link_name,
const char *target_path, umode_t mode, struct xfs_inode **ipp);
int xfs_readlink(struct xfs_inode *ip, char *link);
int xfs_inactive_symlink_rmt(struct xfs_inode *ip, struct xfs_trans **tpp);
-#endif /* __KERNEL__ */
#endif /* __XFS_SYMLINK_H */
diff --git a/fs/xfs/xfs_symlink_remote.c b/fs/xfs/xfs_symlink_remote.c
new file mode 100644
index 0000000..a369025
--- /dev/null
+++ b/fs/xfs/xfs_symlink_remote.c
@@ -0,0 +1,200 @@
+/*
+ * Copyright (c) 2000-2006 Silicon Graphics, Inc.
+ * Copyright (c) 2012-2013 Red Hat, Inc.
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_log.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_inode.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_symlink.h"
+#include "xfs_cksum.h"
+#include "xfs_buf_item.h"
+
+
+/*
+ * Each contiguous block has a header, so it is not just a simple pathlen
+ * to FSB conversion.
+ */
+int
+xfs_symlink_blocks(
+ struct xfs_mount *mp,
+ int pathlen)
+{
+ int buflen = XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
+
+ return (pathlen + buflen - 1) / buflen;
+}
+
+int
+xfs_symlink_hdr_set(
+ struct xfs_mount *mp,
+ xfs_ino_t ino,
+ uint32_t offset,
+ uint32_t size,
+ struct xfs_buf *bp)
+{
+ struct xfs_dsymlink_hdr *dsl = bp->b_addr;
+
+ if (!xfs_sb_version_hascrc(&mp->m_sb))
+ return 0;
+
+ dsl->sl_magic = cpu_to_be32(XFS_SYMLINK_MAGIC);
+ dsl->sl_offset = cpu_to_be32(offset);
+ dsl->sl_bytes = cpu_to_be32(size);
+ uuid_copy(&dsl->sl_uuid, &mp->m_sb.sb_uuid);
+ dsl->sl_owner = cpu_to_be64(ino);
+ dsl->sl_blkno = cpu_to_be64(bp->b_bn);
+ bp->b_ops = &xfs_symlink_buf_ops;
+
+ return sizeof(struct xfs_dsymlink_hdr);
+}
+
+/*
+ * Checking of the symlink header is split into two parts. the verifier does
+ * CRC, location and bounds checking, the unpacking function checks the path
+ * parameters and owner.
+ */
+bool
+xfs_symlink_hdr_ok(
+ struct xfs_mount *mp,
+ xfs_ino_t ino,
+ uint32_t offset,
+ uint32_t size,
+ struct xfs_buf *bp)
+{
+ struct xfs_dsymlink_hdr *dsl = bp->b_addr;
+
+ if (offset != be32_to_cpu(dsl->sl_offset))
+ return false;
+ if (size != be32_to_cpu(dsl->sl_bytes))
+ return false;
+ if (ino != be64_to_cpu(dsl->sl_owner))
+ return false;
+
+ /* ok */
+ return true;
+}
+
+static bool
+xfs_symlink_verify(
+ struct xfs_buf *bp)
+{
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+ struct xfs_dsymlink_hdr *dsl = bp->b_addr;
+
+ if (!xfs_sb_version_hascrc(&mp->m_sb))
+ return false;
+ if (dsl->sl_magic != cpu_to_be32(XFS_SYMLINK_MAGIC))
+ return false;
+ if (!uuid_equal(&dsl->sl_uuid, &mp->m_sb.sb_uuid))
+ return false;
+ if (bp->b_bn != be64_to_cpu(dsl->sl_blkno))
+ return false;
+ if (be32_to_cpu(dsl->sl_offset) +
+ be32_to_cpu(dsl->sl_bytes) >= MAXPATHLEN)
+ return false;
+ if (dsl->sl_owner == 0)
+ return false;
+
+ return true;
+}
+
+static void
+xfs_symlink_read_verify(
+ struct xfs_buf *bp)
+{
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+
+ /* no verification of non-crc buffers */
+ if (!xfs_sb_version_hascrc(&mp->m_sb))
+ return;
+
+ if (!xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
+ offsetof(struct xfs_dsymlink_hdr, sl_crc)) ||
+ !xfs_symlink_verify(bp)) {
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
+ xfs_buf_ioerror(bp, EFSCORRUPTED);
+ }
+}
+
+static void
+xfs_symlink_write_verify(
+ struct xfs_buf *bp)
+{
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+ struct xfs_buf_log_item *bip = bp->b_fspriv;
+
+ /* no verification of non-crc buffers */
+ if (!xfs_sb_version_hascrc(&mp->m_sb))
+ return;
+
+ if (!xfs_symlink_verify(bp)) {
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
+ xfs_buf_ioerror(bp, EFSCORRUPTED);
+ return;
+ }
+
+ if (bip) {
+ struct xfs_dsymlink_hdr *dsl = bp->b_addr;
+ dsl->sl_lsn = cpu_to_be64(bip->bli_item.li_lsn);
+ }
+ xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length),
+ offsetof(struct xfs_dsymlink_hdr, sl_crc));
+}
+
+const struct xfs_buf_ops xfs_symlink_buf_ops = {
+ .verify_read = xfs_symlink_read_verify,
+ .verify_write = xfs_symlink_write_verify,
+};
+
+void
+xfs_symlink_local_to_remote(
+ struct xfs_trans *tp,
+ struct xfs_buf *bp,
+ struct xfs_inode *ip,
+ struct xfs_ifork *ifp)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ char *buf;
+
+ if (!xfs_sb_version_hascrc(&mp->m_sb)) {
+ bp->b_ops = NULL;
+ memcpy(bp->b_addr, ifp->if_u1.if_data, ifp->if_bytes);
+ return;
+ }
+
+ /*
+ * As this symlink fits in an inode literal area, it must also fit in
+ * the smallest buffer the filesystem supports.
+ */
+ ASSERT(BBTOB(bp->b_length) >=
+ ifp->if_bytes + sizeof(struct xfs_dsymlink_hdr));
+
+ bp->b_ops = &xfs_symlink_buf_ops;
+
+ buf = bp->b_addr;
+ buf += xfs_symlink_hdr_set(mp, ip->i_ino, 0, ifp->if_bytes, bp);
+ memcpy(buf, ifp->if_u1.if_data, ifp->if_bytes);
+}
+
+
diff --git a/fs/xfs/xfs_symlink_remote.h b/fs/xfs/xfs_symlink_remote.h
new file mode 100644
index 0000000..a49da42
--- /dev/null
+++ b/fs/xfs/xfs_symlink_remote.h
@@ -0,0 +1,60 @@
+/*
+ * Copyright (c) 2012 Red Hat, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_SYMLINK_REMOTE_H
+#define __XFS_SYMLINK_REMOTE_H 1
+
+struct xfs_mount;
+struct xfs_trans;
+struct xfs_inode;
+struct xfs_buf;
+struct xfs_ifork;
+
+#define XFS_SYMLINK_MAGIC 0x58534c4d /* XSLM */
+
+struct xfs_dsymlink_hdr {
+ __be32 sl_magic;
+ __be32 sl_offset;
+ __be32 sl_bytes;
+ __be32 sl_crc;
+ uuid_t sl_uuid;
+ __be64 sl_owner;
+ __be64 sl_blkno;
+ __be64 sl_lsn;
+};
+
+/*
+ * The maximum pathlen is 1024 bytes. Since the minimum file system
+ * blocksize is 512 bytes, we can get a max of 3 extents back from
+ * bmapi when crc headers are taken into account.
+ */
+#define XFS_SYMLINK_MAPS 3
+
+#define XFS_SYMLINK_BUF_SPACE(mp, bufsize) \
+ ((bufsize) - (xfs_sb_version_hascrc(&(mp)->m_sb) ? \
+ sizeof(struct xfs_dsymlink_hdr) : 0))
+
+int xfs_symlink_blocks(struct xfs_mount *mp, int pathlen);
+int xfs_symlink_hdr_set(struct xfs_mount *mp, xfs_ino_t ino, uint32_t offset,
+ uint32_t size, struct xfs_buf *bp);
+bool xfs_symlink_hdr_ok(struct xfs_mount *mp, xfs_ino_t ino, uint32_t offset,
+ uint32_t size, struct xfs_buf *bp);
+void xfs_symlink_local_to_remote(struct xfs_trans *tp, struct xfs_buf *bp,
+ struct xfs_inode *ip, struct xfs_ifork *ifp);
+
+extern const struct xfs_buf_ops xfs_symlink_buf_ops;
+
+#endif /* __XFS_SYMLINK_REMOTE_H */
--
1.7.10.4
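As a worked check of the XFS_SYMLINK_MAPS comment in xfs_symlink_remote.h, the block count for a remote symlink can be sketched in plain userspace C. The 56-byte header size is an assumption derived from the field widths of struct xfs_dsymlink_hdr (4 x 4 + 16 + 3 x 8 bytes, no padding), and the names symlink_buf_space() and symlink_blocks() are illustrative stand-ins rather than the kernel functions.

```c
#include <assert.h>

/* Assumed size of struct xfs_dsymlink_hdr: 4*4 + 16 + 3*8 bytes. */
#define SYMLINK_HDR_SIZE 56

/* Payload bytes per block; CRC-enabled filesystems lose the header. */
static int symlink_buf_space(int blocksize, int has_crc)
{
	return blocksize - (has_crc ? SYMLINK_HDR_SIZE : 0);
}

/* Round up: every block carries its own header, so divide by payload. */
static int symlink_blocks(int pathlen, int blocksize, int has_crc)
{
	int buflen = symlink_buf_space(blocksize, has_crc);

	assert(buflen > 0);
	return (pathlen + buflen - 1) / buflen;
}
```

With 512-byte blocks, symlink_blocks(1024, 512, 1) evaluates to 3 (456 payload bytes per block) and symlink_blocks(1024, 512, 0) evaluates to 2, which is consistent with XFS_SYMLINK_MAPS being 3.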
* [PATCH 47/60] xfs: separate out log format definitions
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (45 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 46/60] xfs: split out the remote symlink handling Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 48/60] xfs: move kernel specific type definitions to xfs.h Dave Chinner
` (14 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The on-disk format definitions for the log are spread randomly
through a couple of header files. Consolidate them all in a single
file that can be shared easily with userspace. This means that
xfs_log.h and xfs_log_priv.h no longer need to be shared with
userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log.h | 89 ++++++-----------------
fs/xfs/xfs_log_format.h | 178 ++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_log_priv.h | 146 ++-----------------------------------
fs/xfs/xfs_log_recover.c | 2 +
4 files changed, 207 insertions(+), 208 deletions(-)
create mode 100644 fs/xfs/xfs_log_format.h
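A minimal userspace sketch of the LSN helpers that this patch moves into xfs_log_format.h may help when reading the new header. It assumes xfs_lsn_t is a signed 64-bit integer with the cycle in the upper 32 bits and the block in the lower 32 bits, as the macros imply; the lower-case names are illustrative and are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t lsn_t;	/* stand-in for xfs_lsn_t (assumed 64-bit) */

/* Pack a cycle/block pair, mirroring xlog_assign_lsn(). */
static lsn_t assign_lsn(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Mirror CYCLE_LSN(): the upper 32 bits. */
static uint32_t cycle_lsn(lsn_t lsn)
{
	return (uint32_t)((uint64_t)lsn >> 32);
}

/* Mirror BLOCK_LSN(): the lower 32 bits. */
static uint32_t block_lsn(lsn_t lsn)
{
	return (uint32_t)lsn;
}

/* Round-trip check: unpacking returns exactly what was packed. */
static void lsn_selftest(void)
{
	lsn_t lsn = assign_lsn(7, 4096);

	assert(cycle_lsn(lsn) == 7);
	assert(block_lsn(lsn) == 4096);
}
```

Because the cycle sits in the high word, comparing two such values as plain 64-bit integers orders them by cycle first and block second.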
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index fb630e4..e63d9e1 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -18,14 +18,29 @@
#ifndef __XFS_LOG_H__
#define __XFS_LOG_H__
-/* get lsn fields */
-#define CYCLE_LSN(lsn) ((uint)((lsn)>>32))
-#define BLOCK_LSN(lsn) ((uint)(lsn))
+#include "xfs_log_format.h"
-/* this is used in a spot where we might otherwise double-endian-flip */
-#define CYCLE_LSN_DISK(lsn) (((__be32 *)&(lsn))[0])
+struct xfs_log_vec {
+ struct xfs_log_vec *lv_next; /* next lv in build list */
+ int lv_niovecs; /* number of iovecs in lv */
+ struct xfs_log_iovec *lv_iovecp; /* iovec array */
+ struct xfs_log_item *lv_item; /* owner */
+ char *lv_buf; /* formatted buffer */
+ int lv_buf_len; /* size of formatted buffer */
+};
+
+#define XFS_LOG_VEC_ORDERED (-1)
+
+/*
+ * Structure used to pass callback function and the function's argument
+ * to the log manager.
+ */
+typedef struct xfs_log_callback {
+ struct xfs_log_callback *cb_next;
+ void (*cb_func)(void *, int);
+ void *cb_arg;
+} xfs_log_callback_t;
-#ifdef __KERNEL__
/*
* By comparing each component, we don't have to worry about extra
* endian issues in treating two 32 bit numbers as one 64 bit number
@@ -59,67 +74,6 @@ static inline xfs_lsn_t _lsn_cmp(xfs_lsn_t lsn1, xfs_lsn_t lsn2)
*/
#define XFS_LOG_SYNC 0x1
-#endif /* __KERNEL__ */
-
-
-/* Log Clients */
-#define XFS_TRANSACTION 0x69
-#define XFS_VOLUME 0x2
-#define XFS_LOG 0xaa
-
-
-/* Region types for iovec's i_type */
-#define XLOG_REG_TYPE_BFORMAT 1
-#define XLOG_REG_TYPE_BCHUNK 2
-#define XLOG_REG_TYPE_EFI_FORMAT 3
-#define XLOG_REG_TYPE_EFD_FORMAT 4
-#define XLOG_REG_TYPE_IFORMAT 5
-#define XLOG_REG_TYPE_ICORE 6
-#define XLOG_REG_TYPE_IEXT 7
-#define XLOG_REG_TYPE_IBROOT 8
-#define XLOG_REG_TYPE_ILOCAL 9
-#define XLOG_REG_TYPE_IATTR_EXT 10
-#define XLOG_REG_TYPE_IATTR_BROOT 11
-#define XLOG_REG_TYPE_IATTR_LOCAL 12
-#define XLOG_REG_TYPE_QFORMAT 13
-#define XLOG_REG_TYPE_DQUOT 14
-#define XLOG_REG_TYPE_QUOTAOFF 15
-#define XLOG_REG_TYPE_LRHEADER 16
-#define XLOG_REG_TYPE_UNMOUNT 17
-#define XLOG_REG_TYPE_COMMIT 18
-#define XLOG_REG_TYPE_TRANSHDR 19
-#define XLOG_REG_TYPE_ICREATE 20
-#define XLOG_REG_TYPE_MAX 20
-
-typedef struct xfs_log_iovec {
- void *i_addr; /* beginning address of region */
- int i_len; /* length in bytes of region */
- uint i_type; /* type of region */
-} xfs_log_iovec_t;
-
-struct xfs_log_vec {
- struct xfs_log_vec *lv_next; /* next lv in build list */
- int lv_niovecs; /* number of iovecs in lv */
- struct xfs_log_iovec *lv_iovecp; /* iovec array */
- struct xfs_log_item *lv_item; /* owner */
- char *lv_buf; /* formatted buffer */
- int lv_buf_len; /* size of formatted buffer */
-};
-
-#define XFS_LOG_VEC_ORDERED (-1)
-
-/*
- * Structure used to pass callback function and the function's argument
- * to the log manager.
- */
-typedef struct xfs_log_callback {
- struct xfs_log_callback *cb_next;
- void (*cb_func)(void *, int);
- void *cb_arg;
-} xfs_log_callback_t;
-
-
-#ifdef __KERNEL__
/* Log manager interfaces */
struct xfs_mount;
struct xlog_in_core;
@@ -188,5 +142,4 @@ void xfs_log_work_queue(struct xfs_mount *mp);
void xfs_log_worker(struct work_struct *work);
void xfs_log_quiesce(struct xfs_mount *mp);
-#endif
#endif /* __XFS_LOG_H__ */
diff --git a/fs/xfs/xfs_log_format.h b/fs/xfs/xfs_log_format.h
new file mode 100644
index 0000000..9f9aeb6
--- /dev/null
+++ b/fs/xfs/xfs_log_format.h
@@ -0,0 +1,178 @@
+/*
+ * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_LOG_FORMAT_H__
+#define __XFS_LOG_FORMAT_H__
+
+typedef __uint32_t xlog_tid_t;
+
+#define XLOG_MIN_ICLOGS 2
+#define XLOG_MAX_ICLOGS 8
+#define XLOG_HEADER_MAGIC_NUM 0xFEEDbabe /* Invalid cycle number */
+#define XLOG_VERSION_1 1
+#define XLOG_VERSION_2 2 /* Large IClogs, Log sunit */
+#define XLOG_VERSION_OKBITS (XLOG_VERSION_1 | XLOG_VERSION_2)
+#define XLOG_MIN_RECORD_BSIZE (16*1024) /* eventually 32k */
+#define XLOG_BIG_RECORD_BSIZE (32*1024) /* 32k buffers */
+#define XLOG_MAX_RECORD_BSIZE (256*1024)
+#define XLOG_HEADER_CYCLE_SIZE (32*1024) /* cycle data in header */
+#define XLOG_MIN_RECORD_BSHIFT 14 /* 16384 == 1 << 14 */
+#define XLOG_BIG_RECORD_BSHIFT 15 /* 32k == 1 << 15 */
+#define XLOG_MAX_RECORD_BSHIFT 18 /* 256k == 1 << 18 */
+#define XLOG_BTOLSUNIT(log, b) (((b)+(log)->l_mp->m_sb.sb_logsunit-1) / \
+ (log)->l_mp->m_sb.sb_logsunit)
+#define XLOG_LSUNITTOB(log, su) ((su) * (log)->l_mp->m_sb.sb_logsunit)
+
+#define XLOG_HEADER_SIZE 512
+
+#define XLOG_REC_SHIFT(log) \
+ BTOBB(1 << (xfs_sb_version_haslogv2(&log->l_mp->m_sb) ? \
+ XLOG_MAX_RECORD_BSHIFT : XLOG_BIG_RECORD_BSHIFT))
+#define XLOG_TOTAL_REC_SHIFT(log) \
+ BTOBB(XLOG_MAX_ICLOGS << (xfs_sb_version_haslogv2(&log->l_mp->m_sb) ? \
+ XLOG_MAX_RECORD_BSHIFT : XLOG_BIG_RECORD_BSHIFT))
+
+/* get lsn fields */
+#define CYCLE_LSN(lsn) ((uint)((lsn)>>32))
+#define BLOCK_LSN(lsn) ((uint)(lsn))
+
+/* this is used in a spot where we might otherwise double-endian-flip */
+#define CYCLE_LSN_DISK(lsn) (((__be32 *)&(lsn))[0])
+
+static inline xfs_lsn_t xlog_assign_lsn(uint cycle, uint block)
+{
+ return ((xfs_lsn_t)cycle << 32) | block;
+}
+
+static inline uint xlog_get_cycle(char *ptr)
+{
+ if (be32_to_cpu(*(__be32 *)ptr) == XLOG_HEADER_MAGIC_NUM)
+ return be32_to_cpu(*((__be32 *)ptr + 1));
+ else
+ return be32_to_cpu(*(__be32 *)ptr);
+}
+
+/* Log Clients */
+#define XFS_TRANSACTION 0x69
+#define XFS_VOLUME 0x2
+#define XFS_LOG 0xaa
+
+#define XLOG_UNMOUNT_TYPE 0x556e /* Un for Unmount */
+
+/* Region types for iovec's i_type */
+#define XLOG_REG_TYPE_BFORMAT 1
+#define XLOG_REG_TYPE_BCHUNK 2
+#define XLOG_REG_TYPE_EFI_FORMAT 3
+#define XLOG_REG_TYPE_EFD_FORMAT 4
+#define XLOG_REG_TYPE_IFORMAT 5
+#define XLOG_REG_TYPE_ICORE 6
+#define XLOG_REG_TYPE_IEXT 7
+#define XLOG_REG_TYPE_IBROOT 8
+#define XLOG_REG_TYPE_ILOCAL 9
+#define XLOG_REG_TYPE_IATTR_EXT 10
+#define XLOG_REG_TYPE_IATTR_BROOT 11
+#define XLOG_REG_TYPE_IATTR_LOCAL 12
+#define XLOG_REG_TYPE_QFORMAT 13
+#define XLOG_REG_TYPE_DQUOT 14
+#define XLOG_REG_TYPE_QUOTAOFF 15
+#define XLOG_REG_TYPE_LRHEADER 16
+#define XLOG_REG_TYPE_UNMOUNT 17
+#define XLOG_REG_TYPE_COMMIT 18
+#define XLOG_REG_TYPE_TRANSHDR 19
+#define XLOG_REG_TYPE_ICREATE 20
+#define XLOG_REG_TYPE_MAX 20
+
+/*
+ * Flags to log operation header
+ *
+ * The first write of a new transaction will be preceded with a start
+ * record, XLOG_START_TRANS. Once a transaction is committed, a commit
+ * record is written, XLOG_COMMIT_TRANS. If a single region can not fit into
+ * the remainder of the current active in-core log, it is split up into
+ * multiple regions. Each partial region will be marked with a
+ * XLOG_CONTINUE_TRANS until the last one, which gets marked with XLOG_END_TRANS.
+ *
+ */
+#define XLOG_START_TRANS 0x01 /* Start a new transaction */
+#define XLOG_COMMIT_TRANS 0x02 /* Commit this transaction */
+#define XLOG_CONTINUE_TRANS 0x04 /* Cont this trans into new region */
+#define XLOG_WAS_CONT_TRANS 0x08 /* Cont this trans into new region */
+#define XLOG_END_TRANS 0x10 /* End a continued transaction */
+#define XLOG_UNMOUNT_TRANS 0x20 /* Unmount a filesystem transaction */
+
+
+typedef struct xlog_op_header {
+ __be32 oh_tid; /* transaction id of operation : 4 b */
+ __be32 oh_len; /* bytes in data region : 4 b */
+ __u8 oh_clientid; /* who sent me this : 1 b */
+ __u8 oh_flags; /* : 1 b */
+ __u16 oh_res2; /* 32 bit align : 2 b */
+} xlog_op_header_t;
+
+
+/* valid values for h_fmt */
+#define XLOG_FMT_UNKNOWN 0
+#define XLOG_FMT_LINUX_LE 1
+#define XLOG_FMT_LINUX_BE 2
+#define XLOG_FMT_IRIX_BE 3
+
+/* our fmt */
+#ifdef XFS_NATIVE_HOST
+#define XLOG_FMT XLOG_FMT_LINUX_BE
+#else
+#define XLOG_FMT XLOG_FMT_LINUX_LE
+#endif
+
+typedef struct xlog_rec_header {
+ __be32 h_magicno; /* log record (LR) identifier : 4 */
+ __be32 h_cycle; /* write cycle of log : 4 */
+ __be32 h_version; /* LR version : 4 */
+ __be32 h_len; /* len in bytes; should be 64-bit aligned: 4 */
+ __be64 h_lsn; /* lsn of this LR : 8 */
+ __be64 h_tail_lsn; /* lsn of 1st LR w/ buffers not committed: 8 */
+ __le32 h_crc; /* crc of log record : 4 */
+ __be32 h_prev_block; /* block number to previous LR : 4 */
+ __be32 h_num_logops; /* number of log operations in this LR : 4 */
+ __be32 h_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE];
+ /* new fields */
+ __be32 h_fmt; /* format of log record : 4 */
+ uuid_t h_fs_uuid; /* uuid of FS : 16 */
+ __be32 h_size; /* iclog size : 4 */
+} xlog_rec_header_t;
+
+typedef struct xlog_rec_ext_header {
+ __be32 xh_cycle; /* write cycle of log : 4 */
+ __be32 xh_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE]; /* : 256 */
+} xlog_rec_ext_header_t;
+
+/*
+ * Quite misnamed, because this union lays out the actual on-disk log buffer.
+ */
+typedef union xlog_in_core2 {
+ xlog_rec_header_t hic_header;
+ xlog_rec_ext_header_t hic_xheader;
+ char hic_sector[XLOG_HEADER_SIZE];
+} xlog_in_core_2_t;
+
+/* not an on-disk structure, but needed by log recovery in userspace */
+typedef struct xfs_log_iovec {
+ void *i_addr; /* beginning address of region */
+ int i_len; /* length in bytes of region */
+ uint i_type; /* type of region */
+} xfs_log_iovec_t;
+
+#endif /* __XFS_LOG_FORMAT_H__ */
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index b9ea262..edd0964 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -24,51 +24,13 @@ struct xlog_ticket;
struct xfs_mount;
/*
- * Macros, structures, prototypes for internal log manager use.
+ * Flags for log structure
*/
-
-#define XLOG_MIN_ICLOGS 2
-#define XLOG_MAX_ICLOGS 8
-#define XLOG_HEADER_MAGIC_NUM 0xFEEDbabe /* Invalid cycle number */
-#define XLOG_VERSION_1 1
-#define XLOG_VERSION_2 2 /* Large IClogs, Log sunit */
-#define XLOG_VERSION_OKBITS (XLOG_VERSION_1 | XLOG_VERSION_2)
-#define XLOG_MIN_RECORD_BSIZE (16*1024) /* eventually 32k */
-#define XLOG_BIG_RECORD_BSIZE (32*1024) /* 32k buffers */
-#define XLOG_MAX_RECORD_BSIZE (256*1024)
-#define XLOG_HEADER_CYCLE_SIZE (32*1024) /* cycle data in header */
-#define XLOG_MIN_RECORD_BSHIFT 14 /* 16384 == 1 << 14 */
-#define XLOG_BIG_RECORD_BSHIFT 15 /* 32k == 1 << 15 */
-#define XLOG_MAX_RECORD_BSHIFT 18 /* 256k == 1 << 18 */
-#define XLOG_BTOLSUNIT(log, b) (((b)+(log)->l_mp->m_sb.sb_logsunit-1) / \
- (log)->l_mp->m_sb.sb_logsunit)
-#define XLOG_LSUNITTOB(log, su) ((su) * (log)->l_mp->m_sb.sb_logsunit)
-
-#define XLOG_HEADER_SIZE 512
-
-#define XLOG_REC_SHIFT(log) \
- BTOBB(1 << (xfs_sb_version_haslogv2(&log->l_mp->m_sb) ? \
- XLOG_MAX_RECORD_BSHIFT : XLOG_BIG_RECORD_BSHIFT))
-#define XLOG_TOTAL_REC_SHIFT(log) \
- BTOBB(XLOG_MAX_ICLOGS << (xfs_sb_version_haslogv2(&log->l_mp->m_sb) ? \
- XLOG_MAX_RECORD_BSHIFT : XLOG_BIG_RECORD_BSHIFT))
-
-static inline xfs_lsn_t xlog_assign_lsn(uint cycle, uint block)
-{
- return ((xfs_lsn_t)cycle << 32) | block;
-}
-
-static inline uint xlog_get_cycle(char *ptr)
-{
- if (be32_to_cpu(*(__be32 *)ptr) == XLOG_HEADER_MAGIC_NUM)
- return be32_to_cpu(*((__be32 *)ptr + 1));
- else
- return be32_to_cpu(*(__be32 *)ptr);
-}
-
-#define BLK_AVG(blk1, blk2) ((blk1+blk2) >> 1)
-
-#ifdef __KERNEL__
+#define XLOG_ACTIVE_RECOVERY 0x2 /* in the middle of recovery */
+#define XLOG_RECOVERY_NEEDED 0x4 /* log was recovered */
+#define XLOG_IO_ERROR 0x8 /* log hit an I/O error, and being
+ shutdown */
+#define XLOG_TAIL_WARN 0x10 /* log tail verify warning issued */
/*
* get client id from packed copy.
@@ -101,28 +63,8 @@ static inline uint xlog_get_client_id(__be32 i)
#define XLOG_STATE_IOERROR 0x0080 /* IO error happened in sync'ing log */
#define XLOG_STATE_ALL 0x7FFF /* All possible valid flags */
#define XLOG_STATE_NOTUSED 0x8000 /* This IC log not being used */
-#endif /* __KERNEL__ */
/*
- * Flags to log operation header
- *
- * The first write of a new transaction will be preceded with a start
- * record, XLOG_START_TRANS. Once a transaction is committed, a commit
- * record is written, XLOG_COMMIT_TRANS. If a single region can not fit into
- * the remainder of the current active in-core log, it is split up into
- * multiple regions. Each partial region will be marked with a
- * XLOG_CONTINUE_TRANS until the last one, which gets marked with XLOG_END_TRANS.
- *
- */
-#define XLOG_START_TRANS 0x01 /* Start a new transaction */
-#define XLOG_COMMIT_TRANS 0x02 /* Commit this transaction */
-#define XLOG_CONTINUE_TRANS 0x04 /* Cont this trans into new region */
-#define XLOG_WAS_CONT_TRANS 0x08 /* Cont this trans into new region */
-#define XLOG_END_TRANS 0x10 /* End a continued transaction */
-#define XLOG_UNMOUNT_TRANS 0x20 /* Unmount a filesystem transaction */
-
-#ifdef __KERNEL__
-/*
* Flags to log ticket
*/
#define XLOG_TIC_INITED 0x1 /* has been initialized */
@@ -132,22 +74,6 @@ static inline uint xlog_get_client_id(__be32 i)
{ XLOG_TIC_INITED, "XLOG_TIC_INITED" }, \
{ XLOG_TIC_PERM_RESERV, "XLOG_TIC_PERM_RESERV" }
-#endif /* __KERNEL__ */
-
-#define XLOG_UNMOUNT_TYPE 0x556e /* Un for Unmount */
-
-/*
- * Flags for log structure
- */
-#define XLOG_ACTIVE_RECOVERY 0x2 /* in the middle of recovery */
-#define XLOG_RECOVERY_NEEDED 0x4 /* log was recovered */
-#define XLOG_IO_ERROR 0x8 /* log hit an I/O error, and being
- shutdown */
-#define XLOG_TAIL_WARN 0x10 /* log tail verify warning issued */
-
-typedef __uint32_t xlog_tid_t;
-
-#ifdef __KERNEL__
/*
* Below are states for covering allocation transactions.
* By covering, we mean changing the h_tail_lsn in the last on-disk
@@ -223,7 +149,6 @@ typedef __uint32_t xlog_tid_t;
#define XLOG_COVER_OPS 5
-
/* Ticket reservation region accounting */
#define XLOG_TIC_LEN_MAX 15
@@ -258,64 +183,6 @@ typedef struct xlog_ticket {
xlog_res_t t_res_arr[XLOG_TIC_LEN_MAX]; /* array of res : 8 * 15 */
} xlog_ticket_t;
-#endif
-
-
-typedef struct xlog_op_header {
- __be32 oh_tid; /* transaction id of operation : 4 b */
- __be32 oh_len; /* bytes in data region : 4 b */
- __u8 oh_clientid; /* who sent me this : 1 b */
- __u8 oh_flags; /* : 1 b */
- __u16 oh_res2; /* 32 bit align : 2 b */
-} xlog_op_header_t;
-
-
-/* valid values for h_fmt */
-#define XLOG_FMT_UNKNOWN 0
-#define XLOG_FMT_LINUX_LE 1
-#define XLOG_FMT_LINUX_BE 2
-#define XLOG_FMT_IRIX_BE 3
-
-/* our fmt */
-#ifdef XFS_NATIVE_HOST
-#define XLOG_FMT XLOG_FMT_LINUX_BE
-#else
-#define XLOG_FMT XLOG_FMT_LINUX_LE
-#endif
-
-typedef struct xlog_rec_header {
- __be32 h_magicno; /* log record (LR) identifier : 4 */
- __be32 h_cycle; /* write cycle of log : 4 */
- __be32 h_version; /* LR version : 4 */
- __be32 h_len; /* len in bytes; should be 64-bit aligned: 4 */
- __be64 h_lsn; /* lsn of this LR : 8 */
- __be64 h_tail_lsn; /* lsn of 1st LR w/ buffers not committed: 8 */
- __le32 h_crc; /* crc of log record : 4 */
- __be32 h_prev_block; /* block number to previous LR : 4 */
- __be32 h_num_logops; /* number of log operations in this LR : 4 */
- __be32 h_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE];
- /* new fields */
- __be32 h_fmt; /* format of log record : 4 */
- uuid_t h_fs_uuid; /* uuid of FS : 16 */
- __be32 h_size; /* iclog size : 4 */
-} xlog_rec_header_t;
-
-typedef struct xlog_rec_ext_header {
- __be32 xh_cycle; /* write cycle of log : 4 */
- __be32 xh_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE]; /* : 256 */
-} xlog_rec_ext_header_t;
-
-#ifdef __KERNEL__
-
-/*
- * Quite misnamed, because this union lays out the actual on-disk log buffer.
- */
-typedef union xlog_in_core2 {
- xlog_rec_header_t hic_header;
- xlog_rec_ext_header_t hic_xheader;
- char hic_sector[XLOG_HEADER_SIZE];
-} xlog_in_core_2_t;
-
/*
* - A log record header is 512 bytes. There is plenty of room to grow the
* xlog_rec_header_t into the reserved space.
@@ -686,6 +553,5 @@ static inline void xlog_wait(wait_queue_head_t *wq, spinlock_t *lock)
schedule();
remove_wait_queue(wq, &wait);
}
-#endif /* __KERNEL__ */
#endif /* __XFS_LOG_PRIV_H__ */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 8a810fe..f840cb9 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -54,6 +54,8 @@
#include "xfs_attr_leaf.h"
#include "xfs_attr_remote.h"
+#define BLK_AVG(blk1, blk2) ((blk1+blk2) >> 1)
+
STATIC int
xlog_find_zeroed(
struct xlog *,
--
1.7.10.4
* [PATCH 48/60] xfs: move kernel specific type definitions to xfs.h
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (46 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 47/60] xfs: separate out log format definitions Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 49/60] xfs: make struct xfs_perag kernel only Dave Chinner
` (13 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
xfs_types.h is shared with userspace, so having kernel-specific
types defined in it is problematic. Move all the kernel-specific
defines to xfs_linux.h so we can remove the __KERNEL__ guards from
xfs_types.h.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_linux.h | 32 ++++++++++++++++++++++++++++++++
fs/xfs/xfs_types.h | 36 ------------------------------------
2 files changed, 32 insertions(+), 36 deletions(-)
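For readers on the userspace side, the pointer-sized typedef selection that moves to xfs_linux.h can be approximated as below. This is only a sketch: userspace has no BITS_PER_LONG, so the width is derived from ULONG_MAX instead, and psint_t, psunsigned_t and the helper functions are hypothetical names, not part of XFS.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Userspace has no BITS_PER_LONG; derive an equivalent from ULONG_MAX. */
#if ULONG_MAX == 0xffffffffUL
typedef int32_t  psint_t;
typedef uint32_t psunsigned_t;
#elif ULONG_MAX == 0xffffffffffffffffUL
typedef int64_t  psint_t;
typedef uint64_t psunsigned_t;
#else
#error "unsigned long must be 32 or 64 bits wide"
#endif

/* Holds on ILP32 and LP64 ABIs (Linux); would not hold on LLP64. */
static int psint_matches_pointer(void)
{
	return sizeof(psint_t) == sizeof(void *) &&
	       sizeof(psunsigned_t) == sizeof(void *);
}

/* Fails loudly if the "same size as a pointer" assumption is broken. */
static void psint_selfcheck(void)
{
	assert(psint_matches_pointer());
}
```

The check relies on long and pointers having the same width, which is the same assumption the BITS_PER_LONG test makes in the kernel.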
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 800f896..3bc52b7 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -32,6 +32,38 @@
# define XFS_BIG_INUMS 0
#endif
+/*
+ * Kernel specific type declarations for XFS
+ */
+typedef signed char __int8_t;
+typedef unsigned char __uint8_t;
+typedef signed short int __int16_t;
+typedef unsigned short int __uint16_t;
+typedef signed int __int32_t;
+typedef unsigned int __uint32_t;
+typedef signed long long int __int64_t;
+typedef unsigned long long int __uint64_t;
+
+typedef __uint32_t inst_t; /* an instruction */
+
+typedef __s64 xfs_off_t; /* <file offset> type */
+typedef unsigned long long xfs_ino_t; /* <inode> type */
+typedef __s64 xfs_daddr_t; /* <disk address> type */
+typedef char * xfs_caddr_t; /* <core address> type */
+typedef __u32 xfs_dev_t;
+typedef __u32 xfs_nlink_t;
+
+/* __psint_t is the same size as a pointer */
+#if (BITS_PER_LONG == 32)
+typedef __int32_t __psint_t;
+typedef __uint32_t __psunsigned_t;
+#elif (BITS_PER_LONG == 64)
+typedef __int64_t __psint_t;
+typedef __uint64_t __psunsigned_t;
+#else
+#error BITS_PER_LONG must be 32 or 64
+#endif
+
#include "xfs_types.h"
#include "kmem.h"
diff --git a/fs/xfs/xfs_types.h b/fs/xfs/xfs_types.h
index dd6bf71..b4a7d5f 100644
--- a/fs/xfs/xfs_types.h
+++ b/fs/xfs/xfs_types.h
@@ -18,42 +18,6 @@
#ifndef __XFS_TYPES_H__
#define __XFS_TYPES_H__
-#ifdef __KERNEL__
-
-/*
- * Additional type declarations for XFS
- */
-typedef signed char __int8_t;
-typedef unsigned char __uint8_t;
-typedef signed short int __int16_t;
-typedef unsigned short int __uint16_t;
-typedef signed int __int32_t;
-typedef unsigned int __uint32_t;
-typedef signed long long int __int64_t;
-typedef unsigned long long int __uint64_t;
-
-typedef __uint32_t inst_t; /* an instruction */
-
-typedef __s64 xfs_off_t; /* <file offset> type */
-typedef unsigned long long xfs_ino_t; /* <inode> type */
-typedef __s64 xfs_daddr_t; /* <disk address> type */
-typedef char * xfs_caddr_t; /* <core address> type */
-typedef __u32 xfs_dev_t;
-typedef __u32 xfs_nlink_t;
-
-/* __psint_t is the same size as a pointer */
-#if (BITS_PER_LONG == 32)
-typedef __int32_t __psint_t;
-typedef __uint32_t __psunsigned_t;
-#elif (BITS_PER_LONG == 64)
-typedef __int64_t __psint_t;
-typedef __uint64_t __psunsigned_t;
-#else
-#error BITS_PER_LONG must be 32 or 64
-#endif
-
-#endif /* __KERNEL__ */
-
typedef __uint32_t prid_t; /* project ID */
typedef __uint32_t xfs_agblock_t; /* blockno in alloc. group */
--
1.7.10.4
* [PATCH 49/60] xfs: make struct xfs_perag kernel only
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (47 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 48/60] xfs: move kernel specific type definitions to xfs.h Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 50/60] xfs: create xfs_bmap_util.[ch] Dave Chinner
` (12 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The struct xfs_perag has many kernel-only definitions in it,
requiring a __KERNEL__ guard so userspace can use it too. Move it to
xfs_mount.h so that it is kernel-only, and let userspace redefine
its own version of the structure containing only what it needs.
This gets rid of another __KERNEL__ check in the XFS header files.
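To illustrate what "only what it needs" could look like on the userspace
side, here is a hypothetical cut-down per-AG structure that keeps just a
few free-space counters and drops the locks, trees and RCU head. Both
struct demo_perag and demo_pick_ag() are assumptions made for this sketch;
they are not taken from the kernel or from xfsprogs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace view of a per-AG structure: counters only. */
struct demo_perag {
	uint32_t	pag_agno;	/* AG this structure belongs to */
	uint32_t	pagf_freeblks;	/* total free blocks */
	uint32_t	pagf_longest;	/* longest free extent */
};

/*
 * Return the index of the first AG whose longest free extent can hold
 * 'len' blocks, or -1 if no AG qualifies.
 */
int demo_pick_ag(const struct demo_perag *pags, size_t nr, uint32_t len)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (pags[i].pagf_longest >= len)
			return (int)i;
	}
	return -1;
}
```

Because such a structure has no spinlock_t, rb_root or rcu_head members,
it compiles without any kernel headers, which is the point of splitting
the definitions.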
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_acl.c | 1 +
fs/xfs/xfs_ag.h | 53 -------------------------------------------
fs/xfs/xfs_mount.h | 52 ++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_symlink_remote.c | 1 +
4 files changed, 54 insertions(+), 53 deletions(-)
diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index b9b9740..1ec23d0 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -20,6 +20,7 @@
#include "xfs_attr.h"
#include "xfs_bmap_btree.h"
#include "xfs_inode.h"
+#include "xfs_ag.h"
#include "xfs_sb.h"
#include "xfs_log.h"
#include "xfs_trans.h"
diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
index 317aa86..1cb740a 100644
--- a/fs/xfs/xfs_ag.h
+++ b/fs/xfs/xfs_ag.h
@@ -227,59 +227,6 @@ typedef struct xfs_agfl {
} xfs_agfl_t;
/*
- * Per-ag incore structure, copies of information in agf and agi,
- * to improve the performance of allocation group selection.
- */
-#define XFS_PAGB_NUM_SLOTS 128
-
-typedef struct xfs_perag {
- struct xfs_mount *pag_mount; /* owner filesystem */
- xfs_agnumber_t pag_agno; /* AG this structure belongs to */
- atomic_t pag_ref; /* perag reference count */
- char pagf_init; /* this agf's entry is initialized */
- char pagi_init; /* this agi's entry is initialized */
- char pagf_metadata; /* the agf is preferred to be metadata */
- char pagi_inodeok; /* The agi is ok for inodes */
- __uint8_t pagf_levels[XFS_BTNUM_AGF];
- /* # of levels in bno & cnt btree */
- __uint32_t pagf_flcount; /* count of blocks in freelist */
- xfs_extlen_t pagf_freeblks; /* total free blocks */
- xfs_extlen_t pagf_longest; /* longest free space */
- __uint32_t pagf_btreeblks; /* # of blocks held in AGF btrees */
- xfs_agino_t pagi_freecount; /* number of free inodes */
- xfs_agino_t pagi_count; /* number of allocated inodes */
-
- /*
- * Inode allocation search lookup optimisation.
- * If the pagino matches, the search for new inodes
- * doesn't need to search the near ones again straight away
- */
- xfs_agino_t pagl_pagino;
- xfs_agino_t pagl_leftrec;
- xfs_agino_t pagl_rightrec;
-#ifdef __KERNEL__
- spinlock_t pagb_lock; /* lock for pagb_tree */
- struct rb_root pagb_tree; /* ordered tree of busy extents */
-
- atomic_t pagf_fstrms; /* # of filestreams active in this AG */
-
- spinlock_t pag_ici_lock; /* incore inode cache lock */
- struct radix_tree_root pag_ici_root; /* incore inode cache root */
- int pag_ici_reclaimable; /* reclaimable inodes */
- struct mutex pag_ici_reclaim_lock; /* serialisation point */
- unsigned long pag_ici_reclaim_cursor; /* reclaim restart point */
-
- /* buffer cache index */
- spinlock_t pag_buf_lock; /* lock for pag_buf_tree */
- struct rb_root pag_buf_tree; /* ordered tree of active buffers */
-
- /* for rcu-safe freeing */
- struct rcu_head rcu_head;
-#endif
- int pagb_count; /* pagb slots in use */
-} xfs_perag_t;
-
-/*
* tags for inode radix tree
*/
#define XFS_ICI_NO_TAG (-1) /* special flag for an untagged lookup
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e3acebd..7a45a75 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -322,6 +322,58 @@ typedef struct xfs_mod_sb {
int64_t msb_delta; /* Change to make to specified field */
} xfs_mod_sb_t;
+/*
+ * Per-ag incore structure, copies of information in agf and agi, to improve the
+ * performance of allocation group selection. This is defined for the kernel
+ * only, and hence is defined here instead of in xfs_ag.h. You need the struct
+ * xfs_mount to be defined to look up a xfs_perag anyway (via mp->m_perag_tree),
+ * so this doesn't introduce any strange header file dependencies.
+ */
+typedef struct xfs_perag {
+ struct xfs_mount *pag_mount; /* owner filesystem */
+ xfs_agnumber_t pag_agno; /* AG this structure belongs to */
+ atomic_t pag_ref; /* perag reference count */
+ char pagf_init; /* this agf's entry is initialized */
+ char pagi_init; /* this agi's entry is initialized */
+ char pagf_metadata; /* the agf is preferred to be metadata */
+ char pagi_inodeok; /* The agi is ok for inodes */
+ __uint8_t pagf_levels[XFS_BTNUM_AGF];
+ /* # of levels in bno & cnt btree */
+ __uint32_t pagf_flcount; /* count of blocks in freelist */
+ xfs_extlen_t pagf_freeblks; /* total free blocks */
+ xfs_extlen_t pagf_longest; /* longest free space */
+ __uint32_t pagf_btreeblks; /* # of blocks held in AGF btrees */
+ xfs_agino_t pagi_freecount; /* number of free inodes */
+ xfs_agino_t pagi_count; /* number of allocated inodes */
+
+ /*
+ * Inode allocation search lookup optimisation.
+ * If the pagino matches, the search for new inodes
+ * doesn't need to search the near ones again straight away
+ */
+ xfs_agino_t pagl_pagino;
+ xfs_agino_t pagl_leftrec;
+ xfs_agino_t pagl_rightrec;
+ spinlock_t pagb_lock; /* lock for pagb_tree */
+ struct rb_root pagb_tree; /* ordered tree of busy extents */
+
+ atomic_t pagf_fstrms; /* # of filestreams active in this AG */
+
+ spinlock_t pag_ici_lock; /* incore inode cache lock */
+ struct radix_tree_root pag_ici_root; /* incore inode cache root */
+ int pag_ici_reclaimable; /* reclaimable inodes */
+ struct mutex pag_ici_reclaim_lock; /* serialisation point */
+ unsigned long pag_ici_reclaim_cursor; /* reclaim restart point */
+
+ /* buffer cache index */
+ spinlock_t pag_buf_lock; /* lock for pag_buf_tree */
+ struct rb_root pag_buf_tree; /* ordered tree of active buffers */
+
+ /* for rcu-safe freeing */
+ struct rcu_head rcu_head;
+ int pagb_count; /* pagb slots in use */
+} xfs_perag_t;
+
extern int xfs_log_sbcount(xfs_mount_t *);
extern __uint64_t xfs_default_resblks(xfs_mount_t *mp);
extern int xfs_mountfs(xfs_mount_t *mp);
diff --git a/fs/xfs/xfs_symlink_remote.c b/fs/xfs/xfs_symlink_remote.c
index a369025..a6ea18f 100644
--- a/fs/xfs/xfs_symlink_remote.c
+++ b/fs/xfs/xfs_symlink_remote.c
@@ -20,6 +20,7 @@
#include "xfs_fs.h"
#include "xfs_log.h"
#include "xfs_trans.h"
+#include "xfs_ag.h"
#include "xfs_sb.h"
#include "xfs_mount.h"
#include "xfs_bmap_btree.h"
--
1.7.10.4
* [PATCH 50/60] xfs: create xfs_bmap_util.[ch]
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (48 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 49/60] xfs: make struct xfs_perag kernel only Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:50 ` [PATCH 51/60] xfs: introduce xfs_quota_defs.h Dave Chinner
` (11 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
There is a bunch of code in xfs_bmap.c that is kernel specific and
not shared with userspace. To minimise the difference between the
kernel and userspace code, shift this unshared code to
xfs_bmap_util.c, and the declarations to xfs_bmap_util.h.
The biggest issue here is xfs_bmap_finish() - userspace has its own
definition of this function, and so we need to move it out of
xfs_bmap.[ch]. This means several other files need to include
xfs_bmap_util.h as well.
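Whichever side defines xfs_bmap_finish(), callers rely on the contract in
the comment on the moved function: *committed is set to 1 whenever the
transaction was committed and a new one started, even if an error is
returned, and to 0 when there was nothing to free. The sketch below models
only that contract. All demo_* names are hypothetical, and the commit
result is passed in as a parameter purely so the example can run without
a transaction subsystem.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these are XFS types. */
struct demo_trans { int id; };
struct demo_flist { int count; };	/* extents queued for freeing */

/*
 * Model of the calling convention only. 'ntp' plays the role of the
 * duplicated transaction and 'commit_error' the result of the commit.
 */
int demo_bmap_finish(struct demo_trans **tp, struct demo_trans *ntp,
		     struct demo_flist *flist, int commit_error,
		     int *committed)
{
	if (flist->count == 0) {
		*committed = 0;	/* nothing to free, same transaction */
		return 0;
	}

	*tp = ntp;		/* caller now holds the new transaction */
	*committed = 1;		/* reported even if the commit failed */
	if (commit_error)
		return commit_error;

	flist->count = 0;	/* all queued extents have been freed */
	return 0;
}
```

A caller therefore has to check *committed as well as the return value,
because after an error it may already be holding a different transaction.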
It also introduces an interesting dance for the stack switching
code in xfs_bmapi_allocate(). The stack switching/workqueue code is
actually moved to xfs_bmap_util.c, so that userspace can simply use
a #define in a header file to connect the dots without needing to
know about the stack switch code at all.
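The #define arrangement described above can be sketched roughly as
follows. This is an illustration under assumptions, not the code from
this patch: DEMO_KERNEL, struct demo_bmalloca and the demo_* functions are
hypothetical names, demo_bmapi_allocate_core() plays the role of
__xfs_bmapi_allocate(), and the "kernel" branch replaces the workqueue
hand-off with a plain call so that the example builds in userspace.

```c
/* Hypothetical, trimmed-down allocation argument structure. */
struct demo_bmalloca {
	int	stack_switch;	/* caller asked for a deeper stack */
	int	length;		/* blocks requested */
	int	switched;	/* set when the switching wrapper ran */
};

/* Shared core: identical in both builds. */
int demo_bmapi_allocate_core(struct demo_bmalloca *args)
{
	return args->length > 0 ? 0 : -1;
}

#ifdef DEMO_KERNEL
/*
 * "Kernel" build: a real wrapper decides whether to switch stacks.
 * The real code queues work and waits for its completion; this
 * stand-in only records that the wrapper path was taken.
 */
int demo_bmapi_allocate(struct demo_bmalloca *args)
{
	if (args->stack_switch)
		args->switched = 1;
	return demo_bmapi_allocate_core(args);
}
#else
/* "Userspace" build: the public name is just an alias for the core. */
#define demo_bmapi_allocate(args)	demo_bmapi_allocate_core(args)
#endif
```

Built without DEMO_KERNEL, callers still write demo_bmapi_allocate() but
reach the core function directly, so the shared code never needs to know
that a stack-switching variant exists.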
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_aops.c | 1 +
fs/xfs/xfs_attr.c | 1 +
fs/xfs/xfs_attr_remote.c | 1 +
fs/xfs/xfs_bmap.c | 448 +------------------------------------------
fs/xfs/xfs_bmap.h | 50 -----
fs/xfs/xfs_bmap_util.c | 478 ++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_bmap_util.h | 89 +++++++++
fs/xfs/xfs_dquot.c | 1 +
fs/xfs/xfs_extent_ops.c | 1 +
fs/xfs/xfs_filestream.c | 5 +-
fs/xfs/xfs_inode_ops.c | 1 +
fs/xfs/xfs_iomap.c | 1 +
fs/xfs/xfs_rtalloc.c | 1 +
fs/xfs/xfs_symlink.c | 1 +
fs/xfs/xfs_trans_resv.c | 1 +
16 files changed, 590 insertions(+), 491 deletions(-)
create mode 100644 fs/xfs/xfs_bmap_util.c
create mode 100644 fs/xfs/xfs_bmap_util.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5fbd5a1..c5c38a9 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -30,6 +30,7 @@ xfs-y += xfs_aops.o \
xfs_attr_inactive.o \
xfs_attr_list.o \
xfs_bit.o \
+ xfs_bmap_util.o \
xfs_buf.o \
xfs_dir2_readdir.o \
xfs_discard.o \
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 82d2bfd..69fdde8 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -30,6 +30,7 @@
#include "xfs_iomap.h"
#include "xfs_trace.h"
#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
#include <linux/aio.h>
#include <linux/gfp.h>
#include <linux/mpage.h>
diff --git a/fs/xfs/xfs_attr.c b/fs/xfs/xfs_attr.c
index f61d2bb..f4cc54c 100644
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -33,6 +33,7 @@
#include "xfs_alloc.h"
#include "xfs_inode_item.h"
#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
#include "xfs_attr.h"
#include "xfs_attr_leaf.h"
#include "xfs_attr_remote.h"
diff --git a/fs/xfs/xfs_attr_remote.c b/fs/xfs/xfs_attr_remote.c
index b8ab302..712a502 100644
--- a/fs/xfs/xfs_attr_remote.c
+++ b/fs/xfs/xfs_attr_remote.c
@@ -34,6 +34,7 @@
#include "xfs_alloc.h"
#include "xfs_inode_item.h"
#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
#include "xfs_attr.h"
#include "xfs_attr_leaf.h"
#include "xfs_attr_remote.h"
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 8f57042..2be9a04 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -40,6 +40,7 @@
#include "xfs_extfree_item.h"
#include "xfs_alloc.h"
#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
#include "xfs_rtalloc.h"
#include "xfs_error.h"
#include "xfs_attr_leaf.h"
@@ -108,19 +109,6 @@ xfs_bmap_compute_maxlevels(
mp->m_bm_maxlevels[whichfork] = level;
}
-/*
- * Convert the given file system block to a disk block. We have to treat it
- * differently based on whether the file is a real time file or not, because the
- * bmap code does.
- */
-xfs_daddr_t
-xfs_fsb_to_db(struct xfs_inode *ip, xfs_fsblock_t fsb)
-{
- return (XFS_IS_REALTIME_INODE(ip) ? \
- (xfs_daddr_t)XFS_FSB_TO_BB((ip)->i_mount, (fsb)) : \
- XFS_FSB_TO_DADDR((ip)->i_mount, (fsb)));
-}
-
STATIC int /* error */
xfs_bmbt_lookup_eq(
struct xfs_btree_cur *cur,
@@ -263,173 +251,6 @@ xfs_bmap_forkoff_reset(
}
/*
- * Extent tree block counting routines.
- */
-
-/*
- * Count leaf blocks given a range of extent records.
- */
-STATIC void
-xfs_bmap_count_leaves(
- xfs_ifork_t *ifp,
- xfs_extnum_t idx,
- int numrecs,
- int *count)
-{
- int b;
-
- for (b = 0; b < numrecs; b++) {
- xfs_bmbt_rec_host_t *frp = xfs_iext_get_ext(ifp, idx + b);
- *count += xfs_bmbt_get_blockcount(frp);
- }
-}
-
-/*
- * Count leaf blocks given a range of extent records originally
- * in btree format.
- */
-STATIC void
-xfs_bmap_disk_count_leaves(
- struct xfs_mount *mp,
- struct xfs_btree_block *block,
- int numrecs,
- int *count)
-{
- int b;
- xfs_bmbt_rec_t *frp;
-
- for (b = 1; b <= numrecs; b++) {
- frp = XFS_BMBT_REC_ADDR(mp, block, b);
- *count += xfs_bmbt_disk_get_blockcount(frp);
- }
-}
-
-/*
- * Recursively walks each level of a btree
- * to count total fsblocks is use.
- */
-STATIC int /* error */
-xfs_bmap_count_tree(
- xfs_mount_t *mp, /* file system mount point */
- xfs_trans_t *tp, /* transaction pointer */
- xfs_ifork_t *ifp, /* inode fork pointer */
- xfs_fsblock_t blockno, /* file system block number */
- int levelin, /* level in btree */
- int *count) /* Count of blocks */
-{
- int error;
- xfs_buf_t *bp, *nbp;
- int level = levelin;
- __be64 *pp;
- xfs_fsblock_t bno = blockno;
- xfs_fsblock_t nextbno;
- struct xfs_btree_block *block, *nextblock;
- int numrecs;
-
- error = xfs_btree_read_bufl(mp, tp, bno, 0, &bp, XFS_BMAP_BTREE_REF,
- &xfs_bmbt_buf_ops);
- if (error)
- return error;
- *count += 1;
- block = XFS_BUF_TO_BLOCK(bp);
-
- if (--level) {
- /* Not at node above leaves, count this level of nodes */
- nextbno = be64_to_cpu(block->bb_u.l.bb_rightsib);
- while (nextbno != NULLFSBLOCK) {
- error = xfs_btree_read_bufl(mp, tp, nextbno, 0, &nbp,
- XFS_BMAP_BTREE_REF,
- &xfs_bmbt_buf_ops);
- if (error)
- return error;
- *count += 1;
- nextblock = XFS_BUF_TO_BLOCK(nbp);
- nextbno = be64_to_cpu(nextblock->bb_u.l.bb_rightsib);
- xfs_trans_brelse(tp, nbp);
- }
-
- /* Dive to the next level */
- pp = XFS_BMBT_PTR_ADDR(mp, block, 1, mp->m_bmap_dmxr[1]);
- bno = be64_to_cpu(*pp);
- if (unlikely((error =
- xfs_bmap_count_tree(mp, tp, ifp, bno, level, count)) < 0)) {
- xfs_trans_brelse(tp, bp);
- XFS_ERROR_REPORT("xfs_bmap_count_tree(1)",
- XFS_ERRLEVEL_LOW, mp);
- return XFS_ERROR(EFSCORRUPTED);
- }
- xfs_trans_brelse(tp, bp);
- } else {
- /* count all level 1 nodes and their leaves */
- for (;;) {
- nextbno = be64_to_cpu(block->bb_u.l.bb_rightsib);
- numrecs = be16_to_cpu(block->bb_numrecs);
- xfs_bmap_disk_count_leaves(mp, block, numrecs, count);
- xfs_trans_brelse(tp, bp);
- if (nextbno == NULLFSBLOCK)
- break;
- bno = nextbno;
- error = xfs_btree_read_bufl(mp, tp, bno, 0, &bp,
- XFS_BMAP_BTREE_REF,
- &xfs_bmbt_buf_ops);
- if (error)
- return error;
- *count += 1;
- block = XFS_BUF_TO_BLOCK(bp);
- }
- }
- return 0;
-}
-
-/*
- * Count fsblocks of the given fork.
- */
-int /* error */
-xfs_bmap_count_blocks(
- xfs_trans_t *tp, /* transaction pointer */
- xfs_inode_t *ip, /* incore inode */
- int whichfork, /* data or attr fork */
- int *count) /* out: count of blocks */
-{
- struct xfs_btree_block *block; /* current btree block */
- xfs_fsblock_t bno; /* block # of "block" */
- xfs_ifork_t *ifp; /* fork structure */
- int level; /* btree level, for checking */
- xfs_mount_t *mp; /* file system mount structure */
- __be64 *pp; /* pointer to block address */
-
- bno = NULLFSBLOCK;
- mp = ip->i_mount;
- ifp = XFS_IFORK_PTR(ip, whichfork);
- if ( XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS ) {
- xfs_bmap_count_leaves(ifp, 0,
- ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t),
- count);
- return 0;
- }
-
- /*
- * Root level must use BMAP_BROOT_PTR_ADDR macro to get ptr out.
- */
- block = ifp->if_broot;
- level = be16_to_cpu(block->bb_level);
- ASSERT(level > 0);
- pp = XFS_BMAP_BROOT_PTR_ADDR(mp, block, 1, ifp->if_broot_bytes);
- bno = be64_to_cpu(*pp);
- ASSERT(bno != NULLDFSBNO);
- ASSERT(XFS_FSB_TO_AGNO(mp, bno) < mp->m_sb.sb_agcount);
- ASSERT(XFS_FSB_TO_AGBNO(mp, bno) < mp->m_sb.sb_agblocks);
-
- if (unlikely(xfs_bmap_count_tree(mp, tp, ifp, bno, level, count) < 0)) {
- XFS_ERROR_REPORT("xfs_bmap_count_blocks(2)", XFS_ERRLEVEL_LOW,
- mp);
- return XFS_ERROR(EFSCORRUPTED);
- }
-
- return 0;
-}
-
-/*
* Debug/sanity checking code
*/
@@ -823,7 +644,7 @@ xfs_bmap_add_free(
* Remove the entry "free" from the free item list. Prev points to the
* previous entry, unless "free" is the head of the list.
*/
-STATIC void
+void
xfs_bmap_del_free(
xfs_bmap_free_t *flist, /* free item list header */
xfs_bmap_free_item_t *prev, /* previous item on list, if any */
@@ -837,92 +658,6 @@ xfs_bmap_del_free(
kmem_zone_free(xfs_bmap_free_item_zone, free);
}
-
-/*
- * Routine to be called at transaction's end by xfs_bmapi, xfs_bunmapi
- * caller. Frees all the extents that need freeing, which must be done
- * last due to locking considerations. We never free any extents in
- * the first transaction.
- *
- * Return 1 if the given transaction was committed and a new one
- * started, and 0 otherwise in the committed parameter.
- */
-int /* error */
-xfs_bmap_finish(
- xfs_trans_t **tp, /* transaction pointer addr */
- xfs_bmap_free_t *flist, /* i/o: list extents to free */
- int *committed) /* xact committed or not */
-{
- xfs_efd_log_item_t *efd; /* extent free data */
- xfs_efi_log_item_t *efi; /* extent free intention */
- int error; /* error return value */
- xfs_bmap_free_item_t *free; /* free extent item */
- unsigned int logres; /* new log reservation */
- unsigned int logcount; /* new log count */
- xfs_mount_t *mp; /* filesystem mount structure */
- xfs_bmap_free_item_t *next; /* next item on free list */
- xfs_trans_t *ntp; /* new transaction pointer */
-
- ASSERT((*tp)->t_flags & XFS_TRANS_PERM_LOG_RES);
- if (flist->xbf_count == 0) {
- *committed = 0;
- return 0;
- }
- ntp = *tp;
- efi = xfs_trans_get_efi(ntp, flist->xbf_count);
- for (free = flist->xbf_first; free; free = free->xbfi_next)
- xfs_trans_log_efi_extent(ntp, efi, free->xbfi_startblock,
- free->xbfi_blockcount);
- logres = ntp->t_log_res;
- logcount = ntp->t_log_count;
- ntp = xfs_trans_dup(*tp);
- error = xfs_trans_commit(*tp, 0);
- *tp = ntp;
- *committed = 1;
- /*
- * We have a new transaction, so we should return committed=1,
- * even though we're returning an error.
- */
- if (error)
- return error;
-
- /*
- * transaction commit worked ok so we can drop the extra ticket
- * reference that we gained in xfs_trans_dup()
- */
- xfs_log_ticket_put(ntp->t_ticket);
-
- if ((error = xfs_trans_reserve(ntp, 0, logres, 0, XFS_TRANS_PERM_LOG_RES,
- logcount)))
- return error;
- efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count);
- for (free = flist->xbf_first; free != NULL; free = next) {
- next = free->xbfi_next;
- if ((error = xfs_free_extent(ntp, free->xbfi_startblock,
- free->xbfi_blockcount))) {
- /*
- * The bmap free list will be cleaned up at a
- * higher level. The EFI will be canceled when
- * this transaction is aborted.
- * Need to force shutdown here to make sure it
- * happens, since this transaction may not be
- * dirty yet.
- */
- mp = ntp->t_mountp;
- if (!XFS_FORCED_SHUTDOWN(mp))
- xfs_force_shutdown(mp,
- (error == EFSCORRUPTED) ?
- SHUTDOWN_CORRUPT_INCORE :
- SHUTDOWN_META_IO_ERROR);
- return error;
- }
- xfs_trans_log_efd_extent(ntp, efd, free->xbfi_startblock,
- free->xbfi_blockcount);
- xfs_bmap_del_free(flist, NULL, free);
- }
- return 0;
-}
-
/*
* Free up any items left in the list.
*/
@@ -1863,7 +1598,7 @@ xfs_bmap_last_before(
return 0;
}
-STATIC int
+int
xfs_bmap_last_extent(
struct xfs_trans *tp,
struct xfs_inode *ip,
@@ -1927,29 +1662,6 @@ xfs_bmap_isaeof(
}
/*
- * Check if the endoff is outside the last extent. If so the caller will grow
- * the allocation to a stripe unit boundary. All offsets are considered outside
- * the end of file for an empty fork, so 1 is returned in *eof in that case.
- */
-int
-xfs_bmap_eof(
- struct xfs_inode *ip,
- xfs_fileoff_t endoff,
- int whichfork,
- int *eof)
-{
- struct xfs_bmbt_irec rec;
- int error;
-
- error = xfs_bmap_last_extent(NULL, ip, whichfork, &rec, eof);
- if (error || *eof)
- return error;
-
- *eof = endoff >= rec.br_startoff + rec.br_blockcount;
- return 0;
-}
-
-/*
* Returns the file-relative block number of the first block past eof in
* the file. This is not based on i_size, it is based on the extent records.
* Returns 0 for local files, as they do not have extent records.
@@ -3488,7 +3200,7 @@ done:
/*
* Adjust the size of the new extent based on di_extsize and rt extsize.
*/
-STATIC int
+int
xfs_bmap_extsize_align(
xfs_mount_t *mp,
xfs_bmbt_irec_t *gotp, /* next extent pointer */
@@ -3650,9 +3362,9 @@ xfs_bmap_extsize_align(
#define XFS_ALLOC_GAP_UNITS 4
-STATIC void
+void
xfs_bmap_adjacent(
- xfs_bmalloca_t *ap) /* bmap alloc argument struct */
+ struct xfs_bmalloca *ap) /* bmap alloc argument struct */
{
xfs_fsblock_t adjust; /* adjustment to block numbers */
xfs_agnumber_t fb_agno; /* ag number of ap->firstblock */
@@ -3799,109 +3511,6 @@ xfs_bmap_adjacent(
}
STATIC int
-xfs_bmap_rtalloc(
- xfs_bmalloca_t *ap) /* bmap alloc argument struct */
-{
- xfs_alloctype_t atype = 0; /* type for allocation routines */
- int error; /* error return value */
- xfs_mount_t *mp; /* mount point structure */
- xfs_extlen_t prod = 0; /* product factor for allocators */
- xfs_extlen_t ralen = 0; /* realtime allocation length */
- xfs_extlen_t align; /* minimum allocation alignment */
- xfs_rtblock_t rtb;
-
- mp = ap->ip->i_mount;
- align = xfs_get_extsz_hint(ap->ip);
- prod = align / mp->m_sb.sb_rextsize;
- error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev,
- align, 1, ap->eof, 0,
- ap->conv, &ap->offset, &ap->length);
- if (error)
- return error;
- ASSERT(ap->length);
- ASSERT(ap->length % mp->m_sb.sb_rextsize == 0);
-
- /*
- * If the offset & length are not perfectly aligned
- * then kill prod, it will just get us in trouble.
- */
- if (do_mod(ap->offset, align) || ap->length % align)
- prod = 1;
- /*
- * Set ralen to be the actual requested length in rtextents.
- */
- ralen = ap->length / mp->m_sb.sb_rextsize;
- /*
- * If the old value was close enough to MAXEXTLEN that
- * we rounded up to it, cut it back so it's valid again.
- * Note that if it's a really large request (bigger than
- * MAXEXTLEN), we don't hear about that number, and can't
- * adjust the starting point to match it.
- */
- if (ralen * mp->m_sb.sb_rextsize >= MAXEXTLEN)
- ralen = MAXEXTLEN / mp->m_sb.sb_rextsize;
-
- /*
- * Lock out other modifications to the RT bitmap inode.
- */
- xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(ap->tp, mp->m_rbmip, XFS_ILOCK_EXCL);
-
- /*
- * If it's an allocation to an empty file at offset 0,
- * pick an extent that will space things out in the rt area.
- */
- if (ap->eof && ap->offset == 0) {
- xfs_rtblock_t uninitialized_var(rtx); /* realtime extent no */
-
- error = xfs_rtpick_extent(mp, ap->tp, ralen, &rtx);
- if (error)
- return error;
- ap->blkno = rtx * mp->m_sb.sb_rextsize;
- } else {
- ap->blkno = 0;
- }
-
- xfs_bmap_adjacent(ap);
-
- /*
- * Realtime allocation, done through xfs_rtallocate_extent.
- */
- atype = ap->blkno == 0 ? XFS_ALLOCTYPE_ANY_AG : XFS_ALLOCTYPE_NEAR_BNO;
- do_div(ap->blkno, mp->m_sb.sb_rextsize);
- rtb = ap->blkno;
- ap->length = ralen;
- if ((error = xfs_rtallocate_extent(ap->tp, ap->blkno, 1, ap->length,
- &ralen, atype, ap->wasdel, prod, &rtb)))
- return error;
- if (rtb == NULLFSBLOCK && prod > 1 &&
- (error = xfs_rtallocate_extent(ap->tp, ap->blkno, 1,
- ap->length, &ralen, atype,
- ap->wasdel, 1, &rtb)))
- return error;
- ap->blkno = rtb;
- if (ap->blkno != NULLFSBLOCK) {
- ap->blkno *= mp->m_sb.sb_rextsize;
- ralen *= mp->m_sb.sb_rextsize;
- ap->length = ralen;
- ap->ip->i_d.di_nblocks += ralen;
- xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
- if (ap->wasdel)
- ap->ip->i_delayed_blks -= ralen;
- /*
- * Adjust the disk quota also. This was reserved
- * earlier.
- */
- xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
- ap->wasdel ? XFS_TRANS_DQ_DELRTBCOUNT :
- XFS_TRANS_DQ_RTBCOUNT, (long) ralen);
- } else {
- ap->length = 0;
- }
- return 0;
-}
-
-STATIC int
xfs_bmap_btalloc_nullfb(
struct xfs_bmalloca *ap,
struct xfs_alloc_arg *args,
@@ -4018,7 +3627,7 @@ xfs_bmap_btalloc_nullfb(
STATIC int
xfs_bmap_btalloc(
- xfs_bmalloca_t *ap) /* bmap alloc argument struct */
+ struct xfs_bmalloca *ap) /* bmap alloc argument struct */
{
xfs_mount_t *mp; /* mount point structure */
xfs_alloctype_t atype = 0; /* type for allocation routines */
@@ -4250,7 +3859,7 @@ xfs_bmap_btalloc(
*/
STATIC int
xfs_bmap_alloc(
- xfs_bmalloca_t *ap) /* bmap alloc argument struct */
+ struct xfs_bmalloca *ap) /* bmap alloc argument struct */
{
if (XFS_IS_REALTIME_INODE(ap->ip) && ap->userdata)
return xfs_bmap_rtalloc(ap);
@@ -4638,7 +4247,7 @@ xfs_bmapi_delay(
}
-STATIC int
+int
__xfs_bmapi_allocate(
struct xfs_bmalloca *bma)
{
@@ -4753,45 +4362,6 @@ __xfs_bmapi_allocate(
return 0;
}
-static void
-xfs_bmapi_allocate_worker(
- struct work_struct *work)
-{
- struct xfs_bmalloca *args = container_of(work,
- struct xfs_bmalloca, work);
- unsigned long pflags;
-
- /* we are in a transaction context here */
- current_set_flags_nested(&pflags, PF_FSTRANS);
-
- args->result = __xfs_bmapi_allocate(args);
- complete(args->done);
-
- current_restore_flags_nested(&pflags, PF_FSTRANS);
-}
-
-/*
- * Some allocation requests often come in with little stack to work on. Push
- * them off to a worker thread so there is lots of stack to use. Otherwise just
- * call directly to avoid the context switch overhead here.
- */
-int
-xfs_bmapi_allocate(
- struct xfs_bmalloca *args)
-{
- DECLARE_COMPLETION_ONSTACK(done);
-
- if (!args->stack_switch)
- return __xfs_bmapi_allocate(args);
-
-
- args->done = &done;
- INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker);
- queue_work(xfs_alloc_wq, &args->work);
- wait_for_completion(&done);
- return args->result;
-}
-
STATIC int
xfs_bmapi_convert_unwritten(
struct xfs_bmalloca *bma,
diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
index a2e50c6..a64d6f7 100644
--- a/fs/xfs/xfs_bmap.h
+++ b/fs/xfs/xfs_bmap.h
@@ -107,43 +107,6 @@ static inline void xfs_bmap_init(xfs_bmap_free_t *flp, xfs_fsblock_t *fbp)
}
/*
- * Argument structure for xfs_bmap_alloc.
- */
-typedef struct xfs_bmalloca {
- xfs_fsblock_t *firstblock; /* i/o first block allocated */
- struct xfs_bmap_free *flist; /* bmap freelist */
- struct xfs_trans *tp; /* transaction pointer */
- struct xfs_inode *ip; /* incore inode pointer */
- struct xfs_bmbt_irec prev; /* extent before the new one */
- struct xfs_bmbt_irec got; /* extent after, or delayed */
-
- xfs_fileoff_t offset; /* offset in file filling in */
- xfs_extlen_t length; /* i/o length asked/allocated */
- xfs_fsblock_t blkno; /* starting block of new extent */
-
- struct xfs_btree_cur *cur; /* btree cursor */
- xfs_extnum_t idx; /* current extent index */
- int nallocs;/* number of extents alloc'd */
- int logflags;/* flags for transaction logging */
-
- xfs_extlen_t total; /* total blocks needed for xaction */
- xfs_extlen_t minlen; /* minimum allocation size (blocks) */
- xfs_extlen_t minleft; /* amount must be left after alloc */
- char eof; /* set if allocating past last extent */
- char wasdel; /* replacing a delayed allocation */
- char userdata;/* set if is user data */
- char aeof; /* allocated space at eof */
- char conv; /* overwriting unwritten extents */
- char stack_switch;
- int flags;
-#ifdef __KERNEL__
- struct completion *done;
- struct work_struct work;
- int result;
-#endif /* __KERNEL__ */
-} xfs_bmalloca_t;
-
-/*
* Flags for xfs_bmap_add_extent*.
*/
#define BMAP_LEFT_CONTIG (1 << 0)
@@ -206,17 +169,4 @@ int xfs_check_nostate_extents(struct xfs_ifork *ifp, xfs_extnum_t idx,
xfs_extnum_t num);
uint xfs_default_attroffset(struct xfs_inode *ip);
-#ifdef __KERNEL__
-
-int xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist,
- int *committed);
-int xfs_bmap_eof(struct xfs_inode *ip, xfs_fileoff_t endoff,
- int whichfork, int *eof);
-int xfs_bmap_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip,
- int whichfork, int *count);
-
-xfs_daddr_t xfs_fsb_to_db(struct xfs_inode *ip, xfs_fsblock_t fsb);
-
-#endif /* __KERNEL__ */
-
#endif /* __XFS_BMAP_H__ */
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
new file mode 100644
index 0000000..0571a1c
--- /dev/null
+++ b/fs/xfs/xfs_bmap_util.c
@@ -0,0 +1,478 @@
+/*
+ * Copyright (c) 2000-2006 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_inum.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_btree.h"
+#include "xfs_extfree_item.h"
+#include "xfs_alloc.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_rtalloc.h"
+#include "xfs_error.h"
+#include "xfs_quota.h"
+#include "xfs_trans_space.h"
+#include "xfs_trace.h"
+
+/* Kernel only BMAP related definitions and functions */
+
+/*
+ * Convert the given file system block to a disk block. We have to treat it
+ * differently based on whether the file is a real time file or not, because the
+ * bmap code does.
+ */
+xfs_daddr_t
+xfs_fsb_to_db(struct xfs_inode *ip, xfs_fsblock_t fsb)
+{
+ return (XFS_IS_REALTIME_INODE(ip) ? \
+ (xfs_daddr_t)XFS_FSB_TO_BB((ip)->i_mount, (fsb)) : \
+ XFS_FSB_TO_DADDR((ip)->i_mount, (fsb)));
+}
+
+/*
+ * Routine to be called at transaction's end by xfs_bmapi, xfs_bunmapi
+ * caller. Frees all the extents that need freeing, which must be done
+ * last due to locking considerations. We never free any extents in
+ * the first transaction.
+ *
+ * Return 1 if the given transaction was committed and a new one
+ * started, and 0 otherwise in the committed parameter.
+ */
+int /* error */
+xfs_bmap_finish(
+ xfs_trans_t **tp, /* transaction pointer addr */
+ xfs_bmap_free_t *flist, /* i/o: list extents to free */
+ int *committed) /* xact committed or not */
+{
+ xfs_efd_log_item_t *efd; /* extent free data */
+ xfs_efi_log_item_t *efi; /* extent free intention */
+ int error; /* error return value */
+ xfs_bmap_free_item_t *free; /* free extent item */
+ unsigned int logres; /* new log reservation */
+ unsigned int logcount; /* new log count */
+ xfs_mount_t *mp; /* filesystem mount structure */
+ xfs_bmap_free_item_t *next; /* next item on free list */
+ xfs_trans_t *ntp; /* new transaction pointer */
+
+ ASSERT((*tp)->t_flags & XFS_TRANS_PERM_LOG_RES);
+ if (flist->xbf_count == 0) {
+ *committed = 0;
+ return 0;
+ }
+ ntp = *tp;
+ efi = xfs_trans_get_efi(ntp, flist->xbf_count);
+ for (free = flist->xbf_first; free; free = free->xbfi_next)
+ xfs_trans_log_efi_extent(ntp, efi, free->xbfi_startblock,
+ free->xbfi_blockcount);
+ logres = ntp->t_log_res;
+ logcount = ntp->t_log_count;
+ ntp = xfs_trans_dup(*tp);
+ error = xfs_trans_commit(*tp, 0);
+ *tp = ntp;
+ *committed = 1;
+ /*
+ * We have a new transaction, so we should return committed=1,
+ * even though we're returning an error.
+ */
+ if (error)
+ return error;
+
+ /*
+ * transaction commit worked ok so we can drop the extra ticket
+ * reference that we gained in xfs_trans_dup()
+ */
+ xfs_log_ticket_put(ntp->t_ticket);
+
+ if ((error = xfs_trans_reserve(ntp, 0, logres, 0, XFS_TRANS_PERM_LOG_RES,
+ logcount)))
+ return error;
+ efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count);
+ for (free = flist->xbf_first; free != NULL; free = next) {
+ next = free->xbfi_next;
+ if ((error = xfs_free_extent(ntp, free->xbfi_startblock,
+ free->xbfi_blockcount))) {
+ /*
+ * The bmap free list will be cleaned up at a
+ * higher level. The EFI will be canceled when
+ * this transaction is aborted.
+ * Need to force shutdown here to make sure it
+ * happens, since this transaction may not be
+ * dirty yet.
+ */
+ mp = ntp->t_mountp;
+ if (!XFS_FORCED_SHUTDOWN(mp))
+ xfs_force_shutdown(mp,
+ (error == EFSCORRUPTED) ?
+ SHUTDOWN_CORRUPT_INCORE :
+ SHUTDOWN_META_IO_ERROR);
+ return error;
+ }
+ xfs_trans_log_efd_extent(ntp, efd, free->xbfi_startblock,
+ free->xbfi_blockcount);
+ xfs_bmap_del_free(flist, NULL, free);
+ }
+ return 0;
+}
+
+int
+xfs_bmap_rtalloc(
+ struct xfs_bmalloca *ap) /* bmap alloc argument struct */
+{
+ xfs_alloctype_t atype = 0; /* type for allocation routines */
+ int error; /* error return value */
+ xfs_mount_t *mp; /* mount point structure */
+ xfs_extlen_t prod = 0; /* product factor for allocators */
+ xfs_extlen_t ralen = 0; /* realtime allocation length */
+ xfs_extlen_t align; /* minimum allocation alignment */
+ xfs_rtblock_t rtb;
+
+ mp = ap->ip->i_mount;
+ align = xfs_get_extsz_hint(ap->ip);
+ prod = align / mp->m_sb.sb_rextsize;
+ error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev,
+ align, 1, ap->eof, 0,
+ ap->conv, &ap->offset, &ap->length);
+ if (error)
+ return error;
+ ASSERT(ap->length);
+ ASSERT(ap->length % mp->m_sb.sb_rextsize == 0);
+
+ /*
+ * If the offset & length are not perfectly aligned
+ * then kill prod, it will just get us in trouble.
+ */
+ if (do_mod(ap->offset, align) || ap->length % align)
+ prod = 1;
+ /*
+ * Set ralen to be the actual requested length in rtextents.
+ */
+ ralen = ap->length / mp->m_sb.sb_rextsize;
+ /*
+ * If the old value was close enough to MAXEXTLEN that
+ * we rounded up to it, cut it back so it's valid again.
+ * Note that if it's a really large request (bigger than
+ * MAXEXTLEN), we don't hear about that number, and can't
+ * adjust the starting point to match it.
+ */
+ if (ralen * mp->m_sb.sb_rextsize >= MAXEXTLEN)
+ ralen = MAXEXTLEN / mp->m_sb.sb_rextsize;
+
+ /*
+ * Lock out other modifications to the RT bitmap inode.
+ */
+ xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(ap->tp, mp->m_rbmip, XFS_ILOCK_EXCL);
+
+ /*
+ * If it's an allocation to an empty file at offset 0,
+ * pick an extent that will space things out in the rt area.
+ */
+ if (ap->eof && ap->offset == 0) {
+ xfs_rtblock_t uninitialized_var(rtx); /* realtime extent no */
+
+ error = xfs_rtpick_extent(mp, ap->tp, ralen, &rtx);
+ if (error)
+ return error;
+ ap->blkno = rtx * mp->m_sb.sb_rextsize;
+ } else {
+ ap->blkno = 0;
+ }
+
+ xfs_bmap_adjacent(ap);
+
+ /*
+ * Realtime allocation, done through xfs_rtallocate_extent.
+ */
+ atype = ap->blkno == 0 ? XFS_ALLOCTYPE_ANY_AG : XFS_ALLOCTYPE_NEAR_BNO;
+ do_div(ap->blkno, mp->m_sb.sb_rextsize);
+ rtb = ap->blkno;
+ ap->length = ralen;
+ if ((error = xfs_rtallocate_extent(ap->tp, ap->blkno, 1, ap->length,
+ &ralen, atype, ap->wasdel, prod, &rtb)))
+ return error;
+ if (rtb == NULLFSBLOCK && prod > 1 &&
+ (error = xfs_rtallocate_extent(ap->tp, ap->blkno, 1,
+ ap->length, &ralen, atype,
+ ap->wasdel, 1, &rtb)))
+ return error;
+ ap->blkno = rtb;
+ if (ap->blkno != NULLFSBLOCK) {
+ ap->blkno *= mp->m_sb.sb_rextsize;
+ ralen *= mp->m_sb.sb_rextsize;
+ ap->length = ralen;
+ ap->ip->i_d.di_nblocks += ralen;
+ xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
+ if (ap->wasdel)
+ ap->ip->i_delayed_blks -= ralen;
+ /*
+ * Adjust the disk quota also. This was reserved
+ * earlier.
+ */
+ xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
+ ap->wasdel ? XFS_TRANS_DQ_DELRTBCOUNT :
+ XFS_TRANS_DQ_RTBCOUNT, (long) ralen);
+ } else {
+ ap->length = 0;
+ }
+ return 0;
+}
+
+/*
+ * Stack switching interfaces for allocation
+ */
+static void
+xfs_bmapi_allocate_worker(
+ struct work_struct *work)
+{
+ struct xfs_bmalloca *args = container_of(work,
+ struct xfs_bmalloca, work);
+ unsigned long pflags;
+
+ /* we are in a transaction context here */
+ current_set_flags_nested(&pflags, PF_FSTRANS);
+
+ args->result = __xfs_bmapi_allocate(args);
+ complete(args->done);
+
+ current_restore_flags_nested(&pflags, PF_FSTRANS);
+}
+
+/*
+ * Some allocation requests often come in with little stack to work on. Push
+ * them off to a worker thread so there is lots of stack to use. Otherwise just
+ * call directly to avoid the context switch overhead here.
+ */
+int
+xfs_bmapi_allocate(
+ struct xfs_bmalloca *args)
+{
+ DECLARE_COMPLETION_ONSTACK(done);
+
+ if (!args->stack_switch)
+ return __xfs_bmapi_allocate(args);
+
+
+ args->done = &done;
+ INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker);
+ queue_work(xfs_alloc_wq, &args->work);
+ wait_for_completion(&done);
+ return args->result;
+}
+
+/*
+ * Check if the endoff is outside the last extent. If so the caller will grow
+ * the allocation to a stripe unit boundary. All offsets are considered outside
+ * the end of file for an empty fork, so 1 is returned in *eof in that case.
+ */
+int
+xfs_bmap_eof(
+ struct xfs_inode *ip,
+ xfs_fileoff_t endoff,
+ int whichfork,
+ int *eof)
+{
+ struct xfs_bmbt_irec rec;
+ int error;
+
+ error = xfs_bmap_last_extent(NULL, ip, whichfork, &rec, eof);
+ if (error || *eof)
+ return error;
+
+ *eof = endoff >= rec.br_startoff + rec.br_blockcount;
+ return 0;
+}
+
+/*
+ * Extent tree block counting routines.
+ */
+
+/*
+ * Count leaf blocks given a range of extent records.
+ */
+STATIC void
+xfs_bmap_count_leaves(
+ xfs_ifork_t *ifp,
+ xfs_extnum_t idx,
+ int numrecs,
+ int *count)
+{
+ int b;
+
+ for (b = 0; b < numrecs; b++) {
+ xfs_bmbt_rec_host_t *frp = xfs_iext_get_ext(ifp, idx + b);
+ *count += xfs_bmbt_get_blockcount(frp);
+ }
+}
+
+/*
+ * Count leaf blocks given a range of extent records originally
+ * in btree format.
+ */
+STATIC void
+xfs_bmap_disk_count_leaves(
+ struct xfs_mount *mp,
+ struct xfs_btree_block *block,
+ int numrecs,
+ int *count)
+{
+ int b;
+ xfs_bmbt_rec_t *frp;
+
+ for (b = 1; b <= numrecs; b++) {
+ frp = XFS_BMBT_REC_ADDR(mp, block, b);
+ *count += xfs_bmbt_disk_get_blockcount(frp);
+ }
+}
+
+/*
+ * Recursively walks each level of a btree
+ * to count total fsblocks in use.
+ */
+STATIC int /* error */
+xfs_bmap_count_tree(
+ xfs_mount_t *mp, /* file system mount point */
+ xfs_trans_t *tp, /* transaction pointer */
+ xfs_ifork_t *ifp, /* inode fork pointer */
+ xfs_fsblock_t blockno, /* file system block number */
+ int levelin, /* level in btree */
+ int *count) /* Count of blocks */
+{
+ int error;
+ xfs_buf_t *bp, *nbp;
+ int level = levelin;
+ __be64 *pp;
+ xfs_fsblock_t bno = blockno;
+ xfs_fsblock_t nextbno;
+ struct xfs_btree_block *block, *nextblock;
+ int numrecs;
+
+ error = xfs_btree_read_bufl(mp, tp, bno, 0, &bp, XFS_BMAP_BTREE_REF,
+ &xfs_bmbt_buf_ops);
+ if (error)
+ return error;
+ *count += 1;
+ block = XFS_BUF_TO_BLOCK(bp);
+
+ if (--level) {
+ /* Not at node above leaves, count this level of nodes */
+ nextbno = be64_to_cpu(block->bb_u.l.bb_rightsib);
+ while (nextbno != NULLFSBLOCK) {
+ error = xfs_btree_read_bufl(mp, tp, nextbno, 0, &nbp,
+ XFS_BMAP_BTREE_REF,
+ &xfs_bmbt_buf_ops);
+ if (error)
+ return error;
+ *count += 1;
+ nextblock = XFS_BUF_TO_BLOCK(nbp);
+ nextbno = be64_to_cpu(nextblock->bb_u.l.bb_rightsib);
+ xfs_trans_brelse(tp, nbp);
+ }
+
+ /* Dive to the next level */
+ pp = XFS_BMBT_PTR_ADDR(mp, block, 1, mp->m_bmap_dmxr[1]);
+ bno = be64_to_cpu(*pp);
+ if (unlikely((error =
+ xfs_bmap_count_tree(mp, tp, ifp, bno, level, count)) < 0)) {
+ xfs_trans_brelse(tp, bp);
+ XFS_ERROR_REPORT("xfs_bmap_count_tree(1)",
+ XFS_ERRLEVEL_LOW, mp);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+ xfs_trans_brelse(tp, bp);
+ } else {
+ /* count all level 1 nodes and their leaves */
+ for (;;) {
+ nextbno = be64_to_cpu(block->bb_u.l.bb_rightsib);
+ numrecs = be16_to_cpu(block->bb_numrecs);
+ xfs_bmap_disk_count_leaves(mp, block, numrecs, count);
+ xfs_trans_brelse(tp, bp);
+ if (nextbno == NULLFSBLOCK)
+ break;
+ bno = nextbno;
+ error = xfs_btree_read_bufl(mp, tp, bno, 0, &bp,
+ XFS_BMAP_BTREE_REF,
+ &xfs_bmbt_buf_ops);
+ if (error)
+ return error;
+ *count += 1;
+ block = XFS_BUF_TO_BLOCK(bp);
+ }
+ }
+ return 0;
+}
+
+/*
+ * Count fsblocks of the given fork.
+ */
+int /* error */
+xfs_bmap_count_blocks(
+ xfs_trans_t *tp, /* transaction pointer */
+ xfs_inode_t *ip, /* incore inode */
+ int whichfork, /* data or attr fork */
+ int *count) /* out: count of blocks */
+{
+ struct xfs_btree_block *block; /* current btree block */
+ xfs_fsblock_t bno; /* block # of "block" */
+ xfs_ifork_t *ifp; /* fork structure */
+ int level; /* btree level, for checking */
+ xfs_mount_t *mp; /* file system mount structure */
+ __be64 *pp; /* pointer to block address */
+
+ bno = NULLFSBLOCK;
+ mp = ip->i_mount;
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+ if ( XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS ) {
+ xfs_bmap_count_leaves(ifp, 0,
+ ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t),
+ count);
+ return 0;
+ }
+
+ /*
+ * Root level must use BMAP_BROOT_PTR_ADDR macro to get ptr out.
+ */
+ block = ifp->if_broot;
+ level = be16_to_cpu(block->bb_level);
+ ASSERT(level > 0);
+ pp = XFS_BMAP_BROOT_PTR_ADDR(mp, block, 1, ifp->if_broot_bytes);
+ bno = be64_to_cpu(*pp);
+ ASSERT(bno != NULLDFSBNO);
+ ASSERT(XFS_FSB_TO_AGNO(mp, bno) < mp->m_sb.sb_agcount);
+ ASSERT(XFS_FSB_TO_AGBNO(mp, bno) < mp->m_sb.sb_agblocks);
+
+ if (unlikely(xfs_bmap_count_tree(mp, tp, ifp, bno, level, count) < 0)) {
+ XFS_ERROR_REPORT("xfs_bmap_count_blocks(2)", XFS_ERRLEVEL_LOW,
+ mp);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+ return 0;
+}
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
new file mode 100644
index 0000000..cb1493a
--- /dev/null
+++ b/fs/xfs/xfs_bmap_util.h
@@ -0,0 +1,89 @@
+/*
+ * Copyright (c) 2000-2006 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_BMAP_UTIL_H__
+#define __XFS_BMAP_UTIL_H__
+
+/* Kernel only BMAP related definitions and functions */
+
+struct xfs_bmbt_irec;
+struct xfs_ifork;
+struct xfs_inode;
+struct xfs_mount;
+struct xfs_trans;
+
+/*
+ * Argument structure for xfs_bmap_alloc.
+ */
+struct xfs_bmalloca {
+ xfs_fsblock_t *firstblock; /* i/o first block allocated */
+ struct xfs_bmap_free *flist; /* bmap freelist */
+ struct xfs_trans *tp; /* transaction pointer */
+ struct xfs_inode *ip; /* incore inode pointer */
+ struct xfs_bmbt_irec prev; /* extent before the new one */
+ struct xfs_bmbt_irec got; /* extent after, or delayed */
+
+ xfs_fileoff_t offset; /* offset in file filling in */
+ xfs_extlen_t length; /* i/o length asked/allocated */
+ xfs_fsblock_t blkno; /* starting block of new extent */
+
+ struct xfs_btree_cur *cur; /* btree cursor */
+ xfs_extnum_t idx; /* current extent index */
+ int nallocs;/* number of extents alloc'd */
+ int logflags;/* flags for transaction logging */
+
+ xfs_extlen_t total; /* total blocks needed for xaction */
+ xfs_extlen_t minlen; /* minimum allocation size (blocks) */
+ xfs_extlen_t minleft; /* amount must be left after alloc */
+ char eof; /* set if allocating past last extent */
+ char wasdel; /* replacing a delayed allocation */
+ char userdata;/* set if is user data */
+ char aeof; /* allocated space at eof */
+ char conv; /* overwriting unwritten extents */
+ char stack_switch;
+ int flags;
+ struct completion *done;
+ struct work_struct work;
+ int result;
+};
+
+int xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist,
+ int *committed);
+int xfs_bmap_rtalloc(struct xfs_bmalloca *ap);
+int xfs_bmapi_allocate(struct xfs_bmalloca *args);
+int __xfs_bmapi_allocate(struct xfs_bmalloca *args);
+int xfs_bmap_eof(struct xfs_inode *ip, xfs_fileoff_t endoff,
+ int whichfork, int *eof);
+int xfs_bmap_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip,
+ int whichfork, int *count);
+
+/* functions in xfs_bmap.c that are only needed by xfs_bmap_util.c */
+void xfs_bmap_del_free(struct xfs_bmap_free *flist,
+ struct xfs_bmap_free_item *prev,
+ struct xfs_bmap_free_item *free);
+int xfs_bmap_extsize_align(struct xfs_mount *mp, struct xfs_bmbt_irec *gotp,
+ struct xfs_bmbt_irec *prevp, xfs_extlen_t extsz,
+ int rt, int eof, int delay, int convert,
+ xfs_fileoff_t *offp, xfs_extlen_t *lenp);
+void xfs_bmap_adjacent(struct xfs_bmalloca *ap);
+int xfs_bmap_last_extent(struct xfs_trans *tp, struct xfs_inode *ip,
+ int whichfork, struct xfs_bmbt_irec *rec,
+ int *is_empty);
+
+xfs_daddr_t xfs_fsb_to_db(struct xfs_inode *ip, xfs_fsblock_t fsb);
+
+#endif /* __XFS_BMAP_UTIL_H__ */
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 044e97a..464f91c 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -28,6 +28,7 @@
#include "xfs_bmap_btree.h"
#include "xfs_inode.h"
#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
#include "xfs_rtalloc.h"
#include "xfs_error.h"
#include "xfs_itable.h"
diff --git a/fs/xfs/xfs_extent_ops.c b/fs/xfs/xfs_extent_ops.c
index 170b10b..f7e1f14 100644
--- a/fs/xfs/xfs_extent_ops.c
+++ b/fs/xfs/xfs_extent_ops.c
@@ -38,6 +38,7 @@
#include "xfs_ialloc.h"
#include "xfs_alloc.h"
#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
#include "xfs_acl.h"
#include "xfs_attr.h"
#include "xfs_error.h"
diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 86cd468..4c71797 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -26,6 +26,7 @@
#include "xfs_sb.h"
#include "xfs_mount.h"
#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
#include "xfs_alloc.h"
#include "xfs_mru_cache.h"
#include "xfs_filestream.h"
@@ -667,8 +668,8 @@ exit:
*/
int
xfs_filestream_new_ag(
- xfs_bmalloca_t *ap,
- xfs_agnumber_t *agp)
+ struct xfs_bmalloca *ap,
+ xfs_agnumber_t *agp)
{
int flags, err;
xfs_inode_t *ip, *pip = NULL;
diff --git a/fs/xfs/xfs_inode_ops.c b/fs/xfs/xfs_inode_ops.c
index b4d6948..2777741 100644
--- a/fs/xfs/xfs_inode_ops.c
+++ b/fs/xfs/xfs_inode_ops.c
@@ -44,6 +44,7 @@
#include "xfs_alloc.h"
#include "xfs_ialloc.h"
#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
#include "xfs_error.h"
#include "xfs_quota.h"
#include "xfs_filestream.h"
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index effd1bc4..770f75c 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -32,6 +32,7 @@
#include "xfs_inode_item.h"
#include "xfs_btree.h"
#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
#include "xfs_rtalloc.h"
#include "xfs_error.h"
#include "xfs_itable.h"
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 19b738f..46408df 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -29,6 +29,7 @@
#include "xfs_inode.h"
#include "xfs_alloc.h"
#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
#include "xfs_rtalloc.h"
#include "xfs_fsops.h"
#include "xfs_error.h"
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index bed61a5..6d30954 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -34,6 +34,7 @@
#include "xfs_ialloc.h"
#include "xfs_alloc.h"
#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
#include "xfs_error.h"
#include "xfs_quota.h"
#include "xfs_trans_space.h"
diff --git a/fs/xfs/xfs_trans_resv.c b/fs/xfs/xfs_trans_resv.c
index 91c6f42..1dc3a43 100644
--- a/fs/xfs/xfs_trans_resv.c
+++ b/fs/xfs/xfs_trans_resv.c
@@ -37,6 +37,7 @@
#include "xfs_alloc.h"
#include "xfs_extent_busy.h"
#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
#include "xfs_quota.h"
#include "xfs_qm.h"
#include "xfs_trans_priv.h"
--
1.7.10.4
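
Reader's note on xfs_bmap_eof() as moved by this patch: the check reduces to a single comparison against the end of the last extent, with an empty fork treated as "always past EOF". A minimal userspace sketch of that comparison follows. The struct and function names (extent_rec, offset_is_past_eof) are hypothetical stand-ins and are not part of the patch; the real code gets the last extent from xfs_bmap_last_extent().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct xfs_bmbt_irec fields used here. */
struct extent_rec {
	uint64_t startoff;	/* first file offset covered, in blocks */
	uint64_t blockcount;	/* extent length, in blocks */
};

/*
 * Same comparison as xfs_bmap_eof(): every offset is past EOF for an
 * empty fork; otherwise endoff is past EOF when it lies at or beyond
 * the end of the last extent.
 */
static bool offset_is_past_eof(const struct extent_rec *last,
			       bool fork_is_empty, uint64_t endoff)
{
	if (fork_is_empty)
		return true;
	return endoff >= last->startoff + last->blockcount;
}
```

With a last extent covering blocks 100..107, offset 107 is still inside the file and offset 108 is the first one reported as past EOF.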
* [PATCH 51/60] xfs: introduce xfs_quota_defs.h
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (49 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 50/60] xfs: create xfs_bmap_util.[ch] Dave Chinner
@ 2013-06-19 4:50 ` Dave Chinner
2013-06-19 4:51 ` [PATCH 52/60] xfs: introduce xfs_rtalloc_defs.h Dave Chinner
` (10 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:50 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
There are a lot of quota flag definitions that are shared by user
and kernel space. Move them all to xfs_quota_defs.h so we can
unshare xfs_quota.h and remove the __KERNEL__ regions from it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
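
Reader's note, not part of the commit: the XFS_IS_*QUOTA_ON() macros being moved are plain bitmask tests on m_qflags, so they evaluate to the masked bits rather than to 0 or 1. The sketch below copies the three *_ACTIVE values and three of the macros from the new header and exercises them in userspace. struct demo_mount and demo_other_quota_on() are hypothetical names introduced only for this illustration.

```c
/* Values and macros copied from the xfs_quota_defs.h hunk in this patch. */
#define XFS_UQUOTA_ACTIVE	0x0100
#define XFS_PQUOTA_ACTIVE	0x0200
#define XFS_GQUOTA_ACTIVE	0x0400

#define XFS_IS_QUOTA_ON(mp)	((mp)->m_qflags & (XFS_UQUOTA_ACTIVE | \
						   XFS_GQUOTA_ACTIVE | \
						   XFS_PQUOTA_ACTIVE))
#define XFS_IS_OQUOTA_ON(mp)	((mp)->m_qflags & (XFS_GQUOTA_ACTIVE | \
						   XFS_PQUOTA_ACTIVE))
#define XFS_IS_UQUOTA_ON(mp)	((mp)->m_qflags & XFS_UQUOTA_ACTIVE)

/* Hypothetical stand-in for struct xfs_mount; only m_qflags matters here. */
struct demo_mount {
	unsigned int	m_qflags;
};

/* Normalise the masked value to 0 or 1 before treating it as a boolean. */
static int demo_other_quota_on(const struct demo_mount *mp)
{
	return !!XFS_IS_OQUOTA_ON(mp);
}
```

With only XFS_GQUOTA_ACTIVE set, XFS_IS_QUOTA_ON() yields 0x0400 and XFS_IS_UQUOTA_ON() yields 0, so callers that store the result in a narrow or boolean-like field may want the !! form.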
fs/xfs/xfs_quota.h | 120 +--------------------------------------
fs/xfs/xfs_quota_defs.h | 143 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 146 insertions(+), 117 deletions(-)
create mode 100644 fs/xfs/xfs_quota_defs.h
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index 634132d..0db4bf2 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -19,127 +19,14 @@
#define __XFS_QUOTA_H__
#include "xfs_dquot_format.h"
-
-struct xfs_trans;
-
-/*
- * Even though users may not have quota limits occupying all 64-bits,
- * they may need 64-bit accounting. Hence, 64-bit quota-counters,
- * and quota-limits. This is a waste in the common case, but hey ...
- */
-typedef __uint64_t xfs_qcnt_t;
-typedef __uint16_t xfs_qwarncnt_t;
-
-/*
- * flags for q_flags field in the dquot.
- */
-#define XFS_DQ_USER 0x0001 /* a user quota */
-#define XFS_DQ_PROJ 0x0002 /* project quota */
-#define XFS_DQ_GROUP 0x0004 /* a group quota */
-#define XFS_DQ_DIRTY 0x0008 /* dquot is dirty */
-#define XFS_DQ_FREEING 0x0010 /* dquot is beeing torn down */
-
-#define XFS_DQ_ALLTYPES (XFS_DQ_USER|XFS_DQ_PROJ|XFS_DQ_GROUP)
-
-#define XFS_DQ_FLAGS \
- { XFS_DQ_USER, "USER" }, \
- { XFS_DQ_PROJ, "PROJ" }, \
- { XFS_DQ_GROUP, "GROUP" }, \
- { XFS_DQ_DIRTY, "DIRTY" }, \
- { XFS_DQ_FREEING, "FREEING" }
-
-/*
- * In the worst case, when both user and group quotas are on,
- * we can have a max of three dquots changing in a single transaction.
- */
-#define XFS_DQUOT_LOGRES(mp) (sizeof(xfs_disk_dquot_t) * 3)
-
-/*
- * Quota Accounting/Enforcement flags
- */
-
-#define XFS_IS_QUOTA_RUNNING(mp) ((mp)->m_qflags & XFS_ALL_QUOTA_ACCT)
-#define XFS_IS_UQUOTA_RUNNING(mp) ((mp)->m_qflags & XFS_UQUOTA_ACCT)
-#define XFS_IS_PQUOTA_RUNNING(mp) ((mp)->m_qflags & XFS_PQUOTA_ACCT)
-#define XFS_IS_GQUOTA_RUNNING(mp) ((mp)->m_qflags & XFS_GQUOTA_ACCT)
-#define XFS_IS_UQUOTA_ENFORCED(mp) ((mp)->m_qflags & XFS_UQUOTA_ENFD)
-#define XFS_IS_OQUOTA_ENFORCED(mp) ((mp)->m_qflags & XFS_OQUOTA_ENFD)
+#include "xfs_quota_defs.h"
/*
- * Incore only flags for quotaoff - these bits get cleared when quota(s)
- * are in the process of getting turned off. These flags are in m_qflags but
- * never in sb_qflags.
+ * Kernel only quota definitions and functions
*/
-#define XFS_UQUOTA_ACTIVE 0x0100 /* uquotas are being turned off */
-#define XFS_PQUOTA_ACTIVE 0x0200 /* pquotas are being turned off */
-#define XFS_GQUOTA_ACTIVE 0x0400 /* gquotas are being turned off */
-#define XFS_ALL_QUOTA_ACTIVE \
- (XFS_UQUOTA_ACTIVE | XFS_PQUOTA_ACTIVE | XFS_GQUOTA_ACTIVE)
-/*
- * Checking XFS_IS_*QUOTA_ON() while holding any inode lock guarantees
- * quota will be not be switched off as long as that inode lock is held.
- */
-#define XFS_IS_QUOTA_ON(mp) ((mp)->m_qflags & (XFS_UQUOTA_ACTIVE | \
- XFS_GQUOTA_ACTIVE | \
- XFS_PQUOTA_ACTIVE))
-#define XFS_IS_OQUOTA_ON(mp) ((mp)->m_qflags & (XFS_GQUOTA_ACTIVE | \
- XFS_PQUOTA_ACTIVE))
-#define XFS_IS_UQUOTA_ON(mp) ((mp)->m_qflags & XFS_UQUOTA_ACTIVE)
-#define XFS_IS_GQUOTA_ON(mp) ((mp)->m_qflags & XFS_GQUOTA_ACTIVE)
-#define XFS_IS_PQUOTA_ON(mp) ((mp)->m_qflags & XFS_PQUOTA_ACTIVE)
-
-/*
- * Flags to tell various functions what to do. Not all of these are meaningful
- * to a single function. None of these XFS_QMOPT_* flags are meant to have
- * persistent values (ie. their values can and will change between versions)
- */
-#define XFS_QMOPT_DQALLOC 0x0000002 /* alloc dquot ondisk if needed */
-#define XFS_QMOPT_UQUOTA 0x0000004 /* user dquot requested */
-#define XFS_QMOPT_PQUOTA 0x0000008 /* project dquot requested */
-#define XFS_QMOPT_FORCE_RES 0x0000010 /* ignore quota limits */
-#define XFS_QMOPT_SBVERSION 0x0000040 /* change superblock version num */
-#define XFS_QMOPT_DOWARN 0x0000400 /* increase warning cnt if needed */
-#define XFS_QMOPT_DQREPAIR 0x0001000 /* repair dquot if damaged */
-#define XFS_QMOPT_GQUOTA 0x0002000 /* group dquot requested */
-#define XFS_QMOPT_ENOSPC 0x0004000 /* enospc instead of edquot (prj) */
-
-/*
- * flags to xfs_trans_mod_dquot to indicate which field needs to be
- * modified.
- */
-#define XFS_QMOPT_RES_REGBLKS 0x0010000
-#define XFS_QMOPT_RES_RTBLKS 0x0020000
-#define XFS_QMOPT_BCOUNT 0x0040000
-#define XFS_QMOPT_ICOUNT 0x0080000
-#define XFS_QMOPT_RTBCOUNT 0x0100000
-#define XFS_QMOPT_DELBCOUNT 0x0200000
-#define XFS_QMOPT_DELRTBCOUNT 0x0400000
-#define XFS_QMOPT_RES_INOS 0x0800000
-
-/*
- * flags for dqalloc.
- */
-#define XFS_QMOPT_INHERIT 0x1000000
-
-/*
- * flags to xfs_trans_mod_dquot.
- */
-#define XFS_TRANS_DQ_RES_BLKS XFS_QMOPT_RES_REGBLKS
-#define XFS_TRANS_DQ_RES_RTBLKS XFS_QMOPT_RES_RTBLKS
-#define XFS_TRANS_DQ_RES_INOS XFS_QMOPT_RES_INOS
-#define XFS_TRANS_DQ_BCOUNT XFS_QMOPT_BCOUNT
-#define XFS_TRANS_DQ_DELBCOUNT XFS_QMOPT_DELBCOUNT
-#define XFS_TRANS_DQ_ICOUNT XFS_QMOPT_ICOUNT
-#define XFS_TRANS_DQ_RTBCOUNT XFS_QMOPT_RTBCOUNT
-#define XFS_TRANS_DQ_DELRTBCOUNT XFS_QMOPT_DELRTBCOUNT
-
-
-#define XFS_QMOPT_QUOTALL \
- (XFS_QMOPT_UQUOTA | XFS_QMOPT_PQUOTA | XFS_QMOPT_GQUOTA)
-#define XFS_QMOPT_RESBLK_MASK (XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_RES_RTBLKS)
+struct xfs_trans;
-#ifdef __KERNEL__
/*
* This check is done typically without holding the inode lock;
* that may seem racy, but it is harmless in the context that it is used.
@@ -281,5 +168,4 @@ extern int xfs_mount_reset_sbqflags(struct xfs_mount *);
extern const struct xfs_buf_ops xfs_dquot_buf_ops;
-#endif /* __KERNEL__ */
#endif /* __XFS_QUOTA_H__ */
diff --git a/fs/xfs/xfs_quota_defs.h b/fs/xfs/xfs_quota_defs.h
new file mode 100644
index 0000000..2460f59
--- /dev/null
+++ b/fs/xfs/xfs_quota_defs.h
@@ -0,0 +1,143 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_QUOTA_DEFS_H__
+#define __XFS_QUOTA_DEFS_H__
+
+/*
+ * Quota definitions shared between user and kernel source trees.
+ */
+
+/*
+ * Even though users may not have quota limits occupying all 64-bits,
+ * they may need 64-bit accounting. Hence, 64-bit quota-counters,
+ * and quota-limits. This is a waste in the common case, but hey ...
+ */
+typedef __uint64_t xfs_qcnt_t;
+typedef __uint16_t xfs_qwarncnt_t;
+
+/*
+ * flags for q_flags field in the dquot.
+ */
+#define XFS_DQ_USER 0x0001 /* a user quota */
+#define XFS_DQ_PROJ 0x0002 /* project quota */
+#define XFS_DQ_GROUP 0x0004 /* a group quota */
+#define XFS_DQ_DIRTY 0x0008 /* dquot is dirty */
+#define XFS_DQ_FREEING 0x0010 /* dquot is being torn down */
+
+#define XFS_DQ_ALLTYPES (XFS_DQ_USER|XFS_DQ_PROJ|XFS_DQ_GROUP)
+
+#define XFS_DQ_FLAGS \
+ { XFS_DQ_USER, "USER" }, \
+ { XFS_DQ_PROJ, "PROJ" }, \
+ { XFS_DQ_GROUP, "GROUP" }, \
+ { XFS_DQ_DIRTY, "DIRTY" }, \
+ { XFS_DQ_FREEING, "FREEING" }
+
+/*
+ * In the worst case, when both user and group quotas are on,
+ * we can have a max of three dquots changing in a single transaction.
+ */
+#define XFS_DQUOT_LOGRES(mp) (sizeof(xfs_disk_dquot_t) * 3)
+
+/*
+ * Quota Accounting/Enforcement flags
+ */
+
+#define XFS_IS_QUOTA_RUNNING(mp) ((mp)->m_qflags & XFS_ALL_QUOTA_ACCT)
+#define XFS_IS_UQUOTA_RUNNING(mp) ((mp)->m_qflags & XFS_UQUOTA_ACCT)
+#define XFS_IS_PQUOTA_RUNNING(mp) ((mp)->m_qflags & XFS_PQUOTA_ACCT)
+#define XFS_IS_GQUOTA_RUNNING(mp) ((mp)->m_qflags & XFS_GQUOTA_ACCT)
+#define XFS_IS_UQUOTA_ENFORCED(mp) ((mp)->m_qflags & XFS_UQUOTA_ENFD)
+#define XFS_IS_OQUOTA_ENFORCED(mp) ((mp)->m_qflags & XFS_OQUOTA_ENFD)
+
+/*
+ * Incore only flags for quotaoff - these bits get cleared when quota(s)
+ * are in the process of getting turned off. These flags are in m_qflags but
+ * never in sb_qflags.
+ */
+#define XFS_UQUOTA_ACTIVE 0x0100 /* uquotas are being turned off */
+#define XFS_PQUOTA_ACTIVE 0x0200 /* pquotas are being turned off */
+#define XFS_GQUOTA_ACTIVE 0x0400 /* gquotas are being turned off */
+#define XFS_ALL_QUOTA_ACTIVE \
+ (XFS_UQUOTA_ACTIVE | XFS_PQUOTA_ACTIVE | XFS_GQUOTA_ACTIVE)
+
+/*
+ * Checking XFS_IS_*QUOTA_ON() while holding any inode lock guarantees
+ * quota will not be switched off as long as that inode lock is held.
+ */
+#define XFS_IS_QUOTA_ON(mp) ((mp)->m_qflags & (XFS_UQUOTA_ACTIVE | \
+ XFS_GQUOTA_ACTIVE | \
+ XFS_PQUOTA_ACTIVE))
+#define XFS_IS_OQUOTA_ON(mp) ((mp)->m_qflags & (XFS_GQUOTA_ACTIVE | \
+ XFS_PQUOTA_ACTIVE))
+#define XFS_IS_UQUOTA_ON(mp) ((mp)->m_qflags & XFS_UQUOTA_ACTIVE)
+#define XFS_IS_GQUOTA_ON(mp) ((mp)->m_qflags & XFS_GQUOTA_ACTIVE)
+#define XFS_IS_PQUOTA_ON(mp) ((mp)->m_qflags & XFS_PQUOTA_ACTIVE)
+
+/*
+ * Flags to tell various functions what to do. Not all of these are meaningful
+ * to a single function. None of these XFS_QMOPT_* flags are meant to have
+ * persistent values (ie. their values can and will change between versions)
+ */
+#define XFS_QMOPT_DQALLOC 0x0000002 /* alloc dquot ondisk if needed */
+#define XFS_QMOPT_UQUOTA 0x0000004 /* user dquot requested */
+#define XFS_QMOPT_PQUOTA 0x0000008 /* project dquot requested */
+#define XFS_QMOPT_FORCE_RES 0x0000010 /* ignore quota limits */
+#define XFS_QMOPT_SBVERSION 0x0000040 /* change superblock version num */
+#define XFS_QMOPT_DOWARN 0x0000400 /* increase warning cnt if needed */
+#define XFS_QMOPT_DQREPAIR 0x0001000 /* repair dquot if damaged */
+#define XFS_QMOPT_GQUOTA 0x0002000 /* group dquot requested */
+#define XFS_QMOPT_ENOSPC 0x0004000 /* enospc instead of edquot (prj) */
+
+/*
+ * flags to xfs_trans_mod_dquot to indicate which field needs to be
+ * modified.
+ */
+#define XFS_QMOPT_RES_REGBLKS 0x0010000
+#define XFS_QMOPT_RES_RTBLKS 0x0020000
+#define XFS_QMOPT_BCOUNT 0x0040000
+#define XFS_QMOPT_ICOUNT 0x0080000
+#define XFS_QMOPT_RTBCOUNT 0x0100000
+#define XFS_QMOPT_DELBCOUNT 0x0200000
+#define XFS_QMOPT_DELRTBCOUNT 0x0400000
+#define XFS_QMOPT_RES_INOS 0x0800000
+
+/*
+ * flags for dqalloc.
+ */
+#define XFS_QMOPT_INHERIT 0x1000000
+
+/*
+ * flags to xfs_trans_mod_dquot.
+ */
+#define XFS_TRANS_DQ_RES_BLKS XFS_QMOPT_RES_REGBLKS
+#define XFS_TRANS_DQ_RES_RTBLKS XFS_QMOPT_RES_RTBLKS
+#define XFS_TRANS_DQ_RES_INOS XFS_QMOPT_RES_INOS
+#define XFS_TRANS_DQ_BCOUNT XFS_QMOPT_BCOUNT
+#define XFS_TRANS_DQ_DELBCOUNT XFS_QMOPT_DELBCOUNT
+#define XFS_TRANS_DQ_ICOUNT XFS_QMOPT_ICOUNT
+#define XFS_TRANS_DQ_RTBCOUNT XFS_QMOPT_RTBCOUNT
+#define XFS_TRANS_DQ_DELRTBCOUNT XFS_QMOPT_DELRTBCOUNT
+
+
+#define XFS_QMOPT_QUOTALL \
+ (XFS_QMOPT_UQUOTA | XFS_QMOPT_PQUOTA | XFS_QMOPT_GQUOTA)
+#define XFS_QMOPT_RESBLK_MASK (XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_RES_RTBLKS)
+
+
+#endif /* __XFS_QUOTA_DEFS_H__ */
--
1.7.10.4
* [PATCH 52/60] xfs: introduce xfs_rtalloc_defs.h
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (50 preceding siblings ...)
2013-06-19 4:50 ` [PATCH 51/60] xfs: introduce xfs_quota_defs.h Dave Chinner
@ 2013-06-19 4:51 ` Dave Chinner
2013-06-19 4:51 ` [PATCH 53/60] xfs: Introduce a new structure to hold transaction reservation items Dave Chinner
` (9 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:51 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
There are quite a few realtime device definitions shared with
userspace. Move them from xfs_rtalloc.h to xfs_rtalloc_defs.h
so we don't need to share xfs_rtalloc.h with userspace anymore.
This removes the final __KERNEL__ region from the XFS kernel
codebase. Yay!
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
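
Reader's note, not part of the commit: the bit-manipulation constants being moved describe a 32-bit bitmap word (2^3 bits per byte times 2^2 bytes per word). The sketch below copies those constants from the new header and splits a bitmap bit number into a word index and a bit within that word. demo_bit_to_word() and demo_bit_in_word() are hypothetical helpers for illustration; the kernel's XFS_BITTOWORD() additionally masks the word index with XFS_BLOCKWMASK(mp), which is left out here because it needs a mount structure.

```c
/* Constants copied from the xfs_rtalloc_defs.h hunk in this patch. */
#define XFS_NBBYLOG	3		/* log2(NBBY) */
#define XFS_WORDLOG	2		/* log2(sizeof(xfs_rtword_t)) */
#define XFS_NBWORDLOG	(XFS_NBBYLOG + XFS_WORDLOG)
#define XFS_NBWORD	(1 << XFS_NBWORDLOG)
#define XFS_WORDMASK	((1 << XFS_WORDLOG) - 1)

/* Index of the bitmap word that holds the given bit (no per-block mask). */
static unsigned long long demo_bit_to_word(unsigned long long bit)
{
	return bit >> XFS_NBWORDLOG;
}

/* Position of the given bit inside its bitmap word. */
static unsigned int demo_bit_in_word(unsigned long long bit)
{
	return (unsigned int)(bit & (XFS_NBWORD - 1));
}
```

For example, bit 37 falls in word 1 at bit position 5, since XFS_NBWORD works out to 32.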
fs/xfs/xfs_rtalloc.h | 55 +++--------------------------------
fs/xfs/xfs_rtalloc_defs.h | 70 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 74 insertions(+), 51 deletions(-)
create mode 100644 fs/xfs/xfs_rtalloc_defs.h
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
index f7f3a35..f9d569a 100644
--- a/fs/xfs/xfs_rtalloc.h
+++ b/fs/xfs/xfs_rtalloc.h
@@ -18,57 +18,12 @@
#ifndef __XFS_RTALLOC_H__
#define __XFS_RTALLOC_H__
-struct xfs_mount;
-struct xfs_trans;
-
-/* Min and max rt extent sizes, specified in bytes */
-#define XFS_MAX_RTEXTSIZE (1024 * 1024 * 1024) /* 1GB */
-#define XFS_DFL_RTEXTSIZE (64 * 1024) /* 64kB */
-#define XFS_MIN_RTEXTSIZE (4 * 1024) /* 4kB */
-
-/*
- * Constants for bit manipulations.
- */
-#define XFS_NBBYLOG 3 /* log2(NBBY) */
-#define XFS_WORDLOG 2 /* log2(sizeof(xfs_rtword_t)) */
-#define XFS_NBWORDLOG (XFS_NBBYLOG + XFS_WORDLOG)
-#define XFS_NBWORD (1 << XFS_NBWORDLOG)
-#define XFS_WORDMASK ((1 << XFS_WORDLOG) - 1)
-
-#define XFS_BLOCKSIZE(mp) ((mp)->m_sb.sb_blocksize)
-#define XFS_BLOCKMASK(mp) ((mp)->m_blockmask)
-#define XFS_BLOCKWSIZE(mp) ((mp)->m_blockwsize)
-#define XFS_BLOCKWMASK(mp) ((mp)->m_blockwmask)
-
-/*
- * Summary and bit manipulation macros.
- */
-#define XFS_SUMOFFS(mp,ls,bb) ((int)((ls) * (mp)->m_sb.sb_rbmblocks + (bb)))
-#define XFS_SUMOFFSTOBLOCK(mp,s) \
- (((s) * (uint)sizeof(xfs_suminfo_t)) >> (mp)->m_sb.sb_blocklog)
-#define XFS_SUMPTR(mp,bp,so) \
- ((xfs_suminfo_t *)((bp)->b_addr + \
- (((so) * (uint)sizeof(xfs_suminfo_t)) & XFS_BLOCKMASK(mp))))
-
-#define XFS_BITTOBLOCK(mp,bi) ((bi) >> (mp)->m_blkbit_log)
-#define XFS_BLOCKTOBIT(mp,bb) ((bb) << (mp)->m_blkbit_log)
-#define XFS_BITTOWORD(mp,bi) \
- ((int)(((bi) >> XFS_NBWORDLOG) & XFS_BLOCKWMASK(mp)))
-
-#define XFS_RTMIN(a,b) ((a) < (b) ? (a) : (b))
-#define XFS_RTMAX(a,b) ((a) > (b) ? (a) : (b))
+#include "xfs_rtalloc_defs.h"
-#define XFS_RTLOBIT(w) xfs_lowbit32(w)
-#define XFS_RTHIBIT(w) xfs_highbit32(w)
-
-#if XFS_BIG_BLKNOS
-#define XFS_RTBLOCKLOG(b) xfs_highbit64(b)
-#else
-#define XFS_RTBLOCKLOG(b) xfs_highbit32(b)
-#endif
+/* kernel only definitions and functions */
-
-#ifdef __KERNEL__
+struct xfs_mount;
+struct xfs_trans;
#ifdef CONFIG_XFS_RT
/*
@@ -161,6 +116,4 @@ xfs_rtmount_init(
# define xfs_rtunmount_inodes(m)
#endif /* CONFIG_XFS_RT */
-#endif /* __KERNEL__ */
-
#endif /* __XFS_RTALLOC_H__ */
diff --git a/fs/xfs/xfs_rtalloc_defs.h b/fs/xfs/xfs_rtalloc_defs.h
new file mode 100644
index 0000000..42afb1b
--- /dev/null
+++ b/fs/xfs/xfs_rtalloc_defs.h
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_RTALLOC_DEFS_H__
+#define __XFS_RTALLOC_DEFS_H__
+
+/* definitions shared with userspace */
+
+/* Min and max rt extent sizes, specified in bytes */
+#define XFS_MAX_RTEXTSIZE (1024 * 1024 * 1024) /* 1GB */
+#define XFS_DFL_RTEXTSIZE (64 * 1024) /* 64kB */
+#define XFS_MIN_RTEXTSIZE (4 * 1024) /* 4kB */
+
+/*
+ * Constants for bit manipulations.
+ */
+#define XFS_NBBYLOG 3 /* log2(NBBY) */
+#define XFS_WORDLOG 2 /* log2(sizeof(xfs_rtword_t)) */
+#define XFS_NBWORDLOG (XFS_NBBYLOG + XFS_WORDLOG)
+#define XFS_NBWORD (1 << XFS_NBWORDLOG)
+#define XFS_WORDMASK ((1 << XFS_WORDLOG) - 1)
+
+#define XFS_BLOCKSIZE(mp) ((mp)->m_sb.sb_blocksize)
+#define XFS_BLOCKMASK(mp) ((mp)->m_blockmask)
+#define XFS_BLOCKWSIZE(mp) ((mp)->m_blockwsize)
+#define XFS_BLOCKWMASK(mp) ((mp)->m_blockwmask)
+
+/*
+ * Summary and bit manipulation macros.
+ */
+#define XFS_SUMOFFS(mp,ls,bb) ((int)((ls) * (mp)->m_sb.sb_rbmblocks + (bb)))
+#define XFS_SUMOFFSTOBLOCK(mp,s) \
+ (((s) * (uint)sizeof(xfs_suminfo_t)) >> (mp)->m_sb.sb_blocklog)
+#define XFS_SUMPTR(mp,bp,so) \
+ ((xfs_suminfo_t *)((bp)->b_addr + \
+ (((so) * (uint)sizeof(xfs_suminfo_t)) & XFS_BLOCKMASK(mp))))
+
+#define XFS_BITTOBLOCK(mp,bi) ((bi) >> (mp)->m_blkbit_log)
+#define XFS_BLOCKTOBIT(mp,bb) ((bb) << (mp)->m_blkbit_log)
+#define XFS_BITTOWORD(mp,bi) \
+ ((int)(((bi) >> XFS_NBWORDLOG) & XFS_BLOCKWMASK(mp)))
+
+#define XFS_RTMIN(a,b) ((a) < (b) ? (a) : (b))
+#define XFS_RTMAX(a,b) ((a) > (b) ? (a) : (b))
+
+#define XFS_RTLOBIT(w) xfs_lowbit32(w)
+#define XFS_RTHIBIT(w) xfs_highbit32(w)
+
+#if XFS_BIG_BLKNOS
+#define XFS_RTBLOCKLOG(b) xfs_highbit64(b)
+#else
+#define XFS_RTBLOCKLOG(b) xfs_highbit32(b)
+#endif
+
+
+#endif /* __XFS_RTALLOC_DEFS_H__ */
--
1.7.10.4
* [PATCH 53/60] xfs: Introduce a new structure to hold transaction reservation items
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (51 preceding siblings ...)
2013-06-19 4:51 ` [PATCH 52/60] xfs: introduce xfs_rtalloc_defs.h Dave Chinner
@ 2013-06-19 4:51 ` Dave Chinner
2013-06-19 4:51 ` [PATCH 54/60] xfs: Introduce tr_fsyncts to m_reservation Dave Chinner
` (8 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:51 UTC (permalink / raw)
To: xfs
From: Jie Liu <jeff.liu@oracle.com>
Introduce a new structure xfs_trans_res to hold transaction
reservation item info per log ticket.
We also need to improve xfs_trans_resv_calc() by initializing the
log count as well as log flags for permanent log reservation.
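
As a rough userspace illustration of the layout described above (not
the kernel code itself), the sketch below mirrors the three fields of
xfs_trans_res and fills in one permanent and one non-permanent item.
The SKETCH_* constants, the sketch_* names and the byte sizes are
placeholders made up for this example.

```c
#include <string.h>

/* Placeholder values for illustration only; the kernel has its own. */
#define SKETCH_PERM_LOG_RES	0x4
#define SKETCH_WRITE_LOG_COUNT	2

struct xfs_trans_res {
	unsigned int	tr_logres;	/* log space in bytes per log ticket */
	int		tr_logcount;	/* log operations per log ticket */
	int		tr_logflags;	/* permanent reservation or not */
};

/* Cut-down stand-in for struct xfs_trans_resv with two items. */
struct sketch_resv {
	struct xfs_trans_res	tr_write;	/* permanent reservation */
	struct xfs_trans_res	tr_ichange;	/* logres only */
};

static void
sketch_resv_calc(
	struct sketch_resv	*resp)
{
	/* Zero first so the |= below starts from a clean flags field. */
	memset(resp, 0, sizeof(*resp));

	resp->tr_write.tr_logres = 4096;	/* made-up size */
	resp->tr_write.tr_logcount = SKETCH_WRITE_LOG_COUNT;
	resp->tr_write.tr_logflags |= SKETCH_PERM_LOG_RES;

	/* Count and flags stay zero for this item. */
	resp->tr_ichange.tr_logres = 512;	/* made-up size */
}
```

After sketch_resv_calc() runs, a caller can pick up size, count and
flags for a transaction type from a single struct rather than from
three separate macros.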
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_mount.h | 2 +-
fs/xfs/xfs_trans.c | 2 +-
fs/xfs/xfs_trans_resv.c | 121 ++++++++++++++++++++++++++++++++++-----------
fs/xfs/xfs_trans_resv.h | 125 ++++++++++++++++++++++++++---------------------
4 files changed, 163 insertions(+), 87 deletions(-)
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 7a45a75..f6aa7c2 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -134,7 +134,7 @@ typedef struct xfs_mount {
int m_ialloc_blks; /* blocks in inode allocation */
int m_inoalign_mask;/* mask sb_inoalignmt if used */
uint m_qflags; /* quota status flags */
- struct xfs_trans_resv m_reservations;/* precomputed res values */
+ struct xfs_trans_resv m_resv; /* precomputed res values */
__uint64_t m_maxicount; /* maximum inode count */
__uint64_t m_resblks; /* total reserved blocks */
__uint64_t m_resblks_avail;/* available reserved blocks */
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 0228f74..52e9219 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -56,7 +56,7 @@ void
xfs_trans_init(
struct xfs_mount *mp)
{
- xfs_trans_resv_calc(mp, &mp->m_reservations);
+ xfs_trans_resv_calc(mp, &mp->m_resv);
}
/*
diff --git a/fs/xfs/xfs_trans_resv.c b/fs/xfs/xfs_trans_resv.c
index 1dc3a43..7d841c8 100644
--- a/fs/xfs/xfs_trans_resv.c
+++ b/fs/xfs/xfs_trans_resv.c
@@ -675,32 +675,97 @@ xfs_trans_resv_calc(
struct xfs_mount *mp,
struct xfs_trans_resv *resp)
{
- resp->tr_write = xfs_calc_write_reservation(mp);
- resp->tr_itruncate = xfs_calc_itruncate_reservation(mp);
- resp->tr_rename = xfs_calc_rename_reservation(mp);
- resp->tr_link = xfs_calc_link_reservation(mp);
- resp->tr_remove = xfs_calc_remove_reservation(mp);
- resp->tr_symlink = xfs_calc_symlink_reservation(mp);
- resp->tr_create = xfs_calc_create_reservation(mp);
- resp->tr_mkdir = xfs_calc_mkdir_reservation(mp);
- resp->tr_ifree = xfs_calc_ifree_reservation(mp);
- resp->tr_ichange = xfs_calc_ichange_reservation(mp);
- resp->tr_growdata = xfs_calc_growdata_reservation(mp);
- resp->tr_swrite = xfs_calc_swrite_reservation(mp);
- resp->tr_writeid = xfs_calc_writeid_reservation(mp);
- resp->tr_addafork = xfs_calc_addafork_reservation(mp);
- resp->tr_attrinval = xfs_calc_attrinval_reservation(mp);
- resp->tr_attrsetm = xfs_calc_attrsetm_reservation(mp);
- resp->tr_attrsetrt = xfs_calc_attrsetrt_reservation(mp);
- resp->tr_attrrm = xfs_calc_attrrm_reservation(mp);
- resp->tr_clearagi = xfs_calc_clear_agi_bucket_reservation(mp);
- resp->tr_growrtalloc = xfs_calc_growrtalloc_reservation(mp);
- resp->tr_growrtzero = xfs_calc_growrtzero_reservation(mp);
- resp->tr_growrtfree = xfs_calc_growrtfree_reservation(mp);
- resp->tr_qm_sbchange = xfs_calc_qm_sbchange_reservation(mp);
- resp->tr_qm_setqlim = xfs_calc_qm_setqlim_reservation(mp);
- resp->tr_qm_dqalloc = xfs_calc_qm_dqalloc_reservation(mp);
- resp->tr_qm_quotaoff = xfs_calc_qm_quotaoff_reservation(mp);
- resp->tr_qm_equotaoff = xfs_calc_qm_quotaoff_end_reservation(mp);
- resp->tr_sb = xfs_calc_sb_reservation(mp);
+ /*
+ * The following transactions are logged in physical format and
+ * require a permanent reservation on space.
+ */
+ resp->tr_write.tr_logres = xfs_calc_write_reservation(mp);
+ resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT;
+ resp->tr_write.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_itruncate.tr_logres = xfs_calc_itruncate_reservation(mp);
+ resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
+ resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp);
+ resp->tr_rename.tr_logcount = XFS_RENAME_LOG_COUNT;
+ resp->tr_rename.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_link.tr_logres = xfs_calc_link_reservation(mp);
+ resp->tr_link.tr_logcount = XFS_LINK_LOG_COUNT;
+ resp->tr_link.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp);
+ resp->tr_remove.tr_logcount = XFS_REMOVE_LOG_COUNT;
+ resp->tr_remove.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_symlink.tr_logres = xfs_calc_symlink_reservation(mp);
+ resp->tr_symlink.tr_logcount = XFS_SYMLINK_LOG_COUNT;
+ resp->tr_symlink.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_create.tr_logres = xfs_calc_create_reservation(mp);
+ resp->tr_create.tr_logcount = XFS_CREATE_LOG_COUNT;
+ resp->tr_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
+ resp->tr_mkdir.tr_logcount = XFS_MKDIR_LOG_COUNT;
+ resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_ifree.tr_logres = xfs_calc_ifree_reservation(mp);
+ resp->tr_ifree.tr_logcount = XFS_INACTIVE_LOG_COUNT;
+ resp->tr_ifree.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_addafork.tr_logres = xfs_calc_addafork_reservation(mp);
+ resp->tr_addafork.tr_logcount = XFS_ADDAFORK_LOG_COUNT;
+ resp->tr_addafork.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_attrinval.tr_logres = xfs_calc_attrinval_reservation(mp);
+ resp->tr_attrinval.tr_logcount = XFS_ATTRINVAL_LOG_COUNT;
+ resp->tr_attrinval.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_attrsetm.tr_logres = xfs_calc_attrsetm_reservation(mp);
+ resp->tr_attrsetm.tr_logcount = XFS_ATTRSET_LOG_COUNT;
+ resp->tr_attrsetm.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_attrrm.tr_logres = xfs_calc_attrrm_reservation(mp);
+ resp->tr_attrrm.tr_logcount = XFS_ATTRRM_LOG_COUNT;
+ resp->tr_attrrm.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_growrtalloc.tr_logres = xfs_calc_growrtalloc_reservation(mp);
+ resp->tr_growrtalloc.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT;
+ resp->tr_growrtalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ resp->tr_qm_dqalloc.tr_logres = xfs_calc_qm_dqalloc_reservation(mp);
+ resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
+ resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+ /*
+ * The following transactions are logged in logical format with
+ * a default log count.
+ */
+ resp->tr_qm_sbchange.tr_logres = xfs_calc_qm_sbchange_reservation(mp);
+ resp->tr_qm_sbchange.tr_logcount = XFS_DEFAULT_LOG_COUNT;
+
+ resp->tr_qm_setqlim.tr_logres = xfs_calc_qm_setqlim_reservation(mp);
+ resp->tr_qm_setqlim.tr_logcount = XFS_DEFAULT_LOG_COUNT;
+
+ resp->tr_qm_quotaoff.tr_logres = xfs_calc_qm_quotaoff_reservation(mp);
+ resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_LOG_COUNT;
+
+ resp->tr_qm_equotaoff.tr_logres =
+ xfs_calc_qm_quotaoff_end_reservation(mp);
+ resp->tr_qm_equotaoff.tr_logcount = XFS_DEFAULT_LOG_COUNT;
+
+ resp->tr_sb.tr_logres = xfs_calc_sb_reservation(mp);
+ resp->tr_sb.tr_logcount = XFS_DEFAULT_LOG_COUNT;
+
+ /* The following transactions are logged in logical format */
+ resp->tr_ichange.tr_logres = xfs_calc_ichange_reservation(mp);
+ resp->tr_growdata.tr_logres = xfs_calc_growdata_reservation(mp);
+ resp->tr_swrite.tr_logres = xfs_calc_swrite_reservation(mp);
+ resp->tr_writeid.tr_logres = xfs_calc_writeid_reservation(mp);
+ resp->tr_attrsetrt.tr_logres = xfs_calc_attrsetrt_reservation(mp);
+ resp->tr_clearagi.tr_logres = xfs_calc_clear_agi_bucket_reservation(mp);
+ resp->tr_growrtzero.tr_logres = xfs_calc_growrtzero_reservation(mp);
+ resp->tr_growrtfree.tr_logres = xfs_calc_growrtfree_reservation(mp);
}
diff --git a/fs/xfs/xfs_trans_resv.h b/fs/xfs/xfs_trans_resv.h
index 0936f93..e5e32c7 100644
--- a/fs/xfs/xfs_trans_resv.h
+++ b/fs/xfs/xfs_trans_resv.h
@@ -21,35 +21,45 @@
/*
* structure for maintaining pre-calculated transaction reservations.
*/
+struct xfs_trans_res {
+ uint tr_logres; /* log space unit in bytes per log ticket */
+ int tr_logcount; /* number of log operations per log ticket */
+ int tr_logflags; /* log flags, currently only used for indicating
+ * a reservation request is permanent or not */
+};
+
struct xfs_trans_resv {
- uint tr_write; /* extent alloc trans */
- uint tr_itruncate; /* truncate trans */
- uint tr_rename; /* rename trans */
- uint tr_link; /* link trans */
- uint tr_remove; /* unlink trans */
- uint tr_symlink; /* symlink trans */
- uint tr_create; /* create trans */
- uint tr_mkdir; /* mkdir trans */
- uint tr_ifree; /* inode free trans */
- uint tr_ichange; /* inode update trans */
- uint tr_growdata; /* fs data section grow trans */
- uint tr_swrite; /* sync write inode trans */
- uint tr_addafork; /* cvt inode to attributed trans */
- uint tr_writeid; /* write setuid/setgid file */
- uint tr_attrinval; /* attr fork buffer invalidation */
- uint tr_attrsetm; /* set/create an attribute at mount time */
- uint tr_attrsetrt; /* set/create an attribute at runtime */
- uint tr_attrrm; /* remove an attribute */
- uint tr_clearagi; /* clear bad agi unlinked ino bucket */
- uint tr_growrtalloc; /* grow realtime allocations */
- uint tr_growrtzero; /* grow realtime zeroing */
- uint tr_growrtfree; /* grow realtime freeing */
- uint tr_qm_sbchange; /* change quota flags */
- uint tr_qm_setqlim; /* adjust quota limits */
- uint tr_qm_dqalloc; /* allocate quota on disk */
- uint tr_qm_quotaoff; /* turn quota off */
- uint tr_qm_equotaoff;/* end of turn quota off */
- uint tr_sb; /* modify superblock */
+ struct xfs_trans_res tr_write; /* extent alloc trans */
+ struct xfs_trans_res tr_itruncate; /* truncate trans */
+ struct xfs_trans_res tr_rename; /* rename trans */
+ struct xfs_trans_res tr_link; /* link trans */
+ struct xfs_trans_res tr_remove; /* unlink trans */
+ struct xfs_trans_res tr_symlink; /* symlink trans */
+ struct xfs_trans_res tr_create; /* create trans */
+ struct xfs_trans_res tr_mkdir; /* mkdir trans */
+ struct xfs_trans_res tr_ifree; /* inode free trans */
+ struct xfs_trans_res tr_ichange; /* inode update trans */
+ struct xfs_trans_res tr_growdata; /* fs data section grow trans */
+ struct xfs_trans_res tr_swrite; /* sync write inode trans */
+ struct xfs_trans_res tr_addafork; /* add inode attr fork trans */
+ struct xfs_trans_res tr_writeid; /* write setuid/setgid file */
+ struct xfs_trans_res tr_attrinval; /* attr fork buffer
+ * invalidation */
+ struct xfs_trans_res tr_attrsetm; /* set/create an attribute at
+ * mount time */
+ struct xfs_trans_res tr_attrsetrt; /* set/create an attribute at
+ * runtime */
+ struct xfs_trans_res tr_attrrm; /* remove an attribute */
+ struct xfs_trans_res tr_clearagi; /* clear agi unlinked bucket */
+ struct xfs_trans_res tr_growrtalloc; /* grow realtime allocations */
+ struct xfs_trans_res tr_growrtzero; /* grow realtime zeroing */
+ struct xfs_trans_res tr_growrtfree; /* grow realtime freeing */
+ struct xfs_trans_res tr_qm_sbchange; /* change quota flags */
+ struct xfs_trans_res tr_qm_setqlim; /* adjust quota limits */
+ struct xfs_trans_res tr_qm_dqalloc; /* allocate quota on disk */
+ struct xfs_trans_res tr_qm_quotaoff; /* turn quota off */
+ struct xfs_trans_res tr_qm_equotaoff;/* end of turn quota off */
+ struct xfs_trans_res tr_sb; /* modify superblock */
};
/*
@@ -77,39 +87,40 @@ struct xfs_trans_resv {
XFS_DAENTER_BMAPS(mp, XFS_DATA_FORK) + 1)
-#define XFS_WRITE_LOG_RES(mp) ((mp)->m_reservations.tr_write)
-#define XFS_ITRUNCATE_LOG_RES(mp) ((mp)->m_reservations.tr_itruncate)
-#define XFS_RENAME_LOG_RES(mp) ((mp)->m_reservations.tr_rename)
-#define XFS_LINK_LOG_RES(mp) ((mp)->m_reservations.tr_link)
-#define XFS_REMOVE_LOG_RES(mp) ((mp)->m_reservations.tr_remove)
-#define XFS_SYMLINK_LOG_RES(mp) ((mp)->m_reservations.tr_symlink)
-#define XFS_CREATE_LOG_RES(mp) ((mp)->m_reservations.tr_create)
-#define XFS_MKDIR_LOG_RES(mp) ((mp)->m_reservations.tr_mkdir)
-#define XFS_IFREE_LOG_RES(mp) ((mp)->m_reservations.tr_ifree)
-#define XFS_ICHANGE_LOG_RES(mp) ((mp)->m_reservations.tr_ichange)
-#define XFS_GROWDATA_LOG_RES(mp) ((mp)->m_reservations.tr_growdata)
-#define XFS_GROWRTALLOC_LOG_RES(mp) ((mp)->m_reservations.tr_growrtalloc)
-#define XFS_GROWRTZERO_LOG_RES(mp) ((mp)->m_reservations.tr_growrtzero)
-#define XFS_GROWRTFREE_LOG_RES(mp) ((mp)->m_reservations.tr_growrtfree)
-#define XFS_SWRITE_LOG_RES(mp) ((mp)->m_reservations.tr_swrite)
+#define XFS_WRITE_LOG_RES(mp) ((mp)->m_resv.tr_write.tr_logres)
+#define XFS_RENAME_LOG_RES(mp) ((mp)->m_resv.tr_rename.tr_logres)
+#define XFS_LINK_LOG_RES(mp) ((mp)->m_resv.tr_link.tr_logres)
+#define XFS_REMOVE_LOG_RES(mp) ((mp)->m_resv.tr_remove.tr_logres)
+#define XFS_SYMLINK_LOG_RES(mp) ((mp)->m_resv.tr_symlink.tr_logres)
+#define XFS_CREATE_LOG_RES(mp) ((mp)->m_resv.tr_create.tr_logres)
+#define XFS_MKDIR_LOG_RES(mp) ((mp)->m_resv.tr_mkdir.tr_logres)
+#define XFS_IFREE_LOG_RES(mp) ((mp)->m_resv.tr_ifree.tr_logres)
+#define XFS_SWRITE_LOG_RES(mp) ((mp)->m_resv.tr_swrite.tr_logres)
+#define XFS_ICHANGE_LOG_RES(mp) ((mp)->m_resv.tr_ichange.tr_logres)
+#define XFS_GROWDATA_LOG_RES(mp) ((mp)->m_resv.tr_growdata.tr_logres)
+#define XFS_ITRUNCATE_LOG_RES(mp) ((mp)->m_resv.tr_itruncate.tr_logres)
+#define XFS_GROWRTZERO_LOG_RES(mp) ((mp)->m_resv.tr_growrtzero.tr_logres)
+#define XFS_GROWRTFREE_LOG_RES(mp) ((mp)->m_resv.tr_growrtfree.tr_logres)
+#define XFS_GROWRTALLOC_LOG_RES(mp) ((mp)->m_resv.tr_growrtalloc.tr_logres)
+
/*
* Logging the inode timestamps on an fsync -- same as SWRITE
* as long as SWRITE logs the entire inode core
*/
-#define XFS_FSYNC_TS_LOG_RES(mp) ((mp)->m_reservations.tr_swrite)
-#define XFS_WRITEID_LOG_RES(mp) ((mp)->m_reservations.tr_swrite)
-#define XFS_ADDAFORK_LOG_RES(mp) ((mp)->m_reservations.tr_addafork)
-#define XFS_ATTRINVAL_LOG_RES(mp) ((mp)->m_reservations.tr_attrinval)
-#define XFS_ATTRSETM_LOG_RES(mp) ((mp)->m_reservations.tr_attrsetm)
-#define XFS_ATTRSETRT_LOG_RES(mp) ((mp)->m_reservations.tr_attrsetrt)
-#define XFS_ATTRRM_LOG_RES(mp) ((mp)->m_reservations.tr_attrrm)
-#define XFS_CLEAR_AGI_BUCKET_LOG_RES(mp) ((mp)->m_reservations.tr_clearagi)
-#define XFS_QM_SBCHANGE_LOG_RES(mp) ((mp)->m_reservations.tr_qm_sbchange)
-#define XFS_QM_SETQLIM_LOG_RES(mp) ((mp)->m_reservations.tr_qm_setqlim)
-#define XFS_QM_DQALLOC_LOG_RES(mp) ((mp)->m_reservations.tr_qm_dqalloc)
-#define XFS_QM_QUOTAOFF_LOG_RES(mp) ((mp)->m_reservations.tr_qm_quotaoff)
-#define XFS_QM_QUOTAOFF_END_LOG_RES(mp) ((mp)->m_reservations.tr_qm_equotaoff)
-#define XFS_SB_LOG_RES(mp) ((mp)->m_reservations.tr_sb)
+#define XFS_FSYNC_TS_LOG_RES(mp) ((mp)->m_resv.tr_swrite.tr_logres)
+#define XFS_WRITEID_LOG_RES(mp) ((mp)->m_resv.tr_swrite.tr_logres)
+#define XFS_ADDAFORK_LOG_RES(mp) ((mp)->m_resv.tr_addafork.tr_logres)
+#define XFS_ATTRSETM_LOG_RES(mp) ((mp)->m_resv.tr_attrsetm.tr_logres)
+#define XFS_ATTRINVAL_LOG_RES(mp) ((mp)->m_resv.tr_attrinval.tr_logres)
+#define XFS_ATTRSETRT_LOG_RES(mp) ((mp)->m_resv.tr_attrsetrt.tr_logres)
+#define XFS_ATTRRM_LOG_RES(mp) ((mp)->m_resv.tr_attrrm.tr_logres)
+#define XFS_SB_LOG_RES(mp) ((mp)->m_resv.tr_sb.tr_logres)
+#define XFS_QM_SETQLIM_LOG_RES(mp) ((mp)->m_resv.tr_qm_setqlim.tr_logres)
+#define XFS_QM_DQALLOC_LOG_RES(mp) ((mp)->m_resv.tr_qm_dqalloc.tr_logres)
+#define XFS_QM_SBCHANGE_LOG_RES(mp) ((mp)->m_resv.tr_qm_sbchange.tr_logres)
+#define XFS_QM_QUOTAOFF_LOG_RES(mp) ((mp)->m_resv.tr_qm_quotaoff.tr_logres)
+#define XFS_QM_QUOTAOFF_END_LOG_RES(mp) ((mp)->m_resv.tr_qm_equotaoff.tr_logres)
+#define XFS_CLEAR_AGI_BUCKET_LOG_RES(mp) ((mp)->m_resv.tr_clearagi.tr_logres)
/*
* Various log count values.
--
1.7.10.4
* [PATCH 54/60] xfs: Introduce tr_fsyncts to m_reservation
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (52 preceding siblings ...)
2013-06-19 4:51 ` [PATCH 53/60] xfs: Introduce a new structure to hold transaction reservation items Dave Chinner
@ 2013-06-19 4:51 ` Dave Chinner
2013-06-19 4:51 ` [PATCH 55/60] xfs: Make writeid transaction use tr_writeid Dave Chinner
` (7 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:51 UTC (permalink / raw)
To: xfs
From: Jie Liu <jeff.liu@oracle.com>
A preparation step.
For now, the fsync_ts transaction uses the pre-calculated log
reservation size of tr_swrite.
This patch introduces a new item, tr_fsyncts, to the mp->m_resv
structure so that we can fetch its log reservation value in the
same manner as the others.
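
A minimal sketch of the idea, assuming a cut-down reservation struct:
the new tr_fsyncts item gets its own slot but is filled from the same
calculation as tr_swrite. sketch_calc_swrite() and its return value
are stand-ins invented for this example, not the kernel function.

```c
struct xfs_trans_res {
	unsigned int	tr_logres;
	int		tr_logcount;
	int		tr_logflags;
};

struct sketch_resv {
	struct xfs_trans_res	tr_swrite;	/* sync write inode trans */
	struct xfs_trans_res	tr_fsyncts;	/* timestamps on fsync */
};

/* Stand-in for xfs_calc_swrite_reservation(); the size is made up. */
static unsigned int
sketch_calc_swrite(void)
{
	return 640;
}

static void
sketch_resv_calc(
	struct sketch_resv	*resp)
{
	resp->tr_swrite.tr_logres = sketch_calc_swrite();
	/* Same calculation, but stored in a dedicated item. */
	resp->tr_fsyncts.tr_logres = sketch_calc_swrite();
}

/* Lookup in the style of XFS_FSYNC_TS_LOG_RES() after this patch. */
#define SKETCH_FSYNC_TS_LOG_RES(resp)	((resp)->tr_fsyncts.tr_logres)
```

The value fetched is unchanged; only the place it is read from moves.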
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
---
fs/xfs/xfs_trans_resv.c | 1 +
fs/xfs/xfs_trans_resv.h | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_trans_resv.c b/fs/xfs/xfs_trans_resv.c
index 7d841c8..e46791a 100644
--- a/fs/xfs/xfs_trans_resv.c
+++ b/fs/xfs/xfs_trans_resv.c
@@ -763,6 +763,7 @@ xfs_trans_resv_calc(
resp->tr_ichange.tr_logres = xfs_calc_ichange_reservation(mp);
resp->tr_growdata.tr_logres = xfs_calc_growdata_reservation(mp);
resp->tr_swrite.tr_logres = xfs_calc_swrite_reservation(mp);
+ resp->tr_fsyncts.tr_logres = xfs_calc_swrite_reservation(mp);
resp->tr_writeid.tr_logres = xfs_calc_writeid_reservation(mp);
resp->tr_attrsetrt.tr_logres = xfs_calc_attrsetrt_reservation(mp);
resp->tr_clearagi.tr_logres = xfs_calc_clear_agi_bucket_reservation(mp);
diff --git a/fs/xfs/xfs_trans_resv.h b/fs/xfs/xfs_trans_resv.h
index e5e32c7..785128f 100644
--- a/fs/xfs/xfs_trans_resv.h
+++ b/fs/xfs/xfs_trans_resv.h
@@ -60,6 +60,7 @@ struct xfs_trans_resv {
struct xfs_trans_res tr_qm_quotaoff; /* turn quota off */
struct xfs_trans_res tr_qm_equotaoff;/* end of turn quota off */
struct xfs_trans_res tr_sb; /* modify superblock */
+ struct xfs_trans_res tr_fsyncts; /* update timestamps on fsync */
};
/*
@@ -107,7 +108,7 @@ struct xfs_trans_resv {
* Logging the inode timestamps on an fsync -- same as SWRITE
* as long as SWRITE logs the entire inode core
*/
-#define XFS_FSYNC_TS_LOG_RES(mp) ((mp)->m_resv.tr_swrite.tr_logres)
+#define XFS_FSYNC_TS_LOG_RES(mp) ((mp)->m_resv.tr_fsyncts.tr_logres)
#define XFS_WRITEID_LOG_RES(mp) ((mp)->m_resv.tr_swrite.tr_logres)
#define XFS_ADDAFORK_LOG_RES(mp) ((mp)->m_resv.tr_addafork.tr_logres)
#define XFS_ATTRSETM_LOG_RES(mp) ((mp)->m_resv.tr_attrsetm.tr_logres)
--
1.7.10.4
* [PATCH 55/60] xfs: Make writeid transaction use tr_writeid
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (53 preceding siblings ...)
2013-06-19 4:51 ` [PATCH 54/60] xfs: Introduce tr_fsyncts to m_reservation Dave Chinner
@ 2013-06-19 4:51 ` Dave Chinner
2013-06-19 4:51 ` [PATCH 56/60] xfs: refactor xfs_trans_reserve() interface Dave Chinner
` (6 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:51 UTC (permalink / raw)
To: xfs
From: Jie Liu <jeff.liu@oracle.com>
tr_writeid is defined in the mp->m_resv structure, but it is not
actually used where it should be: XFS_WRITEID_LOG_RES() fetches
tr_swrite instead.
This patch changes the macro to use tr_writeid so that it fetches the
correct log reservation size.
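
To make the difference visible, here is a small sketch with the old
and new macro bodies side by side. The struct names and the two sizes
are hypothetical and deliberately chosen to differ; the real
calculations may well produce equal values, in which case the change
only affects which field is read.

```c
struct xfs_trans_res {
	unsigned int	tr_logres;
	int		tr_logcount;
	int		tr_logflags;
};

struct sketch_resv {
	struct xfs_trans_res	tr_swrite;
	struct xfs_trans_res	tr_writeid;
};

struct sketch_mount {
	struct sketch_resv	m_resv;
};

/* Macro body before this patch: reads the wrong item. */
#define SKETCH_OLD_WRITEID_LOG_RES(mp)	((mp)->m_resv.tr_swrite.tr_logres)
/* Macro body after this patch: reads the dedicated item. */
#define SKETCH_NEW_WRITEID_LOG_RES(mp)	((mp)->m_resv.tr_writeid.tr_logres)

static void
sketch_mount_init(
	struct sketch_mount	*mp)
{
	mp->m_resv.tr_swrite.tr_logres = 640;	/* made-up size */
	mp->m_resv.tr_writeid.tr_logres = 768;	/* made-up size */
}
```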
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_trans_resv.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_trans_resv.h b/fs/xfs/xfs_trans_resv.h
index 785128f..27b64c5 100644
--- a/fs/xfs/xfs_trans_resv.h
+++ b/fs/xfs/xfs_trans_resv.h
@@ -109,7 +109,7 @@ struct xfs_trans_resv {
* as long as SWRITE logs the entire inode core
*/
#define XFS_FSYNC_TS_LOG_RES(mp) ((mp)->m_resv.tr_fsyncts.tr_logres)
-#define XFS_WRITEID_LOG_RES(mp) ((mp)->m_resv.tr_swrite.tr_logres)
+#define XFS_WRITEID_LOG_RES(mp) ((mp)->m_resv.tr_writeid.tr_logres)
#define XFS_ADDAFORK_LOG_RES(mp) ((mp)->m_resv.tr_addafork.tr_logres)
#define XFS_ATTRSETM_LOG_RES(mp) ((mp)->m_resv.tr_attrsetm.tr_logres)
#define XFS_ATTRINVAL_LOG_RES(mp) ((mp)->m_resv.tr_attrinval.tr_logres)
--
1.7.10.4
* [PATCH 56/60] xfs: refactor xfs_trans_reserve() interface
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (54 preceding siblings ...)
2013-06-19 4:51 ` [PATCH 55/60] xfs: Make writeid transaction use tr_writeid Dave Chinner
@ 2013-06-19 4:51 ` Dave Chinner
2013-06-19 4:51 ` [PATCH 57/60] xfs: Get rid of all XFS_XXX_LOG_RES() macro Dave Chinner
` (5 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:51 UTC (permalink / raw)
To: xfs
From: Jie Liu <jeff.liu@oracle.com>
Now that the new xfs_trans_res structure has been introduced, the log
reservation size, log count and log flags are pre-initialized at
mount time. So it's time to refine the xfs_trans_reserve() interface
to make it neater.
Also, introduce a new helper M_RES() that returns a pointer to the
mp->m_resv structure to simplify the input.
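
The shape of the new call can be sketched in userspace as follows.
This is an illustration under assumptions, not the kernel
implementation: the sketch_* structs and sketch_trans_reserve() are
invented stand-ins, and the M_RES() body is inferred from the
description above.

```c
#include <stddef.h>

struct xfs_trans_res {
	unsigned int	tr_logres;
	int		tr_logcount;
	int		tr_logflags;
};

struct sketch_resv {
	struct xfs_trans_res	tr_write;
	struct xfs_trans_res	tr_sb;
};

struct sketch_mount {
	struct sketch_resv	m_resv;
};

/* Hypothetical transaction holding only what the sketch needs. */
struct sketch_trans {
	unsigned int	t_log_res;
	int		t_log_count;
	int		t_flags;
	unsigned int	t_blk_res;
	unsigned int	t_rtx_res;
};

/* Assumed body: a pointer to the per-mount reservation table. */
#define M_RES(mp)	(&(mp)->m_resv)

/* Size, count and flags arrive together in one struct argument. */
static int
sketch_trans_reserve(
	struct sketch_trans		*tp,
	const struct xfs_trans_res	*resp,
	unsigned int			blocks,
	unsigned int			rtextents)
{
	if (tp == NULL || resp == NULL)
		return -1;

	tp->t_log_res = resp->tr_logres;
	tp->t_log_count = resp->tr_logcount;
	tp->t_flags |= resp->tr_logflags;
	tp->t_blk_res = blocks;
	tp->t_rtx_res = rtextents;
	return 0;
}
```

A caller then writes, for example,
sketch_trans_reserve(tp, &M_RES(mp)->tr_write, resblks, 0), where the
old interface took six arguments: the transaction, block count, log
size, realtime extent count, flags and log count.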
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_aops.c | 2 +-
fs/xfs/xfs_attr.c | 33 ++++++++--------
fs/xfs/xfs_attr_inactive.c | 5 +--
fs/xfs/xfs_bmap.c | 4 +-
fs/xfs/xfs_bmap_util.c | 13 ++++---
fs/xfs/xfs_dquot.c | 6 +--
fs/xfs/xfs_extent_ops.c | 24 ++++--------
fs/xfs/xfs_fsops.c | 8 ++--
fs/xfs/xfs_inode_ops.c | 93 ++++++++++++++++++--------------------------
fs/xfs/xfs_ioctl.c | 4 +-
fs/xfs/xfs_iomap.c | 18 +++------
fs/xfs/xfs_iops.c | 8 ++--
fs/xfs/xfs_log_recover.c | 5 +--
fs/xfs/xfs_mount.c | 9 ++---
fs/xfs/xfs_qm.c | 11 ++----
fs/xfs/xfs_qm_syscalls.c | 13 ++-----
fs/xfs/xfs_rtalloc.c | 17 ++++----
fs/xfs/xfs_symlink.c | 10 ++---
fs/xfs/xfs_trans.c | 51 ++++++++++++------------
fs/xfs/xfs_trans.h | 3 +-
fs/xfs/xfs_trans_resv.c | 7 ++--
fs/xfs/xfs_trans_resv.h | 3 ++
22 files changed, 150 insertions(+), 197 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 69fdde8..3935e28 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -116,7 +116,7 @@ xfs_setfilesize_trans_alloc(
tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
- error = xfs_trans_reserve(tp, 0, XFS_FSYNC_TS_LOG_RES(mp), 0, 0, 0);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_fsyncts, 0, 0);
if (error) {
xfs_trans_cancel(tp, 0);
return error;
diff --git a/fs/xfs/xfs_attr.c b/fs/xfs/xfs_attr.c
index f4cc54c..9d907ff 100644
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -226,13 +226,14 @@ xfs_attr_set_int(
int valuelen,
int flags)
{
- xfs_da_args_t args;
- xfs_fsblock_t firstblock;
- xfs_bmap_free_t flist;
- int error, err2, committed;
- xfs_mount_t *mp = dp->i_mount;
- int rsvd = (flags & ATTR_ROOT) != 0;
- int local;
+ xfs_da_args_t args;
+ xfs_fsblock_t firstblock;
+ xfs_bmap_free_t flist;
+ int error, err2, committed;
+ struct xfs_mount *mp = dp->i_mount;
+ struct xfs_trans_res tres;
+ int rsvd = (flags & ATTR_ROOT) != 0;
+ int local;
/*
* Attach the dquots to the inode.
@@ -292,11 +293,11 @@ xfs_attr_set_int(
if (rsvd)
args.trans->t_flags |= XFS_TRANS_RESERVE;
- error = xfs_trans_reserve(args.trans, args.total,
- XFS_ATTRSETM_LOG_RES(mp) +
- XFS_ATTRSETRT_LOG_RES(mp) * args.total,
- 0, XFS_TRANS_PERM_LOG_RES,
- XFS_ATTRSET_LOG_COUNT);
+ tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres +
+ M_RES(mp)->tr_attrsetrt.tr_logres * args.total;
+ tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
+ tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
+ error = xfs_trans_reserve(args.trans, &tres, args.total, 0);
if (error) {
xfs_trans_cancel(args.trans, 0);
return(error);
@@ -516,11 +517,9 @@ xfs_attr_remove_int(xfs_inode_t *dp, struct xfs_name *name, int flags)
if (flags & ATTR_ROOT)
args.trans->t_flags |= XFS_TRANS_RESERVE;
- if ((error = xfs_trans_reserve(args.trans,
- XFS_ATTRRM_SPACE_RES(mp),
- XFS_ATTRRM_LOG_RES(mp),
- 0, XFS_TRANS_PERM_LOG_RES,
- XFS_ATTRRM_LOG_COUNT))) {
+ error = xfs_trans_reserve(args.trans, &M_RES(mp)->tr_attrrm,
+ XFS_ATTRRM_SPACE_RES(mp), 0);
+ if (error) {
xfs_trans_cancel(args.trans, 0);
return(error);
}
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index a19439b..23cffe4 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -412,9 +412,8 @@ xfs_attr_inactive(xfs_inode_t *dp)
* the log.
*/
trans = xfs_trans_alloc(mp, XFS_TRANS_ATTRINVAL);
- if ((error = xfs_trans_reserve(trans, 0, XFS_ATTRINVAL_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_ATTRINVAL_LOG_COUNT))) {
+ error = xfs_trans_reserve(trans, &M_RES(mp)->tr_attrinval, 0, 0);
+ if (error) {
xfs_trans_cancel(trans, 0);
return(error);
}
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 2be9a04..b2a4f29 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -1148,8 +1148,8 @@ xfs_bmap_add_attrfork(
blks = XFS_ADDAFORK_SPACE_RES(mp);
if (rsvd)
tp->t_flags |= XFS_TRANS_RESERVE;
- if ((error = xfs_trans_reserve(tp, blks, XFS_ADDAFORK_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, XFS_ADDAFORK_LOG_COUNT)))
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_addafork, blks, 0);
+ if (error)
goto error0;
xfs_ilock(ip, XFS_ILOCK_EXCL);
error = xfs_trans_reserve_quota_nblks(tp, ip, blks, 0, rsvd ?
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 0571a1c..819d17c 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -76,8 +76,7 @@ xfs_bmap_finish(
xfs_efi_log_item_t *efi; /* extent free intention */
int error; /* error return value */
xfs_bmap_free_item_t *free; /* free extent item */
- unsigned int logres; /* new log reservation */
- unsigned int logcount; /* new log count */
+ struct xfs_trans_res tres; /* new log reservation */
xfs_mount_t *mp; /* filesystem mount structure */
xfs_bmap_free_item_t *next; /* next item on free list */
xfs_trans_t *ntp; /* new transaction pointer */
@@ -92,8 +91,10 @@ xfs_bmap_finish(
for (free = flist->xbf_first; free; free = free->xbfi_next)
xfs_trans_log_efi_extent(ntp, efi, free->xbfi_startblock,
free->xbfi_blockcount);
- logres = ntp->t_log_res;
- logcount = ntp->t_log_count;
+
+ tres.tr_logres = ntp->t_log_res;
+ tres.tr_logcount = ntp->t_log_count;
+ tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
ntp = xfs_trans_dup(*tp);
error = xfs_trans_commit(*tp, 0);
*tp = ntp;
@@ -111,8 +112,8 @@ xfs_bmap_finish(
*/
xfs_log_ticket_put(ntp->t_ticket);
- if ((error = xfs_trans_reserve(ntp, 0, logres, 0, XFS_TRANS_PERM_LOG_RES,
- logcount)))
+ error = xfs_trans_reserve(ntp, &tres, 0, 0);
+ if (error)
return error;
efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count);
for (free = flist->xbf_first; free != NULL; free = next) {
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 464f91c..19c24ea 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -711,10 +711,8 @@ xfs_qm_dqread(
if (flags & XFS_QMOPT_DQALLOC) {
tp = xfs_trans_alloc(mp, XFS_TRANS_QM_DQALLOC);
- error = xfs_trans_reserve(tp, XFS_QM_DQALLOC_SPACE_RES(mp),
- XFS_QM_DQALLOC_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_WRITE_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_qm_dqalloc,
+ XFS_QM_DQALLOC_SPACE_RES(mp), 0);
if (error)
goto error1;
cancelflags = XFS_TRANS_RELEASE_LOG_RES;
diff --git a/fs/xfs/xfs_extent_ops.c b/fs/xfs/xfs_extent_ops.c
index f7e1f14..a95905d 100644
--- a/fs/xfs/xfs_extent_ops.c
+++ b/fs/xfs/xfs_extent_ops.c
@@ -163,10 +163,8 @@ xfs_alloc_file_space(
* Allocate and setup the transaction.
*/
tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
- error = xfs_trans_reserve(tp, resblks,
- XFS_WRITE_LOG_RES(mp), resrtextents,
- XFS_TRANS_PERM_LOG_RES,
- XFS_WRITE_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write,
+ resblks, resrtextents);
/*
* Check for running out of space
*/
@@ -456,12 +454,7 @@ xfs_free_file_space(
*/
tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
tp->t_flags |= XFS_TRANS_RESERVE;
- error = xfs_trans_reserve(tp,
- resblks,
- XFS_WRITE_LOG_RES(mp),
- 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_WRITE_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write, resblks, 0);
/*
* check for running out of space
@@ -735,10 +728,8 @@ xfs_change_file_space(
* update the inode timestamp, mode, and prealloc flag bits
*/
tp = xfs_trans_alloc(mp, XFS_TRANS_WRITEID);
-
- if ((error = xfs_trans_reserve(tp, 0, XFS_WRITEID_LOG_RES(mp),
- 0, 0, 0))) {
- /* ASSERT(0); */
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_writeid, 0, 0);
+ if (error) {
xfs_trans_cancel(tp, 0);
return error;
}
@@ -1265,9 +1256,8 @@ xfs_swap_extents(
truncate_pagecache_range(VFS_I(ip), 0, -1);
tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPEXT);
- if ((error = xfs_trans_reserve(tp, 0,
- XFS_ICHANGE_LOG_RES(mp), 0,
- 0, 0))) {
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
+ if (error) {
xfs_iunlock(ip, XFS_IOLOCK_EXCL);
xfs_iunlock(tip, XFS_IOLOCK_EXCL);
xfs_trans_cancel(tp, 0);
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 3c3644e..f430f7a 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -203,8 +203,9 @@ xfs_growfs_data_private(
tp = xfs_trans_alloc(mp, XFS_TRANS_GROWFS);
tp->t_flags |= XFS_TRANS_RESERVE;
- if ((error = xfs_trans_reserve(tp, XFS_GROWFS_SPACE_RES(mp),
- XFS_GROWDATA_LOG_RES(mp), 0, 0, 0))) {
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_growdata,
+ XFS_GROWFS_SPACE_RES(mp), 0);
+ if (error) {
xfs_trans_cancel(tp, 0);
return error;
}
@@ -739,8 +740,7 @@ xfs_fs_log_dummy(
int error;
tp = _xfs_trans_alloc(mp, XFS_TRANS_DUMMY1, KM_SLEEP);
- error = xfs_trans_reserve(tp, 0, XFS_SB_LOG_RES(mp), 0, 0,
- XFS_DEFAULT_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_sb, 0, 0);
if (error) {
xfs_trans_cancel(tp, 0);
return error;
diff --git a/fs/xfs/xfs_inode_ops.c b/fs/xfs/xfs_inode_ops.c
index 2777741..c551a8a 100644
--- a/fs/xfs/xfs_inode_ops.c
+++ b/fs/xfs/xfs_inode_ops.c
@@ -515,8 +515,6 @@ xfs_dir_ialloc(
xfs_inode_t *ip;
xfs_buf_t *ialloc_context = NULL;
int code;
- uint log_res;
- uint log_count;
void *dqinfo;
uint tflags;
@@ -562,6 +560,8 @@ xfs_dir_ialloc(
* to succeed the second time.
*/
if (ialloc_context) {
+ struct xfs_trans_res tres;
+
/*
* Normally, xfs_trans_commit releases all the locks.
* We call bhold to hang on to the ialloc_context across
@@ -574,8 +574,8 @@ xfs_dir_ialloc(
* Save the log reservation so we can use
* them in the next transaction.
*/
- log_res = xfs_trans_get_log_res(tp);
- log_count = xfs_trans_get_log_count(tp);
+ tres.tr_logres = xfs_trans_get_log_res(tp);
+ tres.tr_logcount = xfs_trans_get_log_count(tp);
/*
* We want the quota changes to be associated with the next
@@ -618,8 +618,9 @@ xfs_dir_ialloc(
* reference that we gained in xfs_trans_dup()
*/
xfs_log_ticket_put(tp->t_ticket);
- code = xfs_trans_reserve(tp, 0, log_res, 0,
- XFS_TRANS_PERM_LOG_RES, log_count);
+ tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
+ code = xfs_trans_reserve(tp, &tres, 0, 0);
+
/*
* Re-attach the quota info that we detached from prev trx.
*/
@@ -783,9 +784,8 @@ xfs_create(
prid_t prid;
struct xfs_dquot *udqp = NULL;
struct xfs_dquot *gdqp = NULL;
+ struct xfs_trans_res tres;
uint resblks;
- uint log_res;
- uint log_count;
trace_xfs_create(dp, name);
@@ -808,13 +808,13 @@ xfs_create(
if (is_dir) {
rdev = 0;
resblks = XFS_MKDIR_SPACE_RES(mp, name->len);
- log_res = XFS_MKDIR_LOG_RES(mp);
- log_count = XFS_MKDIR_LOG_COUNT;
+ tres.tr_logres = M_RES(mp)->tr_mkdir.tr_logres;
+ tres.tr_logcount = XFS_MKDIR_LOG_COUNT;
tp = xfs_trans_alloc(mp, XFS_TRANS_MKDIR);
} else {
resblks = XFS_CREATE_SPACE_RES(mp, name->len);
- log_res = XFS_CREATE_LOG_RES(mp);
- log_count = XFS_CREATE_LOG_COUNT;
+ tres.tr_logres = M_RES(mp)->tr_create.tr_logres;
+ tres.tr_logcount = XFS_CREATE_LOG_COUNT;
tp = xfs_trans_alloc(mp, XFS_TRANS_CREATE);
}
@@ -826,19 +826,17 @@ xfs_create(
* the case we'll drop the one we have and get a more
* appropriate transaction later.
*/
- error = xfs_trans_reserve(tp, resblks, log_res, 0,
- XFS_TRANS_PERM_LOG_RES, log_count);
+ tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
+ error = xfs_trans_reserve(tp, &tres, resblks, 0);
if (error == ENOSPC) {
/* flush outstanding delalloc blocks and retry */
xfs_flush_inodes(mp);
- error = xfs_trans_reserve(tp, resblks, log_res, 0,
- XFS_TRANS_PERM_LOG_RES, log_count);
+ error = xfs_trans_reserve(tp, &tres, resblks, 0);
}
if (error == ENOSPC) {
/* No space at all so try a "no-allocation" reservation */
resblks = 0;
- error = xfs_trans_reserve(tp, 0, log_res, 0,
- XFS_TRANS_PERM_LOG_RES, log_count);
+ error = xfs_trans_reserve(tp, &tres, 0, 0);
}
if (error) {
cancel_flags = 0;
@@ -989,12 +987,10 @@ xfs_link(
tp = xfs_trans_alloc(mp, XFS_TRANS_LINK);
cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
resblks = XFS_LINK_SPACE_RES(mp, target_name->len);
- error = xfs_trans_reserve(tp, resblks, XFS_LINK_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, XFS_LINK_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_link, resblks, 0);
if (error == ENOSPC) {
resblks = 0;
- error = xfs_trans_reserve(tp, 0, XFS_LINK_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, XFS_LINK_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_link, 0, 0);
}
if (error) {
cancel_flags = 0;
@@ -1169,10 +1165,7 @@ xfs_itruncate_extents(
* reference that we gained in xfs_trans_dup()
*/
xfs_log_ticket_put(tp->t_ticket);
- error = xfs_trans_reserve(tp, 0,
- XFS_ITRUNCATE_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_ITRUNCATE_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
if (error)
goto out;
}
@@ -1292,10 +1285,7 @@ xfs_free_eofblocks(
}
}
- error = xfs_trans_reserve(tp, 0,
- XFS_ITRUNCATE_LOG_RES(mp),
- 0, XFS_TRANS_PERM_LOG_RES,
- XFS_ITRUNCATE_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
if (error) {
ASSERT(XFS_FORCED_SHUTDOWN(mp));
xfs_trans_cancel(tp, 0);
@@ -1437,13 +1427,14 @@ int
xfs_inactive(
xfs_inode_t *ip)
{
- xfs_bmap_free_t free_list;
- xfs_fsblock_t first_block;
- int committed;
- xfs_trans_t *tp;
- xfs_mount_t *mp;
- int error;
- int truncate = 0;
+ xfs_bmap_free_t free_list;
+ xfs_fsblock_t first_block;
+ int committed;
+ struct xfs_trans *tp;
+ struct xfs_mount *mp;
+ struct xfs_trans_res *resp;
+ int error;
+ int truncate = 0;
/*
* If the inode is already free, then there can be nothing
@@ -1487,13 +1478,10 @@ xfs_inactive(
return VN_INACTIVE_CACHE;
tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
- error = xfs_trans_reserve(tp, 0,
- (truncate || S_ISLNK(ip->i_d.di_mode)) ?
- XFS_ITRUNCATE_LOG_RES(mp) :
- XFS_IFREE_LOG_RES(mp),
- 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_ITRUNCATE_LOG_COUNT);
+ resp = (truncate || S_ISLNK(ip->i_d.di_mode)) ?
+ &M_RES(mp)->tr_itruncate : &M_RES(mp)->tr_ifree;
+
+ error = xfs_trans_reserve(tp, resp, 0, 0);
if (error) {
ASSERT(XFS_FORCED_SHUTDOWN(mp));
xfs_trans_cancel(tp, 0);
@@ -1547,10 +1535,7 @@ xfs_inactive(
goto out;
tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
- error = xfs_trans_reserve(tp, 0,
- XFS_IFREE_LOG_RES(mp),
- 0, XFS_TRANS_PERM_LOG_RES,
- XFS_INACTIVE_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ifree, 0, 0);
if (error) {
xfs_trans_cancel(tp, 0);
goto out;
@@ -2154,12 +2139,10 @@ xfs_remove(
* block from the directory.
*/
resblks = XFS_REMOVE_SPACE_RES(mp);
- error = xfs_trans_reserve(tp, resblks, XFS_REMOVE_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, log_count);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_remove, resblks, 0);
if (error == ENOSPC) {
resblks = 0;
- error = xfs_trans_reserve(tp, 0, XFS_REMOVE_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, log_count);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_remove, 0, 0);
}
if (error) {
ASSERT(error != ENOSPC);
@@ -2792,12 +2775,10 @@ xfs_rename(
tp = xfs_trans_alloc(mp, XFS_TRANS_RENAME);
cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
spaceres = XFS_RENAME_SPACE_RES(mp, target_name->len);
- error = xfs_trans_reserve(tp, spaceres, XFS_RENAME_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, XFS_RENAME_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_rename, spaceres, 0);
if (error == ENOSPC) {
spaceres = 0;
- error = xfs_trans_reserve(tp, 0, XFS_RENAME_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, XFS_RENAME_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_rename, 0, 0);
}
if (error) {
xfs_trans_cancel(tp, 0);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 6ddab0e..ce2b7df 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -366,7 +366,7 @@ xfs_set_dmattrs(
return XFS_ERROR(EIO);
tp = xfs_trans_alloc(mp, XFS_TRANS_SET_DMATTRS);
- error = xfs_trans_reserve(tp, 0, XFS_ICHANGE_LOG_RES (mp), 0, 0, 0);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
if (error) {
xfs_trans_cancel(tp, 0);
return error;
@@ -1000,7 +1000,7 @@ xfs_ioctl_setattr(
* first do an error checking pass.
*/
tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
- code = xfs_trans_reserve(tp, 0, XFS_ICHANGE_LOG_RES(mp), 0, 0, 0);
+ code = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
if (code)
goto error_return;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 770f75c..1688b3b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -187,10 +187,8 @@ xfs_iomap_write_direct(
* Allocate and setup the transaction
*/
tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
- error = xfs_trans_reserve(tp, resblks,
- XFS_WRITE_LOG_RES(mp), resrtextents,
- XFS_TRANS_PERM_LOG_RES,
- XFS_WRITE_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write,
+ resblks, resrtextents);
/*
* Check for running out of space, note: need lock to return
*/
@@ -698,10 +696,8 @@ xfs_iomap_write_allocate(
tp = xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE);
tp->t_flags |= XFS_TRANS_RESERVE;
nres = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
- error = xfs_trans_reserve(tp, nres,
- XFS_WRITE_LOG_RES(mp),
- 0, XFS_TRANS_PERM_LOG_RES,
- XFS_WRITE_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write,
+ nres, 0);
if (error) {
xfs_trans_cancel(tp, 0);
return XFS_ERROR(error);
@@ -864,10 +860,8 @@ xfs_iomap_write_unwritten(
sb_start_intwrite(mp->m_super);
tp = _xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE, KM_NOFS);
tp->t_flags |= XFS_TRANS_RESERVE | XFS_TRANS_FREEZE_PROT;
- error = xfs_trans_reserve(tp, resblks,
- XFS_WRITE_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_WRITE_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write,
+ resblks, 0);
if (error) {
xfs_trans_cancel(tp, 0);
return XFS_ERROR(error);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 29d2108..8131f13 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -545,7 +545,7 @@ xfs_setattr_nonsize(
}
tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
- error = xfs_trans_reserve(tp, 0, XFS_ICHANGE_LOG_RES(mp), 0, 0, 0);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
if (error)
goto out_dqrele;
@@ -807,9 +807,7 @@ xfs_setattr_size(
goto out_unlock;
tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_SIZE);
- error = xfs_trans_reserve(tp, 0, XFS_ITRUNCATE_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_ITRUNCATE_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
if (error)
goto out_trans_cancel;
@@ -932,7 +930,7 @@ xfs_vn_update_time(
trace_xfs_update_time(ip);
tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
- error = xfs_trans_reserve(tp, 0, XFS_FSYNC_TS_LOG_RES(mp), 0, 0, 0);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_fsyncts, 0, 0);
if (error) {
xfs_trans_cancel(tp, 0);
return -error;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index f840cb9..821f05c 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3368,7 +3368,7 @@ xlog_recover_process_efi(
}
tp = xfs_trans_alloc(mp, 0);
- error = xfs_trans_reserve(tp, 0, XFS_ITRUNCATE_LOG_RES(mp), 0, 0, 0);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
if (error)
goto abort_error;
efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
@@ -3474,8 +3474,7 @@ xlog_recover_clear_agi_bucket(
int error;
tp = xfs_trans_alloc(mp, XFS_TRANS_CLEAR_AGI_BUCKET);
- error = xfs_trans_reserve(tp, 0, XFS_CLEAR_AGI_BUCKET_LOG_RES(mp),
- 0, 0, 0);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_clearagi, 0, 0);
if (error)
goto out_abort;
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 6846b77..24d2c35 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -607,8 +607,7 @@ xfs_mount_reset_sbqflags(
return 0;
tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SBCHANGE);
- error = xfs_trans_reserve(tp, 0, XFS_QM_SBCHANGE_LOG_RES(mp),
- 0, 0, XFS_DEFAULT_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_qm_sbchange, 0, 0);
if (error) {
xfs_trans_cancel(tp, 0);
xfs_alert(mp, "%s: Superblock update failed!", __func__);
@@ -1081,8 +1080,7 @@ xfs_log_sbcount(xfs_mount_t *mp)
return 0;
tp = _xfs_trans_alloc(mp, XFS_TRANS_SB_COUNT, KM_SLEEP);
- error = xfs_trans_reserve(tp, 0, XFS_SB_LOG_RES(mp), 0, 0,
- XFS_DEFAULT_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_sb, 0, 0);
if (error) {
xfs_trans_cancel(tp, 0);
return error;
@@ -1402,8 +1400,7 @@ xfs_mount_log_sb(
XFS_SB_VERSIONNUM));
tp = xfs_trans_alloc(mp, XFS_TRANS_SB_UNIT);
- error = xfs_trans_reserve(tp, 0, XFS_SB_LOG_RES(mp), 0, 0,
- XFS_DEFAULT_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_sb, 0, 0);
if (error) {
xfs_trans_cancel(tp, 0);
return error;
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 6dddf1a..e11354f 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -781,11 +781,9 @@ xfs_qm_qino_alloc(
int committed;
tp = xfs_trans_alloc(mp, XFS_TRANS_QM_QINOCREATE);
- if ((error = xfs_trans_reserve(tp,
- XFS_QM_QINOCREATE_SPACE_RES(mp),
- XFS_CREATE_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_CREATE_LOG_COUNT))) {
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_create,
+ XFS_QM_QINOCREATE_SPACE_RES(mp), 0);
+ if (error) {
xfs_trans_cancel(tp, 0);
return error;
}
@@ -1620,8 +1618,7 @@ xfs_qm_write_sb_changes(
int error;
tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SBCHANGE);
- error = xfs_trans_reserve(tp, 0, XFS_QM_SBCHANGE_LOG_RES(mp),
- 0, 0, XFS_DEFAULT_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_qm_sbchange, 0, 0);
if (error) {
xfs_trans_cancel(tp, 0);
return error;
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 7c260f8..6e6a56a 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -242,9 +242,7 @@ xfs_qm_scall_trunc_qfile(
xfs_ilock(ip, XFS_IOLOCK_EXCL);
tp = xfs_trans_alloc(mp, XFS_TRANS_TRUNCATE_FILE);
- error = xfs_trans_reserve(tp, 0, XFS_ITRUNCATE_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_ITRUNCATE_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
if (error) {
xfs_trans_cancel(tp, 0);
xfs_iunlock(ip, XFS_IOLOCK_EXCL);
@@ -510,8 +508,7 @@ xfs_qm_scall_setqlim(
xfs_dqunlock(dqp);
tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
- error = xfs_trans_reserve(tp, 0, XFS_QM_SETQLIM_LOG_RES(mp),
- 0, 0, XFS_DEFAULT_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_qm_setqlim, 0, 0);
if (error) {
xfs_trans_cancel(tp, 0);
goto out_rele;
@@ -645,8 +642,7 @@ xfs_qm_log_quotaoff_end(
tp = xfs_trans_alloc(mp, XFS_TRANS_QM_QUOTAOFF_END);
- error = xfs_trans_reserve(tp, 0, XFS_QM_QUOTAOFF_END_LOG_RES(mp),
- 0, 0, XFS_DEFAULT_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_qm_equotaoff, 0, 0);
if (error) {
xfs_trans_cancel(tp, 0);
return (error);
@@ -679,8 +675,7 @@ xfs_qm_log_quotaoff(
uint oldsbqflag=0;
tp = xfs_trans_alloc(mp, XFS_TRANS_QM_QUOTAOFF);
- error = xfs_trans_reserve(tp, 0, XFS_QM_QUOTAOFF_LOG_RES(mp),
- 0, 0, XFS_DEFAULT_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_qm_quotaoff, 0, 0);
if (error)
goto error0;
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 46408df..e8e49de 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -100,10 +100,9 @@ xfs_growfs_rt_alloc(
/*
* Reserve space & log for one extent added to the file.
*/
- if ((error = xfs_trans_reserve(tp, resblks,
- XFS_GROWRTALLOC_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES,
- XFS_DEFAULT_PERM_LOG_COUNT)))
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_growdata,
+ resblks, 0);
+ if (error)
goto error_cancel;
cancelflags = XFS_TRANS_RELEASE_LOG_RES;
/*
@@ -146,8 +145,9 @@ xfs_growfs_rt_alloc(
/*
* Reserve log for one block zeroing.
*/
- if ((error = xfs_trans_reserve(tp, 0,
- XFS_GROWRTZERO_LOG_RES(mp), 0, 0, 0)))
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_growrtzero,
+ 0, 0);
+ if (error)
goto error_cancel;
/*
* Lock the bitmap inode.
@@ -1957,8 +1957,9 @@ xfs_growfs_rt(
* Start a transaction, get the log reservation.
*/
tp = xfs_trans_alloc(mp, XFS_TRANS_GROWFSRT_FREE);
- if ((error = xfs_trans_reserve(tp, 0,
- XFS_GROWRTFREE_LOG_RES(nmp), 0, 0, 0)))
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_growrtfree,
+ 0, 0);
+ if (error)
goto error_cancel;
/*
* Lock out other callers by grabbing the bitmap inode lock.
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 6d30954..5e8f83f 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -229,12 +229,10 @@ xfs_symlink(
else
fs_blocks = xfs_symlink_blocks(mp, pathlen);
resblks = XFS_SYMLINK_SPACE_RES(mp, link_name->len, fs_blocks);
- error = xfs_trans_reserve(tp, resblks, XFS_SYMLINK_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, XFS_SYMLINK_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_symlink, resblks, 0);
if (error == ENOSPC && fs_blocks == 0) {
resblks = 0;
- error = xfs_trans_reserve(tp, 0, XFS_SYMLINK_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, XFS_SYMLINK_LOG_COUNT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_symlink, 0, 0);
}
if (error) {
cancel_flags = 0;
@@ -534,8 +532,8 @@ xfs_inactive_symlink_rmt(
* Put an itruncate log reservation in the new transaction
* for our caller.
*/
- if ((error = xfs_trans_reserve(tp, 0, XFS_ITRUNCATE_LOG_RES(mp), 0,
- XFS_TRANS_PERM_LOG_RES, XFS_ITRUNCATE_LOG_COUNT))) {
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
+ if (error) {
ASSERT(XFS_FORCED_SHUTDOWN(mp));
goto error0;
}
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 52e9219..2450418e 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -56,7 +56,7 @@ void
xfs_trans_init(
struct xfs_mount *mp)
{
- xfs_trans_resv_calc(mp, &mp->m_resv);
+ xfs_trans_resv_calc(mp, M_RES(mp));
}
/*
@@ -180,12 +180,10 @@ xfs_trans_dup(
*/
int
xfs_trans_reserve(
- xfs_trans_t *tp,
- uint blocks,
- uint logspace,
- uint rtextents,
- uint flags,
- uint logcount)
+ struct xfs_trans *tp,
+ struct xfs_trans_res *resp,
+ uint blocks,
+ uint rtextents)
{
int error = 0;
int rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
@@ -211,13 +209,15 @@ xfs_trans_reserve(
/*
* Reserve the log space needed for this transaction.
*/
- if (logspace > 0) {
+ if (resp->tr_logres > 0) {
bool permanent = false;
- ASSERT(tp->t_log_res == 0 || tp->t_log_res == logspace);
- ASSERT(tp->t_log_count == 0 || tp->t_log_count == logcount);
+ ASSERT(tp->t_log_res == 0 ||
+ tp->t_log_res == resp->tr_logres);
+ ASSERT(tp->t_log_count == 0 ||
+ tp->t_log_count == resp->tr_logcount);
- if (flags & XFS_TRANS_PERM_LOG_RES) {
+ if (resp->tr_logflags & XFS_TRANS_PERM_LOG_RES) {
tp->t_flags |= XFS_TRANS_PERM_LOG_RES;
permanent = true;
} else {
@@ -226,20 +226,21 @@ xfs_trans_reserve(
}
if (tp->t_ticket != NULL) {
- ASSERT(flags & XFS_TRANS_PERM_LOG_RES);
+ ASSERT(resp->tr_logflags & XFS_TRANS_PERM_LOG_RES);
error = xfs_log_regrant(tp->t_mountp, tp->t_ticket);
} else {
- error = xfs_log_reserve(tp->t_mountp, logspace,
- logcount, &tp->t_ticket,
- XFS_TRANSACTION, permanent,
- tp->t_type);
+ error = xfs_log_reserve(tp->t_mountp,
+ resp->tr_logres,
+ resp->tr_logcount,
+ &tp->t_ticket, XFS_TRANSACTION,
+ permanent, tp->t_type);
}
if (error)
goto undo_blocks;
- tp->t_log_res = logspace;
- tp->t_log_count = logcount;
+ tp->t_log_res = resp->tr_logres;
+ tp->t_log_count = resp->tr_logcount;
}
/*
@@ -264,10 +265,10 @@ xfs_trans_reserve(
* reservations which have already been performed.
*/
undo_log:
- if (logspace > 0) {
+ if (resp->tr_logres > 0) {
int log_flags;
- if (flags & XFS_TRANS_PERM_LOG_RES) {
+ if (resp->tr_logflags & XFS_TRANS_PERM_LOG_RES) {
log_flags = XFS_LOG_REL_PERM_RESERV;
} else {
log_flags = 0;
@@ -1014,7 +1015,7 @@ xfs_trans_roll(
struct xfs_inode *dp)
{
struct xfs_trans *trans;
- unsigned int logres, count;
+ struct xfs_trans_res tres;
int error;
/*
@@ -1026,8 +1027,8 @@ xfs_trans_roll(
/*
* Copy the critical parameters from one trans to the next.
*/
- logres = trans->t_log_res;
- count = trans->t_log_count;
+ tres.tr_logres = trans->t_log_res;
+ tres.tr_logcount = trans->t_log_count;
*tpp = xfs_trans_dup(trans);
/*
@@ -1058,8 +1059,8 @@ xfs_trans_roll(
* across this call, or that anything that is locked be logged in
* the prior and the next transactions.
*/
- error = xfs_trans_reserve(trans, 0, logres, 0,
- XFS_TRANS_PERM_LOG_RES, count);
+ tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
+ error = xfs_trans_reserve(trans, &tres, 0, 0);
/*
* Ensure that the inode is in the new transaction and locked.
*/
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 21c3e07..fcd31a1 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -33,6 +33,7 @@ struct xfs_log_iovec;
struct xfs_log_item_desc;
struct xfs_mount;
struct xfs_trans;
+struct xfs_trans_res;
struct xfs_dquot_acct;
struct xfs_busy_extent;
@@ -169,7 +170,7 @@ typedef struct xfs_trans {
xfs_trans_t *xfs_trans_alloc(struct xfs_mount *, uint);
xfs_trans_t *_xfs_trans_alloc(struct xfs_mount *, uint, xfs_km_flags_t);
xfs_trans_t *xfs_trans_dup(xfs_trans_t *);
-int xfs_trans_reserve(xfs_trans_t *, uint, uint, uint,
+int xfs_trans_reserve(struct xfs_trans *, struct xfs_trans_res *,
uint, uint);
void xfs_trans_mod_sb(xfs_trans_t *, uint, int64_t);
diff --git a/fs/xfs/xfs_trans_resv.c b/fs/xfs/xfs_trans_resv.c
index e46791a..b351275 100644
--- a/fs/xfs/xfs_trans_resv.c
+++ b/fs/xfs/xfs_trans_resv.c
@@ -551,7 +551,8 @@ xfs_calc_attrsetm_reservation(
* Since the runtime attribute transaction space is dependent on the total
* blocks needed for the 1st bmap, here we calculate out the space unit for
* one block so that the caller could figure out the total space according
- * to the attibute extent length in blocks by: ext * XFS_ATTRSETRT_LOG_RES(mp).
+ * to the attibute extent length in blocks by:
+ * ext * M_RES(mp)->tr_attrsetrt.tr_logres
*/
STATIC uint
xfs_calc_attrsetrt_reservation(
@@ -623,14 +624,14 @@ xfs_calc_qm_setqlim_reservation(
/*
* Allocating quota on disk if needed.
- * the write transaction log space: XFS_WRITE_LOG_RES(mp)
+ * the write transaction log space: M_RES(mp)->tr_write.tr_logres
* the unit of quota allocation: one system block size
*/
STATIC uint
xfs_calc_qm_dqalloc_reservation(
struct xfs_mount *mp)
{
- return XFS_WRITE_LOG_RES(mp) +
+ return M_RES(mp)->tr_write.tr_logres +
xfs_calc_buf_res(1,
XFS_FSB_TO_B(mp, XFS_DQUOT_CLUSTER_SIZE_FSB) - 1);
}
diff --git a/fs/xfs/xfs_trans_resv.h b/fs/xfs/xfs_trans_resv.h
index 27b64c5..e62a804 100644
--- a/fs/xfs/xfs_trans_resv.h
+++ b/fs/xfs/xfs_trans_resv.h
@@ -63,6 +63,9 @@ struct xfs_trans_resv {
struct xfs_trans_res tr_fsyncts; /* update timestamps on fsync */
};
+/* shorthand way of accessing reservation structure */
+#define M_RES(mp) (&(mp)->m_resv)
+
/*
* Per-extent log reservation for the allocation btree changes
* involved in freeing or allocating an extent.
--
1.7.10.4
* [PATCH 57/60] xfs: Get rid of all XFS_XXX_LOG_RES() macro
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (55 preceding siblings ...)
2013-06-19 4:51 ` [PATCH 56/60] xfs: refactor xfs_trans_reserve() interface Dave Chinner
@ 2013-06-19 4:51 ` Dave Chinner
2013-06-19 4:51 ` [PATCH 58/60] xfs: Refactor xfs_ticket_alloc() to extract a new helper Dave Chinner
` (4 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:51 UTC (permalink / raw)
To: xfs
From: Jie Liu <jeff.liu@oracle.com>
Get rid of all XFS_XXX_LOG_RES() macros since they are now obsolete.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
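For readers following the conversion, here is a minimal standalone sketch of
the accessor pattern that makes these macros redundant. The structures are
cut-down mock-ups: the field names and M_RES() follow the patches in this
series, but the real structures have many more members, and
OLD_WRITE_LOG_RES() and demo_write_logres() are hypothetical names used only
for illustration.

```c
/*
 * Mock-up only: not the kernel definitions. Field names follow the
 * series; the types and everything prefixed OLD_/demo_ are assumptions.
 */
struct xfs_trans_res {
	unsigned int	tr_logres;	/* log space unit, in bytes */
	int		tr_logcount;	/* number of log operations */
	int		tr_logflags;	/* e.g. a permanent-reservation flag */
};

struct xfs_trans_resv {
	struct xfs_trans_res	tr_write;
	struct xfs_trans_res	tr_itruncate;
};

struct xfs_mount {
	struct xfs_trans_resv	m_resv;
};

/* shorthand way of accessing the reservation structure */
#define M_RES(mp)	(&(mp)->m_resv)

/* shape of one of the removed macros, renamed to mark it as a stand-in */
#define OLD_WRITE_LOG_RES(mp)	((mp)->m_resv.tr_write.tr_logres)

/* hypothetical helper: reads the same field through M_RES() */
static unsigned int
demo_write_logres(struct xfs_mount *mp)
{
	return M_RES(mp)->tr_write.tr_logres;
}
```

Both spellings read the same field, which is why the per-transaction macros
can go once callers pass &M_RES(mp)->tr_xxx to xfs_trans_reserve() or read
.tr_logres directly.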
fs/xfs/xfs_trans_resv.h | 36 ------------------------------------
1 file changed, 36 deletions(-)
diff --git a/fs/xfs/xfs_trans_resv.h b/fs/xfs/xfs_trans_resv.h
index e62a804..95ff606 100644
--- a/fs/xfs/xfs_trans_resv.h
+++ b/fs/xfs/xfs_trans_resv.h
@@ -90,42 +90,6 @@ struct xfs_trans_resv {
(XFS_DAENTER_BLOCKS(mp, XFS_DATA_FORK) + \
XFS_DAENTER_BMAPS(mp, XFS_DATA_FORK) + 1)
-
-#define XFS_WRITE_LOG_RES(mp) ((mp)->m_resv.tr_write.tr_logres)
-#define XFS_RENAME_LOG_RES(mp) ((mp)->m_resv.tr_rename.tr_logres)
-#define XFS_LINK_LOG_RES(mp) ((mp)->m_resv.tr_link.tr_logres)
-#define XFS_REMOVE_LOG_RES(mp) ((mp)->m_resv.tr_remove.tr_logres)
-#define XFS_SYMLINK_LOG_RES(mp) ((mp)->m_resv.tr_symlink.tr_logres)
-#define XFS_CREATE_LOG_RES(mp) ((mp)->m_resv.tr_create.tr_logres)
-#define XFS_MKDIR_LOG_RES(mp) ((mp)->m_resv.tr_mkdir.tr_logres)
-#define XFS_IFREE_LOG_RES(mp) ((mp)->m_resv.tr_ifree.tr_logres)
-#define XFS_SWRITE_LOG_RES(mp) ((mp)->m_resv.tr_swrite.tr_logres)
-#define XFS_ICHANGE_LOG_RES(mp) ((mp)->m_resv.tr_ichange.tr_logres)
-#define XFS_GROWDATA_LOG_RES(mp) ((mp)->m_resv.tr_growdata.tr_logres)
-#define XFS_ITRUNCATE_LOG_RES(mp) ((mp)->m_resv.tr_itruncate.tr_logres)
-#define XFS_GROWRTZERO_LOG_RES(mp) ((mp)->m_resv.tr_growrtzero.tr_logres)
-#define XFS_GROWRTFREE_LOG_RES(mp) ((mp)->m_resv.tr_growrtfree.tr_logres)
-#define XFS_GROWRTALLOC_LOG_RES(mp) ((mp)->m_resv.tr_growrtalloc.tr_logres)
-
-/*
- * Logging the inode timestamps on an fsync -- same as SWRITE
- * as long as SWRITE logs the entire inode core
- */
-#define XFS_FSYNC_TS_LOG_RES(mp) ((mp)->m_resv.tr_fsyncts.tr_logres)
-#define XFS_WRITEID_LOG_RES(mp) ((mp)->m_resv.tr_writeid.tr_logres)
-#define XFS_ADDAFORK_LOG_RES(mp) ((mp)->m_resv.tr_addafork.tr_logres)
-#define XFS_ATTRSETM_LOG_RES(mp) ((mp)->m_resv.tr_attrsetm.tr_logres)
-#define XFS_ATTRINVAL_LOG_RES(mp) ((mp)->m_resv.tr_attrinval.tr_logres)
-#define XFS_ATTRSETRT_LOG_RES(mp) ((mp)->m_resv.tr_attrsetrt.tr_logres)
-#define XFS_ATTRRM_LOG_RES(mp) ((mp)->m_resv.tr_attrrm.tr_logres)
-#define XFS_SB_LOG_RES(mp) ((mp)->m_resv.tr_sb.tr_logres)
-#define XFS_QM_SETQLIM_LOG_RES(mp) ((mp)->m_resv.tr_qm_setqlim.tr_logres)
-#define XFS_QM_DQALLOC_LOG_RES(mp) ((mp)->m_resv.tr_qm_dqalloc.tr_logres)
-#define XFS_QM_SBCHANGE_LOG_RES(mp) ((mp)->m_resv.tr_qm_sbchange.tr_logres)
-#define XFS_QM_QUOTAOFF_LOG_RES(mp) ((mp)->m_resv.tr_qm_quotaoff.tr_logres)
-#define XFS_QM_QUOTAOFF_END_LOG_RES(mp) ((mp)->m_resv.tr_qm_equotaoff.tr_logres)
-#define XFS_CLEAR_AGI_BUCKET_LOG_RES(mp) ((mp)->m_resv.tr_clearagi.tr_logres)
-
/*
* Various log count values.
*/
--
1.7.10.4
* [PATCH 58/60] xfs: Refactor xfs_ticket_alloc() to extract a new helper
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (56 preceding siblings ...)
2013-06-19 4:51 ` [PATCH 57/60] xfs: Get rid of all XFS_XXX_LOG_RES() macro Dave Chinner
@ 2013-06-19 4:51 ` Dave Chinner
2013-06-19 4:51 ` [PATCH 59/60] xfs: Add xfs_log_rlimit.c Dave Chinner
` (3 subsequent siblings)
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:51 UTC (permalink / raw)
To: xfs
From: Jie Liu <jeff.liu@oracle.com>
Refactor xlog_ticket_alloc() to extract a new helper,
xfs_log_calc_unit_res().
This helper calculates the total log reservation size for a new
log ticket by adding the extra log operation and transaction headers
to the requested unit size.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
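As a rough illustration of the tail end of that calculation, the sketch below
models only the final steps that are visible in this diff: adding one iclog
header and two roundoff units. It is a standalone approximation, not the
helper itself. The function name, the DEMO_BBSIZE constant and the flattened
parameter list are assumptions; the real helper reads these values from the
mount and log structures and also accounts for the operation and transaction
headers beforehand.

```c
#include <stdbool.h>

#define DEMO_BBSIZE	512	/* stand-in for BBSIZE, the 512-byte basic block */

/*
 * Hypothetical stand-in for the last part of xfs_log_calc_unit_res().
 * Adds one iclog header, then two roundoff units: the log stripe unit
 * when a v2 log has one larger than 1, otherwise a basic block.
 */
static int
demo_add_header_and_roundoff(
	int	unit_bytes,
	int	iclog_hsize,
	bool	has_logv2,
	int	logsunit)
{
	unit_bytes += iclog_hsize;

	if (has_logv2 && logsunit > 1) {
		/* log su roundoff */
		unit_bytes += 2 * logsunit;
	} else {
		/* BB roundoff */
		unit_bytes += 2 * DEMO_BBSIZE;
	}
	return unit_bytes;
}
```

With a 1000 byte request and a 512 byte iclog header, this adds 1536 bytes
when no log stripe unit applies and 8704 bytes with a 4096 byte stripe unit,
so the stripe unit can dominate small reservations.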
fs/xfs/xfs_log.c | 60 ++++++++++++++++++++++++++++++-----------------
fs/xfs/xfs_log_format.h | 4 ++++
2 files changed, 42 insertions(+), 22 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index db08d34..e1f9c72 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3390,24 +3390,17 @@ xfs_log_ticket_get(
}
/*
- * Allocate and initialise a new log ticket.
+ * Figure out the total log space unit (in bytes) that would be
+ * required for a log ticket.
*/
-struct xlog_ticket *
-xlog_ticket_alloc(
- struct xlog *log,
- int unit_bytes,
- int cnt,
- char client,
- bool permanent,
- xfs_km_flags_t alloc_flags)
+int
+xfs_log_calc_unit_res(
+ struct xfs_mount *mp,
+ int unit_bytes)
{
- struct xlog_ticket *tic;
- uint num_headers;
- int iclog_space;
-
- tic = kmem_zone_zalloc(xfs_log_ticket_zone, alloc_flags);
- if (!tic)
- return NULL;
+ struct xlog *log = mp->m_log;
+ int iclog_space;
+ uint num_headers;
/*
* Permanent reservations have up to 'cnt'-1 active log operations
@@ -3482,20 +3475,43 @@ xlog_ticket_alloc(
unit_bytes += log->l_iclog_hsize;
/* for roundoff padding for transaction data and one for commit record */
- if (xfs_sb_version_haslogv2(&log->l_mp->m_sb) &&
- log->l_mp->m_sb.sb_logsunit > 1) {
+ if (xfs_sb_version_haslogv2(&mp->m_sb) && mp->m_sb.sb_logsunit > 1) {
/* log su roundoff */
- unit_bytes += 2*log->l_mp->m_sb.sb_logsunit;
+ unit_bytes += 2 * mp->m_sb.sb_logsunit;
} else {
/* BB roundoff */
- unit_bytes += 2*BBSIZE;
+ unit_bytes += 2 * BBSIZE;
}
+ return unit_bytes;
+}
+
+/*
+ * Allocate and initialise a new log ticket.
+ */
+struct xlog_ticket *
+xlog_ticket_alloc(
+ struct xlog *log,
+ int unit_bytes,
+ int cnt,
+ char client,
+ bool permanent,
+ xfs_km_flags_t alloc_flags)
+{
+ struct xlog_ticket *tic;
+ int unit_res;
+
+ tic = kmem_zone_zalloc(xfs_log_ticket_zone, alloc_flags);
+ if (!tic)
+ return NULL;
+
+ unit_res = xfs_log_calc_unit_res(log->l_mp, unit_bytes);
+
atomic_set(&tic->t_ref, 1);
tic->t_task = current;
INIT_LIST_HEAD(&tic->t_queue);
- tic->t_unit_res = unit_bytes;
- tic->t_curr_res = unit_bytes;
+ tic->t_unit_res = unit_res;
+ tic->t_curr_res = unit_res;
tic->t_cnt = cnt;
tic->t_ocnt = cnt;
tic->t_tid = prandom_u32();
diff --git a/fs/xfs/xfs_log_format.h b/fs/xfs/xfs_log_format.h
index 9f9aeb6..37a7ff9 100644
--- a/fs/xfs/xfs_log_format.h
+++ b/fs/xfs/xfs_log_format.h
@@ -18,6 +18,8 @@
#ifndef __XFS_LOG_FORMAT_H__
#define __XFS_LOG_FORMAT_H__
+struct xfs_mount;
+
typedef __uint32_t xlog_tid_t;
#define XLOG_MIN_ICLOGS 2
@@ -175,4 +177,6 @@ typedef struct xfs_log_iovec {
uint i_type; /* type of region */
} xfs_log_iovec_t;
+int xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes);
+
#endif /* __XFS_LOG_FORMAT_H__ */
--
1.7.10.4
* [PATCH 59/60] xfs: Add xfs_log_rlimit.c
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (57 preceding siblings ...)
2013-06-19 4:51 ` [PATCH 58/60] xfs: Refactor xfs_ticket_alloc() to extract a new helper Dave Chinner
@ 2013-06-19 4:51 ` Dave Chinner
2013-06-20 17:24 ` Michael L. Semon
` (2 more replies)
2013-06-19 4:51 ` [PATCH 60/60] xfs: Validate log space at mount time Dave Chinner
` (2 subsequent siblings)
61 siblings, 3 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:51 UTC (permalink / raw)
To: xfs
From: Jie Liu <jeff.liu@oracle.com>
Add the source file xfs_log_rlimit.c. The new file is used for log
size calculations and validation, and is shared with userspace.
[dchinner: xfs_log_calc_max_attrsetm_res() does not modify the
tr_attrsetm reservation, just calculates the maximum. ]
[dchinner: rework loop in xfs_log_get_max_trans_res() ]
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1 +
fs/xfs/xfs_log_format.h | 8 ++-
fs/xfs/xfs_log_rlimit.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 153 insertions(+), 1 deletion(-)
create mode 100644 fs/xfs/xfs_log_rlimit.c
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index c5c38a9..22691eb 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -81,6 +81,7 @@ xfs-y += xfs_alloc.o \
xfs_inode_fork.o \
xfs_inode_buf.o \
xfs_log_recover.o \
+ xfs_log_rlimit.o \
xfs_sb.o \
xfs_symlink_remote.o \
xfs_trans_resv.o
diff --git a/fs/xfs/xfs_log_format.h b/fs/xfs/xfs_log_format.h
index 37a7ff9..8f46b6a 100644
--- a/fs/xfs/xfs_log_format.h
+++ b/fs/xfs/xfs_log_format.h
@@ -1,5 +1,6 @@
/*
* Copyright (c) 2000-2003,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2013 Jie Liu.
* All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
@@ -19,6 +20,7 @@
#define __XFS_LOG_FORMAT_H__
struct xfs_mount;
+struct xfs_trans_res;
typedef __uint32_t xlog_tid_t;
@@ -41,6 +43,9 @@ typedef __uint32_t xlog_tid_t;
#define XLOG_HEADER_SIZE 512
+/* Minimum number of transactions that must fit in the log (defined by mkfs) */
+#define XFS_MIN_LOG_FACTOR 3
+
#define XLOG_REC_SHIFT(log) \
BTOBB(1 << (xfs_sb_version_haslogv2(&log->l_mp->m_sb) ? \
XLOG_MAX_RECORD_BSHIFT : XLOG_BIG_RECORD_BSHIFT))
@@ -125,7 +130,6 @@ typedef struct xlog_op_header {
__u16 oh_res2; /* 32 bit align : 2 b */
} xlog_op_header_t;
-
/* valid values for h_fmt */
#define XLOG_FMT_UNKNOWN 0
#define XLOG_FMT_LINUX_LE 1
@@ -178,5 +182,7 @@ typedef struct xfs_log_iovec {
} xfs_log_iovec_t;
int xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes);
+int xfs_log_calc_minimum_size(struct xfs_mount *);
+
#endif /* __XFS_LOG_FORMAT_H__ */
diff --git a/fs/xfs/xfs_log_rlimit.c b/fs/xfs/xfs_log_rlimit.c
new file mode 100644
index 0000000..e3f4b4e
--- /dev/null
+++ b/fs/xfs/xfs_log_rlimit.c
@@ -0,0 +1,145 @@
+/*
+ * Copyright (c) 2013 Jie Liu.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_log.h"
+#include "xfs_trans.h"
+#include "xfs_ag.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_trans_space.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_inode.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr_leaf.h"
+
+/*
+ * Calculate the maximum length in bytes that would be required for a local
+ * attribute value as large attributes out of line are not logged.
+ */
+STATIC int
+xfs_log_calc_max_attrsetm_res(
+ struct xfs_mount *mp)
+{
+ int size;
+ int nblks;
+
+ size = xfs_attr_leaf_entsize_local_max(mp->m_sb.sb_blocksize) -
+ MAXNAMELEN - 1;
+ nblks = XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK);
+ nblks += XFS_B_TO_FSB(mp, size);
+ nblks += XFS_NEXTENTADD_SPACE_RES(mp, size, XFS_ATTR_FORK);
+
+ return M_RES(mp)->tr_attrsetm.tr_logres +
+ M_RES(mp)->tr_attrsetrt.tr_logres * nblks;
+}
+
+/*
+ * Iterate over the log space reservation table to figure out and return
+ * the maximum one in terms of the pre-calculated values which were done
+ * at mount time.
+ */
+STATIC void
+xfs_log_get_max_trans_res(
+ struct xfs_mount *mp,
+ struct xfs_trans_res *max_resp)
+{
+ struct xfs_trans_res *resp;
+ struct xfs_trans_res *end_resp;
+ int log_space = 0;
+ int attr_space;
+
+ attr_space = xfs_log_calc_max_attrsetm_res(mp);
+
+ resp = (struct xfs_trans_res *)M_RES(mp);
+ end_resp = (struct xfs_trans_res *)(M_RES(mp) + 1);
+ for (; resp < end_resp; resp++) {
+ int tmp = resp->tr_logcount > 1 ?
+ resp->tr_logres * resp->tr_logcount :
+ resp->tr_logres;
+ if (log_space < tmp) {
+ log_space = tmp;
+ *max_resp = *resp; /* struct copy */
+ }
+ }
+
+ if (attr_space > log_space) {
+ *max_resp = M_RES(mp)->tr_attrsetm; /* struct copy */
+ max_resp->tr_logres = attr_space;
+ }
+}
+
+/*
+ * Calculate the minimum valid log size for the given superblock configuration.
+ * Used to calculate the minimum log size at mkfs time, and to determine if
+ * the log is large enough or not at mount time. Returns the minimum size in
+ * filesystem block size units.
+ */
+int
+xfs_log_calc_minimum_size(
+ struct xfs_mount *mp)
+{
+ struct xfs_trans_res tres = {0};
+ int max_logres;
+ int min_logblks = 0;
+ int lsunit = 0;
+
+ xfs_log_get_max_trans_res(mp, &tres);
+
+ max_logres = xfs_log_calc_unit_res(mp, tres.tr_logres);
+ if (tres.tr_logcount > 1)
+ max_logres *= tres.tr_logcount;
+
+ if (xfs_sb_version_haslogv2(&mp->m_sb) && mp->m_sb.sb_logsunit > 1)
+ lsunit = BTOBB(mp->m_sb.sb_logsunit);
+
+ /*
+ * Two factors should be taken into account for calculating the minimum
+ * log space.
+ * 1) The fundamental limitation is that no single transaction can be
+ * larger than half size of the log.
+ *
+ * From mkfs.xfs, this is considered by the XFS_MIN_LOG_FACTOR
+ * define, which is set to 3. That means we can definitely fit
+ * maximally sized 2 transactions in the log. We'll use this same
+ * value here.
+ *
+ * 2) If the lsunit option is specified, a transaction requires 2 LSU
+ * for the reservation because there are two log writes that can
+ * require padding - the transaction data and the commit record which
+ * are written separately and both can require padding to the LSU.
+ * Consider that we can have an active CIL reservation holding 2*LSU,
+ * but the CIL is not over a push threshold, in this case, if we
+ * don't have enough log space for at one new transaction, which
+ * includes another 2*LSU in the reservation, we will run into dead
+ * loop situation in log space grant procedure. i.e.
+ * xlog_grant_head_wait().
+ *
+ * Hence the log size needs to be able to contain two maximally sized
+ * and padded transactions, which is (2 * (2 * LSU + maxlres)).
+ *
+ * Also, the log size should be a multiple of the log stripe unit, round
+ * it up to lsunit boundary if lsunit is specified.
+ */
+ if (lsunit)
+ min_logblks = roundup(BTOBB(max_logres), lsunit) + 2 * lsunit;
+ else
+ min_logblks = BTOBB(max_logres);
+ min_logblks *= XFS_MIN_LOG_FACTOR;
+ return XFS_BB_TO_FSB(mp, min_logblks);
+}
--
1.7.10.4
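For readers following the arithmetic in xfs_log_calc_minimum_size(), the final step of the patch can be sketched as a standalone userspace function. This is only an illustration under stated assumptions: the names sketch_min_log_fsblocks(), bytes_to_bb() and round_up_to() are hypothetical, the log v2 feature check is folded into "lsunit_bytes > 1", and the last conversion is assumed to round up to whole filesystem blocks in the way XFS_BB_TO_FSB() appears to.

```c
#include <assert.h>
#include <stdint.h>

#define BBSHIFT        9               /* basic block = 512 bytes */
#define BBSIZE         (1 << BBSHIFT)
#define MIN_LOG_FACTOR 3               /* mirrors XFS_MIN_LOG_FACTOR in the patch */

/* Bytes to 512-byte basic blocks, rounding up (same idea as BTOBB). */
static int64_t bytes_to_bb(int64_t bytes)
{
	return (bytes + BBSIZE - 1) >> BBSHIFT;
}

/* Round x up to the next multiple of y; y must be positive. */
static int64_t round_up_to(int64_t x, int64_t y)
{
	return ((x + y - 1) / y) * y;
}

/*
 * Hypothetical helper, not the kernel function.
 *   max_logres_bytes: largest reservation, already padded and multiplied
 *                     by its log count
 *   lsunit_bytes:     log stripe unit in bytes (0 or 1 means "none")
 *   blocksize:        filesystem block size in bytes
 * Returns the minimum log size in filesystem blocks.
 */
static int64_t sketch_min_log_fsblocks(int64_t max_logres_bytes,
				       int64_t lsunit_bytes,
				       int64_t blocksize)
{
	int64_t lsunit_bb = 0;
	int64_t min_bb;

	assert(max_logres_bytes >= 0 && blocksize >= BBSIZE);

	if (lsunit_bytes > 1)
		lsunit_bb = bytes_to_bb(lsunit_bytes);

	min_bb = bytes_to_bb(max_logres_bytes);
	if (lsunit_bb)
		/* align to the stripe unit, then add 2 * LSU of padding */
		min_bb = round_up_to(min_bb, lsunit_bb) + 2 * lsunit_bb;
	min_bb *= MIN_LOG_FACTOR;

	/* basic blocks to filesystem blocks, rounding up */
	return round_up_to(min_bb * BBSIZE, blocksize) / blocksize;
}
```

With an assumed 100000-byte maximum reservation on 4096-byte blocks, the sketch gives 74 filesystem blocks without a stripe unit and 144 with a 32KiB stripe unit, so on small logs the stripe unit padding can dominate the result.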
* Re: [PATCH 59/60] xfs: Add xfs_log_rlimit.c
2013-06-19 4:51 ` [PATCH 59/60] xfs: Add xfs_log_rlimit.c Dave Chinner
@ 2013-06-20 17:24 ` Michael L. Semon
2013-06-21 6:10 ` Michael L. Semon
2013-06-24 21:26 ` Mark Tinguely
2 siblings, 0 replies; 95+ messages in thread
From: Michael L. Semon @ 2013-06-20 17:24 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
OK, my overall patch for this patch series is after my closing and may
be wrong. I applied Dave's patchset over a yet-newer git kernel and
may have gotten burned a little bit. It may not even be helpful, as
xfs/297 gets a new assert on a non-CRC 512b-block filesystem, as
opposed to grinding to a halt. [xfs/297 still passes for filesystems
that have a 4k block size.]
The point of this reply is to call on Jeff Liu and ask this: Do you
have a more recent and accurate fix to this code than what I tried to
remember? Or did the fix not work on platforms other than 32-bit x86?
Thanks!
Michael
From f2a4759c5ed2011daed14b3b44f5eee2e317dbf6 Mon Sep 17 00:00:00 2001
From: "Michael L. Semon" <mlsemon35@gmail.com>
Date: Thu, 20 Jun 2013 03:40:30 -0400
Subject: [PATCH] Residual fixes for 2013-06-19 XFS 3.11 patchset
This patch represents fixing a merge conflict between the functions
xfs_inactive_symlink() and xfs_inactive_symlink_rmt() from a patch
that `git am` did not take cleanly. Sysadmin hack. Please check.
This patch also uses an int cast to make the build not fail
with an undefined reference to _udivdi3. This issue is somewhat
documented near the roundup() function in include/linux/kernel.h
as applying to gcc-3.3, but it also seems to apply to gcc-4.8.1.
It is a Jeff Liu fix that might have gotten lost in the mix.
Reconstructed from memory.
Signed-off-by: Michael L. Semon <mlsemon35@gmail.com>
---
fs/xfs/xfs_inode_ops.c | 2 +-
fs/xfs/xfs_log_rlimit.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_inode_ops.c b/fs/xfs/xfs_inode_ops.c
index c551a8a..6cfab9f 100644
--- a/fs/xfs/xfs_inode_ops.c
+++ b/fs/xfs/xfs_inode_ops.c
@@ -1496,7 +1496,7 @@ xfs_inactive(
* Zero length symlinks _can_ exist.
*/
if (ip->i_d.di_size > XFS_IFORK_DSIZE(ip)) {
- error = xfs_inactive_symlink_rmt(ip, &tp);
+ error = xfs_inactive_symlink(ip, &tp);
if (error)
goto out_cancel;
} else if (ip->i_df.if_bytes > 0) {
diff --git a/fs/xfs/xfs_log_rlimit.c b/fs/xfs/xfs_log_rlimit.c
index e3f4b4e..003ea89 100644
--- a/fs/xfs/xfs_log_rlimit.c
+++ b/fs/xfs/xfs_log_rlimit.c
@@ -137,7 +137,7 @@ xfs_log_calc_minimum_size(
* it up to lsunit boundary if lsunit is specified.
*/
if (lsunit)
- min_logblks = roundup(BTOBB(max_logres), lsunit) + 2 * lsunit;
+ min_logblks = roundup((int)BTOBB(max_logres), lsunit) + 2 * lsunit;
else
min_logblks = BTOBB(max_logres);
min_logblks *= XFS_MIN_LOG_FACTOR;
--
1.8.3
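The effect of the int cast in the second hunk can be sketched in userspace. The macros below are simplified stand-ins, not the kernel definitions: BTOBB is assumed to cast its argument to a 64-bit type, and ROUNDUP is a reduced form of roundup() from include/linux/kernel.h. Under those assumptions the uncast expression performs a 64-bit division, which on 32-bit x86 the compiler may lower to a libgcc helper that the kernel does not link against, while the cast version is a plain int division. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BBSHIFT 9
#define BBSIZE  (1 << BBSHIFT)

/* Assumed shape of BTOBB: the cast makes the whole expression 64-bit. */
#define BTOBB(bytes)  (((uint64_t)(bytes) + BBSIZE - 1) >> BBSHIFT)

/* Simplified roundup: the division is done in the wider operand type. */
#define ROUNDUP(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* As in the original hunk: a 64-bit divide, since BTOBB() is 64-bit. */
static int sketch_min_blocks_wide(int max_logres, int lsunit)
{
	assert(max_logres >= 0 && lsunit > 0);
	return (int)(ROUNDUP(BTOBB(max_logres), lsunit) + 2 * lsunit);
}

/* As in the fixed hunk: the cast narrows to int before dividing. */
static int sketch_min_blocks_narrow(int max_logres, int lsunit)
{
	assert(max_logres >= 0 && lsunit > 0);
	return ROUNDUP((int)BTOBB(max_logres), lsunit) + 2 * lsunit;
}
```

Both functions return the same value for reservation sizes that fit in an int; only the width of the division differs, which is what the cast is meant to change.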
root@plbearer:/var/lib/xfstests# MKFS_OPTIONS='-b log=9' ./check -xfs xfs/297
FSTYP -- xfs (debug)
PLATFORM -- Linux/i686 plbearer 3.10.0-rc6+
MKFS_OPTIONS -- -f -b log=9 /dev/sdb6
MOUNT_OPTIONS -- /dev/sdb6 /mnt/xfstests-scratch
xfs/297 216s ...[ 2262.188646] XFS: Assertion failed: BTOBB(need_bytes) < log->l_logBBsize, file: fs/xfs/xfs_log.c, line: 1498
[ 2262.189898] ------------[ cut here ]------------
[ 2262.190017] kernel BUG at fs/xfs/xfs_message.c:108!
[ 2262.190017] invalid opcode: 0000 [#1]
[ 2262.190017] CPU: 0 PID: 11388 Comm: mkdir Not tainted 3.10.0-rc6+ #5
[ 2262.190017] Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002
[ 2262.190017] task: eebe8000 ti: d91fa000 task.ti: d91fa000
[ 2262.190017] EIP: 0060:[<c11b3454>] EFLAGS: 00010282 CPU: 0
[ 2262.190017] EIP is at assfail+0x2b/0x2d
[ 2262.190017] EAX: 0000005f EBX: d961e600 ECX: 000002e1 EDX: eebe84a0
[ 2262.190017] ESI: 00000dcb EDI: 00000000 EBP: d91fbe0c ESP: d91fbdf8
[ 2262.190017] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[ 2262.190017] CR0: 8005003b CR2: b768a2f0 CR3: 2a4de000 CR4: 000007d0
[ 2262.190017] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 2262.190017] DR6: ffff0ff0 DR7: 00000400
[ 2262.190017] Stack:
[ 2262.190017] 00000000 c16f3f10 c16fbcd4 c16e9391 000005da d91fbe28 c12006bd 00000069
[ 2262.190017] 000111b8 00000069 00000003 000111b8 d91fbe60 c12022bc 00000069 00000001
[ 2262.190017] 00000009 eaba0bb0 d91fbe60 c11b496d 00000001 d961e600 00000000 df116e88
[ 2262.190017] Call Trace:
[ 2262.190017] [<c12006bd>] xlog_grant_push_ail+0x4e/0xf2
[ 2262.190017] [<c12022bc>] xfs_log_reserve+0xc8/0x290
[ 2262.190017] [<c11b496d>] ? xfs_mod_incore_sb+0x46/0x4f
[ 2262.190017] [<c11b9741>] xfs_trans_reserve+0x295/0x2a3
[ 2262.190017] [<c11a8bcf>] xfs_create+0x151/0x57f
[ 2262.190017] [<c10e5788>] ? kern_path_create+0x8b/0x118
[ 2262.190017] [<c11b1146>] xfs_vn_mknod+0x94/0x15f
[ 2262.190017] [<c11b122d>] ? xfs_vn_create+0x1c/0x1c
[ 2262.190017] [<c11b124a>] xfs_vn_mkdir+0x1d/0x1f
[ 2262.190017] [<c10e5be4>] vfs_mkdir+0x75/0x10c
[ 2262.190017] [<c10e2618>] ? putname+0x23/0x32
[ 2262.190017] [<c10e5cd6>] SyS_mkdirat+0x5b/0xab
[ 2262.190017] [<c10e5d4c>] SyS_mkdir+0x26/0x28
[ 2262.190017] [<c156f710>] syscall_call+0x7/0xb
[ 2262.190017] Code: 55 89 e5 83 ec 14 3e 8d 74 26 00 89 4c 24 10 89 54 24 0c 89 44 24 08 c7 44 24 04 10 3f 6f c1 c7 04 24 00 00 00 00 e8 ad fd ff ff <0f> 0b 55 89 e5 83 ec 14 3e 8d 74 26 00 c7 44 24 10 01 00 00 00
[ 2262.190017] EIP: [<c11b3454>] assfail+0x2b/0x2d SS:ESP 0068:d91fbdf8
[ 2262.236310] ---[ end trace fb9dc01903f0ec11 ]---
* Re: [PATCH 59/60] xfs: Add xfs_log_rlimit.c
2013-06-19 4:51 ` [PATCH 59/60] xfs: Add xfs_log_rlimit.c Dave Chinner
2013-06-20 17:24 ` Michael L. Semon
@ 2013-06-21 6:10 ` Michael L. Semon
2013-06-24 21:26 ` Mark Tinguely
2 siblings, 0 replies; 95+ messages in thread
From: Michael L. Semon @ 2013-06-21 6:10 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
I'm still looking for Jeff's input on this particular patch. However, I
did want to report success on xfs/297. The Pentium 4 test PC was left to
rest for a few hours, then it was started again. The latest xfsprogs
patchset was applied to xfsprogs.
512b blocks, internal logdev, no CRC, projid32bit=1, tracers on: PASS
512b blocks, internal logdev, no CRC, projid32bit=0, tracers on: PASS
512b blocks, internal logdev, no CRC, projid32bit=0, tracers off: PASS
1k blocks, internal logdev, no CRC, tracers off: PASS
...rebooted to avoid RT throttling slowing things down...
1k blocks, internal logdev, CRC, tracers off: PASS
2k blocks, internal logdev, CRC, tracers off: PASS
2k blocks, external logdev, CRC, rtdev, tracers off: PASS
4k blocks, default mkfs.xfs: PASS
Something was fixed between the new xfsprogs and the PC's chance to reset
100%. What? I really don't know! But there is success.
Great job, Dave and Jeff!
Michael
* Re: [PATCH 59/60] xfs: Add xfs_log_rlimit.c
2013-06-19 4:51 ` [PATCH 59/60] xfs: Add xfs_log_rlimit.c Dave Chinner
2013-06-20 17:24 ` Michael L. Semon
2013-06-21 6:10 ` Michael L. Semon
@ 2013-06-24 21:26 ` Mark Tinguely
2013-06-24 22:27 ` Dave Chinner
2 siblings, 1 reply; 95+ messages in thread
From: Mark Tinguely @ 2013-06-24 21:26 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/18/13 23:51, Dave Chinner wrote:
> + * 2) If the lsunit option is specified, a transaction requires 2 LSU
> + * for the reservation because there are two log writes that can
> + * require padding - the transaction data and the commit record which
> + * are written separately and both can require padding to the LSU.
> + * Consider that we can have an active CIL reservation holding 2*LSU,
> + * but the CIL is not over a push threshold, in this case, if we
> + * don't have enough log space for at one new transaction, which
> + * includes another 2*LSU in the reservation, we will run into dead
> + * loop situation in log space grant procedure. i.e.
> + * xlog_grant_head_wait().
> + *
> + * Hence the log size needs to be able to contain two maximally sized
> + * and padded transactions, which is (2 * (2 * LSU + maxlres)).
> + *
Any thoughts on how we can separate the 2 * log stripe unit from the
reservation?
The added extended attribute calls for parent inode pointers (especially
xfs_rename(), where it could add up to one and remove up to two
attributes) are causing a huge multiplication count for the reservation.
Those multiplications would be killers on 256KiB log stripe units.
--Mark.
* Re: [PATCH 59/60] xfs: Add xfs_log_rlimit.c
2013-06-24 21:26 ` Mark Tinguely
@ 2013-06-24 22:27 ` Dave Chinner
2013-06-25 14:06 ` Mark Tinguely
0 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-24 22:27 UTC (permalink / raw)
To: Mark Tinguely; +Cc: xfs
On Mon, Jun 24, 2013 at 04:26:28PM -0500, Mark Tinguely wrote:
> On 06/18/13 23:51, Dave Chinner wrote:
> >+ * 2) If the lsunit option is specified, a transaction requires 2 LSU
> >+ * for the reservation because there are two log writes that can
> >+ * require padding - the transaction data and the commit record which
> >+ * are written separately and both can require padding to the LSU.
> >+ * Consider that we can have an active CIL reservation holding 2*LSU,
> >+ * but the CIL is not over a push threshold, in this case, if we
> >+ * don't have enough log space for at one new transaction, which
> >+ * includes another 2*LSU in the reservation, we will run into dead
> >+ * loop situation in log space grant procedure. i.e.
> >+ * xlog_grant_head_wait().
> >+ *
> >+ * Hence the log size needs to be able to contain two maximally sized
> >+ * and padded transactions, which is (2 * (2 * LSU + maxlres)).
> >+ *
>
> Any thoughts on how we can separate the 2 * log stripe unit from the
> reservation.
You can't. The reservation, by definition, is the worst case log
space usage for the transaction. Therefore, it has to take into
account the LSU padding that may be necessary if this transaction is
committed by itself to the log (e.g. a wsync operation).
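As a rough illustration of why the padding scales with the log count, here is a userspace sketch. It assumes each reservation unit carries 2 * LSU of roundoff (or 2 basic blocks when there is no stripe unit) and that the whole padded unit is multiplied by the log count, as xfs_log_calc_minimum_size() does above. It ignores the op header and iclog header overhead that the real unit calculation also adds, and the function names and input sizes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BBSIZE 512   /* basic block size in bytes */

/* Roundoff padding for one unit: transaction data plus commit record. */
static int64_t sketch_unit_padding(int64_t lsunit_bytes)
{
	return lsunit_bytes > 1 ? 2 * lsunit_bytes : 2 * BBSIZE;
}

/*
 * Total log space for a reservation of base_bytes with the given log
 * count. Hypothetical helper for illustration only.
 */
static int64_t sketch_total_reservation(int64_t base_bytes,
					int64_t lsunit_bytes,
					int logcount)
{
	assert(base_bytes >= 0 && logcount >= 1);
	return (base_bytes + sketch_unit_padding(lsunit_bytes)) * logcount;
}
```

For an assumed 100000-byte base reservation with a log count of 3, the sketch gives 303072 bytes with no stripe unit and 1872864 bytes with a 256KiB stripe unit. Those inputs are made up for the example and are not measurements from any real transaction type.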
> The added extended attribute calls for parent inode pointers
> (especially xfs_rename() where it could add up to one and remove up
> to two attributes) is causing a huge multiplication cnt for
> reservation.
So you are adding attribute reservations to
create/mkdir/rename/link transaction reservations? That doesn't seem
like a good idea, and it's contrary to the direction we want to head
with the transaction subsystem. Can you post your patches so we
have some context for the question you are asking?
FWIW, the intended method of linking multiple independent
transactions together for parent pointers into an atomic commit is
this:
http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Atomic_Multi-Transaction_Operations
While the text is somewhat out of date (talks about rolling
transactions and being able to replace them), it predated the
delayed logging implementation and hence doesn't take into account
the obvious, simple extension that can be made to the CIL to
implement this. i.e. being able to hold the current CIL context
open for a number of transactions to take place before releasing it
again.
The point of doing this is still relevant, however:
"This may also allow us to split some of the large compound
transactions into smaller, more self contained transactions. This
would reduce reservation pressure on log space in the common case
where all the corner cases in the transactions are not taken."
IOWs, rather than building new, large "aggregation" transactions, we
should be adding infrastructure to the CIL to allow atomic
transaction aggregation techniques to be used and then just using
the existing transaction infrastructure to implement operations such
as "create with ACL"....
FYI, we need this for all the operations that add attributes at
create time, such as default ACLs and DMAPI/DMF attributes....
> Those multiplications would be killers on 256KiB log
> stripe units.
Numbers, please?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: [PATCH 59/60] xfs: Add xfs_log_rlimit.c
2013-06-24 22:27 ` Dave Chinner
@ 2013-06-25 14:06 ` Mark Tinguely
2013-06-26 4:05 ` Dave Chinner
0 siblings, 1 reply; 95+ messages in thread
From: Mark Tinguely @ 2013-06-25 14:06 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/24/13 17:27, Dave Chinner wrote:
> On Mon, Jun 24, 2013 at 04:26:28PM -0500, Mark Tinguely wrote:
>> On 06/18/13 23:51, Dave Chinner wrote:
>>> + * 2) If the lsunit option is specified, a transaction requires 2 LSU
>>> + * for the reservation because there are two log writes that can
>>> + * require padding - the transaction data and the commit record which
>>> + * are written separately and both can require padding to the LSU.
>>> + * Consider that we can have an active CIL reservation holding 2*LSU,
>>> + * but the CIL is not over a push threshold, in this case, if we
>>> + * don't have enough log space for at one new transaction, which
>>> + * includes another 2*LSU in the reservation, we will run into dead
>>> + * loop situation in log space grant procedure. i.e.
>>> + * xlog_grant_head_wait().
>>> + *
>>> + * Hence the log size needs to be able to contain two maximally sized
>>> + * and padded transactions, which is (2 * (2 * LSU + maxlres)).
>>> + *
>>
>> Any thoughts on how we can separate the 2 * log stripe unit from the
>> reservation.
>
> You can't. The reservation, by definition, is the worse case log
> space usage for the transaction. Therefore, it has to take into
> account the LSU padding that may be necessary if this transaction is
> committed by itself to the log (e.g. wsync operation)
I am thinking we should separate the extra 2*LSU allocated space from
the individual pieces of the transaction (caused by xfs_trans_rolls and
xfs_bmap_finish) and associate it with the transaction. Sync writes are
for the transaction anyway and not the pieces. It is the multiplication
of the (2*LSU) by each piece of the transaction that is the killer.
A mkdir will have 1.5MB of log reserved space just for the possible LSU
padding.
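The 1.5MB figure is consistent with multiplying the mkdir log count by 2*LSU, assuming the 256KiB log stripe unit mentioned earlier in the thread. A small sketch of that arithmetic follows; lsu_padding_bytes() is a name invented for this example, not an XFS function.

```c
#include <assert.h>
#include <stdint.h>

#define XFS_MKDIR_LOG_COUNT 3 /* value from the log count list in this thread */

/*
 * Illustrative helper: bytes reserved purely for possible LSU padding
 * when every piece of a permanent transaction carries its own 2*LSU.
 */
uint64_t lsu_padding_bytes(uint32_t log_count, uint64_t lsu_bytes)
{
	assert(log_count > 0);
	return (uint64_t)log_count * 2 * lsu_bytes;
}
```

For mkdir with a 256KiB LSU this is 3 * 2 * 262144 = 1572864 bytes, i.e. 1.5MiB.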
The CIL can steal the leftover current ticket space and the global LSU
space.
>
>> The added extended attribute calls for parent inode pointers
>> (especially xfs_rename() where it could add up to one and remove up
>> to two attributes) is causing a huge multiplication cnt for
>> reservation.
>
> So you are adding a attribute reservations to
> create/mkdir/rename/link transaction reservations? That doesn't seem
> like a good idea, and it's contrary to the direction we want to head
> with the transaction subsystem. Can you post your patches so we
> some some context to the question you are asking?
>
> FWIW, the intended method of linking multiple independent
> transactions together for parent pointers into an atomic commit is
> this:
>
> http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Atomic_Multi-Transaction_Operations
>
> While the text is somewhat out of date (talks about rolling
> transactions and being able to replace them), it predated the
> delayed logging implementation and hence doesn't take into account
> the obvious, simple extension that can be made to the CIL to
> implement this. i.e. being able to hold the current CIL context
> open for a number of transactions to take place before releasing it
> again.
>
> The point of doing this is still relevant, however:
>
> "This may also allow us to split some of the large compound
> transactions into smaller, more self contained transactions. This
> would reduce reservation pressure on log space in the common case
> where all the corner cases in the transactions are not taken."
>
> IOWs, rather than building new, large "aggregation" transactions, we
> should be adding infrastructure to the CIL to allow atomic
> transaction aggregation techniques to be used and then just using
> the existing transaction infrastructure to implement operations such
> as "create with ACL"....
I will have to think on this. I don't see how the allocation of log
space for a new transaction while holding locks is a good thing.
>
> FYI, we need this for all the operations that add attributes at
> create time, such as default ACLs and DMAPI/DMF attributes....
>
>> Those multiplications would be killers on 256KiB log
>> stripe units.
>
> Numbers, please?
>
> Cheers,
>
> Dave.
/*
* Various log count values.
*/
#define XFS_DEFAULT_LOG_COUNT 1
#define XFS_DEFAULT_PERM_LOG_COUNT 2
#define XFS_ITRUNCATE_LOG_COUNT 2
#define XFS_INACTIVE_LOG_COUNT 2
#define XFS_CREATE_LOG_COUNT 2
#define XFS_MKDIR_LOG_COUNT 3
#define XFS_SYMLINK_LOG_COUNT 3
#define XFS_REMOVE_LOG_COUNT 2
#define XFS_LINK_LOG_COUNT 2
#define XFS_RENAME_LOG_COUNT 2
#define XFS_WRITE_LOG_COUNT 2
#define XFS_ADDAFORK_LOG_COUNT 2
#define XFS_ATTRINVAL_LOG_COUNT 1
#define XFS_ATTRSET_LOG_COUNT 3
#define XFS_ATTRRM_LOG_COUNT 3
Even if you are creative, the multipliers add up quickly.
xfs_rename will do the directory ops, possibly an attrset and up to 2
attrrm, and we have to hold the src and target inode locks over all the
operations.
Thanks for the ideas.
--Mark.
* Re: [PATCH 59/60] xfs: Add xfs_log_rlimit.c
2013-06-25 14:06 ` Mark Tinguely
@ 2013-06-26 4:05 ` Dave Chinner
2013-06-26 13:48 ` Mark Tinguely
0 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-26 4:05 UTC (permalink / raw)
To: Mark Tinguely; +Cc: xfs
On Tue, Jun 25, 2013 at 09:06:39AM -0500, Mark Tinguely wrote:
> On 06/24/13 17:27, Dave Chinner wrote:
> >On Mon, Jun 24, 2013 at 04:26:28PM -0500, Mark Tinguely wrote:
> >>On 06/18/13 23:51, Dave Chinner wrote:
> >>>+ * 2) If the lsunit option is specified, a transaction requires 2 LSU
> >>>+ * for the reservation because there are two log writes that can
> >>>+ * require padding - the transaction data and the commit record which
> >>>+ * are written separately and both can require padding to the LSU.
> >>>+ * Consider that we can have an active CIL reservation holding 2*LSU,
> >>>+ * but the CIL is not over a push threshold, in this case, if we
> >>>+ * don't have enough log space for at one new transaction, which
> >>>+ * includes another 2*LSU in the reservation, we will run into dead
> >>>+ * loop situation in log space grant procedure. i.e.
> >>>+ * xlog_grant_head_wait().
> >>>+ *
> >>>+ * Hence the log size needs to be able to contain two maximally sized
> >>>+ * and padded transactions, which is (2 * (2 * LSU + maxlres)).
> >>>+ *
> >>
> >>Any thoughts on how we can separate the 2 * log stripe unit from the
> >>reservation.
> >
> >You can't. The reservation, by definition, is the worse case log
> >space usage for the transaction. Therefore, it has to take into
> >account the LSU padding that may be necessary if this transaction is
> >committed by itself to the log (e.g. wsync operation)
>
> I am thinking we should separate the extra 2*LSU allocated space
> from the individual pieces of the transaction
Why? How are you going to prevent log space deadlocks if we don't
reserve that space for each individual transaction in a permanent
transaction reservation?
> (caused by
> xfs_trans_rolls and xfs_bmap_finish) and associate it with the
> transaction. Sync writes are for the transaction anyway and not the
> pieces.
Every transaction commit that is executed can require LSU padding
because it could be the transaction the CIL steals the required LSU
reservation from. When CIL flushes occur - and hence when the LSU
padding needs to be stolen from a transaction commit - is not under
the control of the currently executing transaction.
> It is the multiplication of the (2*LSU) by each piece of the
> transaction that is the killer.
> A mkdir will have 1.5MB of log reserved space just for the possible
> LSU padding.
A killer for what, exactly? This is the current status quo, and I
don't see people having performance problems related to it....
> The cil can steal the left over current ticket space and the global
> LSU space.
The space is available only if the current transaction _unit
reservation_ has it reserved. Hence every unit reservation needs
that space to be reserved.
I'm not saying this is optimal - it is an architectural requirement
that the log has always had and therefore *necessary*.
> >>The added extended attribute calls for parent inode pointers
> >>(especially xfs_rename() where it could add up to one and remove up
> >>to two attributes) is causing a huge multiplication cnt for
> >>reservation.
> >
> >So you are adding a attribute reservations to
> >create/mkdir/rename/link transaction reservations? That doesn't seem
> >like a good idea, and it's contrary to the direction we want to head
> >with the transaction subsystem. Can you post your patches so we
> >some some context to the question you are asking?
> >
> >FWIW, the intended method of linking multiple independent
> >transactions together for parent pointers into an atomic commit is
> >this:
> >
> >http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Atomic_Multi-Transaction_Operations
> >
> >While the text is somewhat out of date (talks about rolling
> >transactions and being able to replace them), it predated the
> >delayed logging implementation and hence doesn't take into account
> >the obvious, simple extension that can be made to the CIL to
> >implement this. i.e. being able to hold the current CIL context
> >open for a number of transactions to take place before releasing it
> >again.
> >
> >The point of doing this is still relevant, however:
> >
> >"This may also allow us to split some of the large compound
> >transactions into smaller, more self contained transactions. This
> >would reduce reservation pressure on log space in the common case
> >where all the corner cases in the transactions are not taken."
> >
> >IOWs, rather than building new, large "aggregation" transactions, we
> >should be adding infrastructure to the CIL to allow atomic
> >transaction aggregation techniques to be used and then just using
> >the existing transaction infrastructure to implement operations such
> >as "create with ACL"....
>
> I will have to think on this. I don't see how the allocation of log
> space for a new transaction while holding locks is a good thing.
Who said anything about requiring new locks? :)
The initial implementation I was thinking of is basically a
reference counter and a method for delaying CIL pushes until the
reference count drops to zero.
Indeed, if we push the aggregated space reservations up into this
aggregated transaction, the individual transactions can just pull
out of the space we already know is available to use. IOWs, it's
relatively trivial to do in a way that leaves open many avenues for
later optimisation....
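A minimal single-threaded sketch of the "reference counter plus delayed push" idea, to make the mechanism concrete. Everything here is hypothetical: struct cil_hold and its functions do not exist in XFS, the counter pushes_done merely stands in for a real CIL push, and a real implementation would need proper locking or atomics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical structure; none of these names exist in the XFS source. */
struct cil_hold {
	unsigned int refcount;     /* aggregated operations still open */
	bool         push_pending; /* a push was requested while held */
	unsigned int pushes_done;  /* stand-in for actually pushing the CIL */
};

/* Start an aggregated operation: keep the current context open. */
void cil_hold_get(struct cil_hold *h)
{
	h->refcount++;
}

/* Request a push: run it now if nothing holds the context, else defer. */
void cil_hold_request_push(struct cil_hold *h)
{
	if (h->refcount > 0) {
		h->push_pending = true;
		return;
	}
	h->pushes_done++;
}

/* Finish an aggregated operation; the last put runs any deferred push. */
void cil_hold_put(struct cil_hold *h)
{
	assert(h->refcount > 0);
	if (--h->refcount == 0 && h->push_pending) {
		h->push_pending = false;
		h->pushes_done++;
	}
}
```

The point of the sketch is only that all transactions committed between the first get and the last put land in the same context, so they reach the log together.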
> >FYI, we need this for all the operations that add attributes at
> >create time, such as default ACLs and DMAPI/DMF attributes....
> >
> >>Those multiplications would be killers on 256KiB log
> >>stripe units.
> >
> >Numbers, please?
> >
> >Cheers,
> >
> >Dave.
>
> /*
> * Various log count values.
> */
> #define XFS_DEFAULT_LOG_COUNT 1
> #define XFS_DEFAULT_PERM_LOG_COUNT 2
> #define XFS_ITRUNCATE_LOG_COUNT 2
> #define XFS_INACTIVE_LOG_COUNT 2
> #define XFS_CREATE_LOG_COUNT 2
> #define XFS_MKDIR_LOG_COUNT 3
> #define XFS_SYMLINK_LOG_COUNT 3
> #define XFS_REMOVE_LOG_COUNT 2
> #define XFS_LINK_LOG_COUNT 2
> #define XFS_RENAME_LOG_COUNT 2
> #define XFS_WRITE_LOG_COUNT 2
> #define XFS_ADDAFORK_LOG_COUNT 2
> #define XFS_ATTRINVAL_LOG_COUNT 1
> #define XFS_ATTRSET_LOG_COUNT 3
> #define XFS_ATTRRM_LOG_COUNT 3
>
> Even if you are creative, the multipliers add up quickly.
> xfs_rename will do the directory ops. possibly a attibset and up to
> 2 attrrm and we have to hold the src and target inodes locks over
> all the operations.
We don't have to hold them locked across the entire operation
because the VFS is holding them locked so nothing else can do a
namespace operation while we are modifying them.
Anyway, you're worried about multiplying out the existing
reservations, but that's really not that big of a deal - it's just
log space, and we've got lots of that to use, and we have plenty of
avenues to optimise usage in future.
e.g. mkdir is a compound transaction that is really 2 different
operations. The first is physical inode allocation, the second is
allocating a free inode. The first only happens for 1 in every 64
free inode allocations (rarer if we are also removing inodes, too).
So most of the time we are reserving far more space in the log than
we actually need to allocate an inode.
The actual unit reservation for the create is the maximum of the two
individual component reservations times the number of commits that
will be required, and the total is the unit reservation times the
number of commits needed. So for a create transaction, it's an
overestimation of the worst case.
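One reading of the paragraph above, sketched in C: the unit reservation is the larger of the two component reservations, and the total is that unit multiplied by the log count. The function names and the byte values in the note below are invented for illustration and are not taken from the XFS source.

```c
#include <assert.h>
#include <stdint.h>

/* Unit reservation of a compound transaction: the larger component. */
uint64_t compound_unit_res(uint64_t res_a, uint64_t res_b)
{
	return res_a > res_b ? res_a : res_b;
}

/* Total reservation: one unit for every commit that may be needed. */
uint64_t compound_total_res(uint64_t unit_res, uint32_t log_count)
{
	assert(log_count > 0);
	return unit_res * log_count;
}
```

If physical inode allocation needed 300 units of space and free inode allocation needed 100, every commit would reserve 300, even though the cheaper path is by far the more common one.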
Is that a problem? No. Can it be improved? Yes - see the comments
about splitting out physical inode chunk allocation into the
background here:
http://xfs.org/index.php/Improving_inode_Caching#Contiguous_Inode_Allocation
And suddenly we end up with mkdir/create/etc only requiring a single
transaction reservation. There goes one of your 2*LSU paddings...
What about attributes? The triple-stage transaction is only required
when -replacing- an existing attribute. The first adds the new
attribute, the second flips the complete/incomplete flags, and the
third removes the old attributes. Just adding a new attribute only
requires a single transaction and most attributes are never
overwritten, so again we are almost always reserving far too much
space for such an operation.
Can we optimise this? Yes, we can. Let's disaggregate it - if we are
replacing attribute content without changing the size or name, then
we could just add a new transaction that logs a direct overwrite.
That's a *tiny* transaction. If we are replacing an attribute of the
same size with a new name, then it's a little more complex as we
have to manipulate the hash index entries, but that is still a
single, simple transaction. If we are adding, knowing we aren't
doing a replacement, then that is a single transaction, too.
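The disaggregation described above can be summarised as a lookup from the kind of attribute operation to the number of transactions it needs. This is a sketch only: the enum and function are hypothetical, and the two "same size" cases are proposed optimisations rather than existing transaction types.

```c
#include <assert.h>

/* Hypothetical classification of attribute set operations. */
enum attr_op {
	ATTR_OP_ADD_NEW,             /* add, known not to be a replacement */
	ATTR_OP_OVERWRITE_SAME_SIZE, /* same name and size, new content */
	ATTR_OP_RENAME_SAME_SIZE,    /* same size, new name (hash update) */
	ATTR_OP_REPLACE,             /* general replace of an existing attr */
};

/* Number of transactions each kind of operation would need. */
unsigned int attr_op_transactions(enum attr_op op)
{
	switch (op) {
	case ATTR_OP_REPLACE:
		/* add new, flip complete/incomplete flags, remove old */
		return 3;
	case ATTR_OP_ADD_NEW:
	case ATTR_OP_OVERWRITE_SAME_SIZE:
	case ATTR_OP_RENAME_SAME_SIZE:
		return 1;
	}
	assert(0 && "unknown attr_op");
	return 0;
}
```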
Start to see how this greatly reduces the overall operational
transaction reservations? We can build bigger and more complex
transactions as part of the process, but optimisation of the
reservations comes from separation into smaller, more fine-grained
transactions and reservations.
Keep in mind that from this perspective, I'm quite happy for an
initial parent pointer implementation to start with gigantic
create+attr/rename+attr transaction reservations. I'm also happy for
it to take us 2-3 years to optimise/disaggregate the transactions to
reduce the overhead to be smaller and more efficient.
I don't expect the initial implementation to be perfect or even
optimal - attempting to make it so puts us right into premature
optimisation territory. Make parent pointers robust first, then we
can worry about where we can optimise reservations down to the
smallest possible subset.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 59/60] xfs: Add xfs_log_rlimit.c
2013-06-26 4:05 ` Dave Chinner
@ 2013-06-26 13:48 ` Mark Tinguely
2013-06-26 22:18 ` Dave Chinner
0 siblings, 1 reply; 95+ messages in thread
From: Mark Tinguely @ 2013-06-26 13:48 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 06/25/13 23:05, Dave Chinner wrote:
> On Tue, Jun 25, 2013 at 09:06:39AM -0500, Mark Tinguely wrote:
>> On 06/24/13 17:27, Dave Chinner wrote:
>>> On Mon, Jun 24, 2013 at 04:26:28PM -0500, Mark Tinguely wrote:
>>>> On 06/18/13 23:51, Dave Chinner wrote:
>>>>> + * 2) If the lsunit option is specified, a transaction requires 2 LSU
>>>>> + * for the reservation because there are two log writes that can
>>>>> + * require padding - the transaction data and the commit record which
>>>>> + * are written separately and both can require padding to the LSU.
>>>>> + * Consider that we can have an active CIL reservation holding 2*LSU,
>>>>> + * but the CIL is not over a push threshold, in this case, if we
>>>>> + * don't have enough log space for at one new transaction, which
>>>>> + * includes another 2*LSU in the reservation, we will run into dead
>>>>> + * loop situation in log space grant procedure. i.e.
>>>>> + * xlog_grant_head_wait().
>>>>> + *
>>>>> + * Hence the log size needs to be able to contain two maximally sized
>>>>> + * and padded transactions, which is (2 * (2 * LSU + maxlres)).
>>>>> + *
>>>>
>>>> Any thoughts on how we can separate the 2 * log stripe unit from the
>>>> reservation.
>>>
>>> You can't. The reservation, by definition, is the worse case log
>>> space usage for the transaction. Therefore, it has to take into
>>> account the LSU padding that may be necessary if this transaction is
>>> committed by itself to the log (e.g. wsync operation)
>>
>> I am thinking we should separate the extra 2*LSU allocated space
>> from the individual pieces of the transaction
>
> Why? How are you going to prevent log space deadlocks if we don't
> reserve that space for each individual transaction in a permanent
> transaction reservation?
Allocate 2 per transaction, not one per segment of a transaction.
It may be a wash for 2 or 3 segments per transaction, but would help if
there are 10 segments per transaction.
>
>> (caused by
>> xfs_trans_rolls and xfs_bmap_finish) and associate it with the
>> transaction. Sync writes are for the transaction anyway and not the
>> pieces.
>
> Every transaction commit that is executed can require LSU padding
> because it could be the transaction the CIL steals the required LSU
> reservation from. When CIL flushes occur - and hence when the LSU
> padding needs to be stolen from a transaction commit - is not under
> the control of the currently executing transaction.
I was thinking we need 2 (2*LSU) per transaction. One for a CIL push
because it could be too large and one if it is a sync transaction. In the
meantime, if another transaction pushes the log for one of the above
reasons, the CIL will steal whatever is left over from the transaction
and one of the (2*LSU) from that ticket.
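To put numbers on the comparison, here is a sketch of the padding arithmetic only. It assumes the current scheme reserves 2*LSU for every segment and the proposed scheme reserves a fixed 2 * (2*LSU) per transaction. The function names are made up, and the sketch says nothing about whether the proposal is safe against log space deadlocks.

```c
#include <assert.h>
#include <stdint.h>

/* Current scheme as described: every segment carries its own 2*LSU. */
uint64_t padding_per_segment(uint32_t segments, uint64_t lsu_bytes)
{
	assert(segments > 0);
	return (uint64_t)segments * 2 * lsu_bytes;
}

/* Proposed scheme: two lots of 2*LSU for the whole transaction. */
uint64_t padding_per_transaction(uint64_t lsu_bytes)
{
	return 2 * (2 * lsu_bytes);
}
```

With a 256KiB LSU, 2 segments cost 1MiB either way, while 10 segments cost 5MiB per segment-based padding versus 1MiB per transaction.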
>> It is the multiplication of the (2*LSU) by each piece of the
>> transaction that is the killer.
>> A mkdir will have 1.5MB of log reserved space just for the possible
>> LSU padding.
>
> A killer for what, exactly? This is the current status quo, and I
> don't see people having performance problems related to it....
>
>> The cil can steal the left over current ticket space and the global
>> LSU space.
>
> The space is avilable only if the current transaction _unit
> reservation_ has it reserved. Hence every unit reservation needs
> that space to be reserved.
>
> I'm not saying this is optimal - it is an architectural requirement
> that the log has always had and therefore *necessary*.
Killer only in the amount of space as the segments increase.
>>>> The added extended attribute calls for parent inode pointers
>>>> (especially xfs_rename() where it could add up to one and remove up
>>>> to two attributes) is causing a huge multiplication cnt for
>>>> reservation.
>>>
>>> So you are adding a attribute reservations to
>>> create/mkdir/rename/link transaction reservations? That doesn't seem
>>> like a good idea, and it's contrary to the direction we want to head
>>> with the transaction subsystem. Can you post your patches so we
>>> some some context to the question you are asking?
>>>
>>> FWIW, the intended method of linking multiple independent
>>> transactions together for parent pointers into an atomic commit is
>>> this:
>>>
>>> http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Atomic_Multi-Transaction_Operations
>>>
>>> While the text is somewhat out of date (talks about rolling
>>> transactions and being able to replace them), it predated the
>>> delayed logging implementation and hence doesn't take into account
>>> the obvious, simple extension that can be made to the CIL to
>>> implement this. i.e. being able to hold the current CIL context
>>> open for a number of transactions to take place before releasing it
>>> again.
>>>
>>> The point of doing this is still relevant, however:
>>>
>>> "This may also allow us to split some of the large compound
>>> transactions into smaller, more self contained transactions. This
>>> would reduce reservation pressure on log space in the common case
>>> where all the corner cases in the transactions are not taken."
>>>
>>> IOWs, rather than building new, large "aggregation" transactions, we
>>> should be adding infrastructure to the CIL to allow atomic
>>> transaction aggregation techniques to be used and then just using
>>> the existing transaction infrastructure to implement operations such
>>> as "create with ACL"....
>>
>> I will have to think on this. I don't see how the allocation of log
>> space for a new transaction while holding locks is a good thing.
>
> Who said anything about requiring new locks? :)
I meant we have to hold the inode locks between the directory code and
the attribute code. If we do not, then the attributes quickly become out
of line with the directory status.
>
> The initial implementation I was thinking of is basically a
> reference counter and a method for delaying CIL pushes until the
> reference count drops to zero.
>
> Indeed, if we push the aggregated space reservations up into this
> aggregated transaction, the individual transactions can just pull
> out of the space we already know is available to use. IOWs, it's
> relatively trivial to do in a way that leaves open many avenues for
> later optimisation....
sweet.
>
>>> FYI, we need this for all the operations that add attributes at
>>> create time, such as default ACLs and DMAPI/DMF attributes....
>>>
>>>> Those multiplications would be killers on 256KiB log
>>>> stripe units.
>>>
>>> Numbers, please?
>>>
>>> Cheers,
>>>
>>> Dave.
>>
>> /*
>> * Various log count values.
>> */
>> #define XFS_DEFAULT_LOG_COUNT 1
>> #define XFS_DEFAULT_PERM_LOG_COUNT 2
>> #define XFS_ITRUNCATE_LOG_COUNT 2
>> #define XFS_INACTIVE_LOG_COUNT 2
>> #define XFS_CREATE_LOG_COUNT 2
>> #define XFS_MKDIR_LOG_COUNT 3
>> #define XFS_SYMLINK_LOG_COUNT 3
>> #define XFS_REMOVE_LOG_COUNT 2
>> #define XFS_LINK_LOG_COUNT 2
>> #define XFS_RENAME_LOG_COUNT 2
>> #define XFS_WRITE_LOG_COUNT 2
>> #define XFS_ADDAFORK_LOG_COUNT 2
>> #define XFS_ATTRINVAL_LOG_COUNT 1
>> #define XFS_ATTRSET_LOG_COUNT 3
>> #define XFS_ATTRRM_LOG_COUNT 3
>>
>> Even if you are creative, the multipliers add up quickly.
>> xfs_rename will do the directory ops. possibly a attibset and up to
>> 2 attrrm and we have to hold the src and target inodes locks over
>> all the operations.
>
> We don't have to hold them locked across the entire operation
> because the VFS is holding them locked so nothing else can do a
> namespace operation which we are modifying them.
I find that without holding the lock over both the dir and attr routines,
remove occasionally can't find the parent pointer attribute entry. I suspect
remove racing in rename.
> Anyway, you're worried about multiplying out the existing
> reservations, but that's really not that big of a deal - it's just
> log space, and we've got lots of that to use, and we have plenty of
> avenues to optimise usage in future..
>
> e.g. mkdir is a compound transaction that is really 2 different
> operations. The first is physical inode allocation, then second is
> allocating a free inode. the first only happens for 1 in every 64
> free inode allocations (rarer if we are also removing inodes, too).
> So most of the time we are reserving far more space in the log that
> we actually need to allocate an inode.
>
> The actual unit reservation for the create is the maximum of the two
> individual component reservations times the number of commits that
> will be required, and the total is the unit reservation times the
> number of commits needed. So for a create transaction, it's an
> overestimation of the worst case.
>
> Is that a problem? No. Can it be improved? Yes - see the comments
> about splitting out physical inode chunk allocation into the
> background here:
>
> http://xfs.org/index.php/Improving_inode_Caching#Contiguous_Inode_Allocation
>
> And suddenly we end up with mkdir/create/etc only requiring a single
> transaction reservation. There goes one of your 2*LSU paddings...
>
> What about attributes? The triple-stage transaction is only required
> when -replacing- an existing attribute. The first adds the new
> attribute, the second flips the complete/incomplete flags, and the
> third removes the old attributes. Just adding a new attribute only
> requires a single transaction and most attributes are never
> overwritten, so again we are almost always reserving far too much
> space for such an operation.
>
> Can we optimise this? Yes, we can. Let's disaggregate it - if we are
> replacing attribute content without changing the size or name, then
> we could just add a new transaction that logs a direct overwrite.
> That's a *tiny* transaction. If we are replacing an attribute of the
> same size with a new name, then it's a little more complex as we
> have to manipulate the hash index entries, but that is still a
> single, simple transaction. If we are adding, knowing we aren't
> doing a replacment, then that is a single transaction, too.
>
> Start to see how this greatly reduces the overall operational
> transaction reservations? We can build bigger and more complex
> transactions as part of the process, but optimisation of the
> reservations comes from separation into smaller, more fine-grained
> transactions and reservations.
>
> Keep in mind that from this perspective, I'm quite happy for an
> initial parent pointer implementation to start with gigantic
> create+attr/rename+attr transaction reservations. I'm also happy for
> it to take us 2-3 years to optimise/disaggegate the transactions to
> reduce the overhead to be smaller and more efficient.
>
> I don't expect the initial implementation to be perfect or even
> optimal - attempting to make it so puts us right into prematuve
> optimisation territory. Make parent pointers robust first, then we
> can worry about where we can optimise reservations down to the
> smallest possible subset.
>
> Cheers,
>
> Dave.
Thanks.
--Mark.
* Re: [PATCH 59/60] xfs: Add xfs_log_rlimit.c
2013-06-26 13:48 ` Mark Tinguely
@ 2013-06-26 22:18 ` Dave Chinner
0 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-26 22:18 UTC (permalink / raw)
To: Mark Tinguely; +Cc: xfs
On Wed, Jun 26, 2013 at 08:48:15AM -0500, Mark Tinguely wrote:
> On 06/25/13 23:05, Dave Chinner wrote:
> >On Tue, Jun 25, 2013 at 09:06:39AM -0500, Mark Tinguely wrote:
> >>On 06/24/13 17:27, Dave Chinner wrote:
> >>>On Mon, Jun 24, 2013 at 04:26:28PM -0500, Mark Tinguely wrote:
> >>>>On 06/18/13 23:51, Dave Chinner wrote:
> >>>>>+ * 2) If the lsunit option is specified, a transaction requires 2 LSU
> >>>>>+ * for the reservation because there are two log writes that can
> >>>>>+ * require padding - the transaction data and the commit record which
> >>>>>+ * are written separately and both can require padding to the LSU.
> >>>>>+ * Consider that we can have an active CIL reservation holding 2*LSU,
> >>>>>+ * but the CIL is not over a push threshold, in this case, if we
> >>>>>+ * don't have enough log space for at one new transaction, which
> >>>>>+ * includes another 2*LSU in the reservation, we will run into dead
> >>>>>+ * loop situation in log space grant procedure. i.e.
> >>>>>+ * xlog_grant_head_wait().
> >>>>>+ *
> >>>>>+ * Hence the log size needs to be able to contain two maximally sized
> >>>>>+ * and padded transactions, which is (2 * (2 * LSU + maxlres)).
> >>>>>+ *
> >>>>
> >>>>Any thoughts on how we can separate the 2 * log stripe unit from the
> >>>>reservation.
> >>>
> >>>You can't. The reservation, by definition, is the worse case log
> >>>space usage for the transaction. Therefore, it has to take into
> >>>account the LSU padding that may be necessary if this transaction is
> >>>committed by itself to the log (e.g. wsync operation)
> >>
> >>I am thinking we should separate the extra 2*LSU allocated space
> >>from the individual pieces of the transaction
> >
> >Why? How are you going to prevent log space deadlocks if we don't
> >reserve that space for each individual transaction in a permanent
> >transaction reservation?
>
> Allocate 2 per transaction, not one per segment of a transaction.
> It may be a wash for 2 or 3 segments per transaction, but would help
> if there are 10 segments per transaction.
I'll repeat: The unit reservation used for every commit in a
permanent transaction *requires* LSU padding because we don't know
ahead of time when it will be required. We can't avoid that worst
case behaviour - violating design constraints of the log subsystem
because the worst case reservations get large is simply not an
option....
> >>>IOWs, rather than building new, large "aggregation" transactions, we
> >>>should be adding infrastructure to the CIL to allow atomic
> >>>transaction aggregation techniques to be used and then just using
> >>>the existing transaction infrastructure to implement operations such
> >>>as "create with ACL"....
> >>
> >>I will have to think on this. I don't see how the allocation of log
> >>space for a new transaction while holding locks is a good thing.
> >
> >Who said anything about requiring new locks? :)
>
> I meant we have to hold the inode locks between the directory code
> the the attribute code. If we do not, then the attributes quickly
> become out of line with the directory status.
But why would we need to hold locks atomically across all operations
there? The inodes are already locked at that VFS level via i_mutex,
so nobody is going to see the newly created/modified inodes in the
namespace until we complete the whole create/rename operation, so
the only thing we really need to be concerned about is whether the
operations can be recovered in whole or in part.....
> >>Even if you are creative, the multipliers add up quickly.
> >>xfs_rename will do the directory ops. possibly a attibset and up to
> >>2 attrrm and we have to hold the src and target inodes locks over
> >>all the operations.
> >
> >We don't have to hold them locked across the entire operation
> >because the VFS is holding them locked so nothing else can do a
> >namespace operation while we are modifying them.
>
> I find that, without holding the lock over both the dir and attrib
> routines, remove occasionally can't find the parent pointer attribute
> entry. I suspect remove is racing with rename.
Racing with what? The VFS prevents create/link/unlink/rename races
via the i_mutex locking on the parent directories and the
vfs_rename_lock....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 95+ messages in thread
* [PATCH 60/60] xfs: Validate log space at mount time
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (58 preceding siblings ...)
2013-06-19 4:51 ` [PATCH 59/60] xfs: Add xfs_log_rlimit.c Dave Chinner
@ 2013-06-19 4:51 ` Dave Chinner
2013-06-19 9:15 ` [PATCH 00/60] xfs: patch queue for 3.11 Christoph Hellwig
2013-06-19 14:35 ` Ben Myers
61 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 4:51 UTC (permalink / raw)
To: xfs
From: Jie Liu <jeff.liu@oracle.com>
Validate the log space during the log mount stage. The underlying
function emits a critical-level message via syslog if the log space
is too small or too large.
[ dchinner: For CRC enabled filesystems, abort the mounting of the
filesystem as mkfs should never make a log too small for the given
filesystem configuration. ]
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 48 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index e1f9c72..6c6e307 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -614,7 +614,9 @@ xfs_log_mount(
xfs_daddr_t blk_offset,
int num_bblks)
{
- int error;
+ int error = 0;
+ int min_logfsbs;
+ int max_logfsbs;
if (!(mp->m_flags & XFS_MOUNT_NORECOVERY))
xfs_notice(mp, "Mounting Filesystem");
@@ -631,6 +633,51 @@ xfs_log_mount(
}
/*
+ * Validate the given log space and drop a critical message via syslog
+ * if the log size is too small that would lead to some unexpected
+ * situations in transaction log space reservation stage.
+ *
+ * Note: we can't just reject the mount if the validation fails. This
+ * would mean that people would have to downgrade their kernel just to
+ * remedy the situation as there is no way to grow the log (short of
+ * black magic surgery with xfs_db).
+ *
+ * We can, however, reject mounts for CRC format filesystems, as the
+ * mkfs binary being used to make the filesystem should never create a
+ * filesystem with a log that is too small.
+ */
+ min_logfsbs = xfs_log_calc_minimum_size(mp);
+ max_logfsbs = XFS_BB_TO_FSB(mp, XFS_MAX_LOG_BLOCKS);
+
+ if (mp->m_sb.sb_logblocks < min_logfsbs) {
+ xfs_warn(mp,
+ "Log size %d blocks too small, minimum size is %d blocks",
+ mp->m_sb.sb_logblocks, min_logfsbs);
+ error = EINVAL;
+ } else if (mp->m_sb.sb_logblocks > max_logfsbs) {
+ xfs_warn(mp,
+ "Log size %d blocks too large, maximum size is %d blocks",
+ mp->m_sb.sb_logblocks, max_logfsbs);
+ error = EINVAL;
+ } else if (XFS_FSB_TO_B(mp, mp->m_sb.sb_logblocks) > XFS_MAX_LOG_BYTES) {
+ xfs_warn(mp,
+ "log size %lld bytes too large, maximum size is %lld bytes",
+ XFS_FSB_TO_B(mp, mp->m_sb.sb_logblocks),
+ XFS_MAX_LOG_BYTES);
+ error = EINVAL;
+ }
+ if (error) {
+ if (xfs_sb_version_hascrc(&mp->m_sb)) {
+ xfs_crit(mp, "AAIEEE! Log failed size checks. Abort!");
+ ASSERT(0);
+ goto out_free_log;
+ }
+ xfs_crit(mp,
+"Log size out of supported range. Continuing onwards, but if log hangs are\n"
+"experienced then please report this message in the bug report.");
+ }
+
+ /*
* Initialize the AIL now we have a log.
*/
error = xfs_trans_ail_init(mp);
--
1.7.10.4
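For readers who want the decision logic without the kernel context, the checks in the hunk above can be sketched in plain userspace C. This is a simplified illustration, not the kernel code: the function and enum names are hypothetical, and the limits are passed in as parameters instead of using xfs_log_calc_minimum_size(), XFS_MAX_LOG_BLOCKS or XFS_MAX_LOG_BYTES.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not part of fs/xfs. */
enum log_size_verdict {
	LOG_SIZE_OK,
	LOG_SIZE_TOO_SMALL,
	LOG_SIZE_TOO_LARGE_BLOCKS,
	LOG_SIZE_TOO_LARGE_BYTES,
};

/*
 * Same ordering as the patch: minimum first, then the block-count limit,
 * then the byte limit. All limits are caller-supplied placeholders.
 */
static enum log_size_verdict
check_log_size(uint64_t logblocks, uint32_t blocksize,
	       uint64_t min_blocks, uint64_t max_blocks, uint64_t max_bytes)
{
	assert(blocksize > 0);
	if (logblocks < min_blocks)
		return LOG_SIZE_TOO_SMALL;
	if (logblocks > max_blocks)
		return LOG_SIZE_TOO_LARGE_BLOCKS;
	if (logblocks * blocksize > max_bytes)
		return LOG_SIZE_TOO_LARGE_BYTES;
	return LOG_SIZE_OK;
}

/* The mount is refused only for a failed check on a CRC-enabled filesystem. */
static int
mount_should_abort(enum log_size_verdict verdict, int has_crc)
{
	return verdict != LOG_SIZE_OK && has_crc;
}
```

On a non-CRC filesystem a failed check only produces the warning and the mount carries on, which matches the comment in the patch about not forcing people to downgrade their kernel.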
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [PATCH 00/60] xfs: patch queue for 3.11
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (59 preceding siblings ...)
2013-06-19 4:51 ` [PATCH 60/60] xfs: Validate log space at mount time Dave Chinner
@ 2013-06-19 9:15 ` Christoph Hellwig
2013-06-19 21:34 ` Dave Chinner
2013-06-19 14:35 ` Ben Myers
61 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2013-06-19 9:15 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Hi Dave,
I like this version a lot better from a quick glance.
A few more comments:
- do we really want all that many separate _format.h headers?
I'd be really tempted to say we have just a single xfs_format.h
header, which should declare everything. It's still not all that
large, and it would be a really good start to ease our include mess.
- xfs_extent_ops.c still has that odd _ops.c name I hate because it
tends to imply it's an implementation of some ops vector. How about
xfs_alloc_util.c to go with the other _util name you added?
- any chance to reorder the inode.c split so that stuff doesn't get
moved around multiple times?
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [PATCH 00/60] xfs: patch queue for 3.11
2013-06-19 9:15 ` [PATCH 00/60] xfs: patch queue for 3.11 Christoph Hellwig
@ 2013-06-19 21:34 ` Dave Chinner
2013-06-20 9:17 ` Christoph Hellwig
0 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 21:34 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Wed, Jun 19, 2013 at 02:15:38AM -0700, Christoph Hellwig wrote:
> Hi Dave,
>
> I like this version a lot better from a quick glance.
>
> A few more comments:
>
> - do we really want all that many separate _format.h headers?
> I'd be really tempted to say we have just a single xfs_format.h
> header, which should declare everything. It's still not all that
> large, and it would be a really good start to ease our include mess.
I don't see any problem with doing that, though I would like to keep
some of it separate - the dir2 stuff is complex enough that I would
like to keep it separate, but stuff like all the log item format
headers can be aggregated.
How about we end up with xfs_log_format.h for the log and log item
formats, xfs_da_format.h for all the dir2/da/attr format definitions,
and xfs_format.h for everything else?
> - xfs_extent_ops.c still has that odd _ops.c name I hate because it
> tends to imply it's an implementation of some ops vector. How about
> xfs_alloc_util.c to go with the other _util name you added?
Yup, sounds reasonable. sed can sort that one out.
> - any chance to reorder the inode.c split so that stuff doesn't get
> moved around multiple times?
I was waiting for that question - I have been considering doing that
but I've been putting it off because it is a fair bit of shuffling
that I'd have to do from scratch. Sounds like another vote for "redo
it"...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [PATCH 00/60] xfs: patch queue for 3.11
2013-06-19 21:34 ` Dave Chinner
@ 2013-06-20 9:17 ` Christoph Hellwig
0 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2013-06-20 9:17 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On Thu, Jun 20, 2013 at 07:34:17AM +1000, Dave Chinner wrote:
> I don't see any problem with doing that, though I would like to keep
> some of it separate - the dir2 stuff is complex enough that I would
> like to keep it separate, but stuff like all the log item format
> headers can be aggregated.
>
> How about we end up with xfs_log_format.h for the log and log item
> formats, xfs_da_format.h for all the dir2/da/attr format definitions,
> and xfs_format.h for everything else?
Sounds reasonable.
> > - any chance to reorder the inode.c split so that stuff doesn't get
> > moved around multiple times?
>
> I was waiting for that question - I have been considering doing that
> but I've been putting it off because it is a fair bit of shuffling
> that I'd have to do from scratch. Sounds like another vote for "redo
> it"...
Maybe wait for the discussion about having the trans stuff and the
separate project quotas to go in before this, as it would mean another
big redo..
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [PATCH 00/60] xfs: patch queue for 3.11
2013-06-19 4:50 [PATCH 00/60] xfs: patch queue for 3.11 Dave Chinner
` (60 preceding siblings ...)
2013-06-19 9:15 ` [PATCH 00/60] xfs: patch queue for 3.11 Christoph Hellwig
@ 2013-06-19 14:35 ` Ben Myers
2013-06-19 14:44 ` Christoph Hellwig
2013-06-19 22:54 ` Dave Chinner
61 siblings, 2 replies; 95+ messages in thread
From: Ben Myers @ 2013-06-19 14:35 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Dave,
On Wed, Jun 19, 2013 at 02:50:08PM +1000, Dave Chinner wrote:
> This is my patch queue for 3.11 as it stands right now.
Getting all of this in for 3.11 does not strike me as being realistic. You
need to think about how this can be split up. I see that you have rebased
Jeff's log size validation patch set after your rearrangement. I'd rather
you'd taken Jeff's series first and then made your changes. Now we can't pull
in Jeff's work without pulling in a bunch of rearrangement that hasn't been
fully discussed. You have also crowded out Chandra's quota work. We had an
agreement with him to go for 3.11 with that work which you have broken.
-Ben
> The first 27 patches have been posted previously, and I've addressed
> these comments directly:
>
> - non-debug static function problems (Brian)
> - agblock_t in xfs-ialloc.c (Brian)
> - xfs-dir2_format.h changes reverted, now included before
> xfs_dir2.h (Christoph)
> - rename xfs_aops_punch_delalloc_range() (Christoph)
>
> The next set of patches then build on this and implement the first
> part of where the discussions went - the removal of __KERNEL__ from
> all the XFS code. This generally takes the form of:
>
> - split on-disk format definitions into a separate header
> - include the header in the original it came from
> - share the new header file with userspace
> - repeat until all definitions are split out and the only
> thing left in the original header is #includes and
> ifdef __KERNEL__ sections.
> - remove the ifdef __KERNEL__, as this header file is no
> longer shared with userspace.
>
> The same process is done for the .c files, resulting in .c files
> that contain only kernel code or only code shared with libxfs in
> userspace.
>
> This has resulted in some less than optimal code locality, mostly to
> do with xfs_inode_ops.c. The code is kernel only, and it's in that
> file because it is all related to inode operations. However, some are
> internal core operations, some are interfaces that the VFS inode
> operations just wrap around, and some are related more closely to
> inode allocation than anything else. Hence more thought and effort
> needs to be put into factoring and merging this kernel code back
> into the most appropriate places, and that is work for a future
> release.
>
> The final part of the patchset is a rebase of Jeff Liu's log size
> validation patch set. This calculates the minimum size of the log
> that we should have to guarantee that we don't end up with
> transactions being too large for the log. This is an issue because
> mkfs currently doesn't take into account the log stripe unit padding
> that is necessary when calculating the log size, and this can drive
> the size of transaction over the maximum size the log can safely
> support.
>
> This code will warn on existing filesystems at mount time if the log
> is smaller than the minimum safe size, and for CRC enabled
> filesystems it will abort immediately as it is expected that you are
> using a mkfs binary that has this problem fixed.
>
> Overall, there is a lot of code movement in the patch set. The tail
> of the diffstat looks like:
>
> 113 files changed, 14734 insertions(+), 13221 deletions(-)
> create mode 100644 fs/xfs/xfs_attr_inactive.c
> create mode 100644 fs/xfs/xfs_attr_list.c
> create mode 100644 fs/xfs/xfs_bmap_util.c
> create mode 100644 fs/xfs/xfs_bmap_util.h
> create mode 100644 fs/xfs/xfs_buf_item_format.h
> delete mode 100644 fs/xfs/xfs_dfrag.c
> delete mode 100644 fs/xfs/xfs_dfrag.h
> create mode 100644 fs/xfs/xfs_dir2_readdir.c
> create mode 100644 fs/xfs/xfs_dquot_format.h
> create mode 100644 fs/xfs/xfs_extent_ops.c
> rename fs/xfs/{xfs_utils.h => xfs_extent_ops.h} (55%)
> create mode 100644 fs/xfs/xfs_extfree_item_format.h
> create mode 100644 fs/xfs/xfs_icreate_item.c
> create mode 100644 fs/xfs/xfs_icreate_item.h
> create mode 100644 fs/xfs/xfs_icreate_item_format.h
> create mode 100644 fs/xfs/xfs_inode_buf.c
> create mode 100644 fs/xfs/xfs_inode_buf.h
> create mode 100644 fs/xfs/xfs_inode_fork.c
> create mode 100644 fs/xfs/xfs_inode_fork.h
> create mode 100644 fs/xfs/xfs_inode_item_format.h
> create mode 100644 fs/xfs/xfs_inode_ops.c
> create mode 100644 fs/xfs/xfs_inode_ops.h
> create mode 100644 fs/xfs/xfs_log_format.h
> create mode 100644 fs/xfs/xfs_log_rlimit.c
> create mode 100644 fs/xfs/xfs_quota_defs.h
> delete mode 100644 fs/xfs/xfs_rename.c
> create mode 100644 fs/xfs/xfs_rtalloc_defs.h
> create mode 100644 fs/xfs/xfs_sb.c
> create mode 100644 fs/xfs/xfs_symlink_remote.c
> create mode 100644 fs/xfs/xfs_symlink_remote.h
> create mode 100644 fs/xfs/xfs_trans_format.h
> create mode 100644 fs/xfs/xfs_trans_resv.c
> create mode 100644 fs/xfs/xfs_trans_resv.h
> delete mode 100644 fs/xfs/xfs_utils.c
> delete mode 100644 fs/xfs/xfs_vnodeops.c
> delete mode 100644 fs/xfs/xfs_vnodeops.h
>
> Which should give you an idea of the number of new files that have
> been added to split out shared userspace code, and hence an idea
> of how much of the diffstat is just moving stuff around. FWIW, this
> kills about 2000 lines of __KERNEL__ code in header files that is
> currently exported to userspace. You can see why I'm thinking a
> kernel fs/xfs/libxfs subdir might be a good idea. :)
>
> Comments, flames, etc all welcome.
>
> Cheers,
>
> Dave.
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [PATCH 00/60] xfs: patch queue for 3.11
2013-06-19 14:35 ` Ben Myers
@ 2013-06-19 14:44 ` Christoph Hellwig
2013-06-19 14:54 ` Ric Wheeler
2013-06-19 22:54 ` Dave Chinner
1 sibling, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2013-06-19 14:44 UTC (permalink / raw)
To: Ben Myers; +Cc: xfs
On Wed, Jun 19, 2013 at 09:35:37AM -0500, Ben Myers wrote:
> Dave,
>
> On Wed, Jun 19, 2013 at 02:50:08PM +1000, Dave Chinner wrote:
> > This is my patch queue for 3.11 as it stands right now.
>
> Getting all of this in for 3.11 does not strike me as being realistic. You
> need to think about how this can be split up. I see that you have rebased
> Jeff's log size validation patch set after your rearrangement. I'd rather
> you'd taken Jeff's series first and then made your changes. Now we can't pull
> in Jeff's work without pulling in a bunch of rearrangement that hasn't been
> fully discussed. You have also crowded out Chandra's quota work. We had an
> agreement with him to go for 3.11 with that work which you have broken.
I think 3.11 is a realistic target for all the code movearound, but
maybe not as part of the normal pull request for -rc1. If we make sure
it's really moving code around and not changing it I think sending a
second pull request to Linus saying this is just code movearounds we
wanted to do when the churn causes least problems with actual code work
he should be fine with it.
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [PATCH 00/60] xfs: patch queue for 3.11
2013-06-19 14:44 ` Christoph Hellwig
@ 2013-06-19 14:54 ` Ric Wheeler
2013-06-19 15:47 ` Ben Myers
0 siblings, 1 reply; 95+ messages in thread
From: Ric Wheeler @ 2013-06-19 14:54 UTC (permalink / raw)
To: xfs
On 06/19/2013 10:44 AM, Christoph Hellwig wrote:
> On Wed, Jun 19, 2013 at 09:35:37AM -0500, Ben Myers wrote:
>> >Dave,
>> >
>> >On Wed, Jun 19, 2013 at 02:50:08PM +1000, Dave Chinner wrote:
>>> > >This is my patch queue for 3.11 as it stands right now.
>> >
>> >Getting all of this in for 3.11 does not strike me as being realistic. You
>> >need to think about how this can be split up. I see that you have rebased
>> >Jeff's log size validation patch set after your rearrangement. I'd rather
>> >you'd taken Jeff's series first and then made your changes. Now we can't pull
>> >in Jeff's work without pulling in a bunch of rearrangement that hasn't been
>> >fully discussed. You have also crowded out Chandra's quota work. We had an
>> >agreement with him to go for 3.11 with that work which you have broken.
> I think 3.11 is a realistic target for all the code movearound, but
> maybe not as part of the normal pull request for -rc1. If we make sure
> it's really moving code around and not changing it I think sending a
> second pull request to Linus saying this is just code movearounds we
> wanted to do when the churn causes least problems with actual code work
> he should be fine with it.
Just to chime in here, we have a lot of resources focused on testing these XFS
updates both internally with our QA team and with a range of other RH partners.
I am confident we can shake out any integration concerns in time.
Thanks!
Ric
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [PATCH 00/60] xfs: patch queue for 3.11
2013-06-19 14:54 ` Ric Wheeler
@ 2013-06-19 15:47 ` Ben Myers
2013-06-19 23:33 ` Dave Chinner
0 siblings, 1 reply; 95+ messages in thread
From: Ben Myers @ 2013-06-19 15:47 UTC (permalink / raw)
To: Ric Wheeler; +Cc: xfs
Hey Ric,
On Wed, Jun 19, 2013 at 10:54:26AM -0400, Ric Wheeler wrote:
> On 06/19/2013 10:44 AM, Christoph Hellwig wrote:
> >On Wed, Jun 19, 2013 at 09:35:37AM -0500, Ben Myers wrote:
> >>>On Wed, Jun 19, 2013 at 02:50:08PM +1000, Dave Chinner wrote:
> >>>> >This is my patch queue for 3.11 as it stands right now.
> >>>
> >>>Getting all of this in for 3.11 does not strike me as being realistic. You
> >>>need to think about how this can be split up. I see that you have rebased
> >>>Jeff's log size validation patch set after your rearrangement. I'd rather
> >>>you'd taken Jeff's series first and then made your changes. Now we can't pull
> >>>in Jeff's work without pulling in a bunch of rearrangement that hasn't been
> >>>fully discussed. You have also crowded out Chandra's quota work. We had an
> >>>agreement with him to go for 3.11 with that work which you have broken.
>
> >I think 3.11 is a realistic target for all the code movearound, but
> >maybe not as part of the normal pull request for -rc1. If we make sure
> >it's really moving code around and not changing it I think sending a
> >second pull request to Linus saying this is just code movearounds we
> >wanted to do when the churn causes least problems with actual code work
> >he should be fine with it.
>
> Just to chime in here, we have a lot of resources focused on testing
> these XFS updates both internally with our QA team and with a range
> of other RH partners.
This isn't about the size of your QA team or the number of other RH partners.
We had an agreement with Chandra to work toward getting his quota work in 3.11
and it appears that Dave has crowded him out with a rearrangement of code which
we had no agreement would go into 3.11. Further, Dave has taken Jeff's log
size validation series hostage by rebasing it on top of this rearrangement of
code.
If there is a strategic reason that RH needs to have the kernel/libxfs code
rearranged and separated in 3.11 I would have liked to have heard about it
before now. I'm all for getting this work done, but not at the expense of
crowding out other XFS contributors.
-Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [PATCH 00/60] xfs: patch queue for 3.11
2013-06-19 15:47 ` Ben Myers
@ 2013-06-19 23:33 ` Dave Chinner
2013-06-20 19:14 ` Ben Myers
0 siblings, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 23:33 UTC (permalink / raw)
To: Ben Myers; +Cc: Ric Wheeler, xfs
On Wed, Jun 19, 2013 at 10:47:09AM -0500, Ben Myers wrote:
> Hey Ric,
>
> On Wed, Jun 19, 2013 at 10:54:26AM -0400, Ric Wheeler wrote:
> > On 06/19/2013 10:44 AM, Christoph Hellwig wrote:
> > >On Wed, Jun 19, 2013 at 09:35:37AM -0500, Ben Myers wrote:
> > >>>On Wed, Jun 19, 2013 at 02:50:08PM +1000, Dave Chinner wrote:
> > >>>> >This is my patch queue for 3.11 as it stands right now.
> > >>>
> > >>>Getting all of this in for 3.11 does not strike me as being realistic. You
> > >>>need to think about how this can be split up. I see that you have rebased
> > >>>Jeff's log size validation patch set after your rearrangement. I'd rather
> > >>>you'd taken Jeff's series first and then made your changes. Now we can't pull
> > >>>in Jeff's work without pulling in a bunch of rearrangement that hasn't been
> > >>>fully discussed. You have also crowded out Chandra's quota work. We had an
> > >>>agreement with him to go for 3.11 with that work which you have broken.
> >
> > >I think 3.11 is a realistic target for all the code movearound, but
> > >maybe not as part of the normal pull request for -rc1. If we make sure
> > >it's really moving code around and not changing it I think sending a
> > >second pull request to Linus saying this is just code movearounds we
> > >wanted to do when the churn causes least problems with actual code work
> > >he should be fine with it.
> >
> > Just to chime in here, we have a lot of resources focused on testing
> > these XFS updates both internally with our QA team and with a range
> > of other RH partners.
>
> This isn't about the size of your QA team or the number of other RH partners.
>
> We had an agreement with Chandra to work toward getting his quota work in 3.11
> and it appears that Dave has crowded him out with a rearrangement of code which
> we had no agreement would go into 3.11.
What I posted is what I'm *proposing* for 3.11. You can't have an
agreement without first having a proposal....
> Further, Dave has taken Jeff's log
> size validation series hostage by rebasing it on top of this rearrangement of
> code.
Ben, I think you're being a little melodramatic here. I asked Jeff
if it was OK to rebase his patchset, and he said that was fine:
http://oss.sgi.com/pipermail/xfs/2013-June/027270.html
You don't have to take my rebase of Jeff's patches - you're welcome
to take them direct from Jeff, but then I'll have to send reviews
asking for changes to problems I found when integrating it so that's
going to delay any integration you can do of that series. Please let
Jeff and myself know what you want to do here...
> If there is a strategic reason that RH needs to have the kernel/libxfs code
> rearranged and separated in 3.11 I would have liked to have heard about it
> before now. I'm all for getting this work done, but not at the expense of
> crowding out other XFS contributors.
You are making a mountain out of a molehill. I had an itch, and I
scratched it. Simple as that. It is only a couple of days work.
If you think it's too much for 3.11, then just say so and leave it
at that. I'll move it to my for-3.12 queue and you won't see it
again until after 3.11-rc1 is released...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [PATCH 00/60] xfs: patch queue for 3.11
2013-06-19 23:33 ` Dave Chinner
@ 2013-06-20 19:14 ` Ben Myers
2013-06-20 19:31 ` Chandra Seetharaman
0 siblings, 1 reply; 95+ messages in thread
From: Ben Myers @ 2013-06-20 19:14 UTC (permalink / raw)
To: Dave Chinner; +Cc: Ric Wheeler, xfs
Hey Dave,
On Thu, Jun 20, 2013 at 09:33:47AM +1000, Dave Chinner wrote:
> On Wed, Jun 19, 2013 at 10:47:09AM -0500, Ben Myers wrote:
> > On Wed, Jun 19, 2013 at 10:54:26AM -0400, Ric Wheeler wrote:
> > > On 06/19/2013 10:44 AM, Christoph Hellwig wrote:
> > > >On Wed, Jun 19, 2013 at 09:35:37AM -0500, Ben Myers wrote:
> > > >>>On Wed, Jun 19, 2013 at 02:50:08PM +1000, Dave Chinner wrote:
> > > >>>> >This is my patch queue for 3.11 as it stands right now.
> > > >>>
> > > >>>Getting all of this in for 3.11 does not strike me as being realistic. You
> > > >>>need to think about how this can be split up. I see that you have rebased
> > > >>>Jeff's log size validation patch set after your rearrangement. I'd rather
> > > >>>you'd taken Jeff's series first and then made your changes. Now we can't pull
> > > >>>in Jeff's work without pulling in a bunch of rearrangement that hasn't been
> > > >>>fully discussed. You have also crowded out Chandra's quota work. We had an
> > > >>>agreement with him to go for 3.11 with that work which you have broken.
> > >
> > > >I think 3.11 is a realistic target for all the code movearound, but
> > > >maybe not as part of the normal pull request for -rc1. If we make sure
> > > >it's really moving code around and not changing it I think sending a
> > > >second pull request to Linus saying this is just code movearounds we
> > > >wanted to do when the churn causes least problems with actual code work
> > > >he should be fine with it.
> > >
> > > Just to chime in here, we have a lot of resources focused on testing
> > > these XFS updates both internally with our QA team and with a range
> > > of other RH partners.
> >
> > This isn't about the size of your QA team or the number of other RH partners.
> >
> > We had an agreement with Chandra to work toward getting his quota work in 3.11
> > and it appears that Dave has crowded him out with a rearrangement of code which
> > we had no agreement would go into 3.11.
>
> What I posted is what I'm *proposing* for 3.11. You can't have an
> agreement without first having a proposal....
>
> > Further, Dave has taken Jeff's log
> > size validation series hostage by rebasing it on top of this rearrangement of
> > code.
>
> Ben, I think you're being a little melodramatic here. I asked Jeff
> if it was OK to rebase his patchset, and he said that was fine:
>
> http://oss.sgi.com/pipermail/xfs/2013-June/027270.html
>
> You don't have to take my rebase of Jeff's patches - you're welcome
> to take them direct from Jeff, but then I'll have to send reviews
> asking for changes to problems I found when integrating it so that's
> going to delay any integration you can do of that series. Please let
> Jeff and myself know what you want to do here...
>
> > If there is a strategic reason that RH needs to have the kernel/libxfs code
> > rearranged and separated in 3.11 I would have liked to have heard about it
> > before now. I'm all for getting this work done, but not at the expense of
> > crowding out other XFS contributors.
>
> You are making a mountain out of a molehill. I had an itch, and I
> scratched it. Simple as that. It is only a couple of days work.
You jumped the queue in front of the other cars. I'm asking you not to do
that, even if one of the drivers was kind enough to let you in.
> If you think it's too much for 3.11, then just say so and leave it at that.
> I'll move it to my for-3.12 queue and you won't see it again until after
> 3.11-rc1 is released...
Let's see where Chandra is at with his quota work. If he has already rebased on
top of your series I don't see a good reason to rearrange things now. If he
hasn't, I'd like to focus on getting his code merged before pulling in your
rearrangement. Now that you've rebased Jeff's work, I don't see much point in
redoing that, so maybe that will have to wait for the rearrangement to get
merged.
For now we'll focus on the first 13 patches.
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [PATCH 00/60] xfs: patch queue for 3.11
2013-06-20 19:14 ` Ben Myers
@ 2013-06-20 19:31 ` Chandra Seetharaman
0 siblings, 0 replies; 95+ messages in thread
From: Chandra Seetharaman @ 2013-06-20 19:31 UTC (permalink / raw)
To: Ben Myers; +Cc: Ric Wheeler, xfs
On Thu, 2013-06-20 at 14:14 -0500, Ben Myers wrote:
> Hey Dave,
>
> On Thu, Jun 20, 2013 at 09:33:47AM +1000, Dave Chinner wrote:
> > On Wed, Jun 19, 2013 at 10:47:09AM -0500, Ben Myers wrote:
> > > On Wed, Jun 19, 2013 at 10:54:26AM -0400, Ric Wheeler wrote:
> > > > On 06/19/2013 10:44 AM, Christoph Hellwig wrote:
> > > > >On Wed, Jun 19, 2013 at 09:35:37AM -0500, Ben Myers wrote:
> > > > >>>On Wed, Jun 19, 2013 at 02:50:08PM +1000, Dave Chinner wrote:
> > > > >>>> >This is my patch queue for 3.11 as it stands right now.
> > > > >>>
> > > > >>>Getting all of this in for 3.11 does not strike me as being realistic. You
> > > > >>>need to think about how this can be split up. I see that you have rebased
> > > > >>>Jeff's log size validation patch set after your rearrangement. I'd rather
> > > > >>>you'd taken Jeff's series first and then made your changes. Now we can't pull
> > > > >>>in Jeff's work without pulling in a bunch of rearrangement that hasn't been
> > > > >>>fully discussed. You have also crowded out Chandra's quota work. We had an
> > > > >>>agreement with him to go for 3.11 with that work which you have broken.
> > > >
> > > > >I think 3.11 is a realistic target for all the code movearound, but
> > > > >maybe not as part of the normal pull request for -rc1. If we make sure
> > > > >it's really moving code around and not changing it I think sending a
> > > > >second pull request to Linus saying this is just code movearounds we
> > > > >wanted to do when the churn causes least problems with actual code work
> > > > >he should be fine with it.
> > > >
> > > > Just to chime in here, we have a lot of resources focused on testing
> > > > these XFS updates both internally with our QA team and with a range
> > > > of other RH partners.
> > >
> > > This isn't about the size of your QA team or the number of other RH partners.
> > >
> > > We had an agreement with Chandra to work toward getting his quota work in 3.11
> > > and it appears that Dave has crowded him out with a rearrangement of code which
> > > we had no agreement would go into 3.11.
> >
> > What I posted is what I'm *proposing* for 3.11. You can't have an
> > agreement without first having a proposal....
> >
> > > Further, Dave has taken Jeff's log
> > > size validation series hostage by rebasing it on top of this rearrangement of
> > > code.
> >
> > Ben, I think you're being a little melodramatic here. I asked Jeff
> > if it was OK to rebase his patchset, and he said that was fine:
> >
> > http://oss.sgi.com/pipermail/xfs/2013-June/027270.html
> >
> > You don't have to take my rebase of Jeff's patches - you're welcome
> > to take them direct from Jeff, but then I'll have to send reviews
> > asking for changes to problems I found when integrating it so that's
> > going to delay any integration you can do of that series. Please let
> > Jeff and myself know what you want to do here...
> >
> > > If there is a strategic reason that RH needs to have the kernel/libxfs code
> > > rearranged and separated in 3.11 I would have liked to have heard about it
> > > before now. I'm all for getting this work done, but not at the expense of
> > > crowding out other XFS contributors.
> >
> > You are making a mountain out of a molehill. I had an itch, and I
> > scratched it. Simple as that. It is only a couple of days work.
>
> You jumped the queue in front of the other cars. I'm asking you not to do
> that, even if one of the drivers was kind enough to let you in.
>
> > If you think it's too much for 3.11, then just say so and leave it at that.
> > I'll move it to my for-3.12 queue and you won't see it again until after
> > 3.11-rc1 is released...
>
> Let's see where Chandra is at with his quota work. If he has already rebased on
> top of your series I don't see a good reason to rearrange things now. If he
No, I haven't rebased on top of Dave's patches. I will post my patchset
by EOD today.
> hasn't, I'd like to focus on getting his code merged before pulling in your
> rearrangement. Now that you've rebased Jeff's work, I don't see much point in
> redoing that, so maybe that will have to wait for the rearrangement to get
> merged.
>
> For now we'll focus on the first 13 patches.
>
> Thanks,
> Ben
>
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [PATCH 00/60] xfs: patch queue for 3.11
2013-06-19 14:35 ` Ben Myers
2013-06-19 14:44 ` Christoph Hellwig
@ 2013-06-19 22:54 ` Dave Chinner
2013-06-20 4:51 ` Dave Chinner
1 sibling, 1 reply; 95+ messages in thread
From: Dave Chinner @ 2013-06-19 22:54 UTC (permalink / raw)
To: Ben Myers; +Cc: xfs
On Wed, Jun 19, 2013 at 09:35:37AM -0500, Ben Myers wrote:
> Dave,
>
> On Wed, Jun 19, 2013 at 02:50:08PM +1000, Dave Chinner wrote:
> > This is my patch queue for 3.11 as it stands right now.
>
> Getting all of this in for 3.11 does not strike me as being
> realistic.
Why not? We've done more complex changes in the past...
> You need to think about how this can be split up.
Sure - the first 27 patches are the necessary ones for 3.11.
> I
> see that you have rebased Jeff's log size validation patch set
> after your rearrangement. I'd rather you'd taken Jeff's series
> first and then made your changes. Now we can't pull in Jeff's
> work without pulling in a bunch of rearrangement that hasn't been
> fully discussed.
Sure, but Jeff's work hadn't been reviewed or QA'd by anyone else,
so what's the problem? If the Maintainers had done their job a month
ago when Jeff posted the patches, then this wouldn't be an issue as
my patchset would now be based on top of it.
You're welcome to go review, test and integrate Jeff's patchset
before mine - I'll just have to rebase my patchset.
> You have also crowded out Chandra's quota work. We had an
> agreement with him to go for 3.11 with that work which you have broken.
You're blaming me for the fact the Maintainers haven't reviewed and
integrated Chandra's latest quota patchset that was posted 6 weeks
ago?
Feel free to get Chandra to repost the patchset and review and
integrate it before taking any of my changes. I'll just rebase my
patchset on top of whatever ends up in the XFS tree....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [PATCH 00/60] xfs: patch queue for 3.11
2013-06-19 22:54 ` Dave Chinner
@ 2013-06-20 4:51 ` Dave Chinner
0 siblings, 0 replies; 95+ messages in thread
From: Dave Chinner @ 2013-06-20 4:51 UTC (permalink / raw)
To: Ben Myers; +Cc: xfs
On Thu, Jun 20, 2013 at 08:54:04AM +1000, Dave Chinner wrote:
> On Wed, Jun 19, 2013 at 09:35:37AM -0500, Ben Myers wrote:
> > Dave,
> >
> > On Wed, Jun 19, 2013 at 02:50:08PM +1000, Dave Chinner wrote:
> > > This is my patch queue for 3.11 as it stands right now.
> >
> > Getting all of this in for 3.11 does not strike me as being
> > realistic.
>
> Why not? We've done more complex changes in the past...
>
> > You need to think about how this can be split up.
>
> Sure - the first 27 patches are the necessary ones for 3.11.
Actually, if you're really concerned about how little time you have left
before the next merge window opens, then just concentrate on the
first 13 patches (i.e. up to the local format fork extent handling
change for xfs_write_bmapi).
I'm about to rework the series based on the
log_format/da_format/format split that I suggested in response to
Christoph's comments and also to avoid creating xfs_inode_ops.[ch].
If I get that done this afternoon, I'll repost it all for you....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 95+ messages in thread