* [PATCH 0/6] btrfs: speedup directory logging/fsync by copying index keys only
@ 2021-10-25 9:56 fdmanana
2021-10-25 9:56 ` [PATCH 1/6] btrfs: remove root argument from drop_one_dir_item() fdmanana
` (6 more replies)
0 siblings, 7 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 9:56 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
This patchset reworks directory logging to make it copy only the dir index
keys, instead of copying both dir index keys and dir item keys, as both have
the same type of information. This reduces the amount of logged metadata by
about half, and therefore we do about half of the cpu bound work, half of
the IO and use less log tree space (except for very small directories).
This will allow other optimizations to build on top, some of which are only
possible after this change, while others become easier and less cumbersome
to implement after this change. Performance tests are in the changelog of
patch 5/6. Patch 6/6 only removes code that deals with dir item keys when
replaying directory deletions and it could have been squashed into patch
5/6, but since that one is already large and works without 6/6, I opted
to make it separate to make it easier to review.
Also, after this change we are still able to correctly replay a log tree
generated by an old kernel, and an old kernel is also able to correctly
replay a log tree generated by a kernel that has this patchset applied.
I'm sending this close to the 5.16 merge window, but my intention is to
have it only for the next merge window (5.17).
Filipe Manana (6):
btrfs: remove root argument from drop_one_dir_item()
btrfs: remove root argument from btrfs_unlink_inode()
btrfs: remove root argument from add_link()
btrfs: remove root argument from check_item_in_log()
btrfs: only copy dir index keys when logging a directory
btrfs: remove no longer needed logic for replaying directory deletes
fs/btrfs/btrfs_inode.h | 18 +-
fs/btrfs/ctree.h | 1 -
fs/btrfs/inode.c | 25 +-
fs/btrfs/tree-log.c | 582 ++++++++++++++------------------
include/uapi/linux/btrfs_tree.h | 4 +-
5 files changed, 280 insertions(+), 350 deletions(-)
--
2.33.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/6] btrfs: remove root argument from drop_one_dir_item()
2021-10-25 9:56 [PATCH 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
@ 2021-10-25 9:56 ` fdmanana
2021-10-25 9:56 ` [PATCH 2/6] btrfs: remove root argument from btrfs_unlink_inode() fdmanana
` (5 subsequent siblings)
6 siblings, 0 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 9:56 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The root argument for drop_one_dir_item() always matches the root of the
given directory inode, since each log tree is associated to one and only
one subvolume/root, so remove the argument.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 7c900eb27cf8..23f1a35ea04f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -921,11 +921,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
* item
*/
static noinline int drop_one_dir_item(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
struct btrfs_path *path,
struct btrfs_inode *dir,
struct btrfs_dir_item *di)
{
+ struct btrfs_root *root = dir->root;
struct inode *inode;
char *name;
int name_len;
@@ -1220,7 +1220,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
if (IS_ERR(di)) {
return PTR_ERR(di);
} else if (di) {
- ret = drop_one_dir_item(trans, root, path, dir, di);
+ ret = drop_one_dir_item(trans, path, dir, di);
if (ret)
return ret;
}
@@ -1232,7 +1232,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
if (IS_ERR(di)) {
return PTR_ERR(di);
} else if (di) {
- ret = drop_one_dir_item(trans, root, path, dir, di);
+ ret = drop_one_dir_item(trans, path, dir, di);
if (ret)
return ret;
}
@@ -2049,7 +2049,7 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
if (!exists)
goto out;
- ret = drop_one_dir_item(trans, root, path, BTRFS_I(dir), dst_di);
+ ret = drop_one_dir_item(trans, path, BTRFS_I(dir), dst_di);
if (ret)
goto out;
--
2.33.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 2/6] btrfs: remove root argument from btrfs_unlink_inode()
2021-10-25 9:56 [PATCH 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
2021-10-25 9:56 ` [PATCH 1/6] btrfs: remove root argument from drop_one_dir_item() fdmanana
@ 2021-10-25 9:56 ` fdmanana
2021-10-25 9:56 ` [PATCH 3/6] btrfs: remove root argument from add_link() fdmanana
` (4 subsequent siblings)
6 siblings, 0 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 9:56 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The root argument passed to btrfs_unlink_inode() and its callee,
__btrfs_unlink_inode(), always matches the root of the given directory and
the given inode. So remove the argument and make __btrfs_unlink_inode()
use the root of the directory.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/ctree.h | 1 -
fs/btrfs/inode.c | 25 +++++++++++--------------
fs/btrfs/tree-log.c | 14 +++++++-------
3 files changed, 18 insertions(+), 22 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dc6a6173a2c0..7553e9dc5f93 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3195,7 +3195,6 @@ void __btrfs_del_delalloc_inode(struct btrfs_root *root,
struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry);
int btrfs_set_inode_index(struct btrfs_inode *dir, u64 *index);
int btrfs_unlink_inode(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
struct btrfs_inode *dir, struct btrfs_inode *inode,
const char *name, int name_len);
int btrfs_add_link(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3032893efee5..5fec009fbe63 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4056,11 +4056,11 @@ int btrfs_update_inode_fallback(struct btrfs_trans_handle *trans,
* also drops the back refs in the inode to the directory
*/
static int __btrfs_unlink_inode(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
struct btrfs_inode *dir,
struct btrfs_inode *inode,
const char *name, int name_len)
{
+ struct btrfs_root *root = dir->root;
struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_path *path;
int ret = 0;
@@ -4150,15 +4150,14 @@ static int __btrfs_unlink_inode(struct btrfs_trans_handle *trans,
}
int btrfs_unlink_inode(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
struct btrfs_inode *dir, struct btrfs_inode *inode,
const char *name, int name_len)
{
int ret;
- ret = __btrfs_unlink_inode(trans, root, dir, inode, name, name_len);
+ ret = __btrfs_unlink_inode(trans, dir, inode, name, name_len);
if (!ret) {
drop_nlink(&inode->vfs_inode);
- ret = btrfs_update_inode(trans, root, inode);
+ ret = btrfs_update_inode(trans, inode->root, inode);
}
return ret;
}
@@ -4187,7 +4186,6 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir)
static int btrfs_unlink(struct inode *dir, struct dentry *dentry)
{
- struct btrfs_root *root = BTRFS_I(dir)->root;
struct btrfs_trans_handle *trans;
struct inode *inode = d_inode(dentry);
int ret;
@@ -4199,7 +4197,7 @@ static int btrfs_unlink(struct inode *dir, struct dentry *dentry)
btrfs_record_unlink_dir(trans, BTRFS_I(dir), BTRFS_I(d_inode(dentry)),
0);
- ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir),
+ ret = btrfs_unlink_inode(trans, BTRFS_I(dir),
BTRFS_I(d_inode(dentry)), dentry->d_name.name,
dentry->d_name.len);
if (ret)
@@ -4213,7 +4211,7 @@ static int btrfs_unlink(struct inode *dir, struct dentry *dentry)
out:
btrfs_end_transaction(trans);
- btrfs_btree_balance_dirty(root->fs_info);
+ btrfs_btree_balance_dirty(BTRFS_I(dir)->root->fs_info);
return ret;
}
@@ -4564,7 +4562,6 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
{
struct inode *inode = d_inode(dentry);
int err = 0;
- struct btrfs_root *root = BTRFS_I(dir)->root;
struct btrfs_trans_handle *trans;
u64 last_unlink_trans;
@@ -4589,7 +4586,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
last_unlink_trans = BTRFS_I(inode)->last_unlink_trans;
/* now the directory is empty */
- err = btrfs_unlink_inode(trans, root, BTRFS_I(dir),
+ err = btrfs_unlink_inode(trans, BTRFS_I(dir),
BTRFS_I(d_inode(dentry)), dentry->d_name.name,
dentry->d_name.len);
if (!err) {
@@ -4610,7 +4607,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
}
out:
btrfs_end_transaction(trans);
- btrfs_btree_balance_dirty(root->fs_info);
+ btrfs_btree_balance_dirty(BTRFS_I(dir)->root->fs_info);
return err;
}
@@ -9468,7 +9465,7 @@ static int btrfs_rename_exchange(struct inode *old_dir,
if (old_ino == BTRFS_FIRST_FREE_OBJECTID) {
ret = btrfs_unlink_subvol(trans, old_dir, old_dentry);
} else { /* src is an inode */
- ret = __btrfs_unlink_inode(trans, root, BTRFS_I(old_dir),
+ ret = __btrfs_unlink_inode(trans, BTRFS_I(old_dir),
BTRFS_I(old_dentry->d_inode),
old_dentry->d_name.name,
old_dentry->d_name.len);
@@ -9484,7 +9481,7 @@ static int btrfs_rename_exchange(struct inode *old_dir,
if (new_ino == BTRFS_FIRST_FREE_OBJECTID) {
ret = btrfs_unlink_subvol(trans, new_dir, new_dentry);
} else { /* dest is an inode */
- ret = __btrfs_unlink_inode(trans, dest, BTRFS_I(new_dir),
+ ret = __btrfs_unlink_inode(trans, BTRFS_I(new_dir),
BTRFS_I(new_dentry->d_inode),
new_dentry->d_name.name,
new_dentry->d_name.len);
@@ -9759,7 +9756,7 @@ static int btrfs_rename(struct user_namespace *mnt_userns,
*/
btrfs_pin_log_trans(root);
log_pinned = true;
- ret = __btrfs_unlink_inode(trans, root, BTRFS_I(old_dir),
+ ret = __btrfs_unlink_inode(trans, BTRFS_I(old_dir),
BTRFS_I(d_inode(old_dentry)),
old_dentry->d_name.name,
old_dentry->d_name.len);
@@ -9779,7 +9776,7 @@ static int btrfs_rename(struct user_namespace *mnt_userns,
ret = btrfs_unlink_subvol(trans, new_dir, new_dentry);
BUG_ON(new_inode->i_nlink == 0);
} else {
- ret = btrfs_unlink_inode(trans, dest, BTRFS_I(new_dir),
+ ret = btrfs_unlink_inode(trans, BTRFS_I(new_dir),
BTRFS_I(d_inode(new_dentry)),
new_dentry->d_name.name,
new_dentry->d_name.len);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 23f1a35ea04f..568509371b68 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -954,7 +954,7 @@ static noinline int drop_one_dir_item(struct btrfs_trans_handle *trans,
if (ret)
goto out;
- ret = btrfs_unlink_inode(trans, root, dir, BTRFS_I(inode), name,
+ ret = btrfs_unlink_inode(trans, dir, BTRFS_I(inode), name,
name_len);
if (ret)
goto out;
@@ -1119,7 +1119,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
inc_nlink(&inode->vfs_inode);
btrfs_release_path(path);
- ret = btrfs_unlink_inode(trans, root, dir, inode,
+ ret = btrfs_unlink_inode(trans, dir, inode,
victim_name, victim_name_len);
kfree(victim_name);
if (ret)
@@ -1190,7 +1190,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
inc_nlink(&inode->vfs_inode);
btrfs_release_path(path);
- ret = btrfs_unlink_inode(trans, root,
+ ret = btrfs_unlink_inode(trans,
BTRFS_I(victim_parent),
inode,
victim_name,
@@ -1352,7 +1352,7 @@ static int unlink_old_inode_refs(struct btrfs_trans_handle *trans,
kfree(name);
goto out;
}
- ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir),
+ ret = btrfs_unlink_inode(trans, BTRFS_I(dir),
inode, name, namelen);
kfree(name);
iput(dir);
@@ -1450,7 +1450,7 @@ static int add_link(struct btrfs_trans_handle *trans, struct btrfs_root *root,
ret = -ENOENT;
goto out;
}
- ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir), BTRFS_I(other_inode),
+ ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(other_inode),
name, namelen);
if (ret)
goto out;
@@ -1596,7 +1596,7 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
ret = btrfs_inode_ref_exists(inode, dir, key->type,
name, namelen);
if (ret > 0) {
- ret = btrfs_unlink_inode(trans, root,
+ ret = btrfs_unlink_inode(trans,
BTRFS_I(dir),
BTRFS_I(inode),
name, namelen);
@@ -2346,7 +2346,7 @@ static noinline int check_item_in_log(struct btrfs_trans_handle *trans,
}
inc_nlink(inode);
- ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir),
+ ret = btrfs_unlink_inode(trans, BTRFS_I(dir),
BTRFS_I(inode), name, name_len);
if (!ret)
ret = btrfs_run_delayed_items(trans);
--
2.33.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 3/6] btrfs: remove root argument from add_link()
2021-10-25 9:56 [PATCH 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
2021-10-25 9:56 ` [PATCH 1/6] btrfs: remove root argument from drop_one_dir_item() fdmanana
2021-10-25 9:56 ` [PATCH 2/6] btrfs: remove root argument from btrfs_unlink_inode() fdmanana
@ 2021-10-25 9:56 ` fdmanana
2021-10-25 9:56 ` [PATCH 4/6] btrfs: remove root argument from check_item_in_log() fdmanana
` (3 subsequent siblings)
6 siblings, 0 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 9:56 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The root argument for tree-log.c:add_link() always matches the root of the
given directory and the given inode, so it can eliminated.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 568509371b68..b753b3c87e14 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1413,10 +1413,11 @@ static int btrfs_inode_ref_exists(struct inode *inode, struct inode *dir,
return ret;
}
-static int add_link(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+static int add_link(struct btrfs_trans_handle *trans,
struct inode *dir, struct inode *inode, const char *name,
int namelen, u64 ref_index)
{
+ struct btrfs_root *root = BTRFS_I(dir)->root;
struct btrfs_dir_item *dir_item;
struct btrfs_key key;
struct btrfs_path *path;
@@ -1612,7 +1613,7 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
goto out;
/* insert our name */
- ret = add_link(trans, root, dir, inode, name, namelen,
+ ret = add_link(trans, dir, inode, name, namelen,
ref_index);
if (ret)
goto out;
--
2.33.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 4/6] btrfs: remove root argument from check_item_in_log()
2021-10-25 9:56 [PATCH 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
` (2 preceding siblings ...)
2021-10-25 9:56 ` [PATCH 3/6] btrfs: remove root argument from add_link() fdmanana
@ 2021-10-25 9:56 ` fdmanana
2021-10-25 9:56 ` [PATCH 5/6] btrfs: only copy dir index keys when logging a directory fdmanana
` (2 subsequent siblings)
6 siblings, 0 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 9:56 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The root argument passed to check_item_in_log() always matches the root
of the given directory, so it can be eliminated.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index b753b3c87e14..8ab33caf016f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2280,13 +2280,13 @@ static noinline int find_dir_range(struct btrfs_root *root,
* to is unlinked
*/
static noinline int check_item_in_log(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
struct btrfs_root *log,
struct btrfs_path *path,
struct btrfs_path *log_path,
struct inode *dir,
struct btrfs_key *dir_key)
{
+ struct btrfs_root *root = BTRFS_I(dir)->root;
int ret;
struct extent_buffer *eb;
int slot;
@@ -2560,7 +2560,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
if (found_key.offset > range_end)
break;
- ret = check_item_in_log(trans, root, log, path,
+ ret = check_item_in_log(trans, log, path,
log_path, dir,
&found_key);
if (ret)
--
2.33.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 5/6] btrfs: only copy dir index keys when logging a directory
2021-10-25 9:56 [PATCH 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
` (3 preceding siblings ...)
2021-10-25 9:56 ` [PATCH 4/6] btrfs: remove root argument from check_item_in_log() fdmanana
@ 2021-10-25 9:56 ` fdmanana
2021-10-25 15:36 ` Josef Bacik
2021-10-25 9:56 ` [PATCH 6/6] btrfs: remove no longer needed logic for replaying directory deletes fdmanana
2021-10-25 16:31 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
6 siblings, 1 reply; 21+ messages in thread
From: fdmanana @ 2021-10-25 9:56 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
Currently, when logging a directory, we copy both dir items and dir index
items from the fs/subvolume tree to the log tree. Both items have exactly
the same data (same struct btrfs_dir_item), the difference lies in the key
values, where a dir index key contains the index number of a directory
entry while the dir item key does not, as it's used for doing fast lookups
of an entry by name, while the former is used for sorting entries when
listing a directory.
We can exploit that and log only the dir index items, since they contain
all the information needed to correctly add, replace and delete directory
entries when replaying a log tree. Logging only the dir index items is
also backward and forward compatible: an unpatched kernel (without this
change) can correctly replay a log tree generated by a patched kernel
(with this patch), and a patched kernel can correctly replay a log tree
generated by an unpatched kernel.
So modify directory logging to log only dir index items, and modify the
log replay process to ignore dir item keys, from log trees created by an
unpatched kernel, and process only with dir index keys. This reduces the
amount of logged metadata by about half, and therefore the time spent
logging or fsyncing large directories (less cpu time and less IO).
The following test script was used to measure this change:
#!/bin/bash
DEV=/dev/nvme0n1
MNT=/mnt/nvme0n1
NUM_NEW_FILES=1000000
NUM_FILE_DELETES=10000
mkfs.btrfs -f $DEV
mount -o ssd $DEV $MNT
mkdir $MNT/testdir
for ((i = 1; i <= $NUM_NEW_FILES; i++)); do
echo -n > $MNT/testdir/file_$i
done
start=$(date +%s%N)
xfs_io -c "fsync" $MNT/testdir
end=$(date +%s%N)
dur=$(( (end - start) / 1000000 ))
echo "dir fsync took $dur ms after adding $NUM_NEW_FILES files"
# sync to force transaction commit and wipeout the log.
sync
del_inc=$(( $NUM_NEW_FILES / $NUM_FILE_DELETES ))
for ((i = 1; i <= $NUM_NEW_FILES; i += $del_inc)); do
rm -f $MNT/testdir/file_$i
done
start=$(date +%s%N)
xfs_io -c "fsync" $MNT/testdir
end=$(date +%s%N)
dur=$(( (end - start) / 1000000 ))
echo "dir fsync took $dur ms after deleting $NUM_FILE_DELETES files"
echo
umount $MNT
The tests were run on a physical machine, with a non-debug kernel (Debian's
default kernel config), for different values of $NUM_NEW_FILES and
$NUM_FILE_DELETES, and the results were the following:
** Before patch, NUM_NEW_FILES = 1 000 000, NUM_DELETE_FILES = 10 000 **
dir fsync took 8412 ms after adding 1000000 files
dir fsync took 500 ms after deleting 10000 files
** After patch, NUM_NEW_FILES = 1 000 000, NUM_DELETE_FILES = 10 000 **
dir fsync took 4252 ms after adding 1000000 files (-49.5%)
dir fsync took 269 ms after deleting 10000 files (-46.2%)
** Before patch, NUM_NEW_FILES = 100 000, NUM_DELETE_FILES = 1 000 **
dir fsync took 745 ms after adding 100000 files
dir fsync took 59 ms after deleting 1000 files
** After patch, NUM_NEW_FILES = 100 000, NUM_DELETE_FILES = 1 000 **
dir fsync took 404 ms after adding 100000 files (-45.8%)
dir fsync took 31 ms after deleting 1000 files (-47.5%)
** Before patch, NUM_NEW_FILES = 10 000, NUM_DELETE_FILES = 1 000 **
dir fsync took 67 ms after adding 10000 files
dir fsync took 9 ms after deleting 1000 files
** After patch, NUM_NEW_FILES = 10 000, NUM_DELETE_FILES = 1 000 **
dir fsync took 36 ms after adding 10000 files (-46.3%)
dir fsync took 5 ms after deleting 1000 files (-44.4%)
** Before patch, NUM_NEW_FILES = 1 000, NUM_DELETE_FILES = 100 **
dir fsync took 9 ms after adding 1000 files
dir fsync took 4 ms after deleting 100 files
** After patch, NUM_NEW_FILES = 1 000, NUM_DELETE_FILES = 100 **
dir fsync took 7 ms after adding 1000 files (-22.2%)
dir fsync took 3 ms after deleting 100 files (-25.0%)
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/btrfs_inode.h | 18 +-
fs/btrfs/tree-log.c | 395 ++++++++++++++++++-----------------------
2 files changed, 182 insertions(+), 231 deletions(-)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index ab2a4a52e0bb..b3e46aabc3d8 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -138,19 +138,11 @@ struct btrfs_inode {
/* a local copy of root's last_log_commit */
int last_log_commit;
- union {
- /*
- * Total number of bytes pending delalloc, used by stat to
- * calculate the real block usage of the file. This is used
- * only for files.
- */
- u64 delalloc_bytes;
- /*
- * The offset of the last dir item key that was logged.
- * This is used only for directories.
- */
- u64 last_dir_item_offset;
- };
+ /*
+ * Total number of bytes pending delalloc, used by stat to calculate the
+ * real block usage of the file. This is used only for files.
+ */
+ u64 delalloc_bytes;
union {
/*
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 8ab33caf016f..e49fe86f390e 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1949,6 +1949,34 @@ static noinline int insert_one_name(struct btrfs_trans_handle *trans,
return ret;
}
+static int delete_conflicting_dir_entry(struct btrfs_trans_handle *trans,
+ struct btrfs_inode *dir,
+ struct btrfs_path *path,
+ struct btrfs_dir_item *dst_di,
+ const struct btrfs_key *log_key,
+ u8 log_type,
+ bool exists)
+{
+ struct btrfs_key found_key;
+
+ btrfs_dir_item_key_to_cpu(path->nodes[0], dst_di, &found_key);
+ /* The existing dentry points to the same inode, don't delete it. */
+ if (found_key.objectid == log_key->objectid &&
+ found_key.type == log_key->type &&
+ found_key.offset == log_key->offset &&
+ btrfs_dir_type(path->nodes[0], dst_di) == log_type)
+ return 1;
+
+ /*
+ * Don't drop the conflicting directory entry if the inode for the new
+ * entry doesn't exist.
+ */
+ if (!exists)
+ return 0;
+
+ return drop_one_dir_item(trans, path, dir, dst_di);
+}
+
/*
* take a single entry in a log directory item and replay it into
* the subvolume.
@@ -1974,14 +2002,17 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
{
char *name;
int name_len;
- struct btrfs_dir_item *dst_di;
- struct btrfs_key found_key;
+ struct btrfs_dir_item *dir_dst_di;
+ struct btrfs_dir_item *index_dst_di;
+ bool dir_dst_matches = false;
+ bool index_dst_matches = false;
struct btrfs_key log_key;
+ struct btrfs_key search_key;
struct inode *dir;
u8 log_type;
bool exists;
int ret;
- bool update_size = (key->type == BTRFS_DIR_INDEX_KEY);
+ bool update_size = true;
bool name_added = false;
dir = read_one_inode(root, key->objectid);
@@ -2007,76 +2038,53 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
exists = (ret == 0);
ret = 0;
- if (key->type == BTRFS_DIR_ITEM_KEY) {
- dst_di = btrfs_lookup_dir_item(trans, root, path, key->objectid,
- name, name_len, 1);
- } else if (key->type == BTRFS_DIR_INDEX_KEY) {
- dst_di = btrfs_lookup_dir_index_item(trans, root, path,
- key->objectid,
- key->offset, name,
- name_len, 1);
- } else {
- /* Corruption */
- ret = -EINVAL;
+ dir_dst_di = btrfs_lookup_dir_item(trans, root, path, key->objectid,
+ name, name_len, 1);
+ if (IS_ERR(dir_dst_di)) {
+ ret = PTR_ERR(dir_dst_di);
goto out;
+ } else if (dir_dst_di) {
+ ret = delete_conflicting_dir_entry(trans, BTRFS_I(dir), path,
+ dir_dst_di, &log_key, log_type,
+ exists);
+ if (ret < 0)
+ goto out;
+ dir_dst_matches = (ret == 1);
}
- if (IS_ERR(dst_di)) {
- ret = PTR_ERR(dst_di);
+ btrfs_release_path(path);
+
+ index_dst_di = btrfs_lookup_dir_index_item(trans, root, path,
+ key->objectid, key->offset,
+ name, name_len, 1);
+ if (IS_ERR(index_dst_di)) {
+ ret = PTR_ERR(index_dst_di);
goto out;
- } else if (!dst_di) {
- /* we need a sequence number to insert, so we only
- * do inserts for the BTRFS_DIR_INDEX_KEY types
- */
- if (key->type != BTRFS_DIR_INDEX_KEY)
+ } else if (index_dst_di) {
+ ret = delete_conflicting_dir_entry(trans, BTRFS_I(dir), path,
+ index_dst_di, &log_key,
+ log_type, exists);
+ if (ret < 0)
goto out;
- goto insert;
+ index_dst_matches = (ret == 1);
}
- btrfs_dir_item_key_to_cpu(path->nodes[0], dst_di, &found_key);
- /* the existing item matches the logged item */
- if (found_key.objectid == log_key.objectid &&
- found_key.type == log_key.type &&
- found_key.offset == log_key.offset &&
- btrfs_dir_type(path->nodes[0], dst_di) == log_type) {
+ btrfs_release_path(path);
+
+ if (dir_dst_matches && index_dst_matches) {
+ ret = 0;
update_size = false;
goto out;
}
- /*
- * don't drop the conflicting directory entry if the inode
- * for the new entry doesn't exist
- */
- if (!exists)
- goto out;
-
- ret = drop_one_dir_item(trans, path, BTRFS_I(dir), dst_di);
- if (ret)
- goto out;
-
- if (key->type == BTRFS_DIR_INDEX_KEY)
- goto insert;
-out:
- btrfs_release_path(path);
- if (!ret && update_size) {
- btrfs_i_size_write(BTRFS_I(dir), dir->i_size + name_len * 2);
- ret = btrfs_update_inode(trans, root, BTRFS_I(dir));
- }
- kfree(name);
- iput(dir);
- if (!ret && name_added)
- ret = 1;
- return ret;
-
-insert:
/*
* Check if the inode reference exists in the log for the given name,
* inode and parent inode
*/
- found_key.objectid = log_key.objectid;
- found_key.type = BTRFS_INODE_REF_KEY;
- found_key.offset = key->objectid;
- ret = backref_in_log(root->log_root, &found_key, 0, name, name_len);
+ search_key.objectid = log_key.objectid;
+ search_key.type = BTRFS_INODE_REF_KEY;
+ search_key.offset = key->objectid;
+ ret = backref_in_log(root->log_root, &search_key, 0, name, name_len);
if (ret < 0) {
goto out;
} else if (ret) {
@@ -2086,10 +2094,10 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
goto out;
}
- found_key.objectid = log_key.objectid;
- found_key.type = BTRFS_INODE_EXTREF_KEY;
- found_key.offset = key->objectid;
- ret = backref_in_log(root->log_root, &found_key, key->objectid, name,
+ search_key.objectid = log_key.objectid;
+ search_key.type = BTRFS_INODE_EXTREF_KEY;
+ search_key.offset = key->objectid;
+ ret = backref_in_log(root->log_root, &search_key, key->objectid, name,
name_len);
if (ret < 0) {
goto out;
@@ -2108,87 +2116,76 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
name_added = true;
update_size = false;
ret = 0;
- goto out;
+
+out:
+ if (!ret && update_size) {
+ btrfs_i_size_write(BTRFS_I(dir), dir->i_size + name_len * 2);
+ ret = btrfs_update_inode(trans, root, BTRFS_I(dir));
+ }
+ kfree(name);
+ iput(dir);
+ if (!ret && name_added)
+ ret = 1;
+ return ret;
}
-/*
- * find all the names in a directory item and reconcile them into
- * the subvolume. Only BTRFS_DIR_ITEM_KEY types will have more than
- * one name in a directory item, but the same code gets used for
- * both directory index types
- */
+/* Replay one dir item from a BTRFS_DIR_INDEX_KEY key. */
static noinline int replay_one_dir_item(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct btrfs_path *path,
struct extent_buffer *eb, int slot,
struct btrfs_key *key)
{
- int ret = 0;
- u32 item_size = btrfs_item_size_nr(eb, slot);
+ int ret;
struct btrfs_dir_item *di;
- int name_len;
- unsigned long ptr;
- unsigned long ptr_end;
- struct btrfs_path *fixup_path = NULL;
- ptr = btrfs_item_ptr_offset(eb, slot);
- ptr_end = ptr + item_size;
- while (ptr < ptr_end) {
- di = (struct btrfs_dir_item *)ptr;
- name_len = btrfs_dir_name_len(eb, di);
- ret = replay_one_name(trans, root, path, eb, di, key);
- if (ret < 0)
- break;
- ptr = (unsigned long)(di + 1);
- ptr += name_len;
+ /* We only log dir index keys, which only contain a single dir item. */
+ ASSERT(key->type == BTRFS_DIR_INDEX_KEY);
- /*
- * If this entry refers to a non-directory (directories can not
- * have a link count > 1) and it was added in the transaction
- * that was not committed, make sure we fixup the link count of
- * the inode it the entry points to. Otherwise something like
- * the following would result in a directory pointing to an
- * inode with a wrong link that does not account for this dir
- * entry:
- *
- * mkdir testdir
- * touch testdir/foo
- * touch testdir/bar
- * sync
- *
- * ln testdir/bar testdir/bar_link
- * ln testdir/foo testdir/foo_link
- * xfs_io -c "fsync" testdir/bar
- *
- * <power failure>
- *
- * mount fs, log replay happens
- *
- * File foo would remain with a link count of 1 when it has two
- * entries pointing to it in the directory testdir. This would
- * make it impossible to ever delete the parent directory has
- * it would result in stale dentries that can never be deleted.
- */
- if (ret == 1 && btrfs_dir_type(eb, di) != BTRFS_FT_DIR) {
- struct btrfs_key di_key;
+ di = btrfs_item_ptr(eb, slot, struct btrfs_dir_item);
+ ret = replay_one_name(trans, root, path, eb, di, key);
+ if (ret < 0)
+ return ret;
- if (!fixup_path) {
- fixup_path = btrfs_alloc_path();
- if (!fixup_path) {
- ret = -ENOMEM;
- break;
- }
- }
+ /*
+ * If this entry refers to a non-directory (directories can not have a
+ * link count > 1) and it was added in the transaction that was not
+ * committed, make sure we fixup the link count of the inode the entry
+ * points to. Otherwise something like the following would result in a
+ * directory pointing to an inode with a wrong link that does not account
+ * for this dir entry:
+ *
+ * mkdir testdir
+ * touch testdir/foo
+ * touch testdir/bar
+ * sync
+ *
+ * ln testdir/bar testdir/bar_link
+ * ln testdir/foo testdir/foo_link
+ * xfs_io -c "fsync" testdir/bar
+ *
+ * <power failure>
+ *
+ * mount fs, log replay happens
+ *
+ * File foo would remain with a link count of 1 when it has two entries
+ * pointing to it in the directory testdir. This would make it impossible
+ * to ever delete the parent directory has it would result in stale
+ * dentries that can never be deleted.
+ */
+ if (ret == 1 && btrfs_dir_type(eb, di) != BTRFS_FT_DIR) {
+ struct btrfs_path *fixup_path;
+ struct btrfs_key di_key;
- btrfs_dir_item_key_to_cpu(eb, di, &di_key);
- ret = link_to_fixup_dir(trans, root, fixup_path,
- di_key.objectid);
- if (ret)
- break;
- }
- ret = 0;
+ fixup_path = btrfs_alloc_path();
+ if (!fixup_path)
+ return -ENOMEM;
+
+ btrfs_dir_item_key_to_cpu(eb, di, &di_key);
+ ret = link_to_fixup_dir(trans, root, fixup_path, di_key.objectid);
+ btrfs_free_path(fixup_path);
}
- btrfs_free_path(fixup_path);
+
return ret;
}
@@ -2742,12 +2739,13 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
eb, i, &key);
if (ret)
break;
- } else if (key.type == BTRFS_DIR_ITEM_KEY) {
- ret = replay_one_dir_item(wc->trans, root, path,
- eb, i, &key);
- if (ret)
- break;
}
+ /*
+ * We don't log BTRFS_DIR_ITEM_KEY keys anymore, only the
+ * BTRFS_DIR_INDEX_KEY items which we use to derive the
+ * BTRFS_DIR_ITEM_KEY items. If we are replaying a log from an
+ * older kernel with such keys, ignore them.
+ */
}
btrfs_free_path(path);
return ret;
@@ -3549,20 +3547,10 @@ void btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
goto out_unlock;
}
- di = btrfs_lookup_dir_item(trans, log, path, dir_ino,
- name, name_len, -1);
- if (IS_ERR(di)) {
- err = PTR_ERR(di);
- goto fail;
- }
- if (di) {
- ret = btrfs_delete_one_dir_name(trans, log, path, di);
- if (ret) {
- err = ret;
- goto fail;
- }
- }
- btrfs_release_path(path);
+ /*
+ * We only log dir index items of a directory, so we don't need to look
+ * for dir item keys.
+ */
di = btrfs_lookup_dir_index_item(trans, log, path, dir_ino,
index, name, name_len, -1);
if (IS_ERR(di)) {
@@ -3626,7 +3614,7 @@ void btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
static noinline int insert_dir_log_key(struct btrfs_trans_handle *trans,
struct btrfs_root *log,
struct btrfs_path *path,
- int key_type, u64 dirid,
+ u64 dirid,
u64 first_offset, u64 last_offset)
{
int ret;
@@ -3635,10 +3623,7 @@ static noinline int insert_dir_log_key(struct btrfs_trans_handle *trans,
key.objectid = dirid;
key.offset = first_offset;
- if (key_type == BTRFS_DIR_ITEM_KEY)
- key.type = BTRFS_DIR_LOG_ITEM_KEY;
- else
- key.type = BTRFS_DIR_LOG_INDEX_KEY;
+ key.type = BTRFS_DIR_LOG_INDEX_KEY;
ret = btrfs_insert_empty_item(trans, log, path, &key, sizeof(*item));
if (ret)
return ret;
@@ -3730,7 +3715,6 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
struct btrfs_inode *inode,
struct btrfs_path *path,
struct btrfs_path *dst_path,
- int key_type,
struct btrfs_log_ctx *ctx)
{
struct btrfs_root *log = inode->root->log_root;
@@ -3738,24 +3722,18 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
const int nritems = btrfs_header_nritems(src);
const u64 ino = btrfs_ino(inode);
const bool inode_logged_before = inode_logged(trans, inode);
- u64 last_logged_key_offset;
bool last_found = false;
int batch_start = 0;
int batch_size = 0;
int i;
- if (key_type == BTRFS_DIR_ITEM_KEY)
- last_logged_key_offset = inode->last_dir_item_offset;
- else
- last_logged_key_offset = inode->last_dir_index_offset;
-
for (i = path->slots[0]; i < nritems; i++) {
struct btrfs_key key;
int ret;
btrfs_item_key_to_cpu(src, &key, i);
- if (key.objectid != ino || key.type != key_type) {
+ if (key.objectid != ino || key.type != BTRFS_DIR_INDEX_KEY) {
last_found = true;
break;
}
@@ -3804,7 +3782,7 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
* we logged is in the log tree, saving time and avoiding adding
* contention on the log tree.
*/
- if (key.offset > last_logged_key_offset)
+ if (key.offset > inode->last_dir_index_offset)
goto add_to_batch;
/*
* Check if the key was already logged before. If not we can add
@@ -3863,7 +3841,7 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
static noinline int log_dir_items(struct btrfs_trans_handle *trans,
struct btrfs_inode *inode,
struct btrfs_path *path,
- struct btrfs_path *dst_path, int key_type,
+ struct btrfs_path *dst_path,
struct btrfs_log_ctx *ctx,
u64 min_offset, u64 *last_offset_ret)
{
@@ -3877,7 +3855,7 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
u64 ino = btrfs_ino(inode);
min_key.objectid = ino;
- min_key.type = key_type;
+ min_key.type = BTRFS_DIR_INDEX_KEY;
min_key.offset = min_offset;
ret = btrfs_search_forward(root, &min_key, path, trans->transid);
@@ -3886,9 +3864,10 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
* we didn't find anything from this transaction, see if there
* is anything at all
*/
- if (ret != 0 || min_key.objectid != ino || min_key.type != key_type) {
+ if (ret != 0 || min_key.objectid != ino ||
+ min_key.type != BTRFS_DIR_INDEX_KEY) {
min_key.objectid = ino;
- min_key.type = key_type;
+ min_key.type = BTRFS_DIR_INDEX_KEY;
min_key.offset = (u64)-1;
btrfs_release_path(path);
ret = btrfs_search_slot(NULL, root, &min_key, path, 0, 0);
@@ -3896,7 +3875,7 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
btrfs_release_path(path);
return ret;
}
- ret = btrfs_previous_item(root, path, ino, key_type);
+ ret = btrfs_previous_item(root, path, ino, BTRFS_DIR_INDEX_KEY);
/* if ret == 0 there are items for this type,
* create a range to tell us the last key of this type.
@@ -3907,18 +3886,18 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
struct btrfs_key tmp;
btrfs_item_key_to_cpu(path->nodes[0], &tmp,
path->slots[0]);
- if (key_type == tmp.type)
+ if (tmp.type == BTRFS_DIR_INDEX_KEY)
first_offset = max(min_offset, tmp.offset) + 1;
}
goto done;
}
/* go backward to find any previous key */
- ret = btrfs_previous_item(root, path, ino, key_type);
+ ret = btrfs_previous_item(root, path, ino, BTRFS_DIR_INDEX_KEY);
if (ret == 0) {
struct btrfs_key tmp;
btrfs_item_key_to_cpu(path->nodes[0], &tmp, path->slots[0]);
- if (key_type == tmp.type) {
+ if (tmp.type == BTRFS_DIR_INDEX_KEY) {
first_offset = tmp.offset;
ret = overwrite_item(trans, log, dst_path,
path->nodes[0], path->slots[0],
@@ -3949,8 +3928,7 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
* from our directory
*/
while (1) {
- ret = process_dir_items_leaf(trans, inode, path, dst_path,
- key_type, ctx);
+ ret = process_dir_items_leaf(trans, inode, path, dst_path, ctx);
if (ret != 0) {
if (ret < 0)
err = ret;
@@ -3971,7 +3949,7 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
goto done;
}
btrfs_item_key_to_cpu(path->nodes[0], &min_key, path->slots[0]);
- if (min_key.objectid != ino || min_key.type != key_type) {
+ if (min_key.objectid != ino || min_key.type != BTRFS_DIR_INDEX_KEY) {
last_offset = (u64)-1;
goto done;
}
@@ -4001,8 +3979,8 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
* insert the log range keys to indicate where the log
* is valid
*/
- ret = insert_dir_log_key(trans, log, path, key_type,
- ino, first_offset, last_offset);
+ ret = insert_dir_log_key(trans, log, path, ino, first_offset,
+ last_offset);
if (ret)
err = ret;
}
@@ -4030,35 +4008,28 @@ static noinline int log_directory_changes(struct btrfs_trans_handle *trans,
u64 min_key;
u64 max_key;
int ret;
- int key_type = BTRFS_DIR_ITEM_KEY;
/*
* If this is the first time we are being logged in the current
* transaction, or we were logged before but the inode was evicted and
- * reloaded later, in which case its logged_trans is 0, reset the values
- * of the last logged key offsets. Note that we don't use the helper
+ * reloaded later, in which case its logged_trans is 0, reset the value
+ * of the last logged key offset. Note that we don't use the helper
* function inode_logged() here - that is because the function returns
* true after an inode eviction, assuming the worst case as it can not
* know for sure if the inode was logged before. So we can not skip key
* searches in the case the inode was evicted, because it may not have
* been logged in this transaction and may have been logged in a past
- * transaction, so we need to reset the last dir item and index offsets
- * to (u64)-1.
+ * transaction, so we need to reset the last dir index offset to (u64)-1.
*/
- if (inode->logged_trans != trans->transid) {
- inode->last_dir_item_offset = (u64)-1;
+ if (inode->logged_trans != trans->transid)
inode->last_dir_index_offset = (u64)-1;
- }
-again:
+
min_key = 0;
max_key = 0;
- if (key_type == BTRFS_DIR_ITEM_KEY)
- ctx->last_dir_item_offset = inode->last_dir_item_offset;
- else
- ctx->last_dir_item_offset = inode->last_dir_index_offset;
+ ctx->last_dir_item_offset = inode->last_dir_index_offset;
while (1) {
- ret = log_dir_items(trans, inode, path, dst_path, key_type,
+ ret = log_dir_items(trans, inode, path, dst_path,
ctx, min_key, &max_key);
if (ret)
return ret;
@@ -4067,13 +4038,8 @@ static noinline int log_directory_changes(struct btrfs_trans_handle *trans,
min_key = max_key + 1;
}
- if (key_type == BTRFS_DIR_ITEM_KEY) {
- inode->last_dir_item_offset = ctx->last_dir_item_offset;
- key_type = BTRFS_DIR_INDEX_KEY;
- goto again;
- } else {
- inode->last_dir_index_offset = ctx->last_dir_item_offset;
- }
+ inode->last_dir_index_offset = ctx->last_dir_item_offset;
+
return 0;
}
@@ -5896,18 +5862,12 @@ struct btrfs_dir_list {
* link_to_fixup_dir());
*
* 2) For directories we log with a mode of LOG_INODE_ALL. It's possible that
- * while logging the inode's items new items with keys BTRFS_DIR_ITEM_KEY and
- * BTRFS_DIR_INDEX_KEY are added to fs/subvol tree and the logged inode item
+ * while logging the inode's items new index items (key type
+ * BTRFS_DIR_INDEX_KEY) are added to fs/subvol tree and the logged inode item
* has a size that doesn't match the sum of the lengths of all the logged
- * names. This does not result in a problem because if a dir_item key is
- * logged but its matching dir_index key is not logged, at log replay time we
- * don't use it to replay the respective name (see replay_one_name()). On the
- * other hand if only the dir_index key ends up being logged, the respective
- * name is added to the fs/subvol tree with both the dir_item and dir_index
- * keys created (see replay_one_name()).
- * The directory's inode item with a wrong i_size is not a problem as well,
- * since we don't use it at log replay time to set the i_size in the inode
- * item of the fs/subvol tree (see overwrite_item()).
+ * names - this is ok, not a problem, because at log replay time we set the
+ * directory's i_size to the correct value (see replay_one_name() and
+ * do_overwrite_item()).
*/
static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
@@ -5953,7 +5913,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
goto next_dir_inode;
min_key.objectid = dir_elem->ino;
- min_key.type = BTRFS_DIR_ITEM_KEY;
+ min_key.type = BTRFS_DIR_INDEX_KEY;
min_key.offset = 0;
again:
btrfs_release_path(path);
@@ -5978,7 +5938,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
btrfs_item_key_to_cpu(leaf, &min_key, i);
if (min_key.objectid != dir_elem->ino ||
- min_key.type != BTRFS_DIR_ITEM_KEY)
+ min_key.type != BTRFS_DIR_INDEX_KEY)
goto next_dir_inode;
di = btrfs_item_ptr(leaf, i, struct btrfs_dir_item);
@@ -6792,15 +6752,14 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
* was previously logged, make sure the next log attempt on the directory
* is not skipped and logs the inode again. This is because the log may
* not currently be authoritative for a range including the old
- * BTRFS_DIR_ITEM_KEY and BTRFS_DIR_INDEX_KEY keys, so we want to make
- * sure after a log replay we do not end up with both the new and old
- * dentries around (in case the inode is a directory we would have a
- * directory with two hard links and 2 inode references for different
- * parents). The next log attempt of old_dir will happen at
- * btrfs_log_all_parents(), called through btrfs_log_inode_parent()
- * below, because we have previously set inode->last_unlink_trans to the
- * current transaction ID, either here or at btrfs_record_unlink_dir() in
- * case inode is a directory.
+ * BTRFS_DIR_INDEX_KEY key, so we want to make sure after a log replay we
+ * do not end up with both the new and old dentries around (in case the
+ * inode is a directory we would have a directory with two hard links and
+ * 2 inode references for different parents). The next log attempt of
+ * old_dir will happen at btrfs_log_all_parents(), called through
+ * btrfs_log_inode_parent() below, because we have previously set
+ * inode->last_unlink_trans to the current transaction ID, either here or
+ * at btrfs_record_unlink_dir() in case the inode is a directory.
*/
if (old_dir)
old_dir->logged_trans = 0;
--
2.33.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 6/6] btrfs: remove no longer needed logic for replaying directory deletes
2021-10-25 9:56 [PATCH 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
` (4 preceding siblings ...)
2021-10-25 9:56 ` [PATCH 5/6] btrfs: only copy dir index keys when logging a directory fdmanana
@ 2021-10-25 9:56 ` fdmanana
2021-10-25 16:31 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
6 siblings, 0 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 9:56 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
Now that we log only dir index keys when logging a directory, we no longer
need to deal with dir item keys in the log replay code for replaying
directory deletes. This is also true for the case when we replay a log
tree created by a kernel that still logs dir items.
So remove the remaining code of the replay of directory deletes algorithm
that deals with dir item keys.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 158 ++++++++++++++------------------
include/uapi/linux/btrfs_tree.h | 4 +-
2 files changed, 72 insertions(+), 90 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index e49fe86f390e..482f0b18b220 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2202,7 +2202,7 @@ static noinline int replay_one_dir_item(struct btrfs_trans_handle *trans,
*/
static noinline int find_dir_range(struct btrfs_root *root,
struct btrfs_path *path,
- u64 dirid, int key_type,
+ u64 dirid,
u64 *start_ret, u64 *end_ret)
{
struct btrfs_key key;
@@ -2215,7 +2215,7 @@ static noinline int find_dir_range(struct btrfs_root *root,
return 1;
key.objectid = dirid;
- key.type = key_type;
+ key.type = BTRFS_DIR_LOG_INDEX_KEY;
key.offset = *start_ret;
ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
@@ -2229,7 +2229,7 @@ static noinline int find_dir_range(struct btrfs_root *root,
if (ret != 0)
btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
- if (key.type != key_type || key.objectid != dirid) {
+ if (key.type != BTRFS_DIR_LOG_INDEX_KEY || key.objectid != dirid) {
ret = 1;
goto next;
}
@@ -2256,7 +2256,7 @@ static noinline int find_dir_range(struct btrfs_root *root,
btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
- if (key.type != key_type || key.objectid != dirid) {
+ if (key.type != BTRFS_DIR_LOG_INDEX_KEY || key.objectid != dirid) {
ret = 1;
goto out;
}
@@ -2287,95 +2287,82 @@ static noinline int check_item_in_log(struct btrfs_trans_handle *trans,
int ret;
struct extent_buffer *eb;
int slot;
- u32 item_size;
struct btrfs_dir_item *di;
- struct btrfs_dir_item *log_di;
int name_len;
- unsigned long ptr;
- unsigned long ptr_end;
char *name;
- struct inode *inode;
+ struct inode *inode = NULL;
struct btrfs_key location;
-again:
+ /*
+ * Currenly we only log dir index keys. Even if we replay a log created
+ * by an older kernel that logged both dir index and dir item keys, all
+ * we need to do is process the dir index keys, we (and our caller) can
+ * safely ignore dir item keys (key type BTRFS_DIR_ITEM_KEY).
+ */
+ ASSERT(dir_key->type == BTRFS_DIR_INDEX_KEY);
+
eb = path->nodes[0];
slot = path->slots[0];
- item_size = btrfs_item_size_nr(eb, slot);
- ptr = btrfs_item_ptr_offset(eb, slot);
- ptr_end = ptr + item_size;
- while (ptr < ptr_end) {
- di = (struct btrfs_dir_item *)ptr;
- name_len = btrfs_dir_name_len(eb, di);
- name = kmalloc(name_len, GFP_NOFS);
- if (!name) {
- ret = -ENOMEM;
- goto out;
- }
- read_extent_buffer(eb, name, (unsigned long)(di + 1),
- name_len);
- log_di = NULL;
- if (log && dir_key->type == BTRFS_DIR_ITEM_KEY) {
- log_di = btrfs_lookup_dir_item(trans, log, log_path,
- dir_key->objectid,
- name, name_len, 0);
- } else if (log && dir_key->type == BTRFS_DIR_INDEX_KEY) {
- log_di = btrfs_lookup_dir_index_item(trans, log,
- log_path,
- dir_key->objectid,
- dir_key->offset,
- name, name_len, 0);
- }
- if (!log_di) {
- btrfs_dir_item_key_to_cpu(eb, di, &location);
- btrfs_release_path(path);
- btrfs_release_path(log_path);
- inode = read_one_inode(root, location.objectid);
- if (!inode) {
- kfree(name);
- return -EIO;
- }
+ di = btrfs_item_ptr(eb, slot, struct btrfs_dir_item);
+ name_len = btrfs_dir_name_len(eb, di);
+ name = kmalloc(name_len, GFP_NOFS);
+ if (!name) {
+ ret = -ENOMEM;
+ goto out;
+ }
- ret = link_to_fixup_dir(trans, root,
- path, location.objectid);
- if (ret) {
- kfree(name);
- iput(inode);
- goto out;
- }
+ read_extent_buffer(eb, name, (unsigned long)(di + 1), name_len);
- inc_nlink(inode);
- ret = btrfs_unlink_inode(trans, BTRFS_I(dir),
- BTRFS_I(inode), name, name_len);
- if (!ret)
- ret = btrfs_run_delayed_items(trans);
- kfree(name);
- iput(inode);
- if (ret)
- goto out;
+ if (log) {
+ struct btrfs_dir_item *log_di;
- /* there might still be more names under this key
- * check and repeat if required
- */
- ret = btrfs_search_slot(NULL, root, dir_key, path,
- 0, 0);
- if (ret == 0)
- goto again;
+ log_di = btrfs_lookup_dir_index_item(trans, log, log_path,
+ dir_key->objectid,
+ dir_key->offset,
+ name, name_len, 0);
+ if (IS_ERR(log_di)) {
+ ret = PTR_ERR(log_di);
+ goto out;
+ } else if (log_di) {
+ /* The dentry exists in the log, we have nothing to do. */
ret = 0;
goto out;
- } else if (IS_ERR(log_di)) {
- kfree(name);
- return PTR_ERR(log_di);
}
- btrfs_release_path(log_path);
- kfree(name);
+ }
- ptr = (unsigned long)(di + 1);
- ptr += name_len;
+ btrfs_dir_item_key_to_cpu(eb, di, &location);
+ btrfs_release_path(path);
+ btrfs_release_path(log_path);
+ inode = read_one_inode(root, location.objectid);
+ if (!inode) {
+ ret = -EIO;
+ goto out;
}
- ret = 0;
+
+ ret = link_to_fixup_dir(trans, root, path, location.objectid);
+ if (ret)
+ goto out;
+
+ inc_nlink(inode);
+ ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(inode), name,
+ name_len);
+ if (ret)
+ goto out;
+
+ ret = btrfs_run_delayed_items(trans);
+ if (ret)
+ goto out;
+
+ /*
+ * Unlike dir item keys, dir index keys can only have one name (entry) in
+ * them, as there are no key collisions since each key has a unique offset
+ * (an index number), so we're done.
+ */
out:
btrfs_release_path(path);
btrfs_release_path(log_path);
+ kfree(name);
+ iput(inode);
return ret;
}
@@ -2495,7 +2482,6 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
{
u64 range_start;
u64 range_end;
- int key_type = BTRFS_DIR_LOG_ITEM_KEY;
int ret = 0;
struct btrfs_key dir_key;
struct btrfs_key found_key;
@@ -2503,7 +2489,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
struct inode *dir;
dir_key.objectid = dirid;
- dir_key.type = BTRFS_DIR_ITEM_KEY;
+ dir_key.type = BTRFS_DIR_INDEX_KEY;
log_path = btrfs_alloc_path();
if (!log_path)
return -ENOMEM;
@@ -2517,14 +2503,14 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
btrfs_free_path(log_path);
return 0;
}
-again:
+
range_start = 0;
range_end = 0;
while (1) {
if (del_all)
range_end = (u64)-1;
else {
- ret = find_dir_range(log, path, dirid, key_type,
+ ret = find_dir_range(log, path, dirid,
&range_start, &range_end);
if (ret < 0)
goto out;
@@ -2551,8 +2537,10 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
btrfs_item_key_to_cpu(path->nodes[0], &found_key,
path->slots[0]);
if (found_key.objectid != dirid ||
- found_key.type != dir_key.type)
- goto next_type;
+ found_key.type != dir_key.type) {
+ ret = 0;
+ goto out;
+ }
if (found_key.offset > range_end)
break;
@@ -2571,15 +2559,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
break;
range_start = range_end + 1;
}
-
-next_type:
ret = 0;
- if (key_type == BTRFS_DIR_LOG_ITEM_KEY) {
- key_type = BTRFS_DIR_LOG_INDEX_KEY;
- dir_key.type = BTRFS_DIR_INDEX_KEY;
- btrfs_release_path(path);
- goto again;
- }
out:
btrfs_release_path(path);
btrfs_free_path(log_path);
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index e1c4c732aaba..5416f1f1a77a 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -146,7 +146,9 @@
/*
* dir items are the name -> inode pointers in a directory. There is one
- * for every name in a directory.
+ * for every name in a directory. BTRFS_DIR_LOG_ITEM_KEY is no longer used
+ * but it's still defined here for documentation purposes and to help avoid
+ * having its numerical value reused in the future.
*/
#define BTRFS_DIR_LOG_ITEM_KEY 60
#define BTRFS_DIR_LOG_INDEX_KEY 72
--
2.33.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 5/6] btrfs: only copy dir index keys when logging a directory
2021-10-25 9:56 ` [PATCH 5/6] btrfs: only copy dir index keys when logging a directory fdmanana
@ 2021-10-25 15:36 ` Josef Bacik
2021-10-25 16:01 ` Filipe Manana
0 siblings, 1 reply; 21+ messages in thread
From: Josef Bacik @ 2021-10-25 15:36 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Mon, Oct 25, 2021 at 10:56:25AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> Currently, when logging a directory, we copy both dir items and dir index
> items from the fs/subvolume tree to the log tree. Both items have exactly
> the same data (same struct btrfs_dir_item), the difference lies in the key
> values, where a dir index key contains the index number of a directory
> entry while the dir item key does not, as it's used for doing fast lookups
> of an entry by name, while the former is used for sorting entries when
> listing a directory.
>
> We can exploit that and log only the dir index items, since they contain
> all the information needed to correctly add, replace and delete directory
> entries when replaying a log tree. Logging only the dir index items is
> also backward and forward compatible: an unpatched kernel (without this
> change) can correctly replay a log tree generated by a patched kernel
> (with this patch), and a patched kernel can correctly replay a log tree
> generated by an unpatched kernel.
>
This took me a very long time to grok, so it deserves more explanation.
The problem I had was how this worked in general, and I was missing the fact
that we're only calling drop_dir_item() if we find the name in the root,
otherwise we either goto insert if we're DIR_INDEX or bail if we're DIR_ITEM.
So whichever we find first in the log, we call drop_dir_item() only if there's a
conflict. Then the heavy work is done once we find the DIR_INDEX item.
Which means that the only work we do if we find DIR_ITEM is drop_dir_item(),
which we do in the case if we find DIR_INDEX and the item is there.
This is why we don't actually need the DIR_ITEM to properly replay the log for
older kernels, because DIR_INDEX does the same work, and actually does the heavy
lifting of adding the BACKREF's and such.
This took probably 30-45 minutes for me to work out, and I'm only 90% sure I
have it right, so an explanation as to why it's ok for older kernels would be
very helpful. Thanks,
Josef
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 5/6] btrfs: only copy dir index keys when logging a directory
2021-10-25 15:36 ` Josef Bacik
@ 2021-10-25 16:01 ` Filipe Manana
2021-10-25 16:06 ` Josef Bacik
0 siblings, 1 reply; 21+ messages in thread
From: Filipe Manana @ 2021-10-25 16:01 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs
On Mon, Oct 25, 2021 at 4:36 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Mon, Oct 25, 2021 at 10:56:25AM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Currently, when logging a directory, we copy both dir items and dir index
> > items from the fs/subvolume tree to the log tree. Both items have exactly
> > the same data (same struct btrfs_dir_item), the difference lies in the key
> > values, where a dir index key contains the index number of a directory
> > entry while the dir item key does not, as it's used for doing fast lookups
> > of an entry by name, while the former is used for sorting entries when
> > listing a directory.
> >
> > We can exploit that and log only the dir index items, since they contain
> > all the information needed to correctly add, replace and delete directory
> > entries when replaying a log tree. Logging only the dir index items is
> > also backward and forward compatible: an unpatched kernel (without this
> > change) can correctly replay a log tree generated by a patched kernel
> > (with this patch), and a patched kernel can correctly replay a log tree
> > generated by an unpatched kernel.
> >
>
> This took me a very long time to grok, so it deserves more explanation.
>
> The problem I had was how this worked in general, and I was missing the fact
> that we're only calling drop_dir_item() if we find the name in the root,
> otherwise we either goto insert if we're DIR_INDEX or bail if we're DIR_ITEM.
>
> So whichever we find first in the log, we call drop_dir_item() only if there's a
> conflict. Then the heavy work is done once we find the DIR_INDEX item.
>
> Which means that the only work we do if we find DIR_ITEM is drop_dir_item(),
> which we do in the case if we find DIR_INDEX and the item is there.
>
> This is why we don't actually need the DIR_ITEM to properly replay the log for
> older kernels, because DIR_INDEX does the same work, and actually does the heavy
> lifting of adding the BACKREF's and such.
>
> This took probably 30-45 minutes for me to work out, and I'm only 90% sure I
> have it right, so an explanation as to why it's ok for older kernels would be
> very helpful. Thanks,
Is the following fine for you?
The backward compatibility is ensured because:
1) For inserting a new dentry: a dentry is only inserted when we find a
new dir index key - we can only insert if we know the dir index offset,
which is encoded in the dir index key's offset;
2) For deleting dentries: during log replay, before adding or replacing
dentries, we first replay dentry deletions. Whenever we find a dir item
key or a dir index key in the subvolume/fs tree that is not logged in
a range for which the log tree is authoritative, we do the unlink of
the dentry, which removes both the existing dir item key and the dir
index key. Therefore logging just dir index keys is enough to ensure
dentry deletions are correctly replayed;
3) For dentry replacements: they work when we log only dir index keys
and this is due to a combination of 1) and 2). If we replace a
dentry with name "foobar" to point from inode A to inode B, then we
know the dir index key for the new dentry is different from the old
one, as it has an index number (key offset) larger than the old one.
This results in replaying a deletion, through replay_dir_deletes(),
that causes the old dentry to be removed, both the dir item key and
the dir index key, as mentioned at 2). Then when processing the new
dir index key, we add the new dentry, adding both a new dir item key
and a new index key pointing to inode B, as stated in 1).
>
> Josef
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 5/6] btrfs: only copy dir index keys when logging a directory
2021-10-25 16:01 ` Filipe Manana
@ 2021-10-25 16:06 ` Josef Bacik
0 siblings, 0 replies; 21+ messages in thread
From: Josef Bacik @ 2021-10-25 16:06 UTC (permalink / raw)
To: Filipe Manana; +Cc: linux-btrfs
On Mon, Oct 25, 2021 at 05:01:12PM +0100, Filipe Manana wrote:
> On Mon, Oct 25, 2021 at 4:36 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > On Mon, Oct 25, 2021 at 10:56:25AM +0100, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > Currently, when logging a directory, we copy both dir items and dir index
> > > items from the fs/subvolume tree to the log tree. Both items have exactly
> > > the same data (same struct btrfs_dir_item), the difference lies in the key
> > > values, where a dir index key contains the index number of a directory
> > > entry while the dir item key does not, as it's used for doing fast lookups
> > > of an entry by name, while the former is used for sorting entries when
> > > listing a directory.
> > >
> > > We can exploit that and log only the dir index items, since they contain
> > > all the information needed to correctly add, replace and delete directory
> > > entries when replaying a log tree. Logging only the dir index items is
> > > also backward and forward compatible: an unpatched kernel (without this
> > > change) can correctly replay a log tree generated by a patched kernel
> > > (with this patch), and a patched kernel can correctly replay a log tree
> > > generated by an unpatched kernel.
> > >
> >
> > This took me a very long time to grok, so it deserves more explanation.
> >
> > The problem I had was how this worked in general, and I was missing the fact
> > that we're only calling drop_dir_item() if we find the name in the root,
> > otherwise we either goto insert if we're DIR_INDEX or bail if we're DIR_ITEM.
> >
> > So whichever we find first in the log, we call drop_dir_item() only if there's a
> > conflict. Then the heavy work is done once we find the DIR_INDEX item.
> >
> > Which means that the only work we do if we find DIR_ITEM is drop_dir_item(),
> > which we do in the case if we find DIR_INDEX and the item is there.
> >
> > This is why we don't actually need the DIR_ITEM to properly replay the log for
> > older kernels, because DIR_INDEX does the same work, and actually does the heavy
> > lifting of adding the BACKREF's and such.
> >
> > This took probably 30-45 minutes for me to work out, and I'm only 90% sure I
> > have it right, so an explanation as to why it's ok for older kernels would be
> > very helpful. Thanks,
>
> Is the following fine for you?
>
> The backward compatibility is ensured because:
>
> 1) For inserting a new dentry: a dentry is only inserted when we find a
> new dir index key - we can only insert if we know the dir index offset,
> which is encoded in the dir index key's offset;
>
> 2) For deleting dentries: during log replay, before adding or replacing
> dentries, we first replay dentry deletions. Whenever we find a dir item
> key or a dir index key in the subvolume/fs tree that is not logged in
> a range for which the log tree is authoritative, we do the unlink of
> the dentry, which removes both the existing dir item key and the dir
> index key. Therefore logging just dir index keys is enough to ensure
> dentry deletions are correctly replayed;
>
> 3) For dentry replacements: they work when we log only dir index keys
> and this is due to a combination of 1) and 2). If we replace a
> dentry with name "foobar" to point from inode A to inode B, then we
> know the dir index key for the new dentry is different from the old
> one, as it has an index number (key offset) larger than the old one.
> This results in replaying a deletion, through replay_dir_deletes(),
> that causes the old dentry to be removed, both the dir item key and
> the dir index key, as mentioned at 2). Then when processing the new
> dir index key, we add the new dentry, adding both a new dir item key
> and a new index key pointing to inode B, as stated in 1).
>
Yup this works for me, thanks,
Josef
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only
2021-10-25 9:56 [PATCH 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
` (5 preceding siblings ...)
2021-10-25 9:56 ` [PATCH 6/6] btrfs: remove no longer needed logic for replaying directory deletes fdmanana
@ 2021-10-25 16:31 ` fdmanana
2021-10-25 16:31 ` [PATCH v2 1/6] btrfs: remove root argument from drop_one_dir_item() fdmanana
` (8 more replies)
6 siblings, 9 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 16:31 UTC (permalink / raw)
To: linux-btrfs; +Cc: josef, Filipe Manana
From: Filipe Manana <fdmanana@suse.com>
This patchset reworks directory logging to make it copy only the dir index
keys, instead of copying both dir index keys and dir item keys, as both have
the same type of information. This reduces the amount of logged metadata by
about half, and therefore we do about half of the cpu bound work, half of
the IO and use less log tree space (except for very small directories).
This will allow other optimizations to build on top, some of which are only
possible after this change, while others become easier and less cumbersome
to implement after this change. Performance tests are in the changelog of
patch 5/6. Patch 6/6 only removes code that deals with dir item keys when
replaying directory deletions and it could have been squashed into patch
5/6, but since that one is already large and works without 6/6, I opted
to make it separate to make it easier to review.
Also, after this change we are still able to correctly replay a log tree
generated by an old kernel, and an old kernel is also able to correctly
replay a log tree generated by a kernel that has this patchset applied.
I'm sending this close to the 5.16 merge window, but my intention is to
have it only for the next merge window (5.17).
V2: Updated changelog of patch 5/6 to make it more clear why backward
and forward compatibility are guaranteed.
Filipe Manana (6):
btrfs: remove root argument from drop_one_dir_item()
btrfs: remove root argument from btrfs_unlink_inode()
btrfs: remove root argument from add_link()
btrfs: remove root argument from check_item_in_log()
btrfs: only copy dir index keys when logging a directory
btrfs: remove no longer needed logic for replaying directory deletes
fs/btrfs/btrfs_inode.h | 18 +-
fs/btrfs/ctree.h | 1 -
fs/btrfs/inode.c | 25 +-
fs/btrfs/tree-log.c | 582 ++++++++++++++------------------
include/uapi/linux/btrfs_tree.h | 4 +-
5 files changed, 280 insertions(+), 350 deletions(-)
--
2.33.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2 1/6] btrfs: remove root argument from drop_one_dir_item()
2021-10-25 16:31 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
@ 2021-10-25 16:31 ` fdmanana
2021-10-25 16:31 ` [PATCH v2 2/6] btrfs: remove root argument from btrfs_unlink_inode() fdmanana
` (7 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 16:31 UTC (permalink / raw)
To: linux-btrfs; +Cc: josef, Filipe Manana
From: Filipe Manana <fdmanana@suse.com>
The root argument for drop_one_dir_item() always matches the root of the
given directory inode, since each log tree is associated to one and only
one subvolume/root, so remove the argument.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 7c900eb27cf8..23f1a35ea04f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -921,11 +921,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
* item
*/
static noinline int drop_one_dir_item(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
struct btrfs_path *path,
struct btrfs_inode *dir,
struct btrfs_dir_item *di)
{
+ struct btrfs_root *root = dir->root;
struct inode *inode;
char *name;
int name_len;
@@ -1220,7 +1220,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
if (IS_ERR(di)) {
return PTR_ERR(di);
} else if (di) {
- ret = drop_one_dir_item(trans, root, path, dir, di);
+ ret = drop_one_dir_item(trans, path, dir, di);
if (ret)
return ret;
}
@@ -1232,7 +1232,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
if (IS_ERR(di)) {
return PTR_ERR(di);
} else if (di) {
- ret = drop_one_dir_item(trans, root, path, dir, di);
+ ret = drop_one_dir_item(trans, path, dir, di);
if (ret)
return ret;
}
@@ -2049,7 +2049,7 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
if (!exists)
goto out;
- ret = drop_one_dir_item(trans, root, path, BTRFS_I(dir), dst_di);
+ ret = drop_one_dir_item(trans, path, BTRFS_I(dir), dst_di);
if (ret)
goto out;
--
2.33.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 2/6] btrfs: remove root argument from btrfs_unlink_inode()
2021-10-25 16:31 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
2021-10-25 16:31 ` [PATCH v2 1/6] btrfs: remove root argument from drop_one_dir_item() fdmanana
@ 2021-10-25 16:31 ` fdmanana
2021-10-25 16:31 ` [PATCH v2 3/6] btrfs: remove root argument from add_link() fdmanana
` (6 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 16:31 UTC (permalink / raw)
To: linux-btrfs; +Cc: josef, Filipe Manana
From: Filipe Manana <fdmanana@suse.com>
The root argument passed to btrfs_unlink_inode() and its callee,
__btrfs_unlink_inode(), always matches the root of the given directory and
the given inode. So remove the argument and make __btrfs_unlink_inode()
use the root of the directory.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/ctree.h | 1 -
fs/btrfs/inode.c | 25 +++++++++++--------------
fs/btrfs/tree-log.c | 14 +++++++-------
3 files changed, 18 insertions(+), 22 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dc6a6173a2c0..7553e9dc5f93 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3195,7 +3195,6 @@ void __btrfs_del_delalloc_inode(struct btrfs_root *root,
struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry);
int btrfs_set_inode_index(struct btrfs_inode *dir, u64 *index);
int btrfs_unlink_inode(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
struct btrfs_inode *dir, struct btrfs_inode *inode,
const char *name, int name_len);
int btrfs_add_link(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3032893efee5..5fec009fbe63 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4056,11 +4056,11 @@ int btrfs_update_inode_fallback(struct btrfs_trans_handle *trans,
* also drops the back refs in the inode to the directory
*/
static int __btrfs_unlink_inode(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
struct btrfs_inode *dir,
struct btrfs_inode *inode,
const char *name, int name_len)
{
+ struct btrfs_root *root = dir->root;
struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_path *path;
int ret = 0;
@@ -4150,15 +4150,14 @@ static int __btrfs_unlink_inode(struct btrfs_trans_handle *trans,
}
int btrfs_unlink_inode(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
struct btrfs_inode *dir, struct btrfs_inode *inode,
const char *name, int name_len)
{
int ret;
- ret = __btrfs_unlink_inode(trans, root, dir, inode, name, name_len);
+ ret = __btrfs_unlink_inode(trans, dir, inode, name, name_len);
if (!ret) {
drop_nlink(&inode->vfs_inode);
- ret = btrfs_update_inode(trans, root, inode);
+ ret = btrfs_update_inode(trans, inode->root, inode);
}
return ret;
}
@@ -4187,7 +4186,6 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir)
static int btrfs_unlink(struct inode *dir, struct dentry *dentry)
{
- struct btrfs_root *root = BTRFS_I(dir)->root;
struct btrfs_trans_handle *trans;
struct inode *inode = d_inode(dentry);
int ret;
@@ -4199,7 +4197,7 @@ static int btrfs_unlink(struct inode *dir, struct dentry *dentry)
btrfs_record_unlink_dir(trans, BTRFS_I(dir), BTRFS_I(d_inode(dentry)),
0);
- ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir),
+ ret = btrfs_unlink_inode(trans, BTRFS_I(dir),
BTRFS_I(d_inode(dentry)), dentry->d_name.name,
dentry->d_name.len);
if (ret)
@@ -4213,7 +4211,7 @@ static int btrfs_unlink(struct inode *dir, struct dentry *dentry)
out:
btrfs_end_transaction(trans);
- btrfs_btree_balance_dirty(root->fs_info);
+ btrfs_btree_balance_dirty(BTRFS_I(dir)->root->fs_info);
return ret;
}
@@ -4564,7 +4562,6 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
{
struct inode *inode = d_inode(dentry);
int err = 0;
- struct btrfs_root *root = BTRFS_I(dir)->root;
struct btrfs_trans_handle *trans;
u64 last_unlink_trans;
@@ -4589,7 +4586,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
last_unlink_trans = BTRFS_I(inode)->last_unlink_trans;
/* now the directory is empty */
- err = btrfs_unlink_inode(trans, root, BTRFS_I(dir),
+ err = btrfs_unlink_inode(trans, BTRFS_I(dir),
BTRFS_I(d_inode(dentry)), dentry->d_name.name,
dentry->d_name.len);
if (!err) {
@@ -4610,7 +4607,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
}
out:
btrfs_end_transaction(trans);
- btrfs_btree_balance_dirty(root->fs_info);
+ btrfs_btree_balance_dirty(BTRFS_I(dir)->root->fs_info);
return err;
}
@@ -9468,7 +9465,7 @@ static int btrfs_rename_exchange(struct inode *old_dir,
if (old_ino == BTRFS_FIRST_FREE_OBJECTID) {
ret = btrfs_unlink_subvol(trans, old_dir, old_dentry);
} else { /* src is an inode */
- ret = __btrfs_unlink_inode(trans, root, BTRFS_I(old_dir),
+ ret = __btrfs_unlink_inode(trans, BTRFS_I(old_dir),
BTRFS_I(old_dentry->d_inode),
old_dentry->d_name.name,
old_dentry->d_name.len);
@@ -9484,7 +9481,7 @@ static int btrfs_rename_exchange(struct inode *old_dir,
if (new_ino == BTRFS_FIRST_FREE_OBJECTID) {
ret = btrfs_unlink_subvol(trans, new_dir, new_dentry);
} else { /* dest is an inode */
- ret = __btrfs_unlink_inode(trans, dest, BTRFS_I(new_dir),
+ ret = __btrfs_unlink_inode(trans, BTRFS_I(new_dir),
BTRFS_I(new_dentry->d_inode),
new_dentry->d_name.name,
new_dentry->d_name.len);
@@ -9759,7 +9756,7 @@ static int btrfs_rename(struct user_namespace *mnt_userns,
*/
btrfs_pin_log_trans(root);
log_pinned = true;
- ret = __btrfs_unlink_inode(trans, root, BTRFS_I(old_dir),
+ ret = __btrfs_unlink_inode(trans, BTRFS_I(old_dir),
BTRFS_I(d_inode(old_dentry)),
old_dentry->d_name.name,
old_dentry->d_name.len);
@@ -9779,7 +9776,7 @@ static int btrfs_rename(struct user_namespace *mnt_userns,
ret = btrfs_unlink_subvol(trans, new_dir, new_dentry);
BUG_ON(new_inode->i_nlink == 0);
} else {
- ret = btrfs_unlink_inode(trans, dest, BTRFS_I(new_dir),
+ ret = btrfs_unlink_inode(trans, BTRFS_I(new_dir),
BTRFS_I(d_inode(new_dentry)),
new_dentry->d_name.name,
new_dentry->d_name.len);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 23f1a35ea04f..568509371b68 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -954,7 +954,7 @@ static noinline int drop_one_dir_item(struct btrfs_trans_handle *trans,
if (ret)
goto out;
- ret = btrfs_unlink_inode(trans, root, dir, BTRFS_I(inode), name,
+ ret = btrfs_unlink_inode(trans, dir, BTRFS_I(inode), name,
name_len);
if (ret)
goto out;
@@ -1119,7 +1119,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
inc_nlink(&inode->vfs_inode);
btrfs_release_path(path);
- ret = btrfs_unlink_inode(trans, root, dir, inode,
+ ret = btrfs_unlink_inode(trans, dir, inode,
victim_name, victim_name_len);
kfree(victim_name);
if (ret)
@@ -1190,7 +1190,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
inc_nlink(&inode->vfs_inode);
btrfs_release_path(path);
- ret = btrfs_unlink_inode(trans, root,
+ ret = btrfs_unlink_inode(trans,
BTRFS_I(victim_parent),
inode,
victim_name,
@@ -1352,7 +1352,7 @@ static int unlink_old_inode_refs(struct btrfs_trans_handle *trans,
kfree(name);
goto out;
}
- ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir),
+ ret = btrfs_unlink_inode(trans, BTRFS_I(dir),
inode, name, namelen);
kfree(name);
iput(dir);
@@ -1450,7 +1450,7 @@ static int add_link(struct btrfs_trans_handle *trans, struct btrfs_root *root,
ret = -ENOENT;
goto out;
}
- ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir), BTRFS_I(other_inode),
+ ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(other_inode),
name, namelen);
if (ret)
goto out;
@@ -1596,7 +1596,7 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
ret = btrfs_inode_ref_exists(inode, dir, key->type,
name, namelen);
if (ret > 0) {
- ret = btrfs_unlink_inode(trans, root,
+ ret = btrfs_unlink_inode(trans,
BTRFS_I(dir),
BTRFS_I(inode),
name, namelen);
@@ -2346,7 +2346,7 @@ static noinline int check_item_in_log(struct btrfs_trans_handle *trans,
}
inc_nlink(inode);
- ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir),
+ ret = btrfs_unlink_inode(trans, BTRFS_I(dir),
BTRFS_I(inode), name, name_len);
if (!ret)
ret = btrfs_run_delayed_items(trans);
--
2.33.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 3/6] btrfs: remove root argument from add_link()
2021-10-25 16:31 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
2021-10-25 16:31 ` [PATCH v2 1/6] btrfs: remove root argument from drop_one_dir_item() fdmanana
2021-10-25 16:31 ` [PATCH v2 2/6] btrfs: remove root argument from btrfs_unlink_inode() fdmanana
@ 2021-10-25 16:31 ` fdmanana
2021-10-25 16:31 ` [PATCH v2 4/6] btrfs: remove root argument from check_item_in_log() fdmanana
` (5 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 16:31 UTC (permalink / raw)
To: linux-btrfs; +Cc: josef, Filipe Manana
From: Filipe Manana <fdmanana@suse.com>
The root argument for tree-log.c:add_link() always matches the root of the
given directory and the given inode, so it can eliminated.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 568509371b68..b753b3c87e14 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1413,10 +1413,11 @@ static int btrfs_inode_ref_exists(struct inode *inode, struct inode *dir,
return ret;
}
-static int add_link(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+static int add_link(struct btrfs_trans_handle *trans,
struct inode *dir, struct inode *inode, const char *name,
int namelen, u64 ref_index)
{
+ struct btrfs_root *root = BTRFS_I(dir)->root;
struct btrfs_dir_item *dir_item;
struct btrfs_key key;
struct btrfs_path *path;
@@ -1612,7 +1613,7 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
goto out;
/* insert our name */
- ret = add_link(trans, root, dir, inode, name, namelen,
+ ret = add_link(trans, dir, inode, name, namelen,
ref_index);
if (ret)
goto out;
--
2.33.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 4/6] btrfs: remove root argument from check_item_in_log()
2021-10-25 16:31 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
` (2 preceding siblings ...)
2021-10-25 16:31 ` [PATCH v2 3/6] btrfs: remove root argument from add_link() fdmanana
@ 2021-10-25 16:31 ` fdmanana
2021-10-25 16:31 ` [PATCH v2 5/6] btrfs: only copy dir index keys when logging a directory fdmanana
` (4 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 16:31 UTC (permalink / raw)
To: linux-btrfs; +Cc: josef, Filipe Manana
From: Filipe Manana <fdmanana@suse.com>
The root argument passed to check_item_in_log() always matches the root
of the given directory, so it can be eliminated.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index b753b3c87e14..8ab33caf016f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2280,13 +2280,13 @@ static noinline int find_dir_range(struct btrfs_root *root,
* to is unlinked
*/
static noinline int check_item_in_log(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
struct btrfs_root *log,
struct btrfs_path *path,
struct btrfs_path *log_path,
struct inode *dir,
struct btrfs_key *dir_key)
{
+ struct btrfs_root *root = BTRFS_I(dir)->root;
int ret;
struct extent_buffer *eb;
int slot;
@@ -2560,7 +2560,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
if (found_key.offset > range_end)
break;
- ret = check_item_in_log(trans, root, log, path,
+ ret = check_item_in_log(trans, log, path,
log_path, dir,
&found_key);
if (ret)
--
2.33.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 5/6] btrfs: only copy dir index keys when logging a directory
2021-10-25 16:31 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
` (3 preceding siblings ...)
2021-10-25 16:31 ` [PATCH v2 4/6] btrfs: remove root argument from check_item_in_log() fdmanana
@ 2021-10-25 16:31 ` fdmanana
2021-10-25 16:31 ` [PATCH v2 6/6] btrfs: remove no longer needed logic for replaying directory deletes fdmanana
` (3 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 16:31 UTC (permalink / raw)
To: linux-btrfs; +Cc: josef, Filipe Manana
From: Filipe Manana <fdmanana@suse.com>
Currently, when logging a directory, we copy both dir items and dir index
items from the fs/subvolume tree to the log tree. Both items have exactly
the same data (same struct btrfs_dir_item), the difference lies in the key
values, where a dir index key contains the index number of a directory
entry while the dir item key does not, as it's used for doing fast lookups
of an entry by name, while the former is used for sorting entries when
listing a directory.
We can exploit that and log only the dir index items, since they contain
all the information needed to correctly add, replace and delete directory
entries when replaying a log tree. Logging only the dir index items is
also backward and forward compatible: an unpatched kernel (without this
change) can correctly replay a log tree generated by a patched kernel
(with this patch), and a patched kernel can correctly replay a log tree
generated by an unpatched kernel.
The backward compatibility is ensured because:
1) For inserting a new dentry: a dentry is only inserted when we find a
new dir index key - we can only insert if we know the dir index offset,
which is encoded in the dir index key's offset;
2) For deleting dentries: during log replay, before adding or replacing
dentries, we first replay dentry deletions. Whenever we find a dir item
key or a dir index key in the subvolume/fs tree that is not logged in
a range for which the log tree is authoritative, we do the unlink of
the dentry, which removes both the existing dir item key and the dir
index key. Therefore logging just dir index keys is enough to ensure
dentry deletions are correctly replayed;
3) For dentry replacements: they work when we log only dir index keys
and this is mostly due to a combination of 1) and 2). If we replace a
dentry with name "foobar" to point from inode A to inode B, then we
know the dir index key for the new dentry is different from the old
one, as it has an index number (key offset) larger than the old one.
This results in replaying a deletion, through replay_dir_deletes(),
that causes the old dentry to be removed, both the dir item key and
the dir index key, as mentioned at 2). Then when processing the new
dir index key, we add the new dentry, adding both a new dir item key
and a new index key pointing to inode B, as stated in 1).
The forward compatibility, the ability for a patched kernel to replay a
log created by an older, unpatched kernel, comes from the changes required
for making sure we are able to replay a log that only contains dir index
keys - we simply ignore every dir item key we find.
So modify directory logging to log only dir index items, and modify the
log replay process to ignore dir item keys, from log trees created by an
unpatched kernel, and process only with dir index keys. This reduces the
amount of logged metadata by about half, and therefore the time spent
logging or fsyncing large directories (less cpu time and less IO).
The following test script was used to measure this change:
#!/bin/bash
DEV=/dev/nvme0n1
MNT=/mnt/nvme0n1
NUM_NEW_FILES=1000000
NUM_FILE_DELETES=10000
mkfs.btrfs -f $DEV
mount -o ssd $DEV $MNT
mkdir $MNT/testdir
for ((i = 1; i <= $NUM_NEW_FILES; i++)); do
echo -n > $MNT/testdir/file_$i
done
start=$(date +%s%N)
xfs_io -c "fsync" $MNT/testdir
end=$(date +%s%N)
dur=$(( (end - start) / 1000000 ))
echo "dir fsync took $dur ms after adding $NUM_NEW_FILES files"
# sync to force transaction commit and wipeout the log.
sync
del_inc=$(( $NUM_NEW_FILES / $NUM_FILE_DELETES ))
for ((i = 1; i <= $NUM_NEW_FILES; i += $del_inc)); do
rm -f $MNT/testdir/file_$i
done
start=$(date +%s%N)
xfs_io -c "fsync" $MNT/testdir
end=$(date +%s%N)
dur=$(( (end - start) / 1000000 ))
echo "dir fsync took $dur ms after deleting $NUM_FILE_DELETES files"
echo
umount $MNT
The tests were run on a physical machine, with a non-debug kernel (Debian's
default kernel config), for different values of $NUM_NEW_FILES and
$NUM_FILE_DELETES, and the results were the following:
** Before patch, NUM_NEW_FILES = 1 000 000, NUM_DELETE_FILES = 10 000 **
dir fsync took 8412 ms after adding 1000000 files
dir fsync took 500 ms after deleting 10000 files
** After patch, NUM_NEW_FILES = 1 000 000, NUM_DELETE_FILES = 10 000 **
dir fsync took 4252 ms after adding 1000000 files (-49.5%)
dir fsync took 269 ms after deleting 10000 files (-46.2%)
** Before patch, NUM_NEW_FILES = 100 000, NUM_DELETE_FILES = 1 000 **
dir fsync took 745 ms after adding 100000 files
dir fsync took 59 ms after deleting 1000 files
** After patch, NUM_NEW_FILES = 100 000, NUM_DELETE_FILES = 1 000 **
dir fsync took 404 ms after adding 100000 files (-45.8%)
dir fsync took 31 ms after deleting 1000 files (-47.5%)
** Before patch, NUM_NEW_FILES = 10 000, NUM_DELETE_FILES = 1 000 **
dir fsync took 67 ms after adding 10000 files
dir fsync took 9 ms after deleting 1000 files
** After patch, NUM_NEW_FILES = 10 000, NUM_DELETE_FILES = 1 000 **
dir fsync took 36 ms after adding 10000 files (-46.3%)
dir fsync took 5 ms after deleting 1000 files (-44.4%)
** Before patch, NUM_NEW_FILES = 1 000, NUM_DELETE_FILES = 100 **
dir fsync took 9 ms after adding 1000 files
dir fsync took 4 ms after deleting 100 files
** After patch, NUM_NEW_FILES = 1 000, NUM_DELETE_FILES = 100 **
dir fsync took 7 ms after adding 1000 files (-22.2%)
dir fsync took 3 ms after deleting 100 files (-25.0%)
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/btrfs_inode.h | 18 +-
fs/btrfs/tree-log.c | 395 ++++++++++++++++++-----------------------
2 files changed, 182 insertions(+), 231 deletions(-)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index ab2a4a52e0bb..b3e46aabc3d8 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -138,19 +138,11 @@ struct btrfs_inode {
/* a local copy of root's last_log_commit */
int last_log_commit;
- union {
- /*
- * Total number of bytes pending delalloc, used by stat to
- * calculate the real block usage of the file. This is used
- * only for files.
- */
- u64 delalloc_bytes;
- /*
- * The offset of the last dir item key that was logged.
- * This is used only for directories.
- */
- u64 last_dir_item_offset;
- };
+ /*
+ * Total number of bytes pending delalloc, used by stat to calculate the
+ * real block usage of the file. This is used only for files.
+ */
+ u64 delalloc_bytes;
union {
/*
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 8ab33caf016f..e49fe86f390e 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1949,6 +1949,34 @@ static noinline int insert_one_name(struct btrfs_trans_handle *trans,
return ret;
}
+static int delete_conflicting_dir_entry(struct btrfs_trans_handle *trans,
+ struct btrfs_inode *dir,
+ struct btrfs_path *path,
+ struct btrfs_dir_item *dst_di,
+ const struct btrfs_key *log_key,
+ u8 log_type,
+ bool exists)
+{
+ struct btrfs_key found_key;
+
+ btrfs_dir_item_key_to_cpu(path->nodes[0], dst_di, &found_key);
+ /* The existing dentry points to the same inode, don't delete it. */
+ if (found_key.objectid == log_key->objectid &&
+ found_key.type == log_key->type &&
+ found_key.offset == log_key->offset &&
+ btrfs_dir_type(path->nodes[0], dst_di) == log_type)
+ return 1;
+
+ /*
+ * Don't drop the conflicting directory entry if the inode for the new
+ * entry doesn't exist.
+ */
+ if (!exists)
+ return 0;
+
+ return drop_one_dir_item(trans, path, dir, dst_di);
+}
+
/*
* take a single entry in a log directory item and replay it into
* the subvolume.
@@ -1974,14 +2002,17 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
{
char *name;
int name_len;
- struct btrfs_dir_item *dst_di;
- struct btrfs_key found_key;
+ struct btrfs_dir_item *dir_dst_di;
+ struct btrfs_dir_item *index_dst_di;
+ bool dir_dst_matches = false;
+ bool index_dst_matches = false;
struct btrfs_key log_key;
+ struct btrfs_key search_key;
struct inode *dir;
u8 log_type;
bool exists;
int ret;
- bool update_size = (key->type == BTRFS_DIR_INDEX_KEY);
+ bool update_size = true;
bool name_added = false;
dir = read_one_inode(root, key->objectid);
@@ -2007,76 +2038,53 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
exists = (ret == 0);
ret = 0;
- if (key->type == BTRFS_DIR_ITEM_KEY) {
- dst_di = btrfs_lookup_dir_item(trans, root, path, key->objectid,
- name, name_len, 1);
- } else if (key->type == BTRFS_DIR_INDEX_KEY) {
- dst_di = btrfs_lookup_dir_index_item(trans, root, path,
- key->objectid,
- key->offset, name,
- name_len, 1);
- } else {
- /* Corruption */
- ret = -EINVAL;
+ dir_dst_di = btrfs_lookup_dir_item(trans, root, path, key->objectid,
+ name, name_len, 1);
+ if (IS_ERR(dir_dst_di)) {
+ ret = PTR_ERR(dir_dst_di);
goto out;
+ } else if (dir_dst_di) {
+ ret = delete_conflicting_dir_entry(trans, BTRFS_I(dir), path,
+ dir_dst_di, &log_key, log_type,
+ exists);
+ if (ret < 0)
+ goto out;
+ dir_dst_matches = (ret == 1);
}
- if (IS_ERR(dst_di)) {
- ret = PTR_ERR(dst_di);
+ btrfs_release_path(path);
+
+ index_dst_di = btrfs_lookup_dir_index_item(trans, root, path,
+ key->objectid, key->offset,
+ name, name_len, 1);
+ if (IS_ERR(index_dst_di)) {
+ ret = PTR_ERR(index_dst_di);
goto out;
- } else if (!dst_di) {
- /* we need a sequence number to insert, so we only
- * do inserts for the BTRFS_DIR_INDEX_KEY types
- */
- if (key->type != BTRFS_DIR_INDEX_KEY)
+ } else if (index_dst_di) {
+ ret = delete_conflicting_dir_entry(trans, BTRFS_I(dir), path,
+ index_dst_di, &log_key,
+ log_type, exists);
+ if (ret < 0)
goto out;
- goto insert;
+ index_dst_matches = (ret == 1);
}
- btrfs_dir_item_key_to_cpu(path->nodes[0], dst_di, &found_key);
- /* the existing item matches the logged item */
- if (found_key.objectid == log_key.objectid &&
- found_key.type == log_key.type &&
- found_key.offset == log_key.offset &&
- btrfs_dir_type(path->nodes[0], dst_di) == log_type) {
+ btrfs_release_path(path);
+
+ if (dir_dst_matches && index_dst_matches) {
+ ret = 0;
update_size = false;
goto out;
}
- /*
- * don't drop the conflicting directory entry if the inode
- * for the new entry doesn't exist
- */
- if (!exists)
- goto out;
-
- ret = drop_one_dir_item(trans, path, BTRFS_I(dir), dst_di);
- if (ret)
- goto out;
-
- if (key->type == BTRFS_DIR_INDEX_KEY)
- goto insert;
-out:
- btrfs_release_path(path);
- if (!ret && update_size) {
- btrfs_i_size_write(BTRFS_I(dir), dir->i_size + name_len * 2);
- ret = btrfs_update_inode(trans, root, BTRFS_I(dir));
- }
- kfree(name);
- iput(dir);
- if (!ret && name_added)
- ret = 1;
- return ret;
-
-insert:
/*
* Check if the inode reference exists in the log for the given name,
* inode and parent inode
*/
- found_key.objectid = log_key.objectid;
- found_key.type = BTRFS_INODE_REF_KEY;
- found_key.offset = key->objectid;
- ret = backref_in_log(root->log_root, &found_key, 0, name, name_len);
+ search_key.objectid = log_key.objectid;
+ search_key.type = BTRFS_INODE_REF_KEY;
+ search_key.offset = key->objectid;
+ ret = backref_in_log(root->log_root, &search_key, 0, name, name_len);
if (ret < 0) {
goto out;
} else if (ret) {
@@ -2086,10 +2094,10 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
goto out;
}
- found_key.objectid = log_key.objectid;
- found_key.type = BTRFS_INODE_EXTREF_KEY;
- found_key.offset = key->objectid;
- ret = backref_in_log(root->log_root, &found_key, key->objectid, name,
+ search_key.objectid = log_key.objectid;
+ search_key.type = BTRFS_INODE_EXTREF_KEY;
+ search_key.offset = key->objectid;
+ ret = backref_in_log(root->log_root, &search_key, key->objectid, name,
name_len);
if (ret < 0) {
goto out;
@@ -2108,87 +2116,76 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
name_added = true;
update_size = false;
ret = 0;
- goto out;
+
+out:
+ if (!ret && update_size) {
+ btrfs_i_size_write(BTRFS_I(dir), dir->i_size + name_len * 2);
+ ret = btrfs_update_inode(trans, root, BTRFS_I(dir));
+ }
+ kfree(name);
+ iput(dir);
+ if (!ret && name_added)
+ ret = 1;
+ return ret;
}
-/*
- * find all the names in a directory item and reconcile them into
- * the subvolume. Only BTRFS_DIR_ITEM_KEY types will have more than
- * one name in a directory item, but the same code gets used for
- * both directory index types
- */
+/* Replay one dir item from a BTRFS_DIR_INDEX_KEY key. */
static noinline int replay_one_dir_item(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct btrfs_path *path,
struct extent_buffer *eb, int slot,
struct btrfs_key *key)
{
- int ret = 0;
- u32 item_size = btrfs_item_size_nr(eb, slot);
+ int ret;
struct btrfs_dir_item *di;
- int name_len;
- unsigned long ptr;
- unsigned long ptr_end;
- struct btrfs_path *fixup_path = NULL;
- ptr = btrfs_item_ptr_offset(eb, slot);
- ptr_end = ptr + item_size;
- while (ptr < ptr_end) {
- di = (struct btrfs_dir_item *)ptr;
- name_len = btrfs_dir_name_len(eb, di);
- ret = replay_one_name(trans, root, path, eb, di, key);
- if (ret < 0)
- break;
- ptr = (unsigned long)(di + 1);
- ptr += name_len;
+ /* We only log dir index keys, which only contain a single dir item. */
+ ASSERT(key->type == BTRFS_DIR_INDEX_KEY);
- /*
- * If this entry refers to a non-directory (directories can not
- * have a link count > 1) and it was added in the transaction
- * that was not committed, make sure we fixup the link count of
- * the inode it the entry points to. Otherwise something like
- * the following would result in a directory pointing to an
- * inode with a wrong link that does not account for this dir
- * entry:
- *
- * mkdir testdir
- * touch testdir/foo
- * touch testdir/bar
- * sync
- *
- * ln testdir/bar testdir/bar_link
- * ln testdir/foo testdir/foo_link
- * xfs_io -c "fsync" testdir/bar
- *
- * <power failure>
- *
- * mount fs, log replay happens
- *
- * File foo would remain with a link count of 1 when it has two
- * entries pointing to it in the directory testdir. This would
- * make it impossible to ever delete the parent directory has
- * it would result in stale dentries that can never be deleted.
- */
- if (ret == 1 && btrfs_dir_type(eb, di) != BTRFS_FT_DIR) {
- struct btrfs_key di_key;
+ di = btrfs_item_ptr(eb, slot, struct btrfs_dir_item);
+ ret = replay_one_name(trans, root, path, eb, di, key);
+ if (ret < 0)
+ return ret;
- if (!fixup_path) {
- fixup_path = btrfs_alloc_path();
- if (!fixup_path) {
- ret = -ENOMEM;
- break;
- }
- }
+ /*
+ * If this entry refers to a non-directory (directories can not have a
+ * link count > 1) and it was added in the transaction that was not
+ * committed, make sure we fixup the link count of the inode the entry
+ * points to. Otherwise something like the following would result in a
+ * directory pointing to an inode with a wrong link that does not account
+ * for this dir entry:
+ *
+ * mkdir testdir
+ * touch testdir/foo
+ * touch testdir/bar
+ * sync
+ *
+ * ln testdir/bar testdir/bar_link
+ * ln testdir/foo testdir/foo_link
+ * xfs_io -c "fsync" testdir/bar
+ *
+ * <power failure>
+ *
+ * mount fs, log replay happens
+ *
+ * File foo would remain with a link count of 1 when it has two entries
+ * pointing to it in the directory testdir. This would make it impossible
+ * to ever delete the parent directory has it would result in stale
+ * dentries that can never be deleted.
+ */
+ if (ret == 1 && btrfs_dir_type(eb, di) != BTRFS_FT_DIR) {
+ struct btrfs_path *fixup_path;
+ struct btrfs_key di_key;
- btrfs_dir_item_key_to_cpu(eb, di, &di_key);
- ret = link_to_fixup_dir(trans, root, fixup_path,
- di_key.objectid);
- if (ret)
- break;
- }
- ret = 0;
+ fixup_path = btrfs_alloc_path();
+ if (!fixup_path)
+ return -ENOMEM;
+
+ btrfs_dir_item_key_to_cpu(eb, di, &di_key);
+ ret = link_to_fixup_dir(trans, root, fixup_path, di_key.objectid);
+ btrfs_free_path(fixup_path);
}
- btrfs_free_path(fixup_path);
+
return ret;
}
@@ -2742,12 +2739,13 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
eb, i, &key);
if (ret)
break;
- } else if (key.type == BTRFS_DIR_ITEM_KEY) {
- ret = replay_one_dir_item(wc->trans, root, path,
- eb, i, &key);
- if (ret)
- break;
}
+ /*
+ * We don't log BTRFS_DIR_ITEM_KEY keys anymore, only the
+ * BTRFS_DIR_INDEX_KEY items which we use to derive the
+ * BTRFS_DIR_ITEM_KEY items. If we are replaying a log from an
+ * older kernel with such keys, ignore them.
+ */
}
btrfs_free_path(path);
return ret;
@@ -3549,20 +3547,10 @@ void btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
goto out_unlock;
}
- di = btrfs_lookup_dir_item(trans, log, path, dir_ino,
- name, name_len, -1);
- if (IS_ERR(di)) {
- err = PTR_ERR(di);
- goto fail;
- }
- if (di) {
- ret = btrfs_delete_one_dir_name(trans, log, path, di);
- if (ret) {
- err = ret;
- goto fail;
- }
- }
- btrfs_release_path(path);
+ /*
+ * We only log dir index items of a directory, so we don't need to look
+ * for dir item keys.
+ */
di = btrfs_lookup_dir_index_item(trans, log, path, dir_ino,
index, name, name_len, -1);
if (IS_ERR(di)) {
@@ -3626,7 +3614,7 @@ void btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
static noinline int insert_dir_log_key(struct btrfs_trans_handle *trans,
struct btrfs_root *log,
struct btrfs_path *path,
- int key_type, u64 dirid,
+ u64 dirid,
u64 first_offset, u64 last_offset)
{
int ret;
@@ -3635,10 +3623,7 @@ static noinline int insert_dir_log_key(struct btrfs_trans_handle *trans,
key.objectid = dirid;
key.offset = first_offset;
- if (key_type == BTRFS_DIR_ITEM_KEY)
- key.type = BTRFS_DIR_LOG_ITEM_KEY;
- else
- key.type = BTRFS_DIR_LOG_INDEX_KEY;
+ key.type = BTRFS_DIR_LOG_INDEX_KEY;
ret = btrfs_insert_empty_item(trans, log, path, &key, sizeof(*item));
if (ret)
return ret;
@@ -3730,7 +3715,6 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
struct btrfs_inode *inode,
struct btrfs_path *path,
struct btrfs_path *dst_path,
- int key_type,
struct btrfs_log_ctx *ctx)
{
struct btrfs_root *log = inode->root->log_root;
@@ -3738,24 +3722,18 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
const int nritems = btrfs_header_nritems(src);
const u64 ino = btrfs_ino(inode);
const bool inode_logged_before = inode_logged(trans, inode);
- u64 last_logged_key_offset;
bool last_found = false;
int batch_start = 0;
int batch_size = 0;
int i;
- if (key_type == BTRFS_DIR_ITEM_KEY)
- last_logged_key_offset = inode->last_dir_item_offset;
- else
- last_logged_key_offset = inode->last_dir_index_offset;
-
for (i = path->slots[0]; i < nritems; i++) {
struct btrfs_key key;
int ret;
btrfs_item_key_to_cpu(src, &key, i);
- if (key.objectid != ino || key.type != key_type) {
+ if (key.objectid != ino || key.type != BTRFS_DIR_INDEX_KEY) {
last_found = true;
break;
}
@@ -3804,7 +3782,7 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
* we logged is in the log tree, saving time and avoiding adding
* contention on the log tree.
*/
- if (key.offset > last_logged_key_offset)
+ if (key.offset > inode->last_dir_index_offset)
goto add_to_batch;
/*
* Check if the key was already logged before. If not we can add
@@ -3863,7 +3841,7 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
static noinline int log_dir_items(struct btrfs_trans_handle *trans,
struct btrfs_inode *inode,
struct btrfs_path *path,
- struct btrfs_path *dst_path, int key_type,
+ struct btrfs_path *dst_path,
struct btrfs_log_ctx *ctx,
u64 min_offset, u64 *last_offset_ret)
{
@@ -3877,7 +3855,7 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
u64 ino = btrfs_ino(inode);
min_key.objectid = ino;
- min_key.type = key_type;
+ min_key.type = BTRFS_DIR_INDEX_KEY;
min_key.offset = min_offset;
ret = btrfs_search_forward(root, &min_key, path, trans->transid);
@@ -3886,9 +3864,10 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
* we didn't find anything from this transaction, see if there
* is anything at all
*/
- if (ret != 0 || min_key.objectid != ino || min_key.type != key_type) {
+ if (ret != 0 || min_key.objectid != ino ||
+ min_key.type != BTRFS_DIR_INDEX_KEY) {
min_key.objectid = ino;
- min_key.type = key_type;
+ min_key.type = BTRFS_DIR_INDEX_KEY;
min_key.offset = (u64)-1;
btrfs_release_path(path);
ret = btrfs_search_slot(NULL, root, &min_key, path, 0, 0);
@@ -3896,7 +3875,7 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
btrfs_release_path(path);
return ret;
}
- ret = btrfs_previous_item(root, path, ino, key_type);
+ ret = btrfs_previous_item(root, path, ino, BTRFS_DIR_INDEX_KEY);
/* if ret == 0 there are items for this type,
* create a range to tell us the last key of this type.
@@ -3907,18 +3886,18 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
struct btrfs_key tmp;
btrfs_item_key_to_cpu(path->nodes[0], &tmp,
path->slots[0]);
- if (key_type == tmp.type)
+ if (tmp.type == BTRFS_DIR_INDEX_KEY)
first_offset = max(min_offset, tmp.offset) + 1;
}
goto done;
}
/* go backward to find any previous key */
- ret = btrfs_previous_item(root, path, ino, key_type);
+ ret = btrfs_previous_item(root, path, ino, BTRFS_DIR_INDEX_KEY);
if (ret == 0) {
struct btrfs_key tmp;
btrfs_item_key_to_cpu(path->nodes[0], &tmp, path->slots[0]);
- if (key_type == tmp.type) {
+ if (tmp.type == BTRFS_DIR_INDEX_KEY) {
first_offset = tmp.offset;
ret = overwrite_item(trans, log, dst_path,
path->nodes[0], path->slots[0],
@@ -3949,8 +3928,7 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
* from our directory
*/
while (1) {
- ret = process_dir_items_leaf(trans, inode, path, dst_path,
- key_type, ctx);
+ ret = process_dir_items_leaf(trans, inode, path, dst_path, ctx);
if (ret != 0) {
if (ret < 0)
err = ret;
@@ -3971,7 +3949,7 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
goto done;
}
btrfs_item_key_to_cpu(path->nodes[0], &min_key, path->slots[0]);
- if (min_key.objectid != ino || min_key.type != key_type) {
+ if (min_key.objectid != ino || min_key.type != BTRFS_DIR_INDEX_KEY) {
last_offset = (u64)-1;
goto done;
}
@@ -4001,8 +3979,8 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
* insert the log range keys to indicate where the log
* is valid
*/
- ret = insert_dir_log_key(trans, log, path, key_type,
- ino, first_offset, last_offset);
+ ret = insert_dir_log_key(trans, log, path, ino, first_offset,
+ last_offset);
if (ret)
err = ret;
}
@@ -4030,35 +4008,28 @@ static noinline int log_directory_changes(struct btrfs_trans_handle *trans,
u64 min_key;
u64 max_key;
int ret;
- int key_type = BTRFS_DIR_ITEM_KEY;
/*
* If this is the first time we are being logged in the current
* transaction, or we were logged before but the inode was evicted and
- * reloaded later, in which case its logged_trans is 0, reset the values
- * of the last logged key offsets. Note that we don't use the helper
+ * reloaded later, in which case its logged_trans is 0, reset the value
+ * of the last logged key offset. Note that we don't use the helper
* function inode_logged() here - that is because the function returns
* true after an inode eviction, assuming the worst case as it can not
* know for sure if the inode was logged before. So we can not skip key
* searches in the case the inode was evicted, because it may not have
* been logged in this transaction and may have been logged in a past
- * transaction, so we need to reset the last dir item and index offsets
- * to (u64)-1.
+ * transaction, so we need to reset the last dir index offset to (u64)-1.
*/
- if (inode->logged_trans != trans->transid) {
- inode->last_dir_item_offset = (u64)-1;
+ if (inode->logged_trans != trans->transid)
inode->last_dir_index_offset = (u64)-1;
- }
-again:
+
min_key = 0;
max_key = 0;
- if (key_type == BTRFS_DIR_ITEM_KEY)
- ctx->last_dir_item_offset = inode->last_dir_item_offset;
- else
- ctx->last_dir_item_offset = inode->last_dir_index_offset;
+ ctx->last_dir_item_offset = inode->last_dir_index_offset;
while (1) {
- ret = log_dir_items(trans, inode, path, dst_path, key_type,
+ ret = log_dir_items(trans, inode, path, dst_path,
ctx, min_key, &max_key);
if (ret)
return ret;
@@ -4067,13 +4038,8 @@ static noinline int log_directory_changes(struct btrfs_trans_handle *trans,
min_key = max_key + 1;
}
- if (key_type == BTRFS_DIR_ITEM_KEY) {
- inode->last_dir_item_offset = ctx->last_dir_item_offset;
- key_type = BTRFS_DIR_INDEX_KEY;
- goto again;
- } else {
- inode->last_dir_index_offset = ctx->last_dir_item_offset;
- }
+ inode->last_dir_index_offset = ctx->last_dir_item_offset;
+
return 0;
}
@@ -5896,18 +5862,12 @@ struct btrfs_dir_list {
* link_to_fixup_dir());
*
* 2) For directories we log with a mode of LOG_INODE_ALL. It's possible that
- * while logging the inode's items new items with keys BTRFS_DIR_ITEM_KEY and
- * BTRFS_DIR_INDEX_KEY are added to fs/subvol tree and the logged inode item
+ * while logging the inode's items new index items (key type
+ * BTRFS_DIR_INDEX_KEY) are added to fs/subvol tree and the logged inode item
* has a size that doesn't match the sum of the lengths of all the logged
- * names. This does not result in a problem because if a dir_item key is
- * logged but its matching dir_index key is not logged, at log replay time we
- * don't use it to replay the respective name (see replay_one_name()). On the
- * other hand if only the dir_index key ends up being logged, the respective
- * name is added to the fs/subvol tree with both the dir_item and dir_index
- * keys created (see replay_one_name()).
- * The directory's inode item with a wrong i_size is not a problem as well,
- * since we don't use it at log replay time to set the i_size in the inode
- * item of the fs/subvol tree (see overwrite_item()).
+ * names - this is ok, not a problem, because at log replay time we set the
+ * directory's i_size to the correct value (see replay_one_name() and
+ * do_overwrite_item()).
*/
static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
@@ -5953,7 +5913,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
goto next_dir_inode;
min_key.objectid = dir_elem->ino;
- min_key.type = BTRFS_DIR_ITEM_KEY;
+ min_key.type = BTRFS_DIR_INDEX_KEY;
min_key.offset = 0;
again:
btrfs_release_path(path);
@@ -5978,7 +5938,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
btrfs_item_key_to_cpu(leaf, &min_key, i);
if (min_key.objectid != dir_elem->ino ||
- min_key.type != BTRFS_DIR_ITEM_KEY)
+ min_key.type != BTRFS_DIR_INDEX_KEY)
goto next_dir_inode;
di = btrfs_item_ptr(leaf, i, struct btrfs_dir_item);
@@ -6792,15 +6752,14 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
* was previously logged, make sure the next log attempt on the directory
* is not skipped and logs the inode again. This is because the log may
* not currently be authoritative for a range including the old
- * BTRFS_DIR_ITEM_KEY and BTRFS_DIR_INDEX_KEY keys, so we want to make
- * sure after a log replay we do not end up with both the new and old
- * dentries around (in case the inode is a directory we would have a
- * directory with two hard links and 2 inode references for different
- * parents). The next log attempt of old_dir will happen at
- * btrfs_log_all_parents(), called through btrfs_log_inode_parent()
- * below, because we have previously set inode->last_unlink_trans to the
- * current transaction ID, either here or at btrfs_record_unlink_dir() in
- * case inode is a directory.
+ * BTRFS_DIR_INDEX_KEY key, so we want to make sure after a log replay we
+ * do not end up with both the new and old dentries around (in case the
+ * inode is a directory we would have a directory with two hard links and
+ * 2 inode references for different parents). The next log attempt of
+ * old_dir will happen at btrfs_log_all_parents(), called through
+ * btrfs_log_inode_parent() below, because we have previously set
+ * inode->last_unlink_trans to the current transaction ID, either here or
+ * at btrfs_record_unlink_dir() in case the inode is a directory.
*/
if (old_dir)
old_dir->logged_trans = 0;
--
2.33.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 6/6] btrfs: remove no longer needed logic for replaying directory deletes
2021-10-25 16:31 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
` (4 preceding siblings ...)
2021-10-25 16:31 ` [PATCH v2 5/6] btrfs: only copy dir index keys when logging a directory fdmanana
@ 2021-10-25 16:31 ` fdmanana
2021-10-25 18:59 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only Josef Bacik
` (2 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: fdmanana @ 2021-10-25 16:31 UTC (permalink / raw)
To: linux-btrfs; +Cc: josef, Filipe Manana
From: Filipe Manana <fdmanana@suse.com>
Now that we log only dir index keys when logging a directory, we no longer
need to deal with dir item keys in the log replay code for replaying
directory deletes. This is also true for the case when we replay a log
tree created by a kernel that still logs dir items.
So remove the remaining code of the replay of directory deletes algorithm
that deals with dir item keys.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 158 ++++++++++++++------------------
include/uapi/linux/btrfs_tree.h | 4 +-
2 files changed, 72 insertions(+), 90 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index e49fe86f390e..482f0b18b220 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2202,7 +2202,7 @@ static noinline int replay_one_dir_item(struct btrfs_trans_handle *trans,
*/
static noinline int find_dir_range(struct btrfs_root *root,
struct btrfs_path *path,
- u64 dirid, int key_type,
+ u64 dirid,
u64 *start_ret, u64 *end_ret)
{
struct btrfs_key key;
@@ -2215,7 +2215,7 @@ static noinline int find_dir_range(struct btrfs_root *root,
return 1;
key.objectid = dirid;
- key.type = key_type;
+ key.type = BTRFS_DIR_LOG_INDEX_KEY;
key.offset = *start_ret;
ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
@@ -2229,7 +2229,7 @@ static noinline int find_dir_range(struct btrfs_root *root,
if (ret != 0)
btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
- if (key.type != key_type || key.objectid != dirid) {
+ if (key.type != BTRFS_DIR_LOG_INDEX_KEY || key.objectid != dirid) {
ret = 1;
goto next;
}
@@ -2256,7 +2256,7 @@ static noinline int find_dir_range(struct btrfs_root *root,
btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
- if (key.type != key_type || key.objectid != dirid) {
+ if (key.type != BTRFS_DIR_LOG_INDEX_KEY || key.objectid != dirid) {
ret = 1;
goto out;
}
@@ -2287,95 +2287,82 @@ static noinline int check_item_in_log(struct btrfs_trans_handle *trans,
int ret;
struct extent_buffer *eb;
int slot;
- u32 item_size;
struct btrfs_dir_item *di;
- struct btrfs_dir_item *log_di;
int name_len;
- unsigned long ptr;
- unsigned long ptr_end;
char *name;
- struct inode *inode;
+ struct inode *inode = NULL;
struct btrfs_key location;
-again:
+ /*
+ * Currenly we only log dir index keys. Even if we replay a log created
+ * by an older kernel that logged both dir index and dir item keys, all
+ * we need to do is process the dir index keys, we (and our caller) can
+ * safely ignore dir item keys (key type BTRFS_DIR_ITEM_KEY).
+ */
+ ASSERT(dir_key->type == BTRFS_DIR_INDEX_KEY);
+
eb = path->nodes[0];
slot = path->slots[0];
- item_size = btrfs_item_size_nr(eb, slot);
- ptr = btrfs_item_ptr_offset(eb, slot);
- ptr_end = ptr + item_size;
- while (ptr < ptr_end) {
- di = (struct btrfs_dir_item *)ptr;
- name_len = btrfs_dir_name_len(eb, di);
- name = kmalloc(name_len, GFP_NOFS);
- if (!name) {
- ret = -ENOMEM;
- goto out;
- }
- read_extent_buffer(eb, name, (unsigned long)(di + 1),
- name_len);
- log_di = NULL;
- if (log && dir_key->type == BTRFS_DIR_ITEM_KEY) {
- log_di = btrfs_lookup_dir_item(trans, log, log_path,
- dir_key->objectid,
- name, name_len, 0);
- } else if (log && dir_key->type == BTRFS_DIR_INDEX_KEY) {
- log_di = btrfs_lookup_dir_index_item(trans, log,
- log_path,
- dir_key->objectid,
- dir_key->offset,
- name, name_len, 0);
- }
- if (!log_di) {
- btrfs_dir_item_key_to_cpu(eb, di, &location);
- btrfs_release_path(path);
- btrfs_release_path(log_path);
- inode = read_one_inode(root, location.objectid);
- if (!inode) {
- kfree(name);
- return -EIO;
- }
+ di = btrfs_item_ptr(eb, slot, struct btrfs_dir_item);
+ name_len = btrfs_dir_name_len(eb, di);
+ name = kmalloc(name_len, GFP_NOFS);
+ if (!name) {
+ ret = -ENOMEM;
+ goto out;
+ }
- ret = link_to_fixup_dir(trans, root,
- path, location.objectid);
- if (ret) {
- kfree(name);
- iput(inode);
- goto out;
- }
+ read_extent_buffer(eb, name, (unsigned long)(di + 1), name_len);
- inc_nlink(inode);
- ret = btrfs_unlink_inode(trans, BTRFS_I(dir),
- BTRFS_I(inode), name, name_len);
- if (!ret)
- ret = btrfs_run_delayed_items(trans);
- kfree(name);
- iput(inode);
- if (ret)
- goto out;
+ if (log) {
+ struct btrfs_dir_item *log_di;
- /* there might still be more names under this key
- * check and repeat if required
- */
- ret = btrfs_search_slot(NULL, root, dir_key, path,
- 0, 0);
- if (ret == 0)
- goto again;
+ log_di = btrfs_lookup_dir_index_item(trans, log, log_path,
+ dir_key->objectid,
+ dir_key->offset,
+ name, name_len, 0);
+ if (IS_ERR(log_di)) {
+ ret = PTR_ERR(log_di);
+ goto out;
+ } else if (log_di) {
+ /* The dentry exists in the log, we have nothing to do. */
ret = 0;
goto out;
- } else if (IS_ERR(log_di)) {
- kfree(name);
- return PTR_ERR(log_di);
}
- btrfs_release_path(log_path);
- kfree(name);
+ }
- ptr = (unsigned long)(di + 1);
- ptr += name_len;
+ btrfs_dir_item_key_to_cpu(eb, di, &location);
+ btrfs_release_path(path);
+ btrfs_release_path(log_path);
+ inode = read_one_inode(root, location.objectid);
+ if (!inode) {
+ ret = -EIO;
+ goto out;
}
- ret = 0;
+
+ ret = link_to_fixup_dir(trans, root, path, location.objectid);
+ if (ret)
+ goto out;
+
+ inc_nlink(inode);
+ ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(inode), name,
+ name_len);
+ if (ret)
+ goto out;
+
+ ret = btrfs_run_delayed_items(trans);
+ if (ret)
+ goto out;
+
+ /*
+ * Unlike dir item keys, dir index keys can only have one name (entry) in
+ * them, as there are no key collisions since each key has a unique offset
+ * (an index number), so we're done.
+ */
out:
btrfs_release_path(path);
btrfs_release_path(log_path);
+ kfree(name);
+ iput(inode);
return ret;
}
@@ -2495,7 +2482,6 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
{
u64 range_start;
u64 range_end;
- int key_type = BTRFS_DIR_LOG_ITEM_KEY;
int ret = 0;
struct btrfs_key dir_key;
struct btrfs_key found_key;
@@ -2503,7 +2489,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
struct inode *dir;
dir_key.objectid = dirid;
- dir_key.type = BTRFS_DIR_ITEM_KEY;
+ dir_key.type = BTRFS_DIR_INDEX_KEY;
log_path = btrfs_alloc_path();
if (!log_path)
return -ENOMEM;
@@ -2517,14 +2503,14 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
btrfs_free_path(log_path);
return 0;
}
-again:
+
range_start = 0;
range_end = 0;
while (1) {
if (del_all)
range_end = (u64)-1;
else {
- ret = find_dir_range(log, path, dirid, key_type,
+ ret = find_dir_range(log, path, dirid,
&range_start, &range_end);
if (ret < 0)
goto out;
@@ -2551,8 +2537,10 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
btrfs_item_key_to_cpu(path->nodes[0], &found_key,
path->slots[0]);
if (found_key.objectid != dirid ||
- found_key.type != dir_key.type)
- goto next_type;
+ found_key.type != dir_key.type) {
+ ret = 0;
+ goto out;
+ }
if (found_key.offset > range_end)
break;
@@ -2571,15 +2559,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
break;
range_start = range_end + 1;
}
-
-next_type:
ret = 0;
- if (key_type == BTRFS_DIR_LOG_ITEM_KEY) {
- key_type = BTRFS_DIR_LOG_INDEX_KEY;
- dir_key.type = BTRFS_DIR_INDEX_KEY;
- btrfs_release_path(path);
- goto again;
- }
out:
btrfs_release_path(path);
btrfs_free_path(log_path);
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index e1c4c732aaba..5416f1f1a77a 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -146,7 +146,9 @@
/*
* dir items are the name -> inode pointers in a directory. There is one
- * for every name in a directory.
+ * for every name in a directory. BTRFS_DIR_LOG_ITEM_KEY is no longer used
+ * but it's still defined here for documentation purposes and to help avoid
+ * having its numerical value reused in the future.
*/
#define BTRFS_DIR_LOG_ITEM_KEY 60
#define BTRFS_DIR_LOG_INDEX_KEY 72
--
2.33.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only
2021-10-25 16:31 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
` (5 preceding siblings ...)
2021-10-25 16:31 ` [PATCH v2 6/6] btrfs: remove no longer needed logic for replaying directory deletes fdmanana
@ 2021-10-25 18:59 ` Josef Bacik
2021-10-25 19:55 ` David Sterba
2021-11-04 19:31 ` David Sterba
8 siblings, 0 replies; 21+ messages in thread
From: Josef Bacik @ 2021-10-25 18:59 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs, Filipe Manana
On Mon, Oct 25, 2021 at 05:31:48PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> This patchset reworks directory logging to make it copy only the dir index
> keys, instead of copying both dir index keys and dir item keys, as both have
> the same type of information. This reduces the amount of logged metadata by
> about half, and therefore we do about half of the cpu bound work, half of
> the IO and use less log tree space (except for very small directories).
>
> This will allow other optimizations to build on top, some of which are only
> possible after this change, while others become easier and less cumbersome
> to implement after this change. Performance tests are in the changelog of
> patch 5/6. Patch 6/6 only removes code that deals with dir item keys when
> replaying directory deletions and it could have been squashed into patch
> 5/6, but since that one is already large and works without 6/6, I opted
> to make it separate to make it easier to review.
>
> Also, after this change we are still able to correctly replay a log tree
> generated by an old kernel, and an old kernel is also able to correctly
> replay a log tree generated by a kernel that has this patchset applied.
>
> I'm sending this close to the 5.16 merge window, but my intention is to
> have it only for the next merge window (5.17).
>
> V2: Updated changelog of patch 5/6 to make it more clear why backward
> and forward compatibility are guaranteed.
>
> Filipe Manana (6):
> btrfs: remove root argument from drop_one_dir_item()
> btrfs: remove root argument from btrfs_unlink_inode()
> btrfs: remove root argument from add_link()
> btrfs: remove root argument from check_item_in_log()
> btrfs: only copy dir index keys when logging a directory
> btrfs: remove no longer needed logic for replaying directory deletes
>
You can add
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
to the series, thanks,
Josef
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only
2021-10-25 16:31 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
` (6 preceding siblings ...)
2021-10-25 18:59 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only Josef Bacik
@ 2021-10-25 19:55 ` David Sterba
2021-10-26 9:43 ` Filipe Manana
2021-11-04 19:31 ` David Sterba
8 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2021-10-25 19:55 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs, josef, Filipe Manana
On Mon, Oct 25, 2021 at 05:31:48PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> This patchset reworks directory logging to make it copy only the dir index
> keys, instead of copying both dir index keys and dir item keys, as both have
> the same type of information. This reduces the amount of logged metadata by
> about half, and therefore we do about half of the cpu bound work, half of
> the IO and use less log tree space (except for very small directories).
>
> This will allow other optimizations to build on top, some of which are only
> possible after this change, while others become easier and less cumbersome
> to implement after this change. Performance tests are in the changelog of
> patch 5/6. Patch 6/6 only removes code that deals with dir item keys when
> replaying directory deletions and it could have been squashed into patch
> 5/6, but since that one is already large and works without 6/6, I opted
> to make it separate to make it easier to review.
>
> Also, after this change we are still able to correctly replay a log tree
> generated by an old kernel, and an old kernel is also able to correctly
> replay a log tree generated by a kernel that has this patchset applied.
>
> I'm sending this close to the 5.16 merge window, but my intention is to
> have it only for the next merge window (5.17).
>
> V2: Updated changelog of patch 5/6 to make it more clear why backward
> and forward compatibility are guaranteed.
>
> Filipe Manana (6):
> btrfs: remove root argument from drop_one_dir_item()
> btrfs: remove root argument from btrfs_unlink_inode()
> btrfs: remove root argument from add_link()
> btrfs: remove root argument from check_item_in_log()
If you want, I can add the first 4 patches to misc-next still queued for
5.16 pull next week, it's fairly trivial and you won't have to refresh
the patches. For preparatory patches like this it's free pass even
before merge window.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only
2021-10-25 19:55 ` David Sterba
@ 2021-10-26 9:43 ` Filipe Manana
0 siblings, 0 replies; 21+ messages in thread
From: Filipe Manana @ 2021-10-26 9:43 UTC (permalink / raw)
To: David Sterba, Filipe Manana, linux-btrfs, Josef Bacik,
Filipe Manana
On Mon, Oct 25, 2021 at 8:55 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Mon, Oct 25, 2021 at 05:31:48PM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > This patchset reworks directory logging to make it copy only the dir index
> > keys, instead of copying both dir index keys and dir item keys, as both have
> > the same type of information. This reduces the amount of logged metadata by
> > about half, and therefore we do about half of the cpu bound work, half of
> > the IO and use less log tree space (except for very small directories).
> >
> > This will allow other optimizations to build on top, some of which are only
> > possible after this change, while others become easier and less cumbersome
> > to implement after this change. Performance tests are in the changelog of
> > patch 5/6. Patch 6/6 only removes code that deals with dir item keys when
> > replaying directory deletions and it could have been squashed into patch
> > 5/6, but since that one is already large and works without 6/6, I opted
> > to make it separate to make it easier to review.
> >
> > Also, after this change we are still able to correctly replay a log tree
> > generated by an old kernel, and an old kernel is also able to correctly
> > replay a log tree generated by a kernel that has this patchset applied.
> >
> > I'm sending this close to the 5.16 merge window, but my intention is to
> > have it only for the next merge window (5.17).
> >
> > V2: Updated changelog of patch 5/6 to make it more clear why backward
> > and forward compatibility are guaranteed.
> >
> > Filipe Manana (6):
> > btrfs: remove root argument from drop_one_dir_item()
> > btrfs: remove root argument from btrfs_unlink_inode()
> > btrfs: remove root argument from add_link()
> > btrfs: remove root argument from check_item_in_log()
>
> If you want, I can add the first 4 patches to misc-next still queued for
> 5.16 pull next week, it's fairly trivial and you won't have to refresh
> the patches. For preparatory patches like this it's free pass even
> before merge window.
I'm fine with either way.
Thanks.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only
2021-10-25 16:31 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
` (7 preceding siblings ...)
2021-10-25 19:55 ` David Sterba
@ 2021-11-04 19:31 ` David Sterba
8 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2021-11-04 19:31 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs, josef, Filipe Manana
On Mon, Oct 25, 2021 at 05:31:48PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> This patchset reworks directory logging to make it copy only the dir index
> keys, instead of copying both dir index keys and dir item keys, as both have
> the same type of information. This reduces the amount of logged metadata by
> about half, and therefore we do about half of the cpu bound work, half of
> the IO and use less log tree space (except for very small directories).
>
> This will allow other optimizations to build on top, some of which are only
> possible after this change, while others become easier and less cumbersome
> to implement after this change. Performance tests are in the changelog of
> patch 5/6. Patch 6/6 only removes code that deals with dir item keys when
> replaying directory deletions and it could have been squashed into patch
> 5/6, but since that one is already large and works without 6/6, I opted
> to make it separate to make it easier to review.
>
> Also, after this change we are still able to correctly replay a log tree
> generated by an old kernel, and an old kernel is also able to correctly
> replay a log tree generated by a kernel that has this patchset applied.
>
> I'm sending this close to the 5.16 merge window, but my intention is to
> have it only for the next merge window (5.17).
>
> V2: Updated changelog of patch 5/6 to make it more clear why backward
> and forward compatibility are guaranteed.
>
> Filipe Manana (6):
> btrfs: remove root argument from drop_one_dir_item()
> btrfs: remove root argument from btrfs_unlink_inode()
> btrfs: remove root argument from add_link()
> btrfs: remove root argument from check_item_in_log()
The above have been added to 5.15 pull request.
> btrfs: only copy dir index keys when logging a directory
> btrfs: remove no longer needed logic for replaying directory deletes
And the rest is now in misc-next. Thanks.
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2021-11-04 19:31 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-10-25 9:56 [PATCH 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
2021-10-25 9:56 ` [PATCH 1/6] btrfs: remove root argument from drop_one_dir_item() fdmanana
2021-10-25 9:56 ` [PATCH 2/6] btrfs: remove root argument from btrfs_unlink_inode() fdmanana
2021-10-25 9:56 ` [PATCH 3/6] btrfs: remove root argument from add_link() fdmanana
2021-10-25 9:56 ` [PATCH 4/6] btrfs: remove root argument from check_item_in_log() fdmanana
2021-10-25 9:56 ` [PATCH 5/6] btrfs: only copy dir index keys when logging a directory fdmanana
2021-10-25 15:36 ` Josef Bacik
2021-10-25 16:01 ` Filipe Manana
2021-10-25 16:06 ` Josef Bacik
2021-10-25 9:56 ` [PATCH 6/6] btrfs: remove no longer needed logic for replaying directory deletes fdmanana
2021-10-25 16:31 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only fdmanana
2021-10-25 16:31 ` [PATCH v2 1/6] btrfs: remove root argument from drop_one_dir_item() fdmanana
2021-10-25 16:31 ` [PATCH v2 2/6] btrfs: remove root argument from btrfs_unlink_inode() fdmanana
2021-10-25 16:31 ` [PATCH v2 3/6] btrfs: remove root argument from add_link() fdmanana
2021-10-25 16:31 ` [PATCH v2 4/6] btrfs: remove root argument from check_item_in_log() fdmanana
2021-10-25 16:31 ` [PATCH v2 5/6] btrfs: only copy dir index keys when logging a directory fdmanana
2021-10-25 16:31 ` [PATCH v2 6/6] btrfs: remove no longer needed logic for replaying directory deletes fdmanana
2021-10-25 18:59 ` [PATCH v2 0/6] btrfs: speedup directory logging/fsync by copying index keys only Josef Bacik
2021-10-25 19:55 ` David Sterba
2021-10-26 9:43 ` Filipe Manana
2021-11-04 19:31 ` David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox