* [PATCH 1/3] btrfs: always set an inode's delayed_inode with WRITE_ONCE()
2024-05-17 13:13 [PATCH 0/3] btrfs: avoid data races when accessing an inode's delayed_node fdmanana
@ 2024-05-17 13:13 ` fdmanana
2024-05-17 22:51 ` Qu Wenruo
2024-05-17 13:13 ` [PATCH 2/3] btrfs: use READ_ONCE() when accessing delayed_node at btrfs_dirty_node() fdmanana
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: fdmanana @ 2024-05-17 13:13 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
Currently we have a couple places using READ_ONCE() to access an inode's
delayed_inode without taking the lock that protects the xarray for delayed
inodes, while all the other places access it while holding the lock.
However we never update the delayed_inode pointer of an inode with
WRITE_ONCE(), making the use of READ_ONCE() pointless since it should
always be paired with a WRITE_ONCE() in order to protect against issues
such as write tearing for example.
So change all the places that update struct btrfs_inode::delayed_inode to
use WRITE_ONCE() instead of simple assignments.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/delayed-inode.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 483c141dc488..6df7e44d9d31 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -106,7 +106,7 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node(
*/
if (refcount_inc_not_zero(&node->refs)) {
refcount_inc(&node->refs);
- btrfs_inode->delayed_node = node;
+ WRITE_ONCE(btrfs_inode->delayed_node, node);
} else {
node = NULL;
}
@@ -161,7 +161,7 @@ static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
ASSERT(xa_err(ptr) != -EINVAL);
ASSERT(xa_err(ptr) != -ENOMEM);
ASSERT(ptr == NULL);
- btrfs_inode->delayed_node = node;
+ WRITE_ONCE(btrfs_inode->delayed_node, node);
xa_unlock(&root->delayed_nodes);
return node;
@@ -1312,7 +1312,7 @@ void btrfs_remove_delayed_node(struct btrfs_inode *inode)
if (!delayed_node)
return;
- inode->delayed_node = NULL;
+ WRITE_ONCE(inode->delayed_node, NULL);
btrfs_release_delayed_node(delayed_node);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH 1/3] btrfs: always set an inode's delayed_inode with WRITE_ONCE()
2024-05-17 13:13 ` [PATCH 1/3] btrfs: always set an inode's delayed_inode with WRITE_ONCE() fdmanana
@ 2024-05-17 22:51 ` Qu Wenruo
0 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2024-05-17 22:51 UTC (permalink / raw)
To: fdmanana, linux-btrfs
在 2024/5/17 22:43, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
>
> Currently we have a couple places using READ_ONCE() to access an inode's
> delayed_inode without taking the lock that protects the xarray for delayed
> inodes, while all the other places access it while holding the lock.
>
> However we never update the delayed_inode pointer of an inode with
> WRITE_ONCE(), making the use of READ_ONCE() pointless since it should
> always be paired with a WRITE_ONCE() in order to protect against issues
> such as write tearing for example.
>
> So change all the places that update struct btrfs_inode::delayed_inode to
> use WRITE_ONCE() instead of simple assignments.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
> ---
> fs/btrfs/delayed-inode.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index 483c141dc488..6df7e44d9d31 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -106,7 +106,7 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node(
> */
> if (refcount_inc_not_zero(&node->refs)) {
> refcount_inc(&node->refs);
> - btrfs_inode->delayed_node = node;
> + WRITE_ONCE(btrfs_inode->delayed_node, node);
> } else {
> node = NULL;
> }
> @@ -161,7 +161,7 @@ static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
> ASSERT(xa_err(ptr) != -EINVAL);
> ASSERT(xa_err(ptr) != -ENOMEM);
> ASSERT(ptr == NULL);
> - btrfs_inode->delayed_node = node;
> + WRITE_ONCE(btrfs_inode->delayed_node, node);
> xa_unlock(&root->delayed_nodes);
>
> return node;
> @@ -1312,7 +1312,7 @@ void btrfs_remove_delayed_node(struct btrfs_inode *inode)
> if (!delayed_node)
> return;
>
> - inode->delayed_node = NULL;
> + WRITE_ONCE(inode->delayed_node, NULL);
> btrfs_release_delayed_node(delayed_node);
> }
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/3] btrfs: use READ_ONCE() when accessing delayed_node at btrfs_dirty_node()
2024-05-17 13:13 [PATCH 0/3] btrfs: avoid data races when accessing an inode's delayed_node fdmanana
2024-05-17 13:13 ` [PATCH 1/3] btrfs: always set an inode's delayed_inode with WRITE_ONCE() fdmanana
@ 2024-05-17 13:13 ` fdmanana
2024-05-17 22:51 ` Qu Wenruo
2024-05-17 13:13 ` [PATCH 3/3] btrfs: add and use helpers to get and set an inode's delayed_node fdmanana
2024-05-20 15:48 ` [PATCH 0/3] btrfs: avoid data races when accessing " David Sterba
3 siblings, 1 reply; 11+ messages in thread
From: fdmanana @ 2024-05-17 13:13 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
An inode's delayed_node is set while holding the root->delayed_nodes
xarray's lock, so accessing it without holding the lock at
btrfs_dirty_node() is racy and will likely trigger warnings from tools
like KCSAN.
Since we update the inode's delayed_node with WRITE_ONCE(), we can use
READ_ONCE() without taking the lock, as we do in several other places
at delayed-inode.c. So change btrfs_dirty_node() to use READ_ONCE().
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 000809e16aba..11cad22d7b4c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6100,7 +6100,7 @@ static int btrfs_dirty_inode(struct btrfs_inode *inode)
ret = btrfs_update_inode(trans, inode);
}
btrfs_end_transaction(trans);
- if (inode->delayed_node)
+ if (READ_ONCE(inode->delayed_node))
btrfs_balance_delayed_items(fs_info);
return ret;
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/3] btrfs: use READ_ONCE() when accessing delayed_node at btrfs_dirty_node()
2024-05-17 13:13 ` [PATCH 2/3] btrfs: use READ_ONCE() when accessing delayed_node at btrfs_dirty_node() fdmanana
@ 2024-05-17 22:51 ` Qu Wenruo
0 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2024-05-17 22:51 UTC (permalink / raw)
To: fdmanana, linux-btrfs
在 2024/5/17 22:43, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
>
> An inode's delayed_node is set while holding the root->delayed_nodes
> xarray's lock, so accessing it without holding the lock at
> btrfs_dirty_node() is racy and will likely trigger warnings from tools
> like KCSAN.
>
> Since we update the inode's delayed_node with WRITE_ONCE(), we can use
> READ_ONCE() without taking the lock, as we do in several other places
> at delayed-inode.c. So change btrfs_dirty_node() to use READ_ONCE().
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
> ---
> fs/btrfs/inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 000809e16aba..11cad22d7b4c 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6100,7 +6100,7 @@ static int btrfs_dirty_inode(struct btrfs_inode *inode)
> ret = btrfs_update_inode(trans, inode);
> }
> btrfs_end_transaction(trans);
> - if (inode->delayed_node)
> + if (READ_ONCE(inode->delayed_node))
> btrfs_balance_delayed_items(fs_info);
>
> return ret;
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 3/3] btrfs: add and use helpers to get and set an inode's delayed_node
2024-05-17 13:13 [PATCH 0/3] btrfs: avoid data races when accessing an inode's delayed_node fdmanana
2024-05-17 13:13 ` [PATCH 1/3] btrfs: always set an inode's delayed_inode with WRITE_ONCE() fdmanana
2024-05-17 13:13 ` [PATCH 2/3] btrfs: use READ_ONCE() when accessing delayed_node at btrfs_dirty_node() fdmanana
@ 2024-05-17 13:13 ` fdmanana
2024-05-17 22:51 ` Qu Wenruo
2024-05-20 15:48 ` [PATCH 0/3] btrfs: avoid data races when accessing " David Sterba
3 siblings, 1 reply; 11+ messages in thread
From: fdmanana @ 2024-05-17 13:13 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
When reading the delayed_node of an inode without taking the lock of the
root->delayed_nodes we are using READ_ONCE(), and when updating it we are
using WRITE_ONCE().
Add and use helpers that hide the usage of READ_ONCE() and WRITE_ONCE(),
like we do for other inode fields such as first_dir_index_to_log or the
log_transid field of struct btrfs_root for example.
Also make use of the setter helper at btrfs_alloc_inode() for consistency
only - it shouldn't be needed since when allocating an inode no one else
can access it concurrently.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/btrfs_inode.h | 12 ++++++++++++
fs/btrfs/delayed-inode.c | 10 +++++-----
fs/btrfs/inode.c | 4 ++--
3 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4d9299789a03..3c8bc7a8ebdd 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -330,6 +330,18 @@ struct btrfs_inode {
struct inode vfs_inode;
};
+static inline struct btrfs_delayed_node *btrfs_get_inode_delayed_node(
+ const struct btrfs_inode *inode)
+{
+ return READ_ONCE(inode->delayed_node);
+}
+
+static inline void btrfs_set_inode_delayed_node(struct btrfs_inode *inode,
+ struct btrfs_delayed_node *node)
+{
+ WRITE_ONCE(inode->delayed_node, node);
+}
+
static inline u64 btrfs_get_first_dir_index_to_log(const struct btrfs_inode *inode)
{
return READ_ONCE(inode->first_dir_index_to_log);
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 6df7e44d9d31..f2fe488665e8 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -71,7 +71,7 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node(
u64 ino = btrfs_ino(btrfs_inode);
struct btrfs_delayed_node *node;
- node = READ_ONCE(btrfs_inode->delayed_node);
+ node = btrfs_get_inode_delayed_node(btrfs_inode);
if (node) {
refcount_inc(&node->refs);
return node;
@@ -106,7 +106,7 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node(
*/
if (refcount_inc_not_zero(&node->refs)) {
refcount_inc(&node->refs);
- WRITE_ONCE(btrfs_inode->delayed_node, node);
+ btrfs_set_inode_delayed_node(btrfs_inode, node);
} else {
node = NULL;
}
@@ -161,7 +161,7 @@ static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
ASSERT(xa_err(ptr) != -EINVAL);
ASSERT(xa_err(ptr) != -ENOMEM);
ASSERT(ptr == NULL);
- WRITE_ONCE(btrfs_inode->delayed_node, node);
+ btrfs_set_inode_delayed_node(btrfs_inode, node);
xa_unlock(&root->delayed_nodes);
return node;
@@ -1308,11 +1308,11 @@ void btrfs_remove_delayed_node(struct btrfs_inode *inode)
{
struct btrfs_delayed_node *delayed_node;
- delayed_node = READ_ONCE(inode->delayed_node);
+ delayed_node = btrfs_get_inode_delayed_node(inode);
if (!delayed_node)
return;
- WRITE_ONCE(inode->delayed_node, NULL);
+ btrfs_set_inode_delayed_node(inode, NULL);
btrfs_release_delayed_node(delayed_node);
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 11cad22d7b4c..2f3129fe0e58 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6100,7 +6100,7 @@ static int btrfs_dirty_inode(struct btrfs_inode *inode)
ret = btrfs_update_inode(trans, inode);
}
btrfs_end_transaction(trans);
- if (READ_ONCE(inode->delayed_node))
+ if (btrfs_get_inode_delayed_node(inode))
btrfs_balance_delayed_items(fs_info);
return ret;
@@ -8475,7 +8475,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
ei->prop_compress = BTRFS_COMPRESS_NONE;
ei->defrag_compress = BTRFS_COMPRESS_NONE;
- ei->delayed_node = NULL;
+ btrfs_set_inode_delayed_node(ei, NULL);
ei->i_otime_sec = 0;
ei->i_otime_nsec = 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH 3/3] btrfs: add and use helpers to get and set an inode's delayed_node
2024-05-17 13:13 ` [PATCH 3/3] btrfs: add and use helpers to get and set an inode's delayed_node fdmanana
@ 2024-05-17 22:51 ` Qu Wenruo
0 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2024-05-17 22:51 UTC (permalink / raw)
To: fdmanana, linux-btrfs
在 2024/5/17 22:43, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
>
> When reading the delayed_node of an inode without taking the lock of the
> root->delayed_nodes we are using READ_ONCE(), and when updating it we are
> using WRITE_ONCE().
>
> Add and use helpers that hide the usage of READ_ONCE() and WRITE_ONCE(),
> like we do for other inode fields such as first_dir_index_to_log or the
> log_transid field of struct btrfs_root for example.
>
> Also make use of the setter helper at btrfs_alloc_inode() for consistency
> only - it shouldn't be needed since when allocating an inode no one else
> can access it concurrently.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
> ---
> fs/btrfs/btrfs_inode.h | 12 ++++++++++++
> fs/btrfs/delayed-inode.c | 10 +++++-----
> fs/btrfs/inode.c | 4 ++--
> 3 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 4d9299789a03..3c8bc7a8ebdd 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -330,6 +330,18 @@ struct btrfs_inode {
> struct inode vfs_inode;
> };
>
> +static inline struct btrfs_delayed_node *btrfs_get_inode_delayed_node(
> + const struct btrfs_inode *inode)
> +{
> + return READ_ONCE(inode->delayed_node);
> +}
> +
> +static inline void btrfs_set_inode_delayed_node(struct btrfs_inode *inode,
> + struct btrfs_delayed_node *node)
> +{
> + WRITE_ONCE(inode->delayed_node, node);
> +}
> +
> static inline u64 btrfs_get_first_dir_index_to_log(const struct btrfs_inode *inode)
> {
> return READ_ONCE(inode->first_dir_index_to_log);
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index 6df7e44d9d31..f2fe488665e8 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -71,7 +71,7 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node(
> u64 ino = btrfs_ino(btrfs_inode);
> struct btrfs_delayed_node *node;
>
> - node = READ_ONCE(btrfs_inode->delayed_node);
> + node = btrfs_get_inode_delayed_node(btrfs_inode);
> if (node) {
> refcount_inc(&node->refs);
> return node;
> @@ -106,7 +106,7 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node(
> */
> if (refcount_inc_not_zero(&node->refs)) {
> refcount_inc(&node->refs);
> - WRITE_ONCE(btrfs_inode->delayed_node, node);
> + btrfs_set_inode_delayed_node(btrfs_inode, node);
> } else {
> node = NULL;
> }
> @@ -161,7 +161,7 @@ static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
> ASSERT(xa_err(ptr) != -EINVAL);
> ASSERT(xa_err(ptr) != -ENOMEM);
> ASSERT(ptr == NULL);
> - WRITE_ONCE(btrfs_inode->delayed_node, node);
> + btrfs_set_inode_delayed_node(btrfs_inode, node);
> xa_unlock(&root->delayed_nodes);
>
> return node;
> @@ -1308,11 +1308,11 @@ void btrfs_remove_delayed_node(struct btrfs_inode *inode)
> {
> struct btrfs_delayed_node *delayed_node;
>
> - delayed_node = READ_ONCE(inode->delayed_node);
> + delayed_node = btrfs_get_inode_delayed_node(inode);
> if (!delayed_node)
> return;
>
> - WRITE_ONCE(inode->delayed_node, NULL);
> + btrfs_set_inode_delayed_node(inode, NULL);
> btrfs_release_delayed_node(delayed_node);
> }
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 11cad22d7b4c..2f3129fe0e58 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6100,7 +6100,7 @@ static int btrfs_dirty_inode(struct btrfs_inode *inode)
> ret = btrfs_update_inode(trans, inode);
> }
> btrfs_end_transaction(trans);
> - if (READ_ONCE(inode->delayed_node))
> + if (btrfs_get_inode_delayed_node(inode))
> btrfs_balance_delayed_items(fs_info);
>
> return ret;
> @@ -8475,7 +8475,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
> ei->prop_compress = BTRFS_COMPRESS_NONE;
> ei->defrag_compress = BTRFS_COMPRESS_NONE;
>
> - ei->delayed_node = NULL;
> + btrfs_set_inode_delayed_node(ei, NULL);
>
> ei->i_otime_sec = 0;
> ei->i_otime_nsec = 0;
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/3] btrfs: avoid data races when accessing an inode's delayed_node
2024-05-17 13:13 [PATCH 0/3] btrfs: avoid data races when accessing an inode's delayed_node fdmanana
` (2 preceding siblings ...)
2024-05-17 13:13 ` [PATCH 3/3] btrfs: add and use helpers to get and set an inode's delayed_node fdmanana
@ 2024-05-20 15:48 ` David Sterba
2024-05-20 16:58 ` Filipe Manana
3 siblings, 1 reply; 11+ messages in thread
From: David Sterba @ 2024-05-20 15:48 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Fri, May 17, 2024 at 02:13:23PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> We do have some data races when accessing an inode's delayed_node, namely
> we use READ_ONCE() in a couple places while there's no pairing WRITE_ONCE()
> anywhere, and in one place (btrfs_dirty_inode()) we neither user READ_ONCE()
> nor take the lock that protects the delayed_node. So fix these and add
> helpers to access and update an inode's delayed_node.
>
> Filipe Manana (3):
> btrfs: always set an inode's delayed_inode with WRITE_ONCE()
> btrfs: use READ_ONCE() when accessing delayed_node at btrfs_dirty_node()
> btrfs: add and use helpers to get and set an inode's delayed_node
The READ_ONCE for delayed nodes has been there historically but I don't
think it's needed everywhere. The legitimate case is in
btrfs_get_delayed_node() where first use is without lock and then again
recheck under the lock so we do want to read fresh value. This is to
prevent compiler optimization to coalesce the reads.
Writing to delayed node under lock also does not need WRITE_ONCE.
IOW, I would rather remove use of the _ONCE helpers and not add more as
it is not the pattern where it's supposed to be used. You say it's to
prevent load tearing but for a pointer type this does not happen and is
an assumption of the hardware.
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH 0/3] btrfs: avoid data races when accessing an inode's delayed_node
2024-05-20 15:48 ` [PATCH 0/3] btrfs: avoid data races when accessing " David Sterba
@ 2024-05-20 16:58 ` Filipe Manana
2024-05-20 20:20 ` David Sterba
0 siblings, 1 reply; 11+ messages in thread
From: Filipe Manana @ 2024-05-20 16:58 UTC (permalink / raw)
To: dsterba; +Cc: linux-btrfs
On Mon, May 20, 2024 at 4:48 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Fri, May 17, 2024 at 02:13:23PM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > We do have some data races when accessing an inode's delayed_node, namely
> > we use READ_ONCE() in a couple places while there's no pairing WRITE_ONCE()
> > anywhere, and in one place (btrfs_dirty_inode()) we neither user READ_ONCE()
> > nor take the lock that protects the delayed_node. So fix these and add
> > helpers to access and update an inode's delayed_node.
> >
> > Filipe Manana (3):
> > btrfs: always set an inode's delayed_inode with WRITE_ONCE()
> > btrfs: use READ_ONCE() when accessing delayed_node at btrfs_dirty_node()
> > btrfs: add and use helpers to get and set an inode's delayed_node
>
> The READ_ONCE for delayed nodes has been there historically but I don't
> think it's needed everywhere. The legitimate case is in
> btrfs_get_delayed_node() where first use is without lock and then again
> recheck under the lock so we do want to read fresh value. This is to
> prevent compiler optimization to coalesce the reads.
>
> Writing to delayed node under lock also does not need WRITE_ONCE.
>
> IOW, I would rather remove use of the _ONCE helpers and not add more as
> it is not the pattern where it's supposed to be used. You say it's to
> prevent load tearing but for a pointer type this does not happen and is
> an assumption of the hardware.
If you are sure that pointers aren't subject to load/store tearing
issues, then I'm fine with dropping the patchset.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/3] btrfs: avoid data races when accessing an inode's delayed_node
2024-05-20 16:58 ` Filipe Manana
@ 2024-05-20 20:20 ` David Sterba
2024-05-21 14:47 ` David Sterba
0 siblings, 1 reply; 11+ messages in thread
From: David Sterba @ 2024-05-20 20:20 UTC (permalink / raw)
To: Filipe Manana; +Cc: linux-btrfs
On Mon, May 20, 2024 at 05:58:37PM +0100, Filipe Manana wrote:
> On Mon, May 20, 2024 at 4:48 PM David Sterba <dsterba@suse.cz> wrote:
> >
> > On Fri, May 17, 2024 at 02:13:23PM +0100, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > We do have some data races when accessing an inode's delayed_node, namely
> > > we use READ_ONCE() in a couple places while there's no pairing WRITE_ONCE()
> > > anywhere, and in one place (btrfs_dirty_inode()) we neither user READ_ONCE()
> > > nor take the lock that protects the delayed_node. So fix these and add
> > > helpers to access and update an inode's delayed_node.
> > >
> > > Filipe Manana (3):
> > > btrfs: always set an inode's delayed_inode with WRITE_ONCE()
> > > btrfs: use READ_ONCE() when accessing delayed_node at btrfs_dirty_node()
> > > btrfs: add and use helpers to get and set an inode's delayed_node
> >
> > The READ_ONCE for delayed nodes has been there historically but I don't
> > think it's needed everywhere. The legitimate case is in
> > btrfs_get_delayed_node() where first use is without lock and then again
> > recheck under the lock so we do want to read fresh value. This is to
> > prevent compiler optimization to coalesce the reads.
> >
> > Writing to delayed node under lock also does not need WRITE_ONCE.
> >
> > IOW, I would rather remove use of the _ONCE helpers and not add more as
> > it is not the pattern where it's supposed to be used. You say it's to
> > prevent load tearing but for a pointer type this does not happen and is
> > an assumption of the hardware.
>
> If you are sure that pointers aren't subject to load/store tearing
> issues, then I'm fine with dropping the patchset.
This will be in some CPU manual, the thread on LWN
https://lwn.net/Articles/793895/
mentions that. I base my claim on reading about that in various other
discussions on LKML over time. Pointers match the unsigned long type
that is the machine word and register size, assignment from register to
register/memory happens in one go. What could be problematic is constant
(immediate) assigned to register on architectures like SPARC that have
fixed size instructions and the constatnt has to be written in two
steps.
The need for READ_ONCE/WRITE_ONCE is to prevent compiler optimizations
and also the load tearing, but for native types up to unsigned long this
seems to be true.
https://elixir.bootlin.com/linux/latest/source/include/asm-generic/rwonce.h#L29
The only requirement that can possibly cause the tearing even for
pointer is if it's not aligned and could be split over two cachelines.
This should be the case for structures defined in a normal way (ie. no
forced mis-alignment or __packed).
In case of u64 we use _ONCE access because of 32bit architectures if
there's an unlocked access.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/3] btrfs: avoid data races when accessing an inode's delayed_node
2024-05-20 20:20 ` David Sterba
@ 2024-05-21 14:47 ` David Sterba
0 siblings, 0 replies; 11+ messages in thread
From: David Sterba @ 2024-05-21 14:47 UTC (permalink / raw)
To: David Sterba; +Cc: Filipe Manana, linux-btrfs
On Mon, May 20, 2024 at 10:20:26PM +0200, David Sterba wrote:
> On Mon, May 20, 2024 at 05:58:37PM +0100, Filipe Manana wrote:
> > On Mon, May 20, 2024 at 4:48 PM David Sterba <dsterba@suse.cz> wrote:
> > >
> > > On Fri, May 17, 2024 at 02:13:23PM +0100, fdmanana@kernel.org wrote:
> > > > From: Filipe Manana <fdmanana@suse.com>
> > > >
> > > > We do have some data races when accessing an inode's delayed_node, namely
> > > > we use READ_ONCE() in a couple places while there's no pairing WRITE_ONCE()
> > > > anywhere, and in one place (btrfs_dirty_inode()) we neither user READ_ONCE()
> > > > nor take the lock that protects the delayed_node. So fix these and add
> > > > helpers to access and update an inode's delayed_node.
> > > >
> > > > Filipe Manana (3):
> > > > btrfs: always set an inode's delayed_inode with WRITE_ONCE()
> > > > btrfs: use READ_ONCE() when accessing delayed_node at btrfs_dirty_node()
> > > > btrfs: add and use helpers to get and set an inode's delayed_node
> > >
> > > The READ_ONCE for delayed nodes has been there historically but I don't
> > > think it's needed everywhere. The legitimate case is in
> > > btrfs_get_delayed_node() where first use is without lock and then again
> > > recheck under the lock so we do want to read fresh value. This is to
> > > prevent compiler optimization to coalesce the reads.
> > >
> > > Writing to delayed node under lock also does not need WRITE_ONCE.
> > >
> > > IOW, I would rather remove use of the _ONCE helpers and not add more as
> > > it is not the pattern where it's supposed to be used. You say it's to
> > > prevent load tearing but for a pointer type this does not happen and is
> > > an assumption of the hardware.
> >
> > If you are sure that pointers aren't subject to load/store tearing
> > issues, then I'm fine with dropping the patchset.
>
> This will be in some CPU manual, the thread on LWN
> https://lwn.net/Articles/793895/
> mentions that. I base my claim on reading about that in various other
> discussions on LKML over time. Pointers match the unsigned long type
> that is the machine word and register size, assignment from register to
> register/memory happens in one go. What could be problematic is constant
> (immediate) assigned to register on architectures like SPARC that have
> fixed size instructions and the constatnt has to be written in two
> steps.
>
> The need for READ_ONCE/WRITE_ONCE is to prevent compiler optimizations
> and also the load tearing, but for native types up to unsigned long this
> seems to be true.
>
> https://elixir.bootlin.com/linux/latest/source/include/asm-generic/rwonce.h#L29
>
> The only requirement that can possibly cause the tearing even for
> pointer is if it's not aligned and could be split over two cachelines.
> This should be the case for structures defined in a normal way (ie. no
> forced mis-alignment or __packed).
For future reference: documented for example in Intel 64 and IA-32
Architectures, Software Developer's Manual, volume 3A, 9.2 Memory
ordering.
From section 9.2.3.1: "[...] Instructions that read or write a quadword
(8 bytes) whose address is aligned on an 8 byte boundary."
^ permalink raw reply [flat|nested] 11+ messages in thread