* [PATCH 1/2] Btrfs: serialize unlocked dio reads with truncate
From: Miao Xie @ 2013-01-31 9:23 UTC
To: Linux Btrfs
Currently, we can do unlocked dio reads, but the following race
is possible:
dio_read_task                   truncate_task
                                ->btrfs_setattr()
->btrfs_direct_IO
  ->__blockdev_direct_IO
    ->btrfs_get_block
                                  ->btrfs_truncate()
                                    #alloc truncated blocks
                                    #to other inode
    ->submit_io()
    #INFORMATION LEAK
To avoid this problem, we must serialize unlocked dio reads with
truncate by using inode_dio_wait().
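For illustration only, the interleaving above can be replayed deterministically in userspace C. The shared block array, the lowest-free-block allocator, and every name below are hypothetical simplifications rather than btrfs code; the sketch only shows why a block mapping obtained before truncate can point at another inode's data by the time the I/O is submitted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy "disk": a few fixed-size blocks shared by all inodes. */
#define NBLOCKS	4
#define BLKSZ	8

static char disk[NBLOCKS][BLKSZ];
static int block_used[NBLOCKS];

/* Hand out the lowest free block, as a simple allocator might. */
static int alloc_block(void)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (!block_used[i]) {
			block_used[i] = 1;
			return i;
		}
	}
	return -1;
}

static void free_block(int b)
{
	block_used[b] = 0;
}

/*
 * Replays the diagram with no serialization between the unlocked DIO
 * reader and truncate, and copies what the reader would see into buf.
 */
static void replay_race(char buf[BLKSZ])
{
	memset(disk, 0, sizeof(disk));
	memset(block_used, 0, sizeof(block_used));

	/* Inode A owns one block holding its own data (7 chars + NUL). */
	int a_block = alloc_block();
	assert(a_block >= 0);
	memcpy(disk[a_block], "A-data!", BLKSZ);

	/* dio_read_task: ->btrfs_get_block maps the read to a_block. */
	int mapped = a_block;

	/* truncate_task: ->btrfs_truncate() frees A's block ... */
	free_block(a_block);
	/* ... and another inode B immediately reuses it for private data. */
	int b_block = alloc_block();
	assert(b_block >= 0);
	memcpy(disk[b_block], "B-priv!", BLKSZ);

	/* dio_read_task: ->submit_io() reads through the stale mapping. */
	memcpy(buf, disk[mapped], BLKSZ);
}
```

With this toy allocator the freed block is handed straight to inode B, so a reader of inode A ends up with B's bytes in its buffer.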
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/btrfs_inode.h | 19 +++++++++++++++++++
 fs/btrfs/inode.c       | 31 +++++++++++++++++++++++++++----
 2 files changed, 46 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 2a8c242..00e2601 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -40,6 +40,7 @@
 #define BTRFS_INODE_HAS_ASYNC_EXTENT 6
 #define BTRFS_INODE_NEEDS_FULL_SYNC 7
 #define BTRFS_INODE_COPY_EVERYTHING 8
+#define BTRFS_INODE_READDIO_NEED_LOCK 9
 
 /* in memory btrfs inode */
 struct btrfs_inode {
@@ -216,4 +217,22 @@ static inline int btrfs_inode_in_log(struct inode *inode, u64 generation)
 	return 0;
 }
 
+/*
+ * Disable the DIO read nolock optimization, so that new DIO readers are
+ * forced to grab i_mutex. This keeps truncate from waiting endlessly in
+ * inode_dio_wait() while new unlocked DIO reads keep arriving.
+ */
+static inline void btrfs_inode_block_unlocked_dio(struct inode *inode)
+{
+	set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &BTRFS_I(inode)->runtime_flags);
+	smp_mb();
+}
+
+static inline void btrfs_inode_resume_unlocked_dio(struct inode *inode)
+{
+	smp_mb__before_clear_bit();
+	clear_bit(BTRFS_INODE_READDIO_NEED_LOCK,
+		  &BTRFS_I(inode)->runtime_flags);
+}
+
 #endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 97f4c30..d17a04b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3785,6 +3785,11 @@ static int btrfs_setsize(struct inode *inode, loff_t newsize)
 
 		/* we don't support swapfiles, so vmtruncate shouldn't fail */
 		truncate_setsize(inode, newsize);
+
+		btrfs_inode_block_unlocked_dio(inode);
+		inode_dio_wait(inode);
+		btrfs_inode_resume_unlocked_dio(inode);
+
 		ret = btrfs_truncate(inode);
 	}
 
@@ -6583,15 +6588,33 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
+	int flags = 0;
+	bool wakeup = false;
+	int ret;
 
 	if (check_direct_IO(BTRFS_I(inode)->root, rw, iocb, iov,
 			    offset, nr_segs))
 		return 0;
 
-	return __blockdev_direct_IO(rw, iocb, inode,
-			BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev,
-			iov, offset, nr_segs, btrfs_get_blocks_direct, NULL,
-			btrfs_submit_direct, 0);
+	if (rw == READ) {
+		atomic_inc(&inode->i_dio_count);
+		smp_mb__after_atomic_inc();
+		if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
+				      &BTRFS_I(inode)->runtime_flags))) {
+			inode_dio_done(inode);
+			flags = DIO_LOCKING | DIO_SKIP_HOLES;
+		} else {
+			wakeup = true;
+		}
+	}
+
+	ret = __blockdev_direct_IO(rw, iocb, inode,
+			BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev,
+			iov, offset, nr_segs, btrfs_get_blocks_direct, NULL,
+			btrfs_submit_direct, flags);
+	if (wakeup)
+		inode_dio_done(inode);
+	return ret;
 }
 
 #define BTRFS_FIEMAP_FLAGS (FIEMAP_FLAG_SYNC)
--
1.7.11.7
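The block/resume helpers and the READ branch of btrfs_direct_IO() in this patch form a flag-plus-counter handshake: the reader bumps i_dio_count and then tests the flag, while truncate sets the flag and then waits for the count to drain. The sketch below models that ordering in userspace with C11 <stdatomic.h>. struct toy_inode and all function names are hypothetical; sequentially consistent atomics stand in for smp_mb()/smp_mb__after_atomic_inc(), and the busy-wait stands in for inode_dio_wait(), which sleeps instead of spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for i_dio_count and BTRFS_INODE_READDIO_NEED_LOCK. */
struct toy_inode {
	atomic_int dio_count;
	atomic_bool need_lock;
};

static void toy_inode_init(struct toy_inode *ino)
{
	atomic_init(&ino->dio_count, 0);
	atomic_init(&ino->need_lock, false);
}

/*
 * Reader side: announce the read first, then check the flag (seq_cst
 * ordering models the full barrier). Returns true if the read may run
 * unlocked; false means the caller must take the locked DIO path.
 */
static bool dio_read_begin_unlocked(struct toy_inode *ino)
{
	atomic_fetch_add(&ino->dio_count, 1);
	if (atomic_load(&ino->need_lock)) {
		/* Truncate is draining readers: back out, like inode_dio_done(). */
		atomic_fetch_sub(&ino->dio_count, 1);
		return false;
	}
	return true;
}

static void dio_read_end_unlocked(struct toy_inode *ino)
{
	int old = atomic_fetch_sub(&ino->dio_count, 1);
	assert(old > 0);	/* begin/end must be paired */
}

/*
 * Truncate side: set the flag, then wait for in-flight unlocked readers
 * to finish, then clear the flag again.
 */
static void truncate_drain_unlocked_dio(struct toy_inode *ino)
{
	atomic_store(&ino->need_lock, true);
	while (atomic_load(&ino->dio_count) != 0)
		;	/* models inode_dio_wait() */
	atomic_store(&ino->need_lock, false);
}
```

Because both sides issue a full barrier between their store and their load, at least one of them observes the other: either the reader sees the flag and backs out, or truncate sees the elevated count and waits. The flag also keeps new readers off the unlocked path, so the wait only has to cover readers already in flight.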
* Re: [PATCH 1/2] Btrfs: serialize unlocked dio reads with truncate
From: Josef Bacik @ 2013-01-31 16:40 UTC
To: Miao Xie; +Cc: Linux Btrfs
On Thu, Jan 31, 2013 at 02:23:19AM -0700, Miao Xie wrote:
> Currently, we can do unlocked dio reads, but the following race
> is possible:
>
> dio_read_task                   truncate_task
>                                 ->btrfs_setattr()
> ->btrfs_direct_IO
>   ->__blockdev_direct_IO
>     ->btrfs_get_block
>                                   ->btrfs_truncate()
>                                     #alloc truncated blocks
>                                     #to other inode
>     ->submit_io()
>     #INFORMATION LEAK
>
> To avoid this problem, we must serialize unlocked dio reads with
> truncate by using inode_dio_wait().
>
So I've been thinking about this: are we sure we don't just want to lock the
extent range when we truncate? I'm fine with this approach, but it seems like we
might as well be consistent and use the extent locks. What do you think? Thanks,
Josef
* Re: [PATCH 1/2] Btrfs: serialize unlocked dio reads with truncate
From: Miao Xie @ 2013-02-01 5:56 UTC
To: Josef Bacik; +Cc: Linux Btrfs
On Thu, 31 Jan 2013 11:40:41 -0500, Josef Bacik wrote:
> So I've been thinking about this: are we sure we don't just want to lock the
> extent range when we truncate? I'm fine with this approach, but it seems like we
> might as well be consistent and use the extent locks. What do you think? Thanks,
But compared with the current approach, the extent lock approach has the following problem:
Dio_Read_Task                   Truncate_task
                                truncate file
                                  set isize to 4096
                                  drop pages
lock extent[4096, 8191]
read extent[4096, 8191]
unlock extent[4096, 8191]
                                  lock extent[4096, -1ULL]
                                  truncate item
                                  unlock extent[4096, -1ULL]
lock extent[8192, ...]
read extent[8192, ...]
  no extent item
  zero the buffer
unlock extent[8192, ...]
So the reader gets old data mixed with new data (the zeroed range). (Punch hole
has this problem too; we need to fix it.)
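A deterministic replay of the interleaving in the diagram above makes the point concrete: every step holds an extent-range lock, yet the reader's two critical sections straddle truncate's, so a single read returns old bytes for [4096, 8191] and zeros for [8192, ...]. struct toy_file, its two-extent layout, and the function names are hypothetical; real extent locking, i_size handling and I/O submission are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CHUNK 4096

/*
 * Hypothetical toy file with two 4 KiB extent items past the new EOF:
 * index 0 covers [4096, 8191], index 1 covers [8192, 12287].
 */
struct toy_file {
	bool present[2];
	unsigned char data[2][CHUNK];
};

/* One reader step: "lock extent, read extent, unlock extent".
 * If the extent item is gone, the buffer is zero-filled. */
static void read_chunk(const struct toy_file *f, int idx, unsigned char *dst)
{
	assert(idx == 0 || idx == 1);
	if (f->present[idx])
		memcpy(dst, f->data[idx], CHUNK);
	else
		memset(dst, 0, CHUNK);
}

/* Truncate's locked step on [4096, -1ULL]: drop both extent items. */
static void truncate_items(struct toy_file *f)
{
	f->present[0] = false;
	f->present[1] = false;
}

/* Replays the diagram: reader chunk, truncate, reader chunk. */
static void replay_extent_lock_race(struct toy_file *f,
				    unsigned char buf[2 * CHUNK])
{
	read_chunk(f, 0, buf);		/* [4096, 8191]: still present */
	truncate_items(f);		/* [4096, -1ULL]: items removed */
	read_chunk(f, 1, buf + CHUNK);	/* [8192, ...]: no item, zeros */
}
```

Filling extent 0 with 'a' and extent 1 with 'b' before the replay leaves the buffer holding 'a' bytes followed by zeros, never the 'b' bytes, which is exactly the mix of old and new data described above.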
Thanks
Miao
* Re: [PATCH 1/2] Btrfs: serialize unlocked dio reads with truncate
From: Josef Bacik @ 2013-02-01 14:40 UTC
To: Miao Xie; +Cc: Josef Bacik, Linux Btrfs
On Thu, Jan 31, 2013 at 10:56:34PM -0700, Miao Xie wrote:
> But compared with the current approach, the extent lock approach has the following problem:
> Dio_Read_Task                   Truncate_task
>                                 truncate file
>                                   set isize to 4096
>                                   drop pages
> lock extent[4096, 8191]
> read extent[4096, 8191]
> unlock extent[4096, 8191]
>                                   lock extent[4096, -1ULL]
>                                   truncate item
>                                   unlock extent[4096, -1ULL]
> lock extent[8192, ...]
> read extent[8192, ...]
>   no extent item
>   zero the buffer
> unlock extent[8192, ...]
>
> So the reader gets old data mixed with new data (the zeroed range). (Punch hole
> has this problem too; we need to fix it.)
So this case is fine: we'll still get valid data, since the extents would still
be there. If you are mixing dio reads with simultaneous truncate/hole punching,
you deserve to get your ass bitten :). The other option would be to lock before
we set the isize, or to check the isize in get_extents. Thanks,
Josef
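The second alternative Josef mentions, checking the isize when the extent is looked up, could look roughly like the sketch below. map_range_checked() and enum map_kind are hypothetical stand-ins, not the btrfs get_extent API; the sketch assumes the caller already holds the extent-range lock for [start, start + len) and reads i_size while holding it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of mapping one DIO range. */
enum map_kind { MAP_EXTENT, MAP_HOLE_OR_EOF };

/*
 * Hypothetical lookup helper, assumed to run with [start, start + *len)
 * extent-locked. Anything at or beyond i_size is reported as EOF instead
 * of being mapped, and a range that crosses i_size is clamped to it.
 */
static enum map_kind map_range_checked(uint64_t i_size, uint64_t start,
				       uint64_t *len)
{
	assert(*len > 0);
	if (start >= i_size) {
		*len = 0;
		return MAP_HOLE_OR_EOF;
	}
	if (start + *len > i_size)
		*len = i_size - start;	/* clamp to the current EOF */
	return MAP_EXTENT;
}
```

In Miao's diagram, i_size is already 4096 when the reader locks [4096, 8191], so with such a check that lookup would report EOF rather than returning old data followed by zeros.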