* [PATCH 1/3] Btrfs: don't skip the page flush since the enospc is not brought by it
@ 2014-02-27 5:58 Miao Xie
2014-02-27 5:58 ` [PATCH 2/3] Btrfs: fix wrong lock range and write size in check_can_nocow() Miao Xie
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Miao Xie @ 2014-02-27 5:58 UTC (permalink / raw)
To: linux-btrfs
As we know, btrfs flushes the continuous pages as many as possible,
but if all the free spaces are small, we will allocate the spaces by
several times, and if there is something wrong with the space
reservation, it is very likely that some allocations succeed and
the others fail. But the current code doesn't take this case into
account, and set the error flag for the pages though their spaces
are allocated successfully. It introduces a problem that we can not
umount the fs after the above problem happens because we need wait
for the flush of those pages whose spaces are allocated. This patch
fixes the above problem, and makes the btrfs developers happy when
they investigate the problem of the space reservation.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/extent_io.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fbe501d..97595ea 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3198,8 +3198,19 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
&nr_written);
/* File system has been set read-only */
if (ret) {
- SetPageError(page);
- goto done;
+ /*
+ * Private2 means we allocate the space
+ * for this page successfully, and enospc
+ * error doesn't set the fs to be R/O, so
+ * we can write out the page and needn't
+ * set error flag for this page. If so, we
+ * can prevent the umount task from being
+ * blocked.
+ */
+ if (!(ret == -ENOSPC && PagePrivate2(page))) {
+ SetPageError(page);
+ goto done;
+ }
}
/*
* delalloc_end is already one less than the total
--
1.8.1.4
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 2/3] Btrfs: fix wrong lock range and write size in check_can_nocow()
2014-02-27 5:58 [PATCH 1/3] Btrfs: don't skip the page flush since the enospc is not brought by it Miao Xie
@ 2014-02-27 5:58 ` Miao Xie
2014-02-27 5:58 ` [PATCH 3/3] Btrfs: fix preallocate vs double nocow write Miao Xie
2014-03-07 16:00 ` [PATCH 1/3] Btrfs: don't skip the page flush since the enospc is not brought by it Josef Bacik
2 siblings, 0 replies; 4+ messages in thread
From: Miao Xie @ 2014-02-27 5:58 UTC (permalink / raw)
To: linux-btrfs
The write range may not be sector-aligned, for example:
|--------|--------| <- write range, sector-unaligned, size: 2blocks
|--------|--------|--------| <- correct lock range, size: 3blocks
But according to the old code, we used the size of write range to calculate
the lock range directly, not considered the offset, we would get a wrong lock
range:
|--------|--------| <- write range, sector-unaligned, size: 2blocks
|--------|--------| <- wrong lock range, size: 2blocks
And besides that, the old code also had the same problem when calculating
the real write size. Correct them.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/file.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0165b86..56f8a07 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1397,7 +1397,7 @@ static noinline int check_can_nocow(struct inode *inode, loff_t pos,
int ret;
lockstart = round_down(pos, root->sectorsize);
- lockend = lockstart + round_up(*write_bytes, root->sectorsize) - 1;
+ lockend = round_up(pos + *write_bytes, root->sectorsize) - 1;
while (1) {
lock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend);
@@ -1420,7 +1420,8 @@ static noinline int check_can_nocow(struct inode *inode, loff_t pos,
EXTENT_DIRTY | EXTENT_DELALLOC |
EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 0, 0,
NULL, GFP_NOFS);
- *write_bytes = min_t(size_t, *write_bytes, num_bytes);
+ *write_bytes = min_t(size_t, *write_bytes ,
+ num_bytes - pos + lockstart);
}
unlock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 3/3] Btrfs: fix preallocate vs double nocow write
2014-02-27 5:58 [PATCH 1/3] Btrfs: don't skip the page flush since the enospc is not brought by it Miao Xie
2014-02-27 5:58 ` [PATCH 2/3] Btrfs: fix wrong lock range and write size in check_can_nocow() Miao Xie
@ 2014-02-27 5:58 ` Miao Xie
2014-03-07 16:00 ` [PATCH 1/3] Btrfs: don't skip the page flush since the enospc is not brought by it Josef Bacik
2 siblings, 0 replies; 4+ messages in thread
From: Miao Xie @ 2014-02-27 5:58 UTC (permalink / raw)
To: linux-btrfs
We can not release the reserved metadata space for the first write if we
find the write position is pre-allocated. Because the kernel might write
the data on the disk before we do the second write but after the can-nocow
check, if we release the space for the first write, we might fail to update
the metadata because of no space.
Fix this problem by end nocow write if there is dirty data in the range whose
space is pre-allocated.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/file.c | 9 ++-------
fs/btrfs/inode.c | 16 +++++++++++++++-
2 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 56f8a07..1654d68 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1413,16 +1413,11 @@ static noinline int check_can_nocow(struct inode *inode, loff_t pos,
num_bytes = lockend - lockstart + 1;
ret = can_nocow_extent(inode, lockstart, &num_bytes, NULL, NULL, NULL);
- if (ret <= 0) {
+ if (ret <= 0)
ret = 0;
- } else {
- clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, lockend,
- EXTENT_DIRTY | EXTENT_DELALLOC |
- EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 0, 0,
- NULL, GFP_NOFS);
+ else
*write_bytes = min_t(size_t, *write_bytes ,
num_bytes - pos + lockstart);
- }
unlock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1af34d0..17415a3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6546,6 +6546,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
int ret;
struct extent_buffer *leaf;
struct btrfs_root *root = BTRFS_I(inode)->root;
+ struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
struct btrfs_file_extent_item *fi;
struct btrfs_key key;
u64 disk_bytenr;
@@ -6622,6 +6623,20 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
if (btrfs_extent_readonly(root, disk_bytenr))
goto out;
+
+ num_bytes = min(offset + *len, extent_end) - offset;
+ if (!nocow && found_type == BTRFS_FILE_EXTENT_PREALLOC) {
+ u64 range_end;
+
+ range_end = round_up(offset + num_bytes, root->sectorsize) - 1;
+ ret = test_range_bit(io_tree, offset, range_end,
+ EXTENT_DELALLOC, 0, NULL);
+ if (ret) {
+ ret = -EAGAIN;
+ goto out;
+ }
+ }
+
btrfs_release_path(path);
/*
@@ -6650,7 +6665,6 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
*/
disk_bytenr += backref_offset;
disk_bytenr += offset - key.offset;
- num_bytes = min(offset + *len, extent_end) - offset;
if (csum_exist_in_range(root, disk_bytenr, num_bytes))
goto out;
/*
--
1.8.1.4
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH 1/3] Btrfs: don't skip the page flush since the enospc is not brought by it
2014-02-27 5:58 [PATCH 1/3] Btrfs: don't skip the page flush since the enospc is not brought by it Miao Xie
2014-02-27 5:58 ` [PATCH 2/3] Btrfs: fix wrong lock range and write size in check_can_nocow() Miao Xie
2014-02-27 5:58 ` [PATCH 3/3] Btrfs: fix preallocate vs double nocow write Miao Xie
@ 2014-03-07 16:00 ` Josef Bacik
2 siblings, 0 replies; 4+ messages in thread
From: Josef Bacik @ 2014-03-07 16:00 UTC (permalink / raw)
To: Miao Xie, linux-btrfs
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 02/27/2014 12:58 AM, Miao Xie wrote:
> As we know, btrfs flushes the continuous pages as many as
> possible, but if all the free spaces are small, we will allocate
> the spaces by several times, and if there is something wrong with
> the space reservation, it is very likely that some allocations
> succeed and the others fail. But the current code doesn't take this
> case into account, and set the error flag for the pages though
> their spaces are allocated successfully. It introduces a problem
> that we can not umount the fs after the above problem happens
> because we need wait for the flush of those pages whose spaces are
> allocated. This patch fixes the above problem, and makes the btrfs
> developers happy when they investigate the problem of the space
> reservation.
>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> ---
> fs/btrfs/extent_io.c | 15 +++++++++++++-- 1 file changed, 13
> insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index
> fbe501d..97595ea 100644 --- a/fs/btrfs/extent_io.c +++
> b/fs/btrfs/extent_io.c @@ -3198,8 +3198,19 @@ static int
> __extent_writepage(struct page *page, struct writeback_control
> *wbc, &nr_written); /* File system has been set read-only */ if
> (ret) { - SetPageError(page); - goto done; + /* + *
> Private2 means we allocate the space + * for this page
> successfully, and enospc + * error doesn't set the fs to be
> R/O, so + * we can write out the page and needn't + * set
> error flag for this page. If so, we + * can prevent the umount
> task from being + * blocked. + */ + if (!(ret == -ENOSPC
> && PagePrivate2(page))) { + SetPageError(page); + goto
> done; + }
So now we just ignore ret == -ENOSPC, we still need to return this
back to the caller. Thanks,
Josef
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQIcBAEBAgAGBQJTGez/AAoJEANb+wAKly3B9z4QAJ7VWrt7yfW+IpjzVGViPrmb
S/Q+nuUEZjFk9huKePz33MaT76gWZlrFM6ZgfSLIEP3gPDaQIz+mrCmDBDSnD0uT
ZJypte70l74Rfmp0gBHs+kv9VQPQkeDE17PMGCJBHnqvm/P45PgYU81nWAMhh3uQ
Khsu5TDW7Tddi8FHyuD9a8TBdMlZfRLDOvWCTY/9OUPpRhaOhL8RBffw6iKfnq/t
M32nGEbZGH1y/kWWc39JNKlUwzW0m0oclznPKp0wXo+lMFmHwl76TGgBl0tvLyAs
VGhv101dH0b7vuQOerXSajVjBGFUqovIW3lDPwf4aU4XWSSebbXxeaYiEs5YMFU4
ivzXEMpYBHEteSW9TzJ0Th+D60LhG2vApsG10Ewj7xEblCjcaO8mej6L0j7citw0
WtNiSACyvIlw4zCWT3699UJlmIJ2fsrxEb5SKfE5QFpOXshhSciA8YZJyuX2EBVE
je/rUMhTLCqkuY4zKZLzJiqRhNGQQbWZn4+SmRxpg5RsuxzIZzZcWiEEfucdhB6w
7fKhtMBcY4rUzszG2otEbQaPLNvUa/u04448/gsYhvwy9II0WSkLbe3LF0IS1Z6o
DM0vUE3FAI80EL/fwyg2VvyxOOSYSE9feoX5ClJOZ5qJamdSqLI9PND04lMz6Dp0
+UUKtPx2wn6CnqO3vBeH
=94MT
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-03-07 16:00 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-27 5:58 [PATCH 1/3] Btrfs: don't skip the page flush since the enospc is not brought by it Miao Xie
2014-02-27 5:58 ` [PATCH 2/3] Btrfs: fix wrong lock range and write size in check_can_nocow() Miao Xie
2014-02-27 5:58 ` [PATCH 3/3] Btrfs: fix preallocate vs double nocow write Miao Xie
2014-03-07 16:00 ` [PATCH 1/3] Btrfs: don't skip the page flush since the enospc is not brought by it Josef Bacik
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.