From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
alan@lxorguk.ukuu.org.uk, Dmitry Monakhov <dmonakhov@openvz.org>,
"Theodore Tso" <tytso@mit.edu>
Subject: [ 01/42] ext4: race-condition protection for ext4_convert_unwritten_extents_endio
Date: Thu, 25 Oct 2012 17:05:09 -0700 [thread overview]
Message-ID: <20121026000124.939750227@linuxfoundation.org> (raw)
In-Reply-To: <20121026000124.790781113@linuxfoundation.org>
3.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Monakhov <dmonakhov@openvz.org>
commit dee1f973ca341c266229faa5a1a5bb268bed3531 upstream.
We assumed that at the time we call ext4_convert_unwritten_extents_endio()
extent in question is fully inside [map.m_lblk, map->m_len] because
it was already split during submission. But this may not be true due to
a race between writeback vs fallocate.
If extent in question is larger than requested we will split it again.
Special precautions should being done if zeroout required because
[map.m_lblk, map->m_len] already contains valid data.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/ext4/extents.c | 57 +++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 46 insertions(+), 11 deletions(-)
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -52,6 +52,9 @@
#define EXT4_EXT_MARK_UNINIT1 0x2 /* mark first half uninitialized */
#define EXT4_EXT_MARK_UNINIT2 0x4 /* mark second half uninitialized */
+#define EXT4_EXT_DATA_VALID1 0x8 /* first half contains valid data */
+#define EXT4_EXT_DATA_VALID2 0x10 /* second half contains valid data */
+
static int ext4_split_extent(handle_t *handle,
struct inode *inode,
struct ext4_ext_path *path,
@@ -2829,6 +2832,9 @@ static int ext4_split_extent_at(handle_t
unsigned int ee_len, depth;
int err = 0;
+ BUG_ON((split_flag & (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2)) ==
+ (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2));
+
ext_debug("ext4_split_extents_at: inode %lu, logical"
"block %llu\n", inode->i_ino, (unsigned long long)split);
@@ -2887,7 +2893,14 @@ static int ext4_split_extent_at(handle_t
err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
if (err == -ENOSPC && (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
- err = ext4_ext_zeroout(inode, &orig_ex);
+ if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
+ if (split_flag & EXT4_EXT_DATA_VALID1)
+ err = ext4_ext_zeroout(inode, ex2);
+ else
+ err = ext4_ext_zeroout(inode, ex);
+ } else
+ err = ext4_ext_zeroout(inode, &orig_ex);
+
if (err)
goto fix_extent_len;
/* update the extent length and mark as initialized */
@@ -2940,12 +2953,13 @@ static int ext4_split_extent(handle_t *h
uninitialized = ext4_ext_is_uninitialized(ex);
if (map->m_lblk + map->m_len < ee_block + ee_len) {
- split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT ?
- EXT4_EXT_MAY_ZEROOUT : 0;
+ split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
flags1 = flags | EXT4_GET_BLOCKS_PRE_IO;
if (uninitialized)
split_flag1 |= EXT4_EXT_MARK_UNINIT1 |
EXT4_EXT_MARK_UNINIT2;
+ if (split_flag & EXT4_EXT_DATA_VALID2)
+ split_flag1 |= EXT4_EXT_DATA_VALID1;
err = ext4_split_extent_at(handle, inode, path,
map->m_lblk + map->m_len, split_flag1, flags1);
if (err)
@@ -2958,8 +2972,8 @@ static int ext4_split_extent(handle_t *h
return PTR_ERR(path);
if (map->m_lblk >= ee_block) {
- split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT ?
- EXT4_EXT_MAY_ZEROOUT : 0;
+ split_flag1 = split_flag & (EXT4_EXT_MAY_ZEROOUT |
+ EXT4_EXT_DATA_VALID2);
if (uninitialized)
split_flag1 |= EXT4_EXT_MARK_UNINIT1;
if (split_flag & EXT4_EXT_MARK_UNINIT2)
@@ -3237,26 +3251,47 @@ static int ext4_split_unwritten_extents(
split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
split_flag |= EXT4_EXT_MARK_UNINIT2;
-
+ if (flags & EXT4_GET_BLOCKS_CONVERT)
+ split_flag |= EXT4_EXT_DATA_VALID2;
flags |= EXT4_GET_BLOCKS_PRE_IO;
return ext4_split_extent(handle, inode, path, map, split_flag, flags);
}
static int ext4_convert_unwritten_extents_endio(handle_t *handle,
- struct inode *inode,
- struct ext4_ext_path *path)
+ struct inode *inode,
+ struct ext4_map_blocks *map,
+ struct ext4_ext_path *path)
{
struct ext4_extent *ex;
+ ext4_lblk_t ee_block;
+ unsigned int ee_len;
int depth;
int err = 0;
depth = ext_depth(inode);
ex = path[depth].p_ext;
+ ee_block = le32_to_cpu(ex->ee_block);
+ ee_len = ext4_ext_get_actual_len(ex);
ext_debug("ext4_convert_unwritten_extents_endio: inode %lu, logical"
"block %llu, max_blocks %u\n", inode->i_ino,
- (unsigned long long)le32_to_cpu(ex->ee_block),
- ext4_ext_get_actual_len(ex));
+ (unsigned long long)ee_block, ee_len);
+
+ /* If extent is larger than requested then split is required */
+ if (ee_block != map->m_lblk || ee_len > map->m_len) {
+ err = ext4_split_unwritten_extents(handle, inode, map, path,
+ EXT4_GET_BLOCKS_CONVERT);
+ if (err < 0)
+ goto out;
+ ext4_ext_drop_refs(path);
+ path = ext4_ext_find_extent(inode, map->m_lblk, path);
+ if (IS_ERR(path)) {
+ err = PTR_ERR(path);
+ goto out;
+ }
+ depth = ext_depth(inode);
+ ex = path[depth].p_ext;
+ }
err = ext4_ext_get_access(handle, inode, path + depth);
if (err)
@@ -3564,7 +3599,7 @@ ext4_ext_handle_uninitialized_extents(ha
}
/* IO end_io complete, convert the filled extent to written */
if ((flags & EXT4_GET_BLOCKS_CONVERT)) {
- ret = ext4_convert_unwritten_extents_endio(handle, inode,
+ ret = ext4_convert_unwritten_extents_endio(handle, inode, map,
path);
if (ret >= 0) {
ext4_update_inode_fsync_trans(handle, inode, 1);
next prev parent reply other threads:[~2012-10-26 0:05 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-26 0:05 [ 00/42] 3.4.16-stable review Greg Kroah-Hartman
2012-10-26 0:05 ` Greg Kroah-Hartman [this message]
2012-10-26 0:05 ` [ 02/42] ext4: Avoid underflow in ext4_trim_fs() Greg Kroah-Hartman
2012-10-26 0:05 ` [ 03/42] nohz: Fix idle ticks in cpu summary line of /proc/stat Greg Kroah-Hartman
2012-10-26 0:05 ` [ 04/42] arch/tile: avoid generating .eh_frame information in modules Greg Kroah-Hartman
2012-10-26 0:05 ` [ 05/42] NLM: nlm_lookup_file() may return NLMv4-specific error codes Greg Kroah-Hartman
2012-10-26 0:05 ` [ 06/42] oprofile, x86: Fix wrapping bug in op_x86_get_ctrl() Greg Kroah-Hartman
2012-10-26 0:05 ` [ 07/42] s390: fix linker script for 31 bit builds Greg Kroah-Hartman
2012-10-26 0:05 ` [ 08/42] SUNRPC: Prevent kernel stack corruption on long values of flush Greg Kroah-Hartman
2012-10-26 0:05 ` [ 09/42] SUNRPC: Fix a UDP transport regression Greg Kroah-Hartman
2012-10-26 0:05 ` [ 10/42] pcmcia: sharpsl: dont discard sharpsl_pcmcia_ops Greg Kroah-Hartman
2012-10-26 0:05 ` [ 11/42] kernel/sys.c: fix stack memory content leak via UNAME26 Greg Kroah-Hartman
2012-10-26 0:05 ` [ 13/42] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping Greg Kroah-Hartman
2012-10-26 0:05 ` [ 14/42] xen/x86: dont corrupt %eip when returning from a signal handler Greg Kroah-Hartman
2012-10-26 0:05 ` [ 15/42] USB: cdc-acm: fix pipe type of write endpoint Greg Kroah-Hartman
2012-10-26 0:05 ` [ 16/42] usb: acm: fix the computation of the number of data bits Greg Kroah-Hartman
2012-10-26 0:05 ` [ 17/42] usb: host: xhci: New system added for Compliance Mode Patch on SN65LVPE502CP Greg Kroah-Hartman
2012-10-26 0:05 ` [ 18/42] USB: option: blacklist net interface on ZTE devices Greg Kroah-Hartman
2012-10-26 0:05 ` [ 19/42] USB: option: add more " Greg Kroah-Hartman
2012-10-26 0:05 ` [ 20/42] cgroup: notify_on_release may not be triggered in some cases Greg Kroah-Hartman
2012-10-26 0:05 ` [ 21/42] Revert "cgroup: Remove task_lock() from cgroup_post_fork()" Greg Kroah-Hartman
2012-10-26 0:05 ` [ 22/42] Revert "cgroup: Drop task_lock(parent) on cgroup_fork()" Greg Kroah-Hartman
2012-10-26 0:05 ` [ 23/42] pinctrl: tegra: correct bank for pingroup and drv pingroup Greg Kroah-Hartman
2012-10-26 0:05 ` [ 24/42] pinctrl: tegra: set low power mode bank width to 2 Greg Kroah-Hartman
2012-10-26 0:05 ` [ 25/42] iommu/tegra: smmu: Fix deadly typo Greg Kroah-Hartman
2012-10-26 0:05 ` [ 26/42] amd64_edac:__amd64_set_scrub_rate(): avoid overindexing scrubrates[] Greg Kroah-Hartman
2012-10-26 0:05 ` [ 27/42] usb: dwc3: gadget: fix endpoint always busy bug Greg Kroah-Hartman
2012-10-26 0:05 ` [ 28/42] media: au0828: fix case where STREAMOFF being called on stopped stream causes BUG() Greg Kroah-Hartman
2012-10-26 0:05 ` [ 29/42] netlink: add reference of module in netlink_dump_start Greg Kroah-Hartman
2012-10-26 0:05 ` [ 30/42] infiniband: pass rdma_cm module to netlink_dump_start Greg Kroah-Hartman
2012-10-26 0:05 ` [ 31/42] net: Fix skb_under_panic oops in neigh_resolve_output Greg Kroah-Hartman
2012-10-26 0:05 ` [ 32/42] skge: Add DMA mask quirk for Marvell 88E8001 on ASUS P5NSLI motherboard Greg Kroah-Hartman
2012-10-26 0:05 ` [ 33/42] vlan: dont deliver frames for unknown vlans to protocols Greg Kroah-Hartman
2012-10-26 0:05 ` [ 34/42] RDS: fix rds-ping spinlock recursion Greg Kroah-Hartman
2012-10-26 0:05 ` [ 35/42] tcp: resets are misrouted Greg Kroah-Hartman
2012-10-26 0:05 ` [ 36/42] ipv6: addrconf: fix /proc/net/if_inet6 Greg Kroah-Hartman
2012-10-26 0:05 ` [ 37/42] sparc64: fix ptrace interaction with force_successful_syscall_return() Greg Kroah-Hartman
2012-10-26 0:05 ` [ 38/42] sparc64: Like x86 we should check current->mm during perf backtrace generation Greg Kroah-Hartman
2012-10-26 0:05 ` [ 39/42] sparc64: Fix bit twiddling in sparc_pmu_enable_event() Greg Kroah-Hartman
2012-10-26 0:05 ` [ 40/42] sparc64: do not clobber personality flags in sys_sparc64_personality() Greg Kroah-Hartman
2012-10-26 0:05 ` [ 41/42] sparc64: Be less verbose during vmemmap population Greg Kroah-Hartman
2012-10-26 0:05 ` [ 42/42] mtd: nand: allow NAND_NO_SUBPAGE_WRITE to be set from driver Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121026000124.939750227@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=dmonakhov@openvz.org \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).