From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
alan@lxorguk.ukuu.org.uk, Dmitry Monakhov <dmonakhov@openvz.org>,
"Theodore Tso" <tytso@mit.edu>
Subject: [ 01/42] ext4: race-condition protection for ext4_convert_unwritten_extents_endio
Date: Thu, 25 Oct 2012 17:05:09 -0700
Message-ID: <20121026000124.939750227@linuxfoundation.org>
In-Reply-To: <20121026000124.790781113@linuxfoundation.org>
3.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Monakhov <dmonakhov@openvz.org>
commit dee1f973ca341c266229faa5a1a5bb268bed3531 upstream.
We assumed that by the time ext4_convert_unwritten_extents_endio() is called,
the extent in question lies fully inside
[map->m_lblk, map->m_lblk + map->m_len) because it was already split during
submission. This may not be true because of a race between writeback and
fallocate.
If the extent in question is larger than requested, we split it again.
Special precautions must be taken if zeroout is required, because
[map->m_lblk, map->m_lblk + map->m_len) already contains valid data.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/ext4/extents.c | 57 +++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 46 insertions(+), 11 deletions(-)
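
The check this patch adds to ext4_convert_unwritten_extents_endio() can be
modelled in user space. The sketch below is only an illustration, not kernel
code: struct extent_range, struct map_range and endio_needs_split() are
hypothetical names made up for the example, and only the comparison itself is
taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's extent and map descriptors. */
struct extent_range {
	uint32_t start;    /* first logical block, like ee_block */
	unsigned int len;  /* length in blocks, like ee_len */
};

struct map_range {
	uint32_t lblk;     /* first completed block, like map->m_lblk */
	unsigned int len;  /* completed length, like map->m_len */
};

/*
 * Same comparison as the patch: a split is required when the extent does
 * not start at the completed range or is longer than it.
 */
static inline bool endio_needs_split(struct extent_range ex,
				     struct map_range map)
{
	return ex.start != map.lblk || ex.len > map.len;
}

/* Small self-check covering the three interesting cases. */
static inline void endio_needs_split_selfcheck(void)
{
	struct map_range io = { .lblk = 100, .len = 8 };

	/* Extent still matches the completed I/O exactly: no split. */
	assert(!endio_needs_split((struct extent_range){ 100, 8 }, io));
	/* Extent extends past the end of the completed I/O. */
	assert(endio_needs_split((struct extent_range){ 100, 16 }, io));
	/* Extent starts before the completed I/O. */
	assert(endio_needs_split((struct extent_range){ 96, 12 }, io));
}
```

In the patch itself, a true result leads to a call to
ext4_split_unwritten_extents() with EXT4_GET_BLOCKS_CONVERT, after which the
path is looked up again before the extent is marked initialized.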
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -52,6 +52,9 @@
#define EXT4_EXT_MARK_UNINIT1 0x2 /* mark first half uninitialized */
#define EXT4_EXT_MARK_UNINIT2 0x4 /* mark second half uninitialized */
+#define EXT4_EXT_DATA_VALID1 0x8 /* first half contains valid data */
+#define EXT4_EXT_DATA_VALID2 0x10 /* second half contains valid data */
+
static int ext4_split_extent(handle_t *handle,
struct inode *inode,
struct ext4_ext_path *path,
@@ -2829,6 +2832,9 @@ static int ext4_split_extent_at(handle_t
unsigned int ee_len, depth;
int err = 0;
+ BUG_ON((split_flag & (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2)) ==
+ (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2));
+
ext_debug("ext4_split_extents_at: inode %lu, logical"
"block %llu\n", inode->i_ino, (unsigned long long)split);
@@ -2887,7 +2893,14 @@ static int ext4_split_extent_at(handle_t
err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
if (err == -ENOSPC && (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
- err = ext4_ext_zeroout(inode, &orig_ex);
+ if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
+ if (split_flag & EXT4_EXT_DATA_VALID1)
+ err = ext4_ext_zeroout(inode, ex2);
+ else
+ err = ext4_ext_zeroout(inode, ex);
+ } else
+ err = ext4_ext_zeroout(inode, &orig_ex);
+
if (err)
goto fix_extent_len;
/* update the extent length and mark as initialized */
@@ -2940,12 +2953,13 @@ static int ext4_split_extent(handle_t *h
uninitialized = ext4_ext_is_uninitialized(ex);
if (map->m_lblk + map->m_len < ee_block + ee_len) {
- split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT ?
- EXT4_EXT_MAY_ZEROOUT : 0;
+ split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
flags1 = flags | EXT4_GET_BLOCKS_PRE_IO;
if (uninitialized)
split_flag1 |= EXT4_EXT_MARK_UNINIT1 |
EXT4_EXT_MARK_UNINIT2;
+ if (split_flag & EXT4_EXT_DATA_VALID2)
+ split_flag1 |= EXT4_EXT_DATA_VALID1;
err = ext4_split_extent_at(handle, inode, path,
map->m_lblk + map->m_len, split_flag1, flags1);
if (err)
@@ -2958,8 +2972,8 @@ static int ext4_split_extent(handle_t *h
return PTR_ERR(path);
if (map->m_lblk >= ee_block) {
- split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT ?
- EXT4_EXT_MAY_ZEROOUT : 0;
+ split_flag1 = split_flag & (EXT4_EXT_MAY_ZEROOUT |
+ EXT4_EXT_DATA_VALID2);
if (uninitialized)
split_flag1 |= EXT4_EXT_MARK_UNINIT1;
if (split_flag & EXT4_EXT_MARK_UNINIT2)
@@ -3237,26 +3251,47 @@ static int ext4_split_unwritten_extents(
split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
split_flag |= EXT4_EXT_MARK_UNINIT2;
-
+ if (flags & EXT4_GET_BLOCKS_CONVERT)
+ split_flag |= EXT4_EXT_DATA_VALID2;
flags |= EXT4_GET_BLOCKS_PRE_IO;
return ext4_split_extent(handle, inode, path, map, split_flag, flags);
}
static int ext4_convert_unwritten_extents_endio(handle_t *handle,
- struct inode *inode,
- struct ext4_ext_path *path)
+ struct inode *inode,
+ struct ext4_map_blocks *map,
+ struct ext4_ext_path *path)
{
struct ext4_extent *ex;
+ ext4_lblk_t ee_block;
+ unsigned int ee_len;
int depth;
int err = 0;
depth = ext_depth(inode);
ex = path[depth].p_ext;
+ ee_block = le32_to_cpu(ex->ee_block);
+ ee_len = ext4_ext_get_actual_len(ex);
ext_debug("ext4_convert_unwritten_extents_endio: inode %lu, logical"
"block %llu, max_blocks %u\n", inode->i_ino,
- (unsigned long long)le32_to_cpu(ex->ee_block),
- ext4_ext_get_actual_len(ex));
+ (unsigned long long)ee_block, ee_len);
+
+ /* If extent is larger than requested then split is required */
+ if (ee_block != map->m_lblk || ee_len > map->m_len) {
+ err = ext4_split_unwritten_extents(handle, inode, map, path,
+ EXT4_GET_BLOCKS_CONVERT);
+ if (err < 0)
+ goto out;
+ ext4_ext_drop_refs(path);
+ path = ext4_ext_find_extent(inode, map->m_lblk, path);
+ if (IS_ERR(path)) {
+ err = PTR_ERR(path);
+ goto out;
+ }
+ depth = ext_depth(inode);
+ ex = path[depth].p_ext;
+ }
err = ext4_ext_get_access(handle, inode, path + depth);
if (err)
@@ -3564,7 +3599,7 @@ ext4_ext_handle_uninitialized_extents(ha
}
/* IO end_io complete, convert the filled extent to written */
if ((flags & EXT4_GET_BLOCKS_CONVERT)) {
- ret = ext4_convert_unwritten_extents_endio(handle, inode,
+ ret = ext4_convert_unwritten_extents_endio(handle, inode, map,
path);
if (ret >= 0) {
ext4_update_inode_fsync_trans(handle, inode, 1);
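
The flag handling above can also be sketched outside the kernel. The model
below is a simplified illustration under stated assumptions:
first_split_flags(), zeroout_target() and enum zero_target are hypothetical
helpers invented for the example, and the value 0x1 for EXT4_EXT_MAY_ZEROOUT
is assumed because its definition is not part of this diff. The other flag
values are copied from the hunks above.

```c
#include <assert.h>

#define EXT4_EXT_MAY_ZEROOUT	0x1  /* assumed value, not shown in the diff */
#define EXT4_EXT_MARK_UNINIT1	0x2  /* mark first half uninitialized */
#define EXT4_EXT_MARK_UNINIT2	0x4  /* mark second half uninitialized */
#define EXT4_EXT_DATA_VALID1	0x8  /* first half contains valid data */
#define EXT4_EXT_DATA_VALID2	0x10 /* second half contains valid data */

/* Which part of the extent would be zeroed on the ENOSPC fallback. */
enum zero_target { ZERO_WHOLE_EXTENT, ZERO_FIRST_HALF, ZERO_SECOND_HALF };

/*
 * Flags passed to the first split (at the end of the mapped range) in
 * ext4_split_extent(). The mapped blocks end up in the first half of that
 * split, so DATA_VALID2 from the caller becomes DATA_VALID1 here.
 */
static inline int first_split_flags(int split_flag, int uninitialized)
{
	int split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;

	if (uninitialized)
		split_flag1 |= EXT4_EXT_MARK_UNINIT1 | EXT4_EXT_MARK_UNINIT2;
	if (split_flag & EXT4_EXT_DATA_VALID2)
		split_flag1 |= EXT4_EXT_DATA_VALID1;
	return split_flag1;
}

/*
 * Zeroout selection in ext4_split_extent_at(): never zero the half that
 * already holds valid data. The assert stands in for the BUG_ON().
 */
static inline enum zero_target zeroout_target(int split_flag)
{
	int valid = EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2;

	assert((split_flag & valid) != valid);
	if (split_flag & EXT4_EXT_DATA_VALID1)
		return ZERO_SECOND_HALF;   /* ex2 in the patch */
	if (split_flag & EXT4_EXT_DATA_VALID2)
		return ZERO_FIRST_HALF;    /* ex in the patch */
	return ZERO_WHOLE_EXTENT;          /* orig_ex, the old behaviour */
}
```

With neither DATA_VALID flag set, the model falls back to zeroing the whole
original extent, which matches the behaviour before this patch.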