From: Ben Hutchings <ben@decadent.org.uk>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: akpm@linux-foundation.org,
	"George Barnett" <gbarnett@atlassian.com>,
	"Theodore Ts'o" <tytso@mit.edu>
Subject: [017/118] ext4/jbd2: don't wait (forever) for stale tid caused by wraparound
Date: Fri, 10 May 2013 14:39:41 +0100
Message-ID: <lsq.1368193181.373182929@decadent.org.uk>
In-Reply-To: <lsq.1368193179.973009354@decadent.org.uk>

3.2.45-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit d76a3a77113db020d9bb1e894822869410450bd9 upstream.

In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that, after a very large
(2**31) number of transactions, the tid number space might wrap,
causing tid_geq()'s calculations to fail.

Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().

Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time. To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tids.

As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started. This
should have a small (but probably not measurable) improvement for
ext4's scalability.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Reported-by: George Barnett <gbarnett@atlassian.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/ext4/fsync.c      |    3 +--
 fs/ext4/inode.c      |    3 +--
 fs/jbd2/journal.c    |   31 +++++++++++++++++++++++++++++++
 include/linux/jbd2.h |    1 +
 4 files changed, 34 insertions(+), 4 deletions(-)

--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -260,8 +260,7 @@ int ext4_sync_file(struct file *file, lo
 	if (journal->j_flags & JBD2_BARRIER &&
 	    !jbd2_trans_will_send_data_barrier(journal, commit_tid))
 		needs_barrier = true;
-	jbd2_log_start_commit(journal, commit_tid);
-	ret = jbd2_log_wait_commit(journal, commit_tid);
+	ret = jbd2_complete_transaction(journal, commit_tid);
 	if (needs_barrier)
 		blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
 out:
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -147,8 +147,7 @@ void ext4_evict_inode(struct inode *inod
 			journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
 			tid_t commit_tid = EXT4_I(inode)->i_datasync_tid;
 
-			jbd2_log_start_commit(journal, commit_tid);
-			jbd2_log_wait_commit(journal, commit_tid);
+			jbd2_complete_transaction(journal, commit_tid);
 			filemap_write_and_wait(&inode->i_data);
 		}
 		truncate_inode_pages(&inode->i_data, 0);
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -663,6 +663,37 @@ int jbd2_log_wait_commit(journal_t *jour
 }
 
 /*
+ * When this function returns the transaction corresponding to tid
+ * will be completed. If the transaction is currently running, start
+ * committing that transaction before waiting for it to complete. If
+ * the transaction id is stale, it is by definition already completed,
+ * so just return SUCCESS.
+ */
+int jbd2_complete_transaction(journal_t *journal, tid_t tid)
+{
+	int need_to_wait = 1;
+
+	read_lock(&journal->j_state_lock);
+	if (journal->j_running_transaction &&
+	    journal->j_running_transaction->t_tid == tid) {
+		if (journal->j_commit_request != tid) {
+			/* transaction not yet started, so request it */
+			read_unlock(&journal->j_state_lock);
+			jbd2_log_start_commit(journal, tid);
+			goto wait_commit;
+		}
+	} else if (!(journal->j_committing_transaction &&
+		     journal->j_committing_transaction->t_tid == tid))
+		need_to_wait = 0;
+	read_unlock(&journal->j_state_lock);
+	if (!need_to_wait)
+		return 0;
+wait_commit:
+	return jbd2_log_wait_commit(journal, tid);
+}
+EXPORT_SYMBOL(jbd2_complete_transaction);
+
+/*
  * Log buffer allocation routines:
  */
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1165,6 +1165,7 @@ int __jbd2_log_start_commit(journal_t *j
 int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
 int jbd2_journal_force_commit_nested(journal_t *journal);
 int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
+int jbd2_complete_transaction(journal_t *journal, tid_t tid);
 int jbd2_log_do_checkpoint(journal_t *journal);
 int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid);