From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, "Theodore Tso" <tytso@mit.edu>,
	Ben Hutchings <ben@decadent.org.uk>,
	George Barnett <gbarnett@atlassian.com>
Subject: [ 100/115] ext4/jbd2: dont wait (forever) for stale tid caused by wraparound
Date: Mon,  6 May 2013 13:45:36 -0700
Message-ID: <20130506203105.863169067@linuxfoundation.org>
In-Reply-To: <20130506203055.537199268@linuxfoundation.org>

3.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit d76a3a77113db020d9bb1e894822869410450bd9 upstream.

In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, the tid number space might wrap,
causing tid_geq()'s calculations to fail.
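
To make the wraparound concrete, here is a minimal userspace sketch of a
signed-difference comparison in the style of tid_geq().  It is an
illustration only, not the kernel source: the tid_t typedef, the
tid_selfcheck() helper and the sample values are assumptions, and the
cast relies on the usual two's-complement conversion done by common
compilers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tid_t;	/* assumption: tids are 32-bit unsigned counters */

/* Nonzero if x is at or after y, judged by the signed 32-bit difference. */
int tid_geq(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference >= 0;
}

/* Hypothetical self-check showing where the comparison stops working. */
void tid_selfcheck(void)
{
	tid_t stale = 100;

	/* Five transactions later the stale tid is seen as already passed. */
	assert(tid_geq(stale + 5u, stale));
	/* An ordinary rollover past zero is still handled correctly. */
	assert(tid_geq(5u, 0xFFFFFFF0u));
	/* 2**31 transactions later the stale tid appears to be in the future. */
	assert(!tid_geq(stale + 0x80000000u, stale));
}
```

The last assertion is the failure mode described above: once the
distance reaches 2**31, the difference turns negative and a long-finished
tid is treated as one that has not committed yet.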

Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().

Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time.  To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tids.
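
The decision the new function makes can be modelled in userspace
roughly as follows.  This is a sketch under stated assumptions, not the
kernel code: struct journal_model, enum complete_action and
classify_tid() are hypothetical names, locking is omitted, and the
journal state is reduced to the three fields the patch inspects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tid_t;	/* assumption: tids are 32-bit unsigned counters */

/* Hypothetical snapshot of the journal state examined by the helper. */
struct journal_model {
	bool  has_running;	/* a running transaction exists */
	tid_t running_tid;
	bool  has_committing;	/* a committing transaction exists */
	tid_t committing_tid;
	tid_t commit_request;	/* most recently requested commit */
};

enum complete_action {
	ACT_RETURN_NOW,		/* tid is stale: nothing to wait for */
	ACT_WAIT,		/* commit already requested or in progress */
	ACT_START_THEN_WAIT,	/* request the commit, then wait for it */
};

/* Exact-match classification: a tid that matches nothing is stale. */
enum complete_action classify_tid(const struct journal_model *j, tid_t tid)
{
	if (j->has_running && j->running_tid == tid) {
		if (j->commit_request != tid)
			return ACT_START_THEN_WAIT;
		return ACT_WAIT;
	}
	if (j->has_committing && j->committing_tid == tid)
		return ACT_WAIT;
	return ACT_RETURN_NOW;
}
```

Because the tid is compared for equality against the running and
committing transactions rather than ordered with tid_geq(), a tid that
is 2**31 or more transactions old matches neither and is classified as
ACT_RETURN_NOW instead of being waited on.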

As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started.  This
should yield a small (but probably not measurable) improvement in
ext4's scalability.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Reported-by: George Barnett <gbarnett@atlassian.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ext4/fsync.c      |    3 +--
 fs/ext4/inode.c      |    3 +--
 fs/jbd2/journal.c    |   31 +++++++++++++++++++++++++++++++
 include/linux/jbd2.h |    1 +
 4 files changed, 34 insertions(+), 4 deletions(-)

--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -166,8 +166,7 @@ int ext4_sync_file(struct file *file, lo
 	if (journal->j_flags & JBD2_BARRIER &&
 	    !jbd2_trans_will_send_data_barrier(journal, commit_tid))
 		needs_barrier = true;
-	jbd2_log_start_commit(journal, commit_tid);
-	ret = jbd2_log_wait_commit(journal, commit_tid);
+	ret = jbd2_complete_transaction(journal, commit_tid);
 	if (needs_barrier) {
 		err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
 		if (!ret)
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -210,8 +210,7 @@ void ext4_evict_inode(struct inode *inod
 			journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
 			tid_t commit_tid = EXT4_I(inode)->i_datasync_tid;
 
-			jbd2_log_start_commit(journal, commit_tid);
-			jbd2_log_wait_commit(journal, commit_tid);
+			jbd2_complete_transaction(journal, commit_tid);
 			filemap_write_and_wait(&inode->i_data);
 		}
 		truncate_inode_pages(&inode->i_data, 0);
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -710,6 +710,37 @@ int jbd2_log_wait_commit(journal_t *jour
 }
 
 /*
+ * When this function returns the transaction corresponding to tid
+ * will be completed.  If the transaction is currently running, start
+ * committing that transaction before waiting for it to complete.  If
+ * the transaction id is stale, it is by definition already completed,
+ * so just return 0 (success).
+ */
+int jbd2_complete_transaction(journal_t *journal, tid_t tid)
+{
+	int	need_to_wait = 1;
+
+	read_lock(&journal->j_state_lock);
+	if (journal->j_running_transaction &&
+	    journal->j_running_transaction->t_tid == tid) {
+		if (journal->j_commit_request != tid) {
+			/* transaction not yet started, so request it */
+			read_unlock(&journal->j_state_lock);
+			jbd2_log_start_commit(journal, tid);
+			goto wait_commit;
+		}
+	} else if (!(journal->j_committing_transaction &&
+		     journal->j_committing_transaction->t_tid == tid))
+		need_to_wait = 0;
+	read_unlock(&journal->j_state_lock);
+	if (!need_to_wait)
+		return 0;
+wait_commit:
+	return jbd2_log_wait_commit(journal, tid);
+}
+EXPORT_SYMBOL(jbd2_complete_transaction);
+
+/*
  * Log buffer allocation routines:
  */
 
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1200,6 +1200,7 @@ int __jbd2_log_start_commit(journal_t *j
 int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
 int jbd2_journal_force_commit_nested(journal_t *journal);
 int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
+int jbd2_complete_transaction(journal_t *journal, tid_t tid);
 int jbd2_log_do_checkpoint(journal_t *journal);
 int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid);
 


