From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes <jmforbes@linuxtx.org>,
	Zwane Mwaikambo <zwane@arm.linux.org.uk>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Dave Jones <davej@redhat.com>,
	Chuck Wolber <chuckw@quantumlinux.com>,
	Chris Wedgwood <reviews@ml.cw.f00f.org>,
	Michael Krufky <mkrufky@linuxtv.org>,
	Chuck Ebbert <cebbert@redhat.com>,
	Domenico Andreoli <cavokz@gmail.com>, Willy Tarreau <w@1wt.eu>,
	Rodrigo Rubira Branco <rbranco@la.checkpoint.com>,
	Jake Edge <jake@lwn.net>, Eugene Teo <eteo@redhat.com>,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk,
	"Jorge Boncompte [DTI2]" <jorge@dti2.net>,
	Jan Kara <jack@suse.cz>, Nick Piggin <npiggin@suse.de>
Subject: [patch 38/96] fs: new inode i_state corruption fix
Date: Fri, 13 Mar 2009 17:05:46 -0700
Message-ID: <20090314000744.708904965@mini.kroah.org>
In-Reply-To: <20090314001449.GA4485@kroah.com>

[-- Attachment #1: fs-new-inode-i_state-corruption-fix.patch --]
[-- Type: text/plain, Size: 5351 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Nick Piggin <npiggin@suse.de>

commit 7ef0d7377cb287e08f3ae94cebc919448e1f5dff upstream.

There was a report of data corruption at
http://lkml.org/lkml/2008/11/14/121.  There is a script included to
reproduce the problem.

During testing, I encountered a number of strange things with ext3, so I
tried ext2 to attempt to reduce complexity of the problem.  I found that
fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
cleared, even though instrumentation showed that unlock_new_inode had
already been called for that inode.  This points to a memory scribble or a
synchronisation problem.

i_state of I_NEW inodes is not protected by inode_lock because other
processes are not supposed to touch them until I_LOCK (and I_NEW) is
cleared.  Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
i_state revealed that generic_sync_sb_inodes is picking up new inodes from
the inode lists and passing them to __writeback_single_inode without
waiting for I_NEW.  Subsequently modifying i_state causes corruption.  In
my case it would look like this:

CPU0                            CPU1
unlock_new_inode()              __sync_single_inode()
 reg <- inode->i_state
 reg -> reg & ~(I_LOCK|I_NEW)   reg <- inode->i_state
 reg -> inode->i_state          reg -> reg | I_SYNC
                                reg -> inode->i_state

Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.

The fix is to skip over I_NEW inodes rather than wait for them:
inodes concurrently being created are not subject to data integrity
operations, and should not significantly contribute to dirty memory
either.

After this change, I'm unable to reproduce any of the added warnings or
hangs after ~1 hour of running.  Previously, the new warnings would start
immediately and a hang would happen in under 5 minutes.

I'm also testing on ext3 now, and so far no problems there either.  I
don't know whether this fixes the problem reported above, but it fixes a
real problem for me.

Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Reported-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/fs-writeback.c |    9 ++++++++-
 fs/inode.c        |    7 +++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -274,6 +274,7 @@ __sync_single_inode(struct inode *inode,
 	int ret;
 
 	BUG_ON(inode->i_state & I_SYNC);
+	WARN_ON(inode->i_state & I_NEW);
 
 	/* Set I_SYNC, reset I_DIRTY */
 	dirty = inode->i_state & I_DIRTY;
@@ -298,6 +299,7 @@ __sync_single_inode(struct inode *inode,
 	}
 
 	spin_lock(&inode_lock);
+	WARN_ON(inode->i_state & I_NEW);
 	inode->i_state &= ~I_SYNC;
 	if (!(inode->i_state & I_FREEING)) {
 		if (!(inode->i_state & I_DIRTY) &&
@@ -470,6 +472,11 @@ void generic_sync_sb_inodes(struct super
 			break;
 		}
 
+		if (inode->i_state & I_NEW) {
+			requeue_io(inode);
+			continue;
+		}
+
 		if (wbc->nonblocking && bdi_write_congested(bdi)) {
 			wbc->encountered_congestion = 1;
 			if (!sb_is_blkdev_sb(sb))
@@ -531,7 +538,7 @@ void generic_sync_sb_inodes(struct super
 		list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
 			struct address_space *mapping;
 
-			if (inode->i_state & (I_FREEING|I_WILL_FREE))
+			if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
 				continue;
 			mapping = inode->i_mapping;
 			if (mapping->nrpages == 0)
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -339,6 +339,7 @@ static int invalidate_list(struct list_h
 		invalidate_inode_buffers(inode);
 		if (!atomic_read(&inode->i_count)) {
 			list_move(&inode->i_list, dispose);
+			WARN_ON(inode->i_state & I_NEW);
 			inode->i_state |= I_FREEING;
 			count++;
 			continue;
@@ -440,6 +441,7 @@ static void prune_icache(int nr_to_scan)
 				continue;
 		}
 		list_move(&inode->i_list, &freeable);
+		WARN_ON(inode->i_state & I_NEW);
 		inode->i_state |= I_FREEING;
 		nr_pruned++;
 	}
@@ -595,6 +597,7 @@ void unlock_new_inode(struct inode *inod
 	 * just created it (so there can be no old holders
 	 * that haven't tested I_LOCK).
 	 */
+	WARN_ON((inode->i_state & (I_LOCK|I_NEW)) != (I_LOCK|I_NEW));
 	inode->i_state &= ~(I_LOCK|I_NEW);
 	wake_up_inode(inode);
 }
@@ -1041,6 +1044,7 @@ void generic_delete_inode(struct inode *
 
 	list_del_init(&inode->i_list);
 	list_del_init(&inode->i_sb_list);
+	WARN_ON(inode->i_state & I_NEW);
 	inode->i_state |= I_FREEING;
 	inodes_stat.nr_inodes--;
 	spin_unlock(&inode_lock);
@@ -1082,16 +1086,19 @@ static void generic_forget_inode(struct 
 			spin_unlock(&inode_lock);
 			return;
 		}
+		WARN_ON(inode->i_state & I_NEW);
 		inode->i_state |= I_WILL_FREE;
 		spin_unlock(&inode_lock);
 		write_inode_now(inode, 1);
 		spin_lock(&inode_lock);
+		WARN_ON(inode->i_state & I_NEW);
 		inode->i_state &= ~I_WILL_FREE;
 		inodes_stat.nr_unused--;
 		hlist_del_init(&inode->i_hash);
 	}
 	list_del_init(&inode->i_list);
 	list_del_init(&inode->i_sb_list);
+	WARN_ON(inode->i_state & I_NEW);
 	inode->i_state |= I_FREEING;
 	inodes_stat.nr_inodes--;
 	spin_unlock(&inode_lock);


