public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* 2.4, 2.5: SMP race: __sync_single_inode vs. __mark_inode_dirty
@ 2003-02-02 23:32 Mikulas Patocka
  2003-02-03  0:06 ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: Mikulas Patocka @ 2003-02-02 23:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: torvalds

[-- Attachment #1: Type: TEXT/PLAIN, Size: 927 bytes --]

Hi.

there's a SMP race condition between __sync_single_inode (or __sync_one on
2.4.20) and __mark_inode_dirty. __mark_inode_dirty doesn't take inode
spinlock. As we know -- unless you take a spinlock or use barrier,
processor can change order of instructions.

CPU 1

modify inode
(but modifications are in cpu-local
buffer and do not go to bus)

calls
__mark_inode_dirty
it sees I_DIRTY and exits immediatelly
					CPU 2
					takes spinlock
					calls __sync_single_inode
					inode->i_state &= ~I_DIRTY
					writes the inode (but does not see
					modifications by CPU 1 yet)

CPU 1 flushes its write buffer to the bus
inode is already written, clean, modifications
done by CPU1 are lost

The easiest fix would be to move the test inside spinlock in
__mark_inode_dirty; if you do not want to suffer from performance loss,
use the attached patches that use memory barriers to ensure ordering of
reads and writes.

Mikulas




[-- Attachment #2: 2.4 patch --]
[-- Type: TEXT/PLAIN, Size: 713 bytes --]

--- linux/fs/inode.c_	Sat Aug  3 01:39:44 2002
+++ linux/fs/inode.c	Sun Feb  2 23:21:40 2003
@@ -147,6 +147,10 @@
 			sb->s_op->dirty_inode(inode);
 	}
 
+	/* make sure that changes are seen by all cpus before we test i_state
+		-- mikulas */
+	smp_mb();
+
 	/* avoid the locking if we can */
 	if ((inode->i_state & flags) == flags)
 		return;
@@ -219,6 +223,11 @@
 	dirty = inode->i_state & I_DIRTY;
 	inode->i_state |= I_LOCK;
 	inode->i_state &= ~I_DIRTY;
+
+	smp_rmb(); /* mark_inode_dirty doesn't take spinlock, make sure
+		that inode is not read speculatively by this cpu
+		before &= ~I_DIRTY  -- mikulas */
+
 	spin_unlock(&inode_lock);
 
 	filemap_fdatasync(inode->i_mapping);

[-- Attachment #3: 2.5 patch --]
[-- Type: TEXT/PLAIN, Size: 818 bytes --]

--- linux/fs/fs-writeback.c_	Fri Jan 17 03:23:00 2003
+++ linux/fs/fs-writeback.c	Sun Feb  2 23:22:00 2003
@@ -61,6 +61,10 @@
 			sb->s_op->dirty_inode(inode);
 	}
 
+	/* make sure that changes are seen by all cpus before we test i_state
+		-- mikulas */
+	smp_mb();
+
 	/* avoid the locking if we can */
 	if ((inode->i_state & flags) == flags)
 		return;
@@ -135,6 +139,11 @@
 	dirty = inode->i_state & I_DIRTY;
 	inode->i_state |= I_LOCK;
 	inode->i_state &= ~I_DIRTY;
+
+	/* smp_rmb(); note: if you remove write_lock below, you must add this.
+		mark_inode_dirty doesn't take spinlock, make sure
+		that inode is not read speculatively by this cpu
+		before &= ~I_DIRTY  -- mikulas */
 
 	write_lock(&mapping->page_lock);
 	if (wait || !wbc->for_kupdate || list_empty(&mapping->io_pages))

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2003-02-03 16:51 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-02-02 23:32 2.4, 2.5: SMP race: __sync_single_inode vs. __mark_inode_dirty Mikulas Patocka
2003-02-03  0:06 ` Andrew Morton
2003-02-03 17:00   ` Andrea Arcangeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox