* 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
@ 2006-06-08 4:27 Miles Lane
2006-06-08 7:32 ` Anton Altaparmakov
0 siblings, 1 reply; 21+ messages in thread
From: Miles Lane @ 2006-06-08 4:27 UTC (permalink / raw)
To: LKML, Andrew Morton
=====================================================
[ BUG: possible circular locking deadlock detected! ]
-----------------------------------------------------
mount/1219 is trying to acquire lock:
(&ni->mrec_lock){--..}, at: [<c031e4b8>] mutex_lock+0x21/0x24
but task is already holding lock:
(&rl->lock){----}, at: [<f8a98e4d>] ntfs_map_runlist+0x2a/0xb5 [ntfs]
which lock already depends on the new lock,
which could lead to circular deadlocks!
the existing dependency chain (in reverse order) is:
-> #1 (&rl->lock){----}:
[<c012ec64>] lock_acquire+0x58/0x74
[<f8a97619>] ntfs_readpage+0x362/0x8fd [ntfs]
[<c01415b0>] read_cache_page+0x8c/0x137
[<f8a9eeb8>] map_mft_record+0xd7/0x1d2 [ntfs]
[<f8a9d661>] ntfs_read_locked_inode+0x74/0xea9 [ntfs]
[<f8a9eabb>] ntfs_read_inode_mount+0x625/0x846 [ntfs]
[<f8aa2ff8>] ntfs_fill_super+0x8ca/0xd14 [ntfs]
[<c01647c4>] get_sb_bdev+0xed/0x14e
[<f8aa15ef>] ntfs_get_sb+0x10/0x12 [ntfs]
[<c0163c11>] vfs_kern_mount+0x76/0x143
[<c0163d17>] do_kern_mount+0x29/0x3d
[<c0177f9f>] do_mount+0x78a/0x7e4
[<c0178058>] sys_mount+0x5f/0x91
[<c031fb5d>] sysenter_past_esp+0x56/0x8d
-> #0 (&ni->mrec_lock){--..}:
[<c012ec64>] lock_acquire+0x58/0x74
[<c031e321>] __mutex_lock_slowpath+0xa7/0x21d
[<c031e4b8>] mutex_lock+0x21/0x24
[<f8a9edfa>] map_mft_record+0x19/0x1d2 [ntfs]
[<f8a98630>] ntfs_map_runlist_nolock+0x48/0x437 [ntfs]
[<f8a98eaa>] ntfs_map_runlist+0x87/0xb5 [ntfs]
[<f8a97739>] ntfs_readpage+0x482/0x8fd [ntfs]
[<c01415b0>] read_cache_page+0x8c/0x137
[<f8aa20bc>] load_system_files+0x155/0x7c7 [ntfs]
[<f8aa30a7>] ntfs_fill_super+0x979/0xd14 [ntfs]
[<c01647c4>] get_sb_bdev+0xed/0x14e
[<f8aa15ef>] ntfs_get_sb+0x10/0x12 [ntfs]
[<c0163c11>] vfs_kern_mount+0x76/0x143
[<c0163d17>] do_kern_mount+0x29/0x3d
[<c0177f9f>] do_mount+0x78a/0x7e4
[<c0178058>] sys_mount+0x5f/0x91
[<c031fb5d>] sysenter_past_esp+0x56/0x8d
other info that might help us debug this:
2 locks held by mount/1219:
#0: (&s->s_umount#18){--..}, at: [<c0164443>] sget+0x223/0x3a1
#1: (&rl->lock){----}, at: [<f8a98e4d>] ntfs_map_runlist+0x2a/0xb5 [ntfs]
stack backtrace:
[<c0103924>] show_trace_log_lvl+0x54/0xfd
[<c01049a9>] show_trace+0xd/0x10
[<c01049c3>] dump_stack+0x17/0x1c
[<c012cefb>] print_circular_bug_tail+0x59/0x64
[<c012e766>] __lock_acquire+0x7c2/0x97a
[<c012ec64>] lock_acquire+0x58/0x74
[<c031e321>] __mutex_lock_slowpath+0xa7/0x21d
[<c031e4b8>] mutex_lock+0x21/0x24
[<f8a9edfa>] map_mft_record+0x19/0x1d2 [ntfs]
[<f8a98630>] ntfs_map_runlist_nolock+0x48/0x437 [ntfs]
[<f8a98eaa>] ntfs_map_runlist+0x87/0xb5 [ntfs]
[<f8a97739>] ntfs_readpage+0x482/0x8fd [ntfs]
[<c01415b0>] read_cache_page+0x8c/0x137
[<f8aa20bc>] load_system_files+0x155/0x7c7 [ntfs]
[<f8aa30a7>] ntfs_fill_super+0x979/0xd14 [ntfs]
[<c01647c4>] get_sb_bdev+0xed/0x14e
[<f8aa15ef>] ntfs_get_sb+0x10/0x12 [ntfs]
[<c0163c11>] vfs_kern_mount+0x76/0x143
[<c0163d17>] do_kern_mount+0x29/0x3d
[<c0177f9f>] do_mount+0x78a/0x7e4
[<c0178058>] sys_mount+0x5f/0x91
[<c031fb5d>] sysenter_past_esp+0x56/0x8d
NTFS volume version 3.1.
^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-08  4:27 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected! Miles Lane
@ 2006-06-08  7:32 ` Anton Altaparmakov
  2006-06-08  9:55   ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Anton Altaparmakov @ 2006-06-08 7:32 UTC (permalink / raw)
To: Miles Lane; +Cc: LKML, Andrew Morton

Hi,

Thanks for the report.

On Wed, 2006-06-07 at 21:27 -0700, Miles Lane wrote:
> =====================================================
> [ BUG: possible circular locking deadlock detected! ]
> -----------------------------------------------------
> mount/1219 is trying to acquire lock:
>  (&ni->mrec_lock){--..}, at: [<c031e4b8>] mutex_lock+0x21/0x24
>
> but task is already holding lock:
>  (&rl->lock){----}, at: [<f8a98e4d>] ntfs_map_runlist+0x2a/0xb5 [ntfs]
>
> which lock already depends on the new lock,
> which could lead to circular deadlocks!

That's rubbish.  The lock analyser is not intelligent enough for
ntfs... )-:

FWIW, it appears to be picking up dependencies between locks even when
they belong to different inodes, in which case they do not depend on
each other at all.  It is perhaps getting confused by the special case
of the table of inodes ($MFT), whose lock dependency is the reverse of
that of all other inodes.  $MFT is special in that it can never take
the lock recursively (and hence deadlock), because we always keep the
whole runlist for $MFT in memory; should a bug or memory corruption
ever cause this not to be the case, ntfs will detect it and go BUG(),
so it still will not deadlock...

The people responsible for the lock analyser will need to fix this in
their code, or by adding some magic to ntfs to make the errors go
away...

Best regards,
	Anton

> the existing dependency chain (in reverse order) is:
>
> -> #1 (&rl->lock){----}:
>        [<c012ec64>] lock_acquire+0x58/0x74
>        [<f8a97619>] ntfs_readpage+0x362/0x8fd [ntfs]
>        [<c01415b0>] read_cache_page+0x8c/0x137
>        [<f8a9eeb8>] map_mft_record+0xd7/0x1d2 [ntfs]
>        [<f8a9d661>] ntfs_read_locked_inode+0x74/0xea9 [ntfs]
>        [<f8a9eabb>] ntfs_read_inode_mount+0x625/0x846 [ntfs]
>        [<f8aa2ff8>] ntfs_fill_super+0x8ca/0xd14 [ntfs]
>        [<c01647c4>] get_sb_bdev+0xed/0x14e
>        [<f8aa15ef>] ntfs_get_sb+0x10/0x12 [ntfs]
>        [<c0163c11>] vfs_kern_mount+0x76/0x143
>        [<c0163d17>] do_kern_mount+0x29/0x3d
>        [<c0177f9f>] do_mount+0x78a/0x7e4
>        [<c0178058>] sys_mount+0x5f/0x91
>        [<c031fb5d>] sysenter_past_esp+0x56/0x8d
>
> -> #0 (&ni->mrec_lock){--..}:
>        [<c012ec64>] lock_acquire+0x58/0x74
>        [<c031e321>] __mutex_lock_slowpath+0xa7/0x21d
>        [<c031e4b8>] mutex_lock+0x21/0x24
>        [<f8a9edfa>] map_mft_record+0x19/0x1d2 [ntfs]
>        [<f8a98630>] ntfs_map_runlist_nolock+0x48/0x437 [ntfs]
>        [<f8a98eaa>] ntfs_map_runlist+0x87/0xb5 [ntfs]
>        [<f8a97739>] ntfs_readpage+0x482/0x8fd [ntfs]
>        [<c01415b0>] read_cache_page+0x8c/0x137
>        [<f8aa20bc>] load_system_files+0x155/0x7c7 [ntfs]
>        [<f8aa30a7>] ntfs_fill_super+0x979/0xd14 [ntfs]
>        [<c01647c4>] get_sb_bdev+0xed/0x14e
>        [<f8aa15ef>] ntfs_get_sb+0x10/0x12 [ntfs]
>        [<c0163c11>] vfs_kern_mount+0x76/0x143
>        [<c0163d17>] do_kern_mount+0x29/0x3d
>        [<c0177f9f>] do_mount+0x78a/0x7e4
>        [<c0178058>] sys_mount+0x5f/0x91
>        [<c031fb5d>] sysenter_past_esp+0x56/0x8d
>
> other info that might help us debug this:
>
> 2 locks held by mount/1219:
>  #0:  (&s->s_umount#18){--..}, at: [<c0164443>] sget+0x223/0x3a1
>  #1:  (&rl->lock){----}, at: [<f8a98e4d>] ntfs_map_runlist+0x2a/0xb5 [ntfs]
>
> stack backtrace:
>  [<c0103924>] show_trace_log_lvl+0x54/0xfd
>  [<c01049a9>] show_trace+0xd/0x10
>  [<c01049c3>] dump_stack+0x17/0x1c
>  [<c012cefb>] print_circular_bug_tail+0x59/0x64
>  [<c012e766>] __lock_acquire+0x7c2/0x97a
>  [<c012ec64>] lock_acquire+0x58/0x74
>  [<c031e321>] __mutex_lock_slowpath+0xa7/0x21d
>  [<c031e4b8>] mutex_lock+0x21/0x24
>  [<f8a9edfa>] map_mft_record+0x19/0x1d2 [ntfs]
>  [<f8a98630>] ntfs_map_runlist_nolock+0x48/0x437 [ntfs]
>  [<f8a98eaa>] ntfs_map_runlist+0x87/0xb5 [ntfs]
>  [<f8a97739>] ntfs_readpage+0x482/0x8fd [ntfs]
>  [<c01415b0>] read_cache_page+0x8c/0x137
>  [<f8aa20bc>] load_system_files+0x155/0x7c7 [ntfs]
>  [<f8aa30a7>] ntfs_fill_super+0x979/0xd14 [ntfs]
>  [<c01647c4>] get_sb_bdev+0xed/0x14e
>  [<f8aa15ef>] ntfs_get_sb+0x10/0x12 [ntfs]
>  [<c0163c11>] vfs_kern_mount+0x76/0x143
>  [<c0163d17>] do_kern_mount+0x29/0x3d
>  [<c0177f9f>] do_mount+0x78a/0x7e4
>  [<c0178058>] sys_mount+0x5f/0x91
>  [<c031fb5d>] sysenter_past_esp+0x56/0x8d
>
> NTFS volume version 3.1.

-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-08  7:32 ` Anton Altaparmakov
@ 2006-06-08  9:55   ` Ingo Molnar
  2006-06-08 10:53     ` Anton Altaparmakov
  0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2006-06-08 9:55 UTC (permalink / raw)
To: Anton Altaparmakov; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

hi Anton,

* Anton Altaparmakov <aia21@cam.ac.uk> wrote:

> [...] It perhaps is getting confused by the special case for the table
> of inodes ($MFT) which has the lock dependency reverse to all other
> inodes but it is special because it can never take the lock
> recursively (and hence deadlock) because we always keep the whole
> runlist for $MFT in memory and should a bug or memory corruption cause
> this not to be the case then ntfs will detect this and go BUG() so it
> still will not deadlock...

Please help me understand NTFS locking a bit better.  As far as i can
see at the moment, the NTFS locking scenario that the lock validator
flagged does not involve two inodes - it only involves the MFT inode
itself, and two of its locks.

Firstly, here is a list of the NTFS terms, locks in question:

 ni             - NTFS inode structure

 mft_ni         - special "Master File Table" inode - one per fs.
                  Consists of "MFT records", which describe an inode
                  each.  (mft_ni is also called the "big inode")

 &ni->mrec_lock - a spinlock protecting a particular inode's MFT data.
                  (finegrained lock for the MFT record)  It is
                  typically taken by map_mft_record() and released by
                  unmap_mft_record().

 ni->runlist    - maps logical addresses to on-disk addresses.  (There
                  are (two) runlists, one for normal inode data,
                  another for attribute space data.)

 &rl->lock      - ni->runlist.lock, a rw semaphore that protects the
                  mapping data.  Read-locked on access, write-locked
                  on modification (extension of an inode, etc.).

The MFT is loaded in-memory permanently at mount time, and its runlist
gives us access to NTFS inodes.  Is its runlist loaded into memory
permanently too?  An NTFS inode's runlist gives access to the actual
file data.

What the validator flagged is the following locking construct:

we first acquired the MFT's &ni->mrec_lock in map_mft_record(), at:

 [<c0340508>] mutex_lock+0x8/0x10
 [<c01d4d61>] map_mft_record+0x51/0x2c0
 [<c01c51e8>] ntfs_map_runlist_nolock+0x3d8/0x530
 [<c01c58b1>] ntfs_map_runlist+0x41/0x70
 [<c01c1621>] ntfs_readpage+0x8c1/0x9a0
 [<c0142e1c>] read_cache_page+0xac/0x150
 [<c01e23f2>] load_system_files+0x472/0x2250
 [<c01e4e26>] ntfs_fill_super+0xc56/0x1a50
 [<c016bdee>] get_sb_bdev+0xde/0x120
 [<c01e028b>] ntfs_get_sb+0x1b/0x30
 [<c016b413>] vfs_kern_mount+0x33/0xa0
 [<c016b4d6>] do_kern_mount+0x36/0x50
 [<c01818de>] do_mount+0x28e/0x640
 [<c0181cff>] sys_mount+0x6f/0xb0

then we read-locked &rl->lock [the MFT's runlist semaphore] later in
map_mft_record() -> ntfs_readpage(), while still holding
&ni->mrec_lock:

 [<c0134c4e>] down_read+0x2e/0x40
 [<c01c159c>] ntfs_readpage+0x83c/0x9a0
 [<c0142e1c>] read_cache_page+0xac/0x150
 [<c01d4e22>] map_mft_record+0x112/0x2c0
 [<c01d229d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3c6b>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4dcb>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bdee>] get_sb_bdev+0xde/0x120
 [<c01e028b>] ntfs_get_sb+0x1b/0x30
 [<c016b413>] vfs_kern_mount+0x33/0xa0
 [<c016b4d6>] do_kern_mount+0x36/0x50
 [<c01818de>] do_mount+0x28e/0x640
 [<c0181cff>] sys_mount+0x6f/0xb0

so this is a "&ni->mrec_lock => &rl->lock" dependency for the MFT,
which the validator recorded.

Then the validator also observed the reverse order.  We first
write-locked &rl->lock (of the MFT inode):

 [<c0134c8e>] down_write+0x2e/0x50
 [<c01c5910>] ntfs_map_runlist+0x20/0x70
 [<c01c16a1>] ntfs_readpage+0x8c1/0x9a0
 [<c0142e9c>] read_cache_page+0xac/0x150
 [<c01e2472>] load_system_files+0x472/0x2250
 [<c01e4ea6>] ntfs_fill_super+0xc56/0x1a50
 [<c016be6e>] get_sb_bdev+0xde/0x120
 [<c01e030b>] ntfs_get_sb+0x1b/0x30
 [<c016b493>] vfs_kern_mount+0x33/0xa0
 [<c016b556>] do_kern_mount+0x36/0x50
 [<c018195e>] do_mount+0x28e/0x640
 [<c0181d7f>] sys_mount+0x6f/0xb0

then we took &ni->mrec_lock [this is still the MFT inode's mrec_lock,
and we have the &rl->lock still held]:

 [<c0340588>] mutex_lock+0x8/0x10
 [<c01d4de1>] map_mft_record+0x51/0x2c0
 [<c01c5268>] ntfs_map_runlist_nolock+0x3d8/0x530
 [<c01c5931>] ntfs_map_runlist+0x41/0x70
 [<c01c16a1>] ntfs_readpage+0x8c1/0x9a0
 [<c0142e9c>] read_cache_page+0xac/0x150
 [<c01e2472>] load_system_files+0x472/0x2250
 [<c01e4ea6>] ntfs_fill_super+0xc56/0x1a50
 [<c016be6e>] get_sb_bdev+0xde/0x120
 [<c01e030b>] ntfs_get_sb+0x1b/0x30
 [<c016b493>] vfs_kern_mount+0x33/0xa0
 [<c016b556>] do_kern_mount+0x36/0x50
 [<c018195e>] do_mount+0x28e/0x640
 [<c0181d7f>] sys_mount+0x6f/0xb0

this means a "&rl->lock => &ni->mrec_lock" dependency, which stands in
contrast with the already observed "&ni->mrec_lock => &rl->lock"
dependency.  The dependencies were observed for the same locks (the
MFT's runlist lock and mrec_lock), i.e. this is not a confusion of
normal inodes vs. the MFT inode.

First and foremost, are my observations and interpretations correct?
Assuming that i made no mistake that invalidates my analysis, why are
the two MFT inode locks apparently taken in opposite order?

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-08  9:55   ` Ingo Molnar
@ 2006-06-08 10:53     ` Anton Altaparmakov
  2006-06-08 11:23       ` Ingo Molnar
                        ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Anton Altaparmakov @ 2006-06-08 10:53 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

Hi Ingo,

On Thu, 2006-06-08 at 11:55 +0200, Ingo Molnar wrote:
> * Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> > [...] It perhaps is getting confused by the special case for the table
> > of inodes ($MFT) which has the lock dependency reverse to all other
> > inodes but it is special because it can never take the lock
> > recursively (and hence deadlock) because we always keep the whole
> > runlist for $MFT in memory and should a bug or memory corruption cause
> > this not to be the case then ntfs will detect this and go BUG() so it
> > still will not deadlock...
>
> Please help me understand NTFS locking a bit better.  As far as i can see

Sure.

> at the moment, the NTFS locking scenario that the lock validator flagged
> does not involve two inodes - it only involves the MFT inode itself, and
> two of its locks.
>
> Firstly, here is a list of the NTFS terms, locks in question:
>
>  ni             - NTFS inode structure
>
>  mft_ni         - special "Master File Table" inode - one per fs.
>                   Consists of "MFT records", which describe an inode
>                   each.  (mft_ni is also called the "big inode")

Correct, except that a "big inode" is just any ntfs inode that has a
VFS inode (struct inode) to go with it, i.e. a real inode visible to
the VFS.  There is nothing special about the MFT inode with respect to
big inodes; all files have big inodes.

The non-big inodes are not real inodes - they are inode extents.
Basically, one inode is typically 1024 bytes, so if it fills up (e.g.
lots of hard links and/or a long runlist due to fragmentation and/or a
huge file), then ntfs allocates more mft records and attaches them
logically to the real inode.  Thus you can have effectively infinite
amounts of fragmentation / numbers of hard links even though an inode
is just 1024 bytes in size - you just end up with lots of "extent
inodes", as I call them, which in the driver means "ntfs_inode"
structures that do not have a VFS struct inode attached to them.

>  &ni->mrec_lock - a spinlock protecting a particular inode's MFT data.
>                   (finegrained lock for the MFT record)  It is
>                   typically taken by map_mft_record() and released by
>                   unmap_mft_record().

Correct, except s/spinlock/semaphore/ (that really should become a
mutex one day).

>  ni->runlist    - maps logical addresses to on-disk addresses.  (There
>                   are (two) runlists, one for normal inode data,
>                   another for attribute space data.)

Correct.

>  &rl->lock      - ni->runlist.lock, a rw semaphore that protects the
>                   mapping data.  Read-locked on access, write-locked
>                   on modification (extension of an inode, etc.).

Correct, except it is also write-locked when the runlist is modified
in memory, not necessarily due to a write to the inode.  This occurs
on the first read/write to the inode, because at iget() time we do not
read the runlist into memory; thus the first read/write will cause
&rl->lock to be taken for writing and the runlist to be read in.

This also happens when the file is fragmented.  Then there are
multiple runlist "fragments", each describing a portion of the data:
fragment 0 describes file offsets 0 to X, fragment 1 describes offsets
X+1 to Y, etc.  We only load each fragment into memory as it is used,
thus on a read from a file we do:

	down_read(&rl->lock);
	look up the physical location in the runlist
	if (physical location is unknown because this runlist fragment
			was never needed before) {
		up_read(&rl->lock);
		down_write(&rl->lock);
		redo the lookup to ensure no-one else mapped this
		fragment under our feet; if not, read it in and merge
		it with the existing runlist, then goto the lookup of
		the physical location above, which will now succeed
	}
	up_read/up_write(&rl->lock);

Basically we are on-demand loading the runlist fragments.

> The MFT is loaded in-memory permanently at mount time, and its runlist
> gives us access to NTFS inodes.  Is its runlist loaded into memory
> permanently too?  An NTFS inode's runlist gives access to the actual
> file data.

Clarification: the MFT inode is loaded at mount time and ALL its
runlist fragments are loaded and merged into one big runlist,
available at mft_ni->runlist.  The mft data, i.e. the inodes
themselves, are not loaded all the time.  That is what
map_mft_record() does, and it needs to look up in mft_ni->runlist
where the inode to be loaded is on disk.  This last lookup of the
physical location in map_mft_record() [actually in readpage, as
map_mft_record() reads the page cache page containing the record]
cannot require us to load an extent mft record, because the runlist is
complete, so we cannot deadlock here.

Your last statement is correct.  The runlist is a mapping of "file
offset" to "physical device offset".

> What the validator flagged is the following locking construct:
>
> we first acquired the MFT's &ni->mrec_lock in map_mft_record(), at:

No, this shows acquisition of the MFTMirr's &ni->mrec_lock (the mirror
of the MFT; it only contains the first few MFT records, stored in the
middle of the disk for recovery purposes).

If you look at fs/ntfs/super.c::load_system_files() you will see it
calls load_and_init_mft_mirror() as the first thing, which in turn
does ntfs_iget(vol->sb, FILE_MFTMirr) first of all; thus it is locking
the mft record for FILE_MFTMirr, which is inode 1 (whilst the MFT
itself is FILE_MFT, which is inode 0).

> [<c0340508>] mutex_lock+0x8/0x10
> [<c01d4d61>] map_mft_record+0x51/0x2c0
> [<c01c51e8>] ntfs_map_runlist_nolock+0x3d8/0x530
> [<c01c58b1>] ntfs_map_runlist+0x41/0x70
> [<c01c1621>] ntfs_readpage+0x8c1/0x9a0
> [<c0142e1c>] read_cache_page+0xac/0x150
> [<c01e23f2>] load_system_files+0x472/0x2250
> [<c01e4e26>] ntfs_fill_super+0xc56/0x1a50
> [<c016bdee>] get_sb_bdev+0xde/0x120
> [<c01e028b>] ntfs_get_sb+0x1b/0x30
> [<c016b413>] vfs_kern_mount+0x33/0xa0
> [<c016b4d6>] do_kern_mount+0x36/0x50
> [<c01818de>] do_mount+0x28e/0x640
> [<c0181cff>] sys_mount+0x6f/0xb0
>
> then we read-locked &rl->lock [the MFT's runlist semaphore] later in
> map_mft_record() -> ntfs_readpage(),

But the below trace precedes the above trace in time!  Below we see
ntfs_read_inode_mount() being called, which is called _before_
load_system_files() is called...  (See ntfs_fill_super(), which is the
only place that calls both of those functions.)  Apart from that, you
are correct that we are taking the &rl->lock of the MFT's runlist
semaphore here.

> while still holding &ni->mrec_lock:

Wrong - as explained above, it is MFTMirr's ni->mrec_lock being held
in the above trace, and that only happens later on, not earlier.

> [<c0134c4e>] down_read+0x2e/0x40
> [<c01c159c>] ntfs_readpage+0x83c/0x9a0
> [<c0142e1c>] read_cache_page+0xac/0x150
> [<c01d4e22>] map_mft_record+0x112/0x2c0
> [<c01d229d>] ntfs_read_locked_inode+0x8d/0x15d0
> [<c01d3c6b>] ntfs_read_inode_mount+0x48b/0xba0
> [<c01e4dcb>] ntfs_fill_super+0xbfb/0x1a50
> [<c016bdee>] get_sb_bdev+0xde/0x120
> [<c01e028b>] ntfs_get_sb+0x1b/0x30
> [<c016b413>] vfs_kern_mount+0x33/0xa0
> [<c016b4d6>] do_kern_mount+0x36/0x50
> [<c01818de>] do_mount+0x28e/0x640
> [<c0181cff>] sys_mount+0x6f/0xb0
>
> so this is a "&ni->mrec_lock => &rl->lock" dependency for the MFT, which
> the validator recorded.

No, it is not, as explained above.  Something has gotten confused
somewhere, because the order of events is the wrong way round...

> Then the validator also observed the reverse order.  We first
> write-locked &rl->lock (of the MFT inode):
>
> [<c0134c8e>] down_write+0x2e/0x50
> [<c01c5910>] ntfs_map_runlist+0x20/0x70
> [<c01c16a1>] ntfs_readpage+0x8c1/0x9a0
> [<c0142e9c>] read_cache_page+0xac/0x150
> [<c01e2472>] load_system_files+0x472/0x2250
> [<c01e4ea6>] ntfs_fill_super+0xc56/0x1a50
> [<c016be6e>] get_sb_bdev+0xde/0x120
> [<c01e030b>] ntfs_get_sb+0x1b/0x30
> [<c016b493>] vfs_kern_mount+0x33/0xa0
> [<c016b556>] do_kern_mount+0x36/0x50
> [<c018195e>] do_mount+0x28e/0x640
> [<c0181d7f>] sys_mount+0x6f/0xb0

No, this will be some other inode's lock.  The first time you see the
above trace should be when MFTMirr is being read in in
load_system_files() when it calls check_mft_mirror().  The runlist of
MFTMirr is at that stage not read in yet, thus it will take the
runlist lock of MFTMirr's inode for writing and read the runlist into
memory.

> then we took &ni->mrec_lock [this is still the MFT inode's mrec_lock,
> and we have the &rl->lock still held]:
>
> [<c0340588>] mutex_lock+0x8/0x10
> [<c01d4de1>] map_mft_record+0x51/0x2c0
> [<c01c5268>] ntfs_map_runlist_nolock+0x3d8/0x530
> [<c01c5931>] ntfs_map_runlist+0x41/0x70
> [<c01c16a1>] ntfs_readpage+0x8c1/0x9a0
> [<c0142e9c>] read_cache_page+0xac/0x150
> [<c01e2472>] load_system_files+0x472/0x2250
> [<c01e4ea6>] ntfs_fill_super+0xc56/0x1a50
> [<c016be6e>] get_sb_bdev+0xde/0x120
> [<c01e030b>] ntfs_get_sb+0x1b/0x30
> [<c016b493>] vfs_kern_mount+0x33/0xa0
> [<c016b556>] do_kern_mount+0x36/0x50
> [<c018195e>] do_mount+0x28e/0x640
> [<c0181d7f>] sys_mount+0x6f/0xb0

Again, this is the mrec_lock of the MFTMirr ntfs_inode, not the MFT
ntfs_inode (mft_ni).

> this means a "&rl->lock => &ni->mrec_lock" dependency, which stands in
> contrast with the already observed "&ni->mrec_lock => &rl->lock"
> dependency.

Could you re-evaluate that conclusion given the above feedback?  I
think it is wrong, given the inversion in time and the difference in
inodes as explained above.

> The dependencies were observed for the same locks (the MFT's runlist
> lock and mrec_lock), i.e. this is not a confusion of normal inodes vs.
> the MFT inode.
>
> First and foremost, are my observations and interpretations correct?
> Assuming that i made no mistake that invalidates my analysis, why are
> the two MFT inode locks apparently taken in opposite order?

I think you got it wrong, I am afraid.  Does what I explained above
make more sense?  In particular, I am worried about the fact that you
showed two events happening in reverse chronological order, which is
impossible.  (Unless time flows backwards when you run your
tests... (-;)

Please keep the questions coming...

Best regards,
	Anton

-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-08 10:53     ` Anton Altaparmakov
@ 2006-06-08 11:23       ` Ingo Molnar
  2006-06-09  8:09         ` Anton Altaparmakov
  2006-06-08 16:11       ` Ingo Molnar
  2006-06-09  7:19       ` Ingo Molnar
  2 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2006-06-08 11:23 UTC (permalink / raw)
To: Anton Altaparmakov; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

* Anton Altaparmakov <aia21@cam.ac.uk> wrote:

> No it is not as explained above.  Something has gotten confused
> somewhere because the order of events is the wrong way round...

ok, next try.  Find below a chronological trace of the lock-acquire
and new-dependency events leading up to the circular dependency
message.

The first (mrec_lock -> runlist.lock) dependency is:

new dependency: (&ni->mrec_lock){--..} => (&rl->lock){..--}
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0138bc9>] __lock_acquire+0x9a9/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c0134c4c>] down_read+0x2c/0x40
 [<c01c170c>] ntfs_readpage+0x83c/0x9a0
 [<c0142f8c>] read_cache_page+0xac/0x150
 [<c01d4f92>] map_mft_record+0x112/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

and the opposite (runlist.lock -> mrec_lock) dependency wants to get
created at:

 [<c0340678>] mutex_lock+0x8/0x10
 [<c01d4ed1>] map_mft_record+0x51/0x2c0
 [<c01c5358>] ntfs_map_runlist_nolock+0x3d8/0x530
 [<c01c5a21>] ntfs_map_runlist+0x41/0x70
 [<c01c1791>] ntfs_readpage+0x8c1/0x9a0
 [<c0142f8c>] read_cache_page+0xac/0x150
 [<c01e2562>] load_system_files+0x472/0x2250
 [<c01e4f96>] ntfs_fill_super+0xc56/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

does this trace make more sense?

	Ingo

----{ NTFS lock events trace }------------->

new type c05bfcc4: &ni->mrec_lock
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0138f24>] __lock_acquire+0xd04/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c034043f>] __mutex_lock_slowpath+0x6f/0x2a0
 [<c0340678>] mutex_lock+0x8/0x10
 [<c01d4ed1>] map_mft_record+0x51/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

acquire type [c05bfcc4] &ni->mrec_lock
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0138759>] __lock_acquire+0x539/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c034043f>] __mutex_lock_slowpath+0x6f/0x2a0
 [<c0340678>] mutex_lock+0x8/0x10
 [<c01d4ed1>] map_mft_record+0x51/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

new dependency: (&s->s_umount#15){--..} => (&ni->mrec_lock){....}
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0138bc9>] __lock_acquire+0x9a9/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c034043f>] __mutex_lock_slowpath+0x6f/0x2a0
 [<c0340678>] mutex_lock+0x8/0x10
 [<c01d4ed1>] map_mft_record+0x51/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

marked lock as {hardirq-on-W}: (&ni->mrec_lock){-...}, at: [<c0340678>] mutex_lock+0x8/0x10
irq event stamp: 2309
hardirqs last enabled at (2307): [<c03420a7>] _spin_unlock_irqrestore+0x47/0x60
hardirqs last disabled at (2309): [<c011cad6>] vprintk+0x26/0x360
softirqs last enabled at (1988): [<c0121e0e>] __do_softirq+0xfe/0x110
softirqs last disabled at (1981): [<c0105049>] do_softirq+0x69/0x100
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0137d37>] mark_lock+0x1c7/0x6b0
 [<c01394db>] mark_held_locks+0x4b/0xa0
 [<c0139660>] trace_hardirqs_on+0x70/0x160
 [<c034052f>] __mutex_lock_slowpath+0x15f/0x2a0
 [<c0340678>] mutex_lock+0x8/0x10
 [<c01d4ed1>] map_mft_record+0x51/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

marked lock as {softirq-on-W}: (&ni->mrec_lock){--..}, at: [<c0340678>] mutex_lock+0x8/0x10
irq event stamp: 2309
hardirqs last enabled at (2307): [<c03420a7>] _spin_unlock_irqrestore+0x47/0x60
hardirqs last disabled at (2309): [<c011cad6>] vprintk+0x26/0x360
softirqs last enabled at (1988): [<c0121e0e>] __do_softirq+0xfe/0x110
softirqs last disabled at (1981): [<c0105049>] do_softirq+0x69/0x100
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0137d37>] mark_lock+0x1c7/0x6b0
 [<c0139512>] mark_held_locks+0x82/0xa0
 [<c01396a0>] trace_hardirqs_on+0xb0/0x160
 [<c034052f>] __mutex_lock_slowpath+0x15f/0x2a0
 [<c0340678>] mutex_lock+0x8/0x10
 [<c01d4ed1>] map_mft_record+0x51/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

new dependency: (&ni->mrec_lock){--..} => (cpa_lock){++..}
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0138bc9>] __lock_acquire+0x9a9/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c0342159>] _spin_lock_irqsave+0x39/0x50
 [<c01135fc>] change_page_attr+0x1c/0x2a0
 [<c01138aa>] kernel_map_pages+0x2a/0xa0
 [<c0147f53>] get_page_from_freelist+0x253/0x3c0
 [<c0148110>] __alloc_pages+0x50/0x2f0
 [<c0142f65>] read_cache_page+0x85/0x150
 [<c01d4f92>] map_mft_record+0x112/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

new dependency: (&ni->mrec_lock){--..} => (&inode->i_data.tree_lock){++..}
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0138bc9>] __lock_acquire+0x9a9/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c0341c82>] _write_lock_irq+0x32/0x40
 [<c0142181>] add_to_page_cache+0x41/0xd0
 [<c014222b>] add_to_page_cache_lru+0x1b/0x40
 [<c0142f17>] read_cache_page+0x37/0x150
 [<c01d4f92>] map_mft_record+0x112/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

new dependency: (&ni->mrec_lock){--..} => (&inode->i_data.private_lock){--..}
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0138bc9>] __lock_acquire+0x9a9/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c0341bbc>] _spin_lock+0x2c/0x40
 [<c0168053>] create_empty_buffers+0x33/0xa0
 [<c01c11e7>] ntfs_readpage+0x317/0x9a0
 [<c0142f8c>] read_cache_page+0xac/0x150
 [<c01d4f92>] map_mft_record+0x112/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

new type c05bfcbc: &rl->lock
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0138f24>] __lock_acquire+0xd04/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c0134c4c>] down_read+0x2c/0x40
 [<c01c170c>] ntfs_readpage+0x83c/0x9a0
 [<c0142f8c>] read_cache_page+0xac/0x150
 [<c01d4f92>] map_mft_record+0x112/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

acquire type [c05bfcbc] &rl->lock
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0138759>] __lock_acquire+0x539/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c0134c4c>] down_read+0x2c/0x40
 [<c01c170c>] ntfs_readpage+0x83c/0x9a0
 [<c0142f8c>] read_cache_page+0xac/0x150
 [<c01d4f92>] map_mft_record+0x112/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

marked lock as {hardirq-on-R}: (&rl->lock){..-.}, at: [<c01c170c>] ntfs_readpage+0x83c/0x9a0
irq event stamp: 2440
hardirqs last enabled at (2439): [<c0342107>] _read_unlock_irqrestore+0x47/0x60
hardirqs last disabled at (2440): [<c011cad6>] vprintk+0x26/0x360
softirqs last enabled at (2436): [<c0121e0e>] __do_softirq+0xfe/0x110
softirqs last disabled at (2419): [<c0105049>] do_softirq+0x69/0x100
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0137d37>] mark_lock+0x1c7/0x6b0
 [<c0138776>] __lock_acquire+0x556/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c0134c4c>] down_read+0x2c/0x40
 [<c01c170c>] ntfs_readpage+0x83c/0x9a0
 [<c0142f8c>] read_cache_page+0xac/0x150
 [<c01d4f92>] map_mft_record+0x112/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

marked lock as {softirq-on-R}: (&rl->lock){..--}, at: [<c01c170c>] ntfs_readpage+0x83c/0x9a0
irq event stamp: 2440
hardirqs last enabled at (2439): [<c0342107>] _read_unlock_irqrestore+0x47/0x60
hardirqs last disabled at (2440): [<c011cad6>] vprintk+0x26/0x360
softirqs last enabled at (2436): [<c0121e0e>] __do_softirq+0xfe/0x110
softirqs last disabled at (2419): [<c0105049>] do_softirq+0x69/0x100
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0137d37>] mark_lock+0x1c7/0x6b0
 [<c0138882>] __lock_acquire+0x662/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c0134c4c>] down_read+0x2c/0x40
 [<c01c170c>] ntfs_readpage+0x83c/0x9a0
 [<c0142f8c>] read_cache_page+0xac/0x150
 [<c01d4f92>] map_mft_record+0x112/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

new dependency: (&ni->mrec_lock){--..} => (&rl->lock){..--}
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0138bc9>] __lock_acquire+0x9a9/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c0134c4c>] down_read+0x2c/0x40
 [<c01c170c>] ntfs_readpage+0x83c/0x9a0
 [<c0142f8c>] read_cache_page+0xac/0x150
 [<c01d4f92>] map_mft_record+0x112/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

new dependency: (&ni->mrec_lock){--..} => (&lo->lo_lock){....}
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0138bc9>] __lock_acquire+0x9a9/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c0341f62>] _spin_lock_irq+0x32/0x40
 [<c0276e69>] loop_make_request+0x49/0x100
 [<c01f3f51>] generic_make_request+0x1c1/0x270
 [<c01f404e>] submit_bio+0x4e/0xe0
 [<c016731d>] submit_bh+0xcd/0x130
 [<c01c173e>] ntfs_readpage+0x86e/0x9a0
 [<c0142f8c>]
read_cache_page+0xac/0x150 [<c01d4f92>] map_mft_record+0x112/0x2c0 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50 [<c016bf5e>] get_sb_bdev+0xde/0x120 [<c01e03fb>] ntfs_get_sb+0x1b/0x30 [<c016b583>] vfs_kern_mount+0x33/0xa0 [<c016b646>] do_kern_mount+0x36/0x50 [<c0181a4e>] do_mount+0x28e/0x640 [<c0181e6f>] sys_mount+0x6f/0xb0 [<c034233d>] sysenter_past_esp+0x56/0x8d new dependency: (&ni->mrec_lock){--..} => (&q->lock){++..} [<c0104bf2>] show_trace+0x12/0x20 [<c0104c19>] dump_stack+0x19/0x20 [<c0138bc9>] __lock_acquire+0x9a9/0xde0 [<c013946b>] lock_acquire+0x7b/0xa0 [<c0342159>] _spin_lock_irqsave+0x39/0x50 [<c01140cb>] complete+0x1b/0x60 [<c0276ed7>] loop_make_request+0xb7/0x100 [<c01f3f51>] generic_make_request+0x1c1/0x270 [<c01f404e>] submit_bio+0x4e/0xe0 [<c016731d>] submit_bh+0xcd/0x130 [<c01c173e>] ntfs_readpage+0x86e/0x9a0 [<c0142f8c>] read_cache_page+0xac/0x150 [<c01d4f92>] map_mft_record+0x112/0x2c0 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50 [<c016bf5e>] get_sb_bdev+0xde/0x120 [<c01e03fb>] ntfs_get_sb+0x1b/0x30 [<c016b583>] vfs_kern_mount+0x33/0xa0 [<c016b646>] do_kern_mount+0x36/0x50 [<c0181a4e>] do_mount+0x28e/0x640 [<c0181e6f>] sys_mount+0x6f/0xb0 [<c034233d>] sysenter_past_esp+0x56/0x8d new dependency: (&ni->mrec_lock){--..} => (ide_lock){++..} [<c0104bf2>] show_trace+0x12/0x20 [<c0104c19>] dump_stack+0x19/0x20 [<c0138bc9>] __lock_acquire+0x9a9/0xde0 [<c013946b>] lock_acquire+0x7b/0xa0 [<c0341f62>] _spin_lock_irq+0x32/0x40 [<c01f30d1>] generic_unplug_device+0x11/0x30 [<c01f25e2>] blk_backing_dev_unplug+0x12/0x20 [<c0276f57>] loop_unplug+0x37/0x40 [<c01f25e2>] blk_backing_dev_unplug+0x12/0x20 [<c0166269>] block_sync_page+0x39/0x50 [<c0141cc8>] sync_page+0x38/0x50 [<c033fc69>] __wait_on_bit_lock+0x49/0x70 [<c0141dcb>] __lock_page+0x6b/0x80 [<c0142fe5>] 
read_cache_page+0x105/0x150 [<c01d4f92>] map_mft_record+0x112/0x2c0 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50 [<c016bf5e>] get_sb_bdev+0xde/0x120 [<c01e03fb>] ntfs_get_sb+0x1b/0x30 [<c016b583>] vfs_kern_mount+0x33/0xa0 [<c016b646>] do_kern_mount+0x36/0x50 [<c0181a4e>] do_mount+0x28e/0x640 [<c0181e6f>] sys_mount+0x6f/0xb0 [<c034233d>] sysenter_past_esp+0x56/0x8d new dependency: (&ni->mrec_lock){--..} => (&rq->lock){++..} [<c0104bf2>] show_trace+0x12/0x20 [<c0104c19>] dump_stack+0x19/0x20 [<c0138bc9>] __lock_acquire+0x9a9/0xde0 [<c013946b>] lock_acquire+0x7b/0xa0 [<c0341f62>] _spin_lock_irq+0x32/0x40 [<c033ebba>] schedule+0xea/0xb80 [<c033f678>] io_schedule+0x28/0x40 [<c0141ccd>] sync_page+0x3d/0x50 [<c033fc69>] __wait_on_bit_lock+0x49/0x70 [<c0141dcb>] __lock_page+0x6b/0x80 [<c0142fe5>] read_cache_page+0x105/0x150 [<c01d4f92>] map_mft_record+0x112/0x2c0 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50 [<c016bf5e>] get_sb_bdev+0xde/0x120 [<c01e03fb>] ntfs_get_sb+0x1b/0x30 [<c016b583>] vfs_kern_mount+0x33/0xa0 [<c016b646>] do_kern_mount+0x36/0x50 [<c0181a4e>] do_mount+0x28e/0x640 [<c0181e6f>] sys_mount+0x6f/0xb0 [<c034233d>] sysenter_past_esp+0x56/0x8d acquire type [c05bfcc4] &ni->mrec_lock [<c0104bf2>] show_trace+0x12/0x20 [<c0104c19>] dump_stack+0x19/0x20 [<c0138759>] __lock_acquire+0x539/0xde0 [<c013946b>] lock_acquire+0x7b/0xa0 [<c034043f>] __mutex_lock_slowpath+0x6f/0x2a0 [<c0340678>] mutex_lock+0x8/0x10 [<c01d4ed1>] map_mft_record+0x51/0x2c0 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0 [<c01d4545>] ntfs_iget+0x55/0x80 [<c01e216a>] load_system_files+0x7a/0x2250 [<c01e4f96>] ntfs_fill_super+0xc56/0x1a50 [<c016bf5e>] get_sb_bdev+0xde/0x120 [<c01e03fb>] ntfs_get_sb+0x1b/0x30 [<c016b583>] vfs_kern_mount+0x33/0xa0 [<c016b646>] do_kern_mount+0x36/0x50 [<c0181a4e>] do_mount+0x28e/0x640 
[<c0181e6f>] sys_mount+0x6f/0xb0 [<c034233d>] sysenter_past_esp+0x56/0x8d acquire type [c05bfcbc] &rl->lock [<c0104bf2>] show_trace+0x12/0x20 [<c0104c19>] dump_stack+0x19/0x20 [<c0138759>] __lock_acquire+0x539/0xde0 [<c013946b>] lock_acquire+0x7b/0xa0 [<c0134c4c>] down_read+0x2c/0x40 [<c01c170c>] ntfs_readpage+0x83c/0x9a0 [<c0142f8c>] read_cache_page+0xac/0x150 [<c01e2562>] load_system_files+0x472/0x2250 [<c01e4f96>] ntfs_fill_super+0xc56/0x1a50 [<c016bf5e>] get_sb_bdev+0xde/0x120 [<c01e03fb>] ntfs_get_sb+0x1b/0x30 [<c016b583>] vfs_kern_mount+0x33/0xa0 [<c016b646>] do_kern_mount+0x36/0x50 [<c0181a4e>] do_mount+0x28e/0x640 [<c0181e6f>] sys_mount+0x6f/0xb0 [<c034233d>] sysenter_past_esp+0x56/0x8d new dependency: (&s->s_umount#15){--..} => (&rl->lock){..--} [<c0104bf2>] show_trace+0x12/0x20 [<c0104c19>] dump_stack+0x19/0x20 [<c0138bc9>] __lock_acquire+0x9a9/0xde0 [<c013946b>] lock_acquire+0x7b/0xa0 [<c0134c4c>] down_read+0x2c/0x40 [<c01c170c>] ntfs_readpage+0x83c/0x9a0 [<c0142f8c>] read_cache_page+0xac/0x150 [<c01e2562>] load_system_files+0x472/0x2250 [<c01e4f96>] ntfs_fill_super+0xc56/0x1a50 [<c016bf5e>] get_sb_bdev+0xde/0x120 [<c01e03fb>] ntfs_get_sb+0x1b/0x30 [<c016b583>] vfs_kern_mount+0x33/0xa0 [<c016b646>] do_kern_mount+0x36/0x50 [<c0181a4e>] do_mount+0x28e/0x640 [<c0181e6f>] sys_mount+0x6f/0xb0 [<c034233d>] sysenter_past_esp+0x56/0x8d acquire type [c05bfcbc] &rl->lock [<c0104bf2>] show_trace+0x12/0x20 [<c0104c19>] dump_stack+0x19/0x20 [<c0138759>] __lock_acquire+0x539/0xde0 [<c013946b>] lock_acquire+0x7b/0xa0 [<c0134c8c>] down_write+0x2c/0x50 [<c01c5a00>] ntfs_map_runlist+0x20/0x70 [<c01c1791>] ntfs_readpage+0x8c1/0x9a0 [<c0142f8c>] read_cache_page+0xac/0x150 [<c01e2562>] load_system_files+0x472/0x2250 [<c01e4f96>] ntfs_fill_super+0xc56/0x1a50 [<c016bf5e>] get_sb_bdev+0xde/0x120 [<c01e03fb>] ntfs_get_sb+0x1b/0x30 [<c016b583>] vfs_kern_mount+0x33/0xa0 [<c016b646>] do_kern_mount+0x36/0x50 [<c0181a4e>] do_mount+0x28e/0x640 [<c0181e6f>] sys_mount+0x6f/0xb0 
[<c034233d>] sysenter_past_esp+0x56/0x8d marked lock as {hardirq-on-W}: (&rl->lock){-.--}, at: [<c01c5a00>] ntfs_map_runlist+0x20/0x70 irq event stamp: 2940 hardirqs last enabled at (2939): [<c0342403>] restore_nocheck+0x12/0x15 hardirqs last disabled at (2940): [<c011cad6>] vprintk+0x26/0x360 softirqs last enabled at (2938): [<c0121e0e>] __do_softirq+0xfe/0x110 softirqs last disabled at (2923): [<c0105049>] do_softirq+0x69/0x100 [<c0104bf2>] show_trace+0x12/0x20 [<c0104c19>] dump_stack+0x19/0x20 [<c0137d37>] mark_lock+0x1c7/0x6b0 [<c0138588>] __lock_acquire+0x368/0xde0 [<c013946b>] lock_acquire+0x7b/0xa0 [<c0134c8c>] down_write+0x2c/0x50 [<c01c5a00>] ntfs_map_runlist+0x20/0x70 [<c01c1791>] ntfs_readpage+0x8c1/0x9a0 [<c0142f8c>] read_cache_page+0xac/0x150 [<c01e2562>] load_system_files+0x472/0x2250 [<c01e4f96>] ntfs_fill_super+0xc56/0x1a50 [<c016bf5e>] get_sb_bdev+0xde/0x120 [<c01e03fb>] ntfs_get_sb+0x1b/0x30 [<c016b583>] vfs_kern_mount+0x33/0xa0 [<c016b646>] do_kern_mount+0x36/0x50 [<c0181a4e>] do_mount+0x28e/0x640 [<c0181e6f>] sys_mount+0x6f/0xb0 [<c034233d>] sysenter_past_esp+0x56/0x8d marked lock as {softirq-on-W}: (&rl->lock){----}, at: [<c01c5a00>] ntfs_map_runlist+0x20/0x70 irq event stamp: 2940 hardirqs last enabled at (2939): [<c0342403>] restore_nocheck+0x12/0x15 hardirqs last disabled at (2940): [<c011cad6>] vprintk+0x26/0x360 softirqs last enabled at (2938): [<c0121e0e>] __do_softirq+0xfe/0x110 softirqs last disabled at (2923): [<c0105049>] do_softirq+0x69/0x100 [<c0104bf2>] show_trace+0x12/0x20 [<c0104c19>] dump_stack+0x19/0x20 [<c0137d37>] mark_lock+0x1c7/0x6b0 [<c0138882>] __lock_acquire+0x662/0xde0 [<c013946b>] lock_acquire+0x7b/0xa0 [<c0134c8c>] down_write+0x2c/0x50 [<c01c5a00>] ntfs_map_runlist+0x20/0x70 [<c01c1791>] ntfs_readpage+0x8c1/0x9a0 [<c0142f8c>] read_cache_page+0xac/0x150 [<c01e2562>] load_system_files+0x472/0x2250 [<c01e4f96>] ntfs_fill_super+0xc56/0x1a50 [<c016bf5e>] get_sb_bdev+0xde/0x120 [<c01e03fb>] ntfs_get_sb+0x1b/0x30 
[<c016b583>] vfs_kern_mount+0x33/0xa0
[<c016b646>] do_kern_mount+0x36/0x50
[<c0181a4e>] do_mount+0x28e/0x640
[<c0181e6f>] sys_mount+0x6f/0xb0
[<c034233d>] sysenter_past_esp+0x56/0x8d

acquire type [c05bfcc4] &ni->mrec_lock
[<c0104bf2>] show_trace+0x12/0x20
[<c0104c19>] dump_stack+0x19/0x20
[<c0138759>] __lock_acquire+0x539/0xde0
[<c013946b>] lock_acquire+0x7b/0xa0
[<c034043f>] __mutex_lock_slowpath+0x6f/0x2a0
[<c0340678>] mutex_lock+0x8/0x10
[<c01d4ed1>] map_mft_record+0x51/0x2c0
[<c01c5358>] ntfs_map_runlist_nolock+0x3d8/0x530
[<c01c5a21>] ntfs_map_runlist+0x41/0x70
[<c01c1791>] ntfs_readpage+0x8c1/0x9a0
[<c0142f8c>] read_cache_page+0xac/0x150
[<c01e2562>] load_system_files+0x472/0x2250
[<c01e4f96>] ntfs_fill_super+0xc56/0x1a50
[<c016bf5e>] get_sb_bdev+0xde/0x120
[<c01e03fb>] ntfs_get_sb+0x1b/0x30
[<c016b583>] vfs_kern_mount+0x33/0xa0
[<c016b646>] do_kern_mount+0x36/0x50
[<c0181a4e>] do_mount+0x28e/0x640
[<c0181e6f>] sys_mount+0x6f/0xb0
[<c034233d>] sysenter_past_esp+0x56/0x8d

=====================================================
[ BUG: possible circular locking deadlock detected! ]
-----------------------------------------------------
mount/2406 is trying to acquire lock:
 (&ni->mrec_lock){--..}, at: [<c0340678>] mutex_lock+0x8/0x10

but task is already holding lock:
 (&rl->lock){----}, at: [<c01c5a00>] ntfs_map_runlist+0x20/0x70

which lock already depends on the new lock,
which could lead to circular deadlocks!

the existing dependency chain (in reverse order) is:

-> #1 (&rl->lock){----}:
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c0134c4c>] down_read+0x2c/0x40
 [<c01c170c>] ntfs_readpage+0x83c/0x9a0
 [<c0142f8c>] read_cache_page+0xac/0x150
 [<c01d4f92>] map_mft_record+0x112/0x2c0
 [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
 [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
 [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

-> #0 (&ni->mrec_lock){--..}:
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c034043f>] __mutex_lock_slowpath+0x6f/0x2a0
 [<c0340678>] mutex_lock+0x8/0x10
 [<c01d4ed1>] map_mft_record+0x51/0x2c0
 [<c01c5358>] ntfs_map_runlist_nolock+0x3d8/0x530
 [<c01c5a21>] ntfs_map_runlist+0x41/0x70
 [<c01c1791>] ntfs_readpage+0x8c1/0x9a0
 [<c0142f8c>] read_cache_page+0xac/0x150
 [<c01e2562>] load_system_files+0x472/0x2250
 [<c01e4f96>] ntfs_fill_super+0xc56/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

other info that might help us debug this:

2 locks held by mount/2406:
 #0: (&s->s_umount#15){--..}, at: [<c016bb8d>] sget+0x11d/0x2f0
 #1: (&rl->lock){----}, at: [<c01c5a00>] ntfs_map_runlist+0x20/0x70

stack backtrace:
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0136e61>] print_circular_bug_tail+0x61/0x70
 [<c013896f>] __lock_acquire+0x74f/0xde0
 [<c013946b>] lock_acquire+0x7b/0xa0
 [<c034043f>] __mutex_lock_slowpath+0x6f/0x2a0
 [<c0340678>] mutex_lock+0x8/0x10
 [<c01d4ed1>] map_mft_record+0x51/0x2c0
 [<c01c5358>] ntfs_map_runlist_nolock+0x3d8/0x530
 [<c01c5a21>] ntfs_map_runlist+0x41/0x70
 [<c01c1791>] ntfs_readpage+0x8c1/0x9a0
 [<c0142f8c>] read_cache_page+0xac/0x150
 [<c01e2562>] load_system_files+0x472/0x2250
 [<c01e4f96>] ntfs_fill_super+0xc56/0x1a50
 [<c016bf5e>] get_sb_bdev+0xde/0x120
 [<c01e03fb>] ntfs_get_sb+0x1b/0x30
 [<c016b583>] vfs_kern_mount+0x33/0xa0
 [<c016b646>] do_kern_mount+0x36/0x50
 [<c0181a4e>] do_mount+0x28e/0x640
 [<c0181e6f>] sys_mount+0x6f/0xb0
 [<c034233d>] sysenter_past_esp+0x56/0x8d

NTFS volume version 1.2.
NTFS-fs warning (device loop0): load_system_files(): Disabling sparse support due to NTFS volume version 1.2 (need at least version 3.0).
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-08 11:23 ` Ingo Molnar
@ 2006-06-09  8:09 ` Anton Altaparmakov
  2006-06-10  7:59 ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Anton Altaparmakov @ 2006-06-09 8:09 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

On Thu, 2006-06-08 at 13:23 +0200, Ingo Molnar wrote:
> * Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> > No it is not as explained above. Something has gotten confused
> > somewhere because the order of events is the wrong way round...
>
> ok, next try. Find below a chronological trace of the lock acquire and
> new dependency events leading up to the circular dependency message.
>
> The first (mrec_lock -> runlist.lock) dependency is:
>
> new dependency: (&ni->mrec_lock){--..} => (&rl->lock){..--}
> [<c0104bf2>] show_trace+0x12/0x20
> [<c0104c19>] dump_stack+0x19/0x20
> [<c0138bc9>] __lock_acquire+0x9a9/0xde0
> [<c013946b>] lock_acquire+0x7b/0xa0
> [<c0134c4c>] down_read+0x2c/0x40

This locks the runlist of the MFT (i.e. mft_ni->runlist.lock).

> [<c01c170c>] ntfs_readpage+0x83c/0x9a0
> [<c0142f8c>] read_cache_page+0xac/0x150
> [<c01d4f92>] map_mft_record+0x112/0x2c0

This locks the mft record of the MFT (i.e. mft_ni->mrec_lock).

> [<c01d240d>] ntfs_read_locked_inode+0x8d/0x15d0
> [<c01d3ddb>] ntfs_read_inode_mount+0x48b/0xba0
> [<c01e4f3b>] ntfs_fill_super+0xbfb/0x1a50
> [<c016bf5e>] get_sb_bdev+0xde/0x120
> [<c01e03fb>] ntfs_get_sb+0x1b/0x30
> [<c016b583>] vfs_kern_mount+0x33/0xa0
> [<c016b646>] do_kern_mount+0x36/0x50
> [<c0181a4e>] do_mount+0x28e/0x640
> [<c0181e6f>] sys_mount+0x6f/0xb0
> [<c034233d>] sysenter_past_esp+0x56/0x8d

Let us call the above case A, and the description of case A is: mapping
the mft record of the MFT itself. This is a special case because it is
circular, i.e. we are mapping into memory a part of a file using the
contents of the file itself. Sound weird? It is. It is a catch-22
situation.
How can you read a file if you need to have already read the file to be
able to read it? (-:

The solution is ntfs_read_inode_mount(). It loads by hand, using
sb_bread(), the first mft record in the MFT file, i.e. mft record 0,
which is the mft record for the MFT itself. Its on-disk location is
recorded in the ntfs boot sector, which has already been read and
parsed. Armed with that information, the volume is bootstrapped so that
the mft record for the MFT itself can subsequently be read by reading a
page into memory, which is what we see above with
map_mft_record()->read_cache_page()->ntfs_readpage().

The bootstrapping consists of setting up the struct inode and ntfs_inode
mft_ni appropriately and loading the runlist for the first fragment of
the MFT into memory (mft_ni->runlist). Doing this means that
ntfs_readpage() locks the runlist (mft_ni->runlist) for reading, looks
up the physical location of the first mft record, unlocks the runlist
and then reads the data from disk. This always succeeds because we just
initialized the runlist by hand before calling map_mft_record().

ntfs_read_inode_mount() then proceeds to map all extent mft records for
the MFT in sequence (if there are any) and to load their runlist data
into mft_ni->runlist. Thus, by the time ntfs_read_inode_mount() is
finished, mft_ni->runlist is completely mapped into memory and is
guaranteed to stay there until umount time. There are various places in
the ntfs driver where we go BUG() if we ever find an unmapped region in
mft_ni->runlist.

Make sense so far? I hope so. (-:

> and the opposite (runlist.lock -> mrec_lock) dependency wants to get
> created at:
>
> [<c0340678>] mutex_lock+0x8/0x10
> [<c01d4ed1>] map_mft_record+0x51/0x2c0
> [<c01c5358>] ntfs_map_runlist_nolock+0x3d8/0x530
> [<c01c5a21>] ntfs_map_runlist+0x41/0x70
> [<c01c1791>] ntfs_readpage+0x8c1/0x9a0
> [<c0142f8c>] read_cache_page+0xac/0x150
> [<c01e2562>] load_system_files+0x472/0x2250
> [<c01e4f96>] ntfs_fill_super+0xc56/0x1a50
> [<c016bf5e>] get_sb_bdev+0xde/0x120
> [<c01e03fb>] ntfs_get_sb+0x1b/0x30
> [<c016b583>] vfs_kern_mount+0x33/0xa0
> [<c016b646>] do_kern_mount+0x36/0x50
> [<c0181a4e>] do_mount+0x28e/0x640
> [<c0181e6f>] sys_mount+0x6f/0xb0
> [<c034233d>] sysenter_past_esp+0x56/0x8d

Ok, by now we are in the "normal" code paths with mft_ni->runlist fully
loaded. The above trace, I think, must be coming from
check_mft_mirror()->ntfs_map_page(), which tries to read the MFT and
then the MFT mirror to compare their contents. The driver loads the MFT
first, but we already have page index 0 of the MFT, so read_cache_page()
finds the page in the page cache (you can see it happen in the long
trace you sent me). I am therefore assuming the above trace comes from
the next ntfs_map_page() of the mft mirror, page index 0. Let me
describe that one:

ntfs_map_page(mft mirror, page index 0);
-> read_cache_page();
   -> ntfs_readpage();

ntfs_readpage() takes the runlist lock of the mft mirror inode whose
data is being read into the page
(NTFS_I(vol->mftmirr_ino)->runlist.lock) for reading. It then looks up
in the runlist the physical location of the blocks of the first page.
But given that we have never accessed the data of the mft mirror inode,
the runlist is not mapped into memory. Thus ntfs_readpage() drops the
runlist lock and calls ntfs_map_runlist() to map the runlist into
memory. ntfs_map_runlist() retakes the runlist lock for writing, as it
is about to modify the in-memory runlist.
It then checks that someone else did not map the runlist into memory
whilst we waited on the lock and, assuming we still need to map it, goes
and maps the mft record for the mft mirror inode, which is the
map_mft_record() we see above. This takes the mft record lock of the mft
mirror (NTFS_I(vol->mftmirr_ino)->mrec_lock), which is what the lock
validator complains about.

Let us call this "normal" code path B. Its description is: read data
from a file that is not the MFT itself. For this we need to read the
file data, for which we need to know where it is on disk, for which we
need the runlist. Thus we need to lock the runlist so that a file
modification (or a concurrent file read causing another fragment of the
runlist to be read into memory) cannot change it under our feet. We then
look up the physical location in the runlist and, in the common case,
can read the data. In the uncommon case (first access), the in-memory
runlist does not contain the physical mapping data, so we need to map
the mft record (i.e. the on-disk inode) into memory and read the runlist
from it into memory, after which we can look up the physical location of
the data in the runlist as in the common case.

Thus, to summarise, there are two cases:

1) The normal access pattern (B), where the runlist lock is always taken
first, and the mft record lock is taken second and only if the runlist
is incomplete in memory. On file modification this is also the case: the
runlist lock is taken first, then the mft record lock, so that both the
runlist and the inode can be updated with the new data (e.g. on a file
extend).

2) The special access pattern (A), where mft record 0 itself is read for
the first time, which is where the mft record lock is taken first and
then the runlist lock is taken to determine its on-disk location. If
this were a normal file, we would then need to take the mft record lock
_again_ (thus deadlock!) to load the runlist data into memory. In so far
the lock validator is correct: that would be a 100% certain deadlock.
BUT we bootstrapped the runlist of the mft inode mft_ni in
ntfs_read_inode_mount() so that the physical mapping is always found in
the runlist, thus it _never_ has to take the mft record lock that second
time and the deadlock _never_ happens.

> does this trace make more sense?

Yes, makes perfect sense to me. I hope with my descriptions above it
makes sense to you, too. Please let me know if not. (-:

It is not the simplest of things because of the catch-22 situation,
which is why such an elaborate bootstrapping is required. But once
bootstrapped it is IMHO very elegant, because it means that the exact
same code is used for accessing both metadata and normal file data. And
we get to confuse the lock validator, which has got to be a bonus. (-;

Best regards,
	Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-09  8:09 ` Anton Altaparmakov
@ 2006-06-10  7:59 ` Ingo Molnar
  2006-06-10  8:48 ` Anton Altaparmakov
  0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2006-06-10 7:59 UTC (permalink / raw)
To: Anton Altaparmakov; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

* Anton Altaparmakov <aia21@cam.ac.uk> wrote:

> The normal access pattern (B) where the runlist lock is always taken
> first. And the mft record lock is taken second and only if the
> runlist is incomplete in-memory.
>
> Of course on file modification, this is also the case, the runlist
> lock is taken first, then the mft record lock is taken and thus both
> the runlist and the inode can be updated with the new data (e.g. on a
> file extend).

thanks for the detailed explanation! I have annotated the code for the
lock validator as much as i could, by:

 - excluding ntfs_fill_super() from the locking rules,

 - 'splitting' the MFT's mrec_lock and runlist->lock locking rules from
   the other inodes' locking rules,

 - splitting the mrec_lock rules of extent inodes. (We map them
   recursively while having the main inode mft record mapped. The
   nesting is safe because inode->extent_inode is a noncircular
   relation.)

Still there seems to be a case that the validator does not grok:
load_attribute_list() seems to take the lock in the opposite order from
what you described above. What locking detail am i missing?
[let me know if you need all dependency events leading up to this
message from the validator]

	Ingo

=======================================================
[ INFO: possible circular locking dependency detected ]
-------------------------------------------------------
ls/2581 is trying to acquire lock:
 (&rl->lock){----}, at: [<c01c1f5b>] load_attribute_list+0xfb/0x3c0

but task is already holding lock:
 (&ni->mrec_lock){--..}, at: [<c01d50c5>] map_mft_record_type+0x55/0x2d0

which lock already depends on the new lock,
which could lead to circular locking dependencies.

the existing dependency chain (in reverse order) is:

-> #1 (&ni->mrec_lock){--..}:
 [<c01394df>] lock_acquire+0x6f/0x90
 [<c0346183>] mutex_lock_nested+0x73/0x2a0
 [<c01d5e43>] map_mft_record+0x53/0x2c0
 [<c01c54f8>] ntfs_map_runlist_nolock+0x3d8/0x530
 [<c01c5bc1>] ntfs_map_runlist+0x41/0x70
 [<c01c1929>] ntfs_readpage+0x8c9/0x9b0
 [<c0142ffc>] read_cache_page+0xac/0x150
 [<c01e212d>] ntfs_statfs+0x41d/0x660
 [<c0163254>] vfs_statfs+0x54/0x70
 [<c0163288>] vfs_statfs64+0x18/0x30
 [<c0163384>] sys_statfs64+0x64/0xa0
 [<c0347dcd>] sysenter_past_esp+0x56/0x8d

-> #0 (&rl->lock){----}:
 [<c01394df>] lock_acquire+0x6f/0x90
 [<c0134c8a>] down_read_nested+0x2a/0x40
 [<c01c1f5b>] load_attribute_list+0xfb/0x3c0
 [<c01d323e>] ntfs_read_locked_inode+0xcee/0x15d0
 [<c01d4735>] ntfs_iget+0x55/0x80
 [<c01db3da>] ntfs_lookup+0x14a/0x740
 [<c01736b6>] do_lookup+0x126/0x150
 [<c0173ef3>] __link_path_walk+0x813/0xe50
 [<c017457c>] link_path_walk+0x4c/0xf0
 [<c0174a2d>] do_path_lookup+0xad/0x260
 [<c0175228>] __user_walk_fd+0x38/0x60
 [<c016e3be>] vfs_lstat_fd+0x1e/0x50
 [<c016e401>] vfs_lstat+0x11/0x20
 [<c016ec04>] sys_lstat64+0x14/0x30
 [<c0347dcd>] sysenter_past_esp+0x56/0x8d

other info that might help us debug this:

2 locks held by ls/2581:
 #0: (&inode->i_mutex){--..}, at: [<c0346108>] mutex_lock+0x8/0x10
 #1: (&ni->mrec_lock){--..}, at: [<c01d50c5>] map_mft_record_type+0x55/0x2d0

stack backtrace:
 [<c0104bf2>] show_trace+0x12/0x20
 [<c0104c19>] dump_stack+0x19/0x20
 [<c0136ef1>] print_circular_bug_tail+0x61/0x70
 [<c01389ff>] __lock_acquire+0x74f/0xde0
 [<c01394df>] lock_acquire+0x6f/0x90
 [<c0134c8a>] down_read_nested+0x2a/0x40
 [<c01c1f5b>] load_attribute_list+0xfb/0x3c0
 [<c01d323e>] ntfs_read_locked_inode+0xcee/0x15d0
 [<c01d4735>] ntfs_iget+0x55/0x80
 [<c01db3da>] ntfs_lookup+0x14a/0x740
 [<c01736b6>] do_lookup+0x126/0x150
 [<c0173ef3>] __link_path_walk+0x813/0xe50
 [<c017457c>] link_path_walk+0x4c/0xf0
 [<c0174a2d>] do_path_lookup+0xad/0x260
 [<c0175228>] __user_walk_fd+0x38/0x60
 [<c016e3be>] vfs_lstat_fd+0x1e/0x50
 [<c016e401>] vfs_lstat+0x11/0x20
 [<c016ec04>] sys_lstat64+0x14/0x30
 [<c0347dcd>] sysenter_past_esp+0x56/0x8d
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-10  7:59 ` Ingo Molnar
@ 2006-06-10  8:48 ` Anton Altaparmakov
  2006-06-11  5:31 ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Anton Altaparmakov @ 2006-06-10 8:48 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

On Sat, 10 Jun 2006, Ingo Molnar wrote:
> * Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> > The normal access pattern (B) where the runlist lock is always taken
> > first. And the mft record lock is taken second and only if the
> > runlist is incomplete in-memory.
> >
> > Of course on file modification, this is also the case, the runlist
> > lock is taken first, then the mft record lock is taken and thus both
> > the runlist and the inode can be updated with the new data (e.g. on a
> > file extend).
>
> thanks for the detailed explanation!

You are welcome. NTFS is not exactly simple code and its locking is
certainly not simple at all...

> I have annotated the code for the lock validator as much as i could, by:
>
> - excluding ntfs_fill_super() from the locking rules,
>
> - 'splitting' the MFT's mrec_lock and runlist->lock locking rules from
>   the other inodes's locking rules,
>
> - splitting the mrec_lock rules of extent inodes. (We map them
>   recursively while having the main inode mft record mapped. The
>   nesting is safe because inode->extent_inode is a noncircular
>   relation.)

Sounds good!

> Still there seems to be a case that the validator does not grok:
> load_attribute_list() seems to take the lock in the opposite order from
> what you described above. What locking detail am i missing? [let me know
> if you need all dependency events leading up to this message from the
> validator]

Ah, yes, I had completely forgotten about that one, sorry. This is
another special case I am afraid. Note how there are two runlists in an
ntfs_inode.
We have:

ntfs_inode->runlist - This is the runlist for the inode data (for files
this is the actual file data, for directories the directory B-tree, for
index inodes the view index B-tree, and for attribute inodes the
attribute data). This is what we have been dealing with before and what
you have already solved, by the sounds of things.

ntfs_inode->attr_list_rl - This is the runlist for the attribute list
attribute of a base inode. It does not exist for index inodes or
attribute inodes; it only exists for regular files and directories (and,
once we implement them, for symlinks, sockets, fifos and device special
files, too). Further, it only exists for inodes which have extent
inodes, and those are definitely in the minority, but that is irrelevant
as we have to support it none the less.

The attribute list attribute is a linear list, or perhaps better
described as an array, of variable size records. Each record describes
an attribute belonging to the inode, which part of the attribute this
attribute extent covers, and where (i.e. in which mft record) this
attribute extent resides. As such, the attribute list attribute is only
ever used when the mft record is mapped and an attribute is being looked
up, and also when a new attribute is created or an attribute is deleted
or truncated, in which case the attribute list must be updated. In all
those cases the mft record is mapped.

Thus, yes, in this case the locking is inverted, but that is because
ntfs_inode->attr_list_rl->lock will always be taken with the mft record
lock held. I think the lock validator has the problem of not knowing
that there are two different types of runlist, which is why it
complains.

Another thing to note is that load_attribute_list() is only called at
iget() time, thus we have exclusive access to the inode, so it is a
special case in any case.
And at present the driver in the kernel never modifies the attribute list attribute (and hence never modifies its runlist either), so the only time attr_list_rl is used is at load_attribute_list() time.

Something I am noticing now: given that the attribute list runlist is only ever accessed under the mft record lock, it is in fact pointless to lock it at all, as the mft record lock is an exclusive lock. The reason I never noticed this before is that I started off with the mft record lock as a read-write lock, so multiple processes could be read-accessing the mft record, and I wanted to keep the attribute list locked as well. I have to look through the code to make sure, but I think we could simply not take the runlist lock when accessing attr_list_rl, and then the problem would go away. The only catch is that when we start modifying attr_list_rl we will use functions that are common to the two runlist code paths, and those may take the lock. Not a huge problem, as they can always use the _nolock versions.

So I suggest, if it is easy, teaching the validator not to complain about this, as that is the order the locking is meant to happen in, and it is not circular because two different types of runlist are being locked. That gives me the possibility of later turning the mft record lock back into a read-write lock to improve parallelism when reading data on ntfs... If that is hard, then you or I (or both) need to investigate whether the current driver can really simply not take the runlist lock when accessing attr_list_rl, i.e. we need to make sure the access always happens under the protection of the mft record lock, and then the problem will go away.
Best regards, Anton > ======================================================= > [ INFO: possible circular locking dependency detected ] > ------------------------------------------------------- > ls/2581 is trying to acquire lock: > (&rl->lock){----}, at: [<c01c1f5b>] load_attribute_list+0xfb/0x3c0 > > but task is already holding lock: > (&ni->mrec_lock){--..}, at: [<c01d50c5>] map_mft_record_type+0x55/0x2d0 > > which lock already depends on the new lock, > which could lead to circular locking dependencies. > > the existing dependency chain (in reverse order) is: > > -> #1 (&ni->mrec_lock){--..}: > [<c01394df>] lock_acquire+0x6f/0x90 > [<c0346183>] mutex_lock_nested+0x73/0x2a0 > [<c01d5e43>] map_mft_record+0x53/0x2c0 > [<c01c54f8>] ntfs_map_runlist_nolock+0x3d8/0x530 > [<c01c5bc1>] ntfs_map_runlist+0x41/0x70 > [<c01c1929>] ntfs_readpage+0x8c9/0x9b0 > [<c0142ffc>] read_cache_page+0xac/0x150 > [<c01e212d>] ntfs_statfs+0x41d/0x660 > [<c0163254>] vfs_statfs+0x54/0x70 > [<c0163288>] vfs_statfs64+0x18/0x30 > [<c0163384>] sys_statfs64+0x64/0xa0 > [<c0347dcd>] sysenter_past_esp+0x56/0x8d > > -> #0 (&rl->lock){----}: > [<c01394df>] lock_acquire+0x6f/0x90 > [<c0134c8a>] down_read_nested+0x2a/0x40 > [<c01c1f5b>] load_attribute_list+0xfb/0x3c0 > [<c01d323e>] ntfs_read_locked_inode+0xcee/0x15d0 > [<c01d4735>] ntfs_iget+0x55/0x80 > [<c01db3da>] ntfs_lookup+0x14a/0x740 > [<c01736b6>] do_lookup+0x126/0x150 > [<c0173ef3>] __link_path_walk+0x813/0xe50 > [<c017457c>] link_path_walk+0x4c/0xf0 > [<c0174a2d>] do_path_lookup+0xad/0x260 > [<c0175228>] __user_walk_fd+0x38/0x60 > [<c016e3be>] vfs_lstat_fd+0x1e/0x50 > [<c016e401>] vfs_lstat+0x11/0x20 > [<c016ec04>] sys_lstat64+0x14/0x30 > [<c0347dcd>] sysenter_past_esp+0x56/0x8d > > other info that might help us debug this: > > 2 locks held by ls/2581: > #0: (&inode->i_mutex){--..}, at: [<c0346108>] mutex_lock+0x8/0x10 > #1: (&ni->mrec_lock){--..}, at: [<c01d50c5>] map_mft_record_type+0x55/0x2d0 > > stack backtrace: > [<c0104bf2>] 
show_trace+0x12/0x20 > [<c0104c19>] dump_stack+0x19/0x20 > [<c0136ef1>] print_circular_bug_tail+0x61/0x70 > [<c01389ff>] __lock_acquire+0x74f/0xde0 > [<c01394df>] lock_acquire+0x6f/0x90 > [<c0134c8a>] down_read_nested+0x2a/0x40 > [<c01c1f5b>] load_attribute_list+0xfb/0x3c0 > [<c01d323e>] ntfs_read_locked_inode+0xcee/0x15d0 > [<c01d4735>] ntfs_iget+0x55/0x80 > [<c01db3da>] ntfs_lookup+0x14a/0x740 > [<c01736b6>] do_lookup+0x126/0x150 > [<c0173ef3>] __link_path_walk+0x813/0xe50 > [<c017457c>] link_path_walk+0x4c/0xf0 > [<c0174a2d>] do_path_lookup+0xad/0x260 > [<c0175228>] __user_walk_fd+0x38/0x60 > [<c016e3be>] vfs_lstat_fd+0x1e/0x50 > [<c016e401>] vfs_lstat+0x11/0x20 > [<c016ec04>] sys_lstat64+0x14/0x30 > [<c0347dcd>] sysenter_past_esp+0x56/0x8d Best regards, Anton -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer, http://www.linux-ntfs.org/ ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected! 2006-06-10 8:48 ` Anton Altaparmakov @ 2006-06-11 5:31 ` Ingo Molnar 2006-06-11 7:10 ` Anton Altaparmakov 0 siblings, 1 reply; 21+ messages in thread From: Ingo Molnar @ 2006-06-11 5:31 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

* Anton Altaparmakov <aia21@cam.ac.uk> wrote:

> I think the lock validator has the problem of not knowing that there
> are two different types of runlist which is why it complains about it.

ah, ok! What happened is that the rwlock_init() 'lock type keys' (inlined via ntfs_init_runlist()) for the two runlists were 'merged':

	ntfs_init_runlist(&ni->runlist);
	ntfs_init_runlist(&ni->attr_list_rl);

i have annotated things by initializing the two locks separately (via a simple oneliner change), and this has solved the problem. The two types are now properly 'split', and the validator tracks them separately and understands their separate roles. So there's no need to touch attribute runlist locking in the NTFS code.

Some background: the validator uses lock initialization as a hint about which locks share the same 'type' (or locking domain). Currently this is done via:

	#define init_rwsem(sem)					\
	do {							\
		static struct lockdep_type_key __key;	\
							\
		__init_rwsem((sem), #sem, &__key);	\
	} while (0)

But since ntfs_init_runlist() got inlined within the same function, the __key there got shared. A better method might be to use the return address in __init_rwsem() [and i used that in earlier versions of the validator] - but even then there's no guarantee that this code will always be inlined.

In any case, this is known to be a heuristic: it is totally valid to initialize locks in an arbitrary manner, and the validator only tries to guess right in 99.9% of the cases. In cases where the validator incorrectly merged (or split) lock types [such as in this case], the problem can be found easily - and the annotation is easy as well.
The good news is that after this fix things went pretty well for readonly stuff and i got no new complaints from the validator. Phew! :-) It does not fully cover read-write mode yet. When extending an existing file the validator did not understand the following locking construct: ======================================================= [ INFO: possible circular locking dependency detected ] ------------------------------------------------------- cat/2802 is trying to acquire lock: (&vol->lcnbmp_lock){----}, at: [<c01e80dd>] ntfs_cluster_alloc+0x10d/0x23a0 but task is already holding lock: (&ni->mrec_lock){--..}, at: [<c01d5e53>] map_mft_record+0x53/0x2c0 which lock already depends on the new lock, which could lead to circular dependencies. the existing dependency chain (in reverse order) is: -> #2 (&ni->mrec_lock){--..}: [<c01394df>] lock_acquire+0x6f/0x90 [<c0346193>] mutex_lock_nested+0x73/0x2a0 [<c01d5e53>] map_mft_record+0x53/0x2c0 [<c01c54f8>] ntfs_map_runlist_nolock+0x3d8/0x530 [<c01c5bc1>] ntfs_map_runlist+0x41/0x70 [<c01c1929>] ntfs_readpage+0x8c9/0x9b0 [<c0142ffc>] read_cache_page+0xac/0x150 [<c01e213d>] ntfs_statfs+0x41d/0x660 [<c0163254>] vfs_statfs+0x54/0x70 [<c0163288>] vfs_statfs64+0x18/0x30 [<c0163384>] sys_statfs64+0x64/0xa0 [<c0347ddd>] sysenter_past_esp+0x56/0x8d -> #1 (&rl->lock){----}: [<c01394df>] lock_acquire+0x6f/0x90 [<c0134c8a>] down_read_nested+0x2a/0x40 [<c01c18a4>] ntfs_readpage+0x844/0x9b0 [<c0142ffc>] read_cache_page+0xac/0x150 [<c01e213d>] ntfs_statfs+0x41d/0x660 [<c0163254>] vfs_statfs+0x54/0x70 [<c0163288>] vfs_statfs64+0x18/0x30 [<c0163384>] sys_statfs64+0x64/0xa0 [<c0347ddd>] sysenter_past_esp+0x56/0x8d -> #0 (&vol->lcnbmp_lock){----}: [<c01394df>] lock_acquire+0x6f/0x90 [<c0134ccc>] down_write+0x2c/0x50 [<c01e80dd>] ntfs_cluster_alloc+0x10d/0x23a0 [<c01c427d>] ntfs_attr_extend_allocation+0x5fd/0x14a0 [<c01caa38>] ntfs_file_buffered_write+0x188/0x3880 [<c01ce2a8>] ntfs_file_aio_write_nolock+0x178/0x210 [<c01ce3f1>] 
ntfs_file_writev+0xb1/0x150 [<c01ce4af>] ntfs_file_write+0x1f/0x30 [<c0164f09>] vfs_write+0x99/0x160 [<c016589d>] sys_write+0x3d/0x70 [<c0347ddd>] sysenter_past_esp+0x56/0x8d other info that might help us debug this: 3 locks held by cat/2802: #0: (&inode->i_mutex){--..}, at: [<c0346118>] mutex_lock+0x8/0x10 #1: (&rl->lock){----}, at: [<c01c3dbe>] ntfs_attr_extend_allocation+0x13e/0x14a0 #2: (&ni->mrec_lock){--..}, at: [<c01d5e53>] map_mft_record+0x53/0x2c0 stack backtrace: [<c0104bf2>] show_trace+0x12/0x20 [<c0104c19>] dump_stack+0x19/0x20 [<c0136ef1>] print_circular_bug_tail+0x61/0x70 [<c01389ff>] __lock_acquire+0x74f/0xde0 [<c01394df>] lock_acquire+0x6f/0x90 [<c0134ccc>] down_write+0x2c/0x50 [<c01e80dd>] ntfs_cluster_alloc+0x10d/0x23a0 [<c01c427d>] ntfs_attr_extend_allocation+0x5fd/0x14a0 [<c01caa38>] ntfs_file_buffered_write+0x188/0x3880 [<c01ce2a8>] ntfs_file_aio_write_nolock+0x178/0x210 [<c01ce3f1>] ntfs_file_writev+0xb1/0x150 [<c01ce4af>] ntfs_file_write+0x1f/0x30 [<c0164f09>] vfs_write+0x99/0x160 [<c016589d>] sys_write+0x3d/0x70 [<c0347ddd>] sysenter_past_esp+0x56/0x8d this seems to be a pretty complex 3-way dependency related to &vol->lcnbmp_lock and &ni->mrec_lock. Should i send a full dependency events trace perhaps? Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected! 2006-06-11 5:31 ` Ingo Molnar @ 2006-06-11 7:10 ` Anton Altaparmakov 2006-06-12 8:31 ` Ingo Molnar 0 siblings, 1 reply; 21+ messages in thread From: Anton Altaparmakov @ 2006-06-11 7:10 UTC (permalink / raw) To: Ingo Molnar; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven On Sun, 11 Jun 2006, Ingo Molnar wrote: > * Anton Altaparmakov <aia21@cam.ac.uk> wrote: > > > I think the lock validator has the problem of not knowing that there > > are two different types of runlist which is why it complains about it. > > ah, ok! What happened is that the rwlock_init() 'lock type keys' > (inlined via ntfs_init_runlist()) for the two runlists were 'merged': > > ntfs_init_runlist(&ni->runlist); > ntfs_init_runlist(&ni->attr_list_rl); > > i have annotated things by initializing the two locks separately (via a > simple oneliner change), and this has solved the problem. > > The two types are now properly 'split', and the validator tracks them > separately and understands their separate roles. So there's no need to > touch attribute runlist locking in the NTFS code. Great! > Some background: the validator uses lock initialization as a hint about Thanks for explaining. > The good news is that after this fix things went pretty well for > readonly stuff and i got no new complaints from the validator. Phew! :-) Great! > It does not fully cover read-write mode yet. 
When extending an existing > file the validator did not understand the following locking construct: > > ======================================================= > [ INFO: possible circular locking dependency detected ] > ------------------------------------------------------- > cat/2802 is trying to acquire lock: > (&vol->lcnbmp_lock){----}, at: [<c01e80dd>] ntfs_cluster_alloc+0x10d/0x23a0 > > but task is already holding lock: > (&ni->mrec_lock){--..}, at: [<c01d5e53>] map_mft_record+0x53/0x2c0 > > which lock already depends on the new lock, > which could lead to circular dependencies. > > the existing dependency chain (in reverse order) is: > > -> #2 (&ni->mrec_lock){--..}: > [<c01394df>] lock_acquire+0x6f/0x90 > [<c0346193>] mutex_lock_nested+0x73/0x2a0 > [<c01d5e53>] map_mft_record+0x53/0x2c0 > [<c01c54f8>] ntfs_map_runlist_nolock+0x3d8/0x530 > [<c01c5bc1>] ntfs_map_runlist+0x41/0x70 > [<c01c1929>] ntfs_readpage+0x8c9/0x9b0 > [<c0142ffc>] read_cache_page+0xac/0x150 > [<c01e213d>] ntfs_statfs+0x41d/0x660 > [<c0163254>] vfs_statfs+0x54/0x70 > [<c0163288>] vfs_statfs64+0x18/0x30 > [<c0163384>] sys_statfs64+0x64/0xa0 > [<c0347ddd>] sysenter_past_esp+0x56/0x8d > > -> #1 (&rl->lock){----}: > [<c01394df>] lock_acquire+0x6f/0x90 > [<c0134c8a>] down_read_nested+0x2a/0x40 > [<c01c18a4>] ntfs_readpage+0x844/0x9b0 > [<c0142ffc>] read_cache_page+0xac/0x150 > [<c01e213d>] ntfs_statfs+0x41d/0x660 > [<c0163254>] vfs_statfs+0x54/0x70 > [<c0163288>] vfs_statfs64+0x18/0x30 > [<c0163384>] sys_statfs64+0x64/0xa0 > [<c0347ddd>] sysenter_past_esp+0x56/0x8d > > -> #0 (&vol->lcnbmp_lock){----}: > [<c01394df>] lock_acquire+0x6f/0x90 > [<c0134ccc>] down_write+0x2c/0x50 > [<c01e80dd>] ntfs_cluster_alloc+0x10d/0x23a0 > [<c01c427d>] ntfs_attr_extend_allocation+0x5fd/0x14a0 > [<c01caa38>] ntfs_file_buffered_write+0x188/0x3880 > [<c01ce2a8>] ntfs_file_aio_write_nolock+0x178/0x210 > [<c01ce3f1>] ntfs_file_writev+0xb1/0x150 > [<c01ce4af>] ntfs_file_write+0x1f/0x30 > [<c0164f09>] 
vfs_write+0x99/0x160 > [<c016589d>] sys_write+0x3d/0x70 > [<c0347ddd>] sysenter_past_esp+0x56/0x8d > > other info that might help us debug this: > > 3 locks held by cat/2802: > #0: (&inode->i_mutex){--..}, at: [<c0346118>] mutex_lock+0x8/0x10 > #1: (&rl->lock){----}, at: [<c01c3dbe>] ntfs_attr_extend_allocation+0x13e/0x14a0 > #2: (&ni->mrec_lock){--..}, at: [<c01d5e53>] map_mft_record+0x53/0x2c0 > > stack backtrace: > [<c0104bf2>] show_trace+0x12/0x20 > [<c0104c19>] dump_stack+0x19/0x20 > [<c0136ef1>] print_circular_bug_tail+0x61/0x70 > [<c01389ff>] __lock_acquire+0x74f/0xde0 > [<c01394df>] lock_acquire+0x6f/0x90 > [<c0134ccc>] down_write+0x2c/0x50 > [<c01e80dd>] ntfs_cluster_alloc+0x10d/0x23a0 > [<c01c427d>] ntfs_attr_extend_allocation+0x5fd/0x14a0 > [<c01caa38>] ntfs_file_buffered_write+0x188/0x3880 > [<c01ce2a8>] ntfs_file_aio_write_nolock+0x178/0x210 > [<c01ce3f1>] ntfs_file_writev+0xb1/0x150 > [<c01ce4af>] ntfs_file_write+0x1f/0x30 > [<c0164f09>] vfs_write+0x99/0x160 > [<c016589d>] sys_write+0x3d/0x70 > [<c0347ddd>] sysenter_past_esp+0x56/0x8d > > this seems to be a pretty complex 3-way dependency related to > &vol->lcnbmp_lock and &ni->mrec_lock. Should i send a full dependency > events trace perhaps? First an explanation of two relevant locks that are both going to upset the validator. I half expected this to happen given what has happened so far. The two locks are lcnbmp_lock and mftbmp_lock (both are r/w semaphores). Both those locks lock the two big bitmaps on an NTFS volume: lcnbmp_lock locks accesses to the cluster bitmap (i.e. the physical blocks bitmap where each 0 bit means an unallocated block and a 1 bit means an allocated block). mftbmp_lock locks accesses to the mft bitmap (i.e. the inodes bitmap where each 0 bit means an unused inode and a 1 bit means the inode is in use). The locks are taken when a cluster is being allocated / an mft record is being allocated. 
The reason the validator will likely get very confused is that these locks introduce reverse semantics depending on what you are locking. When extending a file the locking order is:

1. Lock the runlist (the data one) of the file to be extended for writing (as it will be modified by the file extension).
2. Lock lcnbmp_lock for writing (as the bitmap will be modified by the allocated clusters).
3. Lock the runlist of the $Bitmap system file (the contents of which contain the bitmap) for reading, so the on-disk physical location can be determined. If this fails, re-lock the runlist of $Bitmap for writing, lock the mft record of the $Bitmap system file, load in the runlist of the bitmap, then unlock the mft record and unlock the runlist of $Bitmap.
4. Load the $Bitmap data and allocate clusters in the bitmap.
5. Unlock lcnbmp_lock.
6. Update the runlist of the file being extended with the newly allocated clusters.
7. Lock the mft record of the file being extended and update the mft record with the new allocations.
8. Unlock the mft record and the runlist of the file that was extended.

So yes, lock dependencies are inverted and the validator complains. But they are inverted in a special way that is safe. Is the above description sufficient for you to fix it?

The mftbmp_lock works similarly, but instead of allocating clusters we are allocating an mft record; the locking sequence is the same, just substitute s/lcnbmp_lock/mftbmp_lock/ and s/cluster allocation/mft record allocation/. Extending the mft in turn can require the mft itself to be extended, which will take the lcnbmp_lock; thus mftbmp_lock nests outside the lcnbmp_lock, and that should never be violated, so the validator should at least be happy with that bit. (-;

Sorry ntfs is that locking evil!
Best regards, Anton -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer, http://www.linux-ntfs.org/ ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected! 2006-06-11 7:10 ` Anton Altaparmakov @ 2006-06-12 8:31 ` Ingo Molnar 2006-06-12 8:47 ` Anton Altaparmakov 0 siblings, 1 reply; 21+ messages in thread From: Ingo Molnar @ 2006-06-12 8:31 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven * Anton Altaparmakov <aia21@cam.ac.uk> wrote: > First an explanation of two relevant locks that are both going to > upset the validator. I half expected this to happen given what has > happened so far. The two locks are lcnbmp_lock and mftbmp_lock (both > are r/w semaphores). thanks! > Is the above description sufficient for you to fix it? yeah. I have split off vol->lcnbmp_ino's locking rules (for mrec_lock and runlist.lock) from normal inode locking rules, and this fixed the file-writing dependency. but i can still trigger a warning, and i think this time it's a real bug: if i mount NTFS with -o show_system_files, and i append data to the $Bitmap, then i get the dependency conflict attached further below. While extending the $Bitmap manually is extremely evil, the filesystem should nevertheless not break - for example a script could do it by accident. I believe NTFS should either disallow writing to the $Bitmap (by forcing it to be readonly under all circumstances), or the writing should be made safe - right now if that happens in parallel to some other process extending an NTFS file then i think we could deadlock, right? Ingo ======================================================= [ INFO: possible circular locking dependency detected ] ------------------------------------------------------- cat/2532 is trying to acquire lock: (&vol->lcnbmp_lock){----}, at: [<c01e809d>] ntfs_cluster_alloc+0x10d/0x23a0 but task is already holding lock: (lcnbmp_mrec_lock){--..}, at: [<c01d5dc3>] map_mft_record+0x53/0x2c0 which lock already depends on the new lock. 
the existing dependency chain (in reverse order) is: -> #2 (lcnbmp_mrec_lock){--..}: [<c013948f>] lock_acquire+0x6f/0x90 [<c0346163>] mutex_lock_nested+0x73/0x2a0 [<c01d5dc3>] map_mft_record+0x53/0x2c0 [<c01c5498>] ntfs_map_runlist_nolock+0x3d8/0x530 [<c01c5b61>] ntfs_map_runlist+0x41/0x70 [<c01c18d1>] ntfs_readpage+0x8c1/0x9a0 [<c0142fac>] read_cache_page+0xac/0x150 [<c01e20ad>] ntfs_statfs+0x41d/0x660 [<c0163204>] vfs_statfs+0x54/0x70 [<c0163238>] vfs_statfs64+0x18/0x30 [<c0163334>] sys_statfs64+0x64/0xa0 [<c0347dad>] sysenter_past_esp+0x56/0x8d -> #1 (lcnbmp_runlist_lock){----}: [<c013948f>] lock_acquire+0x6f/0x90 [<c0134c4c>] down_read+0x2c/0x40 [<c01c184c>] ntfs_readpage+0x83c/0x9a0 [<c0142fac>] read_cache_page+0xac/0x150 [<c01e20ad>] ntfs_statfs+0x41d/0x660 [<c0163204>] vfs_statfs+0x54/0x70 [<c0163238>] vfs_statfs64+0x18/0x30 [<c0163334>] sys_statfs64+0x64/0xa0 [<c0347dad>] sysenter_past_esp+0x56/0x8d -> #0 (&vol->lcnbmp_lock){----}: [<c013948f>] lock_acquire+0x6f/0x90 [<c0134ccc>] down_write+0x2c/0x50 [<c01e809d>] ntfs_cluster_alloc+0x10d/0x23a0 [<c01c421d>] ntfs_attr_extend_allocation+0x5fd/0x14a0 [<c01ca9d8>] ntfs_file_buffered_write+0x188/0x3880 [<c01ce248>] ntfs_file_aio_write_nolock+0x178/0x210 [<c01ce391>] ntfs_file_writev+0xb1/0x150 [<c01ce44f>] ntfs_file_write+0x1f/0x30 [<c0164eb9>] vfs_write+0x99/0x160 [<c016584d>] sys_write+0x3d/0x70 [<c0347dad>] sysenter_past_esp+0x56/0x8d other info that might help us debug this: 3 locks held by cat/2532: #0: (&inode->i_mutex){--..}, at: [<c03460e8>] mutex_lock+0x8/0x10 #1: (lcnbmp_runlist_lock){----}, at: [<c01c3d5e>] ntfs_attr_extend_allocation+0x13e/0x14a0 #2: (lcnbmp_mrec_lock){--..}, at: [<c01d5dc3>] map_mft_record+0x53/0x2c0 stack backtrace: [<c0104bf2>] show_trace+0x12/0x20 [<c0104c19>] dump_stack+0x19/0x20 [<c0136f11>] print_circular_bug_tail+0x61/0x70 [<c01389af>] __lock_acquire+0x6df/0xd70 [<c013948f>] lock_acquire+0x6f/0x90 [<c0134ccc>] down_write+0x2c/0x50 [<c01e809d>] ntfs_cluster_alloc+0x10d/0x23a0 
[<c01c421d>] ntfs_attr_extend_allocation+0x5fd/0x14a0 [<c01ca9d8>] ntfs_file_buffered_write+0x188/0x3880 [<c01ce248>] ntfs_file_aio_write_nolock+0x178/0x210 [<c01ce391>] ntfs_file_writev+0xb1/0x150 [<c01ce44f>] ntfs_file_write+0x1f/0x30 [<c0164eb9>] vfs_write+0x99/0x160 [<c016584d>] sys_write+0x3d/0x70 [<c0347dad>] sysenter_past_esp+0x56/0x8d ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected! 2006-06-12 8:31 ` Ingo Molnar @ 2006-06-12 8:47 ` Anton Altaparmakov 2006-06-12 9:40 ` Ingo Molnar 0 siblings, 1 reply; 21+ messages in thread From: Anton Altaparmakov @ 2006-06-12 8:47 UTC (permalink / raw) To: Ingo Molnar; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven On Mon, 2006-06-12 at 10:31 +0200, Ingo Molnar wrote: > * Anton Altaparmakov <aia21@cam.ac.uk> wrote: > > > First an explanation of two relevant locks that are both going to > > upset the validator. I half expected this to happen given what has > > happened so far. The two locks are lcnbmp_lock and mftbmp_lock (both > > are r/w semaphores). > > thanks! > > > Is the above description sufficient for you to fix it? > > yeah. I have split off vol->lcnbmp_ino's locking rules (for mrec_lock > and runlist.lock) from normal inode locking rules, and this fixed the > file-writing dependency. > > but i can still trigger a warning, and i think this time it's a real > bug: if i mount NTFS with -o show_system_files, and i append data to the > $Bitmap, then i get the dependency conflict attached further below. Not surprising. (-: Writing to metadata from user space is definitely not allowed... > While extending the $Bitmap manually is extremely evil, the filesystem > should nevertheless not break - for example a script could do it by > accident. I believe NTFS should either disallow writing to the $Bitmap > (by forcing it to be readonly under all circumstances), or the writing > should be made safe - right now if that happens in parallel to some > other process extending an NTFS file then i think we could deadlock, > right? Absolutely correct. Writing to metadata files from userspace is guaranteed to break things. I don't think a script could do anything bad because it would never see that a file called $Bitmap exists (-o show_sys_files is only for debugging/recovery purposes not for general use). 
I never cared about fixing this case because writing to any of those files is almost guaranteed to corrupt the file system to the extent of Windows dying with a blue screen of death on boot, and the only option at that point will be to reformat and reinstall (at least until we write an ntfsck utility)...

But yes, we could just make all system files read-only. That would be a simple enough fix, especially for the ones we keep open all the time during a mount, as they do not even have to affect the iget() code path. For other system inodes we would have to stick the read-only force in iget() (or open them at mount time and keep them open).

I will do that when I have some more spare time... I don't think it is particularly urgent given it has always been like this, and if anyone is trying to kill their file system, even fixing this will not stop them... They would just need to run the ntfsprogs-provided utilities to achieve anything they like... I have always followed the principle of giving people enough rope to hang themselves if they really want to... (-; But I agree we should probably make the system files read-only.

Best regards, Anton -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected! 2006-06-12 8:47 ` Anton Altaparmakov @ 2006-06-12 9:40 ` Ingo Molnar 2006-06-12 10:24 ` Anton Altaparmakov 0 siblings, 1 reply; 21+ messages in thread From: Ingo Molnar @ 2006-06-12 9:40 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven * Anton Altaparmakov <aia21@cam.ac.uk> wrote: > I will do that when I have some more spare time... I don't think it > is particularly urgent given it has always been like this and if > anyone is trying to kill their file system even fixing this will not > stop them... They would just need to run the ntfsprogs provided > utilities to achieve anything they like... [...] i agree that it's not urgent, but i think we have a small disagreement wrt. the philosophy behind that conclusion :-) manipulating a device file (i.e. running ntfsprogs) is a known "hang yourself" thing that people are and must be aware of - but messing around within the VFS namespace of a filesystem itself ought to be safe, even if the file can only be accessed via a special mount option and if it's called $Bitmap. The mount option doesnt really say that it's unsafe: show_sys_files If show_sys_files is specified, show the system files in direc- tory listings. Otherwise the default behaviour is to hide the system files. Note that even when show_sys_files is specified, "$MFT" may will not be visible due to bugs/mis-features in glibc. Further, note that irrespective of show_sys_files, all files are accessible by name, i.e. you can always do "ls -l '$UpCase'" for example to specifically show the system file con- taining the Unicode upcase table. 
and the dangers of direct namespace access are apparently anticipated in the permissions of the $Mft itself: [root@neptune mnt2]# ls -l \$Mft ---------- 1 root root 69632 Jan 1 1970 $Mft and finally, lets call this a real bug in NTFS: we both worked hard to cover it via the lock validator =B-) in any case, the message will only trigger if someone abuses one of the system files, so there is no urgency, and that will also serve as a reminder. below i've attached a cleaned up version of my NTFS annotations so far. I think they are pretty straightforward and nonintrusive - for !CONFIG_LOCKDEP they add zero changes to ntfs.ko. Most of the linecount of the patch comes from comments - so i think we might even consider this a "document locking rules" patch :-) Ingo --------------------------- Subject: lock validator: annotate NTFS locking rules From: Ingo Molnar <mingo@elte.hu> NTFS uses lots of type-opaque objects which acquire their true identity runtime - so the lock validator needs to be helped in a couple of places to figure out object types. Many thanks to Anton Altaparmakov for giving lots of explanations about NTFS locking rules. 
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 fs/ntfs/inode.c |   33 +++++++++++++++++++++++++++++++++
 fs/ntfs/mft.c   |    2 +-
 fs/ntfs/super.c |   31 +++++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+), 1 deletion(-)

Index: linux/fs/ntfs/inode.c
===================================================================
--- linux.orig/fs/ntfs/inode.c
+++ linux/fs/ntfs/inode.c
@@ -367,6 +367,12 @@ static void ntfs_destroy_extent_inode(nt
 	kmem_cache_free(ntfs_inode_cache, ni);
 }
 
+/*
+ * The attribute runlist lock has separate locking rules from the
+ * normal runlist lock, so split the two lock-types:
+ */
+static struct lockdep_type_key attr_list_rl_lock_type;
+
 /**
  * __ntfs_init_inode - initialize ntfs specific part of an inode
  * @sb:	super block of mounted volume
@@ -394,6 +400,8 @@ void __ntfs_init_inode(struct super_bloc
 	ni->attr_list_size = 0;
 	ni->attr_list = NULL;
 	ntfs_init_runlist(&ni->attr_list_rl);
+	lockdep_reinit_key(&ni->attr_list_rl.lock,
+			   &attr_list_rl_lock_type);
 	ni->itype.index.bmp_ino = NULL;
 	ni->itype.index.block_size = 0;
 	ni->itype.index.vcn_size = 0;
@@ -405,6 +413,13 @@ void __ntfs_init_inode(struct super_bloc
 	ni->ext.base_ntfs_ino = NULL;
 }
 
+/*
+ * Extent inodes get MFT-mapped in a nested way, while the base inode
+ * is still mapped. Teach this nesting to the lock validator by creating
+ * a separate type for nested inode's mrec_lock's:
+ */
+static struct lockdep_type_key extent_inode_mrec_lock_key;
+
 inline ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
 		unsigned long mft_no)
 {
@@ -413,6 +428,7 @@ inline ntfs_inode *ntfs_new_extent_inode
 	ntfs_debug("Entering.");
 	if (likely(ni != NULL)) {
 		__ntfs_init_inode(sb, ni);
+		lockdep_reinit_key(&ni->mrec_lock, &extent_inode_mrec_lock_key);
 		ni->mft_no = mft_no;
 		ni->type = AT_UNUSED;
 		ni->name = NULL;
@@ -1722,6 +1738,15 @@ err_out:
 	return err;
 }
 
+/*
+ * The MFT inode has special locking, so teach the lock validator
+ * about this by splitting off the locking rules of the MFT from
+ * the locking rules of other inodes. The MFT inode can never be
+ * accessed from the VFS side (or even internally), only by the
+ * map_mft functions.
+ */
+static struct lockdep_type_key mft_ni_runlist_lock_key, mft_ni_mrec_lock_key;
+
 /**
  * ntfs_read_inode_mount - special read_inode for mount time use only
  * @vi:	inode to read
@@ -2148,6 +2173,14 @@ int ntfs_read_inode_mount(struct inode *
 	ntfs_attr_put_search_ctx(ctx);
 	ntfs_debug("Done.");
 	ntfs_free(m);
+
+	/*
+	 * Split the locking rules of the MFT inode from the
+	 * locking rules of other inodes:
+	 */
+	lockdep_reinit_key(&ni->runlist.lock, &mft_ni_runlist_lock_key);
+	lockdep_reinit_key(&ni->mrec_lock, &mft_ni_mrec_lock_key);
+
 	return 0;
 em_put_err_out:
Index: linux/fs/ntfs/mft.c
===================================================================
--- linux.orig/fs/ntfs/mft.c
+++ linux/fs/ntfs/mft.c
@@ -348,7 +348,7 @@ map_err_out:
 		base_ni->ext.extent_ntfs_inos = tmp;
 	}
 	base_ni->ext.extent_ntfs_inos[base_ni->nr_extents++] = ni;
-	mutex_unlock(&base_ni->extent_lock);
+	mutex_unlock_non_nested(&base_ni->extent_lock);
 	atomic_dec(&base_ni->count);
 	ntfs_debug("Done 2.");
 	*ntfs_ino = ni;
Index: linux/fs/ntfs/super.c
===================================================================
--- linux.orig/fs/ntfs/super.c
+++ linux/fs/ntfs/super.c
@@ -1724,6 +1724,14 @@ upcase_failed:
 	return FALSE;
 }
 
+/*
+ * The lcn and mft bitmap inodes are NTFS-internal inodes with
+ * their own special locking rules:
+ */
+static struct lockdep_type_key
+	lcnbmp_runlist_lock_key, lcnbmp_mrec_lock_key,
+	mftbmp_runlist_lock_key, mftbmp_mrec_lock_key;
+
 /**
  * load_system_files - open the system files using normal functions
  * @vol:	ntfs super block describing device whose system files to load
@@ -1780,6 +1788,10 @@ static BOOL load_system_files(ntfs_volum
 		ntfs_error(sb, "Failed to load $MFT/$BITMAP attribute.");
 		goto iput_mirr_err_out;
 	}
+	lockdep_reinit_key(&NTFS_I(vol->mftbmp_ino)->runlist.lock,
+			   &mftbmp_runlist_lock_key);
+	lockdep_reinit_key(&NTFS_I(vol->mftbmp_ino)->mrec_lock,
+			   &mftbmp_mrec_lock_key);
 	/* Read upcase table and setup @vol->upcase and @vol->upcase_len. */
 	if (!load_and_init_upcase(vol))
 		goto iput_mftbmp_err_out;
@@ -1802,6 +1814,11 @@ static BOOL load_system_files(ntfs_volum
 		iput(vol->lcnbmp_ino);
 		goto bitmap_failed;
 	}
+	lockdep_reinit_key(&NTFS_I(vol->lcnbmp_ino)->runlist.lock,
+			   &lcnbmp_runlist_lock_key);
+	lockdep_reinit_key(&NTFS_I(vol->lcnbmp_ino)->mrec_lock,
+			   &lcnbmp_mrec_lock_key);
+
 	NInoSetSparseDisabled(NTFS_I(vol->lcnbmp_ino));
 	if ((vol->nr_clusters + 7) >> 3 > i_size_read(vol->lcnbmp_ino)) {
 		iput(vol->lcnbmp_ino);
@@ -2742,6 +2759,17 @@ static int ntfs_fill_super(struct super_
 	struct inode *tmp_ino;
 	int blocksize, result;
 
+	/*
+	 * We do a pretty difficult piece of bootstrap by reading the
+	 * MFT (and other metadata) from disk into memory. We'll only
+	 * release this metadata during umount, so the locking patterns
+	 * observed during bootstrap do not count. So turn off the
+	 * observation of locking patterns (strictly for this context
+	 * only) while mounting NTFS. [The validator is still active
+	 * otherwise, even for this context: it will for example record
+	 * lock type registrations.]
+	 */
+	lockdep_off();
 	ntfs_debug("Entering.");
 #ifndef NTFS_RW
 	sb->s_flags |= MS_RDONLY;
@@ -2753,6 +2781,7 @@ static int ntfs_fill_super(struct super_
 		if (!silent)
 			ntfs_error(sb, "Allocation of NTFS volume structure "
 					"failed. Aborting mount...");
+		lockdep_on();
 		return -ENOMEM;
 	}
 	/* Initialize ntfs_volume structure. */
@@ -2939,6 +2968,7 @@ static int ntfs_fill_super(struct super_
 		mutex_unlock(&ntfs_lock);
 		sb->s_export_op = &ntfs_export_ops;
 		lock_kernel();
+		lockdep_on();
 		return 0;
 	}
 	ntfs_error(sb, "Failed to allocate root directory.");
@@ -3058,6 +3088,7 @@ err_out_now:
 	sb->s_fs_info = NULL;
 	kfree(vol);
 	ntfs_debug("Failed, returning -EINVAL.");
+	lockdep_on();
 	return -EINVAL;
 }

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-12  9:40 ` Ingo Molnar
@ 2006-06-12 10:24   ` Anton Altaparmakov
  2006-06-12 10:56     ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Anton Altaparmakov @ 2006-06-12 10:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

On Mon, 2006-06-12 at 11:40 +0200, Ingo Molnar wrote:
> * Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> 
> > I will do that when I have some more spare time...  I don't think it
> > is particularly urgent given it has always been like this and if
> > anyone is trying to kill their file system even fixing this will not
> > stop them...  They would just need to run the ntfsprogs provided
> > utilities to achieve anything they like... [...]
> 
> i agree that it's not urgent, but i think we have a small disagreement
> wrt. the philosophy behind that conclusion :-)
> 
> manipulating a device file (i.e. running ntfsprogs) is a known "hang
> yourself" thing that people are and must be aware of - but messing
> around within the VFS namespace of a filesystem itself ought to be safe,
> even if the file can only be accessed via a special mount option and if
> it's called $Bitmap.  The mount option doesn't really say that it's
> unsafe:
> 
>  show_sys_files
>         If show_sys_files is specified, show the system files in direc-
>         tory listings.  Otherwise the default behaviour is to hide the
>         system files.  Note that even when show_sys_files is specified,
>         "$MFT" may not be visible due to bugs/mis-features in
>         glibc.  Further, note that irrespective of show_sys_files, all
>         files are accessible by name, i.e. you can always do "ls -l
>         '$UpCase'" for example to specifically show the system file con-
>         taining the Unicode upcase table.
> 
> and the dangers of direct namespace access are apparently anticipated in
> the permissions of the $Mft itself:

That is even more special due to the mft record locks and the fact that
the mft records on disk are not the same as the ones in memory (due to
the multi sector transfer protection fixups), so access through a file
read or file write would almost certainly crash the ntfs driver in weird
and strange ways, hence why for mft and mftmirr not only do I force the
permissions to blank but also leave the file operations blank as well,
i.e. the only operations mft and mftmirr have are the address space
operations...

> [root@neptune mnt2]# ls -l \$Mft
> ----------  1 root root 69632 Jan  1  1970 $Mft
> 
> and finally, let's call this a real bug in NTFS: we both worked hard to
> cover it via the lock validator =B-)

Oh definitely!  It is nothing to do with the lock validator.  There is
definitely nothing I expect you to do about the lock validator or those
reports.  They only happen if the user does a stupid thing.  Serves them
right.  And if it happens by accident it is a good warning that this is
wrong/bad...

> in any case, the message will only trigger if someone abuses one of the
> system files, so there is no urgency, and that will also serve as a
> reminder.

Exactly.

> below i've attached a cleaned up version of my NTFS annotations so far.
> I think they are pretty straightforward and nonintrusive - for
> !CONFIG_LOCKDEP they add zero changes to ntfs.ko. Most of the linecount
> of the patch comes from comments - so i think we might even consider
> this a "document locking rules" patch :-)

Thanks, looks great!  Can I leave it to you to keep it as part of the
lock validator patches so it gets into -mm with them and later makes it
into the stock kernel with them?

If you want, feel free to add:

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

To the patch...

Thanks a lot for working all of this out!
Best regards, Anton > > Ingo > > --------------------------- > Subject: lock validator: annotate NTFS locking rules > From: Ingo Molnar <mingo@elte.hu> > > NTFS uses lots of type-opaque objects which acquire their true > identity runtime - so the lock validator needs to be helped in > a couple of places to figure out object types. > > Many thanks to Anton Altaparmakov for giving lots of explanations > about NTFS locking rules. > > Signed-off-by: Ingo Molnar <mingo@elte.hu> > --- > fs/ntfs/inode.c | 33 +++++++++++++++++++++++++++++++++ > fs/ntfs/mft.c | 2 +- > fs/ntfs/super.c | 31 +++++++++++++++++++++++++++++++ > 3 files changed, 65 insertions(+), 1 deletion(-) > > Index: linux/fs/ntfs/inode.c > =================================================================== > --- linux.orig/fs/ntfs/inode.c > +++ linux/fs/ntfs/inode.c > @@ -367,6 +367,12 @@ static void ntfs_destroy_extent_inode(nt > kmem_cache_free(ntfs_inode_cache, ni); > } > > +/* > + * The attribute runlist lock has separate locking rules from the > + * normal runlist lock, so split the two lock-types: > + */ > +static struct lockdep_type_key attr_list_rl_lock_type; > + > /** > * __ntfs_init_inode - initialize ntfs specific part of an inode > * @sb: super block of mounted volume > @@ -394,6 +400,8 @@ void __ntfs_init_inode(struct super_bloc > ni->attr_list_size = 0; > ni->attr_list = NULL; > ntfs_init_runlist(&ni->attr_list_rl); > + lockdep_reinit_key(&ni->attr_list_rl.lock, > + &attr_list_rl_lock_type); > ni->itype.index.bmp_ino = NULL; > ni->itype.index.block_size = 0; > ni->itype.index.vcn_size = 0; > @@ -405,6 +413,13 @@ void __ntfs_init_inode(struct super_bloc > ni->ext.base_ntfs_ino = NULL; > } > > +/* > + * Extent inodes get MFT-mapped in a nested way, while the base inode > + * is still mapped. 
Teach this nesting to the lock validator by creating > + * a separate type for nested inode's mrec_lock's: > + */ > +static struct lockdep_type_key extent_inode_mrec_lock_key; > + > inline ntfs_inode *ntfs_new_extent_inode(struct super_block *sb, > unsigned long mft_no) > { > @@ -413,6 +428,7 @@ inline ntfs_inode *ntfs_new_extent_inode > ntfs_debug("Entering."); > if (likely(ni != NULL)) { > __ntfs_init_inode(sb, ni); > + lockdep_reinit_key(&ni->mrec_lock, &extent_inode_mrec_lock_key); > ni->mft_no = mft_no; > ni->type = AT_UNUSED; > ni->name = NULL; > @@ -1722,6 +1738,15 @@ err_out: > return err; > } > > +/* > + * The MFT inode has special locking, so teach the lock validator > + * about this by splitting off the locking rules of the MFT from > + * the locking rules of other inodes. The MFT inode can never be > + * accessed from the VFS side (or even internally), only by the > + * map_mft functions. > + */ > +static struct lockdep_type_key mft_ni_runlist_lock_key, mft_ni_mrec_lock_key; > + > /** > * ntfs_read_inode_mount - special read_inode for mount time use only > * @vi: inode to read > @@ -2148,6 +2173,14 @@ int ntfs_read_inode_mount(struct inode * > ntfs_attr_put_search_ctx(ctx); > ntfs_debug("Done."); > ntfs_free(m); > + > + /* > + * Split the locking rules of the MFT inode from the > + * locking rules of other inodes: > + */ > + lockdep_reinit_key(&ni->runlist.lock, &mft_ni_runlist_lock_key); > + lockdep_reinit_key(&ni->mrec_lock, &mft_ni_mrec_lock_key); > + > return 0; > > em_put_err_out: > Index: linux/fs/ntfs/mft.c > =================================================================== > --- linux.orig/fs/ntfs/mft.c > +++ linux/fs/ntfs/mft.c > @@ -348,7 +348,7 @@ map_err_out: > base_ni->ext.extent_ntfs_inos = tmp; > } > base_ni->ext.extent_ntfs_inos[base_ni->nr_extents++] = ni; > - mutex_unlock(&base_ni->extent_lock); > + mutex_unlock_non_nested(&base_ni->extent_lock); > atomic_dec(&base_ni->count); > ntfs_debug("Done 2."); > *ntfs_ino = ni; > Index: 
linux/fs/ntfs/super.c > =================================================================== > --- linux.orig/fs/ntfs/super.c > +++ linux/fs/ntfs/super.c > @@ -1724,6 +1724,14 @@ upcase_failed: > return FALSE; > } > > +/* > + * The lcn and mft bitmap inodes are NTFS-internal inodes with > + * their own special locking rules: > + */ > +static struct lockdep_type_key > + lcnbmp_runlist_lock_key, lcnbmp_mrec_lock_key, > + mftbmp_runlist_lock_key, mftbmp_mrec_lock_key; > + > /** > * load_system_files - open the system files using normal functions > * @vol: ntfs super block describing device whose system files to load > @@ -1780,6 +1788,10 @@ static BOOL load_system_files(ntfs_volum > ntfs_error(sb, "Failed to load $MFT/$BITMAP attribute."); > goto iput_mirr_err_out; > } > + lockdep_reinit_key(&NTFS_I(vol->mftbmp_ino)->runlist.lock, > + &mftbmp_runlist_lock_key); > + lockdep_reinit_key(&NTFS_I(vol->mftbmp_ino)->mrec_lock, > + &mftbmp_mrec_lock_key); > /* Read upcase table and setup @vol->upcase and @vol->upcase_len. */ > if (!load_and_init_upcase(vol)) > goto iput_mftbmp_err_out; > @@ -1802,6 +1814,11 @@ static BOOL load_system_files(ntfs_volum > iput(vol->lcnbmp_ino); > goto bitmap_failed; > } > + lockdep_reinit_key(&NTFS_I(vol->lcnbmp_ino)->runlist.lock, > + &lcnbmp_runlist_lock_key); > + lockdep_reinit_key(&NTFS_I(vol->lcnbmp_ino)->mrec_lock, > + &lcnbmp_mrec_lock_key); > + > NInoSetSparseDisabled(NTFS_I(vol->lcnbmp_ino)); > if ((vol->nr_clusters + 7) >> 3 > i_size_read(vol->lcnbmp_ino)) { > iput(vol->lcnbmp_ino); > @@ -2742,6 +2759,17 @@ static int ntfs_fill_super(struct super_ > struct inode *tmp_ino; > int blocksize, result; > > + /* > + * We do a pretty difficult piece of bootstrap by reading the > + * MFT (and other metadata) from disk into memory. We'll only > + * release this metadata during umount, so the locking patterns > + * observed during bootstrap do not count. 
So turn off the > + * observation of locking patterns (strictly for this context > + * only) while mounting NTFS. [The validator is still active > + * otherwise, even for this context: it will for example record > + * lock type registrations.] > + */ > + lockdep_off(); > ntfs_debug("Entering."); > #ifndef NTFS_RW > sb->s_flags |= MS_RDONLY; > @@ -2753,6 +2781,7 @@ static int ntfs_fill_super(struct super_ > if (!silent) > ntfs_error(sb, "Allocation of NTFS volume structure " > "failed. Aborting mount..."); > + lockdep_on(); > return -ENOMEM; > } > /* Initialize ntfs_volume structure. */ > @@ -2939,6 +2968,7 @@ static int ntfs_fill_super(struct super_ > mutex_unlock(&ntfs_lock); > sb->s_export_op = &ntfs_export_ops; > lock_kernel(); > + lockdep_on(); > return 0; > } > ntfs_error(sb, "Failed to allocate root directory."); > @@ -3058,6 +3088,7 @@ err_out_now: > sb->s_fs_info = NULL; > kfree(vol); > ntfs_debug("Failed, returning -EINVAL."); > + lockdep_on(); > return -EINVAL; > } > -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-12 10:24   ` Anton Altaparmakov
@ 2006-06-12 10:56     ` Ingo Molnar
  2006-06-13  5:41       ` Miles Lane
  0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2006-06-12 10:56 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

* Anton Altaparmakov <aia21@cam.ac.uk> wrote:

> > below i've attached a cleaned up version of my NTFS annotations so far.
> > I think they are pretty straightforward and nonintrusive - for
> > !CONFIG_LOCKDEP they add zero changes to ntfs.ko. Most of the linecount
> > of the patch comes from comments - so i think we might even consider
> > this a "document locking rules" patch :-)
> 
> Thanks, looks great!  Can I leave it to you to keep it as part of the
> lock validator patches so it gets into -mm with them and later makes
> it into the stock kernel with them?

sure!

> If you want, feel free to add:
> 
> Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
> 
> To the patch...

thanks - added.  If anyone wants to try the NTFS related lock-validator
fixes, they are included in the combo patch:

  http://redhat.com/~mingo/lockdep-patches/lockdep-combo-2.6.17-rc6-mm2.patch

which goes on top of 2.6.17-rc6-mm2.

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-12 10:56     ` Ingo Molnar
@ 2006-06-13  5:41       ` Miles Lane
  2006-06-13 21:18         ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Miles Lane @ 2006-06-13 5:41 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Anton Altaparmakov, LKML, Andrew Morton, Arjan van de Ven

if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F
System.map 2.6.17-rc6-mm2-lockdep; fi
WARNING: /lib/modules/2.6.17-rc6-mm2-lockdep/kernel/fs/ntfs/ntfs.ko
needs unknown symbol lockdep_on
WARNING: /lib/modules/2.6.17-rc6-mm2-lockdep/kernel/fs/ntfs/ntfs.ko
needs unknown symbol lockdep_off

Am I doing something wrong?

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-13  5:41       ` Miles Lane
@ 2006-06-13 21:18         ` Ingo Molnar
  0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2006-06-13 21:18 UTC (permalink / raw)
  To: Miles Lane; +Cc: Anton Altaparmakov, LKML, Andrew Morton, Arjan van de Ven

* Miles Lane <miles.lane@gmail.com> wrote:

> if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F
> System.map 2.6.17-rc6-mm2-lockdep; fi
> WARNING: /lib/modules/2.6.17-rc6-mm2-lockdep/kernel/fs/ntfs/ntfs.ko
> needs unknown symbol lockdep_on
> WARNING: /lib/modules/2.6.17-rc6-mm2-lockdep/kernel/fs/ntfs/ntfs.ko
> needs unknown symbol lockdep_off

oops - please re-download the combo patch, i fixed this one.

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-08 10:53 ` Anton Altaparmakov
  2006-06-08 11:23   ` Ingo Molnar
@ 2006-06-08 16:11   ` Ingo Molnar
  2006-06-08 21:17     ` Anton Altaparmakov
  2006-06-09  7:19   ` Ingo Molnar
  2 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2006-06-08 16:11 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

* Anton Altaparmakov <aia21@cam.ac.uk> wrote:

> >   &ni->mrec_lock - a spinlock protecting a particular inode's MFT data.
> >                    (finegrained lock for the MFT record)  It is
> >                    typically taken by map_mft_record() and released by
> >                    unmap_mft_record().
> 
> Correct, except s/spinlock/semaphore/ (that really should become a
> mutex one day).

yeah - it's in fact a mutex already.

> No it is not as explained above.  Something has gotten confused
> somewhere because the order of events is the wrong way round...

did my second trace make more sense?  The dependency that the validator
recorded can pretty much be taken for granted - it only stores
dependencies that truly trigger at runtime.  What shouldn't be taken for
granted is my explanation of the events :-)

there is a wide array of methods and APIs available to express locking
semantics to the validator in a natural and non-intrusive way [for cases
where the validator gets it wrong or simply has no way of auto-learning
them] - but for that i'll first have to understand the locking semantics
:-)

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-08 16:11   ` Ingo Molnar
@ 2006-06-08 21:17     ` Anton Altaparmakov
  0 siblings, 0 replies; 21+ messages in thread
From: Anton Altaparmakov @ 2006-06-08 21:17 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

On Thu, 8 Jun 2006, Ingo Molnar wrote:
> * Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> > >   &ni->mrec_lock - a spinlock protecting a particular inode's MFT data.
> > >                    (finegrained lock for the MFT record)  It is
> > >                    typically taken by map_mft_record() and released by
> > >                    unmap_mft_record().
> > 
> > Correct, except s/spinlock/semaphore/ (that really should become a
> > mutex one day).
> 
> yeah - it's in fact a mutex already.
> 
> > No it is not as explained above.  Something has gotten confused
> > somewhere because the order of events is the wrong way round...
> 
> did my second trace make more sense?  The dependency that the validator

Which one was that?  Could you please quote it again so we both know we
are talking about the same thing?

> recorded can pretty much be taken for granted - it only stores
> dependencies that truly trigger at runtime.  What shouldn't be taken
> for granted is my explanation of the events :-)

Ok.  (-:

> there is a wide array of methods and APIs available to express locking
> semantics to the validator in a natural and non-intrusive way [for cases
> where the validator gets it wrong or simply has no way of auto-learning
> them] - but for that i'll first have to understand the locking semantics
> :-)

Indeed.  (-:

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-08 10:53 ` Anton Altaparmakov
  2006-06-08 11:23   ` Ingo Molnar
  2006-06-08 16:11   ` Ingo Molnar
@ 2006-06-09  7:19   ` Ingo Molnar
  2006-06-09  8:31     ` Anton Altaparmakov
  2 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2006-06-09 7:19 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

* Anton Altaparmakov <aia21@cam.ac.uk> wrote:

> This last lookup of physical location in map_mft_record() [actually in
> readpage as map_mft_record() reads the page cache page containing the
> record] cannot require us to load an extent mft record because the
> runlist is complete so we cannot deadlock here.

ah!  So basically, the locking sequences the validator observed during
load_system_files() are a one-time event that can only occur during
ntfs_fill_super().

if we ignore those dependencies (of the initial reading of the MFT inode
generating recursive map_mft_record() calls) and take into account that
the MFT inode will never again trigger map_mft_record() calls, then the
locking is conflict-free after mounting has finished, right?

so the validator is technically correct that load_system_files()
generates locking patterns that have dependency conflicts - but that
code is guaranteed to be single-threaded because it's a one-time event
and because the VFS has no access to the filesystem at that time yet, so
there is zero parallelism.

can you think of any simple and clean solution that would avoid those
conflicting dependencies within the NTFS code?  Perhaps some way to read
the MFT inode [and perhaps other, always-cached inodes that are later
used as metadata] without locking?  I can code it up and test it, but i
don't think i can find the best method myself :-)

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
  2006-06-09  7:19   ` Ingo Molnar
@ 2006-06-09  8:31     ` Anton Altaparmakov
  0 siblings, 0 replies; 21+ messages in thread
From: Anton Altaparmakov @ 2006-06-09 8:31 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Miles Lane, LKML, Andrew Morton, Arjan van de Ven

On Fri, 2006-06-09 at 09:19 +0200, Ingo Molnar wrote:
> * Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> > This last lookup of physical location in map_mft_record() [actually in
> > readpage as map_mft_record() reads the page cache page containing the
> > record] cannot require us to load an extent mft record because the
> > runlist is complete so we cannot deadlock here.
> 
> ah!  So basically, the locking sequences the validator observed during
> load_system_files() are a one-time event that can only occur during
> ntfs_fill_super().
> 
> if we ignore those dependencies (of the initial reading of the MFT inode
> generating recursive map_mft_record() calls) and take into account that
> the MFT inode will never again trigger map_mft_record() calls, then the
> locking is conflict-free after mounting has finished, right?

Correct!  (-:

> so the validator is technically correct that load_system_files()
> generates locking patterns that have dependency conflicts - but that
> code is guaranteed to be single-threaded because it's a one-time event
> and because the VFS has no access to the filesystem at that time yet,
> so there is zero parallelism.

Correct.  I just sent you a long description of the process before
reading your email above...  I need not have bothered had I known you
already understood the case!

> can you think of any simple and clean solution that would avoid those
> conflicting dependencies within the NTFS code?  Perhaps some way to read
> the MFT inode [and perhaps other, always-cached inodes that are later
> used as metadata] without locking?
> I can code it up and test it, but i don't think i can find the best
> method myself :-)

I really do not want to do that.  It would penalise ntfs performance in
the general case by introducing "if in mount don't lock otherwise lock"
checks pretty much everywhere in the driver, or you would need to
duplicate a ton of code, which I avoided duplicating exactly by using my
bootstrapping mechanism, which is why it exists in the first place!

And given the only reason for the change you want to make is to make the
lock validator happy, I would much rather you did something with the
lock validator instead...  Can't you just insert a call to the validator
in ntfs_fill_super() just before the call to ntfs_read_inode_mount() to
the effect of "turn_off_lock_validator_data_gathering()" and then just
after ntfs_read_inode_mount() returns insert a call to the validator to
the effect of "turn_on_lock_validator_data_gathering()".

Does that sound reasonable to you?  I really cannot see why code needs
to be penalised for a validator...  I would much rather see a
complicated and slow lock validator than a complicated and slow driver,
given a lock validator is only compiled in for debugging purposes whilst
a driver is compiled in all kernels including release ones...

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

^ permalink raw reply	[flat|nested] 21+ messages in thread
Thread overview: 21+ messages

2006-06-08  4:27 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected! Miles Lane
2006-06-08  7:32 ` Anton Altaparmakov
2006-06-08  9:55 ` Ingo Molnar
2006-06-08 10:53 ` Anton Altaparmakov
2006-06-08 11:23 ` Ingo Molnar
2006-06-09  8:09 ` Anton Altaparmakov
2006-06-10  7:59 ` Ingo Molnar
2006-06-10  8:48 ` Anton Altaparmakov
2006-06-11  5:31 ` Ingo Molnar
2006-06-11  7:10 ` Anton Altaparmakov
2006-06-12  8:31 ` Ingo Molnar
2006-06-12  8:47 ` Anton Altaparmakov
2006-06-12  9:40 ` Ingo Molnar
2006-06-12 10:24 ` Anton Altaparmakov
2006-06-12 10:56 ` Ingo Molnar
2006-06-13  5:41 ` Miles Lane
2006-06-13 21:18 ` Ingo Molnar
2006-06-08 16:11 ` Ingo Molnar
2006-06-08 21:17 ` Anton Altaparmakov
2006-06-09  7:19 ` Ingo Molnar
2006-06-09  8:31 ` Anton Altaparmakov