* locking rules for ->dirty_inode() @ 2002-09-20 15:00 Nikita Danilov 2002-09-20 15:52 ` Andrew Morton 2002-09-20 22:41 ` Andrew Morton 0 siblings, 2 replies; 9+ messages in thread From: Nikita Danilov @ 2002-09-20 15:00 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Alexander Viro, Andrew Morton Hello, Documentation/filesystems/Locking states that all super operations may block, but __set_page_dirty_buffers() calls __mark_inode_dirty()->s_op->dirty_inode() under mapping->private_lock spin lock. This seems strange, because file systems' ->dirty_inode() assume that they are allowed to block. For example, ext3_dirty_inode() allocates memory in ext3_journal_start()->journal_start()->new_handle()->... Nikita. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: locking rules for ->dirty_inode() 2002-09-20 15:00 locking rules for ->dirty_inode() Nikita Danilov @ 2002-09-20 15:52 ` Andrew Morton 2002-09-20 16:32 ` Nikita Danilov 2002-09-20 22:41 ` Andrew Morton 1 sibling, 1 reply; 9+ messages in thread From: Andrew Morton @ 2002-09-20 15:52 UTC (permalink / raw) To: Nikita Danilov; +Cc: Linux Kernel Mailing List, Alexander Viro Nikita Danilov wrote: > > Hello, > > Documentation/filesystems/Locking states that all super operations may > block, but __set_page_dirty_buffers() calls > > __mark_inode_dirty()->s_op->dirty_inode() > > under mapping->private_lock spin lock. This seems strange, because file > systems' ->dirty_inode() assume that they are allowed to block. For > example, ext3_dirty_inode() allocates memory in > > ext3_journal_start()->journal_start()->new_handle()->... > OK, thanks. mapping->private_lock is taken there to pin page->buffers() (Can't lock the page because set_page_dirty is called under page_table_lock, and other locks). I'm sure we can just move the spin_unlock up to above the TestSetPageDirty(), but I need to zenuflect for a while over why I did it that way. It's necessary to expose buffer-dirtiness and page-dirtiness to the rest of the world in the correct order. If we set the page dirty and then the buffers, there is a window in which writeback could find the dirty page, try to write it, discover clean buffers and mark the page clean. We would end up with a !PageDirty page, on mapping->clean_pages, with dirty buffers. It would never be written. Yup. We can move that spin_unlock up ten lines. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: locking rules for ->dirty_inode() 2002-09-20 15:52 ` Andrew Morton @ 2002-09-20 16:32 ` Nikita Danilov 2002-09-20 16:47 ` Andrew Morton 0 siblings, 1 reply; 9+ messages in thread From: Nikita Danilov @ 2002-09-20 16:32 UTC (permalink / raw) To: Andrew Morton; +Cc: Linux Kernel Mailing List, Alexander Viro Andrew Morton writes: > Nikita Danilov wrote: > > > > Hello, > > > > Documentation/filesystems/Locking states that all super operations may > > block, but __set_page_dirty_buffers() calls > > > > __mark_inode_dirty()->s_op->dirty_inode() > > > > under mapping->private_lock spin lock. This seems strange, because file > > systems' ->dirty_inode() assume that they are allowed to block. For > > example, ext3_dirty_inode() allocates memory in > > > > ext3_journal_start()->journal_start()->new_handle()->... > > > > OK, thanks. > > mapping->private_lock is taken there to pin page->buffers() > (Can't lock the page because set_page_dirty is called under > page_table_lock, and other locks). > > I'm sure we can just move the spin_unlock up to above the > TestSetPageDirty(), but I need to zenuflect for a while over > why I did it that way. > > It's necessary to expose buffer-dirtiness and page-dirtiness > to the rest of the world in the correct order. If we set the > page dirty and then the buffers, there is a window in which writeback > could find the dirty page, try to write it, discover clean buffers > and mark the page clean. We would end up with a !PageDirty page, > on mapping->clean_pages, with dirty buffers. It would never be > written. > > Yup. We can move that spin_unlock up ten lines. Actually, I came over this while trying to describe lock ordering in reiser4 after I just started integrating other kernel locks there. I wonder, has somebody already done this, writing up kernel lock hierarchy, that is? Nikita. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: locking rules for ->dirty_inode() 2002-09-20 16:32 ` Nikita Danilov @ 2002-09-20 16:47 ` Andrew Morton 2002-09-20 17:32 ` Nikita Danilov 0 siblings, 1 reply; 9+ messages in thread From: Andrew Morton @ 2002-09-20 16:47 UTC (permalink / raw) To: Nikita Danilov; +Cc: Linux Kernel Mailing List, Alexander Viro Nikita Danilov wrote: > > ... > Actually, I came over this while trying to describe lock ordering in > reiser4 after I just started integrating other kernel locks there. I > wonder, has somebody already done this, writing up kernel lock > hierarchy, that is? > I've been keeping the comment at the top of filemap.c uptodate when I discover things. It got smaller a while ago when certain rude locks were spoken to. Really, this form of representation isn't rich enough, but the format certainly provides enough info to know when you might be taking locks in the wrong order, and it tells you where to look to see them being taken. The problem with the format is that locks are only mentioned once, and it can't describe the whole graph. Maybe it needs something like: * ->i_shared_lock (vmtruncate) * ->private_lock (__free_pte->__set_page_dirty_buffers) * ->swap_list_lock * ->swap_device_lock (exclusive_swap_page, others) * ->mapping->page_lock * ->inode_lock (__mark_inode_dirty) * ->sb_lock (fs/fs-writeback.c) +* ->i_shared_lock +* ->page_table_lock (lots of places) */ Don't know. Maybe someone somewhere has developed a notation for this? How are you doing it? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: locking rules for ->dirty_inode() 2002-09-20 16:47 ` Andrew Morton @ 2002-09-20 17:32 ` Nikita Danilov 2002-09-20 18:21 ` Hans Reiser 0 siblings, 1 reply; 9+ messages in thread From: Nikita Danilov @ 2002-09-20 17:32 UTC (permalink / raw) To: Andrew Morton; +Cc: Linux Kernel Mailing List, Alexander Viro Andrew Morton writes: > Nikita Danilov wrote: > > > > ... > > Actually, I came over this while trying to describe lock ordering in > > reiser4 after I just started integrating other kernel locks there. I > > wonder, has somebody already done this, writing up kernel lock > > hierarchy, that is? > > > > I've been keeping the comment at the top of filemap.c uptodate when > I discover things. It got smaller a while ago when certain rude > locks were spoken to. > > Really, this form of representation isn't rich enough, but the > format certainly provides enough info to know when you might be > taking locks in the wrong order, and it tells you where to look > to see them being taken. > > The problem with the format is that locks are only mentioned once, > and it can't describe the whole graph. Maybe it needs something > like: > > > * ->i_shared_lock (vmtruncate) > * ->private_lock (__free_pte->__set_page_dirty_buffers) > * ->swap_list_lock > * ->swap_device_lock (exclusive_swap_page, others) > * ->mapping->page_lock > * ->inode_lock (__mark_inode_dirty) > * ->sb_lock (fs/fs-writeback.c) > +* ->i_shared_lock > +* ->page_table_lock (lots of places) > */ > > Don't know. Maybe someone somewhere has developed a notation > for this? How are you doing it? I am doing it roughly in the same way: in a similar diagram where each lock is mentioned exactly once, locks can be acquired from the left to the right. Locks on the same indentation level are unordered and cannot be held at the same time. This is enough to express lock *ordering*, whenever it exists. For example, you diagram above gives: ->i_shared_lock (vmtruncate) ->private_lock (__free_pte->__set_page_dirty_buffers) ->swap_list_lock ->swap_device_lock (exclusive_swap_page, others) ->mapping->page_lock ->inode_lock (__mark_inode_dirty) ->sb_lock (fs/fs-writeback.c) ->page_table_lock (lots of places) And it means that neither ->private_lock and ->page_table_lock nor ->swap_list_lock and ->inode_lock cannot be held at the same time. As you mentioned (if I understood correctly) this is insufficient to express where locks are taken and, more generally, how the lock graph is embedded into the call graph. But lock ordering itself is quite useful. For example, in reiser4 we have SPIN_LOCK_FUNCTIONS() macro. If one has typedef struct txn_handle { ... spinlock_t guard; ... } txn_handle; then SPIN_LOCK_FUNCTIONS(txn_handle, guard) will generate functions like spin_lock_txn_handle(); spin_unlock_txn_handle(); spin_trylock_txn_handle(); etc. In debugging mode, these functions (aside from manipulating locks) modify special per-thread counters. This allows one to specify lock ordering constraints. Each spin_lock_foo() function checks spin_ordering_pred_foo() macro before taking lock. This was really helpful during early debugging. Nikita. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: locking rules for ->dirty_inode() 2002-09-20 17:32 ` Nikita Danilov @ 2002-09-20 18:21 ` Hans Reiser 0 siblings, 0 replies; 9+ messages in thread From: Hans Reiser @ 2002-09-20 18:21 UTC (permalink / raw) To: Nikita Danilov; +Cc: Andrew Morton, Linux Kernel Mailing List, Alexander Viro You might ask the SGI guys what they do. I know that they have a systematic approach to documenting lock ordering. Hans ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: locking rules for ->dirty_inode() 2002-09-20 15:00 locking rules for ->dirty_inode() Nikita Danilov 2002-09-20 15:52 ` Andrew Morton @ 2002-09-20 22:41 ` Andrew Morton 2002-09-23 16:32 ` Nikita Danilov 1 sibling, 1 reply; 9+ messages in thread From: Andrew Morton @ 2002-09-20 22:41 UTC (permalink / raw) To: Nikita Danilov; +Cc: Linux Kernel Mailing List, Alexander Viro Nikita Danilov wrote: > > Hello, > > Documentation/filesystems/Locking states that all super operations may > block, but __set_page_dirty_buffers() calls > > __mark_inode_dirty()->s_op->dirty_inode() > > under mapping->private_lock spin lock. Actually it doesn't. We do not call down into the filesystem for I_DIRTY_PAGES. set_page_dirty() is already called under locks, via __free_pte (pagetable teardown). 2.4 does this as well. But I'll make the change anyway. I think it removes any ranking requirements between mapping->page_lock and mapping->private_lock, which is always a nice thing. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: locking rules for ->dirty_inode() 2002-09-20 22:41 ` Andrew Morton @ 2002-09-23 16:32 ` Nikita Danilov 2002-09-23 16:42 ` Andrew Morton 0 siblings, 1 reply; 9+ messages in thread From: Nikita Danilov @ 2002-09-23 16:32 UTC (permalink / raw) To: Andrew Morton; +Cc: Linux Kernel Mailing List, Alexander Viro Andrew Morton writes: > Nikita Danilov wrote: > > > > Hello, > > > > Documentation/filesystems/Locking states that all super operations may > > block, but __set_page_dirty_buffers() calls > > > > __mark_inode_dirty()->s_op->dirty_inode() > > > > under mapping->private_lock spin lock. > > Actually it doesn't. We do not call down into the filesystem > for I_DIRTY_PAGES. > > set_page_dirty() is already called under locks, via __free_pte (pagetable > teardown). 2.4 does this as well. Cannot find __free_pte, it is only mentioned in comments in mm/filemap.c and include/asm-generic/tlb.h. > > But I'll make the change anyway. I think it removes any > ranking requirements between mapping->page_lock and > mapping->private_lock, which is always a nice thing. Nikita. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: locking rules for ->dirty_inode() 2002-09-23 16:32 ` Nikita Danilov @ 2002-09-23 16:42 ` Andrew Morton 0 siblings, 0 replies; 9+ messages in thread From: Andrew Morton @ 2002-09-23 16:42 UTC (permalink / raw) To: Nikita Danilov; +Cc: Linux Kernel Mailing List, Alexander Viro Nikita Danilov wrote: > > Andrew Morton writes: > > Nikita Danilov wrote: > > > > > > Hello, > > > > > > Documentation/filesystems/Locking states that all super operations may > > > block, but __set_page_dirty_buffers() calls > > > > > > __mark_inode_dirty()->s_op->dirty_inode() > > > > > > under mapping->private_lock spin lock. > > > > Actually it doesn't. We do not call down into the filesystem > > for I_DIRTY_PAGES. > > > > set_page_dirty() is already called under locks, via __free_pte (pagetable > > teardown). 2.4 does this as well. > > Cannot find __free_pte, it is only mentioned in comments in mm/filemap.c > and include/asm-generic/tlb.h. > It got moved around. 2.4: __free_pte(), 2.5: zap_pte_range(). ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2002-09-23 19:11 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-09-20 15:00 locking rules for ->dirty_inode() Nikita Danilov 2002-09-20 15:52 ` Andrew Morton 2002-09-20 16:32 ` Nikita Danilov 2002-09-20 16:47 ` Andrew Morton 2002-09-20 17:32 ` Nikita Danilov 2002-09-20 18:21 ` Hans Reiser 2002-09-20 22:41 ` Andrew Morton 2002-09-23 16:32 ` Nikita Danilov 2002-09-23 16:42 ` Andrew Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox