* locking in sync_old_buffers
@ 2002-04-22 21:01 Dave Hansen
2002-04-22 22:08 ` Andrew Morton
2002-04-22 22:25 ` Martin J. Bligh
0 siblings, 2 replies; 8+ messages in thread
From: Dave Hansen @ 2002-04-22 21:01 UTC (permalink / raw)
To: Alexander Viro
Cc: Andrew Morton, linux-kernel, Linus Torvalds, Martin J. Bligh
[-- Attachment #1: Type: text/plain, Size: 1360 bytes --]
BKL hold times in sync_old_buffers aren't good. This is one of the last
places that we see millisecond-scale BKL hold times.
0.09% 8.1% 4598us(9648us) 105us( 231us)(0.00%) 37 91.9% 8.1%
0% sync_old_buffers+0x1c
So, I'll ask the eternal question that plagues us all, why is the BKL
held here? I saw some discussions referring to races prevented by BKL here:
http://groups.google.com/groups?selm=linux.kernel.Pine.GSO.4.21.0103212354420.2632-100000%40weyl.math.psu.edu&output=gplain
From Al:
> Ehh... Linus, the main problem is in get_super(). Want a nice race?
> sys_ustat() vs. sys_umount(). The former does get_super(), finds
> struct super_block and does nothing to guarantee that it will stay.
> Then it calls ->statfs(). In the meanwhile, you umount the thing
> and do rmmod. Oops..
With the addition of sb_lock and refcounting on the sb, are these races
still there? sync_supers() and sync_unlocked_inodes() both hold sb_lock.
If we just remove the BKL from sync_old_buffers() the long hold time
goes from BKL to sb_lock. But, sb_lock is dropped and reacquired a few
times in that code, so latency should be better than BKL. BKL wouldn't
have been released except in those places where it had to drop the locks
and sleep.
Patch to do it is attached, but I'm not holding my breath.
--
Dave Hansen
haveblue@us.ibm.com
[-- Attachment #2: sync_old_buffers-remove_bkl-2.5.8.patch --]
[-- Type: text/plain, Size: 306 bytes --]
--- linux-2.5.8-clean/fs/buffer.c Mon Apr 22 13:45:34 2002
+++ linux/fs/buffer.c Mon Apr 22 13:45:49 2002
@@ -2612,10 +2612,8 @@
static void sync_old_buffers(unsigned long dummy)
{
- lock_kernel();
sync_unlocked_inodes();
sync_supers();
- unlock_kernel();
for (;;) {
struct buffer_head *bh;
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: locking in sync_old_buffers 2002-04-22 21:01 locking in sync_old_buffers Dave Hansen @ 2002-04-22 22:08 ` Andrew Morton 2002-04-22 22:23 ` Dave Hansen 2002-04-23 1:31 ` Alexander Viro 2002-04-22 22:25 ` Martin J. Bligh 1 sibling, 2 replies; 8+ messages in thread From: Andrew Morton @ 2002-04-22 22:08 UTC (permalink / raw) To: Dave Hansen; +Cc: Alexander Viro, linux-kernel, Linus Torvalds, Martin J. Bligh Dave Hansen wrote: > ... > --- linux-2.5.8-clean/fs/buffer.c Mon Apr 22 13:45:34 2002 > +++ linux/fs/buffer.c Mon Apr 22 13:45:49 2002 > @@ -2612,10 +2612,8 @@ > > static void sync_old_buffers(unsigned long dummy) > { > - lock_kernel(); > sync_unlocked_inodes(); > sync_supers(); > - unlock_kernel(); > > for (;;) { > struct buffer_head *bh; Al would know better than I, but... If you're going to do this, then the BKL should be acquired in fs/super.c:write_super(), so the per-fs ->write_super functions do not see changed external locking rules. Possibly, fs/inode.c:write_inode() needs the same treatment. But Doc/filesystems/Locking says that lock_kernel() is not held across ->write_inode so there should be no need to take it on the kupdate path. That's for 2.4. For 2.5, we'd like sync_old_buffers to just go away. Do you have time to test http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.8/everything.patch.gz - ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: locking in sync_old_buffers 2002-04-22 22:08 ` Andrew Morton @ 2002-04-22 22:23 ` Dave Hansen 2002-04-22 22:28 ` Linus Torvalds 2002-04-23 1:31 ` Alexander Viro 1 sibling, 1 reply; 8+ messages in thread From: Dave Hansen @ 2002-04-22 22:23 UTC (permalink / raw) To: Andrew Morton Cc: Alexander Viro, linux-kernel, Linus Torvalds, Martin J. Bligh Andrew Morton wrote: > If you're going to do this, then the BKL should be acquired > in fs/super.c:write_super(), so the per-fs ->write_super > functions do not see changed external locking rules. > > Possibly, fs/inode.c:write_inode() needs the same treatment. > But Doc/filesystems/Locking says that lock_kernel() is not > held across ->write_inode so there should be no need to take > it on the kupdate path. That sounds sane. I was just fishing for information before I go do anything drastic. > That's for 2.4. For 2.5, we'd like sync_old_buffers to just > go away. Do you have time to test >http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.8/everything.patch.gz Absolutely. What else does it contain that I should watch out for? -- Dave Hansen haveblue@us.ibm.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: locking in sync_old_buffers 2002-04-22 22:23 ` Dave Hansen @ 2002-04-22 22:28 ` Linus Torvalds 2002-04-22 22:50 ` Andrew Morton 0 siblings, 1 reply; 8+ messages in thread From: Linus Torvalds @ 2002-04-22 22:28 UTC (permalink / raw) To: Dave Hansen; +Cc: Andrew Morton, Alexander Viro, linux-kernel, Martin J. Bligh On Mon, 22 Apr 2002, Dave Hansen wrote: > Andrew Morton wrote: > > If you're going to do this, then the BKL should be acquired > > in fs/super.c:write_super(), so the per-fs ->write_super > > functions do not see changed external locking rules. > > > > Possibly, fs/inode.c:write_inode() needs the same treatment. > > But Doc/filesystems/Locking says that lock_kernel() is not > > held across ->write_inode so there should be no need to take > > it on the kupdate path. > That sounds sane. I was just fishing for information before I go do > anything drastic. I would personally avoid doing these kinds of locking changes in 2.4.x altogether, unless there are really drastic reasons for them (ie real machines under load at real customer sites that really care). > > That's for 2.4. For 2.5, we'd like sync_old_buffers to just > > go away. Do you have time to test > >http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.8/everything.patch.gz > Absolutely. What else does it contain that I should watch out for? Don't use it on a production machine, but since this is in the 2.5.x future, I'd love to hear about not just lock contention but also about whether you can see any problems under heavy load. Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: locking in sync_old_buffers 2002-04-22 22:28 ` Linus Torvalds @ 2002-04-22 22:50 ` Andrew Morton 0 siblings, 0 replies; 8+ messages in thread From: Andrew Morton @ 2002-04-22 22:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: Dave Hansen, Alexander Viro, linux-kernel, Martin J. Bligh Linus Torvalds wrote: > > ... > > >http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.8/everything.patch.gz > > Absolutely. What else does it contain that I should watch out for? > > Don't use it on a production machine, but since this is in the 2.5.x > future, I'd love to hear about not just lock contention but also about > whether you can see any problems under heavy load. > It may choke under metadata-intensive workloads on really large memory machines. It works fine with 2.5 gigabyte x86, but 16 gigs may cause problems. This is because the dirty-memory balancing code doesn't know that blockdev mappings are restricted to ZONE_NORMAL. The correct fix for that is, of course, to allow blockdev mappings to use highmem. That will require methodical picking away at all the filesystems, and will take some time. ext2 should be OK because much of its metadata (directories) are in highmem at present. - ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: locking in sync_old_buffers 2002-04-22 22:08 ` Andrew Morton 2002-04-22 22:23 ` Dave Hansen @ 2002-04-23 1:31 ` Alexander Viro 2002-04-23 18:29 ` Dave Hansen 1 sibling, 1 reply; 8+ messages in thread From: Alexander Viro @ 2002-04-23 1:31 UTC (permalink / raw) To: Andrew Morton; +Cc: Dave Hansen, linux-kernel, Linus Torvalds, Martin J. Bligh On Mon, 22 Apr 2002, Andrew Morton wrote: > Al would know better than I, but... > > If you're going to do this, then the BKL should be acquired > in fs/super.c:write_super(), so the per-fs ->write_super > functions do not see changed external locking rules. Definitely. > Possibly, fs/inode.c:write_inode() needs the same treatment. > But Doc/filesystems/Locking says that lock_kernel() is not > held across ->write_inode so there should be no need to take > it on the kupdate path. It isn't - it had been shifted into the instances back in 2.3. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: locking in sync_old_buffers 2002-04-23 1:31 ` Alexander Viro @ 2002-04-23 18:29 ` Dave Hansen 0 siblings, 0 replies; 8+ messages in thread From: Dave Hansen @ 2002-04-23 18:29 UTC (permalink / raw) To: Alexander Viro Cc: Andrew Morton, linux-kernel, Linus Torvalds, Martin J. Bligh [-- Attachment #1: Type: text/plain, Size: 582 bytes --] Alexander Viro wrote: > On Mon, 22 Apr 2002, Andrew Morton wrote: >>If you're going to do this, then the BKL should be acquired >>in fs/super.c:write_super(), so the per-fs ->write_super >>functions do not see changed external locking rules. > > Definitely. Would you prefer that it be pushed into fs/super.c:write_super() itself, or the fs-specific write_super()s? The approach so far has been to push into the filesystem itself, so that is what this patch does. I also updated filesystems/Locking and a little comment in the ext3 code. -- Dave Hansen haveblue@us.ibm.com [-- Attachment #2: sync_old_buffers-bkl_shift-2.5.9-2.patch --] [-- Type: text/plain, Size: 7279 bytes --] diff -ur linux-2.5.9-clean/Documentation/filesystems/Locking linux/Documentation/filesystems/Locking --- linux-2.5.9-clean/Documentation/filesystems/Locking Wed Apr 10 13:27:52 2002 +++ linux/Documentation/filesystems/Locking Tue Apr 23 10:54:23 2002 @@ -110,7 +110,7 @@ delete_inode: no clear_inode: no put_super: yes yes maybe (see below) -write_super: yes yes maybe (see below) +write_super: no yes maybe (see below) statfs: yes no no remount_fs: yes yes maybe (see below) umount_begin: yes no maybe (see below) diff -ur linux-2.5.9-clean/fs/affs/super.c linux/fs/affs/super.c --- linux-2.5.9-clean/fs/affs/super.c Wed Apr 10 13:28:10 2002 +++ linux/fs/affs/super.c Tue Apr 23 10:32:31 2002 @@ -38,7 +38,7 @@ affs_put_super(struct super_block *sb) { struct affs_sb_info *sbi = AFFS_SB(sb); - + lock_kernel(); pr_debug("AFFS: put_super()\n"); if (!(sb->s_flags & MS_RDONLY)) { @@ -56,7 +56,7 @@ affs_brelse(sbi->s_root_bh); kfree(sbi); sb->u.generic_sbp = NULL; - + unlock_kernel(); return; } @@ -65,7 +65,7 @@ { int clean = 2; struct affs_sb_info *sbi = AFFS_SB(sb); - + lock_kernel(); if (!(sb->s_flags & MS_RDONLY)) { // if (sbi->s_bitmap[i].bm_bh) { // if (buffer_dirty(sbi->s_bitmap[i].bm_bh)) { @@ -80,6 +80,7 @@ sb->s_dirt = 0; pr_debug("AFFS: write_super() at %lu, clean=%d\n", CURRENT_TIME, clean); + unlock_kernel(); } static kmem_cache_t * affs_inode_cachep; diff -ur linux-2.5.9-clean/fs/bfs/inode.c linux/fs/bfs/inode.c --- linux-2.5.9-clean/fs/bfs/inode.c Tue Apr 23 10:57:34 2002 +++ linux/fs/bfs/inode.c Tue Apr 23 10:56:42 2002 @@ -206,9 +206,11 @@ static void bfs_write_super(struct super_block *s) { + lock_kernel(); if (!(s->s_flags & MS_RDONLY)) mark_buffer_dirty(BFS_SB(s)->si_sbh); s->s_dirt = 0; + unlock_kernel(); } static kmem_cache_t * bfs_inode_cachep; diff -ur linux-2.5.9-clean/fs/buffer.c linux/fs/buffer.c --- linux-2.5.9-clean/fs/buffer.c Mon Apr 22 13:45:34 2002 +++ linux/fs/buffer.c Mon Apr 22 13:45:49 2002 @@ -2612,10 +2612,8 @@ static void sync_old_buffers(unsigned long dummy) { - lock_kernel(); sync_unlocked_inodes(); sync_supers(); - unlock_kernel(); for (;;) { struct buffer_head *bh; diff -ur linux-2.5.9-clean/fs/ext2/super.c linux/fs/ext2/super.c --- linux-2.5.9-clean/fs/ext2/super.c Tue Apr 23 10:57:34 2002 +++ linux/fs/ext2/super.c Tue Apr 23 10:56:42 2002 @@ -754,7 +754,7 @@ void ext2_write_super (struct super_block * sb) { struct ext2_super_block * es; - + lock_kernel(); if (!(sb->s_flags & MS_RDONLY)) { es = EXT2_SB(sb)->s_es; @@ -768,6 +768,7 @@ ext2_commit_super (sb, es); } sb->s_dirt = 0; + unlock_kernel(); } int ext2_remount (struct super_block * sb, int * flags, char * data) diff -ur linux-2.5.9-clean/fs/ext3/super.c linux/fs/ext3/super.c --- linux-2.5.9-clean/fs/ext3/super.c Wed Apr 10 13:28:12 2002 +++ linux/fs/ext3/super.c Tue Apr 23 11:23:42 2002 @@ -501,7 +501,7 @@ put_inode: ext3_put_inode, /* BKL not held. Don't need */ delete_inode: ext3_delete_inode, /* BKL not held. We take it */ put_super: ext3_put_super, /* BKL held */ - write_super: ext3_write_super, /* BKL held */ + write_super: ext3_write_super, /* BKL not held. We take it. Needed? */ write_super_lockfs: ext3_write_super_lockfs, /* BKL not held. Take it */ unlockfs: ext3_unlockfs, /* BKL not held. We take it */ statfs: ext3_statfs, /* BKL held */ @@ -1599,7 +1599,7 @@ void ext3_write_super (struct super_block * sb) { tid_t target; - + lock_kernel(); if (down_trylock(&sb->s_lock) == 0) BUG(); /* aviro detector */ sb->s_dirt = 0; @@ -1610,6 +1610,7 @@ log_wait_commit(EXT3_SB(sb)->s_journal, target); lock_super(sb); } + unlock_kernel(); } /* diff -ur linux-2.5.9-clean/fs/hfs/super.c linux/fs/hfs/super.c --- linux-2.5.9-clean/fs/hfs/super.c Wed Apr 10 13:28:12 2002 +++ linux/fs/hfs/super.c Tue Apr 23 10:33:21 2002 @@ -146,9 +146,10 @@ static void hfs_write_super(struct super_block *sb) { struct hfs_mdb *mdb = HFS_SB(sb)->s_mdb; - + lock_kernel(); /* is this a valid hfs superblock? */ if (!sb || sb->s_magic != HFS_SUPER_MAGIC) { + unlock_kernel(); return; } @@ -157,6 +158,7 @@ hfs_mdb_commit(mdb, 0); } sb->s_dirt = 0; + unlock_kernel(); } /* diff -ur linux-2.5.9-clean/fs/jffs/inode-v23.c linux/fs/jffs/inode-v23.c --- linux-2.5.9-clean/fs/jffs/inode-v23.c Wed Apr 10 13:28:13 2002 +++ linux/fs/jffs/inode-v23.c Tue Apr 23 10:35:15 2002 @@ -1746,8 +1746,9 @@ jffs_write_super(struct super_block *sb) { struct jffs_control *c = (struct jffs_control *)sb->u.generic_sbp; - + lock_kernel(); jffs_garbage_collect_trigger(c); + unlock_kernel(); } static struct super_operations jffs_ops = diff -ur linux-2.5.9-clean/fs/jffs2/fs.c linux/fs/jffs2/fs.c --- linux-2.5.9-clean/fs/jffs2/fs.c Tue Apr 2 10:43:21 2002 +++ linux/fs/jffs2/fs.c Tue Apr 23 10:35:39 2002 @@ -320,16 +320,21 @@ void jffs2_write_super (struct super_block *sb) { struct jffs2_sb_info *c = JFFS2_SB_INFO(sb); + + lock_kernel(); sb->s_dirt = 0; - if (sb->s_flags & MS_RDONLY) + if (sb->s_flags & MS_RDONLY) { + unlock_kernel(); return; + } D1(printk("jffs2_write_super(): flush_wbuf before gc-trigger\n")); jffs2_flush_wbuf(c, 2); jffs2_garbage_collect_trigger(c); jffs2_erase_pending_blocks(c); jffs2_mark_erased_blocks(c); + unlock_kernel(); } diff -ur linux-2.5.9-clean/fs/qnx4/inode.c linux/fs/qnx4/inode.c --- linux-2.5.9-clean/fs/qnx4/inode.c Tue Apr 2 10:43:22 2002 +++ linux/fs/qnx4/inode.c Tue Apr 23 10:35:51 2002 @@ -72,8 +72,10 @@ static void qnx4_write_super(struct super_block *sb) { + lock_kernel(); QNX4DEBUG(("qnx4: write_super\n")); sb->s_dirt = 0; + unlock_kernel(); } static void qnx4_write_inode(struct inode *inode, int unused) diff -ur linux-2.5.9-clean/fs/sysv/inode.c linux/fs/sysv/inode.c --- linux-2.5.9-clean/fs/sysv/inode.c Thu Mar 7 18:18:05 2002 +++ linux/fs/sysv/inode.c Tue Apr 23 10:37:25 2002 @@ -33,6 +33,7 @@ /* This is only called on sync() and umount(), when s_dirt=1. */ static void sysv_write_super(struct super_block *sb) { + lock_kernel(); if (!(sb->s_flags & MS_RDONLY)) { /* If we are going to write out the super block, then attach current time stamp. @@ -46,6 +47,7 @@ mark_buffer_dirty(sb->sv_bh2); } sb->s_dirt = 0; + unlock_kernel(); } static void sysv_put_super(struct super_block *sb) diff -ur linux-2.5.9-clean/fs/udf/super.c linux/fs/udf/super.c --- linux-2.5.9-clean/fs/udf/super.c Wed Apr 10 13:28:14 2002 +++ linux/fs/udf/super.c Tue Apr 23 10:37:33 2002 @@ -359,9 +359,11 @@ void udf_write_super(struct super_block *sb) { + lock_kernel(); if (!(sb->s_flags & MS_RDONLY)) udf_open_lvid(sb); sb->s_dirt = 0; + unlock_kernel(); } static int diff -ur linux-2.5.9-clean/fs/ufs/super.c linux/fs/ufs/super.c --- linux-2.5.9-clean/fs/ufs/super.c Wed Apr 10 13:28:15 2002 +++ linux/fs/ufs/super.c Tue Apr 23 10:49:12 2002 @@ -822,6 +822,8 @@ struct ufs_super_block_third * usb3; unsigned flags; + lock_kernel(); + UFSD(("ENTER\n")) flags = sb->u.ufs_sb.s_flags; uspi = sb->u.ufs_sb.s_uspi; @@ -838,6 +840,7 @@ } sb->s_dirt = 0; UFSD(("EXIT\n")) + unlock_kernel(); } void ufs_put_super (struct super_block * sb) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: locking in sync_old_buffers 2002-04-22 21:01 locking in sync_old_buffers Dave Hansen 2002-04-22 22:08 ` Andrew Morton @ 2002-04-22 22:25 ` Martin J. Bligh 1 sibling, 0 replies; 8+ messages in thread From: Martin J. Bligh @ 2002-04-22 22:25 UTC (permalink / raw) To: Dave Hansen, Alexander Viro; +Cc: Andrew Morton, linux-kernel, Linus Torvalds > 0.09% 8.1% 4598us(9648us) 105us( 231us)(0.00%) 37 91.9% 8.1% 0% sync_old_buffers+0x1c 16-way NUMA-Q kernel compile: 0.64% 20.0% 38ms( 54ms) 93us( 93us)(0.00%) 5 80.0% 20.0% 0% sync_old_buffers+0x20 Hold times - ouch. M. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2002-04-23 18:30 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-04-22 21:01 locking in sync_old_buffers Dave Hansen 2002-04-22 22:08 ` Andrew Morton 2002-04-22 22:23 ` Dave Hansen 2002-04-22 22:28 ` Linus Torvalds 2002-04-22 22:50 ` Andrew Morton 2002-04-23 1:31 ` Alexander Viro 2002-04-23 18:29 ` Dave Hansen 2002-04-22 22:25 ` Martin J. Bligh
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox