All of lore.kernel.org
 help / color / mirror / Atom feed
* removing lock_kernel() from fs/fat
@ 2007-10-12 17:18 Little, Chris
  2007-10-12 19:28 ` Matthew Wilcox
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Little, Chris @ 2007-10-12 17:18 UTC (permalink / raw)
  To: kernel-janitors

Background:
Not much.  Some C. Have been using Linux since pre-1.0 and s390 Linux
admin for the last seven years or so.  I've been following kernel
janitors and lkml (lkml when i can) for awhile now.  Also reading "Linux
Kernel Development" by Robert Love.

Now:
Saw in the janitor list about removing lock_kernel().  Created the
following patch for linux/fs/fat against vanilla 2.6.23, but I have some
questions.

1.  In file.c, line 304:
	lock_kernel();
	fat_free(inode, nr_clusters);
	unlock_kernel();

	Why wrap fat_free with lock_kernel and not lock in the function
itself?

2.  In inode.c, starting at lines 439 and 570, the two functions
fat_clear_inode() and fat_write_inode() lock_kernel and then spin lock.
Why the spin lock if it already has the bkl?  Since the spin locks are
used, it makes me less sure of this.  I didn't touch lock_kernel() in
inode.c at all.  In fact I'm almost positive I'm missing something, but
I'm here to learn.  So here it is:

--- linux-2.6.23/fs/fat/dir.c   2007-10-09 15:31:38.000000000 -0500
+++ linux-2.6.23-jcl/fs/fat/dir.c     2007-10-12 11:02:17.000000000
-0500
@@ -459,8 +459,9 @@
        int chi, chl, i, i2, j, last, last_u, dotoffset = 0;
        loff_t cpos;
        int ret = 0;
+       rwlock_t mr_rwlock = RW_LOCK_UNLOCKED;
 
-       lock_kernel();
+       read_lock(&mr_rwlock);
 
        cpos = filp->f_pos;
        /* Fake . and .. for the root directory. */
@@ -642,7 +643,7 @@
        if (unicode)
                free_page((unsigned long)unicode);
 out:
-       unlock_kernel();
+       read_unlock(&mr_rwlock);
        return ret;
 }
 
diff -ur linux-2.6.23/fs/fat/file.c linux-2.6.23-jcl/fs/fat/file.c
--- linux-2.6.23/fs/fat/file.c  2007-10-09 15:31:38.000000000 -0500
+++ linux-2.6.23-jcl/fs/fat/file.c    2007-10-12 11:37:47.000000000
-0500
@@ -160,8 +160,9 @@
        struct msdos_sb_info *sbi = MSDOS_SB(dentry->d_sb);
        struct inode *inode = dentry->d_inode;
        int mask, error = 0;
+       rwlock_t mr_rwlock = RW_LOCK_UNLOCKED;
 
-       lock_kernel();
+       write_lock(&mr_rwlock);
 
        /*
         * Expand the file. Since inode_setattr() updates ->i_size
@@ -206,7 +207,7 @@
                mask = sbi->options.fs_fmask;
        inode->i_mode &= S_IFMT | (S_IRWXUGO & ~mask);
 out:
-       unlock_kernel();
+       write_unlock(&mr_rwlock);
        return error;
 }
 
@@ -287,6 +288,7 @@
        struct msdos_sb_info *sbi = MSDOS_SB(inode->i_sb);
        const unsigned int cluster_size = sbi->cluster_size;
        int nr_clusters;
+       rwlock_t mr_rwlock = RW_LOCK_UNLOCKED;
 
        /*
         * This protects against truncating a file bigger than it was
then
@@ -297,9 +299,9 @@
 
        nr_clusters = (inode->i_size + (cluster_size - 1)) >>
sbi->cluster_bits;
 
-       lock_kernel();
+       write_lock(&mr_rwlock);
        fat_free(inode, nr_clusters);
-       unlock_kernel();
+       write_lock(&mr_rwlock);
        fat_flush_inodes(inode->i_sb, inode, NULL);
 }

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: removing lock_kernel() from fs/fat
  2007-10-12 17:18 removing lock_kernel() from fs/fat Little, Chris
@ 2007-10-12 19:28 ` Matthew Wilcox
  2007-10-15 16:34 ` Little, Chris
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Matthew Wilcox @ 2007-10-12 19:28 UTC (permalink / raw)
  To: kernel-janitors

On Fri, Oct 12, 2007 at 12:18:23PM -0500, Little, Chris wrote:
> 1.  In file.c, line 304:
> 	lock_kernel();
> 	fat_free(inode, nr_clusters);
> 	unlock_kernel();
> 
> 	Why wrap fat_free with lock_kernel and not lock in the function
> itself?

Probably no good reason.  You should be able to push it down into the
function.

> 2.  In inode.c, starting at lines 439 and 570, the two functions
> fat_clear_inode() and fat_write_inode() lock_kernel and then spin lock.
> Why the spin lock if it already has the bkl?

Because you have to protect against other CPUs reading this inode, some
of which may have the BKL and others may have the inode_hash_lock.

> In fact I'm almost positive I'm missing something, but
> I'm here to learn.  So here it is:
> 
> --- linux-2.6.23/fs/fat/dir.c   2007-10-09 15:31:38.000000000 -0500
> +++ linux-2.6.23-jcl/fs/fat/dir.c     2007-10-12 11:02:17.000000000
> -0500
> @@ -459,8 +459,9 @@
>         int chi, chl, i, i2, j, last, last_u, dotoffset = 0;
>         loff_t cpos;
>         int ret = 0;
> +       rwlock_t mr_rwlock = RW_LOCK_UNLOCKED;
>  
> -       lock_kernel();
> +       read_lock(&mr_rwlock);
>  
>         cpos = filp->f_pos;
>         /* Fake . and .. for the root directory. */
> @@ -642,7 +643,7 @@
>         if (unicode)
>                 free_page((unsigned long)unicode);
>  out:
> -       unlock_kernel();
> +       read_unlock(&mr_rwlock);
>         return ret;
>  }
>  

Basically, you should never put a spinlock on the stack.  You can't
exclude any other caller because they'll have their own copy of this
spinlock.  If it were static, that would be one thing, but I think you
intend for it to be shared with the function below ...

Another mistake you've made is using an rwlock.  This is a classic
beginners mistake, as the problem with it is highly non-obvious.  The
trouble with rwlocks is that they're unfair to writers.  A storm of
readers can prevent a writer from ever getting the lock.  It's normally
a good thing to use a plain spinlock and only convert to an rwlock if
you're sure it's a good idea.

A further mistake is picking a filesystem as a good candidate for BKL
removal.  Take a look at Documentation/filesystems/Locking.  You'll see
there's a lot of places where the VFS takes the BKL on behalf of the
filesystem.  So taking the BKL inside a filesystem implicitly locks out
a lot of places that might call into the filesystem.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: removing lock_kernel() from fs/fat
  2007-10-12 17:18 removing lock_kernel() from fs/fat Little, Chris
  2007-10-12 19:28 ` Matthew Wilcox
@ 2007-10-15 16:34 ` Little, Chris
  2007-10-15 16:39 ` Little, Chris
  2007-10-15 18:23 ` Matthew Wilcox
  3 siblings, 0 replies; 5+ messages in thread
From: Little, Chris @ 2007-10-15 16:34 UTC (permalink / raw)
  To: kernel-janitors

First, thanks for the reply.  Except for the details, it is much what I
expected, but this is a good thing. If nothing else other than realizing
how far I have to go. :) 

> -----Original Message-----
> From: Matthew Wilcox [mailto:matthew@wil.cx] 
> Sent: Friday, October 12, 2007 2:28 PM
> To: Little, Chris
> Cc: kernel-janitors@vger.kernel.org
> Subject: Re: removing lock_kernel() from fs/fat
> 
> On Fri, Oct 12, 2007 at 12:18:23PM -0500, Little, Chris wrote:
> > 1.  In file.c, line 304:
> > 	lock_kernel();
> > 	fat_free(inode, nr_clusters);
> > 	unlock_kernel();
> > 
> > 	Why wrap fat_free with lock_kernel and not lock in the function 
> > itself?
> 
> Probably no good reason.  You should be able to push it down 
> into the function.

I don't know that it matter either way except that it seems logical to
me to have it part of fat_free() IF locking is required for fat_free().
However, fat_free() is inlined and only called once.  Here.  Would that
be worth it?

> 
> > 2.  In inode.c, starting at lines 439 and 570, the two functions
> > fat_clear_inode() and fat_write_inode() lock_kernel and 
> then spin lock.
> > Why the spin lock if it already has the bkl?
> 
> Because you have to protect against other CPUs reading this 
> inode, some of which may have the BKL and others may have the 
> inode_hash_lock.
> 
> > In fact I'm almost positive I'm missing something, but I'm here to 
> > learn.  So here it is:
> > 
> > --- linux-2.6.23/fs/fat/dir.c   2007-10-09 15:31:38.000000000 -0500
> > +++ linux-2.6.23-jcl/fs/fat/dir.c     2007-10-12 11:02:17.000000000
> > -0500
> > @@ -459,8 +459,9 @@
> >         int chi, chl, i, i2, j, last, last_u, dotoffset = 0;
> >         loff_t cpos;
> >         int ret = 0;
> > +       rwlock_t mr_rwlock = RW_LOCK_UNLOCKED;
> >  
> > -       lock_kernel();
> > +       read_lock(&mr_rwlock);
> >  
> >         cpos = filp->f_pos;
> >         /* Fake . and .. for the root directory. */ @@ 
> -642,7 +643,7 
> > @@
> >         if (unicode)
> >                 free_page((unsigned long)unicode);
> >  out:
> > -       unlock_kernel();
> > +       read_unlock(&mr_rwlock);
> >         return ret;
> >  }
> >  
> 
> Basically, you should never put a spinlock on the stack.  You 
> can't exclude any other caller because they'll have their own 
> copy of this spinlock.  If it were static, that would be one 
> thing, but I think you intend for it to be shared with the 
> function below ...
> 
> Another mistake you've made is using an rwlock.  This is a 
> classic beginners mistake, as the problem with it is highly 
> non-obvious.  The trouble with rwlocks is that they're unfair 
> to writers.  A storm of readers can prevent a writer from 
> ever getting the lock.  It's normally a good thing to use a 
> plain spinlock and only convert to an rwlock if you're sure 
> it's a good idea.
> 
> A further mistake is picking a filesystem as a good candidate 
> for BKL removal.  Take a look at 
> Documentation/filesystems/Locking.  You'll see there's a lot 
> of places where the VFS takes the BKL on behalf of the 
> filesystem.  So taking the BKL inside a filesystem implicitly 
> locks out a lot of places that might call into the filesystem.
> 
> --
> Intel are signing my paycheques ... these opinions are still 
> mine "Bill, look, we understand that you're interested in 
> selling us this operating system, but compare it to ours.  We 
> can't possibly take such a retrograde step."
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: removing lock_kernel() from fs/fat
  2007-10-12 17:18 removing lock_kernel() from fs/fat Little, Chris
  2007-10-12 19:28 ` Matthew Wilcox
  2007-10-15 16:34 ` Little, Chris
@ 2007-10-15 16:39 ` Little, Chris
  2007-10-15 18:23 ` Matthew Wilcox
  3 siblings, 0 replies; 5+ messages in thread
From: Little, Chris @ 2007-10-15 16:39 UTC (permalink / raw)
  To: kernel-janitors

Crap.  Hit send before I was done. 

> -----Original Message-----
> From: Matthew Wilcox [mailto:matthew@wil.cx] 
> Sent: Friday, October 12, 2007 2:28 PM
> To: Little, Chris
> Cc: kernel-janitors@vger.kernel.org
> Subject: Re: removing lock_kernel() from fs/fat
>
> > Why the spin lock if it already has the bkl?
> 
> Because you have to protect against other CPUs reading this 
> inode, some of which may have the BKL and others may have the 
> inode_hash_lock.

That makes sense.

> Another mistake you've made is using an rwlock.  This is a 
> classic beginners mistake, as the problem with it is highly 
> non-obvious.  The trouble with rwlocks is that they're unfair 
> to writers.  A storm of readers can prevent a writer from 
> ever getting the lock.  It's normally a good thing to use a 
> plain spinlock and only convert to an rwlock if you're sure 
> it's a good idea.
> 
> A further mistake is picking a filesystem as a good candidate 
> for BKL removal.  Take a look at 
> Documentation/filesystems/Locking.  You'll see there's a lot 
> of places where the VFS takes the BKL on behalf of the 
> filesystem.  So taking the BKL inside a filesystem implicitly 
> locks out a lot of places that might call into the filesystem.

Question:  Are there appropriate times to use rwlock, then? Doesn't it
imply that there are writers?

Many thanks for your comments.  Things to file away.  Pardon me if I
keep throwing stuff up.  One day I might just get good at this.

:)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: removing lock_kernel() from fs/fat
  2007-10-12 17:18 removing lock_kernel() from fs/fat Little, Chris
                   ` (2 preceding siblings ...)
  2007-10-15 16:39 ` Little, Chris
@ 2007-10-15 18:23 ` Matthew Wilcox
  3 siblings, 0 replies; 5+ messages in thread
From: Matthew Wilcox @ 2007-10-15 18:23 UTC (permalink / raw)
  To: kernel-janitors

On Mon, Oct 15, 2007 at 11:39:42AM -0500, Little, Chris wrote:
> > Another mistake you've made is using an rwlock.  This is a 
> > classic beginners mistake, as the problem with it is highly 
> > non-obvious.  The trouble with rwlocks is that they're unfair 
> > to writers.  A storm of readers can prevent a writer from 
> > ever getting the lock.  It's normally a good thing to use a 
> > plain spinlock and only convert to an rwlock if you're sure 
> > it's a good idea.
> 
> Question:  Are there appropriate times to use rwlock, then? Doesn't it
> imply that there are writers?

There definitely are appropriate times to use rwlock.  This might even
be one of them.  But it's a non-obvious risk that needs to be born in
mind, and it's rarely a win.

Here's a good example:
include/linux/netdevice.h:extern rwlock_t dev_base_lock;          /* Device list lock */

So when we're adding a new network device or removing an old one
(infrequent operation, can afford to wait for a long time), we take it
for write.  All kinds of places in the network layer require the list
of network devices don't change while they look things up, so they all
take it for read, and there's no need to exclude them against each other.

It's possible that a situation like this obtains in fat.  I haven't
looked into it.  But anyone who wants to change a lock type really needs
to provide a good argument for why it should be changed.  Back when we
had a BKL brigade, they posted a patch to fs/locks.c which I was able
to show would actually increase contention.

> Many thanks for your comments.  Things to file away.  Pardon me if I
> keep throwing stuff up.  One day I might just get good at this.

It takes a while.  Keep reading code, it's the best way to learn.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-10-15 18:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-12 17:18 removing lock_kernel() from fs/fat Little, Chris
2007-10-12 19:28 ` Matthew Wilcox
2007-10-15 16:34 ` Little, Chris
2007-10-15 16:39 ` Little, Chris
2007-10-15 18:23 ` Matthew Wilcox

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.