netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Olaf Kirch <okir@suse.de>
To: trond.myklebust@fys.uio.no
Cc: Christoph Hellwig <hch@lst.de>,
	netdev@oss.sgi.com, nfs@lists.sourceforge.net
Subject: Re: [NFS] [PATCH][RFC] use completions instead of sleep_on forrpciod
Date: Tue, 10 Feb 2004 10:15:20 +0100	[thread overview]
Message-ID: <20040210091520.GD20932@suse.de> (raw)
In-Reply-To: <35059.80.213.3.64.1076359739.squirrel@webmail.uio.no>

On Mon, Feb 09, 2004 at 09:48:59PM +0100, Trond Myklebust wrote:
> > So if you remove the sunrpc BKL be prepared for lots and lots
> > of bug reports :)
> 
> Ahem....
> 
> xprt_alloc_xid is quite safe. It's always called under xprt->xprt_lock.

It wasn't in 2.4.19 or whenever I ran into this. And a per-transport
lock isn't enough to protect a global variable shared by all transports.
Trust me: I spent several weeks debugging this particular problem :)
(See the PS for a somewhat lengthy description of what I remember)

I think that BKL removal at this point is probably not a good idea. Suse
had a BKL removal patch in their SLES8 kernel, and I spent a total of 4-5
weeks hunting down all sorts of bugs related to SMP races. And I don't
think we've seen all of these bugs yet. It might be better to postpone
this for the 2.7 branch.

> The code in unlink.c is only called from dentry_iput() or from
> sillyrename(). In either case, there is no possibility for SMP races.

Are you sure? nfs_dentry_iput explicitly does a lock_kernel before calling
nfs_complete_unlink. I think that's because dentry_iput explicitly drops
the dcache_lock before calling dentry->d_op->d_iput. I wouldn't remove
that BKL :)

And even if you leave that BKL and just remove the one in rpciod, you may
still have a race between rpciod and dput. One CPU may be adding entries
inside nfs_async_unlink (called from dentry_iput()), and at the same
time rpciod can run on another CPU, calling nfs_async_unlink_release,
removing stuff from the nfs_deletes list. The reference counting on
nfs_unlinkdata isn't atomic either.

Olaf

PS: Let me try to put the pieces back together... this was on a PPC
SMP system (so it can take a while for modified values to be
be visible on other CPUs). There were several mounts being stressed
in parallel.

xprt_request_init did a simple "req->rq_xid = xid++;" without any
additional spin locks protecting the xid variable (we have that
spin lock now).

The bug required at least 3 CPUs, and 3 different mounts.

        CPU0            CPU1            CPU2
       --------------------------------------------------------
   |    lock xprt1      lock xprt2
   |                                    lock xprt3
   |    read xid=0
   |                    read xid=0
   |    write xid=1
                                        read xid=1
 time++                                 write xid=2
                        write xid=1
   |                                    unlock xprt3
   |                                    lock xprt3
   |                                    read xid=1
   |                                    write xid=2
   v                                    unlock xprt3

Note that on PPC, modified data can take a long time to become
visible on other CPUs. Cache sync is only guaranteed by operations
such as spin_unlock etc.

Olaf
-- 
Olaf Kirch     |  Stop wasting entropy - start using predictable
okir@suse.de   |  tempfile names today!
---------------+ 

  parent reply	other threads:[~2004-02-10  9:15 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-02-07 14:44 [PATCH][RFC] use completions instead of sleep_on for rpciod Christoph Hellwig
2004-02-08 20:43 ` David S. Miller
2004-02-09 10:28 ` Olaf Kirch
2004-02-09 20:48   ` [NFS] [PATCH][RFC] use completions instead of sleep_on forrpciod trond.myklebust
2004-02-09 21:00     ` David S. Miller
2004-02-10  9:15     ` Olaf Kirch [this message]
2004-02-10 11:47       ` trond.myklebust
2004-02-10  1:52   ` [NFS] [PATCH][RFC] use completions instead of sleep_on for rpciod Greg Banks
2004-02-10  9:18     ` Olaf Kirch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040210091520.GD20932@suse.de \
    --to=okir@suse.de \
    --cc=hch@lst.de \
    --cc=netdev@oss.sgi.com \
    --cc=nfs@lists.sourceforge.net \
    --cc=trond.myklebust@fys.uio.no \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).