All of lore.kernel.org
 help / color / mirror / Atom feed
From: Olaf Kirch <okir@suse.de>
To: trond.myklebust@fys.uio.no
Cc: Christoph Hellwig <hch@lst.de>,
	netdev@oss.sgi.com, nfs@lists.sourceforge.net
Subject: Re: [NFS] [PATCH][RFC] use completions instead of sleep_on forrpciod
Date: Tue, 10 Feb 2004 10:15:20 +0100	[thread overview]
Message-ID: <20040210091520.GD20932@suse.de> (raw)
In-Reply-To: <35059.80.213.3.64.1076359739.squirrel@webmail.uio.no>

On Mon, Feb 09, 2004 at 09:48:59PM +0100, Trond Myklebust wrote:
> > So if you remove the sunrpc BKL be prepared for lots and lots
> > of bug reports :)
> 
> Ahem....
> 
> xprt_alloc_xid is quite safe. It's always called under xprt->xprt_lock.

It wasn't in 2.4.19 or whenever I ran into this. And a per-transport
lock isn't enough to protect a global variable shared by all transports.
Trust me: I spent several weeks debugging this particular problem :)
(See the PS for a somewhat lengthy description of what I remember)

I think that BKL removal at this point is probably not a good idea. Suse
had a BKL removal patch in their SLES8 kernel, and I spent a total of 4-5
weeks hunting down all sorts of bugs related to SMP races. And I don't
think we've seen all of these bugs yet. It might be better to postpone
this for the 2.7 branch.

> The code in unlink.c is only called from dentry_iput() or from
> sillyrename(). In either case, there is no possibility for SMP races.

Are you sure? nfs_dentry_iput explicitly does a lock_kernel before calling
nfs_complete_unlink. I think that's because dentry_iput explicitly drops
the dcache_lock before calling dentry->d_op->d_iput. I wouldn't remove
that BKL :)

And even if you leave that BKL and just remove the one in rpciod, you may
still have a race between rpciod and dput. One CPU may be adding entries
inside nfs_async_unlink (called from dentry_iput()), and at the same
time rpciod can run on another CPU, calling nfs_async_unlink_release,
removing stuff from the nfs_deletes list. The reference counting on
nfs_unlinkdata isn't atomic either.

Olaf

PS: Let me try to put the pieces back together... this was on a PPC
SMP system (so it can take a while for modified values to be
be visible on other CPUs). There were several mounts being stressed
in parallel.

xprt_request_init did a simple "req->rq_xid = xid++;" without any
additional spin locks protecting the xid variable (we have that
spin lock now).

The bug required at least 3 CPUs, and 3 different mounts.

        CPU0            CPU1            CPU2
       --------------------------------------------------------
   |    lock xprt1      lock xprt2
   |                                    lock xprt3
   |    read xid=0
   |                    read xid=0
   |    write xid=1
                                        read xid=1
 time++                                 write xid=2
                        write xid=1
   |                                    unlock xprt3
   |                                    lock xprt3
   |                                    read xid=1
   |                                    write xid=2
   v                                    unlock xprt3

Note that on PPC, modified data can take a long time to become
visible on other CPUs. Cache sync is only guaranteed by operations
such as spin_unlock etc.

Olaf
-- 
Olaf Kirch     |  Stop wasting entropy - start using predictable
okir@suse.de   |  tempfile names today!
---------------+ 

  parent reply	other threads:[~2004-02-10  9:15 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-02-07 14:44 [PATCH][RFC] use completions instead of sleep_on for rpciod Christoph Hellwig
2004-02-08 20:43 ` David S. Miller
2004-02-09 10:28 ` Olaf Kirch
2004-02-09 11:38   ` [PATCH][RFC] sleep_on fixes for lockd Christoph Hellwig
2004-02-09 20:48   ` [NFS] [PATCH][RFC] use completions instead of sleep_on forrpciod trond.myklebust
2004-02-09 20:48     ` trond.myklebust
2004-02-09 21:00     ` David S. Miller
2004-02-10  9:15     ` Olaf Kirch [this message]
2004-02-10 11:47       ` trond.myklebust
2004-02-10 11:47         ` trond.myklebust
2004-02-10  1:52   ` [NFS] [PATCH][RFC] use completions instead of sleep_on for rpciod Greg Banks
2004-02-10  9:18     ` Olaf Kirch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040210091520.GD20932@suse.de \
    --to=okir@suse.de \
    --cc=hch@lst.de \
    --cc=netdev@oss.sgi.com \
    --cc=nfs@lists.sourceforge.net \
    --cc=trond.myklebust@fys.uio.no \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.