public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@digeo.com>,
	Marc-Christian Petersen <m.c.p@wolk-project.de>,
	t.baetzler@bringe.com, linux-kernel@vger.kernel.org,
	marcelo@conectiva.com.br
Subject: Re: xdr nfs highmem deadlock fix [Re: filesystem access slowing system to a crawl]
Date: Fri, 21 Feb 2003 20:46:40 +0100	[thread overview]
Message-ID: <20030221194640.GS10360@x30.school.suse.de> (raw)
In-Reply-To: <20030221124108.N1723@schatzie.adilger.int>

On Fri, Feb 21, 2003 at 12:41:09PM -0700, Andreas Dilger wrote:
> On Feb 21, 2003  10:46 +0100, Andrea Arcangeli wrote:
> > On Thu, Feb 20, 2003 at 04:15:36PM -0700, Andreas Dilger wrote:
> > > What we did was set up a "kmap reservation", which used an atomic_dec()
> > > + wait_event() to reschedule the task until it could get enough kmaps
> > > to satisfy the request without deadlocking (i.e. exceeding the kmap cap
> > > which we conservitavely set at 3/4 of all kmap space).
> > 
> > Your approch was fragile (every arch is free to give you just 1 kmap in
> > the pool and you still must not deadlock) and it's not capable of using
> > the whole kmap pool at the same time. the only robust and efficient way
> > to fix it is the kmap_nonblock IMHO
> 
> So (says the person who only ever uses i386 and ia64), does an arch exist
> which needs highmem/kmap, but only ever gives 1 kmap in the pool?
> 
> > > This works for us because we are the only consumer of huge amounts of kmaps
> > > on our systems, but it would be nice to have a generic interface to do that
> > > so that multiple apps don't deadlock against each other (e.g. NFS + Lustre).
> > 
> > This isn't the problem, if NFS wouldn't be broken it couldn't deadlock
> > against Lustre even with your design (assuming you don't fall in the two
> > problems mentioned above). But still your design is more fragile and
> > less scalable, especially for a generic implementation where you don't
> > know how many pages you'll reserve in mean, and you don't know how many
> > kmaps entries the architecture can provide to you. But of course with
> > kmap_nonblock you'll have to fallback submitting single pages if it
> > fails, it's a bit more difficult but it's more robust and optimized IMHO.
> 
> In our case, Lustre (well Portals really, the underlying network protocol)
> always knows in advance the number of pages that it will need to kmap
> because the client needs to tell the server in advance how much bulk data
> is going to send.  This is required for being able to do RDMA.  It might
> be possible to have the server do the transfer in multiple parts if
> kmap_nonblock() failed, but that is not how things are currently set up,
> which is why we block in advance until we know we can get enough pages.
> 
> This is very similar to ext3 journaling, which requests in advance the
> maximum number of journal blocks it might need, and blocks until it can
> get them all.
> 
> The only problem happens when other parts of the kernel start acquiring
> multiple kmaps without using the same reservation/accounting system as us.
> Each works fine in isolation, but in combination it fails.

no, if the other places are not buggy, it won't fail, regardless if they
use your mechanism or the kmap_nonblock. you don't have to use your
mechanism everywhere to make your mechanism work. For istance you will
be fine with the kmap_nonblock fix in combination with your current
code. Not sure why you think otherwise.

I understand it may be simpler to do the full reservation, in ext3 you
don't even risk anything because you know how large the pool is, but I
think for these cases the kmap_nonblock is superior because you have
obvious depdency on the architecture and you're not able to use at best
all the kmap pool (and here there's not a transaction that has to be
committed all at once so it's doable).  still in practice it will work
fine in combination of the other safe usages (like kmap_nonblock) if you
reserve few enough pages at time.

Andrea

  reply	other threads:[~2003-02-21 19:37 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-02-04  9:29 filesystem access slowing system to a crawl Thomas Bätzler
2003-02-05  9:03 ` Denis Vlasenko
2003-02-05  9:39 ` Andrew Morton
2003-02-19 16:42   ` Marc-Christian Petersen
2003-02-19 17:49     ` Andrea Arcangeli
2003-02-20 15:29       ` Marc-Christian Petersen
2003-02-20 18:35         ` Andrew Morton
2003-02-20 21:32           ` Marc-Christian Petersen
2003-02-20 21:41             ` Andrew Morton
2003-02-20 22:08               ` Andrea Arcangeli
2003-02-20 21:54           ` xdr nfs highmem deadlock fix [Re: filesystem access slowing system to a crawl] Andrea Arcangeli
2003-02-20 22:56             ` Trond Myklebust
2003-02-20 23:04               ` Jeff Garzik
2003-02-20 23:12                 ` Trond Myklebust
2003-02-21  9:41                   ` Andrea Arcangeli
2003-02-22  0:40                     ` David S. Miller
2003-02-23 15:22                       ` Andrea Arcangeli
2003-02-21  9:41                 ` Andrea Arcangeli
2003-02-21  9:37               ` Andrea Arcangeli
2003-02-21 20:52               ` Andrew Morton
2003-02-21 21:32                 ` Trond Myklebust
2003-02-20 23:15             ` Andreas Dilger
2003-02-21  9:46               ` Andrea Arcangeli
2003-02-21 19:41                 ` Andreas Dilger
2003-02-21 19:46                   ` Andrea Arcangeli [this message]
2003-02-26 23:17       ` filesystem access slowing system to a crawl Marc-Christian Petersen
2003-02-27  8:51         ` Marc-Christian Petersen
2003-02-20 19:30 ` William Stearns

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030221194640.GS10360@x30.school.suse.de \
    --to=andrea@suse.de \
    --cc=akpm@digeo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m.c.p@wolk-project.de \
    --cc=marcelo@conectiva.com.br \
    --cc=t.baetzler@bringe.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox