From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@digeo.com>,
	Marc-Christian Petersen <m.c.p@wolk-project.de>,
	t.baetzler@bringe.com, linux-kernel@vger.kernel.org,
	marcelo@conectiva.com.br
Subject: Re: xdr nfs highmem deadlock fix [Re: filesystem access slowing system to a crawl]
Date: Fri, 21 Feb 2003 20:46:40 +0100
Message-ID: <20030221194640.GS10360@x30.school.suse.de>
In-Reply-To: <20030221124108.N1723@schatzie.adilger.int>

On Fri, Feb 21, 2003 at 12:41:09PM -0700, Andreas Dilger wrote:
> On Feb 21, 2003  10:46 +0100, Andrea Arcangeli wrote:
> > On Thu, Feb 20, 2003 at 04:15:36PM -0700, Andreas Dilger wrote:
> > > What we did was set up a "kmap reservation", which used an atomic_dec()
> > > + wait_event() to reschedule the task until it could get enough kmaps
> > > to satisfy the request without deadlocking (i.e. exceeding the kmap cap
> > > which we conservatively set at 3/4 of all kmap space).
> > 
> > Your approach was fragile (every arch is free to give you just 1 kmap in
> > the pool and you still must not deadlock) and it's not capable of using
> > the whole kmap pool at the same time. The only robust and efficient way
> > to fix it is the kmap_nonblock IMHO.
> 
> So (says the person who only ever uses i386 and ia64), does an arch exist
> which needs highmem/kmap, but only ever gives 1 kmap in the pool?
> 
> > > This works for us because we are the only consumer of huge amounts of kmaps
> > > on our systems, but it would be nice to have a generic interface to do that
> > > so that multiple apps don't deadlock against each other (e.g. NFS + Lustre).
> > 
> > This isn't the problem: if NFS weren't broken, it couldn't deadlock
> > against Lustre even with your design (assuming you don't fall into the two
> > problems mentioned above). But still your design is more fragile and
> > less scalable, especially for a generic implementation where you don't
> > know how many pages you'll reserve on average, and you don't know how many
> > kmap entries the architecture can provide to you. But of course with
> > kmap_nonblock you'll have to fall back to submitting single pages if it
> > fails; it's a bit more difficult but it's more robust and optimized IMHO.
> 
> In our case, Lustre (well Portals really, the underlying network protocol)
> always knows in advance the number of pages that it will need to kmap
> because the client needs to tell the server in advance how much bulk data
> it is going to send.  This is required for being able to do RDMA.  It might
> be possible to have the server do the transfer in multiple parts if
> kmap_nonblock() failed, but that is not how things are currently set up,
> which is why we block in advance until we know we can get enough pages.
> 
> This is very similar to ext3 journaling, which requests in advance the
> maximum number of journal blocks it might need, and blocks until it can
> get them all.
> 
> The only problem happens when other parts of the kernel start acquiring
> multiple kmaps without using the same reservation/accounting system as us.
> Each works fine in isolation, but in combination it fails.

No, if the other places are not buggy, it won't fail, regardless of whether
they use your mechanism or the kmap_nonblock. You don't have to use your
mechanism everywhere to make your mechanism work. For instance, you will
be fine with the kmap_nonblock fix in combination with your current
code. Not sure why you think otherwise.
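
As a rough illustration of the reservation mechanism under discussion
(a user-space model, not code from this thread), the all-or-nothing
reservation against a 3/4 cap could look like the sketch below. The pool
size and the names kmap_reserve_try() and kmap_unreserve() are
assumptions made up for the example. Where the kernel code described
above would sleep in wait_event() and retry, this model simply returns
false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical pool size; the real one is architecture-specific. */
#define KMAP_POOL_SIZE 512
/* Reservation cap: 3/4 of all kmap space, as in the quoted description. */
#define KMAP_CAP (KMAP_POOL_SIZE * 3 / 4)

/* Number of kmap slots that may still be reserved. */
static atomic_int kmap_avail = KMAP_CAP;

/*
 * Try to reserve n kmaps at once, all or nothing.
 * Returns false at the point where kernel code would instead
 * sleep in wait_event() until enough slots are released.
 */
static bool kmap_reserve_try(int n)
{
    assert(n > 0);
    /* atomic_fetch_sub() returns the value held before the subtraction. */
    if (atomic_fetch_sub(&kmap_avail, n) - n >= 0)
        return true;
    /* Not enough room: undo the subtraction and report failure. */
    atomic_fetch_add(&kmap_avail, n);
    return false;
}

/* Give back n reserved kmaps; kernel code would wake up waiters here. */
static void kmap_unreserve(int n)
{
    assert(n > 0);
    atomic_fetch_add(&kmap_avail, n);
}
```

A caller would bracket its bulk transfer with kmap_reserve_try(npages)
and kmap_unreserve(npages). A request larger than KMAP_CAP can never
succeed in this model, which is the architecture dependency discussed
in this thread.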

I understand it may be simpler to do the full reservation; in ext3 you
don't even risk anything because you know how large the pool is. But I
think for these cases the kmap_nonblock is superior, because with the
reservation you have an obvious dependency on the architecture and you
can't make full use of the kmap pool (and here there's no transaction
that has to be committed all at once, so it's doable). Still, in practice
it will work fine in combination with the other safe usages (like
kmap_nonblock) if you reserve few enough pages at a time.
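
To make the fallback idea concrete, here is a minimal single-threaded
user-space model (again not code from this thread). It tries to map all
pages without blocking and, if the pool runs out, drops the partial
mappings and submits one page at a time, never holding more than one
mapping. The names model_kmap_nonblock(), model_kunmap() and
submit_pages(), and the pool size, are hypothetical. The model assumes
at least one slot is free during the fallback; real code would use a
blocking kmap() there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool size; an architecture may provide as little as 1. */
#define POOL_SLOTS 4

/* Slots currently mapped, by this caller or by anyone else. */
static int slots_in_use;

/* Model of a non-blocking map: fails instead of sleeping when full. */
static bool model_kmap_nonblock(void)
{
    if (slots_in_use == POOL_SLOTS)
        return false;
    slots_in_use++;
    return true;
}

static void model_kunmap(void)
{
    assert(slots_in_use > 0);
    slots_in_use--;
}

/* How the pages ended up being submitted. */
struct xfer_stats {
    int batched; /* submissions covering all pages at once */
    int single;  /* submissions covering exactly one page */
};

static struct xfer_stats submit_pages(int npages)
{
    struct xfer_stats st = { 0, 0 };
    int mapped = 0;

    /* Fast path: try to map every page without blocking. */
    while (mapped < npages && model_kmap_nonblock())
        mapped++;

    if (mapped == npages) {
        st.batched = 1; /* all pages mapped: one batched submission */
        while (mapped-- > 0)
            model_kunmap();
        return st;
    }

    /* Pool exhausted: release the partial mappings before falling back. */
    while (mapped-- > 0)
        model_kunmap();

    /* Slow path: one page at a time, holding at most one mapping. */
    for (int i = 0; i < npages; i++) {
        bool ok = model_kmap_nonblock(); /* kernel: blocking kmap() */
        assert(ok);
        (void)ok;
        st.single++;
        model_kunmap();
    }
    return st;
}
```

With an empty pool, submit_pages(3) goes through the batched path. With
three of the four slots held elsewhere, the same call falls back to
three single-page submissions and still completes, which a fixed
reservation of three pages could not do until someone else released
its mappings.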

Andrea

Thread overview: 28+ messages
2003-02-04  9:29 filesystem access slowing system to a crawl Thomas Bätzler
2003-02-05  9:03 ` Denis Vlasenko
2003-02-05  9:39 ` Andrew Morton
2003-02-19 16:42   ` Marc-Christian Petersen
2003-02-19 17:49     ` Andrea Arcangeli
2003-02-20 15:29       ` Marc-Christian Petersen
2003-02-20 18:35         ` Andrew Morton
2003-02-20 21:32           ` Marc-Christian Petersen
2003-02-20 21:41             ` Andrew Morton
2003-02-20 22:08               ` Andrea Arcangeli
2003-02-20 21:54           ` xdr nfs highmem deadlock fix [Re: filesystem access slowing system to a crawl] Andrea Arcangeli
2003-02-20 22:56             ` Trond Myklebust
2003-02-20 23:04               ` Jeff Garzik
2003-02-20 23:12                 ` Trond Myklebust
2003-02-21  9:41                   ` Andrea Arcangeli
2003-02-22  0:40                     ` David S. Miller
2003-02-23 15:22                       ` Andrea Arcangeli
2003-02-21  9:41                 ` Andrea Arcangeli
2003-02-21  9:37               ` Andrea Arcangeli
2003-02-21 20:52               ` Andrew Morton
2003-02-21 21:32                 ` Trond Myklebust
2003-02-20 23:15             ` Andreas Dilger
2003-02-21  9:46               ` Andrea Arcangeli
2003-02-21 19:41                 ` Andreas Dilger
2003-02-21 19:46                   ` Andrea Arcangeli [this message]
2003-02-26 23:17       ` filesystem access slowing system to a crawl Marc-Christian Petersen
2003-02-27  8:51         ` Marc-Christian Petersen
2003-02-20 19:30 ` William Stearns
