public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@digeo.com>,
	Marc-Christian Petersen <m.c.p@wolk-project.de>,
	t.baetzler@bringe.com, linux-kernel@vger.kernel.org,
	marcelo@conectiva.com.br
Subject: Re: xdr nfs highmem deadlock fix [Re: filesystem access slowing system to a crawl]
Date: Fri, 21 Feb 2003 10:46:27 +0100	[thread overview]
Message-ID: <20030221094627.GI31480@x30.school.suse.de> (raw)
In-Reply-To: <20030220161536.F1723@schatzie.adilger.int>

On Thu, Feb 20, 2003 at 04:15:36PM -0700, Andreas Dilger wrote:
> On Feb 20, 2003  22:54 +0100, Andrea Arcangeli wrote:
> > Explanation is very simple: you _can't_ kmap two times in the context of
> > a single task (especially if more than one task can run the same code at
> > the same time). I don't yet have the confirmation that this fixes the
> > deadlock though (it takes days to reproduce so it will take weeks to
> > confirm), but I can't see anything else wrong at the moment, and this
> > remains a genuine highmem deadlock that has to be fixed.  The fix is
> > optimal, no change unless you run out of kmaps and in turn you can
> > deadlock, i.e. all the light workloads won't be affected at all.
> 
> We had a similar problem in Lustre, where we have to kmap multiple pages
> at once and hold them over a network RPC (which is doing zero-copy DMA
> into multiple pages at once), and there is possibly a very heavy load
> of kmaps because the client and the server can be on the same system.
> 
> What we did was set up a "kmap reservation", which used an atomic_dec()
> + wait_event() to reschedule the task until it could get enough kmaps
> to satisfy the request without deadlocking (i.e. exceeding the kmap cap
> which we conservitavely set at 3/4 of all kmap space).

Your approch was fragile (every arch is free to give you just 1 kmap in
the pool and you still must not deadlock) and it's not capable of using
the whole kmap pool at the same time. the only robust and efficient way
to fix it is the kmap_nonblock IMHO

> A single "server" task could exceed the kmap cap by enough to satisfy the
> maximum possible request size, so that a single system with both clients
> and servers can always make forward progress even in the face of clients
> trying to kmap more than the total amount of kmap space.
> 
> This works for us because we are the only consumer of huge amounts of kmaps
> on our systems, but it would be nice to have a generic interface to do that
> so that multiple apps don't deadlock against each other (e.g. NFS + Lustre).

This isn't the problem, if NFS wouldn't be broken it couldn't deadlock
against Lustre even with your design (assuming you don't fall in the two
problems mentioned above). But still your design is more fragile and
less scalable, especially for a generic implementation where you don't
know how many pages you'll reserve in mean, and you don't know how many
kmaps entries the architecture can provide to you. But of course with
kmap_nonblock you'll have to fallback submitting single pages if it
fails, it's a bit more difficult but it's more robust and optimized IMHO.

Andrea

  reply	other threads:[~2003-02-21  9:36 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-02-04  9:29 filesystem access slowing system to a crawl Thomas Bätzler
2003-02-05  9:03 ` Denis Vlasenko
2003-02-05  9:39 ` Andrew Morton
2003-02-19 16:42   ` Marc-Christian Petersen
2003-02-19 17:49     ` Andrea Arcangeli
2003-02-20 15:29       ` Marc-Christian Petersen
2003-02-20 18:35         ` Andrew Morton
2003-02-20 21:32           ` Marc-Christian Petersen
2003-02-20 21:41             ` Andrew Morton
2003-02-20 22:08               ` Andrea Arcangeli
2003-02-20 21:54           ` xdr nfs highmem deadlock fix [Re: filesystem access slowing system to a crawl] Andrea Arcangeli
2003-02-20 22:56             ` Trond Myklebust
2003-02-20 23:04               ` Jeff Garzik
2003-02-20 23:12                 ` Trond Myklebust
2003-02-21  9:41                   ` Andrea Arcangeli
2003-02-22  0:40                     ` David S. Miller
2003-02-23 15:22                       ` Andrea Arcangeli
2003-02-21  9:41                 ` Andrea Arcangeli
2003-02-21  9:37               ` Andrea Arcangeli
2003-02-21 20:52               ` Andrew Morton
2003-02-21 21:32                 ` Trond Myklebust
2003-02-20 23:15             ` Andreas Dilger
2003-02-21  9:46               ` Andrea Arcangeli [this message]
2003-02-21 19:41                 ` Andreas Dilger
2003-02-21 19:46                   ` Andrea Arcangeli
2003-02-26 23:17       ` filesystem access slowing system to a crawl Marc-Christian Petersen
2003-02-27  8:51         ` Marc-Christian Petersen
2003-02-20 19:30 ` William Stearns

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030221094627.GI31480@x30.school.suse.de \
    --to=andrea@suse.de \
    --cc=akpm@digeo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m.c.p@wolk-project.de \
    --cc=marcelo@conectiva.com.br \
    --cc=t.baetzler@bringe.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox