public inbox for linux-kernel@vger.kernel.org
From: Andrea Arcangeli <andrea@suse.de>
To: Hugh Dickins <hugh@veritas.com>
Cc: linux-kernel@vger.kernel.org, Linus Torvalds <torvalds@transmeta.com>
Subject: Re: pte-highmem-5
Date: Thu, 24 Jan 2002 04:09:37 +0100	[thread overview]
Message-ID: <20020124040937.C20533@athlon.random> (raw)
In-Reply-To: <20020123003449.F1547@athlon.random> <Pine.LNX.4.21.0201230451540.1368-100000@localhost.localdomain>

On Wed, Jan 23, 2002 at 05:38:47AM +0000, Hugh Dickins wrote:
> On Wed, 23 Jan 2002, Andrea Arcangeli wrote:
> > 
> > page->virtual will remain for all the DEFAULT serie, to avoid breaking
> > the regular kmap pagecache users. But to keep a page->virtual for each
> > serie we'd need a page->virtual[KM_NR_SERIES] array, which is very
> > costly in terms of ram, ....
> 
> Agreed, not an option we'd want to use.
> 
> > correct. I'm convinced the mixture problem invalidates completly the
> > deadlock avoidance using the series, so the only way to fix the
> > deadlocks is to avoid the mixture between the series.
> 
> First half agreed, second half not sure.  Maybe no series at all.
> Could it be worked with just the one serie, and count in task_struct
> of kmaps "raised" by task, only task with count >=1 allowed to take
> the last N kmaps?  I suspect something like that would work if not
> scheduling otherwise, but no good held across allocating another
> resource e.g. memory in fault.  Probably rubbish, not thought out

I can imagine an alternate design to avoid the deadlock without the
series (it doesn't sound exactly like what you had in mind with
count >= 1, but it's along the same lines of using the task_struct to
keep some per-task information about the kmaps). It has some more
overhead, but it has the very nice property of also dropping the
ordering requirements.

The only new requirement would become the max number of kmaps run by a
single task in sequence (which is not really a new requirement: we had
the very same requirement before too, in the form of NR_KMAP_SERIES).

The first kmap run by a task will try to reserve MAX_NR_TASK_KMAPS
slots (we can keep track of whether it's the first kmap with some
task_struct field, along the lines of what you suggested). If it fails
to reserve all of them, it will release the ones it managed to allocate
in the meantime and go to sleep waiting for the resources to be
released by somebody else. If it succeeds, it will use the first
reserved entry to complete the kmap. The later kmaps will use the other
two reserved kmap slots preallocated at the first kmap. If the kernel
tries to allocate one more kmap entry over MAX_NR_TASK_KMAPS we can
BUG().

In short this makes sure that if a kmap has to sleep, it will always be
the first one, which ensures the deadlock avoidance.

This would solve not only the deadlock, but it also drops the ordering
requirements, and it solves the mixture thing as well (optimizations
are possible: if the first kmap maps a page that is already mapped,
we'd need to reserve only MAX_NR_TASK_KMAPS-1 entries, simply by doing
the reservation + first kmap atomically, which will be natural). We can
define MAX_NR_TASK_KMAPS (suggestions for a better name are welcome) to
3: one for the pagecache kmap, one for the first pagetable, and one for
the second pagetable map (mremap).

Comments? Now I tend to believe this way is simpler after all, mostly
because it doesn't create special cases with special series, and it
makes life simpler for the kmap users; in short it reduces the
anti-deadlock requirements dramatically.

> fully, just mentioned in case it gives you an idea.  Another such
> half-baked idea I've played with a little is using one or two ptes
> of the user address space (e.g. at top of stack) as per-task kmaps.
> 
> > The ordering thing is really simple I think. There are very few places
> > where we kmap and kmap_pagetable at the same time. And I don't see how
> > can could ever kmap before kmap_pagetable. so that part looks fine to me.
> 
> Nice if that's so, but I think you're sadly deluded ;-)
> Imagine sys_read going to file_read_actor (kmap, __copy_to_user, kunmap),
> imagine the __copy_to_user faulting (needs to kmap the pagetable),

I said the reverse, but this is of course the right path I meant. I
didn't see the other way happening anywhere, mostly because it would be
a bug if we ever kmapped during page faults: we could deadlock in such
a case even right now.

> imagine the place faulted is hole in ???fs file (kmap in
> clear_highpage),

we can't call clear_highpage during page faults (hey, if we ever did,
that would be a deadlock condition right now in 2.4.17 too, without
pte-highmem applied :).

> imagine low on memory schedules all over.

schedules really shouldn't matter at all here.

thanks again for the helpful feedback,

Andrea

Thread overview: 37+ messages
2002-01-16 17:58 pte-highmem-5 Andrea Arcangeli
2002-01-16 18:04 ` pte-highmem-5 Linus Torvalds
2002-01-16 18:35   ` pte-highmem-5 Andrea Arcangeli
2002-01-16 18:19 ` pte-highmem-5 Linus Torvalds
2002-01-16 18:48   ` pte-highmem-5 Andrea Arcangeli
2002-01-16 19:11     ` pte-highmem-5 Linus Torvalds
2002-01-16 19:30       ` pte-highmem-5 Andrea Arcangeli
2002-01-16 19:30   ` pte-highmem-5 Benjamin LaHaise
2002-01-16 19:50     ` pte-highmem-5 Andrea Arcangeli
2002-01-16 19:34   ` pte-highmem-5 Rik van Riel
2002-01-17  8:31 ` pte-highmem-5 Christoph Rohland
2002-01-17 12:14   ` pte-highmem-5 Hugh Dickins
2002-01-17 15:45     ` pte-highmem-5 Andrea Arcangeli
2002-01-17 16:08       ` pte-highmem-5 Hugh Dickins
2002-01-17 15:30   ` pte-highmem-5 Andrea Arcangeli
2002-01-17 16:11     ` pte-highmem-5 Christoph Rohland
2002-01-17 16:37       ` pte-highmem-5 Andrea Arcangeli
2002-01-17 17:31       ` pte-highmem-5 Rik van Riel
2002-01-17 17:57 ` pte-highmem-5 Hugh Dickins
2002-01-17 18:09   ` pte-highmem-5 Andrea Arcangeli
2002-01-17 19:02     ` pte-highmem-5 Hugh Dickins
2002-01-18  2:38       ` pte-highmem-5 Andrea Arcangeli
2002-01-19 20:56         ` pte-highmem-5 Hugh Dickins
2002-01-21 18:15           ` pte-highmem-5 Andrea Arcangeli
2002-01-22 18:01             ` pte-highmem-5 Hugh Dickins
2002-01-22 19:10               ` pte-highmem-5 Andrea Arcangeli
2002-01-22 21:41                 ` pte-highmem-5 Hugh Dickins
2002-01-22 23:34                   ` pte-highmem-5 Andrea Arcangeli
2002-01-23  0:56                     ` pte-highmem-5 Paul Mackerras
2002-01-23  1:27                       ` pte-highmem-5 Andrea Arcangeli
2002-01-23  5:38                     ` pte-highmem-5 Hugh Dickins
2002-01-23 16:29                       ` pte-highmem-5 Daniel Phillips
2002-01-23 20:23                         ` pte-highmem-5 Hugh Dickins
2002-01-24  3:09                       ` Andrea Arcangeli [this message]
2002-01-24 15:35                         ` pte-highmem-5 Hugh Dickins
2002-01-22 19:29             ` pre4aa1 contig kmaps patch Hugh Dickins
2002-01-23 13:31               ` rwhron
