From: kanoj@google.engr.sgi.com (Kanoj Sarcar)
To: Christoph Rohland <hans-christoph.rohland@sap.com>
Cc: Linus Torvalds <torvalds@transmeta.com>,
	linux-mm@kvack.org, linux-kernel@vger.rutgers.edu
Subject: Re: [RFC] [RFT] Shared /dev/zero mmaping feature
Date: Wed, 1 Mar 2000 12:09:11 -0800 (PST)
Message-ID: <200003012009.MAA90800@google.engr.sgi.com>
In-Reply-To: <qwwn1oilbo4.fsf@sap.com> from "Christoph Rohland" at Mar 01, 2000 08:42:35 PM

> 
> Hi Kanoj,
> 
> kanoj@google.engr.sgi.com (Kanoj Sarcar) writes:
> 
> > > kanoj@google.engr.sgi.com (Kanoj Sarcar) writes:
> > > 
> > > > What you have sent is what I used as a first draft for the
> > > > implementation.  The good part of it is that it reduces code
> > > > duplication. The _really_ bad part is that it penalizes users in
> > > > terms of numbers of shared memory segments, max size of
> > > > /dev/zero mappings, and limitations imposed by
> > > > shm_ctlmax/shm_ctlall/shm_ctlmni etc. I do not think taking up a
> > > > shmid for each /dev/zero mapping is a good idea ...
> > > 
> > > We can tune all these parameters at runtime. This should not be a
> > > reason.
> > 
> > Show me the patch ... by the time you are done, you _probably_ would
> > have complicated the code more than the current /dev/zero tweaks.
> 
> It _is_ tunable in the current code.

Oh, you are talking about asking administrators to do this tuning, which
is unfair. I was talking about automatic in-kernel tuning whenever a new
/dev/zero segment is created/destroyed etc ...

> 
> > > > Furthermore, I did not want to change behavior of information
> > > > returned by ipc* and various procfs commands, as well as swapout
> > > > behavior, thus the creation of the zmap_list. I decided a few
> > > > lines of special case checking in a handful of places was a much
> > > > better option.
> > > 
> > > IMHO all this SYSV ipc stuff is a totally broken API and many
> > > agree with me. I do not care to clutter up the output of it a
> > > little bit for this feature.
> > 
> > In reality, /dev/zero should have nothing to do with
> > SYSV. Currently, it's that way because I wanted to minimize code
> > duplication. Most of the *_core() routines can be taken into
> > ipc/shm_core.c, and together with util.c, will let /dev/zero be
> > decoupled from SYSV.
> 
> That would be the best case and thus your proposal is a workaround
> until the pagecache can handle it.
> 
> > > Nobody can know who is creating private IPC segments. So nobody
> > > should be irritated by some more segments displayed/used.
> > 
> > The problem is more with the limits, much less with the output ...
> 
> And they are tunable...

See above ...

>  
> > > On the contrary: I like the ability to restrict the usage of these
> > > segments with the ipc parameters. Keep in mind you can stack a lot
> > > of segments for a DoS attack, and all the segments will use the
> > > whole memory.
> > 
> > Not sure what you are talking about here ... /dev/zero mmaps are subject
> > to the same vm-resource checking as other mmaps, and this checking is
> > a little different for "real" shm creation.
> 
> Think about first mmaping an anonymous shared segment, touching all
> the pages, unmapping most of it. Then start over again. You end up
> with loads of pages used and not freed and no longer accessible.

True ... and this can be independently handled via a shmzero_unmap()
type operation in shmzero_vm_ops (I never claimed the /dev/zero stuff
is complete :-)). Even in the absence of that, vm_enough_memory() should
in theory be able to prevent deadlocks, with all its known caveats ...
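
For readers following the thread, the access pattern Christoph describes
above can be sketched from user space roughly as follows. This is only an
illustration under stated assumptions: the function name
touch_then_shrink() and its parameters are made up for this sketch, and
whether the unmapped pages actually stay allocated depends on the kernel
implementation under discussion, which this code does not show.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map npages of /dev/zero with MAP_SHARED, write to every page, then
 * unmap everything except the first keep_pages pages.
 * Returns 0 on success, -1 on any failure.
 * The name and parameters are hypothetical, chosen for this sketch.
 */
int touch_then_shrink(size_t npages, size_t keep_pages)
{
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0 || npages == 0 || keep_pages > npages)
        return -1;

    size_t page = (size_t)psz;
    size_t len = npages * page;

    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return -1;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    /* Touch every page so that each one is actually backed by memory. */
    for (size_t i = 0; i < npages; i++)
        p[i * page] = 1;

    /* "Unmapping most of it": drop the tail, keep the first keep_pages. */
    if (keep_pages < npages &&
        munmap(p + keep_pages * page, (npages - keep_pages) * page) != 0) {
        munmap(p, len);
        return -1;
    }

    /* The scenario in the mail would start over here; this sketch
     * instead releases the remainder so that it cleans up after itself. */
    if (keep_pages > 0 && munmap(p, keep_pages * page) != 0)
        return -1;

    return 0;
}
```

Calling touch_then_shrink(16, 1) in a loop, without the final cleanup,
would be the "start over again" case from the quoted paragraph.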

> 
> > > > If the current /dev/zero stuff hampers any plans you have with shm code 
> > > > (eg page cachification), I would be willing to talk about it ...
> > > 
> > > It makes shm fs a lot more work. And the special handling slows down
> > > shm handling.
> > 
> > The shm handling slowdown is minimal. Most of this is an extra
> > check in shm_nopage(), but that _only_ happens for /dev/zero
> > segments, not for "real" shm segments.
> 
> The check happens for _all_ segments and shm_nopage can be called very
> often on big machines under heavy load.

I was talking about the 

	if ((shp != shm_lock(shp->id)) && (is_shmzero == 0))

checks in shm_nopage.
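
As a reading aid, here is a toy user-space model of the shape of that
condition. Everything in it is a stand-in: struct shm_model,
shm_lock_stub(), the lookup table and run_check() are hypothetical names
invented for this sketch, not kernel code. It only illustrates C's
left-to-right evaluation of &&: the lookup on the left runs for every
segment, and is_shmzero is consulted only when the lookup result differs
from shp. It says nothing about the cost of the real shm_lock() or about
how shm_nopage() is structured around the check.

```c
#include <stddef.h>

/* Stand-in for the segment descriptor; only the field used here. */
struct shm_model {
    int id;
};

static struct shm_model *table[4]; /* hypothetical id -> segment table */
static int lookups;                /* number of calls to the stub below */

/* Stub standing in for shm_lock(): look a segment up by its id. */
static struct shm_model *shm_lock_stub(int id)
{
    lookups++;
    return (id >= 0 && id < 4) ? table[id] : NULL;
}

/* Same shape as the condition quoted above. */
static int stale_check(struct shm_model *shp, int is_shmzero)
{
    return (shp != shm_lock_stub(shp->id)) && (is_shmzero == 0);
}

/*
 * Run one check. If 'registered' is nonzero the segment is present in
 * the table, otherwise the lookup returns NULL. Stores the number of
 * lookups performed in *n_lookups and returns the value of the check.
 */
int run_check(int registered, int is_shmzero, int *n_lookups)
{
    static struct shm_model seg;

    seg.id = 1;
    table[1] = registered ? &seg : NULL;
    lookups = 0;

    int fired = stale_check(&seg, is_shmzero);
    *n_lookups = lookups;
    return fired;
}
```

In this model the lookup count is 1 for every combination of arguments;
only the outcome of the check depends on is_shmzero.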

> 
> > As to why shm fs is a lot more work, we can talk. Linus/Ingo did
> > bring this up; at a general level, we think that adding an extra
> > inode or other data structure in map_zero_setup() would be able to
> > handle this.
> > 
> > If a small amount of special case code is the problem, I would suggest
> > keeping the code as it is for now. Once you have the shm fs work done for
> > "real" shm segments, I can look at how to handle /dev/zero segments.
> 
> shm fs is working. I will send a patch against 2.3.48 soon.

Great, we can see what makes sense for /dev/zero wrt shmfs ...

Kanoj

> 
> Greetings
> 		Christoph
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

Thread overview: 19+ messages
2000-02-25 23:08 [RFC] [RFT] Shared /dev/zero mmaping feature Kanoj Sarcar
2000-02-26 16:38 ` Linus Torvalds
2000-02-26 21:47   ` Kanoj Sarcar
2000-02-29 10:54 ` Christoph Rohland
2000-02-29 18:30   ` Kanoj Sarcar
2000-03-01 12:08     ` Christoph Rohland
2000-03-01 17:34       ` Kanoj Sarcar
2000-03-01 17:55         ` Christoph Rohland
2000-03-01 18:18           ` Kanoj Sarcar
2000-03-01 19:42             ` Christoph Rohland
2000-03-01 20:09               ` Kanoj Sarcar [this message]
2000-03-06 22:43                 ` Stephen C. Tweedie
2000-03-06 23:01                   ` Kanoj Sarcar
2000-03-08 12:02                     ` Christoph Rohland
2000-03-08 17:51                       ` Kanoj Sarcar
2000-03-08 18:35                         ` Christoph Rohland
2000-03-08 18:48                           ` Linus Torvalds
2000-03-08 18:57                           ` Kanoj Sarcar
2000-03-09 18:15                             ` Christoph Rohland
