From: Ingo Molnar <mingo@elte.hu>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Salman Qazi <sqazi@google.com>,
davem@davemloft.net, linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Andi Kleen <andi@firstfloor.org>
Subject: Re: [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache()
Date: Wed, 25 Feb 2009 09:29:58 +0100 [thread overview]
Message-ID: <20090225082958.GA9322@elte.hu> (raw)
In-Reply-To: <200902251909.42928.nickpiggin@yahoo.com.au>
* Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> On Wednesday 25 February 2009 18:25:03 Ingo Molnar wrote:
> > * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > > On Wednesday 25 February 2009 02:52:34 Linus Torvalds wrote:
> > > > On Tue, 24 Feb 2009, Nick Piggin wrote:
> > > > > > it does make some kind of sense to try to avoid the noncached
> > > > > > versions for small writes - because small writes tend to be for
> > > > > > temp-files.
> > > > >
> > > > > I don't see the significance of a temp file. If the pagecache is
> > > > > truncated, then the cachelines remain dirty and so you can't avoid an
> > > > > eventual store back to RAM?
> > > >
> > > > No, because many small files end up being used as scratch-pads (think
> > > > shell script sequences etc), and get read back immediately again. Doing
> > > > non-temporal stores might just be bad simply because trying to play
> > > > games with caching may simply do the wrong thing.
> > >
> > > OK, for that angle it could make sense. Although as has been
> > > noted earlier, at this point of the copy, we don't have much
> > > idea about the length of the write passed into the vfs (and
> > > obviously will never know the higher level intention of
> > > userspace).
> > >
> > > I don't know if we can say a 1 page write is nontemporal, but
> > > anything smaller is temporal. And having these kinds of
> > > behavioural cutoffs I would worry will create strange
> > > performance boundary conditions in code.
> >
> > I agree in principle.
> >
> > The main artifact would be the unaligned edges around a bigger
> > write. In particular the tail portion of a big write will be
> > cached.
> >
> > For example if we write a 100,000 bytes file, we'll copy the
> > first 24 pages (98304 bytes) uncached, while the final 1696
> > bytes cached. But there is nothing that necessiates this
> > assymetry.
> >
> > For that reason it would be nice to pass down the total size of
> > the write to the assembly code. These are single-usage-site APIs
> > anyway so it should be easy.
> >
> > I.e. something like the patch below (untested). I've extended
> > the copy APIs to also pass down a 'total' size as well, and
> > check for that instead of the chunk 'len'. Note that it's
> > checked in the inlined portion so all the "total == len" special
> > cases will optimize out the new parameter completely.
>
> This does give more information, not exactly all (it could be
> a big total write with many smaller writes especially if the
> source is generated on the fly and will be in cache, or if the
> source is not in cache, then we would also want to do
> nontemporal loads from there etc etc).
A big total write with many smaller writes should already be
handled in this codepath, as long as it's properly iovec-ed.
We can do little about user-space doing stupid things as doing a
big write as a series of many smaller-than-4K writes.
> > This should express the 'large vs. small' question
> > adequately, with no artifacts. Agreed?
>
> Well, no artifacts, but it still has a boundary condition
> where one might cut from temporal to nontemporal behaviour.
>
> If it is a *really* important issue, maybe some flags should
> be incorporated into an extended API? It not, then I wonder if
> it is important enough to add such complexity for?
the iozone numbers in the old commits certainly are convincing.
The new numbers from Salman are convincing too - and his fix
should preserve the iozone [large-write] numbers as well.
I.e. i think this is a reasonable compromise, it should handle
all the sane cases.
Ingo
next prev parent reply other threads:[~2009-02-25 8:30 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-02-24 2:03 Performance regression in write() syscall Salman Qazi
2009-02-24 4:10 ` Nick Piggin
2009-02-24 4:28 ` Linus Torvalds
2009-02-24 9:02 ` Nick Piggin
2009-02-24 15:52 ` Linus Torvalds
2009-02-24 16:24 ` Andi Kleen
2009-02-24 16:51 ` Ingo Molnar
2009-02-25 3:23 ` Nick Piggin
2009-02-25 7:25 ` [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache() Ingo Molnar
2009-02-25 8:09 ` Nick Piggin
2009-02-25 8:29 ` Ingo Molnar [this message]
2009-02-25 8:59 ` Nick Piggin
2009-02-25 12:01 ` Ingo Molnar
2009-02-25 16:04 ` Linus Torvalds
2009-02-25 16:29 ` Ingo Molnar
2009-02-27 12:05 ` Nick Piggin
2009-02-28 8:29 ` Ingo Molnar
2009-02-28 11:49 ` Nick Piggin
2009-02-28 12:58 ` Ingo Molnar
2009-02-28 17:16 ` Linus Torvalds
2009-02-28 17:24 ` Arjan van de Ven
2009-02-28 17:42 ` Linus Torvalds
2009-02-28 17:53 ` Arjan van de Ven
2009-02-28 18:05 ` Andi Kleen
2009-02-28 18:27 ` Ingo Molnar
2009-02-28 18:39 ` Arjan van de Ven
2009-03-02 10:39 ` [PATCH] x86, mm: dont use non-temporal stores in pagecache accesses Ingo Molnar
2009-02-28 18:52 ` [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache() Linus Torvalds
2009-03-01 14:19 ` Nick Piggin
2009-03-01 0:06 ` David Miller
2009-03-01 0:40 ` Andi Kleen
2009-03-01 0:28 ` H. Peter Anvin
2009-03-01 0:38 ` Arjan van de Ven
2009-03-01 1:48 ` Andi Kleen
2009-03-01 1:38 ` Arjan van de Ven
2009-03-01 1:40 ` H. Peter Anvin
2009-03-01 14:06 ` Nick Piggin
2009-03-02 4:46 ` H. Peter Anvin
2009-03-02 6:18 ` Nick Piggin
2009-03-02 21:16 ` Linus Torvalds
2009-03-02 21:25 ` Ingo Molnar
2009-03-03 4:30 ` Nick Piggin
2009-03-03 4:20 ` Nick Piggin
2009-03-03 9:02 ` Ingo Molnar
2009-03-04 3:37 ` Nick Piggin
2009-03-01 2:07 ` Andi Kleen
2009-02-24 5:43 ` Performance regression in write() syscall Salman Qazi
2009-02-24 10:09 ` Andi Kleen
2009-02-24 16:13 ` Ingo Molnar
2009-02-24 16:51 ` Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090225082958.GA9322@elte.hu \
--to=mingo@elte.hu \
--cc=andi@firstfloor.org \
--cc=davem@davemloft.net \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nickpiggin@yahoo.com.au \
--cc=sqazi@google.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.