From: Ingo Molnar <mingo@elte.hu>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Salman Qazi <sqazi@google.com>,
davem@davemloft.net, linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Andi Kleen <andi@firstfloor.org>
Subject: Re: [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache()
Date: Wed, 25 Feb 2009 09:29:58 +0100 [thread overview]
Message-ID: <20090225082958.GA9322@elte.hu> (raw)
In-Reply-To: <200902251909.42928.nickpiggin@yahoo.com.au>
* Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> On Wednesday 25 February 2009 18:25:03 Ingo Molnar wrote:
> > * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > > On Wednesday 25 February 2009 02:52:34 Linus Torvalds wrote:
> > > > On Tue, 24 Feb 2009, Nick Piggin wrote:
> > > > > > it does make some kind of sense to try to avoid the noncached
> > > > > > versions for small writes - because small writes tend to be for
> > > > > > temp-files.
> > > > >
> > > > > I don't see the significance of a temp file. If the pagecache is
> > > > > truncated, then the cachelines remain dirty and so you can't avoid an
> > > > > eventual store back to RAM?
> > > >
> > > > No, because many small files end up being used as scratch-pads (think
> > > > shell script sequences etc), and get read back immediately again. Doing
> > > > non-temporal stores might just be bad simply because trying to play
> > > > games with caching may simply do the wrong thing.
> > >
> > > OK, for that angle it could make sense. Although as has been
> > > noted earlier, at this point of the copy, we don't have much
> > > idea about the length of the write passed into the vfs (and
> > > obviously will never know the higher level intention of
> > > userspace).
> > >
> > > I don't know if we can say a 1 page write is nontemporal, but
> > > anything smaller is temporal. And having these kinds of
> > > behavioural cutoffs I would worry will create strange
> > > performance boundary conditions in code.
> >
> > I agree in principle.
> >
> > The main artifact would be the unaligned edges around a bigger
> > write. In particular the tail portion of a big write will be
> > cached.
> >
> > For example if we write a 100,000 bytes file, we'll copy the
> > first 24 pages (98304 bytes) uncached, while the final 1696
> > bytes cached. But there is nothing that necessiates this
> > assymetry.
> >
> > For that reason it would be nice to pass down the total size of
> > the write to the assembly code. These are single-usage-site APIs
> > anyway so it should be easy.
> >
> > I.e. something like the patch below (untested). I've extended
> > the copy APIs to also pass down a 'total' size as well, and
> > check for that instead of the chunk 'len'. Note that it's
> > checked in the inlined portion so all the "total == len" special
> > cases will optimize out the new parameter completely.
>
> This does give more information, not exactly all (it could be
> a big total write with many smaller writes especially if the
> source is generated on the fly and will be in cache, or if the
> source is not in cache, then we would also want to do
> nontemporal loads from there etc etc).
A big total write with many smaller writes should already be
handled in this codepath, as long as it's properly iovec-ed.
We can do little about user-space doing stupid things as doing a
big write as a series of many smaller-than-4K writes.
> > This should express the 'large vs. small' question
> > adequately, with no artifacts. Agreed?
>
> Well, no artifacts, but it still has a boundary condition
> where one might cut from temporal to nontemporal behaviour.
>
> If it is a *really* important issue, maybe some flags should
> be incorporated into an extended API? It not, then I wonder if
> it is important enough to add such complexity for?
the iozone numbers in the old commits certainly are convincing.
The new numbers from Salman are convincing too - and his fix
should preserve the iozone [large-write] numbers as well.
I.e. i think this is a reasonable compromise, it should handle
all the sane cases.
Ingo
next prev parent reply other threads:[~2009-02-25 8:30 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-02-24 2:03 Performance regression in write() syscall Salman Qazi
2009-02-24 4:10 ` Nick Piggin
2009-02-24 4:28 ` Linus Torvalds
2009-02-24 9:02 ` Nick Piggin
2009-02-24 15:52 ` Linus Torvalds
2009-02-24 16:24 ` Andi Kleen
2009-02-24 16:51 ` Ingo Molnar
2009-02-25 3:23 ` Nick Piggin
2009-02-25 7:25 ` [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache() Ingo Molnar
2009-02-25 8:09 ` Nick Piggin
2009-02-25 8:29 ` Ingo Molnar [this message]
2009-02-25 8:59 ` Nick Piggin
2009-02-25 12:01 ` Ingo Molnar
2009-02-25 16:04 ` Linus Torvalds
2009-02-25 16:29 ` Ingo Molnar
2009-02-27 12:05 ` Nick Piggin
2009-02-28 8:29 ` Ingo Molnar
2009-02-28 11:49 ` Nick Piggin
2009-02-28 12:58 ` Ingo Molnar
2009-02-28 17:16 ` Linus Torvalds
2009-02-28 17:24 ` Arjan van de Ven
2009-02-28 17:42 ` Linus Torvalds
2009-02-28 17:53 ` Arjan van de Ven
2009-02-28 18:05 ` Andi Kleen
2009-02-28 18:27 ` Ingo Molnar
2009-02-28 18:39 ` Arjan van de Ven
2009-03-02 10:39 ` [PATCH] x86, mm: dont use non-temporal stores in pagecache accesses Ingo Molnar
2009-02-28 18:52 ` [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache() Linus Torvalds
2009-03-01 14:19 ` Nick Piggin
2009-03-01 0:06 ` David Miller
2009-03-01 0:40 ` Andi Kleen
2009-03-01 0:28 ` H. Peter Anvin
2009-03-01 0:38 ` Arjan van de Ven
2009-03-01 1:48 ` Andi Kleen
2009-03-01 1:38 ` Arjan van de Ven
2009-03-01 1:40 ` H. Peter Anvin
2009-03-01 14:06 ` Nick Piggin
2009-03-02 4:46 ` H. Peter Anvin
2009-03-02 6:18 ` Nick Piggin
2009-03-02 21:16 ` Linus Torvalds
2009-03-02 21:25 ` Ingo Molnar
2009-03-03 4:30 ` Nick Piggin
2009-03-03 4:20 ` Nick Piggin
2009-03-03 9:02 ` Ingo Molnar
2009-03-04 3:37 ` Nick Piggin
2009-03-01 2:07 ` Andi Kleen
2009-02-24 5:43 ` Performance regression in write() syscall Salman Qazi
2009-02-24 10:09 ` Andi Kleen
2009-02-24 16:13 ` Ingo Molnar
2009-02-24 16:51 ` Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090225082958.GA9322@elte.hu \
--to=mingo@elte.hu \
--cc=andi@firstfloor.org \
--cc=davem@davemloft.net \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nickpiggin@yahoo.com.au \
--cc=sqazi@google.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox