From: Ingo Molnar <mingo@elte.hu>
To: Andi Kleen <ak@suse.de>
Cc: linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Venki Pallipadi <venkatesh.pallipadi@intel.com>,
suresh.b.siddha@intel.com, Arjan van de Ven <arjan@infradead.org>,
Dave Jones <davej@redhat.com>
Subject: Re: CPA patchset
Date: Fri, 11 Jan 2008 08:19:36 +0100
Message-ID: <20080111071936.GA16175@elte.hu>
In-Reply-To: <20080110111248.GR25945@bingen.suse.de>

* Andi Kleen <ak@suse.de> wrote:

> > > > but that's not too smart: why don't they use WB plus clflush
> > > > instead?
> > >
> > > Because they need to access it WC for performance.
> >
> > I think you have it fundamentally backwards: the best for
> > performance is WB + clflush. What would WC offer for performance that
> > clflush cannot do?
>
> Cached requires the cache line to be read first before you can write
> it.

nonsense, and you should know it. It is perfectly possible to construct
fully written cachelines without reading the cacheline first. MOVDQA is
SSE2, so it is available on basically every CPU today - its operands are
16-byte aligned and it can generate full cacheline writes, _without_
filling in the cacheline first. Bulk ops (string ops, etc.) will do full
cacheline writes too, without filling in the cacheline. Especially with
high performance 3D ops we do _NOT_ need any funky reads from anywhere,
because 3D software can stream a lot of writes out: we construct a full
frame or a portion of a frame, or upload vertices, shader scripts,
textures, etc.

( also, _even_ when there is a cache fill pending for a partially
written cacheline, that fill can go on in parallel and it does not
necessarily hold up the CPU unless the CPU has an actual data
dependency on it. )
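
A minimal userspace sketch of the store pattern described above, for
illustration only. It assumes a 64-byte cacheline and SSE2; the function
name write_line_and_flush() is made up for this example and is not
kernel or driver code. It shows four aligned 16-byte stores covering one
full line, followed by CLFLUSH. Whether a given CPU really skips the
cache fill for such stores depends on the microarchitecture; the sketch
does not demonstrate that by itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_store_si128, _mm_clflush, _mm_mfence */
#endif

#define CACHELINE 64    /* assumed line size; real code should query CPUID */

/*
 * Hypothetical helper: overwrite one full, cacheline-aligned line with a
 * repeated 32-bit value, then flush that line out of the cache.
 */
static void write_line_and_flush(void *dst, uint32_t value)
{
    /* The aligned stores below require a cacheline-aligned destination. */
    assert(((uintptr_t)dst % CACHELINE) == 0);
#if defined(__SSE2__)
    __m128i v = _mm_set1_epi32((int)value);
    __m128i *p = (__m128i *)dst;

    /* Four aligned 16-byte stores (MOVDQA) cover the whole 64-byte line. */
    for (int i = 0; i < CACHELINE / 16; i++)
        _mm_store_si128(p + i, v);

    _mm_mfence();       /* order the stores before the flush */
    _mm_clflush(dst);   /* write back and invalidate this one line */
    _mm_mfence();       /* order the flush before later accesses */
#else
    /* Portable fallback without SSE2: plain stores, no explicit flush. */
    uint32_t *p = (uint32_t *)dst;

    for (size_t i = 0; i < CACHELINE / sizeof(uint32_t); i++)
        p[i] = value;
#endif
}
```

Calling write_line_and_flush(buf, 0xA5A5A5A5u) on a 64-byte aligned
buffer leaves all 64 bytes equal to 0xA5. The flush changes where the
data lives, not what it contains.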

but that's totally beside the point anyway. Whether the accesses are WC
or WB, if a 3D app or a driver does high-freq change_page_attr() calls,
it will _lose_ the performance game:

> > also, it's irrelevant to change_page_attr() call frequency. Just map
> > in everything from the card and use it. In graphics, if you remap
> > anything on the fly and it's not a slowpath you've lost the
> > performance game even before you began it.
>
> The typical case would be lots of user space DRI clients supplying
> their own buffers on the fly. There's not really a fixed pool in this
> case, but it all varies dynamically. In some scenarios that could
> happen quite often.

in what scenarios? Please give me in-tree examples of such high-freq
change_page_attr() cases, where the driver authors would like to call it
with high frequency but are unable to do so and see performance problems
due to the WBINVD.
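
A toy userspace model of the "map in everything once and use it" pattern
argued for above. Everything here is hypothetical: fake_set_attr() only
counts calls and stands in for an expensive attribute change such as
change_page_attr(); struct pool, pool_init(), pool_get() and pool_put()
are names invented for this sketch. It shows that with a fixed pool the
expensive call is paid once at setup, no matter how many buffers are
handed out afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SLOTS 8
#define SLOT_SIZE  4096   /* assumed: one page per slot */

/* Counts how often the (pretend) expensive attribute change happens. */
static unsigned long attr_changes;

/* Hypothetical stand-in for an expensive page-attribute change. */
static void fake_set_attr(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    attr_changes++;
}

struct pool {
    unsigned char *base;            /* start of the pre-mapped region */
    unsigned char used[POOL_SLOTS]; /* 1 if the slot is handed out */
};

/* Set the attributes for the whole region once, up front. */
static void pool_init(struct pool *p, unsigned char *backing)
{
    p->base = backing;
    memset(p->used, 0, sizeof p->used);
    fake_set_attr(backing, (size_t)POOL_SLOTS * SLOT_SIZE);
}

/* Hand out a free slot; no attribute change on this path. */
static void *pool_get(struct pool *p)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!p->used[i]) {
            p->used[i] = 1;
            return p->base + i * SLOT_SIZE;
        }
    }
    return NULL; /* pool exhausted */
}

/* Return a slot to the pool; again no attribute change. */
static void pool_put(struct pool *p, void *buf)
{
    size_t i = (size_t)((unsigned char *)buf - p->base) / SLOT_SIZE;

    p->used[i] = 0;
}
```

After pool_init(), any number of pool_get()/pool_put() cycles leaves
attr_changes at 1. Changing attributes per buffer instead would call
the expensive path once or twice per cycle.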

	Ingo

Thread overview: 51+ messages
2008-01-03 15:24 [PATCH CPA] [1/28] Shrink __PAGE_KERNEL/__PAGE_KERNEL_EXEC on non PAE kernels Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [2/28] CPA: Do a simple self test at boot Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [3/28] Add pte accessors for the global bit Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [4/28] Add pte_clrhuge on i386 Andi Kleen
2008-01-05 6:48 ` Jeremy Fitzhardinge
2008-01-03 15:24 ` [PATCH CPA] [5/28] Add pte_pgprot " Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [6/28] CPA: Undo white space changes Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [7/28] Extract page table dumping code from i386 fault handler into dump_pagetable() Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [8/28] CPA: Return the page table level in lookup_address() Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [9/28] CPA: Add simple self test at boot Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [10/28] CPA: Change kernel_map_pages to not use c_p_a() Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [11/28] CPA: Change 32bit back to init_mm semaphore locking Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [12/28] CPA: CLFLUSH support in change_page_attr() Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [13/28] CPA: Use macros to modify the PG_arch_1 page flags in change_page_attr Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [14/28] CPA: Use page granuality TLB flushing " Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [15/28] CPA: Don't flush the caches when the CPU supports self-snoop Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [16/28] CPA: Use wbinvd() macro instead of inline assembly in 64bit c_p_a() Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [17/28] CPA: Reorder TLB / cache flushes to follow Intel recommendation Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [18/28] CPA: Make change_page_attr() more robust against use of PAT bits Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [19/28] CPA: Limit cache flushing to pages that really change caching Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [20/28] CPA: Fix inaccurate comments in 64bit change_page_attr() Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [21/28] CPA: Dump pagetable when inconsistency is detected Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [22/28] CPA: Only queue actually unused page table pages for freeing Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [23/28] CPA: Remove unnecessary masking of address Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [24/28] CPA: Only unmap kernel init pages in text mapping when CONFIG_DEBUG_RODATA is set Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [25/28] CPA: Always do full TLB flush when splitting large pages Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [26/28] CPA: Fix reference counting when changing already changed pages Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [27/28] CPA: Change comments of external interfaces to kerneldoc format Andi Kleen
2008-01-03 15:24 ` [PATCH CPA] [28/28] Make kernel_text test match boot mapping initialization Andi Kleen
2008-01-10 9:31 ` CPA patchset Ingo Molnar
2008-01-10 9:44 ` Dave Airlie
2008-01-10 9:55 ` Andi Kleen
2008-01-10 10:20 ` Dave Airlie
2008-01-10 10:50 ` Andi Kleen
2008-01-10 10:02 ` Ingo Molnar
2008-01-10 9:53 ` Andi Kleen
2008-01-10 10:04 ` Ingo Molnar
2008-01-10 10:07 ` Andi Kleen
2008-01-10 10:57 ` Ingo Molnar
2008-01-10 11:12 ` Andi Kleen
2008-01-11 7:19 ` Ingo Molnar [this message]
2008-01-11 7:33 ` Ingo Molnar
2008-01-11 11:28 ` Andi Kleen
2008-01-11 11:26 ` Andi Kleen
2008-01-11 17:02 ` dean gaudet
2008-01-11 17:18 ` Arjan van de Ven
2008-01-11 17:56 ` dean gaudet
2008-01-10 10:43 ` Ingo Molnar
2008-01-10 11:07 ` Andi Kleen
2008-01-10 12:22 ` Ingo Molnar
2008-01-10 12:39 ` Andi Kleen