linux-mm.kvack.org archive mirror
From: Dan Magenheimer <dan.magenheimer@oracle.com>
To: Nitin Gupta <ngupta@vflare.org>,
	Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Minchan Kim <minchan@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Tejun Heo <tj@kernel.org>,
	David Howells <dhowells@redhat.com>,
	x86@kernel.org, Nick Piggin <npiggin@gmail.com>,
	Konrad Rzeszutek Wilk <konrad@darnok.org>
Subject: RE: [PATCH v2 3/3] x86: Support local_flush_tlb_kernel_range
Date: Fri, 15 Jun 2012 10:29:46 -0700 (PDT)
Message-ID: <10ea9d19-bd24-400c-8131-49f0b4e9e5ae@default>
In-Reply-To: <4FDB66B7.2010803@vflare.org>

> From: Nitin Gupta [mailto:ngupta@vflare.org]
> Subject: Re: [PATCH v2 3/3] x86: Support local_flush_tlb_kernel_range
> 
> On 06/15/2012 09:35 AM, Dan Magenheimer wrote:
> >> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> >> Sent: Friday, June 15, 2012 9:13 AM
> >> To: Peter Zijlstra
> >> Cc: Minchan Kim; Greg Kroah-Hartman; Nitin Gupta; Dan Magenheimer; linux-kernel@vger.kernel.org;
> >> linux-mm@kvack.org; Thomas Gleixner; Ingo Molnar; Tejun Heo; David Howells; x86@kernel.org; Nick
> >> Piggin
> >> Subject: Re: [PATCH v2 3/3] x86: Support local_flush_tlb_kernel_range
> >>
> >> On 05/17/2012 09:51 AM, Peter Zijlstra wrote:
> >>
> >>> On Thu, 2012-05-17 at 17:11 +0900, Minchan Kim wrote:
> >>>>> +++ b/arch/x86/include/asm/tlbflush.h
> >>>>> @@ -172,4 +172,16 @@ static inline void flush_tlb_kernel_range(unsigned long start,
> >>>>>       flush_tlb_all();
> >>>>>  }
> >>>>>
> >>>>> +static inline void local_flush_tlb_kernel_range(unsigned long start,
> >>>>> +             unsigned long end)
> >>>>> +{
> >>>>> +     if (cpu_has_invlpg) {
> >>>>> +             while (start < end) {
> >>>>> +                     __flush_tlb_single(start);
> >>>>> +                     start += PAGE_SIZE;
> >>>>> +             }
> >>>>> +     } else
> >>>>> +             local_flush_tlb();
> >>>>> +}
> >>>
> >>> It would be much better if you wait for Alex Shi's patch to mature.
> >>> doing the invlpg thing for ranges is not an unconditional win.
> >>
> >> From what I can tell Alex's patches have stalled.  The last post was v6
> >> on 5/17 and there wasn't a single reply to them afaict.
> >>
> >> According to Alex's investigation of this "tipping point", it seems that
> >> a good generic value is 8.  In other words, on most x86 hardware, it is
> >> cheaper to flush up to 8 tlb entries one by one rather than doing a
> >> complete flush.
> >>
> >> So we can do something like:
> >>
> >>      if (cpu_has_invlpg && (end - start)/PAGE_SIZE <= 8) {
> >>              while (start < end) {
> >>
> >> Would this be acceptable?
> >
> > Hey Seth, Nitin --
> >
> > After more work digging around zsmalloc and zbud, I really think
> > this TLB flushing, as well as the "page pair mapping" code can be
> > completely eliminated IFF zsmalloc is limited to items PAGE_SIZE or
> > less.  Since this is already true of zram (and in-tree zcache), and
> > zsmalloc currently has no other users, I think you should seriously
> > consider limiting zsmalloc in that way, or possibly splitting out
> > one version of zsmalloc which handles items PAGE_SIZE or less,
> > and a second version that can handle larger items but has (AFAIK)
> > no users.
> >
> > If you consider it an option to have (a version of) zsmalloc
> > limited to items PAGE_SIZE or less, let me know and we can
> > get into the details.
> 
> zsmalloc is already limited to objects of size PAGE_SIZE or less. This
> two-page splitting is for efficiently storing objects in range
> (PAGE_SIZE/2, PAGE_SIZE) which is very common in both zram and zcache.
> 
> SLUB achieves this efficiency by allocating higher order pages but that
> is not an option for zsmalloc.

That's what I thought, but a separate thread about ensuring zsmalloc
was as generic as possible led me to believe that zsmalloc was moving
in the direction of larger sizes.

> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> 
> To add to what Nitin just sent, without the page mapping, zsmalloc and
> the late xvmalloc have the same issue.  Say you have a whole class of
> objects that are 3/4 of a page.  Without the mapping, you can't cross
> non-contiguous page boundaries and you'll have 25% fragmentation in the
> memory pool.  This is the whole point of zsmalloc.

Yes, understood.  This suggestion doesn't change any of that.
It only assumes that no more than one page boundary is crossed.
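
For reference, the 25% figure quoted above can be checked with a few lines of C. This is only a back-of-the-envelope sketch: SKETCH_PAGE_SIZE and both helper names are made up for illustration and are not zsmalloc symbols, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */

/*
 * Bytes left unused in each page when every object must sit wholly
 * inside a single page (no crossing of page boundaries).
 */
static size_t waste_per_page_no_span(size_t obj_size)
{
	assert(obj_size > 0 && obj_size <= SKETCH_PAGE_SIZE);
	return SKETCH_PAGE_SIZE % obj_size;
}

/*
 * Bytes left unused in a group of npages pages when objects are packed
 * back to back and are allowed to straddle a page boundary.
 */
static size_t waste_when_spanning(size_t obj_size, size_t npages)
{
	assert(obj_size > 0 && obj_size <= SKETCH_PAGE_SIZE);
	return (npages * SKETCH_PAGE_SIZE) % obj_size;
}
```

With obj_size = 3072 (3/4 of a page), the first helper gives 1024 wasted bytes per page, i.e. 25%, while four such objects fill a group of three pages exactly.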

So, briefly, IIRC the "pair mapping" is what creates the necessity
to do special TLB stuff.  That pair mapping is necessary
to create the illusion to the compression/decompression code
(and one other memcpy) that no pageframe boundary is crossed.
Correct?

The compression code already compresses to a per-cpu page-pair
and then that "zpage" is copied into the space allocated
for it by zsmalloc.  For that final copy, if the copy code knows
the target may cross a page boundary, has both target pages
kmap'ed, and is smart about doing the copy, the "pair mapping"
can be avoided for compression.
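
To make the "smart copy" concrete, here is a rough userspace sketch. It assumes both target pages are already mapped (kmap'ed in the kernel case) and are passed in as plain pointers; copy_to_split_object and SKETCH_PAGE_SIZE are hypothetical names, not existing kernel or zsmalloc interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */

/*
 * Copy len bytes from src into an object that starts at byte offset off
 * in first_page and may spill over into second_page.  The two pages do
 * not have to be adjacent in the address space.
 */
static void copy_to_split_object(unsigned char *first_page,
				 unsigned char *second_page,
				 size_t off,
				 const unsigned char *src, size_t len)
{
	size_t room, head;

	assert(off < SKETCH_PAGE_SIZE);
	/* len <= page size means at most one page boundary is crossed */
	assert(len <= SKETCH_PAGE_SIZE);

	room = SKETCH_PAGE_SIZE - off;		/* bytes left in first page */
	head = len < room ? len : room;

	memcpy(first_page + off, src, head);
	if (head < len)				/* remainder goes to page two */
		memcpy(second_page, src + head, len - head);
}
```

The point of the sketch is only that the split is two memcpy calls plus a little arithmetic, with no contiguous virtual mapping of the two pages.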

The decompression path calls lzo1x directly and it would be
a huge pain to make lzo1x smart about page boundaries.  BUT
since we know that the decompressed result will always fit
into a page (actually exactly a page), you COULD do an extra
copy to the end of the target page (using the same smart-
about-page-boundaries copying code from above) and then do
in-place decompression, knowing that the decompression will
not cross a page boundary.  So, with the extra copy, the "pair
mapping" can be avoided for decompression as well.
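
A rough userspace sketch of that extra copy follows. It only covers staging the compressed bytes at the tail of the destination page; the decompression call itself is left out, and whether a given decompressor tolerates overlapping input and output is an assumption this sketch does not check. stage_at_page_tail and SKETCH_PAGE_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */

/*
 * Gather a compressed object of clen bytes, which starts at offset off
 * in first_page and may continue in second_page, into the last clen
 * bytes of dst_page.  Returns a pointer to the staged compressed data,
 * which is now contiguous and ends exactly at the end of dst_page.
 */
static unsigned char *stage_at_page_tail(unsigned char *dst_page,
					 const unsigned char *first_page,
					 const unsigned char *second_page,
					 size_t off, size_t clen)
{
	unsigned char *tail;
	size_t room, head;

	assert(off < SKETCH_PAGE_SIZE);
	assert(clen <= SKETCH_PAGE_SIZE);

	tail = dst_page + (SKETCH_PAGE_SIZE - clen);
	room = SKETCH_PAGE_SIZE - off;		/* bytes left in first page */
	head = clen < room ? clen : room;

	memcpy(tail, first_page + off, head);
	if (head < clen)			/* rest comes from page two */
		memcpy(tail + head, second_page, clen - head);
	return tail;
}
```

The returned pointer would be the source argument for the in-place decompress, with dst_page itself as the output buffer.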

What about the horrible cost of that extra copy?  Well, much
of the cost of a large copy is due to cache effects.  Since
you are copying into a page that will immediately be overwritten
by the decompress, I'll bet that cost is much smaller.  And
compared to the cost of setting up and tearing down TLB
entries (especially on machines with no local_flush_tlb_kernel_range),
I suspect that special copy may be a LOT cheaper.  And
with no special TLB code required, zsmalloc should be a lot
more portable.

Thoughts?
Dan

