From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753082AbcIIPxq (ORCPT <rfc822;w@1wt.eu>);
        Fri, 9 Sep 2016 11:53:46 -0400
Received: from mga09.intel.com ([134.134.136.24]:50531 "EHLO mga09.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751618AbcIIPxn (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 9 Sep 2016 11:53:43 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.30,305,1470726000"; 
   d="scan'208";a="876884961"
Message-ID: <1473436422.3916.3.camel@linux.intel.com>
Subject: Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping
 out
From: Tim Chen <tim.c.chen@linux.intel.com>
To: Minchan Kim <minchan@kernel.org>, "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, tim.c.chen@intel.com,
        dave.hansen@intel.com, andi.kleen@intel.com, aaron.lu@intel.com,
        linux-mm@kvack.org, linux-kernel@vger.kernel.org,
        Hugh Dickins <hughd@google.com>, Shaohua Li <shli@kernel.org>,
        Rik van Riel <riel@redhat.com>, Andrea Arcangeli <aarcange@redhat.com>,
        "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
        Vladimir Davydov <vdavydov@virtuozzo.com>,
        Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>
Date: Fri, 09 Sep 2016 08:53:42 -0700
In-Reply-To: <20160909054336.GA2114@bbox>
References: <1473266769-2155-1-git-send-email-ying.huang@intel.com>
         <20160909054336.GA2114@bbox>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.18.5.2 (3.18.5.2-1.fc23) 
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2016-09-09 at 14:43 +0900, Minchan Kim wrote:
> Hi Huang,
> 
> On Wed, Sep 07, 2016 at 09:45:59AM -0700, Huang, Ying wrote:
> > 
> > From: Huang Ying <ying.huang@intel.com>
> > 
> > This patchset is to optimize the performance of Transparent Huge Page
> > (THP) swap.
> > 
> > Hi, Andrew, could you help me to check whether the overall design is
> > reasonable?
> > 
> > Hi, Hugh, Shaohua, Minchan and Rik, could you help me to review the
> > swap part of the patchset?  Especially [01/10], [04/10], [05/10],
> > [06/10], [07/10], [10/10].
> > 
> > Hi, Andrea and Kirill, could you help me to review the THP part of the
> > patchset?  Especially [02/10], [03/10], [09/10] and [10/10].
> > 
> > Hi, Johannes, Michal and Vladimir, I am not very confident about the
> > memory cgroup part, especially [02/10] and [03/10].  Could you help me
> > to review it?
> > 
> > And to all: any comments are welcome!
> > 
> > 
> > Recently, the performance of storage devices has improved so fast that
> > we cannot saturate the disk bandwidth when doing page swap-out, even on
> > a high-end server machine, because storage device performance has
> > improved faster than CPU performance.  This trend seems unlikely to
> > change in the near future.  Meanwhile, THP is becoming more and more
> > popular because of increasing memory sizes.  So it has become
> > necessary to optimize THP swap performance.
> > 
> > The advantages of the THP swap support include:
> > 
> > - Batch the swap operations for the THP to reduce lock
> >   acquiring/releasing, including allocating/freeing the swap space,
> >   adding/deleting to/from the swap cache, and writing/reading the swap
> >   space, etc.  This will help improve the performance of the THP swap.
> > 
> > - The THP swap space read/write will be 2M sequential IO.  It is
> >   particularly helpful for swap reads, which are usually 4k random
> >   IO.  This will improve the performance of the THP swap too.
> > 
> > - It will help reduce memory fragmentation, especially when THP is
> >   heavily used by the applications.  2M of contiguous pages will be
> >   freed up after the THP is swapped out.
> I have just read the patchset and still wonder why all the changes
> should be tightly coupled with THP. Many parts (e.g., where you
> introduce new functions or modify existing ones to make them
> THP-specific) could just take a page_list and the number of pages and
> handle them without THP awareness.
> 
> For example, if nr_pages is larger than SWAPFILE_CLUSTER, we can try
> to allocate a new cluster. That way, we could allocate new clusters to
> meet the nr_pages requested, or bail out if the allocation fails and
> fall back to order-0 page swapout. With that, the swap layer could
> support batching multiple order-0 pages.
> 
> IMO, I really want to land Tim Chen's batching swapout work first.
> With Tim Chen's work, I expect we can do a better refactoring
> for batching swap before adding more confusion to the swap layer.
> (I expect it would share several pieces of code with, or would be the
> base for, batched allocation of swapcache and swapslot.)

Minchan,

Ying and I do plan to send out a new patch series on batching swapout
and swapin, plus a few other optimizations for swapping
regular-sized pages.

Hopefully we'll be able to do that soon, after we fix up a few
things and retest.

Tim