From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S941464AbcIZDZo convert rfc822-to-8bit (ORCPT <rfc822;w@1wt.eu>);
        Sun, 25 Sep 2016 23:25:44 -0400
Received: from mga07.intel.com ([134.134.136.100]:50968 "EHLO mga07.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S941260AbcIZDZm (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 25 Sep 2016 23:25:42 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.30,397,1470726000"; 
   d="scan'208";a="883723175"
From: "Huang\, Ying" <ying.huang@intel.com>
To: Shaohua Li <shli@kernel.org>
Cc: "Huang\, Ying" <ying.huang@intel.com>, Rik van Riel <riel@redhat.com>,
        Andrew Morton <akpm@linux-foundation.org>, <tim.c.chen@intel.com>,
        <dave.hansen@intel.com>, <andi.kleen@intel.com>, <aaron.lu@intel.com>,
        <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
        Hugh Dickins <hughd@google.com>, Minchan Kim <minchan@kernel.org>,
        "Andrea Arcangeli" <aarcange@redhat.com>,
        "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
        Vladimir Davydov <vdavydov@virtuozzo.com>,
        Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out
References: <1473266769-2155-1-git-send-email-ying.huang@intel.com>
        <20160922225608.GA3898@kernel.org>
        <1474591086.17726.1.camel@redhat.com>
        <87d1jvuz08.fsf@yhuang-dev.intel.com>
        <20160925191849.GA83300@kernel.org>
Date: Mon, 26 Sep 2016 11:25:27 +0800
In-Reply-To: <20160925191849.GA83300@kernel.org> (Shaohua Li's message of
        "Sun, 25 Sep 2016 12:18:49 -0700")
Message-ID: <877f9zs5p4.fsf@yhuang-dev.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Shaohua Li <shli@kernel.org> writes:

> On Fri, Sep 23, 2016 at 10:32:39AM +0800, Huang, Ying wrote:
>> Rik van Riel <riel@redhat.com> writes:
>> 
>> > On Thu, 2016-09-22 at 15:56 -0700, Shaohua Li wrote:
>> >> On Wed, Sep 07, 2016 at 09:45:59AM -0700, Huang, Ying wrote:
>> >> >
>> >> > - It will help the memory fragmentation, especially when the THP is
>> >> >   heavily used by the applications.  The 2M continuous pages will be
>> >> >   free up after THP swapping out.
>> >> 
>> >> So this is impossible without THP swapin. While 2M swapout makes a lot of
>> >> sense, I doubt 2M swapin is really useful. What kind of application is
>> >> 'optimized' to do sequential memory access?
>> >
>> > I suspect a lot of this will depend on the ratio of storage
>> > speed to CPU & RAM speed.
>> >
>> > When swapping to a spinning disk, it makes sense to avoid
>> > extra memory use on swapin, and work in 4kB blocks.
>> 
>> For spinning disk, the THP swap optimization will be turned off in
>> current implementation.  Because huge swap cluster allocation based on
>> swap cluster management, which is available only for non-rotating block
>> devices (blk_queue_nonrot()).
>
> For 2m swapin, as long as one byte is changed in the 2m, next time we must do
> 2m swapout. There is huge waste of memory and IO bandwidth and increases
> unnecessary memory pressure. 2M IO will very easily saturate a very fast SSD
> and makes IO the bottleneck. Not sure about NVRAM though.

One solution is to make 2M swapin configurable, perhaps via a sysfs file
in /sys/kernel/mm/transparent_hugepage/, so that it can be turned on only
for really fast storage devices, such as NVRAM.
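
To make the idea concrete, here is a rough userspace sketch of the policy
I have in mind.  It is not kernel code and not part of the patchset: the
enum, the struct, and the function name are all hypothetical, and the
"nonrot" flag only models what blk_queue_nonrot() reports for the swap
device.  The knob defaults to off, so 2M swapin would happen only when it
is explicitly enabled and the device is non-rotating.

```c
#include <stdbool.h>

/*
 * Hypothetical knob standing in for a sysfs file under
 * /sys/kernel/mm/transparent_hugepage/.  The names are made up for
 * illustration; no such file exists in the current patchset.
 */
enum thp_swapin_mode {
	THP_SWAPIN_NEVER,	/* default: always swap in 4kB pages */
	THP_SWAPIN_ALWAYS,	/* swap in the whole 2M THP when possible */
};

/* Hypothetical model of a swap device, not a kernel structure. */
struct swap_dev_model {
	bool nonrot;	/* models the result of blk_queue_nonrot() */
};

/*
 * Decide whether a 2M swapin may be attempted.  Rotating devices never
 * qualify, because huge swap clusters are not available for them.
 */
static bool thp_swapin_allowed(enum thp_swapin_mode mode,
			       const struct swap_dev_model *dev)
{
	if (!dev->nonrot)
		return false;
	return mode == THP_SWAPIN_ALWAYS;
}
```

A real implementation would also have to fall back to 4kB swapin when a
2M page cannot be allocated; that is left out of the sketch.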

Best Regards,
Huang, Ying