From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757238AbdJKCeK (ORCPT ); Tue, 10 Oct 2017 22:34:10 -0400 Received: from mga05.intel.com ([192.55.52.43]:38780 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756754AbdJKCeH (ORCPT ); Tue, 10 Oct 2017 22:34:07 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.43,359,1503385200"; d="scan'208";a="145081283" Date: Wed, 11 Oct 2017 10:34:02 +0800 From: Aaron Lu To: Andrew Morton Cc: Dave Hansen , linux-mm , lkml , Andi Kleen , Huang Ying , Tim Chen , Kemi Wang , Anshuman Khandual Subject: Re: [PATCH v2] mm/page_alloc.c: inline __rmqueue() Message-ID: <20171011023402.GC27907@intel.com> References: <20171009054434.GA1798@intel.com> <3a46edcf-88f8-e4f4-8b15-3c02620308e4@intel.com> <20171010025151.GD1798@intel.com> <20171010025601.GE1798@intel.com> <8d6a98d3-764e-fd41-59dc-88a9d21822c7@intel.com> <20171010054342.GF1798@intel.com> <20171010144545.c87a28b0f3c4e475305254ab@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20171010144545.c87a28b0f3c4e475305254ab@linux-foundation.org> User-Agent: Mutt/1.9.1 (2017-09-22) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Oct 10, 2017 at 02:45:45PM -0700, Andrew Morton wrote: > On Tue, 10 Oct 2017 13:43:43 +0800 Aaron Lu wrote: > > > On Mon, Oct 09, 2017 at 10:19:52PM -0700, Dave Hansen wrote: > > > On 10/09/2017 07:56 PM, Aaron Lu wrote: > > > > This patch adds inline to __rmqueue() and vmlinux' size doesn't have any > > > > change after this patch according to size(1). > > > > > > > > without this patch: > > > > text data bss dec hex filename > > > > 9968576 5793372 17715200 33477148 1fed21c vmlinux > > > > > > > > with this patch: > > > > text data bss dec hex filename > > > > 9968576 5793372 17715200 33477148 1fed21c vmlinux > > > > > > This is unexpected. Could you double-check this, please? > > > > mm/page_alloc.o has size changes: > > > > Without this patch: > > $ size mm/page_alloc.o > > text data bss dec hex filename > > 36695 9792 8396 54883 d663 mm/page_alloc.o > > > > With this patch: > > $ size mm/page_alloc.o > > text data bss dec hex filename > > 37511 9792 8396 55699 d993 mm/page_alloc.o > > > > But vmlinux doesn't. > > > > It's not clear to me what happened, do you want to me dig this out? > > There's weird stuff going on. > > With x86_64 gcc-4.8.4 > > Patch not applied: > > akpm3:/usr/local/google/home/akpm/k/25> nm mm/page_alloc.o|grep __rmqueue > 0000000000002a00 t __rmqueue > > Patch applied: > > akpm3:/usr/local/google/home/akpm/k/25> nm mm/page_alloc.o|grep __rmqueue > 000000000000039f t __rmqueue_fallback > 0000000000001220 t __rmqueue_smallest > > So inlining __rmqueue has caused the compiler to decide to uninline > __rmqueue_fallback and __rmqueue_smallest, which largely undoes the > effect of your patch. > > `inline' is basically advisory (or ignored) in modern gcc's. So gcc > has felt free to ignore it in __rmqueue_fallback and __rmqueue_smallest > because gcc thinks it knows best. That's why we created > __always_inline, to grab gcc by the scruff of its neck. This is a good point and I agree with Andi to use always_inline for those functions that we really want to inline. > > So... I think this patch could do with quite a bit more care, tuning > and testing with various gcc versions. I did some more testing. With x86_64 gcc-4.6.3 available from kernel.org crosstool: Patch not applied: [aaron@aaronlu linux]$ nm mm/page_alloc.o |grep __rmqueue 00000000000023f0 t __rmqueue 00000000000027c0 t __rmqueue_pcplist.isra.95 Patch applied: [aaron@aaronlu linux]$ nm mm/page_alloc.o |grep __rmqueue 0000000000002950 t __rmqueue_pcplist.isra.95 Works expected. With self built x86_64 gcc-4.8.4: Patch not applied: [aaron@aaronlu linux]$ nm mm/page_alloc.o |grep __rmqueue 0000000000001f20 t __rmqueue Patch applied: [aaron@aaronlu linux]$ nm mm/page_alloc.o |grep __rmqueue Works expected.(conflicts with your result though). I also tested gcc-4.9.4, gcc-5.3.1, gcc-6.4.0 and gcc-7.2.1, all have the same output as the above gcc-4.8.4. Then I realized CONFIG_OPTIMIZE_INLINING which I always disabled as suggested by the help message(If unsure, say N). Turnining that config on indeed caused gcc-4.8.4 to emit __rmqueue_fallback here. I think I'll just mark those functions always_inline.