public inbox for linux-kernel@vger.kernel.org
* Fix some common inline bloat
@ 2014-05-16 21:43 Andi Kleen
  2014-05-16 21:43 ` [PATCH 1/8] ixgbe: Out of line ixgbe_read/write_reg Andi Kleen
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Andi Kleen @ 2014-05-16 21:43 UTC
  To: linux-kernel; +Cc: akpm

It's very easy to bloat the kernel code significantly by adding
code to commonly called inlines. Often these inlines start small,
but later when new code is added they don't get moved out-of-line.

I wrote a new tool to account for inline bloat. Addressing selected
occurrences from the top 20 it reports for my kernel config saved about
145k of text.

   text	   data	    bss	    dec	    hex	filename
14220873	2008072	1507328	17736273	10ea251	vmlinux-before-anything
14074978	2008168	1507328	17590474	10c68ca	vmlinux-inline

-Andi
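
The savings figure can be re-derived from the two size rows in the cover
letter. The following is only a minimal sketch, assuming the rows are in the
Berkeley layout printed by size(1) (text, data, bss, dec, hex, filename);
parse_size_row and text_delta are hypothetical helper names, not part of
this series.

```python
def parse_size_row(row):
    """Split one Berkeley-format size(1) row into named integer fields."""
    text, data, bss, dec, hexval, filename = row.split()
    return {
        "text": int(text),
        "data": int(data),
        "bss": int(bss),
        "dec": int(dec),
        "hex": int(hexval, 16),
        "filename": filename,
    }


def text_delta(before_row, after_row):
    """Return how many bytes of text the second image saves over the first."""
    return parse_size_row(before_row)["text"] - parse_size_row(after_row)["text"]


# The two rows from the cover letter.
before = "14220873\t2008072\t1507328\t17736273\t10ea251\tvmlinux-before-anything"
after = "14074978\t2008168\t1507328\t17590474\t10c68ca\tvmlinux-inline"

row = parse_size_row(before)
# Sanity checks: dec is text + data + bss, and hex is the same number in base 16.
consistent = (row["text"] + row["data"] + row["bss"] == row["dec"] == row["hex"])

saved = text_delta(before, after)
print(saved)  # 145895 bytes, i.e. the "about 145k" quoted above
```

Only the text column is compared, since that is the figure the cover letter
quotes; data grows slightly between the two images.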


* [PATCH 1/8] ixgbe: Out of line ixgbe_read/write_reg
  2014-05-16 21:43 Fix some common inline bloat Andi Kleen
@ 2014-05-16 21:43 ` Andi Kleen
  2014-05-19  9:14   ` David Laight
  2014-05-19 22:00   ` Rustad, Mark D
  2014-05-16 21:43 ` [PATCH 2/8] radeonfb: Out of line errata workarounds Andi Kleen
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Andi Kleen @ 2014-05-16 21:43 UTC
  To: linux-kernel; +Cc: akpm, Andi Kleen, netdev, Jeff Kirsher

From: Andi Kleen <ak@linux.intel.com>

ixgbe_read_reg and ixgbe_write_reg64 are called frequently and expand to a
lot of code at every call site because of their error handling (the
adapter-removal checks).

Moving them out of line saves ~27k text in the ixgbe driver.

   text	   data	    bss	    dec	    hex	filename
14220873	2008072	1507328	17736273	10ea251	vmlinux-before-ixgbe
14193673	2003976	1507328	17704977	10e2811	vmlinux-ixgbe
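
A rough back-of-envelope model of why this helps, as a sketch only. It takes
the ixgbe_read_reg row from the inline-account example output in patch 8/8
(35440 bytes in 1444 copies) and assumes that every inlined copy is replaced
by a 5-byte call, which is the size of an x86-64 callq with a 32-bit
displacement. The function name and the model itself are assumptions of this
sketch; it ignores argument setup, ixgbe_write_reg64, and whatever else the
compiler changes, so it can only be expected to land in the same range as
the measured delta.

```python
def out_of_line_estimate(inline_total, copies, call_bytes=5):
    """Estimate text saved by turning `copies` inlined bodies into calls.

    inline_total: bytes currently attributed to the inlined function
    copies:       number of inlined copies
    call_bytes:   assumed size of one call instruction
    """
    avg_body = inline_total // copies          # one shared out-of-line body
    after = avg_body + copies * call_bytes     # body once, plus a call per site
    return inline_total - after


estimate = out_of_line_estimate(35440, 1444)
measured = 14220873 - 14193673                 # text delta from the table above

print(estimate, measured)
```

With these inputs the model gives 28196 bytes against a measured 27200.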

Cc: netdev@vger.kernel.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.h | 22 ++--------------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
index f12c40f..05f094d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
@@ -162,28 +162,10 @@ static inline void writeq(u64 val, void __iomem *addr)
 }
 #endif
 
-static inline void ixgbe_write_reg64(struct ixgbe_hw *hw, u32 reg, u64 value)
-{
-	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
+void ixgbe_write_reg64(struct ixgbe_hw *hw, u32 reg, u64 value);
+u32 ixgbe_read_reg(struct ixgbe_hw *hw, u32 reg);
 
-	if (ixgbe_removed(reg_addr))
-		return;
-	writeq(value, reg_addr + reg);
-}
 #define IXGBE_WRITE_REG64(a, reg, value) ixgbe_write_reg64((a), (reg), (value))
-
-static inline u32 ixgbe_read_reg(struct ixgbe_hw *hw, u32 reg)
-{
-	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
-	u32 value;
-
-	if (ixgbe_removed(reg_addr))
-		return IXGBE_FAILED_READ_REG;
-	value = readl(reg_addr + reg);
-	if (unlikely(value == IXGBE_FAILED_READ_REG))
-		ixgbe_check_remove(hw, reg);
-	return value;
-}
 #define IXGBE_READ_REG(a, reg) ixgbe_read_reg((a), (reg))
 
 #define IXGBE_WRITE_REG_ARRAY(a, reg, offset, value) \
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d62e7a2..5f81f62 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -371,6 +371,28 @@ void ixgbe_write_pci_cfg_word(struct ixgbe_hw *hw, u32 reg, u16 value)
 	pci_write_config_word(adapter->pdev, reg, value);
 }
 
+void ixgbe_write_reg64(struct ixgbe_hw *hw, u32 reg, u64 value)
+{
+	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
+
+	if (ixgbe_removed(reg_addr))
+		return;
+	writeq(value, reg_addr + reg);
+}
+
+u32 ixgbe_read_reg(struct ixgbe_hw *hw, u32 reg)
+{
+	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
+	u32 value;
+
+	if (ixgbe_removed(reg_addr))
+		return IXGBE_FAILED_READ_REG;
+	value = readl(reg_addr + reg);
+	if (unlikely(value == IXGBE_FAILED_READ_REG))
+		ixgbe_check_remove(hw, reg);
+	return value;
+}
+
 static void ixgbe_service_event_complete(struct ixgbe_adapter *adapter)
 {
 	BUG_ON(!test_bit(__IXGBE_SERVICE_SCHED, &adapter->state));
-- 
1.9.0



* [PATCH 2/8] radeonfb: Out of line errata workarounds
  2014-05-16 21:43 Fix some common inline bloat Andi Kleen
  2014-05-16 21:43 ` [PATCH 1/8] ixgbe: Out of line ixgbe_read/write_reg Andi Kleen
@ 2014-05-16 21:43 ` Andi Kleen
  2014-05-16 21:43 ` [PATCH 3/8] list: Out of line INIT_LIST_HEAD and list_del Andi Kleen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2014-05-16 21:43 UTC
  To: linux-kernel; +Cc: akpm, Andi Kleen, Benjamin Herrenschmidt, linux-fbdev

From: Andi Kleen <ak@linux.intel.com>

Out of lining _radeon_msleep and radeon_pll_errata_* saves about 40k text.

   text	   data	    bss	    dec	    hex	filename
14193673	2003976	1507328	17704977	10e2811	vmlinux-before-radeon
14152713	2003976	1507328	17664017	10d8811	vmlinux-radeon

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-fbdev@vger.kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 drivers/video/fbdev/aty/radeon_base.c | 57 ++++++++++++++++++++++++++++++++++
 drivers/video/fbdev/aty/radeonfb.h    | 58 ++---------------------------------
 2 files changed, 60 insertions(+), 55 deletions(-)

diff --git a/drivers/video/fbdev/aty/radeon_base.c b/drivers/video/fbdev/aty/radeon_base.c
index 26d80a4..abd89a9 100644
--- a/drivers/video/fbdev/aty/radeon_base.c
+++ b/drivers/video/fbdev/aty/radeon_base.c
@@ -282,6 +282,63 @@ static int backlight = 1;
 static int backlight = 0;
 #endif
 
+/* Note about this function: we have some rare cases where we must not schedule,
+ * this typically happen with our special "wake up early" hook which allows us to
+ * wake up the graphic chip (and thus get the console back) before everything else
+ * on some machines that support that mechanism. At this point, interrupts are off
+ * and scheduling is not permitted
+ */
+void _radeon_msleep(struct radeonfb_info *rinfo, unsigned long ms)
+{
+	if (rinfo->no_schedule || oops_in_progress)
+		mdelay(ms);
+	else
+		msleep(ms);
+}
+
+/*
+ * Note about PLL register accesses:
+ *
+ * I have removed the spinlock on them on purpose. The driver now
+ * expects that it will only manipulate the PLL registers in normal
+ * task environment, where radeon_msleep() will be called, protected
+ * by a semaphore (currently the console semaphore) so that no conflict
+ * will happen on the PLL register index.
+ *
+ * With the latest changes to the VT layer, this is guaranteed for all
+ * calls except the actual drawing/blits which aren't supposed to use
+ * the PLL registers anyway
+ *
+ * This is very important for the workarounds to work properly. The only
+ * possible exception to this rule is the call to unblank(), which may
+ * be done at irq time if an oops is in progress.
+ */
+void radeon_pll_errata_after_index(struct radeonfb_info *rinfo)
+{
+	if (!(rinfo->errata & CHIP_ERRATA_PLL_DUMMYREADS))
+		return;
+
+	(void)INREG(CLOCK_CNTL_DATA);
+	(void)INREG(CRTC_GEN_CNTL);
+}
+
+void radeon_pll_errata_after_data(struct radeonfb_info *rinfo)
+{
+	if (rinfo->errata & CHIP_ERRATA_PLL_DELAY) {
+		/* we can't deal with posted writes here ... */
+		_radeon_msleep(rinfo, 5);
+	}
+	if (rinfo->errata & CHIP_ERRATA_R300_CG) {
+		u32 save, tmp;
+		save = INREG(CLOCK_CNTL_INDEX);
+		tmp = save & ~(0x3f | PLL_WR_EN);
+		OUTREG(CLOCK_CNTL_INDEX, tmp);
+		tmp = INREG(CLOCK_CNTL_DATA);
+		OUTREG(CLOCK_CNTL_INDEX, save);
+	}
+}
+
+
 /*
  * prototypes
  */
diff --git a/drivers/video/fbdev/aty/radeonfb.h b/drivers/video/fbdev/aty/radeonfb.h
index cb84604..bb73446 100644
--- a/drivers/video/fbdev/aty/radeonfb.h
+++ b/drivers/video/fbdev/aty/radeonfb.h
@@ -370,20 +370,7 @@ struct radeonfb_info {
  * IO macros
  */
 
-/* Note about this function: we have some rare cases where we must not schedule,
- * this typically happen with our special "wake up early" hook which allows us to
- * wake up the graphic chip (and thus get the console back) before everything else
- * on some machines that support that mechanism. At this point, interrupts are off
- * and scheduling is not permitted
- */
-static inline void _radeon_msleep(struct radeonfb_info *rinfo, unsigned long ms)
-{
-	if (rinfo->no_schedule || oops_in_progress)
-		mdelay(ms);
-	else
-		msleep(ms);
-}
-
+void _radeon_msleep(struct radeonfb_info *rinfo, unsigned long ms);
 
 #define INREG8(addr)		readb((rinfo->mmio_base)+addr)
 #define OUTREG8(addr,val)	writeb(val, (rinfo->mmio_base)+addr)
@@ -408,47 +395,8 @@ static inline void _OUTREGP(struct radeonfb_info *rinfo, u32 addr,
 
 #define OUTREGP(addr,val,mask)	_OUTREGP(rinfo, addr, val,mask)
 
-/*
- * Note about PLL register accesses:
- *
- * I have removed the spinlock on them on purpose. The driver now
- * expects that it will only manipulate the PLL registers in normal
- * task environment, where radeon_msleep() will be called, protected
- * by a semaphore (currently the console semaphore) so that no conflict
- * will happen on the PLL register index.
- *
- * With the latest changes to the VT layer, this is guaranteed for all
- * calls except the actual drawing/blits which aren't supposed to use
- * the PLL registers anyway
- *
- * This is very important for the workarounds to work properly. The only
- * possible exception to this rule is the call to unblank(), which may
- * be done at irq time if an oops is in progress.
- */
-static inline void radeon_pll_errata_after_index(struct radeonfb_info *rinfo)
-{
-	if (!(rinfo->errata & CHIP_ERRATA_PLL_DUMMYREADS))
-		return;
-
-	(void)INREG(CLOCK_CNTL_DATA);
-	(void)INREG(CRTC_GEN_CNTL);
-}
-
-static inline void radeon_pll_errata_after_data(struct radeonfb_info *rinfo)
-{
-	if (rinfo->errata & CHIP_ERRATA_PLL_DELAY) {
-		/* we can't deal with posted writes here ... */
-		_radeon_msleep(rinfo, 5);
-	}
-	if (rinfo->errata & CHIP_ERRATA_R300_CG) {
-		u32 save, tmp;
-		save = INREG(CLOCK_CNTL_INDEX);
-		tmp = save & ~(0x3f | PLL_WR_EN);
-		OUTREG(CLOCK_CNTL_INDEX, tmp);
-		tmp = INREG(CLOCK_CNTL_DATA);
-		OUTREG(CLOCK_CNTL_INDEX, save);
-	}
-}
+void radeon_pll_errata_after_index(struct radeonfb_info *rinfo);
+void radeon_pll_errata_after_data(struct radeonfb_info *rinfo);
 
 static inline u32 __INPLL(struct radeonfb_info *rinfo, u32 addr)
 {
-- 
1.9.0



* [PATCH 3/8] list: Out of line INIT_LIST_HEAD and list_del
  2014-05-16 21:43 Fix some common inline bloat Andi Kleen
  2014-05-16 21:43 ` [PATCH 1/8] ixgbe: Out of line ixgbe_read/write_reg Andi Kleen
  2014-05-16 21:43 ` [PATCH 2/8] radeonfb: Out of line errata workarounds Andi Kleen
@ 2014-05-16 21:43 ` Andi Kleen
  2014-05-17  0:03   ` Dave Jones
  2014-05-17  0:03   ` Eric Dumazet
  2014-05-16 21:43 ` [PATCH 4/8] e1000e: Out of line __ew32_prepare/__ew32 Andi Kleen
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Andi Kleen @ 2014-05-16 21:43 UTC
  To: linux-kernel; +Cc: akpm, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Moving these two inlines out of line saves ~21k of text in my vmlinux.

   text	   data	    bss	    dec	    hex	filename
14152713	2003976	1507328	17664017	10d8811	vmlinux-before-list
14131431	2008136	1507328	17646895	10d452f	vmlinux-list

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/list.h | 15 ++++-----------
 lib/Makefile         |  2 +-
 lib/list.c           | 22 ++++++++++++++++++++++
 3 files changed, 27 insertions(+), 12 deletions(-)
 create mode 100644 lib/list.c

diff --git a/include/linux/list.h b/include/linux/list.h
index ef95941..8297885 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -21,11 +21,8 @@
 #define LIST_HEAD(name) \
 	struct list_head name = LIST_HEAD_INIT(name)
 
-static inline void INIT_LIST_HEAD(struct list_head *list)
-{
-	list->next = list;
-	list->prev = list;
-}
+/* Out of line to save space */
+void INIT_LIST_HEAD(struct list_head *list);
 
 /*
  * Insert a new entry between two known consecutive entries.
@@ -101,12 +98,8 @@ static inline void __list_del_entry(struct list_head *entry)
 	__list_del(entry->prev, entry->next);
 }
 
-static inline void list_del(struct list_head *entry)
-{
-	__list_del(entry->prev, entry->next);
-	entry->next = LIST_POISON1;
-	entry->prev = LIST_POISON2;
-}
+/* Out of line to save space */
+void list_del(struct list_head *entry);
 #else
 extern void __list_del_entry(struct list_head *entry);
 extern void list_del(struct list_head *entry);
diff --git a/lib/Makefile b/lib/Makefile
index 0cd7b68..8b744f7 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
 	 proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o
+	 earlycpio.o list.o
 
 obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
 lib-$(CONFIG_MMU) += ioremap.o
diff --git a/lib/list.c b/lib/list.c
new file mode 100644
index 0000000..298768f
--- /dev/null
+++ b/lib/list.c
@@ -0,0 +1,22 @@
+#include <linux/list.h>
+#include <linux/module.h>
+
+/*
+ * Out of line versions of common list.h functions that bloat the
+ * kernel too much.
+ */
+
+void INIT_LIST_HEAD(struct list_head *list)
+{
+	list->next = list;
+	list->prev = list;
+}
+EXPORT_SYMBOL(INIT_LIST_HEAD);
+
+void list_del(struct list_head *entry)
+{
+	__list_del(entry->prev, entry->next);
+	entry->next = LIST_POISON1;
+	entry->prev = LIST_POISON2;
+}
+EXPORT_SYMBOL(list_del);
-- 
1.9.0



* [PATCH 4/8] e1000e: Out of line __ew32_prepare/__ew32
  2014-05-16 21:43 Fix some common inline bloat Andi Kleen
                   ` (2 preceding siblings ...)
  2014-05-16 21:43 ` [PATCH 3/8] list: Out of line INIT_LIST_HEAD and list_del Andi Kleen
@ 2014-05-16 21:43 ` Andi Kleen
  2014-05-17  3:23   ` Stephen Hemminger
  2014-05-16 21:43 ` [PATCH 5/8] x86: Out of line get_dma_ops Andi Kleen
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2014-05-16 21:43 UTC
  To: linux-kernel; +Cc: akpm, Andi Kleen, jeffrey.t.kirsher, netdev

From: Andi Kleen <ak@linux.intel.com>

Out of lining these two common inlines saves about 30k text size,
due to their errata workarounds.

   text	   data	    bss	    dec	    hex	filename
14131431	2008136	1507328	17646895	10d452f	vmlinux-before-e1000e
14101415	2004040	1507328	17612783	10cbfef	vmlinux-e1000e

Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 drivers/net/ethernet/intel/e1000e/e1000.h  | 31 ++----------------------------
 drivers/net/ethernet/intel/e1000e/netdev.c | 30 +++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h
index 1471c54..cbe25bb 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -573,35 +573,8 @@ static inline u32 __er32(struct e1000_hw *hw, unsigned long reg)
 
 #define er32(reg)	__er32(hw, E1000_##reg)
 
-/**
- * __ew32_prepare - prepare to write to MAC CSR register on certain parts
- * @hw: pointer to the HW structure
- *
- * When updating the MAC CSR registers, the Manageability Engine (ME) could
- * be accessing the registers at the same time.  Normally, this is handled in
- * h/w by an arbiter but on some parts there is a bug that acknowledges Host
- * accesses later than it should which could result in the register to have
- * an incorrect value.  Workaround this by checking the FWSM register which
- * has bit 24 set while ME is accessing MAC CSR registers, wait if it is set
- * and try again a number of times.
- **/
-static inline s32 __ew32_prepare(struct e1000_hw *hw)
-{
-	s32 i = E1000_ICH_FWSM_PCIM2PCI_COUNT;
-
-	while ((er32(FWSM) & E1000_ICH_FWSM_PCIM2PCI) && --i)
-		udelay(50);
-
-	return i;
-}
-
-static inline void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val)
-{
-	if (hw->adapter->flags2 & FLAG2_PCIM2PCI_ARBITER_WA)
-		__ew32_prepare(hw);
-
-	writel(val, hw->hw_addr + reg);
-}
+s32 __ew32_prepare(struct e1000_hw *hw);
+void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val);
 
 #define ew32(reg, val)	__ew32(hw, E1000_##reg, (val))
 
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 3e69386..9b6cd9a 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -124,6 +124,36 @@ static const struct e1000_reg_info e1000_reg_info_tbl[] = {
 };
 
 /**
+ * __ew32_prepare - prepare to write to MAC CSR register on certain parts
+ * @hw: pointer to the HW structure
+ *
+ * When updating the MAC CSR registers, the Manageability Engine (ME) could
+ * be accessing the registers at the same time.  Normally, this is handled in
+ * h/w by an arbiter but on some parts there is a bug that acknowledges Host
+ * accesses later than it should which could result in the register to have
+ * an incorrect value.  Workaround this by checking the FWSM register which
+ * has bit 24 set while ME is accessing MAC CSR registers, wait if it is set
+ * and try again a number of times.
+ **/
+s32 __ew32_prepare(struct e1000_hw *hw)
+{
+	s32 i = E1000_ICH_FWSM_PCIM2PCI_COUNT;
+
+	while ((er32(FWSM) & E1000_ICH_FWSM_PCIM2PCI) && --i)
+		udelay(50);
+
+	return i;
+}
+
+void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val)
+{
+	if (hw->adapter->flags2 & FLAG2_PCIM2PCI_ARBITER_WA)
+		__ew32_prepare(hw);
+
+	writel(val, hw->hw_addr + reg);
+}
+
+/**
  * e1000_regdump - register printout routine
  * @hw: pointer to the HW structure
  * @reginfo: pointer to the register info table
-- 
1.9.0



* [PATCH 5/8] x86: Out of line get_dma_ops
  2014-05-16 21:43 Fix some common inline bloat Andi Kleen
                   ` (3 preceding siblings ...)
  2014-05-16 21:43 ` [PATCH 4/8] e1000e: Out of line __ew32_prepare/__ew32 Andi Kleen
@ 2014-05-16 21:43 ` Andi Kleen
  2014-05-16 21:43 ` [PATCH 6/8] ftrace: Out of line ftrace_trigger_soft_disabled Andi Kleen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2014-05-16 21:43 UTC
  To: linux-kernel; +Cc: akpm, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Out of lining the complex version of get_dma_ops saves about 6.8k on
my kernel.

   text	   data	    bss	    dec	    hex	filename
14101415	2004040	1507328	17612783	10cbfef	vmlinux-before-dma
14094629	2004040	1507328	17605997	10ca56d	vmlinux-dma

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/include/asm/dma-mapping.h |  7 +++----
 arch/x86/lib/Makefile              |  2 ++
 arch/x86/lib/dma.c                 | 11 +++++++++++
 3 files changed, 16 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/lib/dma.c

diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 808dae6..314e4bd 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -29,15 +29,14 @@ extern int panic_on_overflow;
 
 extern struct dma_map_ops *dma_ops;
 
+struct dma_map_ops *__get_dma_ops(struct device *dev);
+
 static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 {
 #ifndef CONFIG_X86_DEV_DMA_OPS
 	return dma_ops;
 #else
-	if (unlikely(!dev) || !dev->archdata.dma_ops)
-		return dma_ops;
-	else
-		return dev->archdata.dma_ops;
+	return __get_dma_ops(dev);
 #endif
 }
 
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index eabcb6e..44dae40 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -44,3 +44,5 @@ else
         lib-y += copy_user_64.o copy_user_nocache_64.o
 	lib-y += cmpxchg16b_emu.o
 endif
+
+lib-y += dma.o
diff --git a/arch/x86/lib/dma.c b/arch/x86/lib/dma.c
new file mode 100644
index 0000000..c97b5ae
--- /dev/null
+++ b/arch/x86/lib/dma.c
@@ -0,0 +1,11 @@
+#include <linux/module.h>
+#include <linux/dma-mapping.h>
+
+struct dma_map_ops *__get_dma_ops(struct device *dev)
+{
+	if (unlikely(!dev) || !dev->archdata.dma_ops)
+		return dma_ops;
+	else
+		return dev->archdata.dma_ops;
+}
+EXPORT_SYMBOL(__get_dma_ops);
-- 
1.9.0



* [PATCH 6/8] ftrace: Out of line ftrace_trigger_soft_disabled
  2014-05-16 21:43 Fix some common inline bloat Andi Kleen
                   ` (4 preceding siblings ...)
  2014-05-16 21:43 ` [PATCH 5/8] x86: Out of line get_dma_ops Andi Kleen
@ 2014-05-16 21:43 ` Andi Kleen
  2014-05-16 21:43 ` [PATCH 7/8] radeon: Out of line radeon_get_ib_value Andi Kleen
  2014-05-16 21:43 ` [PATCH 8/8] Kbuild: add inline-account tool to find inline bloat Andi Kleen
  7 siblings, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2014-05-16 21:43 UTC
  To: linux-kernel; +Cc: akpm, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Moving this function out of line saves about 15k of text (14979 bytes).

   text	   data	    bss	    dec	    hex	filename
14094629	2004040	1507328	17605997	10ca56d	vmlinux-before-ftrace
14079650	2008136	1507328	17595114	10c7aea	vmlinux-ftrace

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/ftrace_event.h        | 23 +----------------------
 kernel/trace/trace_events_trigger.c | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index d16da3e..70be665 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -416,28 +416,7 @@ extern enum event_trigger_type event_triggers_call(struct ftrace_event_file *fil
 extern void event_triggers_post_call(struct ftrace_event_file *file,
 				     enum event_trigger_type tt);
 
-/**
- * ftrace_trigger_soft_disabled - do triggers and test if soft disabled
- * @file: The file pointer of the event to test
- *
- * If any triggers without filters are attached to this event, they
- * will be called here. If the event is soft disabled and has no
- * triggers that require testing the fields, it will return true,
- * otherwise false.
- */
-static inline bool
-ftrace_trigger_soft_disabled(struct ftrace_event_file *file)
-{
-	unsigned long eflags = file->flags;
-
-	if (!(eflags & FTRACE_EVENT_FL_TRIGGER_COND)) {
-		if (eflags & FTRACE_EVENT_FL_TRIGGER_MODE)
-			event_triggers_call(file, NULL);
-		if (eflags & FTRACE_EVENT_FL_SOFT_DISABLED)
-			return true;
-	}
-	return false;
-}
+extern bool ftrace_trigger_soft_disabled(struct ftrace_event_file *file);
 
 /*
  * Helper function for event_trigger_unlock_commit{_regs}().
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 4747b47..136c181 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -28,6 +28,31 @@
 static LIST_HEAD(trigger_commands);
 static DEFINE_MUTEX(trigger_cmd_mutex);
 
+
+/**
+ * ftrace_trigger_soft_disabled - do triggers and test if soft disabled
+ * @file: The file pointer of the event to test
+ *
+ * If any triggers without filters are attached to this event, they
+ * will be called here. If the event is soft disabled and has no
+ * triggers that require testing the fields, it will return true,
+ * otherwise false.
+ */
+bool
+ftrace_trigger_soft_disabled(struct ftrace_event_file *file)
+{
+	unsigned long eflags = file->flags;
+
+	if (!(eflags & FTRACE_EVENT_FL_TRIGGER_COND)) {
+		if (eflags & FTRACE_EVENT_FL_TRIGGER_MODE)
+			event_triggers_call(file, NULL);
+		if (eflags & FTRACE_EVENT_FL_SOFT_DISABLED)
+			return true;
+	}
+	return false;
+}
+EXPORT_SYMBOL(ftrace_trigger_soft_disabled);
+
 static void
 trigger_data_free(struct event_trigger_data *data)
 {
-- 
1.9.0



* [PATCH 7/8] radeon: Out of line radeon_get_ib_value
  2014-05-16 21:43 Fix some common inline bloat Andi Kleen
                   ` (5 preceding siblings ...)
  2014-05-16 21:43 ` [PATCH 6/8] ftrace: Out of line ftrace_trigger_soft_disabled Andi Kleen
@ 2014-05-16 21:43 ` Andi Kleen
  2014-05-20 16:16   ` Marek Olšák
  2014-05-16 21:43 ` [PATCH 8/8] Kbuild: add inline-account tool to find inline bloat Andi Kleen
  7 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2014-05-16 21:43 UTC
  To: linux-kernel; +Cc: akpm, Andi Kleen, alexander.deucher, dri-devel

From: Andi Kleen <ak@linux.intel.com>

Moving radeon_get_ib_value out of line saves about 5k of text.

   text	   data	    bss	    dec	    hex	filename
14080360	2008168	1507328	17595856	10c7dd0	vmlinux-before-radeon
14074978	2008168	1507328	17590474	10c68ca	vmlinux-radeon

Cc: alexander.deucher@amd.com
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 drivers/gpu/drm/radeon/radeon.h        | 10 +---------
 drivers/gpu/drm/radeon/radeon_device.c |  9 +++++++++
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 6852861..8cae409 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -1032,15 +1032,7 @@ struct radeon_cs_parser {
 	struct ww_acquire_ctx	ticket;
 };
 
-static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
-{
-	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
-
-	if (ibc->kdata)
-		return ibc->kdata[idx];
-	return p->ib.ptr[idx];
-}
-
+u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx);
 
 struct radeon_cs_packet {
 	unsigned	idx;
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 0e770bb..1cbd171 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -112,6 +112,15 @@ bool radeon_is_px(struct drm_device *dev)
 	return false;
 }
 
+u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
+{
+	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
+
+	if (ibc->kdata)
+		return ibc->kdata[idx];
+	return p->ib.ptr[idx];
+}
+
 /**
  * radeon_program_register_sequence - program an array of registers.
  *
-- 
1.9.0



* [PATCH 8/8] Kbuild: add inline-account tool to find inline bloat
  2014-05-16 21:43 Fix some common inline bloat Andi Kleen
                   ` (6 preceding siblings ...)
  2014-05-16 21:43 ` [PATCH 7/8] radeon: Out of line radeon_get_ib_value Andi Kleen
@ 2014-05-16 21:43 ` Andi Kleen
  2014-05-17  8:31   ` Sam Ravnborg
  7 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2014-05-16 21:43 UTC
  To: linux-kernel; +Cc: akpm, Andi Kleen, linux-kbuild, mmarek

From: Andi Kleen <ak@linux.intel.com>

Add a tool to hunt for inline bloat. It parses objdump -Sl output and
accounts code bytes to the functions (including inlines) they came from.

Example output:

Total code bytes seen 10463206

Code bytes by functions:
Function                                           Total          Avg   Num
kmalloc                                            37132 (0.00%)  11    3310
ixgbe_read_reg                                     35440 (0.00%)  24    1444
spin_lock                                          28975 (0.00%)  11    2575
constant_test_bit                                  26387 (0.00%)  5     4642
arch_spin_unlock                                   24986 (0.00%)  7     3364
spin_unlock_irqrestore                             24928 (0.00%)  11    2258
readl                                              24584 (0.00%)  4     5344
writel                                             23199 (0.00%)  6     3643
perf_fetch_caller_regs                             22436 (0.00%)  27    821
get_current                                        22076 (0.00%)  9     2288
_radeon_msleep                                     19680 (0.00%)  55    353
INIT_LIST_HEAD                                     19410 (0.00%)  11    1747
list_del                                           19270 (0.00%)  16    1176
__ew32_prepare                                     19080 (0.00%)  25    740
__list_add                                         17830 (0.00%)  12    1406
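
The accounting idea behind that table can be sketched in a few lines. This
is an illustration in Python 3, not the script from this patch: it reuses
the three line shapes the script recognises in objdump -Sl output (a
function label, a file:line marker, and a disassembly line) and runs them
over a small hand-written sample modelled on the comments in the script.
The sample text, the account() helper, and BYTES_RE are assumptions of this
sketch; in particular it counts only the leading run of hex byte pairs on a
disassembly line, so operand text is never taken for instruction bytes.

```python
import re
from collections import Counter

CODE_RE = re.compile(r'\s*([0-9a-fA-F]+):\s+(.*)')       # "250: e8 00 ... callq ..."
FUNC_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)\(\):$')  # "sysctl_init():"
LINE_RE = re.compile(r'^([^:]+):(\d+)$')                 # "/sysctl.c:1666"
BYTES_RE = re.compile(r'(?:[0-9a-f]{2} )+')              # leading instruction bytes


def account(lines):
    """Attribute instruction bytes to the most recent function and file:line."""
    funcbytes = Counter()
    linebytes = Counter()
    func = None
    where = None
    for raw in lines:
        l = raw.rstrip("\n")
        m = CODE_RE.match(l)
        if m:
            b = BYTES_RE.match(m.group(2))
            nbytes = len(b.group(0).split()) if b else 0
            if func and where:
                funcbytes[func] += nbytes
                linebytes[where] += nbytes
            continue
        m = FUNC_RE.match(l)
        if m:
            func = m.group(1)       # label printed when the function changes
            continue
        m = LINE_RE.match(l)
        if m:
            where = (m.group(1), int(m.group(2)))
    return funcbytes, linebytes


# Hand-written sample, not real objdump output.
SAMPLE = [
    "sysctl_init():",
    "/sysctl.c:1666",
    "     250:\te8 00 00 00 00       \tcallq  255 <proc_skip_spaces+0x5>",
    "proc_skip_spaces():",
    "/sysctl.c:1700",
    "     255:\t48 89 e5             \tmov    %rsp,%rbp",
    "     258:\tc3                   \tretq",
]

funcbytes, linebytes = account(SAMPLE)
for name, nbytes in funcbytes.most_common():
    print(name, nbytes)
```

Because objdump prints the label of the function a block of code was
compiled from, code expanded from an inline is attributed to the inline
rather than to its caller, which is what makes the per-function totals
useful for finding bloat.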

Cc: linux-kbuild@vger.kernel.org
Cc: mmarek@suse.cz
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 scripts/inline-account.py | 164 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 164 insertions(+)
 create mode 100755 scripts/inline-account.py

diff --git a/scripts/inline-account.py b/scripts/inline-account.py
new file mode 100755
index 0000000..2dfbf7c
--- /dev/null
+++ b/scripts/inline-account.py
@@ -0,0 +1,164 @@
+#!/usr/bin/python
+# account code bytes per source code / functions from objdump -Sl output
+# useful to find inline bloat
+# Author: Andi Kleen
+import os, sys, re, argparse, multiprocessing
+from collections import Counter
+
+p = argparse.ArgumentParser(
+        description="""
+Account code bytes per source code / functions from objdump.
+Useful to find inline bloat.
+
+The line numbers are the beginning of a block, so the actual code can be later.
+Line numbers can also be a little off due to objdump bugs, and
+some misaccounting can happen due to inexact gcc debug information.
+The number output for functions may account a single large function multiple
+times.  Program/object files need to be built with -g.
+
+This is somewhat slow due to objdump -S being slow. It helps to have
+plenty of cores.""")
+p.add_argument('--min-bytes', type=int, help='minimum bytes to report', default=100)
+p.add_argument('--threads', '-t', type=int, default=multiprocessing.cpu_count(),
+               help='Number of objdump processes to run')
+p.add_argument('file', help='object file/program as input')
+args = p.parse_args()
+
+def get_syms(fn):
+    f = os.popen("nm  --print-size " + fn)
+    syms = []
+    pc = None
+    for l in f:
+        n = l.split()
+        if len(n) > 2 and n[2].upper() == "T":
+            pc = int(n[0], 16)
+            syms.append(pc)
+            ln = int(n[1], 16)
+    f.close()
+    if not pc:
+        sys.exit(fn + " has no symbols")
+    syms.append(pc + ln)
+    return syms
+
+class Account:
+    pass
+
+def add_account(a, b):
+    a.funcbytes += b.funcbytes
+    a.linebytes += b.linebytes
+    a.funccount += b.funccount
+    a.nolinebytes += b.nolinebytes
+    a.nofuncbytes += b.nofuncbytes
+    a.total += b.total
+    return a
+
+# don't add sys.exit here, it causes deadlocks
+def account_range(r):
+    a = Account()
+    a.funcbytes = Counter()
+    a.linebytes = Counter()
+    a.funccount = Counter()
+    a.nolinebytes = 0
+    a.nofuncbytes = 0
+    a.total = 0
+
+    line = None
+    func = None
+    codefunc = None
+
+    cmd = ("objdump -Sl %s --start-address=%#x --stop-address=%#x" %
+                (args.file, r[0], r[1]))
+    f = os.popen(cmd)
+    for l in f:
+        #      250:       e8 00 00 00 00          callq  255 <proc_skip_spaces+0x5>
+        m = re.match(r'\s*([0-9a-fA-F]+):\s+(.*)', l)
+        if m:
+            #print "iscode", func, l,
+            bytes = len(re.findall(r'[0-9a-f][0-9a-f] ', m.group(2)))
+            if not func:
+                a.nofuncbytes += bytes
+                continue
+            if not line:
+                a.nolinebytes += bytes
+                continue
+            a.total += bytes
+            a.funcbytes[func] += bytes
+            a.linebytes[(file, line)] += bytes
+            codefunc = func
+            continue
+
+        # sysctl_init():
+        m = re.match(r'([a-zA-Z_][a-zA-Z0-9_]*)\(\):$', l)
+        if m:
+            if codefunc and m.group(1) != codefunc:
+                a.funccount[codefunc] += 1
+                codefunc = None
+            func = m.group(1)
+            continue
+
+        # /sysctl.c:1666
+        m = re.match(r'^([^:]+):(\d+)$', l)
+        if m:
+            file, line = m.group(1), int(m.group(2))
+            continue
+    f.close()
+
+    if codefunc:
+        a.funccount[codefunc] += 1
+    return a
+
+# objdump -S is slow, so we parallelize
+
+# split symbol table into chunks for parallelization
+# we split on function boundaries to avoid mis-accounting
+# assumes functions have roughly similar length
+syms = sorted(get_syms(args.file))
+chunk = max(1, (len(syms) - 1) / args.threads)
+boundaries = [syms[x] for x in range(0, len(syms) - 1, chunk)] + [syms[-1]]
+ranges = [(boundaries[x], boundaries[x+1]) for x in range(0, len(boundaries) - 1)]
+assert ranges[0][0] == syms[0]
+assert ranges[-1][1] == syms[-1]
+
+# map-reduce
+if args.threads == 1:
+    al = map(account_range, ranges)
+else:
+    al = multiprocessing.Pool(args.threads).map(account_range, ranges)
+a = reduce(add_account, al)
+
+print "Total code bytes seen", a.total
+#print "Bytes with no function %d (%.2f%%)" % (a.nofuncbytes, 100.0*(float(a.nofuncbytes)/a.total))
+#print "Bytes with no lines %d (%.2f%%)" % (a.nolinebytes, 100.0*(float(a.nolinebytes)/a.total))
+
+def sort_map(m):
+    return sorted(m.keys(), key=lambda x: m[x], reverse=True)
+
+print "\nCode bytes by functions:"
+print "%-50s %-5s  %-5s   %-5s %-5s" % ("Function", "Total", "", "Avg", "Num")
+for j in sort_map(a.funcbytes):
+    if a.funcbytes[j] < args.min_bytes:
+        break
+    print "%-50s %-5d (%.2f%%)  %-5d %-5d" % (
+            j,
+            a.funcbytes[j],
+            100.0 * a.funcbytes[j] / a.total,
+            a.funcbytes[j] / a.funccount[j],
+            a.funccount[j])
+
+for j in a.linebytes.keys():
+    if a.linebytes[j] < args.min_bytes:
+        del a.linebytes[j]
+
+# os.path.commonprefix fails with >50k entries
+# just use the first 10 
+prefix = os.path.commonprefix(map(lambda x: x[0], a.linebytes.keys()[:10]))
+
+print "\nCode bytes by nearby source line blocks:"
+print "prefix", prefix
+
+print "%-50s %-5s" % ("Line", "Total")
+for j in sort_map(a.linebytes):
+    print "%-50s %-5d (%.2f%%)" % (
+            "%s:%d" % (j[0].replace(prefix, ""), j[1]),
+            a.linebytes[j],
+            100.0 * a.linebytes[j] / a.total)
-- 
1.9.0
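
The chunking step in the script above splits the sorted symbol addresses into one address range per worker, always cutting on a symbol boundary. A minimal sketch of that step follows; the helper name `split_ranges` is hypothetical and the clamp to a step of at least 1 is an assumption, not part of the patch as posted. It runs under Python 2 and 3.

```python
def split_ranges(syms, nproc):
    """Split symbol addresses into (start, stop) ranges on symbol boundaries."""
    syms = sorted(syms)
    nfuncs = len(syms) - 1  # the last entry is the end address, not a function
    # Integer division; clamp to 1 so range() never gets a zero step
    # when there are more workers than functions.
    chunk = max(1, nfuncs // nproc)
    boundaries = [syms[i] for i in range(0, nfuncs, chunk)] + [syms[-1]]
    # Pair each boundary with the next one to form the ranges.
    return list(zip(boundaries, boundaries[1:]))


ranges = split_ranges([0x00, 0x10, 0x20, 0x30, 0x40], 2)
```

Each range can then be handed to one objdump process; because the cuts fall on symbol start addresses, no function is split across two workers.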


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/8] list: Out of line INIT_LIST_HEAD and list_del
  2014-05-16 21:43 ` [PATCH 3/8] list: Out of line INIT_LIST_HEAD and list_del Andi Kleen
@ 2014-05-17  0:03   ` Dave Jones
  2014-05-17  2:37     ` Andi Kleen
  2014-05-17  0:03   ` Eric Dumazet
  1 sibling, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-05-17  0:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, akpm, Andi Kleen

On Fri, May 16, 2014 at 02:43:10PM -0700, Andi Kleen wrote:

 > diff --git a/lib/Makefile b/lib/Makefile
 > index 0cd7b68..8b744f7 100644
 > --- a/lib/Makefile
 > +++ b/lib/Makefile
 > @@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 >  	 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
 >  	 proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
 >  	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 > -	 earlycpio.o
 > +	 earlycpio.o list.o

Hey Andi,

Did you test this with CONFIG_DEBUG_LIST ?
Unless I'm missing something, this looks like we'll have duplicate
symbols for list_del if that is set.

	Dave


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/8] list: Out of line INIT_LIST_HEAD and list_del
  2014-05-16 21:43 ` [PATCH 3/8] list: Out of line INIT_LIST_HEAD and list_del Andi Kleen
  2014-05-17  0:03   ` Dave Jones
@ 2014-05-17  0:03   ` Eric Dumazet
  1 sibling, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2014-05-17  0:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, akpm, Andi Kleen

On Fri, 2014-05-16 at 14:43 -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Out of lining these two inlines saves ~21k on my vmlinux
> 
> 14152713	2003976	1507328	17664017	10d8811	vmlinux-before-list
> 14131431	2008136	1507328	17646895	10d452f	vmlinux-list
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---

> +
> +void list_del(struct list_head *entry)
> +{
> +	__list_del(entry->prev, entry->next);
> +	entry->next = LIST_POISON1;
> +	entry->prev = LIST_POISON2;
> +}
> +EXPORT_SYMBOL(list_del);

Have you tried :

CONFIG_DEBUG_LIST=y

Function will be doubly defined/exported then....

BTW, my vmlinux is more like 

$ size vmlinux
   text	   data	    bss	    dec	    hex	filename
8233991	1743032	1904640	11881663	 b54cbf	vmlinux

Seems I beat you without even trying ;)



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/8] list: Out of line INIT_LIST_HEAD and list_del
  2014-05-17  0:03   ` Dave Jones
@ 2014-05-17  2:37     ` Andi Kleen
  0 siblings, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2014-05-17  2:37 UTC (permalink / raw)
  To: Dave Jones, Andi Kleen, linux-kernel, akpm, Andi Kleen

> Did you test this with CONFIG_DEBUG_LIST ?
> Unless I'm missing something, this looks like we'll have duplicate
> symbols for list_del if that is set.

Good point. Will fix.

BTW I only ran it in my limited config. I'm sure running it on 
different configs will show up other low hanging fruit ...

-Andi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/8] e1000e: Out of line __ew32_prepare/__ew32
  2014-05-16 21:43 ` [PATCH 4/8] e1000e: Out of line __ew32_prepare/__ew32 Andi Kleen
@ 2014-05-17  3:23   ` Stephen Hemminger
  0 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2014-05-17  3:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, akpm, Andi Kleen, jeffrey.t.kirsher, netdev

On Fri, 16 May 2014 14:43:11 -0700
Andi Kleen <andi@firstfloor.org> wrote:

> From: Andi Kleen <ak@linux.intel.com>
> 
> Out of lining these two common inlines saves about 30k text size,
> due to their errata workarounds.
> 
> 14131431	2008136	1507328	17646895	10d452f	vmlinux-before-e1000e
> 14101415	2004040	1507328	17612783	10cbfef	vmlinux-e1000e
> 
> Cc: jeffrey.t.kirsher@intel.com
> Cc: netdev@vger.kernel.org
> Signed-off-by: Andi Kleen <ak@linux.intel.com>

Since you are making a formerly private function global, the name
should be unique, i.e. prefix it with e1000_



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/8] Kbuild: add inline-account tool to find inline bloat
  2014-05-16 21:43 ` [PATCH 8/8] Kbuild: add inline-account tool to find inline bloat Andi Kleen
@ 2014-05-17  8:31   ` Sam Ravnborg
  2014-05-17  9:36     ` Sam Ravnborg
  0 siblings, 1 reply; 23+ messages in thread
From: Sam Ravnborg @ 2014-05-17  8:31 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, akpm, Andi Kleen, linux-kbuild, mmarek

Hi Andi.

On Fri, May 16, 2014 at 02:43:15PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Add a tool to hunt for inline bloat. It uses objdump -S to account
> inlines.
I tried this on my sparc32 build - but it failed with:
objdump: can't disassemble for architecture UNKNOWN!

It looks simple to add CROSS_COMPILE support but I did not do so.
My Python skills are non-existent.

	Sam
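
One way the CROSS_COMPILE support mentioned above could look is sketched below. This is an illustration only, not part of the posted patch: the helper names `tool`, `nm_command` and `objdump_command` are hypothetical, and it assumes the prefix is taken from the `CROSS_COMPILE` environment variable the way kbuild uses it. The objdump and nm flags are the ones the script already passes.

```python
import os


def tool(name, env=None):
    """Return a binutils command name, prefixed with CROSS_COMPILE if set."""
    env = os.environ if env is None else env
    return env.get("CROSS_COMPILE", "") + name


def nm_command(path, env=None):
    """Argument list for listing symbols with their sizes."""
    return [tool("nm", env), "--print-size", path]


def objdump_command(path, start, stop, env=None):
    """Argument list for disassembling one address range with source lines."""
    return [tool("objdump", env), "-Sl", path,
            "--start-address=%#x" % start, "--stop-address=%#x" % stop]
```

The returned lists can be passed to `subprocess.Popen` without going through a shell. With `CROSS_COMPILE=sparc-linux-` (an example prefix) the commands become `sparc-linux-nm` and `sparc-linux-objdump`.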

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/8] Kbuild: add inline-account tool to find inline bloat
  2014-05-17  8:31   ` Sam Ravnborg
@ 2014-05-17  9:36     ` Sam Ravnborg
  2014-05-17 16:51       ` Andi Kleen
  0 siblings, 1 reply; 23+ messages in thread
From: Sam Ravnborg @ 2014-05-17  9:36 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, akpm, Andi Kleen, linux-kbuild, mmarek

On Sat, May 17, 2014 at 10:31:44AM +0200, Sam Ravnborg wrote:
> Hi Andi.
> 
> On Fri, May 16, 2014 at 02:43:15PM -0700, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > Add a tool to hunt for inline bloat. It uses objdump -S to account
> > inlines.
> I tried this on my sparc32 build - but it failed with:
> objdump: can't disassemble for architecture UNKNOWN!
> 
> It looks simple to add CROSS_COMPILE support but I did not do so.
> My Python skills are non-existent.

Patched the calls to nm and objdump - but it gave no output
when I ran the script.

nm --print-size shows following output:
00002910 00000024 r CSWTCH.946
00002bd4 00000024 r CSWTCH.951
         U PDE_DATA
         U ROOT_DEV
000000fc 00000014 T SyS_accept
00002c98 000001a8 T SyS_accept4
00000fc4 0000008c T SyS_bind
00000eb4 00000094 T SyS_connect
00000d98 00000094 T SyS_getpeername
00000e2c 00000088 T SyS_getsockname
00000c6c 00000090 T SyS_getsockopt
00000f48 0000007c T SyS_listen
00000128 00000018 T SyS_recv
0000142c 000000f0 T SyS_recvfrom
00001920 000000d0 T SyS_recvmmsg
00001238 00000010 T SyS_recvmsg
00000110 00000018 T SyS_send
00001d38 00000010 T SyS_sendmmsg
00001db0 00000010 T SyS_sendmsg
000015f4 000000dc T SyS_sendto
00000cfc 0000009c T SyS_setsockopt
00000c0c 00000060 T SyS_shutdown
00003020 000000b4 T SyS_socket
000007ec 000001f8 T SyS_socketcall
00002e40 000001e0 T SyS_socketpair
000b776c 00000098 t T.1063
000b762c 000000d0 t T.1064


objdump -Sl shows following output:
000000d4 <sock_mmap>:
sock_mmap():
      d4:       9d e3 bf a0     save  %sp, -96, %sp
      d8:       c2 06 20 78     ld  [ %i0 + 0x78 ], %g1
      dc:       94 10 00 19     mov  %i1, %o2
      e0:       92 10 00 01     mov  %g1, %o1
      e4:       c2 00 60 18     ld  [ %g1 + 0x18 ], %g1
      e8:       c2 00 60 40     ld  [ %g1 + 0x40 ], %g1
      ec:       9f c0 40 00     call  %g1
      f0:       90 10 00 18     mov  %i0, %o0
      f4:       81 c7 e0 08     ret
      f8:       91 e8 00 08     restore  %g0, %o0, %o0

000000fc <SyS_accept>:
sys_accept():
      fc:       96 10 20 00     clr  %o3
     100:       82 13 c0 00     mov  %o7, %g1
     104:       40 00 00 00     call  104 <SyS_accept+0x8>
     108:       9e 10 40 00     mov  %g1, %o7
     10c:       01 00 00 00     nop

00000110 <SyS_send>:
SyS_send():
     110:       98 10 20 00     clr  %o4        ! 0 <sock_from_file-0x4c>
     114:       9a 10 20 00     clr  %o5
     118:       82 13 c0 00     mov  %o7, %g1
     11c:       40 00 00 00     call  11c <SyS_send+0xc>
     120:       9e 10 40 00     mov  %g1, %o7
     124:       01 00 00 00     nop

00000128 <SyS_recv>:
sys_recv():
     128:       98 10 20 00     clr  %o4        ! 0 <sock_from_file-0x4c>
     12c:       9a 10 20 00     clr  %o5
     130:       82 13 c0 00     mov  %o7, %g1
     134:       40 00 00 00     call  134 <SyS_recv+0xc>
     138:       9e 10 40 00     mov  %g1, %o7
     13c:       01 00 00 00     nop

	Sam
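
For reference, the symbol side of the output above parses as expected. The sketch below applies the same filter the script uses (third field is `T` or `t`) to a few of the nm lines quoted in this message. The function name `text_symbols` is hypothetical, and requiring exactly four fields is an assumption made here to skip entries without a size.

```python
def text_symbols(nm_lines):
    """Return (address, size, name) for text symbols in `nm --print-size` output."""
    out = []
    for line in nm_lines:
        fields = line.split()
        # Undefined symbols have only "U name"; sized symbols have four fields.
        if len(fields) == 4 and fields[2].upper() == "T":
            out.append((int(fields[0], 16), int(fields[1], 16), fields[3]))
    return out


sample = [
    "00002910 00000024 r CSWTCH.946",
    "         U PDE_DATA",
    "000000fc 00000014 T SyS_accept",
    "00000110 00000018 T SyS_send",
    "000b776c 00000098 t T.1063",
]
syms = text_symbols(sample)
```

Here `SyS_accept` starts at 0xfc with size 0x14, so it ends at 0x110, exactly where `SyS_send` begins. This suggests the nm half of the script is not what breaks on this build.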

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/8] Kbuild: add inline-account tool to find inline bloat
  2014-05-17  9:36     ` Sam Ravnborg
@ 2014-05-17 16:51       ` Andi Kleen
  0 siblings, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2014-05-17 16:51 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: Andi Kleen, linux-kernel, akpm, Andi Kleen, linux-kbuild, mmarek

> Patched the calls to nm and objdump - but it gave no output
> when I ran the script.

You have to compile with debug info on.

-Andi
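
To illustrate why debug info matters, here is a small sketch that runs the script's three patterns over a few of the objdump lines Sam posted. The `classify` helper and its labels are hypothetical; the regular expressions are copied from the patch.

```python
import re

CODE = re.compile(r'\s*([0-9a-fA-F]+):\s+(.*)')             # "  fc:  96 10 ..."
FUNC = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)\(\):$')        # "sys_accept():"
SRCLINE = re.compile(r'^([^:]+):(\d+)$')                    # "/sysctl.c:1666"


def classify(line):
    """Label one objdump output line the way the script's patterns would."""
    if CODE.match(line):
        return "code"
    if FUNC.match(line):
        return "func"
    if SRCLINE.match(line):
        return "srcline"
    return None


no_debug = [
    "000000fc <SyS_accept>:",
    "sys_accept():",
    "      fc:       96 10 20 00     clr  %o3",
    "     100:       82 13 c0 00     mov  %o7, %g1",
]
kinds = [classify(l) for l in no_debug]
```

None of the posted lines is a `file:line` marker, so in the script `line` would stay `None` and every code byte would be counted under `nolinebytes` instead of the per-function and per-line tables. That is consistent with the empty report, though this is a reading of the posted output rather than a confirmed diagnosis.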

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH 1/8] ixgbe: Out of line ixgbe_read/write_reg
  2014-05-16 21:43 ` [PATCH 1/8] ixgbe: Out of line ixgbe_read/write_reg Andi Kleen
@ 2014-05-19  9:14   ` David Laight
  2014-05-19 22:00   ` Rustad, Mark D
  1 sibling, 0 replies; 23+ messages in thread
From: David Laight @ 2014-05-19  9:14 UTC (permalink / raw)
  To: 'Andi Kleen', linux-kernel@vger.kernel.org
  Cc: akpm@linux-foundation.org, Andi Kleen, netdev@vger.kernel.org,
	Jeff Kirsher

From: Andi Kleen
> ixgbe_read_reg and ixgbe_write_reg are frequently called and are very big
> because they have complex error handling code.

Have you measured the performance impact?
I suspect that it might be measurable.

Clearly the calls during initialisation don't need to be inline,
but there will be some in the normal tx and rx paths.

	David

> Moving them out of line saves ~27k text in the ixgbe driver.
> 
>    text	   data	    bss	    dec	    hex	filename
> 14220873	2008072	1507328	17736273	10ea251	vmlinux-before-ixgbe
> 14193673	2003976	1507328	17704977	10e2811	vmlinux-ixgbe
> 
> Cc: netdev@vger.kernel.org
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_common.h | 22 ++--------------------
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 22 ++++++++++++++++++++++
>  2 files changed, 24 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
> index f12c40f..05f094d 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
> @@ -162,28 +162,10 @@ static inline void writeq(u64 val, void __iomem *addr)
>  }
>  #endif
> 
> -static inline void ixgbe_write_reg64(struct ixgbe_hw *hw, u32 reg, u64 value)
> -{
> -	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
> +void ixgbe_write_reg64(struct ixgbe_hw *hw, u32 reg, u64 value);
> +u32 ixgbe_read_reg(struct ixgbe_hw *hw, u32 reg);
> 
> -	if (ixgbe_removed(reg_addr))
> -		return;
> -	writeq(value, reg_addr + reg);
> -}
>  #define IXGBE_WRITE_REG64(a, reg, value) ixgbe_write_reg64((a), (reg), (value))
> -
> -static inline u32 ixgbe_read_reg(struct ixgbe_hw *hw, u32 reg)
> -{
> -	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
> -	u32 value;
> -
> -	if (ixgbe_removed(reg_addr))
> -		return IXGBE_FAILED_READ_REG;
> -	value = readl(reg_addr + reg);
> -	if (unlikely(value == IXGBE_FAILED_READ_REG))
> -		ixgbe_check_remove(hw, reg);
> -	return value;
> -}
>  #define IXGBE_READ_REG(a, reg) ixgbe_read_reg((a), (reg))
> 
>  #define IXGBE_WRITE_REG_ARRAY(a, reg, offset, value) \
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index d62e7a2..5f81f62 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -371,6 +371,28 @@ void ixgbe_write_pci_cfg_word(struct ixgbe_hw *hw, u32 reg, u16 value)
>  	pci_write_config_word(adapter->pdev, reg, value);
>  }
> 
> +void ixgbe_write_reg64(struct ixgbe_hw *hw, u32 reg, u64 value)
> +{
> +	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
> +
> +	if (ixgbe_removed(reg_addr))
> +		return;
> +	writeq(value, reg_addr + reg);
> +}
> +
> +u32 ixgbe_read_reg(struct ixgbe_hw *hw, u32 reg)
> +{
> +	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
> +	u32 value;
> +
> +	if (ixgbe_removed(reg_addr))
> +		return IXGBE_FAILED_READ_REG;
> +	value = readl(reg_addr + reg);
> +	if (unlikely(value == IXGBE_FAILED_READ_REG))
> +		ixgbe_check_remove(hw, reg);
> +	return value;
> +}
> +
>  static void ixgbe_service_event_complete(struct ixgbe_adapter *adapter)
>  {
>  	BUG_ON(!test_bit(__IXGBE_SERVICE_SCHED, &adapter->state));
> --
> 1.9.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/8] ixgbe: Out of line ixgbe_read/write_reg
  2014-05-16 21:43 ` [PATCH 1/8] ixgbe: Out of line ixgbe_read/write_reg Andi Kleen
  2014-05-19  9:14   ` David Laight
@ 2014-05-19 22:00   ` Rustad, Mark D
  2014-05-19 23:25     ` Andi Kleen
  1 sibling, 1 reply; 23+ messages in thread
From: Rustad, Mark D @ 2014-05-19 22:00 UTC (permalink / raw)
  To: Andi Kleen, Kirsher, Jeffrey T
  Cc: <linux-kernel@vger.kernel.org>, akpm@linux-foundation.org,
	Andi Kleen, Netdev

On May 16, 2014, at 2:43 PM, Andi Kleen <andi@firstfloor.org> wrote:

> From: Andi Kleen <ak@linux.intel.com>
> 
> ixgbe_read_reg and ixgbe_write_reg are frequently called and are very big
> because they have complex error handling code.

Actually, this patch doesn't do anything to ixgbe_write_reg, which would almost certainly be very bad for performance, but instead changes ixgbe_write_reg64. The latter is not in a performance-sensitive path, but is only called from one site, so there is little reason to take it out-of-line.

I already have a patch in queue to make ixgbe_read_reg out-of-line, because it does have a very costly memory footprint inline, as you have found.

> Moving them out of line saves ~27k text in the ixgbe driver.
> 
>   text	   data	    bss	    dec	    hex	filename
> 14220873	2008072	1507328	17736273	10ea251	vmlinux-before-ixgbe
> 14193673	2003976	1507328	17704977	10e2811	vmlinux-ixgbe
> 
> Cc: netdev@vger.kernel.org
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
> drivers/net/ethernet/intel/ixgbe/ixgbe_common.h | 22 ++--------------------
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 22 ++++++++++++++++++++++
> 2 files changed, 24 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
> index f12c40f..05f094d 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
> @@ -162,28 +162,10 @@ static inline void writeq(u64 val, void __iomem *addr)
> }
> #endif
> 
> -static inline void ixgbe_write_reg64(struct ixgbe_hw *hw, u32 reg, u64 value)
> -{
> -	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
> +void ixgbe_write_reg64(struct ixgbe_hw *hw, u32 reg, u64 value);
> +u32 ixgbe_read_reg(struct ixgbe_hw *hw, u32 reg);
> 
> -	if (ixgbe_removed(reg_addr))
> -		return;
> -	writeq(value, reg_addr + reg);
> -}
> #define IXGBE_WRITE_REG64(a, reg, value) ixgbe_write_reg64((a), (reg), (value))
> -
> -static inline u32 ixgbe_read_reg(struct ixgbe_hw *hw, u32 reg)
> -{
> -	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
> -	u32 value;
> -
> -	if (ixgbe_removed(reg_addr))
> -		return IXGBE_FAILED_READ_REG;
> -	value = readl(reg_addr + reg);
> -	if (unlikely(value == IXGBE_FAILED_READ_REG))
> -		ixgbe_check_remove(hw, reg);
> -	return value;
> -}
> #define IXGBE_READ_REG(a, reg) ixgbe_read_reg((a), (reg))
> 
> #define IXGBE_WRITE_REG_ARRAY(a, reg, offset, value) \
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index d62e7a2..5f81f62 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -371,6 +371,28 @@ void ixgbe_write_pci_cfg_word(struct ixgbe_hw *hw, u32 reg, u16 value)
> 	pci_write_config_word(adapter->pdev, reg, value);
> }
> 
> +void ixgbe_write_reg64(struct ixgbe_hw *hw, u32 reg, u64 value)
> +{
> +	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
> +
> +	if (ixgbe_removed(reg_addr))
> +		return;
> +	writeq(value, reg_addr + reg);
> +}
> +
> +u32 ixgbe_read_reg(struct ixgbe_hw *hw, u32 reg)
> +{
> +	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
> +	u32 value;
> +
> +	if (ixgbe_removed(reg_addr))
> +		return IXGBE_FAILED_READ_REG;
> +	value = readl(reg_addr + reg);
> +	if (unlikely(value == IXGBE_FAILED_READ_REG))
> +		ixgbe_check_remove(hw, reg);
> +	return value;
> +}
> +
> static void ixgbe_service_event_complete(struct ixgbe_adapter *adapter)
> {
> 	BUG_ON(!test_bit(__IXGBE_SERVICE_SCHED, &adapter->state));
> -- 
> 1.9.0
> 

-- 
Mark Rustad, Networking Division, Intel Corporation


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/8] ixgbe: Out of line ixgbe_read/write_reg
  2014-05-19 22:00   ` Rustad, Mark D
@ 2014-05-19 23:25     ` Andi Kleen
  2014-05-20 17:06       ` Rustad, Mark D
  0 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2014-05-19 23:25 UTC (permalink / raw)
  To: Rustad, Mark D
  Cc: Andi Kleen, Kirsher, Jeffrey T,
	<linux-kernel@vger.kernel.org>, akpm@linux-foundation.org,
	Netdev

On Mon, May 19, 2014 at 10:00:52PM +0000, Rustad, Mark D wrote:
> On May 16, 2014, at 2:43 PM, Andi Kleen <andi@firstfloor.org> wrote:
> 
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > ixgbe_read_reg and ixgbe_write_reg are frequently called and are very big
> > because they have complex error handling code.
> 
> Actually, this patch doesn't do anything to ixgbe_write_reg, which would almost certainly be very bad for performance, but instead changes ixgbe_write_reg64.

I doubt a few cycles around the write make a lot of difference for MMIO. MMIO is dominated
by other things.

> The latter is not in a performance-sensitive path, but is only called from one site, so there is little reason to take it out-of-line.

True I moved the wrong one.

ixgbe_write_reg                                    3305  (0.00%)  8     409  


> I already have a patch in queue to make ixgbe_read_reg out-of-line, because it does have a very costly memory footprint inline, as you have found.

Please move write_reg too.

-Andi
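
For readers of the row above: the columns are function name, total bytes, share of all code bytes, average bytes per inlined copy, and number of copies. A minimal sketch of how such a row is formed follows. The name `format_row` is hypothetical and the overall byte count of 14000000 is an assumed round figure, not a number from this thread.

```python
def format_row(name, total_bytes, count, all_bytes):
    """Format one per-function row: name, bytes, share of all code, avg, count."""
    return "%-50s %-5d (%.2f%%)  %-5d %-5d" % (
        name,
        total_bytes,
        100.0 * total_bytes / all_bytes,  # share expressed as a real percentage
        total_bytes // count,             # average bytes per inlined copy
        count)


row = format_row("ixgbe_write_reg", 3305, 409, 14000000)
```

With these inputs the average comes out as 3305 // 409 = 8, matching the row quoted above. The 0.00% in that row would be consistent with a share computed as a plain fraction rather than a percentage, but that is an inference, not something stated in the thread.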

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/8] radeon: Out of line radeon_get_ib_value
  2014-05-16 21:43 ` [PATCH 7/8] radeon: Out of line radeon_get_ib_value Andi Kleen
@ 2014-05-20 16:16   ` Marek Olšák
  2014-05-20 17:04     ` Andi Kleen
  2014-05-20 18:14     ` Christian König
  0 siblings, 2 replies; 23+ messages in thread
From: Marek Olšák @ 2014-05-20 16:16 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, Deucher, Alexander, akpm, Andi Kleen, dri-devel

I think the function should stay in the header file. It's used in
performance-critical code, so we want it to be inlined.

Marek

On Fri, May 16, 2014 at 11:43 PM, Andi Kleen <andi@firstfloor.org> wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> Saves about 5k of text
>
>    text    data     bss     dec     hex filename
> 14080360        2008168 1507328 17595856        10c7dd0 vmlinux-before-radeon
> 14074978        2008168 1507328 17590474        10c68ca vmlinux-radeon
>
> Cc: alexander.deucher@amd.com
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  drivers/gpu/drm/radeon/radeon.h        | 10 +---------
>  drivers/gpu/drm/radeon/radeon_device.c |  9 +++++++++
>  2 files changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 6852861..8cae409 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -1032,15 +1032,7 @@ struct radeon_cs_parser {
>         struct ww_acquire_ctx   ticket;
>  };
>
> -static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
> -{
> -       struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
> -
> -       if (ibc->kdata)
> -               return ibc->kdata[idx];
> -       return p->ib.ptr[idx];
> -}
> -
> +u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx);
>
>  struct radeon_cs_packet {
>         unsigned        idx;
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 0e770bb..1cbd171 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -112,6 +112,15 @@ bool radeon_is_px(struct drm_device *dev)
>         return false;
>  }
>
> +u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
> +{
> +       struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
> +
> +       if (ibc->kdata)
> +               return ibc->kdata[idx];
> +       return p->ib.ptr[idx];
> +}
> +
>  /**
>   * radeon_program_register_sequence - program an array of registers.
>   *
> --
> 1.9.0
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/8] radeon: Out of line radeon_get_ib_value
  2014-05-20 16:16   ` Marek Olšák
@ 2014-05-20 17:04     ` Andi Kleen
  2014-05-20 18:14     ` Christian König
  1 sibling, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2014-05-20 17:04 UTC (permalink / raw)
  To: Marek Olšák
  Cc: Andi Kleen, linux-kernel, Deucher, Alexander, akpm, dri-devel

On Tue, May 20, 2014 at 06:16:48PM +0200, Marek Olšák wrote:
> I think the function should stay in the header file. It's used in
> performance-critical code, so we want it to be inlined.

This doesn't make any sense. If it's talking to the hardware it will
be dominated by the cache misses.

-Andi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/8] ixgbe: Out of line ixgbe_read/write_reg
  2014-05-19 23:25     ` Andi Kleen
@ 2014-05-20 17:06       ` Rustad, Mark D
  0 siblings, 0 replies; 23+ messages in thread
From: Rustad, Mark D @ 2014-05-20 17:06 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andi Kleen, Kirsher, Jeffrey T,
	<linux-kernel@vger.kernel.org>, akpm@linux-foundation.org,
	Netdev

On May 19, 2014, at 4:25 PM, Andi Kleen <ak@linux.intel.com> wrote:

> On Mon, May 19, 2014 at 10:00:52PM +0000, Rustad, Mark D wrote:
>> On May 16, 2014, at 2:43 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> 
>>> From: Andi Kleen <ak@linux.intel.com>
>>> 
>>> ixgbe_read_reg and ixgbe_write_reg are frequently called and are very big
>>> because they have complex error handling code.
>> 
>> Actually, this patch doesn't do anything to ixgbe_write_reg, which would almost certainly be very bad for performance, but instead changes ixgbe_write_reg64.
> 
> I doubt a few cycles around the write make a lot of difference for MMIO. MMIO is dominated
> by other things.
> 
>> The latter is not in a performance-sensitive path, but is only called from one site, so there is little reason to take it out-of-line.
> 
> True I moved the wrong one.
> 
> ixgbe_write_reg                                    3305  (0.00%)  8     409  
> 
> 
>> I already have a patch in queue to make ixgbe_read_reg out-of-line, because it does have a very costly memory footprint inline, as you have found.
> 
> Please move write_reg too.

I will take a look at moving most of them out-of-line. There are just a few in very hot paths that should remain inline.

-- 
Mark Rustad, Networking Division, Intel Corporation


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/8] radeon: Out of line radeon_get_ib_value
  2014-05-20 16:16   ` Marek Olšák
  2014-05-20 17:04     ` Andi Kleen
@ 2014-05-20 18:14     ` Christian König
  1 sibling, 0 replies; 23+ messages in thread
From: Christian König @ 2014-05-20 18:14 UTC (permalink / raw)
  To: Marek Olšák, Andi Kleen
  Cc: Deucher, Alexander, akpm, Andi Kleen, linux-kernel, dri-devel

Yeah, agree. That function is quite critical for command stream parsing 
and patching.

Christian.

Am 20.05.2014 18:16, schrieb Marek Olšák:
> I think the function should stay in the header file. It's used in
> performance-critical code, so we want it to be inlined.
>
> Marek
>
> On Fri, May 16, 2014 at 11:43 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> From: Andi Kleen <ak@linux.intel.com>
>>
>> Saves about 5k of text
>>
>>     text    data     bss     dec     hex filename
>> 14080360        2008168 1507328 17595856        10c7dd0 vmlinux-before-radeon
>> 14074978        2008168 1507328 17590474        10c68ca vmlinux-radeon
>>
>> Cc: alexander.deucher@amd.com
>> Cc: dri-devel@lists.freedesktop.org
>> Signed-off-by: Andi Kleen <ak@linux.intel.com>
>> ---
>>   drivers/gpu/drm/radeon/radeon.h        | 10 +---------
>>   drivers/gpu/drm/radeon/radeon_device.c |  9 +++++++++
>>   2 files changed, 10 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
>> index 6852861..8cae409 100644
>> --- a/drivers/gpu/drm/radeon/radeon.h
>> +++ b/drivers/gpu/drm/radeon/radeon.h
>> @@ -1032,15 +1032,7 @@ struct radeon_cs_parser {
>>          struct ww_acquire_ctx   ticket;
>>   };
>>
>> -static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
>> -{
>> -       struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
>> -
>> -       if (ibc->kdata)
>> -               return ibc->kdata[idx];
>> -       return p->ib.ptr[idx];
>> -}
>> -
>> +u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx);
>>
>>   struct radeon_cs_packet {
>>          unsigned        idx;
>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
>> index 0e770bb..1cbd171 100644
>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>> @@ -112,6 +112,15 @@ bool radeon_is_px(struct drm_device *dev)
>>          return false;
>>   }
>>
>> +u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
>> +{
>> +       struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
>> +
>> +       if (ibc->kdata)
>> +               return ibc->kdata[idx];
>> +       return p->ib.ptr[idx];
>> +}
>> +
>>   /**
>>    * radeon_program_register_sequence - program an array of registers.
>>    *
>> --
>> 1.9.0
>>


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-05-20 18:14 UTC | newest]

Thread overview: 23+ messages
2014-05-16 21:43 Fix some common inline bloat Andi Kleen
2014-05-16 21:43 ` [PATCH 1/8] ixgbe: Out of line ixgbe_read/write_reg Andi Kleen
2014-05-19  9:14   ` David Laight
2014-05-19 22:00   ` Rustad, Mark D
2014-05-19 23:25     ` Andi Kleen
2014-05-20 17:06       ` Rustad, Mark D
2014-05-16 21:43 ` [PATCH 2/8] radeonfb: Out of line errata workarounds Andi Kleen
2014-05-16 21:43 ` [PATCH 3/8] list: Out of line INIT_LIST_HEAD and list_del Andi Kleen
2014-05-17  0:03   ` Dave Jones
2014-05-17  2:37     ` Andi Kleen
2014-05-17  0:03   ` Eric Dumazet
2014-05-16 21:43 ` [PATCH 4/8] e1000e: Out of line __ew32_prepare/__ew32 Andi Kleen
2014-05-17  3:23   ` Stephen Hemminger
2014-05-16 21:43 ` [PATCH 5/8] x86: Out of line get_dma_ops Andi Kleen
2014-05-16 21:43 ` [PATCH 6/8] ftrace: Out of line ftrace_trigger_soft_disabled Andi Kleen
2014-05-16 21:43 ` [PATCH 7/8] radeon: Out of line radeon_get_ib_value Andi Kleen
2014-05-20 16:16   ` Marek Olšák
2014-05-20 17:04     ` Andi Kleen
2014-05-20 18:14     ` Christian König
2014-05-16 21:43 ` [PATCH 8/8] Kbuild: add inline-account tool to find inline bloat Andi Kleen
2014-05-17  8:31   ` Sam Ravnborg
2014-05-17  9:36     ` Sam Ravnborg
2014-05-17 16:51       ` Andi Kleen
