From: Helmut Schaa <helmut.schaa@googlemail.com>
To: "Rafał Miłecki" <zajec5@gmail.com>
Cc: Ivo Van Doorn <ivdoorn@gmail.com>,
"John W. Linville" <linville@tuxdriver.com>,
linux-wireless@vger.kernel.org, users@rt2x00.serialmonkey.com
Subject: Re: [PATCH 21/23] rt2x00: Optimize register access in rt2800usb
Date: Mon, 18 Apr 2011 16:48:43 +0200
Message-ID: <201104181648.43967.helmut.schaa@googlemail.com>
In-Reply-To: <BANLkTimOsO1RSO-+n97a0WiQR__f_sgcUQ@mail.gmail.com>
Hi,
On Monday, 18 April 2011, Ivo Van Doorn wrote:
> > Wouldn't this be better to create two pointers in struct rt2x00_dev.
> > One for writing function and one for reading function? Am I right
> > thinking calling functions by pointers is quite fast? Or is this still
> > noticeably slower than using proper functions directly?
>
> We already have the pointer inside struct rt2x00_dev which references
> the register access functions for rt2800pci/usb. These pointers are used
> by rt2800lib to access the common registers. What this patch does, is
> optimize the case where we exactly know which function we need, because
> we are in the actual driver.
>
> As for the performance, I'll let Helmut comment on that as he created patch 20,
> which introduced this change to rt2800pci. :)
Sure. I was comparing some assembly in the rt2800pci hotpaths (on a 380 MHz
MIPS CPU, by the way). A register read/write on PCI is just a readl or writel,
nothing more, but with the indirect wrappers we get something like the
listing below (this is x86_64, as I didn't want to cross-compile right now).
It shows the register read + write in rt2800pci_enable_interrupt, which is
called in every tasklet invocation, and that can happen for every rx'ed and
every tx'ed frame:
movq 8(%rbx), %rax # rt2x00dev_1(D)->ops, rt2x00dev_1(D)->ops
leaq -36(%rbp), %rdx #, tmp82
movq %rbx, %rdi # rt2x00dev,
movq 72(%rax), %rax # D.47612_27->drv, D.47612_27->drv
movl $516, %esi #,
call *(%rax) # rt2800ops_29->register_read
movb %r14b, %cl #,
movq 8(%rbx), %rax # rt2x00dev_1(D)->ops, rt2x00dev_1(D)->ops
movq %rbx, %rdi # rt2x00dev,
movq 72(%rax), %rax # D.47619_31->drv, D.47619_31->drv
movl $516, %esi #,
movl $1, %edx #, reg.119
sall %cl, %edx #, reg.119
andl %r13d, %edx # irq_field$bit_mask, reg.119
notl %r13d # tmp89
andl -36(%rbp), %r13d # reg, tmp89
orl %r13d, %edx # tmp89, reg.119
movl %edx, -36(%rbp) # reg.119, reg
call *16(%rax) # rt2800ops_33->register_write
On top of that, the indirect call ends up in rt2x00pci_register_read:
pushq %rbp #
mov %esi, %esi # offset, addr.27
movq %rsp, %rbp #,
addq 1056(%rdi), %rsi # rt2x00dev_1(D)->csr.base, addr.27
movl %eax, (%rdx) # ret,* value
And rt2x00pci_register_write:
pushq %rbp #
mov %esi, %esi # offset, addr.26
movq %rsp, %rbp #,
addq 1056(%rdi), %rsi # rt2x00dev_1(D)->csr.base, addr.26
movl %edx,(%rsi) # value,* addr.26
And here is the same code when using rt2x00pci_register_read/write directly:
movq 1056(%rbx), %rax # rt2x00dev_1(D)->csr.base, rt2x00dev_1(D)->csr.base
movl 516(%rax),%eax #, reg.119
movl %r13d, %edx # irq_field$bit_mask, tmp80
movb %r14b, %cl #,
notl %edx # tmp80
andl %edx, %eax # tmp80, reg.119
movl $1, %edx #, tmp85
sall %cl, %edx #, tmp85
andl %r13d, %edx # irq_field$bit_mask, tmp85
orl %edx, %eax # tmp85, reg.119
movq 1056(%rbx), %rdx # rt2x00dev_1(D)->csr.base, rt2x00dev_1(D)->csr.base
movl %eax,516(%rdx) # reg.119,
As you can see, we save more than just one indirect function call:
17 movs -> 7 movs
2 calls -> 0 calls
2 adds -> 0 adds
This happens because the compiler is able to apply a number of optimizations
that are only possible by inlining rt2x00pci_register_read/write. When using
the indirect function call the compiler is not able to inline them.
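To make the difference concrete, here is a minimal user-space sketch of the two call styles. Everything in it is made up for illustration: struct fake_dev, struct fake_ops and the helper names are not the real rt2x00 structures, and a plain array stands in for the mapped register window. The read-modify-write mirrors the mask/shift pattern visible in the listings above.

```c
#include <stdint.h>

struct fake_dev;

/* Hypothetical ops table, loosely modelled on the indirection discussed here. */
struct fake_ops {
	void (*register_read)(struct fake_dev *dev, unsigned int offset,
			      uint32_t *value);
	void (*register_write)(struct fake_dev *dev, unsigned int offset,
			       uint32_t value);
};

struct fake_dev {
	const struct fake_ops *ops;
	uint32_t regs[256];	/* plain array standing in for the MMIO window */
};

/* Small helpers the compiler can inline when they are called by name. */
static inline void fake_register_read(struct fake_dev *dev,
				      unsigned int offset, uint32_t *value)
{
	*value = dev->regs[offset / sizeof(uint32_t)];
}

static inline void fake_register_write(struct fake_dev *dev,
				       unsigned int offset, uint32_t value)
{
	dev->regs[offset / sizeof(uint32_t)] = value;
}

static const struct fake_ops fake_pci_ops = {
	.register_read = fake_register_read,
	.register_write = fake_register_write,
};

/* Read-modify-write through the ops table: two indirect calls. */
static void set_field_indirect(struct fake_dev *dev, unsigned int offset,
			       uint32_t mask, unsigned int shift)
{
	uint32_t reg;

	dev->ops->register_read(dev, offset, &reg);
	reg = (reg & ~mask) | ((UINT32_C(1) << shift) & mask);
	dev->ops->register_write(dev, offset, reg);
}

/* Same operation calling the helpers by name, so they may be inlined. */
static void set_field_direct(struct fake_dev *dev, unsigned int offset,
			     uint32_t mask, unsigned int shift)
{
	uint32_t reg;

	fake_register_read(dev, offset, &reg);
	reg = (reg & ~mask) | ((UINT32_C(1) << shift) & mask);
	fake_register_write(dev, offset, reg);
}
```

Both functions compute the same result. In the second one the compiler sees the helper bodies and is free to fold them into the caller, whereas the calls through dev->ops generally have to stay real calls.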
So, I first thought about using direct calls only in the interrupt handler
and the RX/TX hotpaths. But since mixing rt2800_register_read and
rt2x00pci_register_read in different places in rt2800pci would be even
more confusing, I just replaced every rt2800_register_read with
rt2x00pci_register_read in rt2800pci.
One way to keep the abstraction and still improve the register_read/write
operations would be to introduce an inlined rt2800pci_register_read/write
which directly calls rt2x00pci_register_read/write, and to provide that via
rt2800_ops to rt2800lib. That way all calls in rt2800pci could directly
inline rt2x00pci_register_read/write, while rt2800lib would still use indirect
calls to do the same.
However, I didn't see any need for this.
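For reference, the alternative described above could look roughly like the following user-space sketch. The types and names (sketch_dev, sketch_ops, bus_*, drv_*, lib_set_bits) are hypothetical simplifications and not the real rt2x00 code; the array again stands in for the register window.

```c
#include <stdint.h>

struct sketch_dev;

/* Simplified stand-in for the ops table handed to the shared library code. */
struct sketch_ops {
	void (*register_read)(struct sketch_dev *dev, unsigned int offset,
			      uint32_t *value);
	void (*register_write)(struct sketch_dev *dev, unsigned int offset,
			       uint32_t value);
};

struct sketch_dev {
	const struct sketch_ops *ops;
	uint32_t csr[256];	/* array standing in for the mapped registers */
};

/* Bus-level helpers (the role rt2x00pci_register_read/write play). */
static inline void bus_register_read(struct sketch_dev *dev,
				     unsigned int offset, uint32_t *value)
{
	*value = dev->csr[offset / sizeof(uint32_t)];
}

static inline void bus_register_write(struct sketch_dev *dev,
				      unsigned int offset, uint32_t value)
{
	dev->csr[offset / sizeof(uint32_t)] = value;
}

/* Driver-level inline wrappers that simply forward to the bus helpers. */
static inline void drv_register_read(struct sketch_dev *dev,
				     unsigned int offset, uint32_t *value)
{
	bus_register_read(dev, offset, value);
}

static inline void drv_register_write(struct sketch_dev *dev,
				      unsigned int offset, uint32_t value)
{
	bus_register_write(dev, offset, value);
}

/* The same wrappers are exported to the library through the ops table. */
static const struct sketch_ops drv_ops = {
	.register_read = drv_register_read,
	.register_write = drv_register_write,
};

/* Shared library code: only knows the ops table, so it calls indirectly. */
static void lib_set_bits(struct sketch_dev *dev, unsigned int offset,
			 uint32_t bits)
{
	uint32_t reg;

	dev->ops->register_read(dev, offset, &reg);
	dev->ops->register_write(dev, offset, reg | bits);
}

/* Driver hotpath: calls the wrappers by name, so they can be inlined. */
static void drv_hotpath_set_bits(struct sketch_dev *dev, unsigned int offset,
				 uint32_t bits)
{
	uint32_t reg;

	drv_register_read(dev, offset, &reg);
	drv_register_write(dev, offset, reg | bits);
}
```

The driver keeps one naming scheme for register access, and only the library side pays for the indirect call.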
Helmut