linux-wireless.vger.kernel.org archive mirror
From: Helmut Schaa <helmut.schaa@googlemail.com>
To: "Rafał Miłecki" <zajec5@gmail.com>
Cc: Ivo Van Doorn <ivdoorn@gmail.com>,
	"John W. Linville" <linville@tuxdriver.com>,
	linux-wireless@vger.kernel.org, users@rt2x00.serialmonkey.com
Subject: Re: [PATCH 21/23] rt2x00: Optimize register access in rt2800usb
Date: Mon, 18 Apr 2011 16:48:43 +0200
Message-ID: <201104181648.43967.helmut.schaa@googlemail.com>
In-Reply-To: <BANLkTimOsO1RSO-+n97a0WiQR__f_sgcUQ@mail.gmail.com>

Hi,

On Monday, 18 April 2011, Ivo Van Doorn wrote:
> > Wouldn't this be better to create two pointers in struct rt2x00_dev.
> > One for writing function and one for reading function? Am I right
> > thinking calling functions by pointers is quite fast? Or is this still
> > noticeably slower than using proper functions directly?
> 
> We already have the pointer inside struct rt2x00_dev which references
> the register access functions for rt2800pci/usb. These pointers are used
> by rt2800lib to access the common registers. What this patch does, is
> optimize the case where we exactly know which function we need, because
> we are in the actual driver.
> 
> As for the performance, I'll let Helmut comment on that as he created patch 20,
> which introduced this change to rt2800pci. :)

Sure. I was comparing some assembly in the rt2800pci hotpaths (on a 380 MHz
MIPS CPU, btw). A register read/write on PCI is just a readl or writel,
nothing more, but with the indirect wrappers we get something like the
listing below (this is x86_64, as I didn't want to cross-compile right now).
The example is the register read + write in rt2800pci_enable_interrupt, which
is called in every tasklet invocation, and that can happen for every rx'ed
and every tx'ed frame:

movq    8(%rbx), %rax   # rt2x00dev_1(D)->ops, rt2x00dev_1(D)->ops
leaq    -36(%rbp), %rdx #, tmp82
movq    %rbx, %rdi      # rt2x00dev,
movq    72(%rax), %rax  # D.47612_27->drv, D.47612_27->drv
movl    $516, %esi      #,
call    *(%rax) # rt2800ops_29->register_read
movb    %r14b, %cl      #,
movq    8(%rbx), %rax   # rt2x00dev_1(D)->ops, rt2x00dev_1(D)->ops
movq    %rbx, %rdi      # rt2x00dev,
movq    72(%rax), %rax  # D.47619_31->drv, D.47619_31->drv
movl    $516, %esi      #,
movl    $1, %edx        #, reg.119
sall    %cl, %edx       #, reg.119
andl    %r13d, %edx     # irq_field$bit_mask, reg.119
notl    %r13d   # tmp89
andl    -36(%rbp), %r13d        # reg, tmp89
orl     %r13d, %edx     # tmp89, reg.119
movl    %edx, -36(%rbp) # reg.119, reg
call    *16(%rax)       # rt2800ops_33->register_write

In addition, the first call ends up in rt2x00pci_register_read:

pushq   %rbp    #
mov     %esi, %esi      # offset, addr.27
movq    %rsp, %rbp      #,
addq    1056(%rdi), %rsi        # rt2x00dev_1(D)->csr.base, addr.27
movl    %eax, (%rdx)    # ret,* value

And rt2x00pci_register_write:

pushq   %rbp    #
mov     %esi, %esi      # offset, addr.26
movq    %rsp, %rbp      #,
addq    1056(%rdi), %rsi        # rt2x00dev_1(D)->csr.base, addr.26
movl    %edx, (%rsi)    # value,* addr.26

And here the same when using rt2x00pci_register_read/write directly:

movq    1056(%rbx), %rax        # rt2x00dev_1(D)->csr.base, rt2x00dev_1(D)->csr.base
movl    516(%rax), %eax #, reg.119
movl    %r13d, %edx     # irq_field$bit_mask, tmp80
movb    %r14b, %cl      #,
notl    %edx    # tmp80
andl    %edx, %eax      # tmp80, reg.119
movl    $1, %edx        #, tmp85
sall    %cl, %edx       #, tmp85
andl    %r13d, %edx     # irq_field$bit_mask, tmp85
orl     %edx, %eax      # tmp85, reg.119
movq    1056(%rbx), %rdx        # rt2x00dev_1(D)->csr.base, rt2x00dev_1(D)->csr.base
movl    %eax, 516(%rdx) # reg.119,
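
For readers who prefer C to assembly: both listings perform the same
read-modify-write on the register at offset 516 (0x204). A rough sketch of
the arithmetic only, where set_irq_field, mask and shift are illustrative
names and not the actual rt2x00 field helpers:

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the arithmetic in the listings above:
 * clear the bits selected by mask, then set the field to 1.
 */
static inline uint32_t set_irq_field(uint32_t reg, uint32_t mask,
				     unsigned int shift)
{
	reg &= ~mask;				/* notl + andl */
	reg |= (UINT32_C(1) << shift) & mask;	/* sall + andl + orl */
	return reg;
}
```

The only difference between the two variants is how the old register value
is fetched and how the new one is stored.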

As you can see, we save more than just one indirect function call:

17 movs -> 7 movs
2 calls -> 0 calls
1 add -> 0 adds

This happens because the compiler is able to apply a number of optimizations
that are only possible by inlining rt2x00pci_register_read/write. When using
the indirect function call the compiler is not able to inline them.
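
A simplified user-space sketch of the two call styles, assuming a plain
array in place of the MMIO window. The fake_* names and enable_bit_* helpers
are made up for illustration and are not the real rt2x00 structures:

```c
#include <stdint.h>

struct fake_dev;

/* Stand-in for a table of register access hooks. */
struct fake_ops {
	void (*register_read)(struct fake_dev *dev, unsigned int offset,
			      uint32_t *value);
	void (*register_write)(struct fake_dev *dev, unsigned int offset,
			       uint32_t value);
};

struct fake_dev {
	const struct fake_ops *ops;
	uint32_t csr[256];	/* plain array standing in for the registers */
};

static inline void direct_read(struct fake_dev *dev, unsigned int offset,
			       uint32_t *value)
{
	*value = dev->csr[offset / 4];
}

static inline void direct_write(struct fake_dev *dev, unsigned int offset,
				uint32_t value)
{
	dev->csr[offset / 4] = value;
}

static const struct fake_ops fake_pci_ops = {
	.register_read = direct_read,
	.register_write = direct_write,
};

/*
 * Indirect: the callee is looked up through dev->ops at run time, so the
 * compiler typically has to keep the pointer loads and both calls.
 */
static void enable_bit_indirect(struct fake_dev *dev, unsigned int offset,
				uint32_t bit)
{
	uint32_t reg;

	dev->ops->register_read(dev, offset, &reg);
	reg |= bit;
	dev->ops->register_write(dev, offset, reg);
}

/*
 * Direct: the callee is named at compile time, so it can be inlined and
 * the read-modify-write can collapse into a load, a few ALU ops and a store.
 */
static void enable_bit_direct(struct fake_dev *dev, unsigned int offset,
			      uint32_t bit)
{
	uint32_t reg;

	direct_read(dev, offset, &reg);
	reg |= bit;
	direct_write(dev, offset, reg);
}
```

Both helpers produce the same register contents; comparing the output of
something like gcc -O2 -S for the two functions should show the kind of
difference listed above, although the exact instruction counts depend on the
compiler and target.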

So, I first thought about using direct calls only in the interrupt handler
and the RX/TX hotpaths, but since mixing rt2800_register_read and
rt2x00pci_register_read in different locations in rt2800pci would be even
more confusing, I just replaced every rt2800_register_read with
rt2x00pci_register_read in rt2800pci.

One way to keep the abstraction and still improve the register_read/write
operations would be to introduce inlined rt2800pci_register_read/write
wrappers that directly call rt2x00pci_register_read/write, and to provide
those via rt2800_ops to rt2800lib. That way all calls in rt2800pci can
inline rt2x00pci_register_read/write directly, while rt2800lib will still
use indirect calls to do the same.
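
A rough user-space sketch of that layering, under the assumption that a
plain array stands in for the register window. All sketch_* names are
hypothetical stand-ins (sketch_bus_* for rt2x00pci_register_read/write,
sketch_pci_* for the proposed rt2800pci wrappers, sketch_ops for the
register hooks in rt2800_ops); this is not code from the patch series:

```c
#include <stdint.h>

struct sketch_dev;

/* Stand-in for the register hooks that the shared library code uses. */
struct sketch_ops {
	void (*register_read)(struct sketch_dev *dev, unsigned int offset,
			      uint32_t *value);
	void (*register_write)(struct sketch_dev *dev, unsigned int offset,
			       uint32_t value);
};

struct sketch_dev {
	const struct sketch_ops *ops;
	uint32_t csr[256];	/* plain array standing in for the registers */
};

/* Bus-level accessors. */
static inline void sketch_bus_read(struct sketch_dev *dev,
				   unsigned int offset, uint32_t *value)
{
	*value = dev->csr[offset / 4];
}

static inline void sketch_bus_write(struct sketch_dev *dev,
				    unsigned int offset, uint32_t value)
{
	dev->csr[offset / 4] = value;
}

/* Driver-level inline wrappers around the bus accessors. */
static inline void sketch_pci_read(struct sketch_dev *dev,
				   unsigned int offset, uint32_t *value)
{
	sketch_bus_read(dev, offset, value);
}

static inline void sketch_pci_write(struct sketch_dev *dev,
				    unsigned int offset, uint32_t value)
{
	sketch_bus_write(dev, offset, value);
}

/* Taking the address gives the shared code an out-of-line copy to call. */
static const struct sketch_ops sketch_pci_ops = {
	.register_read = sketch_pci_read,
	.register_write = sketch_pci_write,
};

/* Shared library code: only knows the ops table, so it calls indirectly. */
static void sketch_lib_set_bits(struct sketch_dev *dev, unsigned int offset,
				uint32_t bits)
{
	uint32_t reg;

	dev->ops->register_read(dev, offset, &reg);
	dev->ops->register_write(dev, offset, reg | bits);
}

/* Driver code: names the wrapper directly, so the compiler may inline it. */
static void sketch_pci_set_bits(struct sketch_dev *dev, unsigned int offset,
				uint32_t bits)
{
	uint32_t reg;

	sketch_pci_read(dev, offset, &reg);
	sketch_pci_write(dev, offset, reg | bits);
}
```

The driver and the shared code then use the same wrapper names, and only the
shared code pays for the indirection.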

However, I didn't see any need for this.

Helmut

