* linux/libata.h/ata_busy_wait() inefficiencies?
@ 2008-03-25 20:59 Andreas Mohr
2008-03-26 14:11 ` Mark Lord
From: Andreas Mohr @ 2008-03-25 20:59 UTC (permalink / raw)
To: linux-ide
Hello all,
originally I just intended to prepare a patch (to -mm),
but it seems better to discuss this first.
Currently (2.6.25-rc6) we have:
static inline u8 ata_busy_wait(struct ata_port *ap, unsigned int bits,
			       unsigned int max)
{
	u8 status;

	do {
		udelay(10);
		status = ata_chk_status(ap);
		max--;
	} while (status != 0xff && (status & bits) && (max > 0));

	return status;
}
This can easily be improved by merging the max handling
into a single (--max > 0), without any change in behaviour.
Building this on i386 showed combined image savings of almost 100 bytes,
most likely because max no longer has to be re-fetched for the
duplicate access.
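A minimal, compilable sketch of the merged-max variant (the hardware
accessors are replaced by trivial stubs here, since ata_chk_status()
and udelay() touch real hardware in the kernel; the fake status and
stub names are purely illustrative):

```c
#include <assert.h>

typedef unsigned char u8;

/* Stubs standing in for udelay()/ata_chk_status(); the fake status
 * reports BSY (0x80) twice, then DRDY|DSC (0x50). */
static int calls;
static u8 fake_chk_status(void) { return ++calls <= 2 ? 0x80 : 0x50; }
static void fake_udelay(unsigned int us) { (void)us; }

/* Proposed form: the separate max-- plus (max > 0) collapse into a
 * single (--max > 0), with identical loop semantics. */
static u8 busy_wait_sketch(unsigned int bits, unsigned int max)
{
	u8 status;

	do {
		fake_udelay(10);
		status = fake_chk_status();
	} while (status != 0xff && (status & bits) && (--max > 0));

	return status;
}
```

Since --max decrements and then tests, exactly like the original
post-body max-- followed by (max > 0), the loop exits after the same
number of iterations in every case.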
However, we're not finished yet:
since status != 0xff is almost certainly a check for the "unplugged hotplug"
case, it's a bit wasteful to evaluate it as the very first abort condition.
Moving it more towards the end should improve the exit latency
of this loop, again without altering behaviour (- right??).
Since the "(status != 0xff && (status & bits))" part is used verbatim
(ICK, COPY&PASTE!! ;) in many places (e.g. also in libata-core.c),
IMHO the best way forward would be to create a fittingly named
inline header helper for this combined check, which could then have its
check order swapped for better exit latency.
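Such a helper could look something like the following sketch (the name
ata_status_busy() is invented here for illustration, not an existing
libata symbol), with the common-case test moved first:

```c
typedef unsigned char u8;

/* Hypothetical helper: one home for the copy&pasted condition, with
 * the common case first -- a live device with BSY still set
 * short-circuits before the 0xff device-gone comparison is reached.
 * As a pure boolean the swapped order is equivalent to the original
 * (status != 0xff && (status & bits)). */
static inline int ata_status_busy(u8 status, unsigned int bits)
{
	return (status & bits) && status != 0xff;
}
```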
Those two tweaks alone may already be able to deliver a noticeable speedup
of ata operations given that this is frequently used inner libata code.
Thoughts on this?
Thanks,
Andreas Mohr
* Re: linux/libata.h/ata_busy_wait() inefficiencies?
2008-03-25 20:59 linux/libata.h/ata_busy_wait() inefficiencies? Andreas Mohr
@ 2008-03-26 14:11 ` Mark Lord
2008-03-26 21:34 ` Alan Cox
From: Mark Lord @ 2008-03-26 14:11 UTC (permalink / raw)
To: Andreas Mohr; +Cc: linux-ide
Andreas Mohr wrote:
> Hello all,
>
> originally I just intended to prepare a patch (to -mm),
> but it seems better to discuss this first.
>
> Currently (2.6.25-rc6) we have:
>
> static inline u8 ata_busy_wait(struct ata_port *ap, unsigned int bits,
> 			       unsigned int max)
> {
> 	u8 status;
>
> 	do {
> 		udelay(10);
> 		status = ata_chk_status(ap);
> 		max--;
> 	} while (status != 0xff && (status & bits) && (max > 0));
>
> 	return status;
> }
>
> This can easily be improved by merging the max handling
> into a single (--max > 0), without any change in behaviour.
> Building this on i386 showed combined image savings of almost 100 bytes,
> most likely because max no longer has to be re-fetched for the
> duplicate access.
>
> However, we're not finished yet:
> since status != 0xff is almost certainly a check for the "unplugged hotplug"
> case, it's a bit wasteful to evaluate it as the very first abort condition.
> Moving it more towards the end should improve the exit latency
> of this loop, again without altering behaviour (- right??).
> Since the "(status != 0xff && (status & bits))" part is used verbatim
> (ICK, COPY&PASTE!! ;) in many places (e.g. also in libata-core.c),
> IMHO the best way forward would be to create a fittingly named
> inline header helper for this combined check, which could then have its
> check order swapped for better exit latency.
>
> Those two tweaks alone may already be able to deliver a noticeable speedup
> of ata operations given that this is frequently used inner libata code.
..
While you're at it, the udelay(10) should really be *much* smaller,
or at least broken into a top/bottom pair of udelay(5). I really suspect
that much of the time, the status value is satisfied on the first iteration,
requiring no more than a microsecond or so. Yet we always force it to take
at least 10us, or about 15000 instructions worth on a modern CPU.
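A runnable sketch of that top/bottom split (hardware accessors stubbed
as before; the stub names and the delayed_us counter, which just
accumulates simulated delay to make the latency win visible, are
illustrative only):

```c
#include <assert.h>

typedef unsigned char u8;

/* Stubs for the hardware accessors: one busy (BSY, 0x80) poll, then
 * ready (0x50); delayed_us totals the simulated wait time. */
static int calls;
static unsigned int delayed_us;
static u8 fake_chk_status(void) { return ++calls < 2 ? 0x80 : 0x50; }
static void fake_udelay(unsigned int us) { delayed_us += us; }

/* Top/bottom split: sample the status after only 5us, so a device
 * that is already ready costs ~5us instead of a forced 10us, while a
 * busy device still sees the full 10us spacing between polls. */
static u8 busy_wait_split(unsigned int bits, unsigned int max)
{
	u8 status;

	do {
		fake_udelay(5);			/* top half */
		status = fake_chk_status();
		if (status == 0xff || !(status & bits))
			break;			/* ready, or device gone */
		fake_udelay(5);			/* bottom half */
	} while (--max > 0);

	return status;
}
```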
Cheers
* Re: linux/libata.h/ata_busy_wait() inefficiencies?
2008-03-26 14:11 ` Mark Lord
@ 2008-03-26 21:34 ` Alan Cox
From: Alan Cox @ 2008-03-26 21:34 UTC (permalink / raw)
To: Mark Lord; +Cc: Andreas Mohr, linux-ide
> > Those two tweaks alone may already be able to deliver a noticeable speedup
> > of ata operations given that this is frequently used inner libata code.
> ..
Unlikely, given that chk_status will be a synchronous I/O request
across the bus and, on many controllers, across to the device itself. Any
other optimisations become somewhat irrelevant.
> While you're at it, the udelay(10) should really be *much* smaller,
> or at least broken into a top/bottom pair of udelay(5). I really suspect
> that much of the time, the status value is satisfied on the first iteration,
> requiring no more than a microsecond or so. Yet we always force it to take
> at least 10us, or about 15000 instructions worth on a modern CPU.
That depends on how long must elapse, after the event that the
wait_for_ call follows, before the status bits become valid. There is
also a trade-off in bus usage - especially on older devices, where we
will lock the bus, and thus the CPU, for maybe 1-2us *per* chk_status.
If you want performance get an AHCI controller 8)
Alan