* Net: ucc_geth ethernet driver optimization space
@ 2009-05-27 5:08 Liu Dave-R63238
2009-05-27 6:49 ` Joakim Tjernlund
2009-05-27 10:36 ` Li Yang
0 siblings, 2 replies; 3+ messages in thread
From: Liu Dave-R63238 @ 2009-05-27 5:08 UTC
To: netdev; +Cc: linuxppc-dev, linux-kernel
Guys,
The ucc_geth Ethernet driver has dozens of strongly synchronized read/write
operations, such as in_be32/16/8 and out_be32/16/8.
All of them are synchronous accesses, which are very expensive for performance.
On the critical path, we can replace some unnecessary in_be(x)/out_be(x)
calls with normal memory operations and keep only the necessary
memory barriers.
E.g.: BD access in the interrupt handler and in start_xmit.
The BD operations only need a memory barrier between the length/buffer
writes and the status write.
struct buffer_descriptor {
	u16 status;
	u16 length;
	u32 buffer;
} __attribute__ ((packed));
struct buffer_descriptor *BD;
BD->length = xxxx;
BD->buffer = yyyy;
wmb();
BD->status = zzzz;
For PowerPC, eieio is enough on 60x cores and mbar 1 is enough on e500.
Of course, we also need the "memory" clobber to keep the compiler from
reordering the accesses across the barrier.
Thanks, Dave
* Re: Net: ucc_geth ethernet driver optimization space
From: Joakim Tjernlund @ 2009-05-27 6:49 UTC
To: Liu Dave-R63238
Cc: linuxppc-dev, netdev, linux-kernel,
linuxppc-dev-bounces+joakim.tjernlund=transmode.se
linuxppc-dev-bounces+joakim.tjernlund=transmode.se@ozlabs.org wrote on 27/05/2009 07:08:07:
>
> Guys,
>
> The ucc_geth Ethernet driver has dozens of strongly synchronized read/write
> operations, such as in_be32/16/8 and out_be32/16/8.
>
> All of them are synchronous accesses, which are very expensive for performance.
>
> On the critical path, we can replace some unnecessary in_be(x)/out_be(x)
> calls with normal memory operations and keep only the necessary
> memory barriers.
>
> E.g.: BD access in the interrupt handler and in start_xmit.
>
> The BD operations only need a memory barrier between the length/buffer
> writes and the status write.
>
> struct buffer_descriptor {
>	u16 status;
>	u16 length;
>	u32 buffer;
> } __attribute__ ((packed));
>
> struct buffer_descriptor *BD;
>
> BD->length = xxxx;
> BD->buffer = yyyy;
> wmb();
> BD->status = zzzz;
>
> For PowerPC, eieio is enough on 60x cores and mbar 1 is enough on e500.
> Of course, we also need the "memory" clobber to keep the compiler from
> reordering the accesses across the barrier.
>
> Thanks, Dave
Yes, pretty please :)
You might want to combine status and length into one u32, though:
BD->buffer = yyyy;
wmb();
BD->stat_len = (u32)zzzz << 16 | xxxx;
Jocke
* Re: Net: ucc_geth ethernet driver optimization space
From: Li Yang @ 2009-05-27 10:36 UTC
To: Liu Dave-R63238; +Cc: netdev, linux-kernel, linuxppc-dev
On Wed, May 27, 2009 at 1:08 PM, Liu Dave-R63238 <DaveLiu@freescale.com> wrote:
> Guys,
>
> The ucc_geth Ethernet driver has dozens of strongly synchronized read/write
> operations, such as in_be32/16/8 and out_be32/16/8.
>
> All of them are synchronous accesses, which are very expensive for performance.
>
Totally agree. That has been one of my concerns from the very beginning.
> On the critical path, we can replace some unnecessary in_be(x)/out_be(x)
> calls with normal memory operations and keep only the necessary
> memory barriers.
>
> E.g.: BD access in the interrupt handler and in start_xmit.
>
> The BD operations only need a memory barrier between the length/buffer
> writes and the status write.
>
> struct buffer_descriptor {
>	u16 status;
>	u16 length;
>	u32 buffer;
> } __attribute__ ((packed));
>
> struct buffer_descriptor *BD;
>
> BD->length = xxxx;
> BD->buffer = yyyy;
> wmb();
> BD->status = zzzz;
The BD can reside either in normal memory or in a memory-mapped region,
which makes the case more complex.
MMIO accesses need to use the I/O accessors to pass sparse checking. We
might make use of the __raw_*() accessors, but I'm not sure whether they are
suitable for non-PCI buses on PowerPC. We also need to pay
special attention to the problem described here:
http://lwn.net/Articles/198988/
- Leo