* [RFC] networking structure holes
@ 2006-10-31 20:34 Arnaldo Carvalho de Melo
2006-10-31 21:04 ` Eric Dumazet
0 siblings, 1 reply; 4+ messages in thread
From: Arnaldo Carvalho de Melo @ 2006-10-31 20:34 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
Hi,
I've been working on some DWARF2 utilities and one of them,
pahole (Poke-a-Hole) can be used to find holes due to alignment rules
in structs, the full output of:
[acme@newtoy net-2.6]$ pahole net/ipv4/tcp.o
is available at:
http://oops.merseine.nu:81/acme/net.ipv4.tcp.o.pahole
Just to show what we can find with this tool here is the layout
of struct net_device, that barring any cacheline locality optimization
has 4 bytes to harvest, David, do you think reordering those fields to
get 4 byts back is ok?
[acme@newtoy net-2.6]$ pahole net/ipv4/tcp.o net_device
/* /pub/scm/linux/kernel/git/acme/net-2.6/include/linux/skbuff.h:87 */
struct net_device {
char name[16]; /* 0 16 */
struct hlist_node name_hlist; /* 16 8 */
long unsigned int mem_end; /* 24 4 */
long unsigned int mem_start; /* 28 4 */
long unsigned int base_addr; /* 32 4 */
unsigned int irq; /* 36 4 */
unsigned char if_port; /* 40 1 */
unsigned char dma; /* 41 1 */
/* XXX 2 bytes hole, try to pack */
long unsigned int state; /* 44 4 */
struct net_device * next; /* 48 4 */
int (*init)(); /* 52 4 */
long unsigned int features; /* 56 4 */
struct net_device * next_sched; /* 60 4 */
int ifindex; /* 64 4 */
int iflink; /* 68 4 */
struct net_device_stats * (*get_stats)(); /* 72 4 */
const struct iw_handler_def * wireless_handlers; /* 76 4 */
struct iw_public_data * wireless_data; /* 80 4 */
const struct ethtool_ops * ethtool_ops; /* 84 4 */
unsigned int flags; /* 88 4 */
short unsigned int gflags; /* 92 2 */
short unsigned int priv_flags; /* 94 2 */
short unsigned int padded; /* 96 2 */
unsigned char operstate; /* 98 1 */
unsigned char link_mode; /* 99 1 */
unsigned int mtu; /* 100 4 */
short unsigned int type; /* 104 2 */
short unsigned int hard_header_len; /* 106 2 */
struct net_device * master; /* 108 4 */
unsigned char perm_addr[32]; /* 112 32 */
unsigned char addr_len; /* 144 1 */
/* XXX 1 bytes hole, try to pack */
short unsigned int dev_id; /* 146 2 */
struct dev_mc_list * mc_list; /* 148 4 */
int mc_count; /* 152 4 */
int promiscuity; /* 156 4 */
int allmulti; /* 160 4 */
void * atalk_ptr; /* 164 4 */
void * ip_ptr; /* 168 4 */
void * dn_ptr; /* 172 4 */
void * ip6_ptr; /* 176 4 */
void * ec_ptr; /* 180 4 */
void * ax25_ptr; /* 184 4 */
struct list_head poll_list; /* 188 8 */
int (*poll)(); /* 196 4 */
int quota; /* 200 4 */
int weight; /* 204 4 */
long unsigned int last_rx; /* 208 4 */
unsigned char dev_addr[32]; /* 212 32 */
unsigned char broadcast[32]; /* 244 32 */
spinlock_t queue_lock; /* 276 0 */
struct Qdisc * qdisc; /* 276 4 */
struct Qdisc * qdisc_sleeping; /* 280 4 */
struct list_head qdisc_list; /* 284 8 */
long unsigned int tx_queue_len; /* 292 4 */
struct sk_buff * gso_skb; /* 296 4 */
spinlock_t ingress_lock; /* 300 0 */
struct Qdisc * qdisc_ingress; /* 300 4 */
spinlock_t _xmit_lock; /* 304 0 */
int xmit_lock_owner; /* 304 4 */
void * priv; /* 308 4 */
int (*hard_start_xmit)(); /* 312 4 */
long unsigned int trans_start; /* 316 4 */
int watchdog_timeo; /* 320 4 */
struct timer_list watchdog_timer; /* 324 24 */
atomic_t refcnt; /* 348 4 */
struct list_head todo_list; /* 352 8 */
struct hlist_node index_hlist; /* 360 8 */
enum reg_state; /* 368 4 */
void (*uninit)(); /* 372 4 */
void (*destructor)(); /* 376 4 */
int (*open)(); /* 380 4 */
int (*stop)(); /* 384 4 */
int (*hard_header)(); /* 388 4 */
int (*rebuild_header)(); /* 392 4 */
void (*set_multicast_list)(); /* 396 4 */
int (*set_mac_address)(); /* 400 4 */
int (*do_ioctl)(); /* 404 4 */
int (*set_config)(); /* 408 4 */
int (*hard_header_cache)(); /* 412 4 */
void (*header_cache_update)(); /* 416 4 */
int (*change_mtu)(); /* 420 4 */
void (*tx_timeout)(); /* 424 4 */
void (*vlan_rx_register)(); /* 428 4 */
void (*vlan_rx_add_vid)(); /* 432 4 */
void (*vlan_rx_kill_vid)(); /* 436 4 */
int (*hard_header_parse)(); /* 440 4 */
int (*neigh_setup)(); /* 444 4 */
struct net_bridge_port * br_port; /* 448 4 */
struct class_device class_dev; /* 452 144 */
struct attribute_group * sysfs_groups[3]; /* 596 12 */
}; /* size: 608, sum members: 605, holes: 2, sum holes: 3 */
The git repo for pahole & the other dwarves is at:
git://git.kernel.org/pub/scm/linux/kernel/git/acme/pahole.git
For those wanting to just look at the cset comments (best
documentation available so far ;)):
http://www.kernel.org/git/?p=linux/kernel/git/acme/pahole.git;a=summary
Best Regards,
- Arnaldo
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC] networking structure holes
2006-10-31 20:34 [RFC] networking structure holes Arnaldo Carvalho de Melo
@ 2006-10-31 21:04 ` Eric Dumazet
2006-11-02 2:31 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2006-10-31 21:04 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: David S. Miller, netdev
Arnaldo Carvalho de Melo a écrit :
> Hi,
>
> I've been working on some DWARF2 utilities and one of them,
> pahole (Poke-a-Hole) can be used to find holes due to alignment rules
> in structs, the full output of:
>
> [acme@newtoy net-2.6]$ pahole net/ipv4/tcp.o
>
> is available at:
>
> http://oops.merseine.nu:81/acme/net.ipv4.tcp.o.pahole
>
> Just to show what we can find with this tool here is the layout
> of struct net_device, that barring any cacheline locality optimization
> has 4 bytes to harvest, David, do you think reordering those fields to
> get 4 byts back is ok?
I just want to bring your attention this net_device structure was re-ordered
(by me :)) so that separate cache lines are used on SMP machines.
If you select CONFIG_SMP , you'll probably notice far more holes. But it was a
feature, not lazyness.
We can probably move some fields, but very carefully :)
>
> [acme@newtoy net-2.6]$ pahole net/ipv4/tcp.o net_device
>
> /* /pub/scm/linux/kernel/git/acme/net-2.6/include/linux/skbuff.h:87 */
> struct net_device {
> char name[16]; /* 0 16 */
> struct hlist_node name_hlist; /* 16 8 */
> long unsigned int mem_end; /* 24 4 */
> long unsigned int mem_start; /* 28 4 */
> long unsigned int base_addr; /* 32 4 */
> unsigned int irq; /* 36 4 */
> unsigned char if_port; /* 40 1 */
> unsigned char dma; /* 41 1 */
>
> /* XXX 2 bytes hole, try to pack */
>
> long unsigned int state; /* 44 4 */
> struct net_device * next; /* 48 4 */
> int (*init)(); /* 52 4 */
> long unsigned int features; /* 56 4 */
> struct net_device * next_sched; /* 60 4 */
> int ifindex; /* 64 4 */
> int iflink; /* 68 4 */
> struct net_device_stats * (*get_stats)(); /* 72 4 */
> const struct iw_handler_def * wireless_handlers; /* 76 4 */
> struct iw_public_data * wireless_data; /* 80 4 */
> const struct ethtool_ops * ethtool_ops; /* 84 4 */
> unsigned int flags; /* 88 4 */
> short unsigned int gflags; /* 92 2 */
> short unsigned int priv_flags; /* 94 2 */
> short unsigned int padded; /* 96 2 */
> unsigned char operstate; /* 98 1 */
> unsigned char link_mode; /* 99 1 */
> unsigned int mtu; /* 100 4 */
> short unsigned int type; /* 104 2 */
> short unsigned int hard_header_len; /* 106 2 */
> struct net_device * master; /* 108 4 */
> unsigned char perm_addr[32]; /* 112 32 */
> unsigned char addr_len; /* 144 1 */
>
> /* XXX 1 bytes hole, try to pack */
>
> short unsigned int dev_id; /* 146 2 */
> struct dev_mc_list * mc_list; /* 148 4 */
> int mc_count; /* 152 4 */
> int promiscuity; /* 156 4 */
> int allmulti; /* 160 4 */
> void * atalk_ptr; /* 164 4 */
> void * ip_ptr; /* 168 4 */
> void * dn_ptr; /* 172 4 */
> void * ip6_ptr; /* 176 4 */
> void * ec_ptr; /* 180 4 */
> void * ax25_ptr; /* 184 4 */
> struct list_head poll_list; /* 188 8 */
> int (*poll)(); /* 196 4 */
> int quota; /* 200 4 */
> int weight; /* 204 4 */
> long unsigned int last_rx; /* 208 4 */
> unsigned char dev_addr[32]; /* 212 32 */
> unsigned char broadcast[32]; /* 244 32 */
> spinlock_t queue_lock; /* 276 0 */
> struct Qdisc * qdisc; /* 276 4 */
> struct Qdisc * qdisc_sleeping; /* 280 4 */
> struct list_head qdisc_list; /* 284 8 */
> long unsigned int tx_queue_len; /* 292 4 */
> struct sk_buff * gso_skb; /* 296 4 */
> spinlock_t ingress_lock; /* 300 0 */
> struct Qdisc * qdisc_ingress; /* 300 4 */
> spinlock_t _xmit_lock; /* 304 0 */
> int xmit_lock_owner; /* 304 4 */
> void * priv; /* 308 4 */
> int (*hard_start_xmit)(); /* 312 4 */
> long unsigned int trans_start; /* 316 4 */
> int watchdog_timeo; /* 320 4 */
> struct timer_list watchdog_timer; /* 324 24 */
> atomic_t refcnt; /* 348 4 */
> struct list_head todo_list; /* 352 8 */
> struct hlist_node index_hlist; /* 360 8 */
> enum reg_state; /* 368 4 */
> void (*uninit)(); /* 372 4 */
> void (*destructor)(); /* 376 4 */
> int (*open)(); /* 380 4 */
> int (*stop)(); /* 384 4 */
> int (*hard_header)(); /* 388 4 */
> int (*rebuild_header)(); /* 392 4 */
> void (*set_multicast_list)(); /* 396 4 */
> int (*set_mac_address)(); /* 400 4 */
> int (*do_ioctl)(); /* 404 4 */
> int (*set_config)(); /* 408 4 */
> int (*hard_header_cache)(); /* 412 4 */
> void (*header_cache_update)(); /* 416 4 */
> int (*change_mtu)(); /* 420 4 */
> void (*tx_timeout)(); /* 424 4 */
> void (*vlan_rx_register)(); /* 428 4 */
> void (*vlan_rx_add_vid)(); /* 432 4 */
> void (*vlan_rx_kill_vid)(); /* 436 4 */
> int (*hard_header_parse)(); /* 440 4 */
> int (*neigh_setup)(); /* 444 4 */
> struct net_bridge_port * br_port; /* 448 4 */
> struct class_device class_dev; /* 452 144 */
> struct attribute_group * sysfs_groups[3]; /* 596 12 */
> }; /* size: 608, sum members: 605, holes: 2, sum holes: 3 */
>
> The git repo for pahole & the other dwarves is at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/pahole.git
>
> For those wanting to just look at the cset comments (best
> documentation available so far ;)):
>
> http://www.kernel.org/git/?p=linux/kernel/git/acme/pahole.git;a=summary
>
> Best Regards,
>
> - Arnaldo
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC] networking structure holes
2006-10-31 21:04 ` Eric Dumazet
@ 2006-11-02 2:31 ` Arnaldo Carvalho de Melo
2006-11-02 6:55 ` Eric Dumazet
0 siblings, 1 reply; 4+ messages in thread
From: Arnaldo Carvalho de Melo @ 2006-11-02 2:31 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S. Miller, netdev
On 10/31/06, Eric Dumazet <dada1@cosmosbay.com> wrote:
> Arnaldo Carvalho de Melo a écrit :
> > Hi,
> >
> > I've been working on some DWARF2 utilities and one of them,
> > pahole (Poke-a-Hole) can be used to find holes due to alignment rules
> > in structs, the full output of:
> >
> > [acme@newtoy net-2.6]$ pahole net/ipv4/tcp.o
> >
> > is available at:
> >
> > http://oops.merseine.nu:81/acme/net.ipv4.tcp.o.pahole
> >
> > Just to show what we can find with this tool here is the layout
> > of struct net_device, that barring any cacheline locality optimization
> > has 4 bytes to harvest, David, do you think reordering those fields to
> > get 4 byts back is ok?
>
> I just want to bring your attention this net_device structure was re-ordered
> (by me :)) so that separate cache lines are used on SMP machines.
>
> If you select CONFIG_SMP , you'll probably notice far more holes. But it was a
> feature, not lazyness.
Thanks for commenting on this case!
> We can probably move some fields, but very carefully :)
Of course, in time I probably will try to combine valgrind's
cachegrind or some new tool using the same principles I used in OSTRA
to find out working sets of struct members to do automatic
"suggestions" on how to reorder structs to avoid holes while keeping
the relevant struct members close together as to exploit cacheline
locality effects, like you do so well manually :-)
- Arnaldo
PS.: While we don't have tools to check out that the holes are not a
problem because we want to exploit cacheline locality effects... what
about some comments on the structs to explain that such holes are not
a problem? :-)
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC] networking structure holes
2006-11-02 2:31 ` Arnaldo Carvalho de Melo
@ 2006-11-02 6:55 ` Eric Dumazet
0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2006-11-02 6:55 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: David S. Miller, netdev
Arnaldo Carvalho de Melo a écrit :
> On 10/31/06, Eric Dumazet <dada1@cosmosbay.com> wrote:
>> Arnaldo Carvalho de Melo a écrit :
>> > Hi,
>> >
>> > I've been working on some DWARF2 utilities and one of them,
>> > pahole (Poke-a-Hole) can be used to find holes due to alignment rules
>> > in structs, the full output of:
>> >
>> > [acme@newtoy net-2.6]$ pahole net/ipv4/tcp.o
>> >
>> > is available at:
>> >
>> > http://oops.merseine.nu:81/acme/net.ipv4.tcp.o.pahole
>> >
>> > Just to show what we can find with this tool here is the layout
>> > of struct net_device, that barring any cacheline locality optimization
>> > has 4 bytes to harvest, David, do you think reordering those fields to
>> > get 4 byts back is ok?
>>
>> I just want to bring your attention this net_device structure was
>> re-ordered
>> (by me :)) so that separate cache lines are used on SMP machines.
>>
>> If you select CONFIG_SMP , you'll probably notice far more holes. But
>> it was a
>> feature, not lazyness.
>
> Thanks for commenting on this case!
>
>> We can probably move some fields, but very carefully :)
>
> Of course, in time I probably will try to combine valgrind's
> cachegrind or some new tool using the same principles I used in OSTRA
> to find out working sets of struct members to do automatic
> "suggestions" on how to reorder structs to avoid holes while keeping
> the relevant struct members close together as to exploit cacheline
> locality effects, like you do so well manually :-)
>
> - Arnaldo
>
> PS.: While we don't have tools to check out that the holes are not a
> problem because we want to exploit cacheline locality effects... what
> about some comments on the structs to explain that such holes are not
> a problem? :-)
I am all for automatic tools, if they can convince human beings :)
For example, I am using an optimization that is quite simple but which was not
accepted by netdev community :
- Moving the struct flowi directly into "struct dst_entry", right after the
'struct dst_entry *next;' pointer.
AFAIK all objects that include a 'struct dst_entry' also include a 'struct
flowi', so this is just a small violation of layering.
This really helps because lookups now touch only one cache line per chained
item instead of two/three. On loaded routers with 8 items per chain, thats 8
or 16 cache lines CPU dont have to bring in its cache per IP packet.
Eric
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2006-11-02 6:56 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-31 20:34 [RFC] networking structure holes Arnaldo Carvalho de Melo
2006-10-31 21:04 ` Eric Dumazet
2006-11-02 2:31 ` Arnaldo Carvalho de Melo
2006-11-02 6:55 ` Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).