netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] networking structure holes
@ 2006-10-31 20:34 Arnaldo Carvalho de Melo
  2006-10-31 21:04 ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Arnaldo Carvalho de Melo @ 2006-10-31 20:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

Hi,

	I've been working on some DWARF2 utilities and one of them,
pahole (Poke-a-Hole) can be used to find holes due to alignment rules
in structs, the full output of:

[acme@newtoy net-2.6]$ pahole net/ipv4/tcp.o

	is available at:

http://oops.merseine.nu:81/acme/net.ipv4.tcp.o.pahole

	Just to show what we can find with this tool here is the layout
of struct net_device, that barring any cacheline locality optimization
has 4 bytes to harvest, David, do you think reordering those fields to
get 4 byts back is ok?

[acme@newtoy net-2.6]$ pahole net/ipv4/tcp.o net_device

/* /pub/scm/linux/kernel/git/acme/net-2.6/include/linux/skbuff.h:87 */
struct net_device {
        char                       name[16];             /*     0    16 */
        struct hlist_node          name_hlist;           /*    16     8 */
        long unsigned int          mem_end;              /*    24     4 */
        long unsigned int          mem_start;            /*    28     4 */
        long unsigned int          base_addr;            /*    32     4 */
        unsigned int               irq;                  /*    36     4 */
        unsigned char              if_port;              /*    40     1 */
        unsigned char              dma;                  /*    41     1 */

        /* XXX 2 bytes hole, try to pack */

        long unsigned int          state;                /*    44     4 */
        struct net_device *        next;                 /*    48     4 */
        int                        (*init)();            /*    52     4 */
        long unsigned int          features;             /*    56     4 */
        struct net_device *        next_sched;           /*    60     4 */
        int                        ifindex;              /*    64     4 */
        int                        iflink;               /*    68     4 */
        struct net_device_stats *  (*get_stats)();       /*    72     4 */
        const struct iw_handler_def  * wireless_handlers;    /*    76     4 */
        struct iw_public_data *    wireless_data;        /*    80     4 */
        const struct ethtool_ops  * ethtool_ops;          /*    84     4 */
        unsigned int               flags;                /*    88     4 */
        short unsigned int         gflags;               /*    92     2 */
        short unsigned int         priv_flags;           /*    94     2 */
        short unsigned int         padded;               /*    96     2 */
        unsigned char              operstate;            /*    98     1 */
        unsigned char              link_mode;            /*    99     1 */
        unsigned int               mtu;                  /*   100     4 */
        short unsigned int         type;                 /*   104     2 */
        short unsigned int         hard_header_len;      /*   106     2 */
        struct net_device *        master;               /*   108     4 */
        unsigned char              perm_addr[32];        /*   112    32 */
        unsigned char              addr_len;             /*   144     1 */

        /* XXX 1 bytes hole, try to pack */

        short unsigned int         dev_id;               /*   146     2 */
        struct dev_mc_list *       mc_list;              /*   148     4 */
        int                        mc_count;             /*   152     4 */
        int                        promiscuity;          /*   156     4 */
        int                        allmulti;             /*   160     4 */
        void *                     atalk_ptr;            /*   164     4 */
        void *                     ip_ptr;               /*   168     4 */
        void *                     dn_ptr;               /*   172     4 */
        void *                     ip6_ptr;              /*   176     4 */
        void *                     ec_ptr;               /*   180     4 */
        void *                     ax25_ptr;             /*   184     4 */
        struct list_head           poll_list;            /*   188     8 */
        int                        (*poll)();            /*   196     4 */
        int                        quota;                /*   200     4 */
        int                        weight;               /*   204     4 */
        long unsigned int          last_rx;              /*   208     4 */
        unsigned char              dev_addr[32];         /*   212    32 */
        unsigned char              broadcast[32];        /*   244    32 */
        spinlock_t                 queue_lock;           /*   276     0 */
        struct Qdisc *             qdisc;                /*   276     4 */
        struct Qdisc *             qdisc_sleeping;       /*   280     4 */
        struct list_head           qdisc_list;           /*   284     8 */
        long unsigned int          tx_queue_len;         /*   292     4 */
        struct sk_buff *           gso_skb;              /*   296     4 */
        spinlock_t                 ingress_lock;         /*   300     0 */
        struct Qdisc *             qdisc_ingress;        /*   300     4 */
        spinlock_t                 _xmit_lock;           /*   304     0 */
        int                        xmit_lock_owner;      /*   304     4 */
        void *                     priv;                 /*   308     4 */
        int                        (*hard_start_xmit)(); /*   312     4 */
        long unsigned int          trans_start;          /*   316     4 */
        int                        watchdog_timeo;       /*   320     4 */
        struct timer_list          watchdog_timer;       /*   324    24 */
        atomic_t                   refcnt;               /*   348     4 */
        struct list_head           todo_list;            /*   352     8 */
        struct hlist_node          index_hlist;          /*   360     8 */
        enum                       reg_state;            /*   368     4 */
        void                       (*uninit)();          /*   372     4 */
        void                       (*destructor)();      /*   376     4 */
        int                        (*open)();            /*   380     4 */
        int                        (*stop)();            /*   384     4 */
        int                        (*hard_header)();     /*   388     4 */
        int                        (*rebuild_header)();  /*   392     4 */
        void                       (*set_multicast_list)(); /*   396     4 */
        int                        (*set_mac_address)(); /*   400     4 */
        int                        (*do_ioctl)();        /*   404     4 */
        int                        (*set_config)();      /*   408     4 */
        int                        (*hard_header_cache)(); /*   412     4 */
        void                       (*header_cache_update)(); /*   416     4 */
        int                        (*change_mtu)();      /*   420     4 */
        void                       (*tx_timeout)();      /*   424     4 */
        void                       (*vlan_rx_register)(); /*   428     4 */
        void                       (*vlan_rx_add_vid)(); /*   432     4 */
        void                       (*vlan_rx_kill_vid)(); /*   436     4 */
        int                        (*hard_header_parse)(); /*   440     4 */
        int                        (*neigh_setup)();     /*   444     4 */
        struct net_bridge_port *   br_port;              /*   448     4 */
        struct class_device        class_dev;            /*   452   144 */
        struct attribute_group *   sysfs_groups[3];      /*   596    12 */
}; /* size: 608, sum members: 605, holes: 2, sum holes: 3 */

	The git repo for pahole & the other dwarves is at:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/pahole.git

	For those wanting to just look at the cset comments (best
documentation available so far ;)):

http://www.kernel.org/git/?p=linux/kernel/git/acme/pahole.git;a=summary

Best Regards,

- Arnaldo

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] networking structure holes
  2006-10-31 20:34 [RFC] networking structure holes Arnaldo Carvalho de Melo
@ 2006-10-31 21:04 ` Eric Dumazet
  2006-11-02  2:31   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2006-10-31 21:04 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: David S. Miller, netdev

Arnaldo Carvalho de Melo a écrit :
> Hi,
> 
> 	I've been working on some DWARF2 utilities and one of them,
> pahole (Poke-a-Hole) can be used to find holes due to alignment rules
> in structs, the full output of:
> 
> [acme@newtoy net-2.6]$ pahole net/ipv4/tcp.o
> 
> 	is available at:
> 
> http://oops.merseine.nu:81/acme/net.ipv4.tcp.o.pahole
> 
> 	Just to show what we can find with this tool here is the layout
> of struct net_device, that barring any cacheline locality optimization
> has 4 bytes to harvest, David, do you think reordering those fields to
> get 4 byts back is ok?

I just want to bring your attention this net_device structure was re-ordered 
(by me :)) so that separate cache lines are used on SMP machines.

If you select CONFIG_SMP , you'll probably notice far more holes. But it was a 
feature, not lazyness.

We can probably move some fields, but very carefully :)

> 
> [acme@newtoy net-2.6]$ pahole net/ipv4/tcp.o net_device
> 
> /* /pub/scm/linux/kernel/git/acme/net-2.6/include/linux/skbuff.h:87 */
> struct net_device {
>         char                       name[16];             /*     0    16 */
>         struct hlist_node          name_hlist;           /*    16     8 */
>         long unsigned int          mem_end;              /*    24     4 */
>         long unsigned int          mem_start;            /*    28     4 */
>         long unsigned int          base_addr;            /*    32     4 */
>         unsigned int               irq;                  /*    36     4 */
>         unsigned char              if_port;              /*    40     1 */
>         unsigned char              dma;                  /*    41     1 */
> 
>         /* XXX 2 bytes hole, try to pack */
> 
>         long unsigned int          state;                /*    44     4 */
>         struct net_device *        next;                 /*    48     4 */
>         int                        (*init)();            /*    52     4 */
>         long unsigned int          features;             /*    56     4 */
>         struct net_device *        next_sched;           /*    60     4 */
>         int                        ifindex;              /*    64     4 */
>         int                        iflink;               /*    68     4 */
>         struct net_device_stats *  (*get_stats)();       /*    72     4 */
>         const struct iw_handler_def  * wireless_handlers;    /*    76     4 */
>         struct iw_public_data *    wireless_data;        /*    80     4 */
>         const struct ethtool_ops  * ethtool_ops;          /*    84     4 */
>         unsigned int               flags;                /*    88     4 */
>         short unsigned int         gflags;               /*    92     2 */
>         short unsigned int         priv_flags;           /*    94     2 */
>         short unsigned int         padded;               /*    96     2 */
>         unsigned char              operstate;            /*    98     1 */
>         unsigned char              link_mode;            /*    99     1 */
>         unsigned int               mtu;                  /*   100     4 */
>         short unsigned int         type;                 /*   104     2 */
>         short unsigned int         hard_header_len;      /*   106     2 */
>         struct net_device *        master;               /*   108     4 */
>         unsigned char              perm_addr[32];        /*   112    32 */
>         unsigned char              addr_len;             /*   144     1 */
> 
>         /* XXX 1 bytes hole, try to pack */
> 
>         short unsigned int         dev_id;               /*   146     2 */
>         struct dev_mc_list *       mc_list;              /*   148     4 */
>         int                        mc_count;             /*   152     4 */
>         int                        promiscuity;          /*   156     4 */
>         int                        allmulti;             /*   160     4 */
>         void *                     atalk_ptr;            /*   164     4 */
>         void *                     ip_ptr;               /*   168     4 */
>         void *                     dn_ptr;               /*   172     4 */
>         void *                     ip6_ptr;              /*   176     4 */
>         void *                     ec_ptr;               /*   180     4 */
>         void *                     ax25_ptr;             /*   184     4 */
>         struct list_head           poll_list;            /*   188     8 */
>         int                        (*poll)();            /*   196     4 */
>         int                        quota;                /*   200     4 */
>         int                        weight;               /*   204     4 */
>         long unsigned int          last_rx;              /*   208     4 */
>         unsigned char              dev_addr[32];         /*   212    32 */
>         unsigned char              broadcast[32];        /*   244    32 */
>         spinlock_t                 queue_lock;           /*   276     0 */
>         struct Qdisc *             qdisc;                /*   276     4 */
>         struct Qdisc *             qdisc_sleeping;       /*   280     4 */
>         struct list_head           qdisc_list;           /*   284     8 */
>         long unsigned int          tx_queue_len;         /*   292     4 */
>         struct sk_buff *           gso_skb;              /*   296     4 */
>         spinlock_t                 ingress_lock;         /*   300     0 */
>         struct Qdisc *             qdisc_ingress;        /*   300     4 */
>         spinlock_t                 _xmit_lock;           /*   304     0 */
>         int                        xmit_lock_owner;      /*   304     4 */
>         void *                     priv;                 /*   308     4 */
>         int                        (*hard_start_xmit)(); /*   312     4 */
>         long unsigned int          trans_start;          /*   316     4 */
>         int                        watchdog_timeo;       /*   320     4 */
>         struct timer_list          watchdog_timer;       /*   324    24 */
>         atomic_t                   refcnt;               /*   348     4 */
>         struct list_head           todo_list;            /*   352     8 */
>         struct hlist_node          index_hlist;          /*   360     8 */
>         enum                       reg_state;            /*   368     4 */
>         void                       (*uninit)();          /*   372     4 */
>         void                       (*destructor)();      /*   376     4 */
>         int                        (*open)();            /*   380     4 */
>         int                        (*stop)();            /*   384     4 */
>         int                        (*hard_header)();     /*   388     4 */
>         int                        (*rebuild_header)();  /*   392     4 */
>         void                       (*set_multicast_list)(); /*   396     4 */
>         int                        (*set_mac_address)(); /*   400     4 */
>         int                        (*do_ioctl)();        /*   404     4 */
>         int                        (*set_config)();      /*   408     4 */
>         int                        (*hard_header_cache)(); /*   412     4 */
>         void                       (*header_cache_update)(); /*   416     4 */
>         int                        (*change_mtu)();      /*   420     4 */
>         void                       (*tx_timeout)();      /*   424     4 */
>         void                       (*vlan_rx_register)(); /*   428     4 */
>         void                       (*vlan_rx_add_vid)(); /*   432     4 */
>         void                       (*vlan_rx_kill_vid)(); /*   436     4 */
>         int                        (*hard_header_parse)(); /*   440     4 */
>         int                        (*neigh_setup)();     /*   444     4 */
>         struct net_bridge_port *   br_port;              /*   448     4 */
>         struct class_device        class_dev;            /*   452   144 */
>         struct attribute_group *   sysfs_groups[3];      /*   596    12 */
> }; /* size: 608, sum members: 605, holes: 2, sum holes: 3 */
> 
> 	The git repo for pahole & the other dwarves is at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/pahole.git
> 
> 	For those wanting to just look at the cset comments (best
> documentation available so far ;)):
> 
> http://www.kernel.org/git/?p=linux/kernel/git/acme/pahole.git;a=summary
> 
> Best Regards,
> 
> - Arnaldo
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] networking structure holes
  2006-10-31 21:04 ` Eric Dumazet
@ 2006-11-02  2:31   ` Arnaldo Carvalho de Melo
  2006-11-02  6:55     ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Arnaldo Carvalho de Melo @ 2006-11-02  2:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

On 10/31/06, Eric Dumazet <dada1@cosmosbay.com> wrote:
> Arnaldo Carvalho de Melo a écrit :
> > Hi,
> >
> >       I've been working on some DWARF2 utilities and one of them,
> > pahole (Poke-a-Hole) can be used to find holes due to alignment rules
> > in structs, the full output of:
> >
> > [acme@newtoy net-2.6]$ pahole net/ipv4/tcp.o
> >
> >       is available at:
> >
> > http://oops.merseine.nu:81/acme/net.ipv4.tcp.o.pahole
> >
> >       Just to show what we can find with this tool here is the layout
> > of struct net_device, that barring any cacheline locality optimization
> > has 4 bytes to harvest, David, do you think reordering those fields to
> > get 4 byts back is ok?
>
> I just want to bring your attention this net_device structure was re-ordered
> (by me :)) so that separate cache lines are used on SMP machines.
>
> If you select CONFIG_SMP , you'll probably notice far more holes. But it was a
> feature, not lazyness.

Thanks for commenting on this case!

> We can probably move some fields, but very carefully :)

Of course, in time I probably will try to combine valgrind's
cachegrind or some new tool using the same principles I used in OSTRA
to find out working sets of struct members to do automatic
"suggestions" on how to reorder structs to avoid holes while keeping
the relevant struct members close together as to exploit cacheline
locality effects, like you do so well manually :-)

- Arnaldo

PS.: While we don't have tools to check out that the holes are not a
problem because we want to exploit cacheline locality effects... what
about some comments on the structs to explain that such holes are not
a problem? :-)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] networking structure holes
  2006-11-02  2:31   ` Arnaldo Carvalho de Melo
@ 2006-11-02  6:55     ` Eric Dumazet
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2006-11-02  6:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: David S. Miller, netdev

Arnaldo Carvalho de Melo a écrit :
> On 10/31/06, Eric Dumazet <dada1@cosmosbay.com> wrote:
>> Arnaldo Carvalho de Melo a écrit :
>> > Hi,
>> >
>> >       I've been working on some DWARF2 utilities and one of them,
>> > pahole (Poke-a-Hole) can be used to find holes due to alignment rules
>> > in structs, the full output of:
>> >
>> > [acme@newtoy net-2.6]$ pahole net/ipv4/tcp.o
>> >
>> >       is available at:
>> >
>> > http://oops.merseine.nu:81/acme/net.ipv4.tcp.o.pahole
>> >
>> >       Just to show what we can find with this tool here is the layout
>> > of struct net_device, that barring any cacheline locality optimization
>> > has 4 bytes to harvest, David, do you think reordering those fields to
>> > get 4 byts back is ok?
>>
>> I just want to bring your attention this net_device structure was 
>> re-ordered
>> (by me :)) so that separate cache lines are used on SMP machines.
>>
>> If you select CONFIG_SMP , you'll probably notice far more holes. But 
>> it was a
>> feature, not lazyness.
> 
> Thanks for commenting on this case!
> 
>> We can probably move some fields, but very carefully :)
> 
> Of course, in time I probably will try to combine valgrind's
> cachegrind or some new tool using the same principles I used in OSTRA
> to find out working sets of struct members to do automatic
> "suggestions" on how to reorder structs to avoid holes while keeping
> the relevant struct members close together as to exploit cacheline
> locality effects, like you do so well manually :-)
> 
> - Arnaldo
> 
> PS.: While we don't have tools to check out that the holes are not a
> problem because we want to exploit cacheline locality effects... what
> about some comments on the structs to explain that such holes are not
> a problem? :-)

I am all for automatic tools, if they can convince human beings :)

For example, I am using an optimization that is quite simple but which was not 
accepted by netdev community :

- Moving the struct flowi directly into "struct dst_entry", right after the 
'struct dst_entry *next;' pointer.


AFAIK all objects that include a 'struct dst_entry' also include a 'struct 
flowi', so this is just a small violation of layering.

This really helps because lookups now touch only one cache line per chained 
item instead of two/three. On loaded routers with 8 items per chain, thats 8 
or 16 cache lines CPU dont have to bring in its cache per IP packet.

Eric

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2006-11-02  6:56 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-31 20:34 [RFC] networking structure holes Arnaldo Carvalho de Melo
2006-10-31 21:04 ` Eric Dumazet
2006-11-02  2:31   ` Arnaldo Carvalho de Melo
2006-11-02  6:55     ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).