* skb_shared_info()
From: David Miller @ 2006-08-08 23:39 UTC
To: netdev; +Cc: herbert, johnpol
I'm beginning to think that where we store the
skb_shared_info() is a weakness of the SKB design.
It makes it more difficult to have local memory
management schemes and to just wrap SKB's around
arbitrary pieces of data.

The e1000 issue is just one example of this; another
would be any attempt to consolidate the TCP retransmit
queue data management.

Data refcounting and destruction are the primary matters
to sort out if we want to move skb_shared_info()
somewhere else.  But that can probably be handled with a
data destructor callback of some sort, which we're always
talking about adding anyway. :)
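[Editor's note: a minimal sketch of what such a data destructor hook
might look like.  Purely illustrative -- the type and field names below
are invented, not anything the thread or the kernel of the time defined:]

	#include <asm/atomic.h>

	/* Hypothetical: callback invoked once the stack drops its last
	 * reference to a wrapped, externally owned data buffer. */
	typedef void (*skb_data_destructor_t)(void *data, unsigned int len,
					      void *owner);

	struct skb_external_data {
		void			*data;	/* arbitrary caller-owned memory */
		unsigned int		len;
		atomic_t		refcnt;	/* one per skb referencing it */
		skb_data_destructor_t	destroy;
		void			*owner;	/* opaque cookie for the callback */
	};

	static inline void skb_external_data_put(struct skb_external_data *ed)
	{
		if (atomic_dec_and_test(&ed->refcnt))
			ed->destroy(ed->data, ed->len, ed->owner);
	}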
* Re: skb_shared_info()
From: Herbert Xu @ 2006-08-09 0:36 UTC
To: David Miller; +Cc: netdev, johnpol

On Tue, Aug 08, 2006 at 04:39:15PM -0700, David Miller wrote:
> I'm beginning to think that where we store the
> skb_shared_info() is a weakness of the SKB design.

I'm not sure whether the problem is where we store skb_shared_info,
or the fact that we can't put kmalloc'ed memory into
skb_shinfo->frags.

Cheers,
Herbert
* Re: skb_shared_info()
From: David Miller @ 2006-08-09 0:58 UTC
To: herbert; +Cc: netdev, johnpol

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 9 Aug 2006 10:36:16 +1000

> I'm not sure whether the problem is where we store skb_shared_info,
> or the fact that we can't put kmalloc'ed memory into
> skb_shinfo->frags.

That's a good point.  I guess we could do something like:

	struct skb_frag_struct {
		union {
			struct page	*page;
			void		*data;
		} u;
		__u16 page_offset;
		__u16 size;
	};

and something clever like a special page_offset encoding
means "use data, not page".
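[Editor's note: a sketch of how the "special page_offset encoding"
could be read back, assuming an all-ones offset as the sentinel.  The
macro and helper names are invented for illustration:]

	#include <linux/mm.h>

	#define SKB_FRAG_IS_DATA	((__u16)~0U)	/* hypothetical sentinel */

	static inline void *skb_frag_ptr(const struct skb_frag_struct *f)
	{
		if (f->page_offset == SKB_FRAG_IS_DATA)
			return f->u.data;	/* kmalloc'ed buffer */
		/* assumes a lowmem page; highmem would need kmap() */
		return page_address(f->u.page) + f->page_offset;
	}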
* Re: skb_shared_info()
From: Evgeniy Polyakov @ 2006-08-09 5:35 UTC
To: David Miller; +Cc: herbert, netdev

On Tue, Aug 08, 2006 at 05:58:39PM -0700, David Miller (davem@davemloft.net) wrote:
> > I'm not sure whether the problem is where we store skb_shared_info,
> > or the fact that we can't put kmalloc'ed memory into
> > skb_shinfo->frags.
>
> That's a good point.

It is the second problem, actually.  If we want to change memory
usage in skbs, we have problems both with frag_list and with the
place where shared info lives, since it is an add-on: only for 1500
MTU does it leave the alignment unchanged.

> I guess we could do something like:
>
> 	struct skb_frag_struct {
> 		union {
> 			struct page *page;
> 			void *data;
> 		} u;
> 		__u16 page_offset;
> 		__u16 size;
> 	};
>
> and something clever like a special page_offset encoding
> means "use data, not page".

We can separate kmalloced data from pages by using internal page
structures (lru pointers and the PG_slab bit).

-- 
	Evgeniy Polyakov
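[Editor's note: a sketch of the discrimination Evgeniy describes.
PageSlab() and virt_to_page() are real kernel primitives; relying on
them this way assumes the data really does come from slab pages (true
for kmalloc, not for e.g. vmalloc):]

	#include <linux/mm.h>

	static inline int skb_data_is_kmalloced(void *data)
	{
		/* kmalloc() hands out memory from slab pages, which carry
		 * the PG_slab bit; page-cache pages do not. */
		return PageSlab(virt_to_page(data));
	}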
* Re: skb_shared_info()
From: David Miller @ 2006-08-09 5:50 UTC
To: johnpol; +Cc: herbert, netdev

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Wed, 9 Aug 2006 09:35:24 +0400

> We can separate kmalloced data from pages by using internal page
> structures (lru pointers and the PG_slab bit).

Yes, it is one idea.
* Re: skb_shared_info()
From: Evgeniy Polyakov @ 2006-08-10 19:49 UTC
To: David Miller; +Cc: netdev, herbert

On Tue, Aug 08, 2006 at 04:39:15PM -0700, David Miller (davem@davemloft.net) wrote:
> I'm beginning to think that where we store the
> skb_shared_info() is a weakness of the SKB design.

Food for thought: unix sockets can use PAGE_SIZE'd chunks of memory
(and they almost always do), which end up as 2*PAGE_SIZE allocations
because of the skb_shared_info overhead, so unix sockets waste about
3.5 kb of memory on each skb.

I think it is time to resurrect the idea of placing shared_info
inside the skb, and of allowing it to be allocated from its own cache
for special cases.  What do you think?

-- 
	Evgeniy Polyakov
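[Editor's note: the arithmetic behind Evgeniy's figure, worked out for
4K pages as a standalone calculation.  The sizeof value is an
assumption -- the exact waste depends on kernel version and config:]

	#include <stdio.h>

	int main(void)
	{
		unsigned int page_size = 4096;
		unsigned int shinfo = 200;	/* assumed sizeof(struct skb_shared_info) */
		unsigned int need = page_size + shinfo;	/* data + trailing shared info */
		unsigned int alloc = 8192;	/* next power-of-two kmalloc class: 2*PAGE_SIZE */

		/* ~3.8K wasted per skb, roughly the "3.5 kb" cited above */
		printf("wasted: %u bytes per skb\n", alloc - need);
		return 0;
	}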
* Re: skb_shared_info()
From: Alexey Kuznetsov @ 2006-08-11 14:00 UTC
To: David Miller; +Cc: netdev, herbert, johnpol

Hello!

> > management schemes and to just wrap SKB's around
> > arbitrary pieces of data.

> and something clever like a special page_offset encoding
> means "use data, not page".

But for what purpose do you plan to use it?

> The e1000 issue is just one example of this, another

What is this issue?

As for the aggregated tcp queue: I can guess you did not find a place
to add protocol headers, but I cannot figure out how adding
non-pagecache references would help.  You would rather want more than
one skb_shared_info(): at least two, one immutable, another for
headers.

I think Evgeniy's idea about inlining skb_shared_info into the skb
head is promising and simple enough.  The whole point of a shared
skb_shared_info was to make cloning fast.  But it makes lots of sense
to inline some short vector into the skb head (and, probably, even a
MAX_HEADER space _instead_ of the space for fclone).

With an aggregated tcp send queue, when transmitting a segment, you
could allocate a new skb head with space for the header and either
take the existing skb_shared_info from the queue, attach it to the
head and set offset/length, or, alternatively, set one or two of the
page pointers in the array inlined in the head.  (E.g. in the case of
the AF_UNIX socket mentioned by Evgeniy, we would keep data in pages
and attach it directly to the skb head.)  Cloning becomes more
expensive, but who needs it cheap, if tcp does not?

Returning to "arbitrary pieces of data".  Page cache references in
skb_shared_info are a unique thing: get_page()/page_cache_release()
are enough to clone the data.  But that is not enough even for such a
simple thing as splice().  It wants us to remember some strange
"pipe_buffer", where each page is wrapped together with some ops,
flags and even a pipe_inode_info :-), and to call some destructor
when it is released.

The first thought is that this is insane: it does not respect page
cache logic, requires that we implement an additional level of
refcounting, and blows the amount of information which has to be
stored in an skb beyond all limits of sanity.  But the second thought
is that something like this is required in any case.  At least we
must report to someone when a page is no longer in use and can be
recycled.  I think Evgeniy knows more about this; AIO has the same
issue.  But this is simpler, because the release callback can be done
not per fragment or even per skb, but actually per transaction.

One idea is to announce (some) skb_shared_info completely immutable,
and force each layer which needs to add a header or to fragment to
refer to the original skb_shared_info as a whole, using for
modifications another skb_shared_info() or an area inlined in the skb
head.  And if someone is not able to, he must reallocate all the
pages.  In this case the destructor/notification can be done not per
fragment, but for the whole aggregated skb_shared_info.  It seems
this would work both with an aggregated tcp queue and with udp.

				Alexey
* Re: skb_shared_info()
From: David Miller @ 2006-08-12 0:27 UTC
To: kuznet; +Cc: netdev, herbert, johnpol

From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Date: Fri, 11 Aug 2006 18:00:19 +0400

> > The e1000 issue is just one example of this, another
>
> What is this issue?

E1000 wants 16K buffers for jumbo MTU settings.

The reason is that the chip can only handle power-of-2 buffer sizes,
and the next hop up from 9K is 16K.  It is not possible to tell the
chip to accept only 9K packets; you must give it the whole next
power-of-2 buffer size for the MTU you wish to use.  With the
skb_shared_info() overhead this becomes a 32K allocation in the
simplest implementation.

Whichever hardware person was trying to save some trace lines on the
chip should have consulted software folks before implementing things
this way :-)

> What's about aggregated tcp queue, I can guess you did not find
> place where to add protocol headers, but cannot figure out how
> adding non-pagecache references could help.

This is not the idea.  I'm trying to see if we can salvage non-SG
paths in the design.  The idea is that "struct retransmit_queue"
entries could hold either paged or non-paged data, based upon the
capabilities of the transmit device.

If we store raw kmalloc buffers, we cannot attach them to an
arbitrary skb because of skb_shared_info().  This is true even if we
purposefully allocate the necessary head room for these kmalloc based
buffers.  Its requirement to live in the skb->data area really does
preclude any kind of clever buffering scheme.

> I think Evgeniy's idea about inlining skb_shared_info to skb head
> is promising and simple enough.

I think you are talking about the "struct sk_buff" area when you say
"skb head".  It is confusing, because when I hear this phrase my
brain says "skb->head", which is exactly where we want to move
skb_shared_info() away from! :-)

> But it makes lots of sense to inline some short vector into skb head
> (and, probably, even a MAX_HEADER space _instead_ of space for
> fclone).

If inlined, one implementation of the retransmit queue becomes
apparent: "struct retransmit_queue" is just a list of data blobs,
each represented by the usual vector of pages plus offset and length.
Then transmission from the write queue is merely propagating pages to
the inline skb vector.  Some silly sample datastructures:

	struct retransmit_block {
		struct list_head	rblk_node;
		void			*rblk_data;
		unsigned short		rblk_frags;
		skb_frag_t		frags[MAX_SKB_FRAGS];
	};

	struct retransmit_queue {
		struct list_head	rqueue_head;
		struct retransmit_block	*rqueue_send_head;
		int			rqueue_send_head_frag;
		unsigned int		rqueue_send_head_off;
	};

tcp_sendmsg() and tcp_sendpage() just accumulate into the tail
retransmit_block until all of MAX_SKB_FRAGS are consumed.
tcp_write_xmit() and friends build skbs like this:

	struct sk_buff *tcp_xmit_build(struct retransmit_block *rblk,
				       int frag, unsigned int off,
				       unsigned int len)
	{
		struct sk_buff *skb = alloc_skb(MAX_HEADER, GFP_KERNEL);
		int ent;

		if (unlikely(!skb))
			return NULL;

		ent = 0;
		while (len) {
			unsigned int this_off = rblk->frags[frag].page_offset + off;
			unsigned int this_len = rblk->frags[frag].size - off;

			if (this_len > len)
				this_len = len;
			skb->inline_info.frags[ent].page = rblk->frags[frag].page;
			skb->inline_info.frags[ent].page_offset = this_off;
			skb->inline_info.frags[ent].size = this_len;
			ent++;
			frag++;
			off = 0;
			len -= this_len;
		}
		return skb;
	}

(Sorry, another outer loop is also needed, to traverse to subsequent
retransmit_blocks in the list when all of the rblk_frags of the
current retransmit_block are consumed by the inner loop.)

Depending upon how we do completion callbacks, as you discuss below,
we may or may not need a get_page() refcount grab in that inner loop.

> With aggregated tcp send queue, when transmitting a segment, you could
> allocate new skb head with space for header and either take existing
> skb_shared_info from queue, attach it to head and set offset/length.
> Or, alternatively, set one or two of page pointers in array, inlined in head.
> (F.e. in the case of AF_UNIX socket, mentioned by Evgeniy, we would keep data
> in pages and attach it directly to skb head).

The latter scheme is closer to what I was thinking about.  Why not
inline this entire fraglist thing alongside sk_buff?

> Cloning becomes more expensive, but who needs it cheap, if tcp does not?

Exactly :)

> One idea is to announce (some) skb_shared_info completely immutable,
> force each layer who needs to add a header or to fragment to refer
> to original skb_shared_info as whole, using for modifications
> another skb_shared_info() or area inlined in skb head.
> And if someone is not able to, he must reallocate all the pages.
> In this case destructor/notification can be done not for fragment,
> but for whole aggregated skb_shared_info.

It sounds interesting... so using my sample datastructures above for
the aggregated tcp queue, such notifications would be made on (for
example) "retransmit_block" objects?
* Re: skb_shared_info()
From: Herbert Xu @ 2006-08-12 0:32 UTC
To: David Miller; +Cc: kuznet, netdev, johnpol

On Fri, Aug 11, 2006 at 05:27:30PM -0700, David Miller wrote:
> E1000 wants 16K buffers for jumbo MTU settings.
>
> The reason is that the chip can only handle power-of-2 buffer
> sizes, and the next hop up from 9K is 16K.
>
> It is not possible to tell the chip to accept only 9K packets;
> you must give it the whole next power-of-2 buffer size for the
> MTU you wish to use.
>
> With the skb_shared_info() overhead this becomes a 32K allocation
> in the simplest implementation.

I think this is no longer an issue, because we've all come to the
conclusion that E1000 supports SG and therefore we can and should use
4K pages, no?

Whatever we do here, allocating 16K via kmalloc is very unlikely to
succeed in the near future, so we have to go via the SG route anyway.
In which case we must have a head area which can accommodate the
skb_shared_info, as we do now.

Cheers,
Herbert
* Re: skb_shared_info()
From: Alexey Kuznetsov @ 2006-08-13 13:04 UTC
To: David Miller; +Cc: netdev, herbert, johnpol

Hello!

> E1000 wants 16K buffers for jumbo MTU settings.
>
> The reason is that the chip can only handle power-of-2 buffer
> sizes, and the next hop up from 9K is 16K.

Let it use pages.  Someone should start. :-)  High order allocations
are a disaster in any case.

> If we store raw kmalloc buffers, we cannot attach them to an arbitrary
> skb because of skb_shared_info().  This is true even if we
> purposefully allocate the necessary head room for these kmalloc based
> buffers.

I still do not see it.  For non-SG paths you have to keep the header
and data together; you just do not need anything clever, and it does
not look like you can do anything clever anyway.  If you have a fixed
space for the header, you already have space for a refcnt to make an
skb_shared_info().  For SG paths, TCP_PAGE() seems to be enough, and
even better than kmalloc()ed buffers.

> I think you are talking about "struct sk_buff"

Yes. :-)

> If inlined, one implementation of retransmit queue becomes apparent.
> "struct retransmit_queue" is just a list of data blobs, each represented
> by the usual vector of pages plus offset and length.
>
> Then transmission from the write queue is merely propagating pages to
> the inline skb vector.

Yes.  I have no idea how good it is, but at least it is something
different from what we have now. :-)  It seems too simple to be
correct, though; see below.

> The latter scheme is closer to what I was thinking about.  Why
> not inline this entire fraglist thing alongside sk_buff?

Sorry, that crap about the cost of cloning still clouds my mind. :-)
Actually, the borderline was not between "short" and "long", but
between "plain page vector" and "not-so-plain".  See below.

> It sounds interesting... so using my sample datastructures above
> for the aggregated tcp queue, such notifications would be made on
> (for example) "retransmit_block" objects?

Not quite.  TCP is too easy a case.  If you remember, I did some ugly
but 100% functional zero-copy send() without any "architectural"
changes: async_seq on the socket side, and the SIOCOUTQ/SIOCOUTWAITQ
+ POLLWRCOMP extension to poll() on the api side, were enough to
notify when to recycle pages.

This sucks for two reasons:

1. A small one.  Someone, a packet sniffer for instance, could hold
those pages after they are acked.  If they are recycled, it will see
wrong content.  It seems we can live with this; TCP works, and that
is all that is really important.

2. Non-TCP.  No queue.  Notifications have to be done from the skb
destructor, which means the skb has to carry all the info somewhere.
With UDP it could be the original, not-fragmented skb_shared_info().
Provided we prohibit changing it and cloning the page references, the
point of recycle is when this skb_shared_info() is destroyed.  This
is one place where skb_shared_info() does not want to be inlined.

For TCP the idea was to do the same.  If a transmitted skb does not
have separate pages in it, referring to an aggregated
skb_shared_info() instead, it would work.  This is the second such
place.

That was the first approach.  Probably the best one, but utterly
inflexible: e.g. it is impossible to merge two tcp segments from
different sources.  I am afraid per-fragment destructors are still
required, and this is so not cool. :-)
Carrying in each skb space for 18 destructors and the necessary data
looks too scary.  I see two ways to do this, both treating the case
when destructors are required as exceptional:

1. Multiple skb_shared_info() per skb.  Some of them are nothing but
a wrapper for a page, with a refcounter and the information necessary
for destruction.  So in each skb you have a vector:

	struct {
		union {
			struct page		*page;
			struct skb_shared_info	*shinfo;
		} datum;
		unsigned int offset, size;  /* selects an area inside datum */
	} data[MAX_SKB_FRAGS_INLINED];

The pointers could be distinguished by a selector hidden in the low
bits.  Sorry. :)

struct skb_shared_info is essentially preserved; only some ops and
space for the information needed to clone/release are added, and
frag_list is removed.  This seems flexible enough and even reduces
overhead sometimes (for SG you have only struct sk_buff with
MAX_HEADER inlined and direct references to pages).  But the code
promises to be ugly: sometimes page references are direct, sometimes
they are indirect.

2. Another idea is to preserve the current structure, but for those
"pathological" cases when special notifications are required, to
allocate an optional additional vector which could hold "transaction
tokens" -- crap of the pipe_inode_info sort used by splice(),
arbitrarily obscure and complicated.  The advantage is that it keeps
"normal" skbs small and clean.

				Alexey
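[Editor's note: a sketch of the low-bit selector Alexey hides in the
pointer.  Both struct page and struct skb_shared_info are at least
word-aligned, so bit 0 is free to serve as the type tag.  All names
here are invented for illustration:]

	struct page;
	struct skb_shared_info;

	#define SKB_DATUM_INDIRECT	1UL	/* bit 0 set => shinfo, clear => page */

	static inline int skb_datum_is_shinfo(unsigned long datum)
	{
		return datum & SKB_DATUM_INDIRECT;
	}

	static inline struct page *skb_datum_page(unsigned long datum)
	{
		return (struct page *)datum;	/* tag bit already clear */
	}

	static inline struct skb_shared_info *skb_datum_shinfo(unsigned long datum)
	{
		return (struct skb_shared_info *)(datum & ~SKB_DATUM_INDIRECT);
	}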
* Re: skb_shared_info()
From: Evgeniy Polyakov @ 2006-08-14 8:01 UTC
To: Alexey Kuznetsov; +Cc: David Miller, netdev, herbert

On Sun, Aug 13, 2006 at 05:04:31PM +0400, Alexey Kuznetsov (kuznet@ms2.inr.ac.ru) wrote:
> Hello!
>
> > E1000 wants 16K buffers for jumbo MTU settings.
> >
> > The reason is that the chip can only handle power-of-2 buffer
> > sizes, and the next hop up from 9K is 16K.
>
> Let it use pages.  Someone should start. :-)
>
> High order allocations are a disaster in any case.
>
> > If we store raw kmalloc buffers, we cannot attach them to an arbitrary
> > skb because of skb_shared_info().  This is true even if we
> > purposefully allocate the necessary head room for these kmalloc based
> > buffers.
>
> I still do not see it.  For non-SG paths you have to keep the header
> and data together; you just do not need anything clever, and it does
> not look like you can do anything clever anyway.  If you have a fixed
> space for the header, you already have space for a refcnt to make an
> skb_shared_info().
>
> For SG paths, TCP_PAGE() seems to be enough, and even better than
> kmalloc()ed buffers.

What if we store an array of pages in the shared info, just like we
have right now?  Then there will be only one destructor.

e1000 would set up the head/data/tail pointers to point into the
first sg page.  Those drivers which do not have sg support can use
the head/data pointers like before, but the destructor would have to
be changed in that case.

-- 
	Evgeniy Polyakov
* Re: skb_shared_info()
From: Alexey Kuznetsov @ 2006-08-14 13:19 UTC
To: Evgeniy Polyakov; +Cc: David Miller, netdev, herbert

Hello!

> e1000 would set up the head/data/tail pointers to point into the
> first sg page.

Maybe.  But I still hope this is not necessary: the driver should be
able to do at least primitive header splitting, in which case the
header could be inlined into the skb.  Alternatively, the header can
be copied with skb_pull().  Not good, but it is at least a regular
procedure, which e.g. allows keeping the pages in highmem.

> What if we store an array of pages in the shared info, just like we
> have right now?  Then there will be only one destructor.

Seems this is not enough.  Actually, I was asking for your experience
with aio.  The simplest situation which confuses me, translated to
aio language: two senders send two pages.  Those pages are to be
recycled when we are done.  We have to notify both.  And we have to
merge both those pages into one segment.

The second half of that mail suggested three different solutions, all
of them creepy. :-)

				Alexey
* Re: skb_shared_info()
From: Evgeniy Polyakov @ 2006-08-14 13:38 UTC
To: Alexey Kuznetsov; +Cc: David Miller, netdev, herbert

On Mon, Aug 14, 2006 at 05:19:19PM +0400, Alexey Kuznetsov (kuznet@ms2.inr.ac.ru) wrote:
> > What if we store an array of pages in the shared info, just like we
> > have right now?  Then there will be only one destructor.
>
> Seems this is not enough.  Actually, I was asking for your experience
> with aio.
>
> The simplest situation which confuses me, translated to aio language:
> two senders send two pages.  Those pages are to be recycled when we
> are done.  We have to notify both.  And we have to merge both those
> pages into one segment.

Since they will be sent from different users, they will be separate
chunks, each of which has its own destructor -- so yes, one
destructor per page in the described case, although in general it
will be one per block of pages.  Blocks can then be combined into a
queue with additional blocks of kmalloced data...  Hmm, that does not
sound very good.

> The second half of that mail suggested three different solutions,
> all of them creepy. :-)

I still like the existing way -- it is much simpler (I hope :) to
convince the e1000 developers to fix the driver's memory usage with
the help of putting skb_shared_info(), or only a pointer to it, into
the skb.  With a pointer it would be simpler to do the refcounting,
but it requires an additional allocation, which is not cheap.
Inlining skb_shared_info can lead to trouble with cloning/sharing,
although that is also possible to do.

The idea of having the header inside struct sk_buff is good until we
hit something which creates a header whose size exceeds the
preallocated area (is MAX_TCP_HEADER enough for everyone?  what about
ipv6 options?)...

So as a first step I would try to create an extended alloc_skb() with
a pointer to shared_info (crap) (in the case of appropriate sizes,
i.e. when the aligned size exceeds a power of two and so on), and let
e1000 and unix sockets use that.  If there are additional
requirements for other global changes like you suggested, we can
break things more fundamentally :)

-- 
	Evgeniy Polyakov
* Re: skb_shared_info()
From: Alexey Kuznetsov @ 2006-08-15 11:30 UTC
To: Evgeniy Polyakov; +Cc: David Miller, netdev, herbert

Hello!

> I still like the existing way -- it is much simpler (I hope :) to
> convince the e1000 developers to fix the driver's memory usage

e1000 is not a problem at all.  It just has to use pages.  If it is
going to use high order allocations, it will suck, be it order 3 or
order 2.

> area (is MAX_TCP_HEADER enough for everyone?  what about ipv6 options?)...

This does not matter.  All the cases with pathologically extended
headers go through the slow path with a non-trivial pskb_pull().
That is what all those options were invented for.

> unix
> sockets use that.

Totally not worth it, unless it is part of something bigger.

> global changes like you suggested, we can break things more
> fundamentally :)

_You_ have to. :-)  Sender zerocopy stalled only because of this.

				Alexey
* Re: skb_shared_info()
From: Andi Kleen @ 2006-08-14 5:04 UTC
To: David Miller; +Cc: netdev, herbert, johnpol

> The e1000 issue is just one example of this, another
> would be any attempt to consolidate the TCP retransmit
> queue data management.

Another reason to move it into the sk_buff would be better cache
coloring.  Currently, with both large and small MTU packets, it
always lands on the same colors.

-Andi
* Re: skb_shared_info()
From: Herbert Xu @ 2006-08-14 7:29 UTC
To: Andi Kleen; +Cc: David Miller, netdev, johnpol

On Mon, Aug 14, 2006 at 07:04:01AM +0200, Andi Kleen wrote:
> Another reason to move it into the sk_buff would be better cache
> coloring.  Currently, with both large and small MTU packets, it
> always lands on the same colors.

If we went with fully paged skbs this should be a non-issue, right?
In a fully paged representation the head would be small, which is the
perfect place to put the skb_shared_info.

Cheers,
Herbert
* Re: skb_shared_info()
From: Andi Kleen @ 2006-08-14 7:45 UTC
To: Herbert Xu; +Cc: David Miller, netdev, johnpol

On Monday 14 August 2006 09:29, Herbert Xu wrote:
> If we went with fully paged skbs this should be a non-issue, right?

Even for 1.5k MTU?  (Which is still the most common case, after all.)

> In a fully paged representation the head would be small, which is the
> perfect place to put the skb_shared_info.

If the head is small and allocated with kmalloc, then slab should
color it, yes.  It won't for big allocations.

-Andi
* Re: skb_shared_info()
From: Herbert Xu @ 2006-08-14 7:50 UTC
To: Andi Kleen; +Cc: David Miller, netdev, johnpol

On Mon, Aug 14, 2006 at 09:45:53AM +0200, Andi Kleen wrote:
> Even for 1.5k MTU?  (Which is still the most common case, after all.)

Ideally they would stay in kmalloc memory.  Could you explain the
cache colouring problem for 1500-byte packets?

Cheers,
Herbert
* Re: skb_shared_info()
From: Andi Kleen @ 2006-08-14 8:18 UTC
To: Herbert Xu; +Cc: David Miller, netdev, johnpol

On Monday 14 August 2006 09:50, Herbert Xu wrote:
> Ideally they would stay in kmalloc memory.  Could you explain the
> cache colouring problem for 1500-byte packets?

kmalloc rounds the 1.5k up to 2k, and with that exactly two objects
fit into a 4k page.  That leaves slab no space to do cache coloring.

-Andi
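[Editor's note: the arithmetic behind Andi's point, as a standalone
calculation.  The 2048-byte figure is the kmalloc size class that a
~1.5k skb data area falls into:]

	#include <stdio.h>

	int main(void)
	{
		unsigned int page = 4096;	/* one slab page */
		unsigned int obj = 2048;	/* kmalloc class for ~1.5k + overhead */
		unsigned int per_page = page / obj;		/* exactly 2 objects */
		unsigned int slack = page - per_page * obj;	/* 0 bytes left over */

		/* With zero slack, slab cannot shift objects to different
		 * starting offsets, so every skb data area begins at offset
		 * 0 or 2048 in its page and maps onto the same cache lines
		 * ("colors"). */
		printf("objects/page=%u coloring slack=%u\n", per_page, slack);
		return 0;
	}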