Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] net: expose vlan structure to user space
From: David Acker @ 2006-04-14 21:10 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20060414.135556.91849056.davem@davemloft.net>

David S. Miller wrote:
> From: David Acker <dacker@roinet.com>
> Date: Fri, 14 Apr 2006 09:15:54 -0400
> 
> 
>>David S. Miller wrote:
>> > I think this really belongs in a portable header file
>> > in glibc somewhere.
>>
>>David, while I can try to work with the glibc folks, I don't understand 
>>how this patch would be different from what if_ether.h exposes to user 
>>space today.  Why is one appropriate and the other not?  I am not trying 
>>to be a pain; I am just trying to understand the rationale so that I can 
>>explain it to others.
> 
> 
> Doing stupid things in the past does not equate to being a reason
> to keep doing so in the future.
> 
> 

Understood.  I will see what I can do in glibc land instead.
-Ack


^ permalink raw reply

* Re: [PATCH 05/10] e1000: Update truesize with the length of the packet for packet split
From: Jeff Garzik @ 2006-04-14 21:16 UTC (permalink / raw)
  To: Kok, Auke
  Cc: netdev, Miller, David, Ronciak, John, Brandeburg, Jesse,
	Kirsher, Jeff, Kok, Auke
In-Reply-To: <20060414183658.3461.4975.stgit@gitlost.site>

Kok, Auke wrote:
> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> index 2cc9955..7ba1b16 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -3770,6 +3770,7 @@ e1000_clean_rx_irq_ps(struct e1000_adapt
>  			ps_page->ps_page[j] = NULL;
>  			skb->len += length;
>  			skb->data_len += length;
> +			skb->truesize += length;

NAK, apparently this is adding a very serious bug.

	Jeff




^ permalink raw reply

*  Ad:¡ºÎÒÊÇÍæ¼Ò¡»£¬ÓÎÏ·Íæ¼ÒµÄÍøÉÏ¼ÒÔ°£¡
From: O31NZsfPyz @ 2006-04-14 21:33 UTC (permalink / raw)
  To: wXFG2bz

¡ºÎÒÊÇÍæ¼Ò¡»ÍøÕ¾ http://iamwanjia.oicp.net/ ÊÇÎªÓÎÏ·Íæ¼ÒÌá¹©Ò»¸ö»¥ÏàÈÏÊ¶£¬½»Á÷£¬¹µÍ¨µÄÒ»¸öÉçÇøÆ½Ì¨¡£

ÔÚÕâ¸öÆ½Ì¨ÖÐ£¬Ä¿Ç°ÓÐ¡°Íæ¼Ò¶Ô¶ÔÅö¡±£¬¡°ÒµÄÚ×îÐÂÏûÏ¢¡±£¬¡°×°±¸µã¿¨½»Ò×½»»»Çø¡±£¬
¡°ÓÎÏ·ÈËÉú¡±£¬ ¡°ÓéÀÖÐÝÏÐÇø¡± µÈ¶ÔÍæ¼Ò¿ª·ÅµÄÖ÷Ìâ¡£

Èç¹ûÄã¶ÔÓÎÏ·ÓÐÐËÈ¤£¬Çë¼ÓÈëÎÒÃÇ£¬ÎÒÃÇÒ»Í¬½¨ÉèÎÒÃÇ×Ô¼ºµÄÍøÉÏ¼ÒÔ°¡£

±¾Õ¾³¤ÆÚÕÐÆ¸°æÖ÷¡£»¶ÓÄã¼ÓÃË¡£



^ permalink raw reply

* [PATCH] Fix crash on big-endian systems during scan
From: Pavel Roskin @ 2006-04-14 21:59 UTC (permalink / raw)
  To: netdev; +Cc: hostap

From: Pavel Roskin <proski@gnu.org>

The original code was doing arithmetics on a little-endian value.
Reported by Stelios Koroneos <stelios@stelioscellar.com>

Signed-off-by: Pavel Roskin <proski@gnu.org>
---

 drivers/net/wireless/hostap/hostap_ioctl.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/hostap/hostap_ioctl.c b/drivers/net/wireless/hostap/hostap_ioctl.c
index 8b37e82..8399de5 100644
--- a/drivers/net/wireless/hostap/hostap_ioctl.c
+++ b/drivers/net/wireless/hostap/hostap_ioctl.c
@@ -1860,7 +1860,7 @@ static char * __prism2_translate_scan(lo
 	memset(&iwe, 0, sizeof(iwe));
 	iwe.cmd = SIOCGIWFREQ;
 	if (scan) {
-		chan = scan->chid;
+		chan = le16_to_cpu(scan->chid);
 	} else if (bss) {
 		chan = bss->chan;
 	} else {
@@ -1868,7 +1868,7 @@ static char * __prism2_translate_scan(lo
 	}
 
 	if (chan > 0) {
-		iwe.u.freq.m = freq_list[le16_to_cpu(chan - 1)] * 100000;
+		iwe.u.freq.m = freq_list[chan - 1] * 100000;
 		iwe.u.freq.e = 1;
 		current_ev = iwe_stream_add_event(current_ev, end_buf, &iwe,
 						  IW_EV_FREQ_LEN);

^ permalink raw reply related

* Re: [XFRM] Fix aevent timer
From: David S. Miller @ 2006-04-14 22:03 UTC (permalink / raw)
  To: hadi; +Cc: netdev, herbert, hidden
In-Reply-To: <1144932959.5262.5.camel@jzny2>

From: jamal <hadi@cyberus.ca>
Date: Thu, 13 Apr 2006 08:55:58 -0400

> Finally got an opportunity to test this. I have incorporated all the
> feedback and it passes all the scenarios i know.
> 
> Dave, please apply to Linus tree since this is a bug fix.

Applied, thanks.

^ permalink raw reply

* Re: [XFRM Doc]: aevent description
From: David S. Miller @ 2006-04-14 22:05 UTC (permalink / raw)
  To: hadi; +Cc: netdev, hidden, herbert
In-Reply-To: <1144933208.5262.9.camel@jzny2>

From: jamal <hadi@cyberus.ca>
Date: Thu, 13 Apr 2006 09:00:08 -0400

> There is dependency on the previous patch i sent since the issue that
> patch fixes is assumed in this text description. It would be a good
> idea to apply at the same time as the other.

Applied, after fixing 28 lines containing trailing whitespace :-)

^ permalink raw reply

* Re: [PATCH] atm: clip causes unregister hang
From: David S. Miller @ 2006-04-14 22:07 UTC (permalink / raw)
  To: shemminger; +Cc: herbert, chas, linux-atm-general, netdev, stable
In-Reply-To: <20060412145254.4dd21be6@localhost.localdomain>

From: Stephen Hemminger <shemminger@osdl.org>
Date: Wed, 12 Apr 2006 14:52:54 -0700

> If Classical IP over ATM module is loaded, its neighbor table gets
> populated when permanent neighbor entries are created; but these entries
> are not flushed when the device is removed. Since the entry never gets
> flushed the unregister of the network device never completes.
> 
> This version of the patch also adds locking around the reference to
> the atm arp daemon to avoid races with events and daemon state changes.
> (Note: barrier() was never really safe)
> 
> Bug-reference: http://bugzilla.kernel.org/show_bug.cgi?id=6295
> Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

Applied, thanks Stephen.

^ permalink raw reply

* Re: [PATCH 05/10] e1000: Update truesize with the length of the packet for packet split
From: Jesse Brandeburg @ 2006-04-14 22:13 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Kok, Auke, netdev, Miller, David, Ronciak, John,
	Brandeburg, Jesse, Kirsher, Jeff, Kok, Auke
In-Reply-To: <4440112D.9040509@pobox.com>

On Fri, 14 Apr 2006, Jeff Garzik wrote:

> Kok, Auke wrote:
> > diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> > index 2cc9955..7ba1b16 100644
> > --- a/drivers/net/e1000/e1000_main.c
> > +++ b/drivers/net/e1000/e1000_main.c
> > @@ -3770,6 +3770,7 @@ e1000_clean_rx_irq_ps(struct e1000_adapt
> >  			ps_page->ps_page[j] = NULL;
> >  			skb->len += length;
> >  			skb->data_len += length;
> > +			skb->truesize += length;
> 
> NAK, apparently this is adding a very serious bug.

I'm surprised to hear you say that, here's why.  Elaboration would be 
helpful.

our packet split receive code uses two buffers for any packet the hardware 
was able to split.  The skb when we're receiving say a 1000 byte IPv4 
packet looks like this:

skb -
    |- len = total packet length (1000)
    |- data_len = length in skb_shinfo(skb)->frags[] (1000 - headerlen)
    |- truesize = size of originally allocated skb (128 + NET_IP_ALIGN)

in this case, the stack is using truesize to figure out how much to charge 
the receive socket for our receive packet.

in the old driver we indicate a packet with truesize = 128, which has 1000 
bytes of data in it.  normally (for everyone using only an skb) skb->len, 
and skb->truesize are updated by skb_put();  In this case we don't have a 
kernel support function for adding length to skb->len *and* to 
skb->truesize when chaining on something to ->frags[]

see the code in do_tcp_sendpages for an example of what the stack does in 
setting up a packet for transmit using frags[]

it updates skb->len and skb->truesize, right after calling 
skb_fill_page_desc, just like we do.

interesingly, that is right where sk_forward_alloc is modified (but thats 
another thread)

Jesse

^ permalink raw reply

* Re: [PATCH 05/10] e1000: Update truesize with the length of the packet for packet split
From: David S. Miller @ 2006-04-14 22:32 UTC (permalink / raw)
  To: jesse.brandeburg
  Cc: jgarzik, auke-jan.h.kok, netdev, john.ronciak, Jeffrey.t.kirsher,
	auke
In-Reply-To: <Pine.WNT.4.63.0604141448520.2116@jbrandeb-desk.amr.corp.intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date: Fri, 14 Apr 2006 15:13:31 -0700 (Pacific Daylight Time)

> in the old driver we indicate a packet with truesize = 128, which has 1000 
> bytes of data in it.  normally (for everyone using only an skb) skb->len, 
> and skb->truesize are updated by skb_put();  In this case we don't have a 
> kernel support function for adding length to skb->len *and* to 
> skb->truesize when chaining on something to ->frags[]

Where does skb_put() update truesize?

^ permalink raw reply

* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Jesse Brandeburg @ 2006-04-14 22:32 UTC (permalink / raw)
  To: David S. Miller
  Cc: jesse.brandeburg, bb, herbert, kernel, nipsy, jrlundgren, cat,
	djani22, yoseph.basri, mykleb, olel, michal, chris, netdev,
	jesse.brandeburg, E1000-devel, ak, jgarzik
In-Reply-To: <20060414.140225.113553109.davem@davemloft.net>



On Fri, 14 Apr 2006, David S. Miller wrote:

> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Date: Fri, 14 Apr 2006 13:28:10 -0700
> 
> > We also have some new data from the last couple of days.  First, I think 
> > that this problem is likely not just E1000's fault.  We have multiple 
> > reports both in bugzilla.kernel.org and from a distro that show this 
> > problem has occurred on (at least) tg3 driven adapters as well as e1000.
> 
> That's interesting since tg3 does not enable TSO by default for any
> chip until very recent versions of the tg3 driver.  And even with
> those recent tg3 drivers which will enable TSO by default, it is only
> done for a very specific selection of chip revisions.
> 
> Where are these reports that precisely implicate tg3?

well there was one of them here, but the tg3 bit may actually be due to 
the 2.6.14 problems.

http://bugzilla.kernel.org/show_bug.cgi?id=6279

the other one is buried at a distro's bugzilla, so I can't post it.

> Note that there were legitimate TSO retransmit bugs in the 2.6.14
> timeframe that triggered those messages and did get fixed.  And yes
> some of those reports were with tg3.

I've removed the call to pskb_expand_head in e1000 (yes i know its not 
quite right, but it shouldn't be fatal either) and the bug still occurs.

so its something else.  I'm not giving up on this, I just want to post the 
state of the investigation.

Jesse

^ permalink raw reply

* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: David S. Miller @ 2006-04-14 22:42 UTC (permalink / raw)
  To: jesse.brandeburg
  Cc: bb, herbert, kernel, nipsy, jrlundgren, cat, djani22,
	yoseph.basri, mykleb, olel, michal, chris, netdev,
	jesse.brandeburg, E1000-devel, ak, jgarzik
In-Reply-To: <Pine.WNT.4.63.0604141527420.2116@jbrandeb-desk.amr.corp.intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date: Fri, 14 Apr 2006 15:32:55 -0700 (Pacific Daylight Time)

> well there was one of them here, but the tg3 bit may actually be due to 
> the 2.6.14 problems.
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=6279

There are 2 e1000 gigabit devices in that person's system, and
not one tg3 device.

^ permalink raw reply

* Re: [PATCH 05/10] e1000: Update truesize with the length of the packet for packet split
From: Jesse Brandeburg @ 2006-04-14 22:43 UTC (permalink / raw)
  To: David S. Miller
  Cc: jesse.brandeburg, jgarzik, auke-jan.h.kok, netdev, john.ronciak,
	Jeffrey.t.kirsher, auke
In-Reply-To: <20060414.153219.62111494.davem@davemloft.net>

On Fri, 14 Apr 2006, David S. Miller wrote:

> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Date: Fri, 14 Apr 2006 15:13:31 -0700 (Pacific Daylight Time)
> 
> > in the old driver we indicate a packet with truesize = 128, which has 1000 
> > bytes of data in it.  normally (for everyone using only an skb) skb->len, 
> > and skb->truesize are updated by skb_put();  In this case we don't have a 
> > kernel support function for adding length to skb->len *and* to 
> > skb->truesize when chaining on something to ->frags[]
> 
> Where does skb_put() update truesize?

my mistake, it does not.  but truesize in the "normal" case is already set 
to the full length of the allocated skb (which would hold the whole 
frame).  In the packet split case the received skb looks dramatically 
different, and since we're one of the only drivers using this mechanism.

I could easily be incorrect here, but my interpretation based on the 
stack's use of ->truesize in do_tcp_sendpages is what I based our code 
change on.

Please help me understand how you think it should work when we have a 
device that wants to receive a packet using header in the skb->data, and 
application data in the ->frags[]? 

Jesse

^ permalink raw reply

* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Jesse Brandeburg @ 2006-04-14 22:46 UTC (permalink / raw)
  To: David S. Miller
  Cc: jesse.brandeburg, bb, herbert, kernel, nipsy, jrlundgren, cat,
	djani22, yoseph.basri, mykleb, olel, michal, chris, netdev,
	jesse.brandeburg, E1000-devel, ak, jgarzik
In-Reply-To: <20060414.154240.30125561.davem@davemloft.net>

On Fri, 14 Apr 2006, David S. Miller wrote:

> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Date: Fri, 14 Apr 2006 15:32:55 -0700 (Pacific Daylight Time)
> 
> > well there was one of them here, but the tg3 bit may actually be due to 
> > the 2.6.14 problems.
> > 
> > http://bugzilla.kernel.org/show_bug.cgi?id=6279
> 
> There are 2 e1000 gigabit devices in that person's system, and
> not one tg3 device.

sure, thats fine, but we just reproduced it in two seperate systems 
without the e1000 driver loaded, using the instructions as mentioned in a 
previous email.  We used a 5704 with TSO enabled.

I'm willing to try any debug patches you might come up with.

Jesse

^ permalink raw reply

* Re: [PATCH 05/10] e1000: Update truesize with the length of the packet for packet split
From: David S. Miller @ 2006-04-14 22:51 UTC (permalink / raw)
  To: jesse.brandeburg
  Cc: jgarzik, auke-jan.h.kok, netdev, john.ronciak, Jeffrey.t.kirsher,
	auke
In-Reply-To: <Pine.WNT.4.63.0604141533250.2116@jbrandeb-desk.amr.corp.intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date: Fri, 14 Apr 2006 15:43:02 -0700 (Pacific Daylight Time)

> Please help me understand how you think it should work when we have a 
> device that wants to receive a packet using header in the skb->data, and 
> application data in the ->frags[]? 

If you're removing the pages from ->frags[] and you already accounted
for them in skb->truesize, then yes I guess it could be correct to
subtract it back out.

Looking at S2IO it's doing something very wrong with it's modification
of skb->truesize.  It's not changing the amount of memory allocated
to an SKB in any way yet it's bumping skb->truesize for some reason.
That will definitely lead to performance problems.

That's why generally when we see an skb->truesize modification, it's
just assumed to be a bug, it's generally not necessary but may be
so in your e1000 situation here.

Another problem to look out for is that once a socket has a reference
to an SKB you must never ever modify skb->truesize because doing so
will break socket memory accounting.  There is exactly one spot in
the TCP stack where we are able to do this by adjusting the socket
memory accounting variables carefully in a locked context at the same
time, but otherwise it's not safe to modify skb->truesize when there
is an attached socket.

^ permalink raw reply

* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: David S. Miller @ 2006-04-14 22:52 UTC (permalink / raw)
  To: jesse.brandeburg
  Cc: bb, herbert, kernel, nipsy, jrlundgren, cat, djani22,
	yoseph.basri, mykleb, olel, michal, chris, netdev,
	jesse.brandeburg, E1000-devel, ak, jgarzik
In-Reply-To: <Pine.WNT.4.63.0604141543170.2116@jbrandeb-desk.amr.corp.intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date: Fri, 14 Apr 2006 15:46:31 -0700 (Pacific Daylight Time)

> sure, thats fine, but we just reproduced it in two seperate systems 
> without the e1000 driver loaded, using the instructions as mentioned in a 
> previous email.  We used a 5704 with TSO enabled.

That I didn't notice, thanks for the datapoint.

^ permalink raw reply

* Re: [PATCH] atm: clip timer race
From: David S. Miller @ 2006-04-14 22:56 UTC (permalink / raw)
  To: shemminger; +Cc: herbert, chas, linux-atm-general, netdev
In-Reply-To: <20060412154214.1dd4117d@localhost.localdomain>

From: Stephen Hemminger <shemminger@osdl.org>
Date: Wed, 12 Apr 2006 15:42:14 -0700

> By inspection, the clip idle timer code is racy on SMP.
> Here is a safe version of timer management.
> Untested, I don't have ATM hardware.
> 
> Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

Applied, thanks Stephen.

^ permalink raw reply

* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Jesse Brandeburg @ 2006-04-14 22:55 UTC (permalink / raw)
  To: David S. Miller
  Cc: jesse.brandeburg, bb, herbert, kernel, nipsy, jrlundgren, cat,
	djani22, yoseph.basri, mykleb, olel, michal, chris, netdev,
	jesse.brandeburg, E1000-devel, ak, jgarzik
In-Reply-To: <20060414.155202.82987818.davem@davemloft.net>

On Fri, 14 Apr 2006, David S. Miller wrote:

> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Date: Fri, 14 Apr 2006 15:46:31 -0700 (Pacific Daylight Time)
> 
> > sure, thats fine, but we just reproduced it in two seperate systems 
> > without the e1000 driver loaded, using the instructions as mentioned in a 
> > previous email.  We used a 5704 with TSO enabled.
> 
> That I didn't notice, thanks for the datapoint.

you're welcome, its "hot off the presses" (we just reproduced it 5 minutes 
ago) 

I'm trying to isolate more of a reproduction case, I'll be sure to post if 
I can find anything with more detail.

^ permalink raw reply

* Re: [PATCH 1/4] clip: run through Lindent
From: David S. Miller @ 2006-04-14 22:58 UTC (permalink / raw)
  To: shemminger; +Cc: chas, linux-atm-general, netdev, stable
In-Reply-To: <20060413152224.2d46df99@localhost.localdomain>

From: Stephen Hemminger <shemminger@osdl.org>
Date: Thu, 13 Apr 2006 15:22:24 -0700

> Run CLIP driver through Lindent script to fix formatting.
> 
> Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

These no longer apply after putting in the two CLIP
bug fixes.

Please respin, thanks.

^ permalink raw reply

* Re: [PATCH 1/4] clip: run through Lindent
From: David S. Miller @ 2006-04-14 22:59 UTC (permalink / raw)
  To: shemminger; +Cc: chas, linux-atm-general, netdev, stable
In-Reply-To: <20060413152224.2d46df99@localhost.localdomain>


Ignore my previous email, I tried to apply it to the wrong
tree :-)

^ permalink raw reply

* Re: [PATCH] IrDA: irda-usb, unregister netdev when patch upload fails
From: David S. Miller @ 2006-04-14 23:02 UTC (permalink / raw)
  To: samuel.ortiz; +Cc: irda-users, jt, netdev
In-Reply-To: <20060414101607.GA32724@irie>

From: Samuel Ortiz <samuel.ortiz@nokia.com>
Date: Fri, 14 Apr 2006 13:16:07 +0300

> Hi David,
> 
> In the STIR421x case, when the firmware upload fails, we need to unregister_netdev. Otherwise we hit a BUG on free_netdev(), if sysfs is enabled.
>  
> Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com>

Applied, but please become familiar with the carriage return key on
your keyboard and format your changelog descirptions to ~80 columns.

Thanks.

^ permalink raw reply

* Re: [PATCH] IrDA: smsc-ircc2, smcinit support for ALi ISA bridges
From: David S. Miller @ 2006-04-14 23:03 UTC (permalink / raw)
  To: samuel.ortiz; +Cc: netdev, jt, irda-users
In-Reply-To: <20060414101617.GB32724@irie>

From: Samuel Ortiz <samuel.ortiz@nokia.com>
Date: Fri, 14 Apr 2006 13:16:18 +0300

> Hi David,
> 
> This patch enables support for ALi ISA bridges when we run the smcinit code.
> It is needed to properly configure some Toshiba laptops.
> 
> Patch from Linus Walleij <triad@df.lth.se>
> Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com>

Also applied, thanks.

^ permalink raw reply

* Re: [PATCH 05/10] e1000: Update truesize with the length of the packet for packet split
From: Jesse Brandeburg @ 2006-04-14 23:04 UTC (permalink / raw)
  To: David S. Miller
  Cc: jesse.brandeburg, jgarzik, auke-jan.h.kok, netdev, john.ronciak,
	Jeffrey.t.kirsher, auke
In-Reply-To: <20060414.155129.121789313.davem@davemloft.net>

On Fri, 14 Apr 2006, David S. Miller wrote:

> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Date: Fri, 14 Apr 2006 15:43:02 -0700 (Pacific Daylight Time)
> 
> > Please help me understand how you think it should work when we have a 
> > device that wants to receive a packet using header in the skb->data, and 
> > application data in the ->frags[]? 
> 
> If you're removing the pages from ->frags[] and you already accounted
> for them in skb->truesize, then yes I guess it could be correct to
> subtract it back out.

they are not accounted for in the skb yet.  the way e1000 works is it 
pre-allocates all its 128 byte receive buffers using skb_alloc, and then 
chains in any pages that are used for receive packets.

> Looking at S2IO it's doing something very wrong with it's modification
> of skb->truesize.  It's not changing the amount of memory allocated
> to an SKB in any way yet it's bumping skb->truesize for some reason.
> That will definitely lead to performance problems.

I can only speak to what e1000 is trying to do. we're technically 
increasing the size of the skb after we've allocated it, so

does nr_frags and frags[] need truesize to be correct?

> That's why generally when we see an skb->truesize modification, it's
> just assumed to be a bug, it's generally not necessary but may be
> so in your e1000 situation here.

cool, I would really prefer that we extend the kernel api's to "do the 
right thing" for receives using frags[]

> Another problem to look out for is that once a socket has a reference
> to an SKB you must never ever modify skb->truesize because doing so
> will break socket memory accounting.  There is exactly one spot in
> the TCP stack where we are able to do this by adjusting the socket
> memory accounting variables carefully in a locked context at the same
> time, but otherwise it's not safe to modify skb->truesize when there
> is an attached socket.

Absolutely!  we're indicating the packet to the stack using netif_rx* (and 
we allocated it), so no socket has the reference until after we change 
truesize.


^ permalink raw reply

* Re: [PATCH 05/10] e1000: Update truesize with the length of the packet for packet split
From: David S. Miller @ 2006-04-14 23:45 UTC (permalink / raw)
  To: jesse.brandeburg
  Cc: jgarzik, auke-jan.h.kok, netdev, john.ronciak, Jeffrey.t.kirsher,
	auke
In-Reply-To: <Pine.WNT.4.63.0604141555200.2116@jbrandeb-desk.amr.corp.intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date: Fri, 14 Apr 2006 16:04:21 -0700 (Pacific Daylight Time)

> On Fri, 14 Apr 2006, David S. Miller wrote:
> 
> > From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > Date: Fri, 14 Apr 2006 15:43:02 -0700 (Pacific Daylight Time)
> > 
> > > Please help me understand how you think it should work when we have a 
> > > device that wants to receive a packet using header in the skb->data, and 
> > > application data in the ->frags[]? 
> > 
> > If you're removing the pages from ->frags[] and you already accounted
> > for them in skb->truesize, then yes I guess it could be correct to
> > subtract it back out.
> 
> they are not accounted for in the skb yet.  the way e1000 works is it 
> pre-allocates all its 128 byte receive buffers using skb_alloc, and then 
> chains in any pages that are used for receive packets.

Ok so you allocate the base skb->data memory using alloc_skb().  Then
you add on pages and increment skb->truesize to account for the pages.
And you do all of this for receive packets before they are passed up
to the stack.

Right?

If so, then that should be OK.

Jeff, I seem to have misunderstood what e1000 was doing, the patches
changing skb->truesize should be fine.

^ permalink raw reply

* Re: [PATCH 05/10] e1000: Update truesize with the length of the packet for packet split
From: Jesse Brandeburg @ 2006-04-14 23:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: jesse.brandeburg, jgarzik, auke-jan.h.kok, netdev, john.ronciak,
	Jeffrey.t.kirsher, auke
In-Reply-To: <20060414.164512.17280154.davem@davemloft.net>



On Fri, 14 Apr 2006, David S. Miller wrote:
> > they are not accounted for in the skb yet.  the way e1000 works is it 
> > pre-allocates all its 128 byte receive buffers using skb_alloc, and then 
> > chains in any pages that are used for receive packets.
> 
> Ok so you allocate the base skb->data memory using alloc_skb().  Then
> you add on pages and increment skb->truesize to account for the pages.
> And you do all of this for receive packets before they are passed up
> to the stack.
> 
> Right?

yes, exactly.

> If so, then that should be OK.
> 
> Jeff, I seem to have misunderstood what e1000 was doing, the patches
> changing skb->truesize should be fine.

Thanks

^ permalink raw reply

* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: David S. Miller @ 2006-04-14 23:53 UTC (permalink / raw)
  To: jesse.brandeburg
  Cc: bb, herbert, kernel, nipsy, jrlundgren, cat, djani22,
	yoseph.basri, mykleb, olel, michal, chris, netdev,
	jesse.brandeburg, E1000-devel, ak, jgarzik
In-Reply-To: <Pine.WNT.4.63.0604141553210.2116@jbrandeb-desk.amr.corp.intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date: Fri, 14 Apr 2006 15:55:10 -0700 (Pacific Daylight Time)

> I'm trying to isolate more of a reproduction case, I'll be sure to
> post if I can find anything with more detail.

I think I see the bug.

If tbench with large numbers of clients is part of what helps
reproduce it, the key might be hitting the memory limits in tcp_mem[]
and friends, or something to do with concurrent access to
sk->sk_forward_alloc.

I bet there is some race in there.

A lot of the action is in net/core/stream.c  We modify
sk->sk_forward_alloc non-atomically but that should be ok
since we ought to be holding all of the correct locks when
we hit these accesses.  But it is the first thing to audit.

Let's look at sk_stream_rfree() as that is invoked from SKB
freeing callbacks and is the most likely suspect for these
kinds of problems.

It is hooked up to the skb->destructor by sk_stream_set_owner_r() and
then invoked via __kfree_skb().

Nothing here takes any locks, and as stated above we modify
sk->sk_forward_alloc non-atomically, and this is therefore the bug.

Shit.

I'll think of how to fix this in the least invasive manner.  I also
want to search the changelog history to see if this race was always
present or if it was "introduced".

Making sk->sk_forward_alloc an atomic_t would be incredibly expensive
so I'll try to find a way to avoid that.  We may be able to just do
a bh_lock_sock()/bh_unlock_sock() around the body of sk_stream_rfree()
to fix this.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox