Netdev List
 help / color / mirror / Atom feed
* About the Davicom PHY in drivers/net/phy in Linux kernel
From: macpaul @ 2010-10-28  6:33 UTC (permalink / raw)
  To: netdev; +Cc: afleming, jeff, f.rodo, joseph_chang

Hi all,

According to the source code of Davicom PHY in Linux,
We should do "ISOLATE" command to PHY before setting "auto negotiation" or "MII/RMII".

However, I've found that with Faraday's MAC/GMAC controller (ftmac100/ftgmac100), setting ISOLATE for multiple PHY configuration will lead MDC become stop because the flaw inside the MAC controller. Faraday's MAC/GMAC will leverage the TX_CLK as the MDC source. When FTMAC100 send "ISOLATE" to Davicom's PHY, the TX_CLK send out from PHY will be stopped, then MDC will also become stop.

However, this mail wasn't meant to discuss about the design flaw inside the IC.

We've done two test to the following codes.

1st: if we just skip the " BMCR_ISOLATE" setting command, we will get PHY sometimes become unstable, for example, could not do DHCP request successfully.

2nd: if we replace "BMCR_ISOLATE" to "BMCR_RESET", we could get rid of the problem occurred by Faraday GMAC. And the PHY works well still.

I've found that in some other PHY implementation, for example, in "marvell.c", there are seems no ISOLATE commands. There are only RESET commands.
If we could replace BMCR_ISOLATE to BMCR_RESET in current kernel source, will there be any unpredictable behavior happened?

Please give us suggestion according to your experiences.
Thanks a lot.

+static int dm9161_config_aneg(struct phy_device *phydev)
+{
+	int err;
+
+	/* Isolate the PHY */
+	err = phy_write(phydev, MII_BMCR, BMCR_ISOLATE);
+
+	if (err < 0)
+		return err;
+
+	/* Configure the new settings */
+	err = genphy_config_aneg(phydev);
+
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static int dm9161_config_init(struct phy_device *phydev)
+{
+	int err;
+
+	/* Isolate the PHY */
+	err = phy_write(phydev, MII_BMCR, BMCR_ISOLATE);
+
+	if (err < 0)
+		return err;
+
+	/* Do not bypass the scrambler/descrambler */
+	err = phy_write(phydev, MII_DM9161_SCR, MII_DM9161_SCR_INIT);
+
+	if (err < 0)
+		return err;
+
+	/* Clear 10BTCSR to default */
+	err = phy_write(phydev, MII_DM9161_10BTCSR, MII_DM9161_10BTCSR_INIT);
+
+	if (err < 0)
+		return err;
+
+	/* Reconnect the PHY, and enable Autonegotiation */
+	err = phy_write(phydev, MII_BMCR, BMCR_ANENABLE);
+
+	if (err < 0)
+		return err;
+
+	return 0;

Best regards,
Macpaul Lin

^ permalink raw reply

* Re: [PATCH] cxgb3: fix crash due to manipulating queues before registration
From: Divy Le Ray @ 2010-10-28  6:35 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: eric.dumazet, sonnyrao, Dimitrios Michailidis, netdev,
	linux-kernel
In-Reply-To: <1288242390-28574-1-git-send-email-nacc@us.ibm.com>

On 10/27/2010 10:06 PM, Nishanth Aravamudan wrote:
> Hi Eric,
>
> Something like the following?:
>
> Thanks,
> Nish
>
>
> Along the same lines as "cxgb4: fix crash due to manipulating queues
> before registration" (8f6d9f40476895571df039b6f1f5230ec7faebad), before
> commit "net: allocate tx queues in register_netdevice"
> netif_tx_stop_all_queues and related functions could be used between
> device allocation and registration but now only after registration.
> cxgb4 has such a call before registration and crashes now.  Move it
> after register_netdev.
>
> Signed-off-by: Nishanth Aravamudan<nacc@us.ibm.com>

Acked-by: Divy Le Ray <divy@chelsio.com>

> Cc: eric.dumazet@gmail.com
> Cc: sonnyrao@us.ibm.com
> Cc: Divy Le Ray<divy@chelsio.com>
> Cc: Dimitris Michailidis<dm@chelsio.com>
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>   drivers/net/cxgb3/cxgb3_main.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
> index 4e3c123..96c70a5 100644
> --- a/drivers/net/cxgb3/cxgb3_main.c
> +++ b/drivers/net/cxgb3/cxgb3_main.c
> @@ -3301,7 +3301,6 @@ static int __devinit init_one(struct pci_dev *pdev,
>   		pi->rx_offload = T3_RX_CSUM | T3_LRO;
>   		pi->port_id = i;
>   		netif_carrier_off(netdev);
> -		netif_tx_stop_all_queues(netdev);
>   		netdev->irq = pdev->irq;
>   		netdev->mem_start = mmio_start;
>   		netdev->mem_end = mmio_start + mmio_len - 1;
> @@ -3342,6 +3341,7 @@ static int __devinit init_one(struct pci_dev *pdev,
>   				adapter->name = adapter->port[i]->name;
>
>   			__set_bit(i,&adapter->registered_device_map);
> +			 netif_tx_stop_all_queues(adapter->port[i]);
>   		}
>   	}
>   	if (!adapter->registered_device_map) {



^ permalink raw reply

* Re: 2.6.35->2.6.36 regression, vanilla kernel panic, ppp or hrtimers crashing
From: Jarek Poplawski @ 2010-10-28  7:05 UTC (permalink / raw)
  To: Denys Fedoryshchenko
  Cc: Thomas Gleixner, Paul Mackerras, linux-kernel, netdev
In-Reply-To: <201010251222.37191.nuclearcat@nuclearcat.com>

On 2010-10-25 11:22, Denys Fedoryshchenko wrote:
> Hi
> 
> Here is what i got from netconsole
>  [  259.238755] BUG: unable to handle kernel 
>  paging request
>  at f8ba001c
>  [  259.238953] IP:
>  [<c0199ebe>] do_select+0x2cc/0x502
...
> It is not easy to do full git bisect(it is semi-embedded distro), but i can 
> try reversing particular commits, if someone can give idea which one, and can 
> try debug patches.

Hi,
Nothing concrete, but you might try reverting this one:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.36.y.git;a=commitdiff;h=15fd0cd9a2ad24a78fbee369dec8ca660979d57e

Jarek P.

^ permalink raw reply

* Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Krishna Kumar2 @ 2010-10-28  7:18 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: anthony, arnd, avi, davem, eric.dumazet, kvm, Michael S. Tsirkin,
	netdev, rusty
In-Reply-To: <OFC29C4491.59069AD1-ON652577CA.00170F0D-652577CA.001C76C8@LocalDomain>

> Krishna Kumar2/India/IBM wrote on 10/28/2010 10:44:14 AM:
>
> > > > > Results for UDP BW tests (unidirectional, sum across
> > > > > 3 iterations, each iteration of 45 seconds, default
> > > > > netperf, vhosts bound to cpus 0-3; no other tuning):
> > > >
> > > > Is binding vhost threads to CPUs really required?
> > > > What happens if we let the scheduler do its job?
> > >
> > > Nothing drastic, I remember BW% and SD% both improved a
> > > bit as a result of binding.
> >
> > If there's a significant improvement this would mean that
> > we need to rethink the vhost-net interaction with the scheduler.
>
> I will get a test run with and without binding and post the
> results later today.

Correction: The result with binding is is much better for
SD/CPU compared to without-binding:

_____________________________________________________
     numtxqs=8,vhosts=5, Bind vs No-bind
#     BW%     CPU%     RCPU%     SD%       RSD%
_____________________________________________________
1     11.25     10.77    1.89     0        -6.06
2     18.66     7.20     7.20    -14.28    -7.40
4     4.24     -1.27     1.56    -2.70     -.98
8     14.91    -3.79     5.46    -12.19    -3.76
16    12.32    -8.67     4.63    -35.97    -26.66
24    11.68    -7.83     5.10    -40.73    -32.37
32    13.09    -10.51    6.57    -51.52    -42.28
40    11.04    -4.12     11.23   -50.69    -42.81
48    8.61     -10.30    6.04    -62.38    -55.54
64    7.55     -6.05     6.41    -61.20    -56.04
80    8.74     -11.45    6.29    -72.65    -67.17
96    9.84     -6.01     9.87    -69.89    -64.78
128   5.57     -6.23     8.99    -75.03    -70.97
_____________________________________________________
BW: 10.4%,  CPU/RCPU: -7.4%,7.7%,  SD: -70.5%,-65.7%

Notes:
    1.  All my test results earlier was binding vhost
        to cpus 0-3 for both org and new kernel.
    2.  I am not using MST's use_mq patch, only mainline
        kernel. However, I reported earlier that I got
        better results with that patch. The result for
        MQ vs MQ+use_mm patch (from my earlier mail):

BW: 0   CPU/RCPU: -4.2,-6.1  SD/RSD: -13.1,-15.6

Thanks,

- KK


^ permalink raw reply

* Re: [Bugme-new] [Bug 20772] New: ip= nfsaddrs= stopped working
From: Simon Horman @ 2010-10-27 22:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: netdev, bugzilla-daemon, bugme-daemon, mpetersen
In-Reply-To: <20101019120938.93622e82.akpm@linux-foundation.org>

On Tue, Oct 19, 2010 at 12:09:38PM -0700, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Tue, 19 Oct 2010 18:20:15 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=20772
> > 
> >            Summary: ip= nfsaddrs= stopped working
> >            Product: Networking
> >            Version: 2.5
> >     Kernel Version: 2.6.33, 2.6.32, 2.6.26
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: high
> >           Priority: P1
> >          Component: IPV4
> >         AssignedTo: shemminger@linux-foundation.org
> >         ReportedBy: mpetersen@peak6.com
> >         Regression: Yes
> > 
> > 
> > I can no longer set a static IP address using either ip= or nfsaddr=,
> > ie ip=192.168.0.42:192.168.0.69:255.255.255.0:testhost:eth0:none has no
> > effect, and DHCP is still attempted.
> > 
> > Even if I undefine IP_PNP_DHCP IP_PNP_BOOTP and IP_PNP_RARP in the
> > config (but leave IP_PNP,) the kernel seems to ignore the static IP
> > settings and try to get a an IP address with DHCP somehow.

Hi,

I believe that the problem is that there is a minor syntax error in the
configuration parameter. Specifically there should be two colons between
192.168.0.69 and 255.255.255.0 or between 192.168.0.42 and 192.168.0.69.

As it stands the settings are:

	my address	192.168.0.42
	server address	192.168.0.69
	gateway		255.255.255.0
	netmask		testhost
	hostname	eth0
	interface	none

I believe it is the interface=none that is killing any IP configuration
by the kernel as no such interface exists.

^ permalink raw reply

* RE: About the Davicom PHY in drivers/net/phy in Linux kernel
From: Joseph Chang @ 2010-10-28  7:59 UTC (permalink / raw)
  To: macpaul, netdev; +Cc: afleming, jeff, f.rodo
In-Reply-To: <50FD90C65C53FB45BADEEBCD84FF07F2029CB176@ATCPCS06.andestech.com>

Dear Mac Paul,

For you are using DAVICOM PHY in this case.

Our suggestion, please check below red comment text:

 * == > Would you tell us your:
	CPU = ?
	Linux Kernel version= ?
      I will like to download the same source code from LXR.
      And can check more detail for you. 

+static int dm9161_config_aneg(struct phy_device *phydev) {
+	int err;
+
+	/* Isolate the PHY */
     err = 0;
+	//err = phy_write(phydev, MII_BMCR, BMCR_ISOLATE);
+
+	if (err < 0)
+		return err;
+
+	/* Configure the new settings */
+	err = genphy_config_aneg(phydev);
    err = phy_write(phydev, MII_BMCR, BMCR_ANENABLE | 0x200 );
+
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static int dm9161_config_init(struct phy_device *phydev) {
+	int err;
+
+	/* Isolate the PHY */
     err = 0;
+	//err = phy_write(phydev, MII_BMCR, BMCR_ISOLATE);
+
+	if (err < 0)
+		return err;
+
+	/* Do not bypass the scrambler/descrambler */
+	err = phy_write(phydev, MII_DM9161_SCR, MII_DM9161_SCR_INIT);
+
+	if (err < 0)
+		return err;
+
+	/* Clear 10BTCSR to default */
+	err = phy_write(phydev, MII_DM9161_10BTCSR, MII_DM9161_10BTCSR_INIT);
+
+	if (err < 0)
+		return err;
+
+	/* Reconnect the PHY, and enable Autonegotiation */
+	err = phy_write(phydev, MII_BMCR, BMCR_ANENABLE | 0x200 );
+
+	if (err < 0)
+		return err;
+
+	return 0;

Best Regards,
Joseph CHANG 
System Application Engineering Division 
Davicom Semiconductor, Inc. 
No. 6 Li-Hsin Rd. VI, Science-Based Park, 
Hsin-Chu, Taiwan. 
Tel: 886-3-5798797 Ex 8534
Fax: 886-3-5646929 
Web: http://www.davicom.com.tw 


-----Original Message-----
From: macpaul@andestech.com [mailto:macpaul@andestech.com] 
Sent: Thursday, October 28, 2010 2:34 PM
To: netdev@vger.kernel.org
Cc: afleming@freescale.com; jeff@garzik.org; f.rodo@til-technologies.fr; joseph_chang@mail.davicom.com.tw
Subject: About the Davicom PHY in drivers/net/phy in Linux kernel

Hi all,

According to the source code of Davicom PHY in Linux,
We should do "ISOLATE" command to PHY before setting "auto negotiation" or "MII/RMII".

However, I've found that with Faraday's MAC/GMAC controller (ftmac100/ftgmac100), setting ISOLATE for multiple PHY configuration will lead MDC become stop because the flaw inside the MAC controller. Faraday's MAC/GMAC will leverage the TX_CLK as the MDC source. When FTMAC100 send "ISOLATE" to Davicom's PHY, the TX_CLK send out from PHY will be stopped, then MDC will also become stop.

However, this mail wasn't meant to discuss about the design flaw inside the IC.

We've done two test to the following codes.

1st: if we just skip the " BMCR_ISOLATE" setting command, we will get PHY sometimes become unstable, for example, could not do DHCP request successfully.

2nd: if we replace "BMCR_ISOLATE" to "BMCR_RESET", we could get rid of the problem occurred by Faraday GMAC. And the PHY works well still.

I've found that in some other PHY implementation, for example, in "marvell.c", there are seems no ISOLATE commands. There are only RESET commands.
If we could replace BMCR_ISOLATE to BMCR_RESET in current kernel source, will there be any unpredictable behavior happened?

Please give us suggestion according to your experiences.
Thanks a lot.

+static int dm9161_config_aneg(struct phy_device *phydev)
+{
+	int err;
+
+	/* Isolate the PHY */
+	err = phy_write(phydev, MII_BMCR, BMCR_ISOLATE);
+
+	if (err < 0)
+		return err;
+
+	/* Configure the new settings */
+	err = genphy_config_aneg(phydev);
+
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static int dm9161_config_init(struct phy_device *phydev)
+{
+	int err;
+
+	/* Isolate the PHY */
+	err = phy_write(phydev, MII_BMCR, BMCR_ISOLATE);
+
+	if (err < 0)
+		return err;
+
+	/* Do not bypass the scrambler/descrambler */
+	err = phy_write(phydev, MII_DM9161_SCR, MII_DM9161_SCR_INIT);
+
+	if (err < 0)
+		return err;
+
+	/* Clear 10BTCSR to default */
+	err = phy_write(phydev, MII_DM9161_10BTCSR, MII_DM9161_10BTCSR_INIT);
+
+	if (err < 0)
+		return err;
+
+	/* Reconnect the PHY, and enable Autonegotiation */
+	err = phy_write(phydev, MII_BMCR, BMCR_ANENABLE);
+
+	if (err < 0)
+		return err;
+
+	return 0;

Best regards,
Macpaul Lin

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.



^ permalink raw reply

* Re: IPV6 raw socket denies bind(2)
From: Rémi Denis-Courmont @ 2010-10-28  8:25 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netdev
In-Reply-To: <alpine.LNX.2.01.1010280000260.7820@obet.zrqbmnf.qr>


On Thu, 28 Oct 2010 00:01:57 +0200 (CEST), Jan Engelhardt
<jengelh@medozas.de> wrote:
> #include <sys/socket.h>
> #include <stdio.h>
> #include <string.h>
> #include <netinet/udp.h>
> #include <netinet/in.h>
> #include <netinet/ip6.h>
> #include <arpa/inet.h>
> #include <stdlib.h>
> 
> int main(void)
> {
> 	struct sockaddr_in6 src = {};
> 	int sk;
> 
> 	sk = socket(AF_INET6, SOCK_RAW, IPPROTO_UDP);
> 	memset(&src, 0, sizeof(src));
> 	inet_pton(AF_INET6, "::1", &src);

That should be:
inet_pton(AF_INET6, "::1", &src.sin6_addr);
No wonder it does not work.


-- 
Rémi Denis-Courmont
http://www.remlab.net
http://fi.linkedin.com/in/remidenis


^ permalink raw reply

* Re: [RFC PATCH 0/1] vhost: Reduce TX used buffer signal for performance
From: Stefan Hajnoczi @ 2010-10-28  8:57 UTC (permalink / raw)
  To: Shirley Ma; +Cc: mst@redhat.com, David Miller, netdev, kvm, linux-kernel
In-Reply-To: <1288213557.17571.28.camel@localhost.localdomain>

On Wed, Oct 27, 2010 at 10:05 PM, Shirley Ma <mashirle@us.ibm.com> wrote:
> This patch changes vhost TX used buffer signal to guest from one by
> one to up to 3/4 of vring size. This change improves vhost TX message
> size from 256 to 8K performance for both bandwidth and CPU utilization
> without inducing any regression.

Any concerns about introducing latency or does the guest not care when
TX completions come in?

> Signed-off-by: Shirley Ma <xma@us.ibm.com>
> ---
>
>  drivers/vhost/net.c   |   19 ++++++++++++++++++-
>  drivers/vhost/vhost.c |   31 +++++++++++++++++++++++++++++++
>  drivers/vhost/vhost.h |    3 +++
>  3 files changed, 52 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 4b4da5b..bd1ba71 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -198,7 +198,24 @@ static void handle_tx(struct vhost_net *net)
>                if (err != len)
>                        pr_debug("Truncated TX packet: "
>                                 " len %d != %zd\n", err, len);
> -               vhost_add_used_and_signal(&net->dev, vq, head, 0);
> +               /*
> +                * if no pending buffer size allocate, signal used buffer
> +                * one by one, otherwise, signal used buffer when reaching
> +                * 3/4 ring size to reduce CPU utilization.
> +                */
> +               if (unlikely(vq->pend))
> +                       vhost_add_used_and_signal(&net->dev, vq, head, 0);
> +               else {
> +                       vq->pend[vq->num_pend].id = head;

I don't understand the logic here: if !vq->pend then we assign to
vq->pend[vq->num_pend].

> +                       vq->pend[vq->num_pend].len = 0;
> +                       ++vq->num_pend;
> +                       if (vq->num_pend == (vq->num - (vq->num >> 2))) {
> +                               vhost_add_used_and_signal_n(&net->dev, vq,
> +                                                           vq->pend,
> +                                                           vq->num_pend);
> +                               vq->num_pend = 0;
> +                       }
> +               }
>                total_len += len;
>                if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
>                        vhost_poll_queue(&vq->poll);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 94701ff..47696d2 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -170,6 +170,16 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>        vq->call_ctx = NULL;
>        vq->call = NULL;
>        vq->log_ctx = NULL;
> +       /* signal pending used buffers */
> +       if (vq->pend) {
> +               if (vq->num_pend != 0) {
> +                       vhost_add_used_and_signal_n(dev, vq, vq->pend,
> +                                                   vq->num_pend);
> +                       vq->num_pend = 0;
> +               }
> +               kfree(vq->pend);
> +       }
> +       vq->pend = NULL;
>  }
>
>  static int vhost_worker(void *data)
> @@ -273,7 +283,13 @@ long vhost_dev_init(struct vhost_dev *dev,
>                dev->vqs[i].heads = NULL;
>                dev->vqs[i].dev = dev;
>                mutex_init(&dev->vqs[i].mutex);
> +               dev->vqs[i].num_pend = 0;
> +               dev->vqs[i].pend = NULL;
>                vhost_vq_reset(dev, dev->vqs + i);
> +               /* signal 3/4 of ring size used buffers */
> +               dev->vqs[i].pend = kmalloc((dev->vqs[i].num -
> +                                          (dev->vqs[i].num >> 2)) *
> +                                          sizeof *vq->peed, GFP_KERNEL);

Has this patch been compile tested?  vq->peed?

Stefan

^ permalink raw reply

* Re: [RFC PATCH 0/1] vhost: Reduce TX used buffer signal for performance
From: Stefan Hajnoczi @ 2010-10-28  8:59 UTC (permalink / raw)
  To: Shirley Ma; +Cc: mst@redhat.com, David Miller, netdev, kvm, linux-kernel
In-Reply-To: <AANLkTinqG_G2pevoYJyiHiBda1ZZS-53Uqz9WEspbMOy@mail.gmail.com>

On Thu, Oct 28, 2010 at 9:57 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
Just read the patch 1/1 discussion and it looks like you're already on
it.  Sorry for the noise.

Stefan

^ permalink raw reply

* [PATCH] 8390: Don't oops on starting dev queue
From: Pavel Emelyanov @ 2010-10-28  9:01 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: David Miller, Linux Netdev List

The __NS8390_init tries to start the device queue before the
device is registered. This results in an oops (snipped):

[    2.865493] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[    2.866106] IP: [<ffffffffa000602a>] netif_start_queue+0xb/0x12 [8390]
[    2.881267] Call Trace:
[    2.881437]  [<ffffffffa000624d>] __NS8390_init+0x102/0x15a [8390]
[    2.881999]  [<ffffffffa00062ae>] NS8390_init+0x9/0xb [8390]
[    2.882237]  [<ffffffffa000d820>] ne2k_pci_init_one+0x297/0x354 [ne2k_pci]
[    2.882955]  [<ffffffff811c7a0e>] local_pci_probe+0x12/0x16
[    2.883308]  [<ffffffff811c85ad>] pci_device_probe+0xc3/0xef
[    2.884049]  [<ffffffff8129218d>] driver_probe_device+0xbe/0x14b
[    2.884937]  [<ffffffff81292260>] __driver_attach+0x46/0x62
[    2.885170]  [<ffffffff81291788>] bus_for_each_dev+0x49/0x78
[    2.885781]  [<ffffffff81291fbb>] driver_attach+0x1c/0x1e
[    2.886089]  [<ffffffff812912ab>] bus_add_driver+0xba/0x227
[    2.886330]  [<ffffffff8129259a>] driver_register+0x9e/0x115
[    2.886933]  [<ffffffff811c8815>] __pci_register_driver+0x50/0xac
[    2.887785]  [<ffffffffa001102c>] ne2k_pci_init+0x2c/0x2e [ne2k_pci]
[    2.888093]  [<ffffffff81000212>] do_one_initcall+0x7c/0x130
[    2.888693]  [<ffffffff8106d74f>] sys_init_module+0x99/0x1da
[    2.888946]  [<ffffffff81002a2b>] system_call_fastpath+0x16/0x1b

This happens because the netif_start_queue sets respective bit on the dev->_tx
array which is not yet allocated.

As far as I understand the code removing the netif_start_queue from __NS8390_init
is OK, since queue will be started later on device open. Plz, correct me if I'm wrong.

Found in the Dave's current tree, so he's in Cc.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

diff --git a/drivers/net/lib8390.c b/drivers/net/lib8390.c
index 316bb70..e7030ce 100644
--- a/drivers/net/lib8390.c
+++ b/drivers/net/lib8390.c
@@ -1077,7 +1077,6 @@ static void __NS8390_init(struct net_device *dev, int startp)
 	ei_outb_p(ei_local->rx_start_page, e8390_base + EN1_CURPAG);
 	ei_outb_p(E8390_NODMA+E8390_PAGE0+E8390_STOP, e8390_base+E8390_CMD);
 
-	netif_start_queue(dev);
 	ei_local->tx1 = ei_local->tx2 = 0;
 	ei_local->txing = 0;
 


^ permalink raw reply related

* Re: [PATCH] conntrack: allow nf_ct_alloc_hashtable() to get highmem pages
From: Patrick McHardy @ 2010-10-28 10:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Netfilter Development Mailinglist, stable
In-Reply-To: <1288170555.2709.63.camel@edumazet-laptop>

On 27.10.2010 11:09, Eric Dumazet wrote:
> commit ea781f197d6a8 (use SLAB_DESTROY_BY_RCU and get rid of call_rcu())
> did a mistake in __vmalloc() call in nf_ct_alloc_hashtable().
> 
> I forgot to add __GFP_HIGHMEM, so pages were taken from LOWMEM only.

Applied, thanks. I'll also pass this to -stable.

^ permalink raw reply

* Does ip does support labels for IPv6 addresses?
From: Heiko Gerstung @ 2010-10-28 10:41 UTC (permalink / raw)
  To: netdev

Hi !

Not sure if this is where I should ask, please feel free to send me to
someone else.

I just tried to use "ip" (from the iproute2 project) to assign multiple
IPv6 addresses to a physical network interface. In order to be able to
distinguish between them, I wanted to use the "label" feature as
described on the ip man page, but it seems that this is not supported
for IPv6 addresses.

Does anybody know if this is a (documentation-)bug, a feature or a
misconfiguration of ~/brain ?

Example:

# ip addr add dev eth0 2001::23/64 label eth0:0

at least adds the address correctly. But when I want to check with
# ip addr show
the label is not listed anywhere and using
# ip addr show label "eth0:0" does not work either (does not come up
with any output at all).

The description of the "label" option on the man page does not mention
that this is not supported on IPv6.

Best Regards,
 Heiko


-- 

Heiko Gerstung

*MEINBERG Funkuhren* GmbH & Co. KG
Lange Wand 9
D-31812 Bad Pyrmont, Germany
Phone: +49 (0)5281 9309-25
Fax: +49 (0)5281 9309-30
Amtsgericht Hannover 17HRA 100322
Geschäftsführer/Managing Directors: Günter Meinberg, Werner Meinberg,
Andre Hartmann, Heiko Gerstung
Email: heiko.gerstung@meinberg.de <mailto:heiko.gerstung@meinberg.de>
Web: www.meinberg.de <http://www.meinberg.de>

------------------------------------------------------------------------
*MEINBERG - Accurate Time Worldwide*


^ permalink raw reply

* [net-2.6 PATCH] ixgb: call pci_disable_device in ixgb_remove
From: Jeff Kirsher @ 2010-10-28 10:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, bphilips, Ben Hutchings, Emil Tantilov,
	Jeff Kirsher

From: Emil Tantilov <emil.s.tantilov@intel.com>

ixgb fails to work after reload on recent kernels:

rmmod ixgb (dev->current_state = PCI_UNKNOWN)
modprobe ixgb (pci_enable_device will bail leaving current_state to PCI_UNKNOWN)
ifup eth0
do_IRQ: 2.82 No irq handler for vector (irq -1)

The issue was exposed by commit fcd097f31a6ee207cc0c3da9cccd2a86d4334785
PCI: MSI: Remove unsafe and unnecessary hardware access

which avoids HW writes for power states != PCI_D0

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgb/ixgb_main.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
index 666207a..caa8192 100644
--- a/drivers/net/ixgb/ixgb_main.c
+++ b/drivers/net/ixgb/ixgb_main.c
@@ -533,6 +533,7 @@ ixgb_remove(struct pci_dev *pdev)
 	pci_release_regions(pdev);
 
 	free_netdev(netdev);
+	pci_disable_device(pdev);
 }
 
 /**


^ permalink raw reply related

* [net-2.6 PATCH] igbvf: fix panic on load
From: Jeff Kirsher @ 2010-10-28 10:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, bphilips, Emil Tantilov, Greg Rose, Jeff Kirsher

From: Emil Tantilov <emil.s.tantilov@intel.com>

Introduced by commit:e6484930d7c73d324bccda7d43d131088da697b9
net: allocate tx queues in register_netdevice

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Acked-by: Greg Rose <greg.v.rose@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igbvf/netdev.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/igbvf/netdev.c b/drivers/net/igbvf/netdev.c
index ebfaa68..28af019 100644
--- a/drivers/net/igbvf/netdev.c
+++ b/drivers/net/igbvf/netdev.c
@@ -2783,15 +2783,15 @@ static int __devinit igbvf_probe(struct pci_dev *pdev,
 	/* reset the hardware with the new settings */
 	igbvf_reset(adapter);
 
-	/* tell the stack to leave us alone until igbvf_open() is called */
-	netif_carrier_off(netdev);
-	netif_stop_queue(netdev);
-
 	strcpy(netdev->name, "eth%d");
 	err = register_netdev(netdev);
 	if (err)
 		goto err_hw_init;
 
+	/* tell the stack to leave us alone until igbvf_open() is called */
+	netif_carrier_off(netdev);
+	netif_stop_queue(netdev);
+
 	igbvf_print_device_info(adapter);
 
 	igbvf_initialize_last_counter_stats(adapter);


^ permalink raw reply related

* [net-2.6 PATCH] e1000e: reset PHY after errors detected
From: Jeff Kirsher @ 2010-10-28 10:59 UTC (permalink / raw)
  To: davem
  Cc: netdev, gospo, bphilips, Jesse Brandeburg, Carolyn Wyborny,
	Jeff Kirsher

From: Carolyn Wyborny <carolyn.wyborny@intel.com>

Some errors can be induced in the PHY via environmental testing
(specifically extreme temperature changes and electro static
discharge testing), and in the case of the PHY hanging due to
this input, this detects the problem and resets to continue.
This issue only applies to 82574 silicon.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/82571.c  |   38 ++++++++++++++++++++++++++++++++++++++
 drivers/net/e1000e/e1000.h  |    3 +++
 drivers/net/e1000e/netdev.c |   22 ++++++++++++++++++++++
 3 files changed, 63 insertions(+), 0 deletions(-)

diff --git a/drivers/net/e1000e/82571.c b/drivers/net/e1000e/82571.c
index ca663f1..7236f1a 100644
--- a/drivers/net/e1000e/82571.c
+++ b/drivers/net/e1000e/82571.c
@@ -52,6 +52,10 @@
 			      (ID_LED_DEF1_DEF2))
 
 #define E1000_GCR_L1_ACT_WITHOUT_L0S_RX 0x08000000
+#define E1000_BASE1000T_STATUS          10
+#define E1000_IDLE_ERROR_COUNT_MASK     0xFF
+#define E1000_RECEIVE_ERROR_COUNTER     21
+#define E1000_RECEIVE_ERROR_MAX         0xFFFF
 
 #define E1000_NVM_INIT_CTRL2_MNGM 0x6000 /* Manageability Operation Mode mask */
 
@@ -1243,6 +1247,39 @@ static s32 e1000_led_on_82574(struct e1000_hw *hw)
 }
 
 /**
+ *  e1000_check_phy_82574 - check 82574 phy hung state
+ *  @hw: pointer to the HW structure
+ *
+ *  Returns whether phy is hung or not
+ **/
+bool e1000_check_phy_82574(struct e1000_hw *hw)
+{
+	u16 status_1kbt = 0;
+	u16 receive_errors = 0;
+	bool phy_hung = false;
+	s32 ret_val = 0;
+
+	/*
+	 * Read PHY Receive Error counter first, if its is max - all F's then
+	 * read the Base1000T status register If both are max then PHY is hung.
+	 */
+	ret_val = e1e_rphy(hw, E1000_RECEIVE_ERROR_COUNTER, &receive_errors);
+
+	if (ret_val)
+		goto out;
+	if (receive_errors == E1000_RECEIVE_ERROR_MAX)  {
+		ret_val = e1e_rphy(hw, E1000_BASE1000T_STATUS, &status_1kbt);
+		if (ret_val)
+			goto out;
+		if ((status_1kbt & E1000_IDLE_ERROR_COUNT_MASK) ==
+		    E1000_IDLE_ERROR_COUNT_MASK)
+			phy_hung = true;
+	}
+out:
+	return phy_hung;
+}
+
+/**
  *  e1000_setup_link_82571 - Setup flow control and link settings
  *  @hw: pointer to the HW structure
  *
@@ -1859,6 +1896,7 @@ struct e1000_info e1000_82574_info = {
 				  | FLAG_HAS_SMART_POWER_DOWN
 				  | FLAG_HAS_AMT
 				  | FLAG_HAS_CTRLEXT_ON_LOAD,
+	.flags2			  = FLAG2_CHECK_PHY_HANG,
 	.pba			= 36,
 	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_82571,
diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index cee882d..fdc67fe 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -397,6 +397,7 @@ struct e1000_adapter {
 	struct work_struct print_hang_task;
 
 	bool idle_check;
+	int phy_hang_count;
 };
 
 struct e1000_info {
@@ -454,6 +455,7 @@ struct e1000_info {
 #define FLAG2_HAS_EEE                     (1 << 5)
 #define FLAG2_DMA_BURST                   (1 << 6)
 #define FLAG2_DISABLE_AIM                 (1 << 8)
+#define FLAG2_CHECK_PHY_HANG              (1 << 9)
 
 #define E1000_RX_DESC_PS(R, i)	    \
 	(&(((union e1000_rx_desc_packet_split *)((R).desc))[i]))
@@ -631,6 +633,7 @@ extern s32 e1000_get_phy_info_ife(struct e1000_hw *hw);
 extern s32 e1000_check_polarity_ife(struct e1000_hw *hw);
 extern s32 e1000_phy_force_speed_duplex_ife(struct e1000_hw *hw);
 extern s32 e1000_check_polarity_igp(struct e1000_hw *hw);
+extern bool e1000_check_phy_82574(struct e1000_hw *hw);
 
 static inline s32 e1000_phy_hw_reset(struct e1000_hw *hw)
 {
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index ec8cf3f..35a9dd4 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -4098,6 +4098,25 @@ static void e1000e_enable_receives(struct e1000_adapter *adapter)
 	}
 }
 
+static void e1000e_check_82574_phy_workaround(struct e1000_adapter *adapter)
+{
+	struct e1000_hw *hw = &adapter->hw;
+
+	/*
+	 * With 82574 controllers, PHY needs to be checked periodically
+	 * for hung state and reset, if two calls return true
+	 */
+	if (e1000_check_phy_82574(hw))
+		adapter->phy_hang_count++;
+	else
+		adapter->phy_hang_count = 0;
+
+	if (adapter->phy_hang_count > 1) {
+		adapter->phy_hang_count = 0;
+		schedule_work(&adapter->reset_task);
+	}
+}
+
 /**
  * e1000_watchdog - Timer Call-back
  * @data: pointer to adapter cast into an unsigned long
@@ -4333,6 +4352,9 @@ link_up:
 	if (e1000e_get_laa_state_82571(hw))
 		e1000e_rar_set(hw, adapter->hw.mac.addr, 0);
 
+	if (adapter->flags2 & FLAG2_CHECK_PHY_HANG)
+		e1000e_check_82574_phy_workaround(adapter);
+
 	/* Reset the timer */
 	if (!test_bit(__E1000_DOWN, &adapter->state))
 		mod_timer(&adapter->watchdog_timer,


^ permalink raw reply related

* [net-2.6 PATCH] e1000e: Add check for reset flags before displaying reset message
From: Jeff Kirsher @ 2010-10-28 10:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, bphilips, Carolyn Wyborny, Bruce Allan,
	Jeff Kirsher

From: Carolyn Wyborny <carolyn.wyborny@intel.com>

Some parts need to execute resets during normal operation.  This flag
check ensures that those parts reset without needlessly alarming the
user.  Other unexpected resets by other parts will dump debug info
and message the reset action to the user, as originally intended.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/netdev.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 35a9dd4..c4ca162 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -4882,8 +4882,11 @@ static void e1000_reset_task(struct work_struct *work)
 	struct e1000_adapter *adapter;
 	adapter = container_of(work, struct e1000_adapter, reset_task);
 
-	e1000e_dump(adapter);
-	e_err("Reset adapter\n");
+	if (!((adapter->flags & FLAG_RX_NEEDS_RESTART) &&
+	      (adapter->flags & FLAG_RX_RESTART_NOW))) {
+		e1000e_dump(adapter);
+		e_err("Reset adapter\n");
+	}
 	e1000e_reinit_locked(adapter);
 }
 


^ permalink raw reply related

* [net-2.6 PATCH] ixgbe: DCB, fix TX hang occurring in stress condition with PFC
From: Jeff Kirsher @ 2010-10-28 10:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, bphilips, John Fastabend, Jeff Kirsher

From: John Fastabend <john.r.fastabend@intel.com>

The DCB credits refill quantum _must_ be greater than half the max
packet size. This is needed to guarantee that TX DMA operations
are not attempted during a pause state. Additionally, the min IFG
must be set correctly for DCB mode. If a DMA operation is
requested unexpectedly during the pause state the HW data
store may be corrupted leading to a DMA hang.  The DMA hang
requires a reset to correct. This fixes the HW configuration
to avoid this condition.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/ixgbe_dcb.c       |   39 +++++++++++++++++++++++++++++++----
 drivers/net/ixgbe/ixgbe_dcb.h       |    5 ++--
 drivers/net/ixgbe/ixgbe_dcb_82599.c |    5 ++++
 drivers/net/ixgbe/ixgbe_dcb_82599.h |    3 +++
 drivers/net/ixgbe/ixgbe_main.c      |   12 +++++++++--
 5 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_dcb.c b/drivers/net/ixgbe/ixgbe_dcb.c
index 8bb9ddb..0d44c64 100644
--- a/drivers/net/ixgbe/ixgbe_dcb.c
+++ b/drivers/net/ixgbe/ixgbe_dcb.c
@@ -43,9 +43,12 @@
  * ixgbe_dcb_check_config().
  */
 s32 ixgbe_dcb_calculate_tc_credits(struct ixgbe_dcb_config *dcb_config,
-                                   u8 direction)
+				   int max_frame, u8 direction)
 {
 	struct tc_bw_alloc *p;
+	int min_credit;
+	int min_multiplier;
+	int min_percent = 100;
 	s32 ret_val = 0;
 	/* Initialization values default for Tx settings */
 	u32 credit_refill       = 0;
@@ -59,6 +62,31 @@ s32 ixgbe_dcb_calculate_tc_credits(struct ixgbe_dcb_config *dcb_config,
 		goto out;
 	}
 
+	min_credit = ((max_frame / 2) + DCB_CREDIT_QUANTUM - 1) /
+			DCB_CREDIT_QUANTUM;
+
+	/* Find smallest link percentage */
+	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
+		p = &dcb_config->tc_config[i].path[direction];
+		bw_percent = dcb_config->bw_percentage[direction][p->bwg_id];
+		link_percentage = p->bwg_percent;
+
+		link_percentage = (link_percentage * bw_percent) / 100;
+
+		if (link_percentage && link_percentage < min_percent)
+			min_percent = link_percentage;
+	}
+
+	/*
+	 * The ratio between traffic classes will control the bandwidth
+	 * percentages seen on the wire. To calculate this ratio we use
+	 * a multiplier. It is required that the refill credits must be
+	 * larger than the max frame size so here we find the smallest
+	 * multiplier that will allow all bandwidth percentages to be
+	 * greater than the max frame size.
+	 */
+	min_multiplier = (min_credit / min_percent) + 1;
+
 	/* Find out the link percentage for each TC first */
 	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
 		p = &dcb_config->tc_config[i].path[direction];
@@ -73,8 +101,9 @@ s32 ixgbe_dcb_calculate_tc_credits(struct ixgbe_dcb_config *dcb_config,
 		/* Save link_percentage for reference */
 		p->link_percent = (u8)link_percentage;
 
-		/* Calculate credit refill and save it */
-		credit_refill = link_percentage * MINIMUM_CREDIT_REFILL;
+		/* Calculate credit refill ratio using multiplier */
+		credit_refill = min(link_percentage * min_multiplier,
+				    MAX_CREDIT_REFILL);
 		p->data_credits_refill = (u16)credit_refill;
 
 		/* Calculate maximum credit for the TC */
@@ -85,8 +114,8 @@ s32 ixgbe_dcb_calculate_tc_credits(struct ixgbe_dcb_config *dcb_config,
 		 * of a TC is too small, the maximum credit may not be
 		 * enough to send out a jumbo frame in data plane arbitration.
 		 */
-		if (credit_max && (credit_max < MINIMUM_CREDIT_FOR_JUMBO))
-			credit_max = MINIMUM_CREDIT_FOR_JUMBO;
+		if (credit_max && (credit_max < min_credit))
+			credit_max = min_credit;
 
 		if (direction == DCB_TX_CONFIG) {
 			/*
diff --git a/drivers/net/ixgbe/ixgbe_dcb.h b/drivers/net/ixgbe/ixgbe_dcb.h
index eb1059f..0208a87 100644
--- a/drivers/net/ixgbe/ixgbe_dcb.h
+++ b/drivers/net/ixgbe/ixgbe_dcb.h
@@ -150,15 +150,14 @@ struct ixgbe_dcb_config {
 /* DCB driver APIs */
 
 /* DCB credits calculation */
-s32 ixgbe_dcb_calculate_tc_credits(struct ixgbe_dcb_config *, u8);
+s32 ixgbe_dcb_calculate_tc_credits(struct ixgbe_dcb_config *, int, u8);
 
 /* DCB hw initialization */
 s32 ixgbe_dcb_hw_config(struct ixgbe_hw *, struct ixgbe_dcb_config *);
 
 /* DCB definitions for credit calculation */
+#define DCB_CREDIT_QUANTUM	64   /* DCB Quantum */
 #define MAX_CREDIT_REFILL       511  /* 0x1FF * 64B = 32704B */
-#define MINIMUM_CREDIT_REFILL   5    /* 5*64B = 320B */
-#define MINIMUM_CREDIT_FOR_JUMBO 145  /* 145= UpperBound((9*1024+54)/64B) for 9KB jumbo frame */
 #define DCB_MAX_TSO_SIZE        (32*1024) /* MAX TSO packet size supported in DCB mode */
 #define MINIMUM_CREDIT_FOR_TSO  (DCB_MAX_TSO_SIZE/64 + 1) /* 513 for 32KB TSO packet */
 #define MAX_CREDIT              4095 /* Maximum credit supported: 256KB * 1204 / 64B */
diff --git a/drivers/net/ixgbe/ixgbe_dcb_82599.c b/drivers/net/ixgbe/ixgbe_dcb_82599.c
index 67c219f..05f2247 100644
--- a/drivers/net/ixgbe/ixgbe_dcb_82599.c
+++ b/drivers/net/ixgbe/ixgbe_dcb_82599.c
@@ -397,6 +397,11 @@ static s32 ixgbe_dcb_config_82599(struct ixgbe_hw *hw)
 	reg &= ~IXGBE_RTTDCS_ARBDIS;
 	IXGBE_WRITE_REG(hw, IXGBE_RTTDCS, reg);
 
+	/* Enable Security TX Buffer IFG for DCB */
+	reg = IXGBE_READ_REG(hw, IXGBE_SECTXMINIFG);
+	reg |= IXGBE_SECTX_DCB;
+	IXGBE_WRITE_REG(hw, IXGBE_SECTXMINIFG, reg);
+
 	return 0;
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_dcb_82599.h b/drivers/net/ixgbe/ixgbe_dcb_82599.h
index 18d7fbf..3841649 100644
--- a/drivers/net/ixgbe/ixgbe_dcb_82599.h
+++ b/drivers/net/ixgbe/ixgbe_dcb_82599.h
@@ -95,6 +95,9 @@
 
 #define IXGBE_TXPBTHRESH_DCB    0xA        /* THRESH value for DCB mode */
 
+/* SECTXMINIFG DCB */
+#define IXGBE_SECTX_DCB		0x00001F00 /* DCB TX Buffer IFG */
+
 
 /* DCB hardware-specific driver APIs */
 
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index f856312..2bd3eb4 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -3347,6 +3347,7 @@ static void ixgbe_napi_disable_all(struct ixgbe_adapter *adapter)
 static void ixgbe_configure_dcb(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	int max_frame = adapter->netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
 	u32 txdctl;
 	int i, j;
 
@@ -3359,8 +3360,15 @@ static void ixgbe_configure_dcb(struct ixgbe_adapter *adapter)
 	if (hw->mac.type == ixgbe_mac_82598EB)
 		netif_set_gso_max_size(adapter->netdev, 32768);
 
-	ixgbe_dcb_calculate_tc_credits(&adapter->dcb_cfg, DCB_TX_CONFIG);
-	ixgbe_dcb_calculate_tc_credits(&adapter->dcb_cfg, DCB_RX_CONFIG);
+#ifdef CONFIG_FCOE
+	if (adapter->netdev->features & NETIF_F_FCOE_MTU)
+		max_frame = max(max_frame, IXGBE_FCOE_JUMBO_FRAME_SIZE);
+#endif
+
+	ixgbe_dcb_calculate_tc_credits(&adapter->dcb_cfg, max_frame,
+					DCB_TX_CONFIG);
+	ixgbe_dcb_calculate_tc_credits(&adapter->dcb_cfg, max_frame,
+					DCB_RX_CONFIG);
 
 	/* reconfigure the hardware */
 	ixgbe_dcb_hw_config(&adapter->hw, &adapter->dcb_cfg);


^ permalink raw reply related

* [PATCH] fib: Fix fib zone and its hash leak on namespace stop
From: Pavel Emelyanov @ 2010-10-28 12:00 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List

When we stop a namespace we flush the table and free one, but the
added fn_zone-s (and their hashes if grown) are leaked. Need to free.
Tries releases all its stuff in the flushing code.

Shame on us - this bug exists since the very first make-fib-per-net
patches in 2.6.27 :(

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index ba3666d..07bdb5e 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -158,6 +158,8 @@ extern int fib_table_flush(struct fib_table *table);
 extern void fib_table_select_default(struct fib_table *table,
 				     const struct flowi *flp,
 				     struct fib_result *res);
+extern void fib_free_table(struct fib_table *tb);
+
 
 
 #ifndef CONFIG_IP_MULTIPLE_TABLES
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 36e27c2..eb6f69a 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1052,7 +1052,7 @@ static void ip_fib_net_exit(struct net *net)
 		hlist_for_each_entry_safe(tb, node, tmp, head, tb_hlist) {
 			hlist_del(node);
 			fib_table_flush(tb);
-			kfree(tb);
+			fib_free_table(tb);
 		}
 	}
 	kfree(net->ipv4.fib_table_hash);
diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
index b232375..b3acb04 100644
--- a/net/ipv4/fib_hash.c
+++ b/net/ipv4/fib_hash.c
@@ -716,6 +716,24 @@ int fib_table_flush(struct fib_table *tb)
 	return found;
 }
 
+void fib_free_table(struct fib_table *tb)
+{
+	struct fn_hash *table = (struct fn_hash *) tb->tb_data;
+	struct fn_zone *fz, *next;
+
+	next = table->fn_zone_list;
+	while (next != NULL) {
+		fz = next;
+		next = fz->fz_next;
+
+		if (fz->fz_hash != fz->fz_embedded_hash)
+			fz_hash_free(fz->fz_hash, fz->fz_divisor);
+
+		kfree(fz);
+	}
+
+	kfree(tb);
+}
 
 static inline int
 fn_hash_dump_bucket(struct sk_buff *skb, struct netlink_callback *cb,
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index b144508..200eb53 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1797,6 +1797,11 @@ int fib_table_flush(struct fib_table *tb)
 	return found;
 }
 
+void fib_free_table(struct fib_table *tb)
+{
+	kfree(tb);
+}
+
 void fib_table_select_default(struct fib_table *tb,
 			      const struct flowi *flp,
 			      struct fib_result *res)

^ permalink raw reply related

* [PATCH] ixgbe: delay rx_ring freeing
From: Eric Dumazet @ 2010-10-28 12:02 UTC (permalink / raw)
  To: Tantilov, Emil S
  Cc: David Miller, Waskiewicz Jr, Peter P, Kirsher, Jeffrey T,
	Brattain, Ross B, netdev
In-Reply-To: <1288239858.2658.72.camel@edumazet-laptop>

Le jeudi 28 octobre 2010 à 06:24 +0200, Eric Dumazet a écrit :
> Le mercredi 27 octobre 2010 à 16:35 -0600, Tantilov, Emil S a écrit :

> > Eric,
> > 
> > We are seeing intermittent hangs on ia32 arch which seem to point to this patch:
> > 
> > BUG: unable to handle kernel NULL pointer dereference at 00000040
> >  IP: [<f7f6b537>] ixgbe_get_stats64+0x47/0x120 [ixgbe]
> >  *pdpt = 000000002dc83001 *pde = 000000032d7e5067
> >  Oops: 0000 [#2] SMP
> >  last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
> >  Modules linked in: act_skbedit cls_u32 sch_multiq ixgbe mdio netconsole configfs autofs4 8021q garp stp llc sunrpc ipv6 e
> > xt3 jbd dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath power_meter hwmon sg ses enclosure dcdbas pcspkr serio_raw iTCO_wdt iTCO_vendor_support io
> > atdma dca i7core_edac edac_core bnx2 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mod [last unloaded: mperf
> > ]
> > 
> >  Pid: 1939, comm: irqbalance Tainted: G      D W   2.6.36-rc7-upstream-net-next-2.6-ixgbe-queue-i386-g55e1a84 #1 09CGW2/Po
> > werEdge T610
> >  EIP: 0060:[<f7f6b537>] EFLAGS: 00010206 CPU: 0
> >  EIP is at ixgbe_get_stats64+0x47/0x120 [ixgbe]
> >  EAX: 00000000 EBX: ecc45e4c ECX: ebea0400 EDX: 00000000
> >  ESI: ebea0000 EDI: 00000018 EBP: f7f846a0 ESP: ecc45d88
> >  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> >  Process irqbalance (pid: 1939, ti=ecc44000 task=efc63a50 task.ti=ecc44000)
> >  Stack:
> >  ecc45e4c 00000000 ebea0400 ebea0000 ecc45e4c ebea0000 ecc45f04 f7f846a0
> >  <0> c0750593 ebea0000 edfec340 ebea0000 000002b5 c075063a edfec340 c0993eec
> >  <0> ebe78000 0000021c 00000000 00000009 00000000 00000000 00000000 00000000
> >  Call Trace:
> >  [<c0750593>] ? dev_get_stats+0x33/0xc0
> >  [<c075063a>] ? dev_seq_printf_stats+0x1a/0x180
> >  [<c07507aa>] ? dev_seq_show+0xa/0x20
> >  [<c052398f>] ? seq_read+0x22f/0x3d0
> >  [<c0523760>] ? seq_read+0x0/0x3d0
> >  [<c054fdde>] ? proc_reg_read+0x5e/0x90
> >  [<c054fd80>] ? proc_reg_read+0x0/0x90
> >  [<c050a1dd>] ? vfs_read+0x9d/0x160
> >  [<c049d4ef>] ? audit_syscall_entry+0x20f/0x230
> >  [<c050a971>] ? sys_read+0x41/0x70
> >  [<c0409cdf>] ? sysenter_do_call+0x12/0x28
> >  Code: 60 4f 7e c8 8b 44 24 08 8b b8 20 06 00 00 85 ff 7e 63 c7 44 24 04 00 00 00 00 66 90 8b 54 24 04 8b 4c 24 08 8b 84 9
> > 1 00 05 00 00 <8b> 50 40 eb 06 8d 74 26 00 89 ca f6 c2 01 0f 85 ae 00 00 00 8b
> >  EIP: [<f7f6b537>] ixgbe_get_stats64+0x47/0x120 [ixgbe] SS:ESP 0068:ecc45d88
> >  CR2: 0000000000000040
> >  ---[ end trace 51ea89f4e57f54f1 ]---
> > 
> > Emil

I believe I now understand the problem, please try following patch.

Thanks



[PATCH] ixgbe: delay rx_ring freeing

"cat /proc/net/dev" uses RCU protection only.

Its quite possible we call a driver get_stats() method while device is
dismantling and freeing its data structures.

So get_stats() methods must be very careful not accessing driver private
data without appropriate locking.

In ixgbe case, we access rx_ring pointers. These pointers are freed in
ixgbe_clear_interrupt_scheme() and set to NULL, this can trigger NULL
dereference in ixgbe_get_stats64()

A possible fix is to use RCU locking in ixgbe_get_stats64() and defer
rx_ring freeing after a grace period in ixgbe_clear_interrupt_scheme()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Tantilov, Emil S <emil.s.tantilov@intel.com>
---
 drivers/net/ixgbe/ixgbe.h      |    1 
 drivers/net/ixgbe/ixgbe_main.c |   34 +++++++++++++++++++++----------
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
index ed8703c..018e143 100644
--- a/drivers/net/ixgbe/ixgbe.h
+++ b/drivers/net/ixgbe/ixgbe.h
@@ -192,6 +192,7 @@ struct ixgbe_ring {
 
 	unsigned int size;		/* length in bytes */
 	dma_addr_t dma;			/* phys. address of descriptor ring */
+	struct rcu_head rcu;
 } ____cacheline_internodealigned_in_smp;
 
 enum ixgbe_ring_f_enum {
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index f856312..1b16292 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -4742,6 +4742,11 @@ err_set_interrupt:
 	return err;
 }
 
+static void ring_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct ixgbe_ring, rcu));
+}
+
 /**
  * ixgbe_clear_interrupt_scheme - Clear the current interrupt scheme settings
  * @adapter: board private structure to clear interrupt scheme on
@@ -4758,7 +4763,12 @@ void ixgbe_clear_interrupt_scheme(struct ixgbe_adapter *adapter)
 		adapter->tx_ring[i] = NULL;
 	}
 	for (i = 0; i < adapter->num_rx_queues; i++) {
-		kfree(adapter->rx_ring[i]);
+		struct ixgbe_ring *ring = adapter->rx_ring[i];
+		
+		/* ixgbe_get_stats64() might access this ring, we must wait
+		 * a grace period before freeing it.
+		 */
+		call_rcu(&ring->rcu, ring_free_rcu);
 		adapter->rx_ring[i] = NULL;
 	}
 
@@ -6551,20 +6561,23 @@ static struct rtnl_link_stats64 *ixgbe_get_stats64(struct net_device *netdev,
 
 	/* accurate rx/tx bytes/packets stats */
 	dev_txq_stats_fold(netdev, stats);
+	rcu_read_lock();
 	for (i = 0; i < adapter->num_rx_queues; i++) {
-		struct ixgbe_ring *ring = adapter->rx_ring[i];
+		struct ixgbe_ring *ring = ACCESS_ONCE(adapter->rx_ring[i]);
 		u64 bytes, packets;
 		unsigned int start;
 
-		do {
-			start = u64_stats_fetch_begin_bh(&ring->syncp);
-			packets = ring->stats.packets;
-			bytes   = ring->stats.bytes;
-		} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
-		stats->rx_packets += packets;
-		stats->rx_bytes   += bytes;
+		if (ring) {
+			do {
+				start = u64_stats_fetch_begin_bh(&ring->syncp);
+				packets = ring->stats.packets;
+				bytes   = ring->stats.bytes;
+			} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
+			stats->rx_packets += packets;
+			stats->rx_bytes   += bytes;
+		}
 	}
-
+	rcu_read_unlock();
 	/* following stats updated by ixgbe_watchdog_task() */
 	stats->multicast	= netdev->stats.multicast;
 	stats->rx_errors	= netdev->stats.rx_errors;
@@ -7270,6 +7283,7 @@ static void __exit ixgbe_exit_module(void)
 	dca_unregister_notify(&dca_notifier);
 #endif
 	pci_unregister_driver(&ixgbe_driver);
+	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
 
 #ifdef CONFIG_IXGBE_DCA



^ permalink raw reply related

* [PATCH] Fix missing dependency of PCH_GBE driver
From: Steven Rostedt @ 2010-10-28 12:16 UTC (permalink / raw)
  To: LKML, netdev
  Cc: David S. Miller, Dan Carpenter, Masayuki Ohtake, Andrew Morton

While doing some randconfigs I stumbled across this build error:

ERROR: "mii_ethtool_gset" [drivers/net/pch_gbe/pch_gbe.ko] undefined!
ERROR: "generic_mii_ioctl" [drivers/net/pch_gbe/pch_gbe.ko] undefined!
ERROR: "mii_nway_restart" [drivers/net/pch_gbe/pch_gbe.ko] undefined!
ERROR: "mii_check_gmii_support" [drivers/net/pch_gbe/pch_gbe.ko] undefined!
ERROR: "mii_link_ok" [drivers/net/pch_gbe/pch_gbe.ko] undefined!
ERROR: "mii_ethtool_sset" [drivers/net/pch_gbe/pch_gbe.ko] undefined!

The config option PCH_GBE is missing the dependency of MII.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 9334539..cf9769c 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2540,7 +2540,7 @@ source "drivers/net/stmmac/Kconfig"
 
 config PCH_GBE
 	tristate "PCH Gigabit Ethernet"
-	depends on PCI
+	depends on PCI && MII
 	---help---
 	  This is a gigabit ethernet driver for Topcliff PCH.
 	  Topcliff PCH is the platform controller hub that is used in Intel's



^ permalink raw reply related

* Re: [PATCH] ixgbe: delay rx_ring freeing
From: Jeff Kirsher @ 2010-10-28 12:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tantilov, Emil S, David Miller, Waskiewicz Jr, Peter P,
	Brattain, Ross B, netdev
In-Reply-To: <1288267354.2649.369.camel@edumazet-laptop>

[-- Attachment #1: Type: text/plain, Size: 3956 bytes --]

On Thu, 2010-10-28 at 05:02 -0700, Eric Dumazet wrote:
> Le jeudi 28 octobre 2010 à 06:24 +0200, Eric Dumazet a écrit :
> > Le mercredi 27 octobre 2010 à 16:35 -0600, Tantilov, Emil S a écrit :
> 
> > > Eric,
> > > 
> > > We are seeing intermittent hangs on ia32 arch which seem to point to this patch:
> > > 
> > > BUG: unable to handle kernel NULL pointer dereference at 00000040
> > >  IP: [<f7f6b537>] ixgbe_get_stats64+0x47/0x120 [ixgbe]
> > >  *pdpt = 000000002dc83001 *pde = 000000032d7e5067
> > >  Oops: 0000 [#2] SMP
> > >  last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
> > >  Modules linked in: act_skbedit cls_u32 sch_multiq ixgbe mdio netconsole configfs autofs4 8021q garp stp llc sunrpc ipv6 e
> > > xt3 jbd dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath power_meter hwmon sg ses enclosure dcdbas pcspkr serio_raw iTCO_wdt iTCO_vendor_support io
> > > atdma dca i7core_edac edac_core bnx2 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mod [last unloaded: mperf
> > > ]
> > > 
> > >  Pid: 1939, comm: irqbalance Tainted: G      D W   2.6.36-rc7-upstream-net-next-2.6-ixgbe-queue-i386-g55e1a84 #1 09CGW2/Po
> > > werEdge T610
> > >  EIP: 0060:[<f7f6b537>] EFLAGS: 00010206 CPU: 0
> > >  EIP is at ixgbe_get_stats64+0x47/0x120 [ixgbe]
> > >  EAX: 00000000 EBX: ecc45e4c ECX: ebea0400 EDX: 00000000
> > >  ESI: ebea0000 EDI: 00000018 EBP: f7f846a0 ESP: ecc45d88
> > >  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > >  Process irqbalance (pid: 1939, ti=ecc44000 task=efc63a50 task.ti=ecc44000)
> > >  Stack:
> > >  ecc45e4c 00000000 ebea0400 ebea0000 ecc45e4c ebea0000 ecc45f04 f7f846a0
> > >  <0> c0750593 ebea0000 edfec340 ebea0000 000002b5 c075063a edfec340 c0993eec
> > >  <0> ebe78000 0000021c 00000000 00000009 00000000 00000000 00000000 00000000
> > >  Call Trace:
> > >  [<c0750593>] ? dev_get_stats+0x33/0xc0
> > >  [<c075063a>] ? dev_seq_printf_stats+0x1a/0x180
> > >  [<c07507aa>] ? dev_seq_show+0xa/0x20
> > >  [<c052398f>] ? seq_read+0x22f/0x3d0
> > >  [<c0523760>] ? seq_read+0x0/0x3d0
> > >  [<c054fdde>] ? proc_reg_read+0x5e/0x90
> > >  [<c054fd80>] ? proc_reg_read+0x0/0x90
> > >  [<c050a1dd>] ? vfs_read+0x9d/0x160
> > >  [<c049d4ef>] ? audit_syscall_entry+0x20f/0x230
> > >  [<c050a971>] ? sys_read+0x41/0x70
> > >  [<c0409cdf>] ? sysenter_do_call+0x12/0x28
> > >  Code: 60 4f 7e c8 8b 44 24 08 8b b8 20 06 00 00 85 ff 7e 63 c7 44 24 04 00 00 00 00 66 90 8b 54 24 04 8b 4c 24 08 8b 84 9
> > > 1 00 05 00 00 <8b> 50 40 eb 06 8d 74 26 00 89 ca f6 c2 01 0f 85 ae 00 00 00 8b
> > >  EIP: [<f7f6b537>] ixgbe_get_stats64+0x47/0x120 [ixgbe] SS:ESP 0068:ecc45d88
> > >  CR2: 0000000000000040
> > >  ---[ end trace 51ea89f4e57f54f1 ]---
> > > 
> > > Emil
> 
> I believe I now understand the problem, please try following patch.
> 
> Thanks
> 
> 
> 
> [PATCH] ixgbe: delay rx_ring freeing
> 
> "cat /proc/net/dev" uses RCU protection only.
> 
> Its quite possible we call a driver get_stats() method while device is
> dismantling and freeing its data structures.
> 
> So get_stats() methods must be very careful not accessing driver private
> data without appropriate locking.
> 
> In ixgbe case, we access rx_ring pointers. These pointers are freed in
> ixgbe_clear_interrupt_scheme() and set to NULL, this can trigger NULL
> dereference in ixgbe_get_stats64()
> 
> A possible fix is to use RCU locking in ixgbe_get_stats64() and defer
> rx_ring freeing after a grace period in ixgbe_clear_interrupt_scheme()
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Reported-by: Tantilov, Emil S <emil.s.tantilov@intel.com>
> ---
>  drivers/net/ixgbe/ixgbe.h      |    1 
>  drivers/net/ixgbe/ixgbe_main.c |   34 +++++++++++++++++++++----------
>  2 files changed, 25 insertions(+), 10 deletions(-)
> 

Thanks Eric, I have added this patch to my queue.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [PATCH] 8390: Don't oops on starting dev queue
From: Pavel Emelyanov @ 2010-10-28 12:22 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: David Miller, Linux Netdev List, Tom Herbert
In-Reply-To: <4CC93BCE.4020303@parallels.com>

On 10/28/2010 01:01 PM, Pavel Emelyanov wrote:
> The __NS8390_init tries to start the device queue before the
> device is registered. This results in an oops (snipped):

I've found the commit, that broke the driver.
It was Tom's e6484930 (net: allocate tx queues in register_netdevice)

^ permalink raw reply

* Re: [PATCH 1/4] tipc: Fix bugs in tipc_msg_calc_data_size()
From: Neil Horman @ 2010-10-28 14:07 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: davem, netdev, allan.stephens, drosenberg, jon.maloy, torvalds,
	security
In-Reply-To: <1288207773-25448-2-git-send-email-paul.gortmaker@windriver.com>

On Wed, Oct 27, 2010 at 03:29:30PM -0400, Paul Gortmaker wrote:
> From: Allan Stephens <Allan.Stephens@windriver.com>
> 
> Enhances TIPC's computation of the amount of data to be sent so that
> it works properly when large values are involved. Calculations are now
> done using "size_t" instead of "int", and a check has been added to
> handle cases where the total amount of data exceeds the range of "size_t".
> 
> Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
> ---
>  net/tipc/msg.c |   17 ++++++++++++-----
>  net/tipc/msg.h |    2 +-
>  2 files changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/net/tipc/msg.c b/net/tipc/msg.c
> index ecb532f..38360a9 100644
> --- a/net/tipc/msg.c
> +++ b/net/tipc/msg.c
> @@ -72,15 +72,22 @@ void tipc_msg_init(struct tipc_msg *m, u32 user, u32 type,
>  
>  /**
>   * tipc_msg_calc_data_size - determine total data size for message
> + *
> + * Note: If total exceeds range of size_t returns largest possible value
>   */
>  
> -int tipc_msg_calc_data_size(struct iovec const *msg_sect, u32 num_sect)
> +size_t tipc_msg_calc_data_size(struct iovec const *msg_sect, size_t num_sect)
>  {
> -	int dsz = 0;
> -	int i;
> +	size_t dsz = 0;
> +	size_t len;
> +	size_t i;
>  
> -	for (i = 0; i < num_sect; i++)
> -		dsz += msg_sect[i].iov_len;
> +	for (i = 0; i < num_sect; i++) {
> +		len = msg_sect[i].iov_len;
> +		if (len > (size_t)LONG_MAX - dsz)
> +			return (size_t)LONG_MAX;
> +		dsz += len;
> +	}
>  	return dsz;
>  }
>  
> diff --git a/net/tipc/msg.h b/net/tipc/msg.h
> index 031aad1..a132800 100644
> --- a/net/tipc/msg.h
> +++ b/net/tipc/msg.h
> @@ -711,7 +711,7 @@ static inline void msg_set_dataoctet(struct tipc_msg *m, u32 pos)
>  u32 tipc_msg_tot_importance(struct tipc_msg *m);
>  void tipc_msg_init(struct tipc_msg *m, u32 user, u32 type,
>  			    u32 hsize, u32 destnode);
> -int tipc_msg_calc_data_size(struct iovec const *msg_sect, u32 num_sect);
> +size_t tipc_msg_calc_data_size(struct iovec const *msg_sect, size_t num_sect);
>  int tipc_msg_build(struct tipc_msg *hdr,
>  			    struct iovec const *msg_sect, u32 num_sect,
>  			    int max_size, int usrmem, struct sk_buff** buf);
> -- 
> 1.7.1.GIT
> 
You should probably roll this patch together with your second one.  The caller
tipc_msg_build will otherwise throw a warning when you build, as it assigns the
return value with this patch (a size_t) to a signed integer.  It probably won't
matter since you limit dsz to TIPC_MAX_USER_MSG_SIZE which is small in
comparison to the size of an int, but nevertheless, the two patches are related,
so you can merge them.

Neil

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH 2/4] tipc: Fix bugs in tipc_msg_build()
From: Neil Horman @ 2010-10-28 14:19 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: davem, netdev, allan.stephens, drosenberg, jon.maloy, torvalds,
	security
In-Reply-To: <1288207773-25448-3-git-send-email-paul.gortmaker@windriver.com>

On Wed, Oct 27, 2010 at 03:29:31PM -0400, Paul Gortmaker wrote:
> From: Allan Stephens <Allan.Stephens@windriver.com>
> 
> Enhances TIPC's creation of message buffers so that it works properly
> when large amounts of data are involved. Calculations are now done
> using "size_t" where needed, and comparisons no longer mix signed and
> unsigned size values.
> 
> Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
> ---
>  net/tipc/msg.c |   25 +++++++++++++++++--------
>  net/tipc/msg.h |    4 ++--
>  2 files changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/net/tipc/msg.c b/net/tipc/msg.c
> index 38360a9..b67e831 100644
> --- a/net/tipc/msg.c
> +++ b/net/tipc/msg.c
> @@ -96,27 +96,36 @@ size_t tipc_msg_calc_data_size(struct iovec const *msg_sect, size_t num_sect)
>   *
>   * Note: Caller must not hold any locks in case copy_from_user() is interrupted!
>   *
> - * Returns message data size or errno
> + * If successful, creates message buffer and returns message data size
> + * (which must be <= TIPC_MAX_USER_MSG_SIZE).
> + * If fails only because message data size exceeds the specified maximum
> + * returns message data size, but doesn't created a message buffer.
> + * If fails for any other reason returns errno and doesn't create a buffer.
>   */
>  
>  int tipc_msg_build(struct tipc_msg *hdr,
> -			    struct iovec const *msg_sect, u32 num_sect,
> -			    int max_size, int usrmem, struct sk_buff** buf)
> +		   struct iovec const *msg_sect, size_t num_sect,
> +		   u32 max_size, int usrmem, struct sk_buff **buf)
>  {
> -	int dsz, sz, hsz, pos, res, cnt;
> +	size_t dsz;
> +	u32 hsz;
> +	u32 sz;
> +	size_t pos;
> +	size_t cnt;
> +	int res;
>  
>  	dsz = tipc_msg_calc_data_size(msg_sect, num_sect);
> -	if (unlikely(dsz > TIPC_MAX_USER_MSG_SIZE)) {
> +	if (unlikely(dsz > (size_t)TIPC_MAX_USER_MSG_SIZE)) {
>  		*buf = NULL;
>  		return -EINVAL;
>  	}
>  
>  	pos = hsz = msg_hdr_sz(hdr);
> -	sz = hsz + dsz;
> +	sz = hsz + (u32)dsz;
>  	msg_set_size(hdr, sz);
>  	if (unlikely(sz > max_size)) {
>  		*buf = NULL;
> -		return dsz;
> +		return (int)dsz;
Don't you want to return -ETOOBIG here, or something simmilar?  All the call
chains that I see for tipc_msg_build check for negative return codes and check
those against specific values.  Why return some random too-big-value?

>  	}
>  
>  	*buf = tipc_buf_acquire(sz);
> @@ -135,7 +144,7 @@ int tipc_msg_build(struct tipc_msg *hdr,
>  		pos += msg_sect[cnt].iov_len;
>  	}
>  	if (likely(res))
> -		return dsz;
> +		return (int)dsz;
>  
Ditto the above

>  	buf_discard(*buf);
>  	*buf = NULL;
> diff --git a/net/tipc/msg.h b/net/tipc/msg.h
> index a132800..41fb532 100644
> --- a/net/tipc/msg.h
> +++ b/net/tipc/msg.h
> @@ -713,8 +713,8 @@ void tipc_msg_init(struct tipc_msg *m, u32 user, u32 type,
>  			    u32 hsize, u32 destnode);
>  size_t tipc_msg_calc_data_size(struct iovec const *msg_sect, size_t num_sect);
>  int tipc_msg_build(struct tipc_msg *hdr,
> -			    struct iovec const *msg_sect, u32 num_sect,
> -			    int max_size, int usrmem, struct sk_buff** buf);
> +		   struct iovec const *msg_sect, size_t num_sect,
> +		   u32 max_size, int usrmem, struct sk_buff **buf);
>  
>  static inline void msg_set_media_addr(struct tipc_msg *m, struct tipc_media_addr *a)
>  {

I count 3 other callers of tipc_msg_calc_data_size (tipc_send,
tipc_forward2name, tipc_forward2port).  Where are the updates to those functions
to make the corresponding size_t adjustments
Neil

> -- 
> 1.7.1.GIT
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH] via-velocity: Codestyle fixes
From: Bob Beers @ 2010-10-28 14:34 UTC (permalink / raw)
  To: alex; +Cc: netdev, linux-kernel
In-Reply-To: <1288242011-32257-1-git-send-email-alex@wigen.net>

On Thu, Oct 28, 2010 at 1:00 AM,  <alex@wigen.net> wrote:
> From: Alexander Wigen <alex@wigen.net>

>  static void mac_wol_reset(struct mac_regs __iomem *regs)
> @@ -342,7 +342,7 @@ VELOCITY_PARAM(ValPktLen, "Receiving or Drop invalid 802.3 frame");
>    1: Wake up if link status is on/off.
>    2: Wake up if recevied an arp packet.
>    4: Wake up if recevied any unicast packet.
> -   Those value can be sumed up to support more than one option.
> +   Those value can be summed up to support more than one option.
>  */
>  VELOCITY_PARAM(wol_opts, "Wake On Lan options");
>

my $.02 ...

s/recevied/receive/ x2

s/These values can be summed/Those value can be summed up/

-Bob Beers

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox