Netdev List
* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Chris Wright @ 2010-05-13 21:08 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Chris Wright, Scott Feldman, davem, netdev, arnd
In-Reply-To: <4BEC65BC.5040208@trash.net>

* Patrick McHardy (kaber@trash.net) wrote:
> Chris Wright wrote:
> > * Patrick McHardy (kaber@trash.net) wrote:
> >>> +	} else  {
> >>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
> >> What does -1 mean?
> > 
> > It means no VFs.  Could be made a macro/enum constant
> 
> Why call rtnl_vf_port_fill_nest at all in that case? It even
> calls the ndo_get_vf_port() callback.

For the case where port profile is set on net dev that does not
have VFs (e.g. the enic case in 2/2).

thanks,
-chris


* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Patrick McHardy @ 2010-05-13 21:11 UTC (permalink / raw)
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev, arnd
In-Reply-To: <20100513210828.GD30483@x200.localdomain>

Chris Wright wrote:
> * Patrick McHardy (kaber@trash.net) wrote:
>> Chris Wright wrote:
>>> * Patrick McHardy (kaber@trash.net) wrote:
>>>>> +	} else  {
>>>>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
>>>> What does -1 mean?
>>> It means no VFs.  Could be made a macro/enum constant
>> Why call rtnl_vf_port_fill_nest at all in that case? It even
>> calls the ndo_get_vf_port() callback.
> 
> For the case where port profile is set on net dev that does not
> have VFs (e.g. the enic case in 2/2).

Thanks for the explanation. I guess an enum constant would be nice
to have. But the bigger problem is the asymmetrical message
parsing/construction.
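
For illustration only, a named constant for the "no VFs" case could look like the sketch below. The names VF_PORT_NO_VF and port_targets_vf() are hypothetical and do not come from the patch; the only thing taken from the thread is that -1 means "no VFs".

```c
#include <stdbool.h>

/* Hypothetical name: stands in for the bare -1 passed to
 * rtnl_vf_port_fill_nest() when the net device has no VFs. */
enum {
	VF_PORT_NO_VF = -1,
};

/* True when the index names a real VF, false for the sentinel. */
static bool port_targets_vf(int vf)
{
	return vf != VF_PORT_NO_VF;
}
```

With such a constant, the call site would read as "fill the nest for the device itself" rather than leaving the reader to guess what -1 means.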

BTW:

> +enum {
> +	VF_PORT_REQUEST_PREASSOCIATE = 0,
> +	VF_PORT_REQUEST_PREASSOCIATE_RR,
> +	VF_PORT_REQUEST_ASSOCIATE,
> +	VF_PORT_REQUEST_DISASSOCIATE,
> +};

Do multiple of these commands have to be issued in order to
reach "associated" state? That also wouldn't fit into the
rtnetlink design, which contains state, not commands.


* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Chris Wright @ 2010-05-13 21:18 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Chris Wright, Scott Feldman, davem, netdev, arnd
In-Reply-To: <4BEC6B19.1040808@trash.net>

* Patrick McHardy (kaber@trash.net) wrote:
> Chris Wright wrote:
> > * Patrick McHardy (kaber@trash.net) wrote:
> >> Chris Wright wrote:
> >>> * Patrick McHardy (kaber@trash.net) wrote:
> >>>>> +	} else  {
> >>>>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
> >>>> What does -1 mean?
> >>> It means no VFs.  Could be made a macro/enum constant
> >> Why call rtnl_vf_port_fill_nest at all in that case? It even
> >> calls the ndo_get_vf_port() callback.
> > 
> > For the case where port profile is set on net dev that does not
> > have VFs (e.g. the enic case in 2/2).
> 
> Thanks for the explanation. I guess an enum constant would be nice
> to have. But the bigger problem is the asymmetrical message
> parsing/construction.

Yeah, what would you like to do there?  I think we have to keep the
existing, just break out symmetric set/get?

> BTW:
> 
> > +enum {
> > +	VF_PORT_REQUEST_PREASSOCIATE = 0,
> > +	VF_PORT_REQUEST_PREASSOCIATE_RR,
> > +	VF_PORT_REQUEST_ASSOCIATE,
> > +	VF_PORT_REQUEST_DISASSOCIATE,
> > +};
> 
> Do multiple of these commands have to be issued in order to
> reach "associated" state? That also wouldn't fit into the
> rtnetlink design, which contains state, not commands.

It's optional.  At the very least, you need 1 associate/disassociate for
each logical link up/down.

For VM migration (or perhaps failover modes) you can optionally issue a
preassociate.  Preassociate has 2 flavors: one which is purely advisory,
and another which will reserve resources on the switch.  These all represent
state transitions in the switch, but only associate should allow final
logical link up and traffic to flow.
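
As a rough model of the transitions described above, the sketch below tracks a per-port state and only reports the link as usable after an associate. The request values are the ones from the quoted enum; the port_state enum and both helper functions are hypothetical. The thread does not say what a preassociate does to an already associated port, so this sketch assumes it is ignored.

```c
#include <stdbool.h>

/* Request values as listed in the quoted patch. */
enum {
	VF_PORT_REQUEST_PREASSOCIATE = 0,
	VF_PORT_REQUEST_PREASSOCIATE_RR,
	VF_PORT_REQUEST_ASSOCIATE,
	VF_PORT_REQUEST_DISASSOCIATE,
};

/* Hypothetical model of the state the switch keeps for one port. */
enum port_state {
	PORT_DISASSOCIATED = 0,
	PORT_PREASSOCIATED,	/* advisory only */
	PORT_PREASSOCIATED_RR,	/* resources reserved on the switch */
	PORT_ASSOCIATED,
};

/* Apply one request and return the resulting state. */
static enum port_state port_apply(enum port_state cur, int request)
{
	switch (request) {
	case VF_PORT_REQUEST_PREASSOCIATE:
		/* Assumption: ignored once the port is associated. */
		return cur == PORT_ASSOCIATED ? cur : PORT_PREASSOCIATED;
	case VF_PORT_REQUEST_PREASSOCIATE_RR:
		return cur == PORT_ASSOCIATED ? cur : PORT_PREASSOCIATED_RR;
	case VF_PORT_REQUEST_ASSOCIATE:
		return PORT_ASSOCIATED;
	case VF_PORT_REQUEST_DISASSOCIATE:
		return PORT_DISASSOCIATED;
	default:
		return cur;	/* unknown request: leave state alone */
	}
}

/* Only the associated state allows logical link up and traffic. */
static bool port_link_allowed(enum port_state s)
{
	return s == PORT_ASSOCIATED;
}
```

The preassociate steps are optional in this model: a single associate from the disassociated state is enough to allow link up.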

thanks,
-chris



* RE: [Pv-drivers] [PATCH 14/20] drivers/net/vmxnet3: Use kzalloc
From: Bhavesh Davda @ 2010-05-13 21:20 UTC (permalink / raw)
  To: Julia Lawall, Shreyas Bhatewara, VMware, Inc.,
	netdev@vger.kernel.org, "linux-kernel@vger
In-Reply-To: <Pine.LNX.4.64.1005132204040.6282@ask.diku.dk>

Looks good. Thanks for doing this!

Signed-off-by: Bhavesh Davda <bhavesh@vmware.com>

- Bhavesh
 
Bhavesh P. Davda

> -----Original Message-----
> From: pv-drivers-bounces@vmware.com [mailto:pv-drivers-bounces@vmware.com]
> On Behalf Of Julia Lawall
> Sent: Thursday, May 13, 2010 1:06 PM
> To: Shreyas Bhatewara; VMware, Inc.; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; kernel-janitors@vger.kernel.org
> Subject: [Pv-drivers] [PATCH 14/20] drivers/net/vmxnet3: Use kzalloc
> 
> From: Julia Lawall <julia@diku.dk>
> 
> Use kzalloc rather than the combination of kmalloc and memset.
> 
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
> 
> // <smpl>
> @@
> expression x,size,flags;
> statement S;
> @@
> 
> -x = kmalloc(size,flags);
> +x = kzalloc(size,flags);
>  if (x == NULL) S
> -memset(x, 0, size);
> // </smpl>
> 
> Signed-off-by: Julia Lawall <julia@diku.dk>
> 
> ---
>  drivers/net/vmxnet3/vmxnet3_drv.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff -u -p a/drivers/net/vmxnet3/vmxnet3_drv.c
> b/drivers/net/vmxnet3/vmxnet3_drv.c
> --- a/drivers/net/vmxnet3/vmxnet3_drv.c
> +++ b/drivers/net/vmxnet3/vmxnet3_drv.c
> @@ -1369,13 +1369,12 @@ vmxnet3_rq_create(struct vmxnet3_rx_queu
> 
>  	sz = sizeof(struct vmxnet3_rx_buf_info) * (rq->rx_ring[0].size +
>  						   rq->rx_ring[1].size);
> -	bi = kmalloc(sz, GFP_KERNEL);
> +	bi = kzalloc(sz, GFP_KERNEL);
>  	if (!bi) {
>  		printk(KERN_ERR "%s: failed to allocate rx bufinfo\n",
>  		       adapter->netdev->name);
>  		goto err;
>  	}
> -	memset(bi, 0, sz);
>  	rq->buf_info[0] = bi;
>  	rq->buf_info[1] = bi + rq->rx_ring[0].size;
> 
> _______________________________________________
> Pv-drivers mailing list
> Pv-drivers@vmware.com
> http://mailman2.vmware.com/mailman/listinfo/pv-drivers
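
The transformation in the quoted patch can be pictured with a userspace analogy, sketched below under the assumption that malloc()/calloc() stand in for kmalloc()/kzalloc(). The helper names are hypothetical; the point is only that both patterns hand back zeroed memory, while the second needs one call instead of two.

```c
#include <stdlib.h>
#include <string.h>

/* Old pattern: allocate, check, then clear (kmalloc + memset). */
static void *alloc_then_clear(size_t size)
{
	void *p = malloc(size);

	if (!p)
		return NULL;
	memset(p, 0, size);
	return p;
}

/* New pattern: one call that returns zeroed memory (kzalloc). */
static void *alloc_zeroed(size_t size)
{
	return calloc(1, size);
}

/* Helper for checking the result: 1 if every byte is zero. */
static int is_all_zero(const void *p, size_t size)
{
	const unsigned char *b = p;
	size_t i;

	for (i = 0; i < size; i++)
		if (b[i])
			return 0;
	return 1;
}
```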


* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Patrick McHardy @ 2010-05-13 21:23 UTC (permalink / raw)
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev, arnd
In-Reply-To: <20100513211847.GE30483@x200.localdomain>

Chris Wright wrote:
> * Patrick McHardy (kaber@trash.net) wrote:
>> Chris Wright wrote:
>>> * Patrick McHardy (kaber@trash.net) wrote:
>>>> Chris Wright wrote:
>>>>> * Patrick McHardy (kaber@trash.net) wrote:
>>>>>>> +	} else  {
>>>>>>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
>>>>>> What does -1 mean?
>>>>> It means no VFs.  Could be made a macro/enum constant
>>>> Why call rtnl_vf_port_fill_nest at all in that case? It even
>>>> calls the ndo_get_vf_port() callback.
>>> For the case where port profile is set on net dev that does not
>>> have VFs (e.g. the enic case in 2/2).
>> Thanks for the explanation. I guess an enum constant would be nice
>> to have. But the bigger problem is the asymmetrical message
>> parsing/construction.
> 
> Yeah, what would you like to do there?  I think we have to keep the
> existing, just break out symmetric set/get?

Sure, that would be fine. I'll have a closer look at the exact
message layout tomorrow, it's getting late here.

>> BTW:
>>
>>> +enum {
>>> +	VF_PORT_REQUEST_PREASSOCIATE = 0,
>>> +	VF_PORT_REQUEST_PREASSOCIATE_RR,
>>> +	VF_PORT_REQUEST_ASSOCIATE,
>>> +	VF_PORT_REQUEST_DISASSOCIATE,
>>> +};
>> Do multiple of these commands have to be issued in order to
>> reach "associated" state? That also wouldn't fit into the
>> rtnetlink design, which contains state, not commands.
> 
> It's optional.  At the very least, you need 1 associate/disassociate for
> each logical link up/down.
> 
> For VM migration (or perhaps failover modes) you can optionally issue a
> preassociate.  Preassociate has 2 flavors: one which is purely advisory,
> and another which will reserve resources on the switch.  These all represent
> state transitions in the switch, but only associate should allow final
> logical link up and traffic to flow.

I see, thanks. That seems fine.


* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Scott Feldman @ 2010-05-13 21:30 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: davem, netdev, chrisw, arnd
In-Reply-To: <4BEC63DB.2090306@trash.net>

On 5/13/10 1:40 PM, "Patrick McHardy" <kaber@trash.net> wrote:

> Scott Feldman wrote:
>> +struct ifla_vf_port_vsi {
>> + __u8 vsi_mgr_id;
>> + __u8 vsi_type_id[3];
>> + __u8 vsi_type_version;
>> + __u8 pad[3];
>> +};
> 
> Where is this actually used? The only use I could find is in the
> size calculation.

This is provisioned for VDP use.  The enic implementation (patch 2/2)
doesn't use these members.
 
> Please keep the style used in that file consistent and align arguments
> to the beginning of the opening '('.

I'll fix.
 
>> +{
>> + struct nlattr *data;
>> + int err;
>> +
>> + data = nla_nest_start(skb, IFLA_VF_PORT);
>> + if (!data)
>> +  return -EMSGSIZE;
>> +
>> + if (vf >= 0)
>> +  nla_put_u32(skb, IFLA_VF_PORT_VF, vf);
>> +
>> + err = dev->netdev_ops->ndo_get_vf_port(dev, vf, skb);
>> + if (err == -EMSGSIZE) {
>> +  nla_nest_cancel(skb, data);
>> +  return -EMSGSIZE;
>> + } else if (err) {
>> +  nla_nest_cancel(skb, data);
>> +  return 0;
> 
> Why is the error not returned in this case?

I was thinking the netdev could fail the call if the operation wasn't
supported.  A better choice would be to not set the netdev->op in the first
place.  Let me fix this.
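
A minimal sketch of that idea follows, assuming a simplified ops table: "not supported" is expressed by leaving the callback pointer unset, so any error a callback does return can be passed straight back to the caller. The struct, the two example callbacks, and fill_port() are hypothetical stand-ins, not code from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the one netdev op discussed here. */
struct port_ops {
	int (*get_vf_port)(int vf);
};

/* Example callbacks: one that succeeds, one that fails. */
static int get_port_ok(int vf)
{
	(void)vf;
	return 0;
}

static int get_port_fail(int vf)
{
	(void)vf;
	return -EIO;
}

static int fill_port(const struct port_ops *ops, int vf)
{
	if (!ops->get_vf_port)
		return 0;		/* op not set: nothing to report */
	return ops->get_vf_port(vf);	/* any error goes back to the caller */
}
```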

>> +  if (err)
>> +   goto nla_put_failure;
>> + }
>> +
>> + return 0;
>> +
>> +nla_put_failure:
>> + return -EMSGSIZE;
>> +}
>> +
>>  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>>    int type, u32 pid, u32 seq, u32 change,
>>    unsigned int flags)
>> @@ -747,17 +825,23 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct
>> net_device *dev,
>> goto nla_put_failure;
>> copy_rtnl_link_stats64(nla_data(attr), stats);
>>  
>> + if (dev->dev.parent)
>> +  NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
> 
> Should this attribute really be included even if the number is zero?

The previous code would write zero also.  I moved it out of the
get_vf_config check so it could be used for get_vf_port as well.
 
> This is oddly indented, please align .len to .type as in the
> existing attributes.

I'll fix, but bumping into 80 char issues...
 
>> + [IFLA_VF_PORT_VSI_TYPE]  = { .type = NLA_BINARY,
>> +    .len = sizeof(struct ifla_vf_port_vsi)},
>> + [IFLA_VF_PORT_INSTANCE_UUID] = { .type = NLA_BINARY,
>> +    .len = VF_PORT_UUID_MAX },
>> + [IFLA_VF_PORT_HOST_UUID] = { .type = NLA_STRING,
>> +    .len = VF_PORT_UUID_MAX },
>> + [IFLA_VF_PORT_REQUEST]  = { .type = NLA_U8, },
>> + [IFLA_VF_PORT_RESPONSE]  = { .type = NLA_U16, },
>> +};
>> +
>>  struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
>>  {
>> struct net *net;
>> @@ -1028,6 +1127,27 @@ static int do_setlink(struct net_device *dev, struct
>> ifinfomsg *ifm,
>> }
>> err = 0;
>>  
>> + if (tb[IFLA_VF_PORT]) {
>> +  struct nlattr *vf_port[IFLA_VF_PORT_MAX+1];
>> +  int vf = -1;
>> +
>> +  err = nla_parse_nested(vf_port, IFLA_VF_PORT_MAX,
>> +   tb[IFLA_VF_PORT], ifla_vf_port_policy);
>> +  if (err < 0)
>> +   goto errout;
>> +
>> +  if (vf_port[IFLA_VF_PORT_VF])
>> +   vf = nla_get_u32(vf_port[IFLA_VF_PORT_VF]);
>> +  err = -EOPNOTSUPP;
>> +  if (ops->ndo_set_vf_port)
>> +   err = ops->ndo_set_vf_port(dev, vf, vf_port);
> 
> This appears to be addressing a single VF to issue commands.
> I already explained this during the last set of VF patches,
> messages are supposed to be symmetrical, since you're dumping
> state for all existing VFs, you also need to accept configuration
> for multiple VFs. Basically, the kernel must be able to receive
> a message it created during a dump and fully recreate the state.

This was modeled on the existing IFLA_VF_ commands, where a single VF is
addressed on set, but all VFs for the PF are dumped on get.
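
The symmetry requirement quoted above (the kernel must be able to take a message it produced during a dump and recreate the state from it) can be sketched as a round trip over plain structs. Everything here is a hypothetical model: real rtnetlink messages are nested attributes, not arrays, and MAX_VFS, vf_port_cfg, dump_ports() and set_ports() are invented for the example.

```c
#include <stddef.h>

#define MAX_VFS 4	/* hypothetical limit for the sketch */

/* One per-VF record, as it would appear in both dump and set. */
struct vf_port_cfg {
	int vf;
	unsigned char request;
};

struct device_state {
	struct vf_port_cfg ports[MAX_VFS];
	size_t num_vfs;
};

/* Dump: emit one record for every VF on the device. */
static size_t dump_ports(const struct device_state *dev,
			 struct vf_port_cfg *out, size_t cap)
{
	size_t i, n = dev->num_vfs < cap ? dev->num_vfs : cap;

	for (i = 0; i < n; i++)
		out[i] = dev->ports[i];
	return n;
}

/* Set: accept the same list of records that a dump produces. */
static int set_ports(struct device_state *dev,
		     const struct vf_port_cfg *in, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (in[i].vf < 0 || (size_t)in[i].vf >= dev->num_vfs)
			return -1;	/* record names a VF we do not have */
		dev->ports[in[i].vf] = in[i];
	}
	return 0;
}
```

In this model, feeding the output of dump_ports() from one device into set_ports() on an identical device reproduces the per-VF state, which is the property being asked for.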



* Re: [PATCH 0/9] mm: generic adaptive large memory allocation APIs
From: Andreas Dilger @ 2010-05-13 21:56 UTC (permalink / raw)
  To: James Bottomley
  Cc: Changli Gao, Andrew Morton, Hoang-Nam Nguyen, Christoph Raisch,
	Roland Dreier, Sean Hefty, Hal Rosenstock, Divy Le Ray,
	Theodore Ts'o, Alexander Viro, Paul Menage, Li Zefan,
	linux-rdma, linux-kernel@vger.kernel.org Mailinglist, netdev,
	linux-scsi, linux-ext4@vger.kernel.org development, linux-fsdevel,
	linux-mm, containers
In-Reply-To: <1273763055.4353.136.camel@mulgrave.site>

On 2010-05-13, at 09:04, James Bottomley wrote:
> This isn't necessarily true ... most drivers and filesystems have to
> know what type they're getting.  Often they have to do extra tricks to
> process vmalloc areas.  Conversely, large kmalloc areas are a very
> precious commodity: if a driver or filesystem can handle vmalloc for
> large allocations, it should: it's easier for us to expand the vmalloc
> area than to try to make page reclaim keep large contiguous areas ... I
> notice your proposed API does the exact opposite of this ... tries
> kmalloc first and then does vmalloc.
> 
> Given this policy problem, isn't it easier simply to hand craft the
> vmalloc fall back to kmalloc (or vice versa) in the driver than add this
> whole massive raft of APIs for it?

I know we wouldn't mind using large vmalloc allocations for e.g. per-group arrays in ext4 (allocated once per mount), but I'd always understood that using vmalloc for general purpose uses can have a significant impact because the vmalloc() engine has (had?) serious performance problems.  That means it is better performance-wise to have a wrapper function like this to switch between kmalloc() and vmalloc() based on the allocation size, but it makes the code ugly.  Having the wrapper in the kernel would at least identify the different places that are using this kind of workaround.

If the performance of vmalloc() has been improved in the last few years, then I'd be happy to just use vmalloc() all the time.  That said, vmalloc still isn't suitable for sub-page allocations, so if you have a variable-sized allocation that may be very small or very large the small allocations will waste a whole page and a wrapper is still needed, or vmalloc should be changed to call kmalloc/kfree for the sub-page allocations.
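
To make the trade-off concrete, here is a small sketch of the size-based decision such a wrapper would make, plus the cost of serving a request from a page-granular allocator. It only models the choice and does not allocate anything. The 4 KiB page size, the cutoff, and all names are assumptions for illustration.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES	4096u			/* assumption: 4 KiB pages */
#define LARGE_THRESHOLD	(4u * PAGE_SIZE_BYTES)	/* hypothetical cutoff */

enum alloc_kind {
	USE_KMALLOC,	/* physically contiguous, fine for small sizes */
	USE_VMALLOC,	/* virtually contiguous, for large sizes */
};

/* Pick an allocator based on the requested size alone. */
static enum alloc_kind pick_allocator(size_t size)
{
	return size <= LARGE_THRESHOLD ? USE_KMALLOC : USE_VMALLOC;
}

/* Bytes consumed if the request is rounded up to whole pages. */
static size_t page_granular_cost(size_t size)
{
	size_t pages = (size + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;

	return (pages ? pages : 1) * PAGE_SIZE_BYTES;
}
```

Under these assumptions a 64-byte request rounded up to pages costs a full 4096 bytes, which is the sub-page waste mentioned above.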

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org


* Re: [PATCH net-next] ixgbevf: Enable GRO by default
From: Jeff Kirsher @ 2010-05-13 22:16 UTC (permalink / raw)
  To: Shirley Ma; +Cc: e1000-devel, netdev, davem, kvm
In-Reply-To: <1273780274.30943.6.camel@localhost.localdomain>

On Thu, May 13, 2010 at 12:51, Shirley Ma <mashirle@us.ibm.com> wrote:
> Enable GRO by default for performance.
>
> Signed-off-by: Shirley Ma <xma@us.ibm.com>
> ---
>
>  drivers/net/ixgbevf/ixgbevf_main.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/ixgbevf/ixgbevf_main.c b/drivers/net/ixgbevf/ixgbevf_main.c
> index 40f47b8..1bbb05e 100644
> --- a/drivers/net/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ixgbevf/ixgbevf_main.c
> @@ -3415,6 +3415,7 @@ static int __devinit ixgbevf_probe(struct pci_dev *pdev,
>        netdev->features |= NETIF_F_IPV6_CSUM;
>        netdev->features |= NETIF_F_TSO;
>        netdev->features |= NETIF_F_TSO6;
> +       netdev->features |= NETIF_F_GRO;
>        netdev->vlan_features |= NETIF_F_TSO;
>        netdev->vlan_features |= NETIF_F_TSO6;
>        netdev->vlan_features |= NETIF_F_IP_CSUM;
>
>

Thanks, I have added the patch to my queue.

-- 
Cheers,
Jeff

------------------------------------------------------------------------------

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired


* Re: mmotm 2010-05-11 - dies in pm_qos_update_request()
From: Rafael J. Wysocki @ 2010-05-13 22:24 UTC (permalink / raw)
  To: Valdis.Kletnieks, Mark Gross
  Cc: e1000-devel, netdev, Andrew Morton, David S. Miller, linux-kernel
In-Reply-To: <4793.1273761278@localhost>

On Thursday 13 May 2010, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 12 May 2010 23:07:20 +0200, "Rafael J. Wysocki" said:
> > On Wednesday 12 May 2010, Valdis.Kletnieks@vt.edu wrote:
> > > On Tue, 11 May 2010 18:21:22 PDT, akpm@linux-foundation.org said:
> > > > The mm-of-the-moment snapshot 2010-05-11-18-20 has been uploaded to
> > > > 
> > > >    http://userweb.kernel.org/~akpm/mmotm/
> > > 
> > > Dell Latitude E6500, x86_64 kernel.
> > > 
> > > Died a horrid death at boot in the e1000e driver.  Seems to be
> > > something in linux-next.patch. Didn't get a netconsole trace for obvious
> > > reasons...
> > > 
> > > Copied-by-hand traceback:
> > > pm_qos_update_request()+0x22
> > > e1000_configure+0x478
> > > e1000_open_device+0xee
> > > ? _raw_notifier_call_chain+0xf
> > > __dev_open+0xec
> > > dev_open+0x1b
> > > netpoll_setup+0x28b
> > > init_netconsole+0xbc
> > > 
> > > I suspect this commit:
> > > 
> > > commit 23606cf5d1192c2b17912cb2ef6e62f9b11de133
> > > Author: Rafael J. Wysocki <rjw@sisk.pl>
> > > Date:   Sun Mar 14 14:35:17 2010 +0000
> > > 
> > >     e1000e / PCI / PM: Add basic runtime PM support (rev. 4)
> > 
> > No, I don't think so.  I'm running -rc6 with this patch applied on a box with
> > e1000e and it works just fine.
> > 
> > Please try to revert this one instead:
> > 
> > http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=patch;h=ed77134bfccf5e75b6cbadab268e559dbe6a4ebb
> 
> Confirming - reverting that patch and doing the build fixup results in a
> kernel that doesn't blow up in the e1000e driver...

Then I guess there's an initialization problem somewhere.

Mark, any chance to look into that any time soon?  If we don't resolve this
before the merge window opens, I'm afraid I'll have to revert that commit
from my tree.

Rafael



* Re: mmotm 2010-05-11 - dies in pm_qos_update_request()
From: Rafael J. Wysocki @ 2010-05-13 22:32 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Mark Gross, e1000-devel, netdev, linux-kernel, Andrew Morton,
	David S. Miller
In-Reply-To: <4793.1273761278@localhost>

On Thursday 13 May 2010, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 12 May 2010 23:07:20 +0200, "Rafael J. Wysocki" said:
> > On Wednesday 12 May 2010, Valdis.Kletnieks@vt.edu wrote:
> > > On Tue, 11 May 2010 18:21:22 PDT, akpm@linux-foundation.org said:
> > > > The mm-of-the-moment snapshot 2010-05-11-18-20 has been uploaded to
> > > > 
> > > >    http://userweb.kernel.org/~akpm/mmotm/
> > > 
> > > Dell Latitude E6500, x86_64 kernel.
> > > 
> > > Died a horrid death at boot in the e1000e driver.  Seems to be
> > > something in linux-next.patch. Didn't get a netconsole trace for obvious
> > > reasons...
> > > 
> > > Copied-by-hand traceback:
> > > pm_qos_update_request()+0x22
> > > e1000_configure+0x478
> > > e1000_open_device+0xee
> > > ? _raw_notifier_call_chain+0xf
> > > __dev_open+0xec
> > > dev_open+0x1b
> > > netpoll_setup+0x28b
> > > init_netconsole+0xbc
> > > 
> > > I suspect this commit:
> > > 
> > > commit 23606cf5d1192c2b17912cb2ef6e62f9b11de133
> > > Author: Rafael J. Wysocki <rjw@sisk.pl>
> > > Date:   Sun Mar 14 14:35:17 2010 +0000
> > > 
> > >     e1000e / PCI / PM: Add basic runtime PM support (rev. 4)
> > 
> > No, I don't think so.  I'm running -rc6 with this patch applied on a box with
> > e1000e and it works just fine.
> > 
> > Please try to revert this one instead:
> > 
> > http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=patch;h=ed77134bfccf5e75b6cbadab268e559dbe6a4ebb
> 
> Confirming - reverting that patch and doing the build fixup results in a
> kernel that doesn't blow up in the e1000e driver...

Can you try the tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git linux-next

without the other linux-next changes, please?

I'm not seeing the problem you're reporting on my test box with e1000e.

Rafael



* [PATCH 6/6] sky2: version 1.28
From: Stephen Hemminger @ 2010-05-13 16:12 UTC (permalink / raw)
  To: David Miller, Mike McCormack; +Cc: netdev
In-Reply-To: <20100513161247.833356588@vyatta.com>

[-- Attachment #1: sky2-1.28.patch --]
[-- Type: text/plain, Size: 401 bytes --]

Version 1.28

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/sky2.c	2010-05-13 09:11:26.947584759 -0700
+++ b/drivers/net/sky2.c	2010-05-13 09:11:35.827269492 -0700
@@ -53,7 +53,7 @@
 #include "sky2.h"
 
 #define DRV_NAME		"sky2"
-#define DRV_VERSION		"1.27"
+#define DRV_VERSION		"1.28"
 
 /*
  * The Yukon II chipset takes 64 bit command blocks (called list elements)




* [PATCH 4/6] sky2: Refactor down/up code out of sky2_restart()
From: Stephen Hemminger @ 2010-05-13 16:12 UTC (permalink / raw)
  To: David Miller, Mike McCormack; +Cc: netdev
In-Reply-To: <20100513161247.833356588@vyatta.com>

[-- Attachment #1: sky2-mike4.patch --]
[-- Type: text/plain, Size: 1894 bytes --]

From: Mike McCormack <mikem@ring3k.org>

Code to bring down all sky2 interfaces and bring them up
again can be reused in sky2_suspend and sky2_resume.

Factor the code to bring the interfaces down into
sky2_all_down and the up code into sky2_all_up.

Signed-off-by: Mike McCormack <mikem@ring3k.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>

---
Not a regression.

 drivers/net/sky2.c |   26 +++++++++++++++++++-------
 1 files changed, 19 insertions(+), 7 deletions(-)

--- a/drivers/net/sky2.c	2010-05-13 08:57:32.736962641 -0700
+++ b/drivers/net/sky2.c	2010-05-13 08:57:33.337275609 -0700
@@ -3312,15 +3312,11 @@ static int sky2_reattach(struct net_devi
 	return err;
 }
 
-static void sky2_restart(struct work_struct *work)
+static void sky2_all_down(struct sky2_hw *hw)
 {
-	struct sky2_hw *hw = container_of(work, struct sky2_hw, restart_work);
-	u32 imask;
 	int i;
 
-	rtnl_lock();
-
-	imask = sky2_read32(hw, B0_IMSK);
+	sky2_read32(hw, B0_IMSK);
 	sky2_write32(hw, B0_IMSK, 0);
 	synchronize_irq(hw->pdev->irq);
 	napi_disable(&hw->napi);
@@ -3336,8 +3332,12 @@ static void sky2_restart(struct work_str
 		netif_tx_disable(dev);
 		sky2_hw_down(sky2);
 	}
+}
 
-	sky2_reset(hw);
+static void sky2_all_up(struct sky2_hw *hw)
+{
+	u32 imask = Y2_IS_BASE;
+	int i;
 
 	for (i = 0; i < hw->ports; i++) {
 		struct net_device *dev = hw->dev[i];
@@ -3348,6 +3348,7 @@ static void sky2_restart(struct work_str
 
 		sky2_hw_up(sky2);
 		sky2_set_multicast(dev);
+		imask |= portirq_msk[i];
 		netif_wake_queue(dev);
 	}
 
@@ -3356,6 +3357,17 @@ static void sky2_restart(struct work_str
 
 	sky2_read32(hw, B0_Y2_SP_LISR);
 	napi_enable(&hw->napi);
+}
+
+static void sky2_restart(struct work_struct *work)
+{
+	struct sky2_hw *hw = container_of(work, struct sky2_hw, restart_work);
+
+	rtnl_lock();
+
+	sky2_all_down(hw);
+	sky2_reset(hw);
+	sky2_all_up(hw);
 
 	rtnl_unlock();
 }




* [PATCH 2/6] sky2: Avoid race in sky2_change_mtu
From: Stephen Hemminger @ 2010-05-13 16:12 UTC (permalink / raw)
  To: David Miller, Mike McCormack; +Cc: netdev
In-Reply-To: <20100513161247.833356588@vyatta.com>

[-- Attachment #1: sky2-mike1.patch --]
[-- Type: text/plain, Size: 888 bytes --]

From: Mike McCormack <mikem@ring3k.org>

netif_stop_queue does not ensure all in-progress transmits are complete,
 so use netif_tx_disable() instead.

Secondly, make sure NAPI polls are disabled before stopping the tx queue,
 otherwise sky2_status_intr might trigger a TX queue wakeup between when
 we stop the queue and NAPI is disabled.

Signed-off-by: Mike McCormack <mikem@ring3k.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>

---
This is not a regression, so only apply to -next


--- a/drivers/net/sky2.c	2010-05-13 08:57:20.186332415 -0700
+++ b/drivers/net/sky2.c	2010-05-13 08:57:22.526983099 -0700
@@ -2275,8 +2275,8 @@ static int sky2_change_mtu(struct net_de
 	sky2_write32(hw, B0_IMSK, 0);
 
 	dev->trans_start = jiffies;	/* prevent tx timeout */
-	netif_stop_queue(dev);
 	napi_disable(&hw->napi);
+	netif_tx_disable(dev);
 
 	synchronize_irq(hw->pdev->irq);
 




* [PATCH 3/6] sky2: Shut off interrupts before NAPI
From: Stephen Hemminger @ 2010-05-13 16:12 UTC (permalink / raw)
  To: David Miller, Mike McCormack; +Cc: netdev
In-Reply-To: <20100513161247.833356588@vyatta.com>

[-- Attachment #1: sky2-mike3.patch --]
[-- Type: text/plain, Size: 749 bytes --]

From: Mike McCormack <mikem@ring3k.org>

Interrupts should be masked, then synchronized, and
finally NAPI should be disabled.

Signed-off-by: Mike McCormack <mikem@ring3k.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>

---
Not a regression, only apply to -next


--- a/drivers/net/sky2.c	2010-05-13 08:57:31.127627401 -0700
+++ b/drivers/net/sky2.c	2010-05-13 08:57:32.736962641 -0700
@@ -3320,10 +3320,10 @@ static void sky2_restart(struct work_str
 
 	rtnl_lock();
 
-	napi_disable(&hw->napi);
-	synchronize_irq(hw->pdev->irq);
 	imask = sky2_read32(hw, B0_IMSK);
 	sky2_write32(hw, B0_IMSK, 0);
+	synchronize_irq(hw->pdev->irq);
+	napi_disable(&hw->napi);
 
 	for (i = 0; i < hw->ports; i++) {
 		struct net_device *dev = hw->dev[i];




* [PATCH 5/6] sky2: Avoid allocating memory in sky2_resume
From: Stephen Hemminger @ 2010-05-13 16:12 UTC (permalink / raw)
  To: David Miller, Mike McCormack; +Cc: netdev
In-Reply-To: <20100513161247.833356588@vyatta.com>

[-- Attachment #1: sky2-mike5.patch --]
[-- Type: text/plain, Size: 1988 bytes --]

From: Mike McCormack <mikem@ring3k.org>

Allocating memory can fail, and since we have the memory we need
in sky2_resume when sky2_suspend is called, just stop the hardware
without freeing the memory it's using.

This avoids the possibility of failing because we can't allocate
memory in sky2_resume(), and allows sharing code with sky2_restart().

Signed-off-by: Mike McCormack <mikem@ring3k.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>

---
Not a regression

 drivers/net/sky2.c |   20 +++++---------------
 1 files changed, 5 insertions(+), 15 deletions(-)

--- a/drivers/net/sky2.c	2010-05-13 08:57:33.337275609 -0700
+++ b/drivers/net/sky2.c	2010-05-13 08:57:33.907302370 -0700
@@ -4926,12 +4926,12 @@ static int sky2_suspend(struct pci_dev *
 	cancel_work_sync(&hw->restart_work);
 
 	rtnl_lock();
+
+	sky2_all_down(hw);
 	for (i = 0; i < hw->ports; i++) {
 		struct net_device *dev = hw->dev[i];
 		struct sky2_port *sky2 = netdev_priv(dev);
 
-		sky2_detach(dev);
-
 		if (sky2->wol)
 			sky2_wol_init(sky2);
 
@@ -4940,8 +4940,6 @@ static int sky2_suspend(struct pci_dev *
 
 	device_set_wakeup_enable(&pdev->dev, wol != 0);
 
-	sky2_write32(hw, B0_IMSK, 0);
-	napi_disable(&hw->napi);
 	sky2_power_aux(hw);
 	rtnl_unlock();
 
@@ -4956,12 +4954,11 @@ static int sky2_suspend(struct pci_dev *
 static int sky2_resume(struct pci_dev *pdev)
 {
 	struct sky2_hw *hw = pci_get_drvdata(pdev);
-	int i, err;
+	int err;
 
 	if (!hw)
 		return 0;
 
-	rtnl_lock();
 	err = pci_set_power_state(pdev, PCI_D0);
 	if (err)
 		goto out;
@@ -4979,20 +4976,13 @@ static int sky2_resume(struct pci_dev *p
 		goto out;
 	}
 
+	rtnl_lock();
 	sky2_reset(hw);
-	sky2_write32(hw, B0_IMSK, Y2_IS_BASE);
-	napi_enable(&hw->napi);
-
-	for (i = 0; i < hw->ports; i++) {
-		err = sky2_reattach(hw->dev[i]);
-		if (err)
-			goto out;
-	}
+	sky2_all_up(hw);
 	rtnl_unlock();
 
 	return 0;
 out:
-	rtnl_unlock();
 
 	dev_err(&pdev->dev, "resume failed (%d)\n", err);
 	pci_disable_device(pdev);




* [PATCH 0/6] sky2: update
From: Stephen Hemminger @ 2010-05-13 16:12 UTC (permalink / raw)
  To: David Miller, Mike McCormack; +Cc: netdev

Bunch of patches from Mike, with some additional comments.




* [PATCH 1/6] sky2: Restore multicast after restart
From: Stephen Hemminger @ 2010-05-13 16:12 UTC (permalink / raw)
  To: David Miller, Mike McCormack; +Cc: netdev
In-Reply-To: <20100513161247.833356588@vyatta.com>

[-- Attachment #1: sky2-mike2.patch --]
[-- Type: text/plain, Size: 744 bytes --]

From: Mike McCormack <mikem@ring3k.org>

Multicast settings will be lost on reset, so restore them.

Signed-off-by: Mike McCormack <mikem@ring3k.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
---
This regression was introduced in 2.6.34 by
  commit 8a0c9228f110218f443d9ef8f9ab629251959733
  sky2: Avoid down and up during sky2_reset

So please apply to -net as well.


 drivers/net/sky2.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

--- a/drivers/net/sky2.c	2010-05-13 09:02:37.756960274 -0700
+++ b/drivers/net/sky2.c	2010-05-13 09:02:53.528209351 -0700
@@ -3347,6 +3347,7 @@ static void sky2_restart(struct work_str
 			continue;
 
 		sky2_hw_up(sky2);
+		sky2_set_multicast(dev);
 		netif_wake_queue(dev);
 	}
 




* RE: [PATCH 2.6.34-rc6] net: Improve ks8851 snl transmit performance
From: Arce, Abraham @ 2010-05-13 23:39 UTC (permalink / raw)
  To: Ha, Tristram, Ben Dooks
  Cc: David Miller, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Jan, Sebastien
In-Reply-To: <14385191E87B904DBD836449AA30269D021A66@MORGANITE.micrel.com>

Tristram,

> The latest nuttcp default size for UDP is 1500 bytes, rather than 8192 bytes.
> In my case, the transmit performance improves from 10 Mbps to 11.  Have you
> tried TCP?
> 

Not yet... one point to highlight first:

  - SPI controller clock rate is 24 MHz (unable to set ~40 MHz).

Now testing on 2.6.34-rc7 using TCP, nuttcp version 6.1.2:

Before the patch

  # /testsuites/ethernet/bin/nuttcp -i -Ri50m 10.87.231.229
      1.1460 MB /   1.00 sec =    9.6134 Mbps
      1.1858 MB /   1.00 sec =    9.9473 Mbps
      1.2258 MB /   1.00 sec =   10.2828 Mbps
      1.1996 MB /   1.00 sec =   10.0628 Mbps
      1.2203 MB /   1.00 sec =   10.2365 Mbps
      1.2258 MB /   1.00 sec =   10.2828 Mbps
      1.2134 MB /   1.00 sec =   10.1786 Mbps
      1.2235 MB /   1.00 sec =   10.2636 Mbps
      1.2134 MB /   1.00 sec =   10.1785 Mbps
      1.2120 MB /   1.00 sec =   10.1670 Mbps

     12.6250 MB /  10.46 sec =   10.1240 Mbps 2 %TX 0 %RX 0 retrans 7.91 msRTT

  # /testsuites/ethernet/bin/nuttcp 10.87.231.229
     12.9319 MB /  10.58 sec =   10.2553 Mbps 1 %TX 0 %RX 0 retrans 7.90 msRTT

After the patch

  # /testsuites/ethernet/bin/nuttcp -i -Ri50m 10.87.231.229
      1.1671 MB /   1.00 sec =    9.7902 Mbps
      1.2169 MB /   1.00 sec =   10.2077 Mbps
      1.2175 MB /   1.00 sec =   10.2134 Mbps
      1.2396 MB /   1.00 sec =   10.3986 Mbps
      1.2396 MB /   1.00 sec =   10.3987 Mbps
      1.2387 MB /   1.00 sec =   10.3910 Mbps
      1.2410 MB /   1.00 sec =   10.4102 Mbps
      1.2203 MB /   1.00 sec =   10.2365 Mbps
      1.2382 MB /   1.00 sec =   10.3871 Mbps
      1.2369 MB /   1.00 sec =   10.3755 Mbps

     12.8125 MB /  10.45 sec =   10.2820 Mbps 2 %TX 0 %RX 0 retrans 7.90 msRTT

  # /testsuites/ethernet/bin/nuttcp 10.87.231.229
     13.0808 MB /  10.64 sec =   10.3123 Mbps 1 %TX 0 %RX 0 retrans 7.90 msRTT

Best Regards
Abraham

^ permalink raw reply

* Re: [PATCH 2/2] ioat2,3: convert to producer/consumer locking
From: Dan Williams @ 2010-05-13 23:42 UTC (permalink / raw)
  To: David Howells
  Cc: linux-kernel, linux-raid, netdev, Paul E. McKenney,
	Maciej Sosnowski
In-Reply-To: <31229.1273653365@redhat.com>

On Wed, May 12, 2010 at 1:36 AM, David Howells <dhowells@redhat.com> wrote:
>
> Out of interest, does it make the code smaller if you mark
> ioat2_get_ring_ent() and ioat2_ring_mask() with __attribute_const__?
>
> I'm not sure whether it'll affect how long gcc is willing to cache these, but
> once computed, I would guess they won't change within the calling function.

Unfortunately, it does not make a difference, but I'll keep this in
mind if ioat2_get_ring_ent() ever gets more complicated (which it
might in the future).

> Also, is the device you're driving watching the ring and its indices?  If so,
> does it modify the indices?  If that is the case, you might need to use
> read_barrier_depends() rather than smp_read_barrier_depends().

The device does not observe the indices directly.  Instead we
increment a free running 'count' register by the distance between
ioat->pending and ioat->head.

>
>> +             prefetch(ioat2_get_ring_ent(ioat, idx + i + 1));
>> +             desc = ioat2_get_ring_ent(ioat, idx + i);
>>               dump_desc_dbg(ioat, desc);
>>               tx = &desc->txd;
>>               if (tx->cookie) {
>
> Is this right, I wonder?  You're prefetching [i+1] before reading [i]?  Doesn't
> this mean that you might have to wait for [i+1] to be retrieved from RAM before
> [i] can be read?  Should you instead read tx->cookie before issuing the
> prefetch?  Admittedly, this is only likely to affect the reading of the head of
> the queue - subsequent reads in the same loop will, of course, have been
> prefetched.

Yes, it should be the other way around.

Thanks!

--
Dan
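
[Editor's sketch] The prefetch ordering agreed on above can be modelled in
userspace roughly as follows. This is only an illustration, not the ioat
driver code: struct toy_desc, toy_ring_ent() and toy_cleanup() are
hypothetical names, the ring size is an assumption, and __builtin_prefetch
is the GCC/Clang builtin rather than the kernel's prefetch(). The point is
that the entry at [i] is read first and the prefetch of [i+1] is issued
afterwards.

```c
#include <stddef.h>

#define RING_SIZE 8u                 /* assumed; must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

/* Hypothetical descriptor; only the cookie matters for this sketch. */
struct toy_desc {
	int cookie;
};

/* Wrap the index with a mask, in the spirit of ioat2_get_ring_ent(). */
static struct toy_desc *toy_ring_ent(struct toy_desc *ring, unsigned int idx)
{
	return &ring[idx & RING_MASK];
}

/*
 * Walk 'active' entries starting at 'tail'. For each entry, read the
 * cookie of entry [i] first, then prefetch entry [i + 1], so the demand
 * load of the current entry is not queued behind the prefetch.
 * Returns the number of entries that had a non-zero cookie.
 */
static unsigned int toy_cleanup(struct toy_desc *ring, unsigned int tail,
				unsigned int active)
{
	unsigned int i, done = 0;

	for (i = 0; i < active; i++) {
		struct toy_desc *desc = toy_ring_ent(ring, tail + i);
		int cookie = desc->cookie;          /* read [i] first */

		__builtin_prefetch(toy_ring_ent(ring, tail + i + 1));
		if (cookie) {
			desc->cookie = 0;           /* mark as completed */
			done++;
		}
	}
	return done;
}
```

The compiler is still free to schedule the two operations differently, so
this shows the intended source order only.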

^ permalink raw reply

* [PATCH] netfilter: Remove skb_is_nonlinear check from nf_conntrack_sip
From: Jason Gunthorpe @ 2010-05-14  0:38 UTC (permalink / raw)
  To: netfilter-devel, netdev

At least the Xen netfront driver always produces non-linear skbs,
so the SIP module does nothing at all when used with that NIC.

Copy the hacky technique for accessing skb data from the ftp conntrack
helper; it is better than nothing.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 net/netfilter/nf_conntrack_sip.c |   21 ++++++++++++++-------
 1 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 4b57216..37dd7a4 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -30,6 +30,10 @@ MODULE_DESCRIPTION("SIP connection tracking helper");
 MODULE_ALIAS("ip_conntrack_sip");
 MODULE_ALIAS_NFCT_HELPER("sip");
 
+/* This is slow, but it's simple. --RR */
+static char *sip_buffer;
+static DEFINE_SPINLOCK(nf_sip_lock);
+
 #define MAX_PORTS	8
 static unsigned short ports[MAX_PORTS];
 static unsigned int ports_c;
@@ -1275,17 +1279,14 @@ static int sip_help(struct sk_buff *skb,
 
 	nf_ct_refresh(ct, skb, sip_timeout * HZ);
 
-	if (!skb_is_nonlinear(skb))
-		dptr = skb->data + dataoff;
-	else {
-		pr_debug("Copy of skbuff not supported yet.\n");
-		return NF_ACCEPT;
-	}
-
 	datalen = skb->len - dataoff;
 	if (datalen < strlen("SIP/2.0 200"))
 		return NF_ACCEPT;
 
+	spin_lock_bh(&nf_sip_lock);
+	dptr = skb_header_pointer(skb, dataoff, datalen, sip_buffer);
+	BUG_ON(dptr == NULL);
+
 	if (strnicmp(dptr, "SIP/2.0 ", strlen("SIP/2.0 ")) != 0)
 		ret = process_sip_request(skb, &dptr, &datalen);
 	else
@@ -1297,6 +1298,7 @@ static int sip_help(struct sk_buff *skb,
 			ret = NF_DROP;
 	}
 
+	spin_unlock_bh(&nf_sip_lock);
 	return ret;
 }
 
@@ -1329,6 +1331,7 @@ static void nf_conntrack_sip_fini(void)
 			nf_conntrack_helper_unregister(&sip[i][j]);
 		}
 	}
+	kfree(sip_buffer);
 }
 
 static int __init nf_conntrack_sip_init(void)
@@ -1336,6 +1339,10 @@ static int __init nf_conntrack_sip_init(void)
 	int i, j, ret;
 	char *tmpname;
 
+	sip_buffer = kmalloc(65536, GFP_KERNEL);
+	if (!sip_buffer)
+		return -ENOMEM;
+
 	if (ports_c == 0)
 		ports[ports_c++] = SIP_PORT;
 
-- 
1.6.0.4
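
[Editor's sketch] For readers unfamiliar with the skb_header_pointer()
pattern the patch relies on, here is a small userspace model of the idea:
return a pointer straight into the linear data when the requested range
lies there, otherwise copy the bytes into a caller-supplied buffer and
return that. struct toy_pkt and toy_header_pointer() are hypothetical and
much simpler than the kernel's struct sk_buff (one linear head plus a
single fragment is an assumption made for brevity).

```c
#include <stddef.h>
#include <string.h>

/* Toy packet: a linear head plus one extra fragment (not struct sk_buff). */
struct toy_pkt {
	const char *head;
	size_t head_len;
	const char *frag;
	size_t frag_len;
};

/*
 * Return a pointer to 'len' contiguous bytes starting at 'off'.
 * If the range sits entirely in the linear head, point into it directly;
 * otherwise gather the bytes into 'buf' (which must hold 'len' bytes).
 * Returns NULL if the range runs past the end of the packet.
 */
static const void *toy_header_pointer(const struct toy_pkt *p, size_t off,
				      size_t len, void *buf)
{
	size_t total = p->head_len + p->frag_len;
	size_t from_head = 0;
	char *out = buf;

	if (off > total || len > total - off)
		return NULL;

	/* Fast path: no copy needed. */
	if (off <= p->head_len && len <= p->head_len - off)
		return p->head + off;

	/* Slow path: copy what the head provides, then the fragment. */
	if (off < p->head_len) {
		from_head = p->head_len - off;
		memcpy(out, p->head + off, from_head);
		off = 0;                    /* continue at fragment start */
	} else {
		off -= p->head_len;         /* offset within the fragment */
	}
	memcpy(out + from_head, p->frag + off, len - from_head);
	return buf;
}
```

With a head of "SIP/2.0 " and a fragment of "200 OK", a request for bytes
0..7 returns a pointer into the head, while a request for 7 bytes at
offset 4 is copied into the buffer as "2.0 200".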


^ permalink raw reply related

* [PATCH] sctp: fix append error cause to ERROR chunk correctly
From: Wei Yongjun @ 2010-05-14  0:37 UTC (permalink / raw)
  To: David Miller
  Cc: Vlad Yasevich, Neil Horman, linux-sctp, Eugene Teo,
	netdev@vger.kernel.org
In-Reply-To: <4BEC00B2.6000705@hp.com>

commit 5fa782c2f5ef6c2e4f04d3e228412c9b4a4c8809
  sctp: Fix skb_over_panic resulting from multiple invalid \
    parameter errors (CVE-2010-1173) (v4)

causes the 'error cause' to never be added to the ERROR chunk, due to
a typo in the length check in sctp_init_cause_fixed().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 net/sctp/sm_make_chunk.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 30c1767..70d6c10 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -141,7 +141,7 @@ int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
 	len = sizeof(sctp_errhdr_t) + paylen;
 	err.length  = htons(len);
 
-	if (skb_tailroom(chunk->skb) >  len)
+	if (skb_tailroom(chunk->skb) < len)
 		return -ENOSPC;
 	chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk,
 						     sizeof(sctp_errhdr_t),
@@ -1421,7 +1421,7 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
 void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
 			     int len, const void *data)
 {
-	if (skb_tailroom(chunk->skb) > len)
+	if (skb_tailroom(chunk->skb) >= len)
 		return sctp_addto_chunk(chunk, len, data);
 	else
 		return NULL;
-- 
1.6.5.2
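
[Editor's sketch] The effect of the inverted comparison in the first hunk
can be seen in a tiny userspace model. check_before() and check_after()
are hypothetical stand-ins for the test in sctp_init_cause_fixed(), with
the tailroom passed in as a plain integer instead of being read from an
skb.

```c
#include <errno.h>

/* Old test: rejects exactly the case where there IS enough room. */
static int check_before(int tailroom, int len)
{
	if (tailroom > len)
		return -ENOSPC;
	return 0;
}

/* Fixed test: rejects only when the cause does not fit. */
static int check_after(int tailroom, int len)
{
	if (tailroom < len)
		return -ENOSPC;
	return 0;
}
```

With 64 bytes of tailroom and an 8-byte cause, check_before() returns
-ENOSPC while check_after() returns 0, which matches the description that
the error cause was never added.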


^ permalink raw reply related

* RE: does the broadcom bnx2x support RSS/multi queue
From: Jon Zhou @ 2010-05-14  1:03 UTC (permalink / raw)
  To: eilong@broadcom.com; +Cc: Eric Dumazet, netdev@vger.kernel.org
In-Reply-To: <1273750340.1423.2.camel@lb-tlvb-eilong.il.broadcom.com>

Hi Eilon,

I got the following log after setting debug=0x20. Why does it say "MSI is not attainable"?
Thanks


May 13 18:53:26 ibm-bc-53 kernel: Broadcom NetXtreme II 5771x 10Gigabit Ethernet Driver bnx2x 1.52.12 ($DateTime: 2009/12/17 12:14:50 $)
May 13 18:53:26 ibm-bc-53 kernel: bnx2x 0000:15:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
May 13 18:53:26 ibm-bc-53 kernel: bnx2x 0000:15:00.0: setting latency timer to 64
May 13 18:53:26 ibm-bc-53 kernel: bnx2x: part number 394D4342-31373735-31314131-473033
May 13 18:53:26 ibm-bc-53 kernel: eth2: Broadcom NetXtreme II BCM57711 XGb (A0) PCI-E x8 5GHz (Gen2) found at mem a0000000, IRQ 24, node addr ffff8801db9b0210
May 13 18:53:26 ibm-bc-53 kernel: bnx2x 0000:15:00.1: PCI INT B -> GSI 34 (level, low) -> IRQ 34
May 13 18:53:26 ibm-bc-53 kernel: bnx2x 0000:15:00.1: setting latency timer to 64
May 13 18:53:26 ibm-bc-53 kernel: eth2 renamed to eth3 by udevd [30749]
May 13 18:53:26 ibm-bc-53 kernel: bnx2x: part number 394D4342-31373735-31314131-473033
May 13 18:53:26 ibm-bc-53 kernel: udev: renamed network interface eth2 to eth3
May 13 18:53:26 ibm-bc-53 ifup:     eth3      device: Broadcom Corporation NetXtreme II BCM57711 10Gigabit PCIe
May 13 18:53:26 ibm-bc-53 SuSEfirewall2: SuSEfirewall2 not active
May 13 18:53:26 ibm-bc-53 kernel: eth2: Broadcom NetXtreme II BCM57711 XGb (A0) PCI-E x8 5GHz (Gen2) found at mem a0800000, IRQ 34, node addr ffff8801db9a8210
May 13 18:53:26 ibm-bc-53 kernel: bnx2x 0000:1a:00.0: PCI INT A -> GSI 26 (level, low) -> IRQ 26
May 13 18:53:26 ibm-bc-53 kernel: bnx2x 0000:1a:00.0: setting latency timer to 64
May 13 18:53:26 ibm-bc-53 kernel: eth2 renamed to eth5 by udevd [30822]
May 13 18:53:26 ibm-bc-53 kernel: bnx2x: part number 394D4342-31373735-31314131-473033
May 13 18:53:26 ibm-bc-53 kernel: udev: renamed network interface eth2 to eth5
May 13 18:53:27 ibm-bc-53 ifup:     eth5      device: Broadcom Corporation NetXtreme II BCM57711 10Gigabit PCIe
May 13 18:53:27 ibm-bc-53 ifup:     eth5      Startmode is 'off'
May 13 18:53:27 ibm-bc-53 kernel: eth2: Broadcom NetXtreme II BCM57711 XGb (A0) PCI-E x8 5GHz (Gen2) found at mem 9a000000, IRQ 26, node addr ffff880203c18210
May 13 18:53:27 ibm-bc-53 kernel: bnx2x 0000:1a:00.1: PCI INT B -> GSI 25 (level, low) -> IRQ 25
May 13 18:53:27 ibm-bc-53 kernel: bnx2x 0000:1a:00.1: setting latency timer to 64
May 13 18:53:27 ibm-bc-53 kernel: bnx2x: part number 394D4342-31373735-31314131-473033
May 13 18:53:27 ibm-bc-53 ifup:     eth2      device: Broadcom Corporation NetXtreme II BCM57711 10Gigabit PCIe
May 13 18:53:27 ibm-bc-53 SuSEfirewall2: SuSEfirewall2 not active


May 13 18:53:27 ibm-bc-53 kernel: eth4: Broadcom NetXtreme II BCM57711 XGb (A0) PCI-E x8 5GHz (Gen2) found at mem 9a800000, IRQ 25, node addr ffff8801f7940210
May 13 18:53:27 ibm-bc-53 ifup:     eth4      device: Broadcom Corporation NetXtreme II BCM57711 10Gigabit PCIe
May 13 18:53:27 ibm-bc-53 SuSEfirewall2: SuSEfirewall2 not active
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_set_num_queues:8134(eth5)]set number of queues to 4
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_enable_msix:7625(eth5)]msix_table[0].entry = 0 (slowpath)
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_enable_msix:7630(eth5)]msix_table[1].entry = 1 (CNIC)
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_enable_msix:7637(eth5)]msix_table[2].entry = 2 (fastpath #0)
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_enable_msix:7637(eth5)]msix_table[3].entry = 3 (fastpath #1)
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_enable_msix:7637(eth5)]msix_table[4].entry = 4 (fastpath #2)
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_enable_msix:7637(eth5)]msix_table[5].entry = 5 (fastpath #3)
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_enable_msix:7667(eth5)]MSI-X is not attainable  rc -28
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_enable_msi:7726(eth5)]MSI is not attainable
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_nic_init:6148(eth5)]queue[0]:  bnx2x_init_sb(ffff8801db9a8780,ffff8801d70fc000)  cl_id 0  sb 1  cos 0
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_init_rx_rings:5386(eth5)]mtu 1500  rx_buf_size 1650
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_set_storm_rx_mode:5753(eth5)]rx mode 0  mask 0x1
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_init_internal_func:6061(eth5)]All MIN values are zeroes  fairness will be disabled
May 13 18:54:05 ibm-bc-53 kernel: [bnx2x_init_ind_table:5688(eth5)]Initializing indirection table  multi_mode 1
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_wait_ramrod:7941(eth5)]waiting for state to become 3000 on IDX [0]
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_sp_event:1191(eth5)]got setup ramrod
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_set_mac_addr_e1h_gen:7927(eth5)]setting MAC (001a:6476:0367)  E1HOV 0  CLID mask 1
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_wait_ramrod:7941(eth5)]waiting for state to become 0 on IDX [0]
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_sp_event:1215(eth5)]got set mac ramrod
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_set_rx_mode:12833(eth5)]dev->flags = 1002
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_set_storm_rx_mode:5753(eth5)]rx mode 1  mask 0x1
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_set_rx_mode:12833(eth5)]dev->flags = 1003
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_set_storm_rx_mode:5753(eth5)]rx mode 1  mask 0x1
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_set_rx_mode:12833(eth5)]dev->flags = 1003
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_set_rx_mode:12927(eth5)]Adding mcast MAC: ffff8803f58a4c08
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_set_storm_rx_mode:5753(eth5)]rx mode 1  mask 0x1
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_set_rx_mode:12833(eth5)]dev->flags = 1003
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_set_rx_mode:12927(eth5)]Adding mcast MAC: ffff8803f58a4c08
May 13 18:54:06 ibm-bc-53 kernel: [bnx2x_set_storm_rx_mode:5753(eth5)]rx mode 1  mask 0x1
May 13 18:54:06 ibm-bc-53 kernel: bnx2x: eth5 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON




-----Original Message-----
From: Eilon Greenstein [mailto:eilong@broadcom.com] 
Sent: Thursday, May 13, 2010 7:32 PM
To: Jon Zhou
Cc: Eric Dumazet; netdev@vger.kernel.org
Subject: RE: does the broadcom bnx2x support RSS/multi queue

On Thu, 2010-05-13 at 02:44 -0700, Jon Zhou wrote:
> insmod ./PF_RING/drivers/broadcom/netxtreme2-5.2.50/bnx2x-1.52.12/src/bnx2x.ko multi_mode=1 num_queues=4 int_mode=3 debug=1
There is no need to set multi_mode or int_mode - you are using the
default values. However, I need more information on why you are using
INTA and not MSI-X, so please set the debug to 0x20

> but seems MSI not enabled:
> 34:         26          0          0          0          0          0          0          0          0          0          0          0   49470611   63427140          0          0   IO-APIC-fasteoi   eth5
Indeed, you are using INT# - this is why you do not have multi-queue.

Eilon



^ permalink raw reply

* Re: Question about vlans, bonding, etc.
From: George B. @ 2010-05-14  1:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1272948506.2407.174.camel@edumazet-laptop>

On Mon, May 3, 2010 at 9:48 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Monday, 3 May 2010 at 17:06 -0700, George B. wrote:
>> Watching the "Receive issues with bonding and vlans" thread brought a
>> question to mind.  In what order should things be done for best
>> performance?
>>
>> For example, say I have a pair of ethernet interfaces.  Do I slave the
>> ethernet interfaces to the bond device and then make the vlans on the
>> bond devices?
>> Or do I make the vlans on the ethernet devices and then bond the vlan
>> interfaces?
>>
>> In the first case I would have:
>>
>>
>>
>> bond0.3--|     |------eth0
>>              bond0
>> bond0.5--|     |------eth1
>>
>> The second case would be:
>>
>>       |------------------eth0.5-----|
>>       |          |-------eth0.3---eth0
>> bond0  bond1
>>       |          |-------eth1.3---eth1
>>       |------------------eth1.5-----|
>>
>> I am using the first method currently as it seemed more intuitive to me
>> at the time to bond the ethernets and then put the vlans on the bonds
>> but it seems life might be easier for the vlan driver if it is bound
>> directly to the hardware.  I am using Intel NICs (igb driver) with 4
>> queues per NIC.
>>
>> Would there be a performance difference expected between the two
>> configurations?  Can the vlan driver "see through" the bond interface
>> to the
>> hardware and take advantage of multiple queues if the hardware
>> supports it in the first configuration?
>
> Unfortunately, the first combination is not multiqueue aware yet.
>
> You'll need to patch the bonding driver like this if your NICs have 4
> queues:
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 85e813c..98cc3c0 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4915,8 +4915,8 @@ int bond_create(struct net *net, const char *name)
>
>        rtnl_lock();
>
> -       bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
> -                               bond_setup);
> +       bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
> +                               bond_setup, 4);
>        if (!bond_dev) {
>                pr_err("%s: eek! can't alloc netdev!\n", name);
>                rtnl_unlock();
>
>
>

I just got around to fooling with this some.  It would seem to me that
I should be able to get better performance if I could create the vlans
on the ethernet interfaces and then bond them together.  For example,
it seems intuitive that I should be able to create vlan eth0.5 and
eth1.5 and then enslave them.  Problem is that when I try to create
vlan5 on the second interface, vconfig balks that it already exists.
Yes, I know it exists, but I want vlan5 on two interfaces and I want
to use ifenslave to bond them together into a bond interface.  So if I
have 10 vlans, I would have 10 vlans on each ethernet interface and 10
bond interfaces.  The way it seems I am forced to do it now is bond
the two NICs together and add all the vlans to the single bond
interface.  It seems that the bond interface would then become a
bottleneck for all the vlans.

Is there some physical reason why it is not possible to create the
same vlan on multiple interfaces as long as the naming convention
keeps them named separately so they can be distinguished from each
other?

^ permalink raw reply

* Re: Question about vlans, bonding, etc.
From: Stephen Hemminger @ 2010-05-14  1:12 UTC (permalink / raw)
  To: George B.; +Cc: Eric Dumazet, netdev
In-Reply-To: <AANLkTikVJqN6m5nsJsFSNHS_HbOFyt0hGr_8MHu6tWDR@mail.gmail.com>

On Thu, 13 May 2010 18:10:33 -0700
"George B." <georgeb@gmail.com> wrote:

> vlan5 on the second interface, vconfig balks that it already exists.

vconfig is stupid. use 'ip link'


-- 

^ permalink raw reply

* PF_PACKET + bind() to proto + outbound packets
From: Paul LeoNerd Evans @ 2010-05-14  1:14 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 868 bytes --]

I'm writing a small traffic watching program, to capture IPv4 packets.

If I have a PF_PACKET socket bound to no particular protocol it sees
both inbound and outbound packets; I can then apply a BPF filter for
just one protocol (i.e. IPv4).

But if instead I bind the socket to the IPv4 protocol specifically, it
no longer sees any outbound packets created by the machine, only inbound
ones.

Is there perhaps some ioctl or sockopt I could enable, to see these
outbound packets too?

Further, would there actually be much difference in practice, in terms
of performance, abilities, etc... even if this were an option turned on?
What's the preferred method of snooping on all of the machine's, for
example, IPv4 traffic?

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply

