Netdev List
* [PATCH -next 2/3] bnx2: Add prefetches to rx path.
From: Michael Chan @ 2010-05-05  5:21 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1273036906-29162-1-git-send-email-mchan@broadcom.com>

Add prefetches of the skb and the next rx descriptor to speed up the rx path.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
---
 drivers/net/bnx2.c |   12 +++++++++---
 drivers/net/bnx2.h |    1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 6ad3184..cdee29b 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2719,6 +2719,7 @@ bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
 	}
 
 	rx_buf->skb = skb;
+	rx_buf->desc = (struct l2_fhdr *) skb->data;
 	dma_unmap_addr_set(rx_buf, mapping, mapping);
 
 	rxbd->rx_bd_haddr_hi = (u64) mapping >> 32;
@@ -2941,6 +2942,7 @@ bnx2_reuse_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
 	rxr->rx_prod_bseq += bp->rx_buf_use_size;
 
 	prod_rx_buf->skb = skb;
+	prod_rx_buf->desc = (struct l2_fhdr *) skb->data;
 
 	if (cons == prod)
 		return;
@@ -3086,7 +3088,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 	while (sw_cons != hw_cons) {
 		unsigned int len, hdr_len;
 		u32 status;
-		struct sw_bd *rx_buf;
+		struct sw_bd *rx_buf, *next_rx_buf;
 		struct sk_buff *skb;
 		dma_addr_t dma_addr;
 		u16 vtag = 0;
@@ -3097,7 +3099,11 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 
 		rx_buf = &rxr->rx_buf_ring[sw_ring_cons];
 		skb = rx_buf->skb;
+		prefetch(skb);
 
+		next_rx_buf =
+			&rxr->rx_buf_ring[RX_RING_IDX(NEXT_RX_BD(sw_cons))];
+		prefetch(next_rx_buf->desc);
 		rx_buf->skb = NULL;
 
 		dma_addr = dma_unmap_addr(rx_buf, mapping);
@@ -3106,7 +3112,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 			BNX2_RX_OFFSET + BNX2_RX_COPY_THRESH,
 			PCI_DMA_FROMDEVICE);
 
-		rx_hdr = (struct l2_fhdr *) skb->data;
+		rx_hdr = rx_buf->desc;
 		len = rx_hdr->l2_fhdr_pkt_len;
 		status = rx_hdr->l2_fhdr_status;
 
@@ -5764,7 +5770,7 @@ bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
 	rx_buf = &rxr->rx_buf_ring[rx_start_idx];
 	rx_skb = rx_buf->skb;
 
-	rx_hdr = (struct l2_fhdr *) rx_skb->data;
+	rx_hdr = rx_buf->desc;
 	skb_reserve(rx_skb, BNX2_RX_OFFSET);
 
 	pci_dma_sync_single_for_cpu(bp->pdev,
diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index ab34a5d..dd35bd0 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -6551,6 +6551,7 @@ struct l2_fhdr {
 
 struct sw_bd {
 	struct sk_buff		*skb;
+	struct l2_fhdr		*desc;
 	DEFINE_DMA_UNMAP_ADDR(mapping);
 };
 
-- 
1.6.4.GIT
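
For readers outside the driver, the pattern in this patch can be sketched in user space roughly as follows. This is only an illustration: `ring_slot`, `frame_hdr`, `RING_SIZE` and `ring_consume` are hypothetical stand-ins for `sw_bd`, `l2_fhdr` and the rx loop, and `__builtin_prefetch` (a GCC/Clang builtin) is used in place of the kernel's `prefetch()`.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8                      /* hypothetical; must be a power of two */
#define RING_IDX(i) ((i) & (RING_SIZE - 1))

struct frame_hdr {                       /* stand-in for struct l2_fhdr */
	uint32_t pkt_len;
};

struct ring_slot {                       /* stand-in for struct sw_bd */
	void *buf;                       /* packet buffer */
	struct frame_hdr *desc;          /* cached pointer to the header in buf */
};

/* Sum the packet lengths of the slots from cons up to (not including) prod. */
uint64_t ring_consume(struct ring_slot *ring, unsigned int cons,
		      unsigned int prod)
{
	uint64_t total = 0;

	while (cons != prod) {
		struct ring_slot *slot = &ring[RING_IDX(cons)];
		struct ring_slot *next = &ring[RING_IDX(cons + 1)];

		/* Hint only: start fetching the next header while this one
		 * is being processed. A prefetch never faults. */
		__builtin_prefetch(next->desc);

		total += slot->desc->pkt_len;
		cons++;
	}
	return total;
}
```

Caching the header pointer in the slot is what makes the prefetch cheap: the loop can issue it for the next entry without first dereferencing that entry's buffer.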




* [PATCH -next 1/3] bnx2: Add GRO support.
From: Michael Chan @ 2010-05-05  5:21 UTC (permalink / raw)
  To: davem; +Cc: netdev

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
---
 drivers/net/bnx2.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index ab26bbc..6ad3184 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -3207,10 +3207,10 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 
 #ifdef BCM_VLAN
 		if (hw_vlan)
-			vlan_hwaccel_receive_skb(skb, bp->vlgrp, vtag);
+			vlan_gro_receive(&bnapi->napi, bp->vlgrp, vtag, skb);
 		else
 #endif
-			netif_receive_skb(skb);
+			napi_gro_receive(&bnapi->napi, skb);
 
 		rx_pkt++;
 
-- 
1.6.4.GIT




* Re: [Patch 2/3] sysctl: add proc_do_large_bitmap
From: Cong Wang @ 2010-05-05  3:14 UTC (permalink / raw)
  To: Changli Gao
  Cc: linux-kernel, Octavian Purdila, Eric Dumazet, penguin-kernel,
	netdev, Neil Horman, ebiederm, adobriyan, David Miller
In-Reply-To: <v2y412e6f7f1004301541yb1ede589t8c446966743ca138@mail.gmail.com>

Changli Gao wrote:
>                      add the following lines to let "echo 1-10 >>
> /proc/..." work as normal.

Hmm, I haven't tested this. What do you see if we append
lines to it?

Also, do we need to support appending lines to this /proc file in the design?
Octavian? Eric?

Thanks.


* Re: [Patch 1/3] sysctl: refactor integer handling proc code
From: Cong Wang @ 2010-05-05  3:02 UTC (permalink / raw)
  To: Changli Gao
  Cc: linux-kernel, Octavian Purdila, Eric Dumazet, penguin-kernel,
	netdev, Neil Horman, ebiederm, David Miller, adobriyan
In-Reply-To: <u2s412e6f7f1004301549tb0e88a80n4c621e42c0b31015@mail.gmail.com>

Changli Gao wrote:
> On Fri, Apr 30, 2010 at 4:25 PM, Amerigo Wang <amwang@redhat.com> wrote:
>> +       if (*p == '-' && *size > 1) {
>> +               *neg = 1;
> 
> As neg is bool*, you should use true and false instead of 1 and 0.
> 

Yeah, I only corrected the lines that I touched; I should
correct them all.

Will fix.

Thanks.
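
A minimal user-space sketch of the point under review, assuming a hypothetical helper `parse_sign` (this is not the function from the patch): when the flag is a `bool *`, it is assigned `true` and `false` rather than 1 and 0.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Consume an optional leading '-' from a buffer of *size bytes.
 * Hypothetical helper for illustration only. A lone "-" is left
 * untouched, mirroring the "*size > 1" check in the quoted hunk.
 */
void parse_sign(const char **p, size_t *size, bool *neg)
{
	*neg = false;
	if (*size > 1 && **p == '-') {
		*neg = true;
		(*p)++;
		(*size)--;
	}
}
```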



* MDaemon Notification -- Attachment Removed
From: Postmaster @ 2010-05-05  2:41 UTC (permalink / raw)
  To: netdev

-------------------------------------------------------------------
MDaemon has detected restricted attachments within an email message
-------------------------------------------------------------------

From      : netdev@vger.kernel.org
To        : ngochphcm@klv.com.vn
Subject   : Mail Delivery (failure ngochphcm@klv.com.vn)
Message-ID: 

---------------------
Attachment(s) removed
---------------------
message.scr




* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Stephen Hemminger @ 2010-05-05  2:44 UTC (permalink / raw)
  To: Pankaj Thakkar
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	virtualization@lists.linux-foundation.org, pv-drivers@vmware.com,
	Shreyas Bhatewara
In-Reply-To: <20100505001857.GQ8323@vmware.com>

On Tue, 4 May 2010 17:18:57 -0700
Pankaj Thakkar <pthakkar@vmware.com> wrote:

> The purpose of this email is to introduce the architecture and the design principles. The overall project involves more than just changes to vmxnet3 driver and hence we thought an overview email would be better. Once people agree to the design in general we intend to provide the code changes to the vmxnet3 driver.

As Dave said, we care more about what the implementation looks like than the high-level
goals of the design. I think we all agree that better management of virtualized devices
is necessary; the problem is that there are so many of them (VMware, Xen, HV),
and vendors seem to lean on their own specific implementation of offloading,
which makes a general solution more difficult. Please, please solve this cleanly.

The little things like APIs and locking semantics and handling of dynamic versus
static control can make a design that is good in principle fall apart when someone does a bad
job of implementing them.

Lastly, projects that have had multiple people involved for long periods of time
in the dark often end up building a legacy mentality "but we convinced vendor XXX to include it
in their Enterprise version 666" and require lots of "retraining" before the code
becomes acceptable.

-- 


* Re: [PATCH] compat-wireless: updates for orinoco
From: Luis R. Rodriguez @ 2010-05-05  1:47 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Luis Rodriguez, Hauke Mehrtens, David Miller,
	linux-wireless@vger.kernel.org, mcgrof@kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20100505001830.GO2624@tux>

On Tue, May 4, 2010 at 5:18 PM, Luis R. Rodriguez
<lrodriguez@atheros.com> wrote:
> On Tue, May 04, 2010 at 05:04:09PM -0700, Stephen Hemminger wrote:
>> On Tue, 4 May 2010 16:26:53 -0700
>> "Luis R. Rodriguez" <lrodriguez@atheros.com> wrote:
>>
>> > First of all, thanks a lot! Some comments below.
>> >
>> > On Tue, May 4, 2010 at 3:40 PM, Hauke Mehrtens <hauke@hauke-m.de> wrote:
>> > > * Make all the patches apply again.
>> > > * rename read_pda to avoid conflicts with definitions in kernel <= 2.6.29
>> >
>> > I'm going to apply these two changes, if you get time can you send a
>> > patch to rename read_pda upstream as well, that way we don't have to
>> > carry this?
>> >
>> > > * add orinoco usb
>> >
>> > Thanks for this but I've grown tired of updating these netdev ops and
>> > I think we can do better. I'll add a netdev_attach_ops() which would
>> > simply do all the backport stuff for us, this way for backporting
>> > purposes all we have to do is replace the old lines with a
>> > netdev_attach_ops() call. In fact if we *really* wanted to we could
>> > add a dummy netdev_attach_ops() upstream and just backport that on
>> > older kernels, this would mean 0 line changes to backport a newer
>> > driver.
>> >
>> > Something like this maybe on the generic compat module, it builds for
>> > me, will commit soon.
>> >
>> > /*
>> >  * Expand this as drivers require more ops, for now this
>> >  * only sets the ones we need.
>> >  */
>> > void netdev_attach_ops(struct net_device *dev,
>> >                       const struct net_device_ops *ops)
>> > {
>> > #define SET_NETDEVOP(_op) (_op ? _op : NULL)
>> >        dev->open = SET_NETDEVOP(ops->ndo_open);
>> >        dev->stop = SET_NETDEVOP(ops->ndo_stop);
>> >        dev->hard_start_xmit = SET_NETDEVOP(ops->ndo_start_xmit);
>> >        dev->set_multicast_list = SET_NETDEVOP(ops->ndo_set_multicast_list);
>> >        dev->change_mtu = SET_NETDEVOP(ops->ndo_change_mtu);
>> >        dev->set_mac_address = SET_NETDEVOP(ops->ndo_set_mac_address);
>> >        dev->tx_timeout = SET_NETDEVOP(ops->ndo_tx_timeout);
>> >        dev->get_stats = SET_NETDEVOP(ops->ndo_get_stats);
>> > #undef SET_NETDEVOP
>> > }
>> > EXPORT_SYMBOL(netdev_attach_ops);
>> >
>> > For newer kernels then this would just be:
>> >
>> > static inline void netdev_attach_ops(struct net_device *dev,
>> >                       const struct net_device_ops *ops)
>> > {
>> >        dev->netdev_ops = ops;
>> > }
>> >
>> > Stephen, would the above be acceptable upstream on netdevice.h ? It
>> > would eliminate all needs from having to #ifdef network drivers when
>> > backporting. If so I can send a respective patch and spatch all the
>> > setters I think. An example of the nasty ifdef crap we have to do for
>> > the current backport of netdevop'able drivers is below.
>> >
>>
>> No. supporting backporting is not part of the upstream kernel
>> mission. Honestly, we try for forward compatibility but intentionally
>> ignore carrying extra backport baggage.
>
> Sure, understood, just had to try :), if only I could find a *good*
> non-backport reason to have the netdev_attach_ops()...

FWIW, it helped a lot: porting an Ethernet driver, for example, now consists
of a one-line change to the driver, and this works all the way down to 2.6.21.
With a netdev_attach_ops() upstream this would require zero lines of code
changes. But I understand; I'll try to find a real value for it on
existing kernels.

 patches/01-netdev.patch |  625 ++++++-----------------------------------------
 1 files changed, 75 insertions(+), 550 deletions(-)

  Luis
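
The one-line driver change can be illustrated with a user-space mock. Everything here is a sketch under stated assumptions: the structs are cut-down stand-ins for the kernel's, `MOCK_OLD_KERNEL` is a hypothetical macro standing in for a kernel version check, and `mock_dev_open`, `demo_open`, `demo_stop` and `demo_ops` are invented for the example.

```c
#include <stddef.h>

struct net_device;

/* Reduced mock of the kernel's struct net_device_ops. */
struct net_device_ops {
	int (*ndo_open)(struct net_device *dev);
	int (*ndo_stop)(struct net_device *dev);
};

/* MOCK_OLD_KERNEL is hypothetical; it selects the pre-netdev_ops layout. */
struct net_device {
#ifdef MOCK_OLD_KERNEL
	int (*open)(struct net_device *dev);
	int (*stop)(struct net_device *dev);
#else
	const struct net_device_ops *netdev_ops;
#endif
	int up;
};

void netdev_attach_ops(struct net_device *dev,
		       const struct net_device_ops *ops)
{
#ifdef MOCK_OLD_KERNEL
	/* Plain assignment: "_op ? _op : NULL" evaluates to _op anyway. */
	dev->open = ops->ndo_open;
	dev->stop = ops->ndo_stop;
#else
	dev->netdev_ops = ops;
#endif
}

/* Mock of the core calling into the driver, whichever layout is in use. */
int mock_dev_open(struct net_device *dev)
{
#ifdef MOCK_OLD_KERNEL
	return dev->open ? dev->open(dev) : 0;
#else
	return dev->netdev_ops->ndo_open ? dev->netdev_ops->ndo_open(dev) : 0;
#endif
}

/* Driver side: the same source builds against both layouts. */
int demo_open(struct net_device *dev) { dev->up = 1; return 0; }
int demo_stop(struct net_device *dev) { dev->up = 0; return 0; }

const struct net_device_ops demo_ops = {
	.ndo_open = demo_open,
	.ndo_stop = demo_stop,
};
```

The driver only ever calls `netdev_attach_ops(dev, &demo_ops)`; the layout difference is confined to the helper.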


* Re: [RFC] net: change bridge/macvlan hook to be be generic
From: Scott Feldman @ 2010-05-05  0:58 UTC (permalink / raw)
  To: Stephen Hemminger, David Miller, Patrick McHardy; +Cc: netdev
In-Reply-To: <20100504153758.0ed3a87d@nehalam>

On 5/4/10 3:37 PM, "Stephen Hemminger" <shemminger@vyatta.com> wrote:

> The existing macvlan and bridge have special hooks in the packet input
> path. This patch changes it to a generic hook chain, like the packet type
> processing. I have been wanting to look into flow based switching, etc...

Can this be further simplified by saying that a netdev can only be hooked by
one mux (macvlan, bridge, etc.) at any given time, so there is never more
than one element in the hook chain?  If so, then we just need a single hook,
not a chain.

It seems odd to me that a dev would have both macvlan_port != NULL and
br_port != NULL.  Can dev be in a macvlan and a bridge at the same time?

-scott
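
A user-space sketch of the single-hook alternative, with hypothetical names throughout (`netdev_mock`, `rx_hook_register`, `rx_deliver`; none of these are taken from the RFC patch). Registration fails with -EBUSY if a mux already owns the device, so the receive path tests one pointer instead of walking a list.

```c
#include <errno.h>
#include <stddef.h>

struct pkt {
	int len;
};

/* A hook returns NULL if it consumed the packet, else the packet. */
typedef struct pkt *(*rx_hook_fn)(struct pkt *p);

struct netdev_mock {
	rx_hook_fn rx_hook;              /* NULL when no mux is attached */
};

/* Attach a mux; at most one may own the device at a time. */
int rx_hook_register(struct netdev_mock *dev, rx_hook_fn fn)
{
	if (dev->rx_hook)
		return -EBUSY;
	dev->rx_hook = fn;
	return 0;
}

void rx_hook_unregister(struct netdev_mock *dev)
{
	dev->rx_hook = NULL;
}

/* Receive path: one pointer test, no chain to traverse. */
struct pkt *rx_deliver(struct netdev_mock *dev, struct pkt *p)
{
	return dev->rx_hook ? dev->rx_hook(p) : p;
}

/* Two sample muxes used to exercise the sketch. */
struct pkt *consume_hook(struct pkt *p) { (void)p; return NULL; }
struct pkt *pass_hook(struct pkt *p) { return p; }
```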



* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Chris Wright @ 2010-05-05  0:58 UTC (permalink / raw)
  To: Pankaj Thakkar; +Cc: kvm, pv-drivers, netdev, linux-kernel, virtualization
In-Reply-To: <20100504230225.GP8323@vmware.com>

* Pankaj Thakkar (pthakkar@vmware.com) wrote:
> We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
> Linux users can exploit the benefits provided by passthrough devices in a
> seamless manner while retaining the benefits of virtualization. The document
> below tries to answer most of the questions which we anticipated. Please let us
> know your comments and queries.

How do the throughput, latency, and host CPU utilization for the normal
data path compare with, say, NetQueue?

And does this obsolete your UPT implementation?

> Network Plugin Architecture
> ---------------------------
> 
> VMware has been working on various device passthrough technologies for the past
> few years. Passthrough technology is interesting as it can result in better
> performance/cpu utilization for certain demanding applications. In our vSphere
> product we support direct assignment of PCI devices like networking adapters to
> a guest virtual machine. This allows the guest to drive the device using the
> device drivers installed inside the guest. This is similar to the way KVM
> allows for passthrough of PCI devices to the guests. The hypervisor is bypassed
> for all I/O and control operations and hence it can not provide any value add
> features such as live migration, suspend/resume, etc.
> 
> 
> Network Plugin Architecture (NPA) is an approach which VMware has developed in
> joint partnership with Intel which allows us to retain the best of passthrough
> technology and virtualization. NPA allows for passthrough of the fast data
> (I/O) path and lets the hypervisor deal with the slow control path using
> traditional emulation/paravirtualization techniques. Through this splitting of
> data and control path the hypervisor can still provide the above mentioned
> value add features and exploit the performance benefits of passthrough.

How many cards actually support this NPA interface?  What does it look
like, i.e. where is the NPA specification?  (AFAIK, we never got the UPT
one).

> NPA requires SR-IOV hardware which allows for sharing of one single NIC adapter
> by multiple guests. SR-IOV hardware has many logically separate functions
> called virtual functions (VF) which can be independently assigned to the guest
> OS. They also have one or more physical functions (PF) (managed by a PF driver)
> which are used by the hypervisor to control certain aspects of the VFs and the
> rest of the hardware.

How do you handle hardware which has a more symmetric view of the
SR-IOV world (SR-IOV is only a PCI specification, not a network driver
specification)?  Or hardware which has multiple functions per physical
port (multiqueue, hw filtering, embedded switch, etc.)?

> NPA splits the guest driver into two components called
> the Shell and the Plugin. The shell is responsible for interacting with the
> guest networking stack and funneling the control operations to the hypervisor.
> The plugin is responsible for driving the data path of the virtual function
> exposed to the guest and is specific to the NIC hardware. NPA also requires an
> embedded switch in the NIC to allow for switching traffic among the virtual
> functions. The PF is also used as an uplink to provide connectivity to other
> VMs which are in emulation mode. The figure below shows the major components in
> a block diagram.
> 
>         +------------------------------+
>         |         Guest VM             |
>         |                              |
>         |      +----------------+      |
>         |      | vmxnet3 driver |      |
>         |      |     Shell      |      |
>         |      | +============+ |      |
>         |      | |   Plugin   | |      |
>         +------+-+------------+-+------+
>                 |           .
>                +---------+  .
>                | vmxnet3 |  .
>                |___+-----+  .
>                      |      .
>                      |      .
>                 +----------------------------+
>                 |                            |
>                 |       virtual switch       |
>                 +----------------------------+
>                   |         .               \
>                   |         .                \
>            +=============+  .                 \
>            | PF control  |  .                  \
>            |             |  .                   \
>            |  L2 driver  |  .                    \
>            +-------------+  .                     \
>                   |         .                      \
>                   |         .                       \
>                 +------------------------+     +------------+
>                 | PF   VF1 VF2 ...   VFn |     |            |
>                 |                        |     |  regular   |
>                 |       SR-IOV NIC       |     |    nic     |
>                 |    +--------------+    |     |   +--------+
>                 |    |   embedded   |    |     +---+
>                 |    |    switch    |    |
>                 |    +--------------+    |
>                 |        +---------------+
>                 +--------+
> 
> NPA offers several benefits:
> 1. Performance: Critical performance sensitive paths are not trapped and the
> guest can directly drive the hardware without incurring virtualization
> overheads.

Can you demonstrate with data?

> 2. Hypervisor control: All control operations from the guest such as programming
> MAC address go through the hypervisor layer and hence can be subjected to
> hypervisor policies. The PF driver can be further used to put policy decisions
> like which VLAN the guest should be on.

This can happen without NPA as well.  VF simply needs to request
the change via the PF (in fact, hw does that right now).  Also, we
already have a host side management interface via PF (see, for example,
RTM_SETLINK IFLA_VF_MAC interface).

What is the control plane interface?  Just something like a fixed register set?

> 3. Guest Management: No hardware specific drivers need to be installed in the
> guest virtual machine and hence no overheads are incurred for guest management.
> All software for the driver (including the PF driver and the plugin) is
> installed in the hypervisor.

So we have a plugin per hardware VF implementation?  And the hypervisor
injects this code into the guest?

> 4. IHV independence: The architecture provides guidelines for splitting the
> functionality between the VFs and PF but does not dictate how the hardware
> should be implemented. It gives the IHV the freedom to do asynchronous updates
> either to the software or the hardware to work around any defects.

Yes, this is important, esp. instead of the requirement for hw to
implement a specific interface (I suspect you know all about this issue
already).

> The fundamental tenet in NPA is to let the hypervisor control the passthrough
> functionality with minimal guest intervention. This gives a lot of flexibility
> to the hypervisor which can then treat passthrough as an offload feature (just
> like TSO, LRO, etc) which is offered to the guest virtual machine when there
> are no conflicting features present. For example, if the hypervisor wants to
> migrate the virtual machine from one host to another, the hypervisor can switch
> the virtual machine out of passthrough mode into paravirtualized/emulated mode
> and it can use existing technique to migrate the virtual machine. Once the
> virtual machine is migrated to the destination host the hypervisor can switch
> the virtual machine back to passthrough mode if a supporting SR-IOV nic is
> present. This may involve reloading of a different plugin corresponding to the
> new SR-IOV hardware.
> 
> Internally we have explored various other options before settling on the NPA
> approach. For example there are approaches which create a bonding driver on top
> of a complete passthrough of a NIC device and an emulated/paravirtualized
> device. Though this approach allows for live migration to work it adds a lot of
> complexity and dependency. First the hypervisor has to rely on a guest with
> hot-add support. Second the hypervisor has to depend on the guest networking
> stack to cooperate to perform migration. Third the guest has to carry the
> driver images for all possible hardware to which the guest may migrate to.
> Fourth the hypervisor does not get full control for all the policy decisions.
> Another approach we have considered is to have a uniform interface for the data
> path between the emulated/paravirtualized device and the hardware device which
> allows the hypervisor to seamlessly switch from the emulated interface to the
> hardware interface. Though this approach is very attractive and can work
> without any guest involvement it is not acceptable to the IHVs as it does not
> give them the freedom to fix bugs/erratas and differentiate from each other. We
> believe NPA approach provides the right level of control and flexibility to the
> hypervisors while letting the guest exploit the benefits of passthrough.

> The plugin image is provided by the IHVs along with the PF driver and is
> packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
> either into a Linux VM or a Windows VM. The plugin is written against the Shell

And it will need to be GPL AFAICT from what you've said thus far.  It
does sound worrisome, although I suppose hw firmware isn't particularly
different.

> API interface which the shell is responsible for implementing. The API
> interface allows the plugin to do TX and RX only by programming the hardware
> rings (along with things like buffer allocation and basic initialization). The
> virtual machine comes up in paravirtualized/emulated mode when it is booted.
> The hypervisor allocates the VF and other resources and notifies the shell of
> the availability of the VF. The hypervisor injects the plugin into memory
> location specified by the shell. The shell initializes the plugin by calling
> into a known entry point and the plugin initializes the data path. The control
> path is already initialized by the PF driver when the VF is allocated. At this
> point the shell switches to using the loaded plugin to do all further TX and RX
> operations. The guest networking stack does not participate in these operations
> and continues to function normally. All the control operations continue being
> trapped by the hypervisor and are directed to the PF driver as needed. For
> example, if the MAC address changes the hypervisor updates its internal state
> and changes the state of the embedded switch as well through the PF control
> API.

How does the shell switch back to emulated mode for live migration?

> We have reworked our existing Linux vmxnet3 driver to accommodate NPA by
> splitting the driver into two parts: Shell and Plugin. The new split driver is
> backwards compatible and continues to work on old/existing vmxnet3 device
> emulations. The shell implements the API interface and contains code to do the
> bookkeeping for TX/RX buffers along with interrupt management. The shell code
> also handles the loading of the plugin and verifying the license of the loaded
> plugin. The plugin contains the code specific to vmxnet3 ring and descriptor
> management. The plugin uses the same Shell API interface which would be used by
> other IHVs. This vmxnet3 plugin is compiled statically along with the shell as
> this is needed to provide connectivity when there is no underlying SR-IOV
> device present. The IHV plugins are required to be distributed under GPL
> license and we are currently looking at ways to verify this both within the
> hypervisor and within the shell.

Please make this shell API interface and the PF/VF requirements available.

thanks,
-chris


* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Pankaj Thakkar @ 2010-05-05  0:38 UTC (permalink / raw)
  To: David Miller
  Cc: shemminger@vyatta.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, virtualization@lists.linux-foundation.org,
	pv-drivers@vmware.com, Shreyas Bhatewara
In-Reply-To: <20100504.173236.104064409.davem@davemloft.net>

Sure. We have been working on NPA for a while and have the code internally up
and running. Let me sync up internally on how and when we can provide the
vmxnet3 driver code so that people can look at it.


On Tue, May 04, 2010 at 05:32:36PM -0700, David Miller wrote:
> Date: Tue, 4 May 2010 17:32:36 -0700
> From: David Miller <davem@davemloft.net>
> To: Pankaj Thakkar <pthakkar@vmware.com>
> CC: "shemminger@vyatta.com" <shemminger@vyatta.com>,
> 	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
> 	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
> 	"virtualization@lists.linux-foundation.org"
>  <virtualization@lists.linux-foundation.org>,
> 	"pv-drivers@vmware.com" <pv-drivers@vmware.com>,
> 	Shreyas Bhatewara <sbhatewara@vmware.com>
> Subject: Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
> 
> From: Pankaj Thakkar <pthakkar@vmware.com>
> Date: Tue, 4 May 2010 17:18:57 -0700
> 
> > The purpose of this email is to introduce the architecture and the
> > design principles. The overall project involves more than just
> > changes to vmxnet3 driver and hence we thought an overview email
> > would be better. Once people agree to the design in general we
> > intend to provide the code changes to the vmxnet3 driver.
> 
> Stephen's point is that code talks and bullshit walks.
> 
> Talk about high level designs rarely gets any traction, and often goes
> nowhere.  Give us an example implementation so there is something
> concrete for us to sink our teeth into.


* [PATCH 5/6] hotplug: netns aware uevent_helper
From: Eric W. Biederman @ 2010-05-05  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck, Eric Dumazet,
	Benjamin LaHaise, Serge Hallyn, netdev, David Miller,
	Eric W. Biederman
In-Reply-To: <m1fx3hk6gw.fsf@fess.ebiederm.org>

From: Eric W. Biederman <ebiederm@xmission.com>

It only makes sense for uevent_helper to get events
in the initial namespaces.  Its invocation is not
per namespace and it is not clear how we could make
its invocation namespace aware.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 lib/kobject_uevent.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 9057ec1..1b3dbab 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -18,6 +18,7 @@
 #include <linux/string.h>
 #include <linux/kobject.h>
 #include <linux/module.h>
+#include <linux/user_namespace.h>
 
 #include <linux/socket.h>
 #include <linux/skbuff.h>
@@ -98,6 +99,21 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
 	return 0;
 }
 
+static int kobj_usermode_filter(struct kobject *kobj)
+{
+	const struct kobj_ns_type_operations *ops;
+
+	ops = kobj_ns_ops(kobj);
+	if (ops) {
+		const void *init_ns, *ns;
+		ns = kobj->ktype->namespace(kobj);
+		init_ns = ops->initial_ns();
+		return ns != init_ns;
+	}
+
+	return 0;
+}
+
 /**
  * kobject_uevent_env - send an uevent with environmental data
  *
@@ -273,7 +289,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 #endif
 
 	/* call uevent_helper, usually only enabled during early boot */
-	if (uevent_helper[0]) {
+	if (uevent_helper[0] && !kobj_usermode_filter(kobj)) {
 		char *argv [3];
 
 		argv [0] = uevent_helper;
-- 
1.6.5.2.143.g8cc62
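
The decision made by kobj_usermode_filter() can be sketched outside the kernel as follows. `struct ns_ops`, `struct obj`, `usermode_filter` and `mock_initial_ns` are hypothetical mocks for this illustration; the real code obtains the operations through kobj_ns_ops() and the namespace tag through the ktype.

```c
#include <stddef.h>

struct ns_ops {                          /* mock of kobj_ns_type_operations */
	const void *(*initial_ns)(void);
};

struct obj {                             /* mock of a kobject */
	const struct ns_ops *ops;        /* NULL if the object is not namespaced */
	const void *ns;                  /* namespace tag of this object */
};

/*
 * Return nonzero if the usermode helper should be skipped: the object
 * is namespaced and lives outside the initial namespace.
 */
int usermode_filter(const struct obj *o)
{
	if (o->ops)
		return o->ns != o->ops->initial_ns();
	return 0;
}

/* Mock initial namespace: any stable address serves as the tag. */
int init_ns_tag;
const void *mock_initial_ns(void)
{
	return &init_ns_tag;
}
```

Objects without namespace operations always reach the helper, which matches the `return 0` fall-through in the patch.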



* [PATCH 6/6] net: Expose all network devices in a namespaces in sysfs
From: Eric W. Biederman @ 2010-05-05  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck, Eric Dumazet,
	Benjamin LaHaise, Serge Hallyn, netdev, David Miller,
	Eric W. Biederman
In-Reply-To: <m1fx3hk6gw.fsf@fess.ebiederm.org>

From: Eric W. Biederman <ebiederm@xmission.com>

This reverts commit aaf8cdc34ddba08122f02217d9d684e2f9f5d575.

Drivers like the ipw2100 call device_create_group when they
are initialized and device_remove_group when they are shutdown.
Moving them between namespaces deletes their sysfs groups early.

In particular the following call chain results.
netdev_unregister_kobject -> device_del -> kobject_del -> sysfs_remove_dir
With sysfs_remove_dir recursively deleting all of its subdirectories,
and nothing adding them back.

Ouch!

Therefore we need to call something that ultimately calls sysfs_mv_dir,
as that sysfs function can move sysfs directories between namespaces
without deleting their subdirectories or their contents, allowing
us to avoid placing extra boilerplate into every driver that does
something interesting with sysfs.

Currently the function that provides that capability is device_rename.
That is, the code works without nasty side effects as originally written.

So remove the misguided fix for moving devices between namespaces.  The
bug in the kobject layer that inspired it has now been recognized and
fixed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/dev.c       |   28 +++++-----------------------
 net/core/net-sysfs.c |   16 +---------------
 net/core/net-sysfs.h |    1 -
 3 files changed, 6 insertions(+), 39 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index bcc490c..fa54819 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -983,15 +983,10 @@ int dev_change_name(struct net_device *dev, const char *newname)
 		return err;
 
 rollback:
-	/* For now only devices in the initial network namespace
-	 * are in sysfs.
-	 */
-	if (net_eq(net, &init_net)) {
-		ret = device_rename(&dev->dev, dev->name);
-		if (ret) {
-			memcpy(dev->name, oldname, IFNAMSIZ);
-			return ret;
-		}
+	ret = device_rename(&dev->dev, dev->name);
+	if (ret) {
+		memcpy(dev->name, oldname, IFNAMSIZ);
+		return ret;
 	}
 
 	write_lock_bh(&dev_base_lock);
@@ -5106,8 +5101,6 @@ int register_netdevice(struct net_device *dev)
 	if (dev->features & NETIF_F_SG)
 		dev->features |= NETIF_F_GSO;
 
-	netdev_initialize_kobject(dev);
-
 	ret = call_netdevice_notifiers(NETDEV_POST_INIT, dev);
 	ret = notifier_to_errno(ret);
 	if (ret)
@@ -5628,15 +5621,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	if (dev->features & NETIF_F_NETNS_LOCAL)
 		goto out;
 
-#ifdef CONFIG_SYSFS
-	/* Don't allow real devices to be moved when sysfs
-	 * is enabled.
-	 */
-	err = -EINVAL;
-	if (dev->dev.parent)
-		goto out;
-#endif
-
 	/* Ensure the device has been registrered */
 	err = -EINVAL;
 	if (dev->reg_state != NETREG_REGISTERED)
@@ -5687,8 +5671,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	dev_unicast_flush(dev);
 	dev_addr_discard(dev);
 
-	netdev_unregister_kobject(dev);
-
 	/* Actually switch the network namespace */
 	dev_net_set(dev, net);
 
@@ -5701,7 +5683,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	}
 
 	/* Fixup kobjects */
-	err = netdev_register_kobject(dev);
+	err = device_rename(&dev->dev, dev->name);
 	WARN_ON(err);
 
 	/* Add the device back in the hashes */
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 1b98e36..0727c57 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -507,9 +507,6 @@ static int netdev_uevent(struct device *d, struct kobj_uevent_env *env)
 	struct net_device *dev = to_net_dev(d);
 	int retval;
 
-	if (!net_eq(dev_net(dev), &init_net))
-		return 0;
-
 	/* pass interface to uevent. */
 	retval = add_uevent_var(env, "INTERFACE=%s", dev->name);
 	if (retval)
@@ -568,9 +565,6 @@ void netdev_unregister_kobject(struct net_device * net)
 
 	kobject_get(&dev->kobj);
 
-	if (!net_eq(dev_net(net), &init_net))
-		return;
-
 	device_del(dev);
 }
 
@@ -580,6 +574,7 @@ int netdev_register_kobject(struct net_device *net)
 	struct device *dev = &(net->dev);
 	const struct attribute_group **groups = net->sysfs_groups;
 
+	device_initialize(dev);
 	dev->class = &net_class;
 	dev->platform_data = net;
 	dev->groups = groups;
@@ -602,9 +597,6 @@ int netdev_register_kobject(struct net_device *net)
 #endif
 #endif /* CONFIG_SYSFS */
 
-	if (!net_eq(dev_net(net), &init_net))
-		return 0;
-
 	return device_add(dev);
 }
 
@@ -621,12 +613,6 @@ void netdev_class_remove_file(struct class_attribute *class_attr)
 EXPORT_SYMBOL(netdev_class_create_file);
 EXPORT_SYMBOL(netdev_class_remove_file);
 
-void netdev_initialize_kobject(struct net_device *net)
-{
-	struct device *device = &(net->dev);
-	device_initialize(device);
-}
-
 int netdev_kobject_init(void)
 {
 	kobj_ns_type_register(&net_ns_type_operations);
diff --git a/net/core/net-sysfs.h b/net/core/net-sysfs.h
index 14e7524..805555e 100644
--- a/net/core/net-sysfs.h
+++ b/net/core/net-sysfs.h
@@ -4,5 +4,4 @@
 int netdev_kobject_init(void);
 int netdev_register_kobject(struct net_device *);
 void netdev_unregister_kobject(struct net_device *);
-void netdev_initialize_kobject(struct net_device *);
 #endif
-- 
1.6.5.2.143.g8cc62


^ permalink raw reply related

* [PATCH 3/6] netlink: Implement netlink_broadcast_filtered
From: Eric W. Biederman @ 2010-05-05  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck, Eric Dumazet,
	Benjamin LaHaise, Serge Hallyn, netdev, David Miller,
	Eric W. Biederman
In-Reply-To: <m1fx3hk6gw.fsf@fess.ebiederm.org>

From: Eric W. Biederman <ebiederm@xmission.com>

When netlink sockets are used to convey data that is in a namespace
we need a way to select a subset of the listening sockets to deliver
the packet to.  For the network namespace we have been doing this
by only transmitting packets in the correct network namespace.

For data belonging to other namespaces, netlink_broadcast_filtered
provides a mechanism that allows us to examine the destination
socket and to decide if we should transmit the specified packet
to it.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/netlink.h  |    4 ++++
 net/netlink/af_netlink.c |   21 +++++++++++++++++++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index fde27c0..4f7bf4b 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -188,6 +188,10 @@ extern int netlink_has_listeners(struct sock *sk, unsigned int group);
 extern int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 pid, int nonblock);
 extern int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, __u32 pid,
 			     __u32 group, gfp_t allocation);
+extern int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb,
+	__u32 pid, __u32 group, gfp_t allocation,
+	int (*filter)(struct sock *dsk, struct sk_buff *skb, void *data),
+	void *filter_data);
 extern void netlink_set_err(struct sock *ssk, __u32 pid, __u32 group, int code);
 extern int netlink_register_notifier(struct notifier_block *nb);
 extern int netlink_unregister_notifier(struct notifier_block *nb);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 320d042..4f16d68 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -975,6 +975,8 @@ struct netlink_broadcast_data {
 	int delivered;
 	gfp_t allocation;
 	struct sk_buff *skb, *skb2;
+	int (*tx_filter)(struct sock *dsk, struct sk_buff *skb, void *data);
+	void *tx_data;
 };
 
 static inline int do_one_broadcast(struct sock *sk,
@@ -1017,6 +1019,9 @@ static inline int do_one_broadcast(struct sock *sk,
 		p->failure = 1;
 		if (nlk->flags & NETLINK_BROADCAST_SEND_ERROR)
 			p->delivery_failure = 1;
+	} else if (p->tx_filter && p->tx_filter(sk, p->skb2, p->tx_data)) {
+		kfree_skb(p->skb2);
+		p->skb2 = NULL;
 	} else if (sk_filter(sk, p->skb2)) {
 		kfree_skb(p->skb2);
 		p->skb2 = NULL;
@@ -1035,8 +1040,10 @@ out:
 	return 0;
 }
 
-int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 pid,
-		      u32 group, gfp_t allocation)
+int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 pid,
+	u32 group, gfp_t allocation,
+	int (*filter)(struct sock *dsk, struct sk_buff *skb, void *data),
+	void *filter_data)
 {
 	struct net *net = sock_net(ssk);
 	struct netlink_broadcast_data info;
@@ -1056,6 +1063,8 @@ int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 pid,
 	info.allocation = allocation;
 	info.skb = skb;
 	info.skb2 = NULL;
+	info.tx_filter = filter;
+	info.tx_data = filter_data;
 
 	/* While we sleep in clone, do not allow to change socket list */
 
@@ -1080,6 +1089,14 @@ int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 pid,
 	}
 	return -ESRCH;
 }
+EXPORT_SYMBOL(netlink_broadcast_filtered);
+
+int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 pid,
+		      u32 group, gfp_t allocation)
+{
+	return netlink_broadcast_filtered(ssk, skb, pid, group, allocation,
+		NULL, NULL);
+}
 EXPORT_SYMBOL(netlink_broadcast);
 
 struct netlink_set_err_data {
-- 
1.6.5.2.143.g8cc62


^ permalink raw reply related
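
The filter convention from the patch above (a nonzero return from the
callback means "do not deliver to this socket", and a NULL callback means
"deliver to everyone") can be modelled in user space. This is only a sketch
under stated assumptions: struct listener, broadcast_filtered() and
ns_mismatch() are hypothetical names invented for illustration and are not
part of the netlink API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a listening netlink socket. */
struct listener {
	int ns_id;	/* namespace the listener lives in */
	int received;	/* number of messages delivered so far */
};

/* Same convention as the patch: nonzero return means "skip this listener". */
typedef int (*bcast_filter_t)(const struct listener *dst, void *data);

/*
 * Deliver one message to every listener the filter does not reject.
 * A NULL filter delivers to everyone, mirroring how the patch turns
 * netlink_broadcast() into a wrapper that passes NULL, NULL.
 * Returns the number of listeners that received the message.
 */
int broadcast_filtered(struct listener *ls, size_t n,
		       bcast_filter_t filter, void *data)
{
	int delivered = 0;
	size_t i;

	assert(ls != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (filter && filter(&ls[i], data))
			continue;	/* rejected by the filter */
		ls[i].received++;
		delivered++;
	}
	return delivered;
}

/* Example filter: reject listeners outside the namespace passed in data. */
int ns_mismatch(const struct listener *dst, void *data)
{
	const int *wanted = data;

	return dst->ns_id != *wanted;
}
```

Keeping the decision in a callback means the broadcast loop itself needs no
knowledge of what kind of namespace is being compared.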

* [PATCH 4/6] kobj: Send hotplug events in the proper namespace.
From: Eric W. Biederman @ 2010-05-05  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck, Eric Dumazet,
	Benjamin LaHaise, Serge Hallyn, netdev, David Miller,
	Eric W. Biederman
In-Reply-To: <m1fx3hk6gw.fsf@fess.ebiederm.org>

From: Eric W. Biederman <ebiederm@xmission.com>

Utilize netlink_broadcast_filtered to allow sending hotplug events
in the proper namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 lib/kobject_uevent.c |   22 ++++++++++++++++++++--
 1 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 3f5f17b..9057ec1 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -82,6 +82,22 @@ out:
 	return ret;
 }
 
+static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
+{
+	struct kobject *kobj = data;
+	const struct kobj_ns_type_operations *ops;
+
+	ops = kobj_ns_ops(kobj);
+	if (ops) {
+		const void *sock_ns, *ns;
+		ns = kobj->ktype->namespace(kobj);
+		sock_ns = ops->netlink_ns(dsk);
+		return sock_ns != ns;
+	}
+
+	return 0;
+}
+
 /**
  * kobject_uevent_env - send an uevent with environmental data
  *
@@ -243,8 +259,10 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 			}
 
 			NETLINK_CB(skb).dst_group = 1;
-			retval = netlink_broadcast(uevent_sock, skb, 0, 1,
-						   GFP_KERNEL);
+			retval = netlink_broadcast_filtered(uevent_sock, skb,
+							    0, 1, GFP_KERNEL,
+							    kobj_bcast_filter,
+							    kobj);
 			/* ENOBUFS should be handled in userspace */
 			if (retval == -ENOBUFS)
 				retval = 0;
-- 
1.6.5.2.143.g8cc62


^ permalink raw reply related

* [PATCH 2/6] netns: Teach network device kobjects which namespace they are in.
From: Eric W. Biederman @ 2010-05-05  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck, Eric Dumazet,
	Benjamin LaHaise, Serge Hallyn, netdev, David Miller,
	Eric W. Biederman
In-Reply-To: <m1fx3hk6gw.fsf@fess.ebiederm.org>

From: Eric W. Biederman <ebiederm@xmission.com>

The problem: network devices show up in sysfs, and with network
namespaces active, multiple devices with the same name can show up in
the same directory. Ouch!

To avoid that problem and allow existing applications in network namespaces
to see the same interface that is currently presented in sysfs, this
patch enables the tagging directory support in sysfs.

By using the network namespace pointers as tags to separate out the
sysfs directory entries we ensure that we don't have conflicts
in the directories and applications only see a limited set of
the network devices.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/kobject.h |    1 +
 net/Kconfig             |    8 ++++++++
 net/core/net-sysfs.c    |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index b60d2df..cf343a8 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -142,6 +142,7 @@ extern const struct sysfs_ops kobj_sysfs_ops;
  */
 enum kobj_ns_type {
 	KOBJ_NS_TYPE_NONE = 0,
+	KOBJ_NS_TYPE_NET,
 	KOBJ_NS_TYPES
 };
 
diff --git a/net/Kconfig b/net/Kconfig
index 041c35e..265e33b 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -45,6 +45,14 @@ config COMPAT_NETLINK_MESSAGES
 
 menu "Networking options"
 
+config NET_NS
+	bool "Network namespace support"
+	default n
+	depends on EXPERIMENTAL && NAMESPACES
+	help
+	  Allow user space to create what appear to be multiple instances
+	  of the network stack.
+
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/xfrm/Kconfig"
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 099c753..1b98e36 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -13,7 +13,9 @@
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/if_arp.h>
+#include <linux/nsproxy.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 #include <linux/rtnetlink.h>
 #include <linux/wireless.h>
 #include <net/wext.h>
@@ -466,6 +468,37 @@ static struct attribute_group wireless_group = {
 };
 #endif
 
+static const void *net_current_ns(void)
+{
+	return current->nsproxy->net_ns;
+}
+
+static const void *net_initial_ns(void)
+{
+	return &init_net;
+}
+
+static const void *net_netlink_ns(struct sock *sk)
+{
+	return sock_net(sk);
+}
+
+static struct kobj_ns_type_operations net_ns_type_operations = {
+	.type = KOBJ_NS_TYPE_NET,
+	.current_ns = net_current_ns,
+	.netlink_ns = net_netlink_ns,
+	.initial_ns = net_initial_ns,
+};
+
+static void net_kobj_ns_exit(struct net *net)
+{
+	kobj_ns_exit(KOBJ_NS_TYPE_NET, net);
+}
+
+static struct pernet_operations sysfs_net_ops = {
+	.exit = net_kobj_ns_exit,
+};
+
 #endif /* CONFIG_SYSFS */
 
 #ifdef CONFIG_HOTPLUG
@@ -506,6 +539,13 @@ static void netdev_release(struct device *d)
 	kfree((char *)dev - dev->padded);
 }
 
+static const void *net_namespace(struct device *d)
+{
+	struct net_device *dev;
+	dev = container_of(d, struct net_device, dev);
+	return dev_net(dev);
+}
+
 static struct class net_class = {
 	.name = "net",
 	.dev_release = netdev_release,
@@ -515,6 +555,8 @@ static struct class net_class = {
 #ifdef CONFIG_HOTPLUG
 	.dev_uevent = netdev_uevent,
 #endif
+	.ns_type = &net_ns_type_operations,
+	.namespace = net_namespace,
 };
 
 /* Delete sysfs entries but hold kobject reference until after all
@@ -587,5 +629,9 @@ void netdev_initialize_kobject(struct net_device *net)
 
 int netdev_kobject_init(void)
 {
+	kobj_ns_type_register(&net_ns_type_operations);
+#ifdef CONFIG_SYSFS
+	register_pernet_subsys(&sysfs_net_ops);
+#endif
 	return class_register(&net_class);
 }
-- 
1.6.5.2.143.g8cc62

^ permalink raw reply related
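
The tagging idea described in the patch above (one directory, entries keyed
by name plus a namespace pointer) can be illustrated with a small user-space
model. This is a sketch, not the sysfs implementation: struct tagged_dir and
the tagged_*() helpers are hypothetical names, and the fixed-size array
stands in for whatever structure sysfs really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

/* One directory entry: a name plus the namespace tag that owns it. */
struct tagged_entry {
	const char *name;
	const void *ns;
};

struct tagged_dir {
	struct tagged_entry entries[MAX_ENTRIES];
	size_t count;
};

/* Find an entry by name, looking only at entries carrying the given tag. */
const struct tagged_entry *tagged_lookup(const struct tagged_dir *dir,
					 const char *name, const void *ns)
{
	size_t i;

	assert(dir != NULL && name != NULL);
	for (i = 0; i < dir->count; i++) {
		const struct tagged_entry *e = &dir->entries[i];

		if (e->ns == ns && strcmp(e->name, name) == 0)
			return e;
	}
	return NULL;
}

/* Add an entry; the same name only conflicts within the same namespace. */
int tagged_add(struct tagged_dir *dir, const char *name, const void *ns)
{
	if (tagged_lookup(dir, name, ns))
		return -1;	/* duplicate name in this namespace */
	if (dir->count == MAX_ENTRIES)
		return -2;	/* directory full */
	dir->entries[dir->count].name = name;
	dir->entries[dir->count].ns = ns;
	dir->count++;
	return 0;
}

/* Count the entries a reader in namespace ns would see. */
size_t tagged_count_visible(const struct tagged_dir *dir, const void *ns)
{
	size_t i, visible = 0;

	for (i = 0; i < dir->count; i++)
		if (dir->entries[i].ns == ns)
			visible++;
	return visible;
}
```

In this model two devices named "eth0" can coexist as long as their tags
differ, and each reader only sees the entries tagged with its own namespace.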

* [PATCH 1/6] kobject: Send hotplug events in all network namespaces
From: Eric W. Biederman @ 2010-05-05  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck, Eric Dumazet,
	Benjamin LaHaise, Serge Hallyn, netdev, David Miller,
	Eric W. Biederman
In-Reply-To: <m1fx3hk6gw.fsf@fess.ebiederm.org>

From: Eric W. Biederman <ebiederm@xmission.com>

Open a copy of the uevent kernel socket in each network
namespace so we can send uevents in all network namespaces.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 lib/kobject_uevent.c |   68 ++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index c9d3a3e..3f5f17b 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -23,13 +23,19 @@
 #include <linux/skbuff.h>
 #include <linux/netlink.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 
 
 u64 uevent_seqnum;
 char uevent_helper[UEVENT_HELPER_PATH_LEN] = CONFIG_UEVENT_HELPER_PATH;
 static DEFINE_SPINLOCK(sequence_lock);
-#if defined(CONFIG_NET)
-static struct sock *uevent_sock;
+#ifdef CONFIG_NET
+struct uevent_sock {
+	struct list_head list;
+	struct sock *sk;
+};
+static LIST_HEAD(uevent_sock_list);
+static DEFINE_MUTEX(uevent_sock_mutex);
 #endif
 
 /* the strings here must match the enum in include/linux/kobject.h */
@@ -99,6 +105,9 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 	u64 seq;
 	int i = 0;
 	int retval = 0;
+#ifdef CONFIG_NET
+	struct uevent_sock *ue_sk;
+#endif
 
 	pr_debug("kobject: '%s' (%p): %s\n",
 		 kobject_name(kobj), kobj, __func__);
@@ -210,7 +219,9 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 
 #if defined(CONFIG_NET)
 	/* send netlink message */
-	if (uevent_sock) {
+	mutex_lock(&uevent_sock_mutex);
+	list_for_each_entry(ue_sk, &uevent_sock_list, list) {
+		struct sock *uevent_sock = ue_sk->sk;
 		struct sk_buff *skb;
 		size_t len;
 
@@ -240,6 +251,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 		} else
 			retval = -ENOMEM;
 	}
+	mutex_unlock(&uevent_sock_mutex);
 #endif
 
 	/* call uevent_helper, usually only enabled during early boot */
@@ -319,18 +331,58 @@ int add_uevent_var(struct kobj_uevent_env *env, const char *format, ...)
 EXPORT_SYMBOL_GPL(add_uevent_var);
 
 #if defined(CONFIG_NET)
-static int __init kobject_uevent_init(void)
+static int uevent_net_init(struct net *net)
 {
-	uevent_sock = netlink_kernel_create(&init_net, NETLINK_KOBJECT_UEVENT,
-					    1, NULL, NULL, THIS_MODULE);
-	if (!uevent_sock) {
+	struct uevent_sock *ue_sk;
+
+	ue_sk = kzalloc(sizeof(*ue_sk), GFP_KERNEL);
+	if (!ue_sk)
+		return -ENOMEM;
+
+	ue_sk->sk = netlink_kernel_create(net, NETLINK_KOBJECT_UEVENT,
+					  1, NULL, NULL, THIS_MODULE);
+	if (!ue_sk->sk) {
 		printk(KERN_ERR
 		       "kobject_uevent: unable to create netlink socket!\n");
 		return -ENODEV;
 	}
-	netlink_set_nonroot(NETLINK_KOBJECT_UEVENT, NL_NONROOT_RECV);
+	mutex_lock(&uevent_sock_mutex);
+	list_add_tail(&ue_sk->list, &uevent_sock_list);
+	mutex_unlock(&uevent_sock_mutex);
 	return 0;
 }
 
+static void uevent_net_exit(struct net *net)
+{
+	struct uevent_sock *ue_sk;
+
+	mutex_lock(&uevent_sock_mutex);
+	list_for_each_entry(ue_sk, &uevent_sock_list, list) {
+		if (sock_net(ue_sk->sk) == net)
+			goto found;
+	}
+	mutex_unlock(&uevent_sock_mutex);
+	return;
+
+found:
+	list_del(&ue_sk->list);
+	mutex_unlock(&uevent_sock_mutex);
+
+	netlink_kernel_release(ue_sk->sk);
+	kfree(ue_sk);
+}
+
+static struct pernet_operations uevent_net_ops = {
+	.init	= uevent_net_init,
+	.exit	= uevent_net_exit,
+};
+
+static int __init kobject_uevent_init(void)
+{
+	netlink_set_nonroot(NETLINK_KOBJECT_UEVENT, NL_NONROOT_RECV);
+	return register_pernet_subsys(&uevent_net_ops);
+}
+
+
 postcore_initcall(kobject_uevent_init);
 #endif
-- 
1.6.5.2.143.g8cc62

^ permalink raw reply related
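
The per-namespace lifecycle in the patch above (create one socket record
when a namespace is set up, find and drop it again when the namespace goes
away, and walk the whole list when sending) can be sketched in user space.
The names below (struct ns_sock, ns_sock_init() and friends) are
hypothetical, an integer id stands in for struct net, and the mutex that
protects the real list is deliberately left out of this single-threaded
model.

```c
#include <assert.h>
#include <stdlib.h>

/* One record per namespace, standing in for struct uevent_sock. */
struct ns_sock {
	struct ns_sock *next;
	int ns_id;
};

/* Registry of all live per-namespace records (no locking in this model). */
static struct ns_sock *sock_list;

/* Called when a namespace is created: allocate and register a record. */
int ns_sock_init(int ns_id)
{
	struct ns_sock *s;

	assert(ns_id >= 0);
	s = calloc(1, sizeof(*s));
	if (!s)
		return -1;
	s->ns_id = ns_id;
	s->next = sock_list;
	sock_list = s;
	return 0;
}

/* Called when a namespace exits: unlink and free its record, if any. */
void ns_sock_exit(int ns_id)
{
	struct ns_sock **pp;

	for (pp = &sock_list; *pp; pp = &(*pp)->next) {
		if ((*pp)->ns_id == ns_id) {
			struct ns_sock *victim = *pp;

			*pp = victim->next;
			free(victim);
			return;
		}
	}
}

/* Walk the registry the way the send path does; returns records visited. */
int ns_sock_count(void)
{
	const struct ns_sock *s;
	int n = 0;

	for (s = sock_list; s; s = s->next)
		n++;
	return n;
}
```

An exit call for a namespace that never registered is a no-op here, which
matches the early return in the patch when no matching entry is found.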

* [PATCH 0/6] netns support in the kobject layer
From: Eric W. Biederman @ 2010-05-05  0:35 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, Greg KH, linux-kernel, Tejun Heo, Cornelia Huck,
	Eric Dumazet, Benjamin LaHaise, Serge Hallyn, netdev,
	David Miller
In-Reply-To: <m1fx3hk6gw.fsf@fess.ebiederm.org>


With the tagged sysfs support finally merged into Greg's tree,
it is time for the last little bits of work to get the kobject
layer and network namespaces to play together properly.

These patches are roughly evenly divided between network layer work
and sysfs layer work.  Last time this conundrum came up I believe
we decided that the easiest way to handle this was for Greg to carry
all of the patches.  David, Greg, does that still make sense?

This patchset adds:
- kobject layer support for sending events in all network namespaces
- netlink support for filtering broadcast packets based on attributes
  of the destination socket.
- Enabling the network namespace support for sysfs and the kobject layer.

 include/linux/kobject.h  |    1 +
 include/linux/netlink.h  |    4 ++
 lib/kobject_uevent.c     |  108 +++++++++++++++++++++++++++++++++++++++++-----
 net/Kconfig              |    8 +++
 net/core/dev.c           |   28 ++----------
 net/core/net-sysfs.c     |   62 ++++++++++++++++++++------
 net/core/net-sysfs.h     |    1 -
 net/netlink/af_netlink.c |   21 ++++++++-
 8 files changed, 181 insertions(+), 52 deletions(-)

^ permalink raw reply

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
From: David Miller @ 2010-05-05  0:32 UTC (permalink / raw)
  To: pthakkar; +Cc: pv-drivers, netdev, linux-kernel, virtualization, shemminger
In-Reply-To: <20100505001857.GQ8323@vmware.com>

From: Pankaj Thakkar <pthakkar@vmware.com>
Date: Tue, 4 May 2010 17:18:57 -0700

> The purpose of this email is to introduce the architecture and the
> design principles. The overall project involves more than just
> changes to vmxnet3 driver and hence we thought an overview email
> would be better. Once people agree to the design in general we
> intend to provide the code changes to the vmxnet3 driver.

Stephen's point is that code talks and bullshit walks.

Talk about high level designs rarely gets any traction, and often goes
nowhere.  Give us an example implementation so there is something
concrete for us to sink our teeth into.

^ permalink raw reply

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Pankaj Thakkar @ 2010-05-05  0:18 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	virtualization@lists.linux-foundation.org, pv-drivers@vmware.com,
	Shreyas Bhatewara
In-Reply-To: <20100504170531.1e7122da@nehalam>

The purpose of this email is to introduce the architecture and the design principles. The overall project involves more than just changes to vmxnet3 driver and hence we thought an overview email would be better. Once people agree to the design in general we intend to provide the code changes to the vmxnet3 driver.

The architecture supports more than Intel NICs. We started the project with Intel but plan to support all major IHVs including Broadcom, Qlogic, Emulex and others through a certification program. The architecture works on VMware ESX server only as it requires significant support from the hypervisor. Also, the vmxnet3 driver works on VMware platform only. AFAICT Xen has a different model for supporting SR-IOV devices and allowing live migration and the document briefly talks about it (paragraph 6).

Thanks,

-pankaj


On Tue, May 04, 2010 at 05:05:31PM -0700, Stephen Hemminger wrote:
> Date: Tue, 4 May 2010 17:05:31 -0700
> From: Stephen Hemminger <shemminger@vyatta.com>
> To: Pankaj Thakkar <pthakkar@vmware.com>
> CC: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
> 	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
> 	"virtualization@lists.linux-foundation.org"
>  <virtualization@lists.linux-foundation.org>,
> 	"pv-drivers@vmware.com" <pv-drivers@vmware.com>,
> 	Shreyas Bhatewara <sbhatewara@vmware.com>
> Subject: Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
> 
> On Tue, 4 May 2010 16:02:25 -0700
> Pankaj Thakkar <pthakkar@vmware.com> wrote:
> 
> > Device passthrough technology allows a guest to bypass the hypervisor and drive
> > the underlying physical device. VMware has been exploring various ways to
> > deliver this technology to users in a manner which is easy to adopt. In this
> > process we have prepared an architecture along with Intel - NPA (Network Plugin
> > Architecture). NPA allows the guest to use the virtualized NIC vmxnet3 to
> > passthrough to a number of physical NICs which support it. The document below
> > provides an overview of NPA.
> > 
> > We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
> > Linux users can exploit the benefits provided by passthrough devices in a
> > seamless manner while retaining the benefits of virtualization. The document
> > below tries to answer most of the questions which we anticipated. Please let us
> > know your comments and queries.
> > 
> > Thank you.
> > 
> > Signed-off-by: Pankaj Thakkar <pthakkar@vmware.com>
> 
> 
> Code please. Also, it has to work for all architectures not just VMware and
> Intel.

^ permalink raw reply

* Re: [PATCH] compat-wireless: updates for orinoco
From: Luis R. Rodriguez @ 2010-05-05  0:18 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Luis Rodriguez, Hauke Mehrtens, David Miller,
	linux-wireless@vger.kernel.org, mcgrof@kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20100504170409.46914a88@nehalam>

On Tue, May 04, 2010 at 05:04:09PM -0700, Stephen Hemminger wrote:
> On Tue, 4 May 2010 16:26:53 -0700
> "Luis R. Rodriguez" <lrodriguez@atheros.com> wrote:
> 
> > First of all, thanks a lot! Some comments below.
> > 
> > On Tue, May 4, 2010 at 3:40 PM, Hauke Mehrtens <hauke@hauke-m.de> wrote:
> > > * Make all the patches apply again.
> > > * rename read_pda to avoid conflicts with definitions in kernel <= 2.6.29
> > 
> > I'm going to apply these two changes, if you get time can you send a
> > patch to rename read_pda upstream as well, that way we don't have to
> > carry this?
> > 
> > > * add orinoco usb
> > 
> > Thanks for this but I've grown tired of updating these netdev ops and
> > I think we can do better. I'll add a netdev_attach_ops() which would
> > simply do all the backport stuff for us, this way for backporting
> > purposes all we have to do is replace the old lines with a
> > netdev_attach_ops() call. In fact if we *really* wanted to we could
> > add a dummy netdev_attach_ops() upstream and just backport that on
> > older kernels, this would mean 0 line changes to backport a newer
> > driver.
> > 
> > Something like this maybe on the generic compat module, it builds for
> > me, will commit soon.
> > 
> > /*
> >  * Expand this as drivers require more ops, for now this
> >  * only sets the ones we need.
> >  */
> > void netdev_attach_ops(struct net_device *dev,
> >                       const struct net_device_ops *ops)
> > {
> > #define SET_NETDEVOP(_op) (_op ? _op : NULL)
> >        dev->open = SET_NETDEVOP(ops->ndo_open);
> >        dev->stop = SET_NETDEVOP(ops->ndo_stop);
> >        dev->hard_start_xmit = SET_NETDEVOP(ops->ndo_start_xmit);
> >        dev->set_multicast_list = SET_NETDEVOP(ops->ndo_set_multicast_list);
> >        dev->change_mtu = SET_NETDEVOP(ops->ndo_change_mtu);
> >        dev->set_mac_address = SET_NETDEVOP(ops->ndo_set_mac_address);
> >        dev->tx_timeout = SET_NETDEVOP(ops->ndo_tx_timeout);
> >        dev->get_stats = SET_NETDEVOP(ops->ndo_get_stats);
> > #undef SET_NETDEVOP
> > }
> > EXPORT_SYMBOL(netdev_attach_ops);
> > 
> > For newer kernels then this would just be:
> > 
> > static inline void netdev_attach_ops(struct net_device *dev,
> >                       const struct net_device_ops *ops)
> > {
> >        dev->netdev_ops = ops;
> > }
> > 
> > Stephen, would the above be acceptable upstream on netdevice.h ? It
> > would eliminate all needs from having to #ifdef network drivers when
> > backporting. If so I can send a respective patch and spatch all the
> > setters I think. An example of the nasty ifdef crap we have to do for
> > the current backport of netdevop'able drivers is below.
> > 
> 
> No. supporting backporting is not part of the upstream kernel
> mission. Honestly, we try for forward compatibility but intentionally
> ignore carrying extra backport baggage.

Sure, understood, just had to try :), if only I could find a *good*
non-backport reason to have the netdev_attach_ops()...

  Luis

^ permalink raw reply
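
The netdev_attach_ops() shim discussed in this thread copies callbacks from
a new-style ops table into old-style per-device fields. A user-space model
of that idea follows; it is a sketch only, with hypothetical toy_* types in
place of struct net_device and struct net_device_ops. One observation it
encodes: a wrapper of the form (_op ? _op : NULL) evaluates to _op in every
case, so plain assignment behaves identically.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev;

/* New style: a single const table of callbacks. */
struct toy_dev_ops {
	int (*ndo_open)(struct toy_dev *dev);
	int (*ndo_stop)(struct toy_dev *dev);
};

/* Old style: callbacks stored directly in the device structure. */
struct toy_dev {
	int (*open)(struct toy_dev *dev);
	int (*stop)(struct toy_dev *dev);
	int up;
};

/*
 * Copy each callback from the table into the device.  An unset table
 * entry is already NULL, so plain assignment carries it over unchanged.
 */
void toy_attach_ops(struct toy_dev *dev, const struct toy_dev_ops *ops)
{
	assert(dev != NULL && ops != NULL);
	dev->open = ops->ndo_open;
	dev->stop = ops->ndo_stop;
}

/* Example driver callback: mark the device as up. */
int toy_open(struct toy_dev *dev)
{
	dev->up = 1;
	return 0;
}

/* Example ops table that only provides an open callback. */
const struct toy_dev_ops toy_ops = {
	.ndo_open = toy_open,
};
```

A driver written against the ops table then needs a single
toy_attach_ops() call, whichever layout the device structure uses.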

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Stephen Hemminger @ 2010-05-05  0:05 UTC (permalink / raw)
  To: Pankaj Thakkar
  Cc: linux-kernel, netdev, virtualization, pv-drivers, sbhatewara
In-Reply-To: <20100504230225.GP8323@vmware.com>

On Tue, 4 May 2010 16:02:25 -0700
Pankaj Thakkar <pthakkar@vmware.com> wrote:

> Device passthrough technology allows a guest to bypass the hypervisor and drive
> the underlying physical device. VMware has been exploring various ways to
> deliver this technology to users in a manner which is easy to adopt. In this
> process we have prepared an architecture along with Intel - NPA (Network Plugin
> Architecture). NPA allows the guest to use the virtualized NIC vmxnet3 to
> passthrough to a number of physical NICs which support it. The document below
> provides an overview of NPA.
> 
> We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
> Linux users can exploit the benefits provided by passthrough devices in a
> seamless manner while retaining the benefits of virtualization. The document
> below tries to answer most of the questions which we anticipated. Please let us
> know your comments and queries.
> 
> Thank you.
> 
> Signed-off-by: Pankaj Thakkar <pthakkar@vmware.com>


Code please. Also, it has to work for all architectures not just VMware and
Intel.

^ permalink raw reply

* Re: [PATCH] compat-wireless: updates for orinoco
From: Stephen Hemminger @ 2010-05-05  0:04 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Hauke Mehrtens, David Miller, linux-wireless, mcgrof, netdev
In-Reply-To: <v2u43e72e891005041626n14deaca5z284f2472e909c923@mail.gmail.com>

On Tue, 4 May 2010 16:26:53 -0700
"Luis R. Rodriguez" <lrodriguez@atheros.com> wrote:

> First of all, thanks a lot! Some comments below.
> 
> On Tue, May 4, 2010 at 3:40 PM, Hauke Mehrtens <hauke@hauke-m.de> wrote:
> > * Make all the patches apply again.
> > * rename read_pda to avoid conflicts with definitions in kernel <= 2.6.29
> 
> I'm going to apply these two changes, if you get time can you send a
> patch to rename read_pda upstream as well, that way we don't have to
> carry this?
> 
> > * add orinoco usb
> 
> Thanks for this but I've grown tired of updating these netdev ops and
> I think we can do better. I'll add a netdev_attach_ops() which would
> simply do all the backport stuff for us, this way for backporting
> purposes all we have to do is replace the old lines with a
> netdev_attach_ops() call. In fact if we *really* wanted to we could
> add a dummy netdev_attach_ops() upstream and just backport that on
> older kernels, this would mean 0 line changes to backport a newer
> driver.
> 
> Something like this maybe on the generic compat module, it builds for
> me, will commit soon.
> 
> /*
>  * Expand this as drivers require more ops, for now this
>  * only sets the ones we need.
>  */
> void netdev_attach_ops(struct net_device *dev,
>                       const struct net_device_ops *ops)
> {
> #define SET_NETDEVOP(_op) (_op ? _op : NULL)
>        dev->open = SET_NETDEVOP(ops->ndo_open);
>        dev->stop = SET_NETDEVOP(ops->ndo_stop);
>        dev->hard_start_xmit = SET_NETDEVOP(ops->ndo_start_xmit);
>        dev->set_multicast_list = SET_NETDEVOP(ops->ndo_set_multicast_list);
>        dev->change_mtu = SET_NETDEVOP(ops->ndo_change_mtu);
>        dev->set_mac_address = SET_NETDEVOP(ops->ndo_set_mac_address);
>        dev->tx_timeout = SET_NETDEVOP(ops->ndo_tx_timeout);
>        dev->get_stats = SET_NETDEVOP(ops->ndo_get_stats);
> #undef SET_NETDEVOP
> }
> EXPORT_SYMBOL(netdev_attach_ops);
> 
> For newer kernels then this would just be:
> 
> static inline void netdev_attach_ops(struct net_device *dev,
>                       const struct net_device_ops *ops)
> {
>        dev->netdev_ops = ops;
> }
> 
> Stephen, would the above be acceptable upstream on netdevice.h ? It
> would eliminate all needs from having to #ifdef network drivers when
> backporting. If so I can send a respective patch and spatch all the
> setters I think. An example of the nasty ifdef crap we have to do for
> the current backport of netdevop'able drivers is below.
> 

No. supporting backporting is not part of the upstream kernel
mission. Honestly, we try for forward compatibility but intentionally
ignore carrying extra backport baggage.

^ permalink raw reply

* Re: 2.6.34-rc6-git2: Reported regressions from 2.6.33
From: Linus Torvalds @ 2010-05-05  0:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Maciej Rutecki, Andrew Morton,
	Kernel Testers List, Network Development, Linux ACPI,
	Linux PM List, Linux SCSI List, Linux Wireless List, DRI
In-Reply-To: <JzEGxUyyQHG.A.ZtH.YHJ4LB@chimera>



On Tue, 4 May 2010, Rafael J. Wysocki wrote:
> 
> Unresolved regressions
> ----------------------
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15880
> Subject		: Very bad regression from 2.6.33 as of 1600f9def
> Submitter	: Alex Elsayed <eternaleye-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Date		: 2010-04-29 2:28 (6 days old)
> Message-ID	: <loom.20100429T041908-663-eS7Uydv5nfjZ+VzJOa5vwg@public.gmane.org>
> References	: http://marc.info/?l=linux-kernel&m=127250825306178&w=2

This looks like it wasn't a regression, but some other compile/install 
issue. See

	http://marc.info/?l=linux-kernel&m=127274294422719&w=2

where he reports that his self-compiled 2.6.33 doesn't boot either. 

There's some confusion about .config, but it might well be an install 
problem too (in fact, that sounds more likely - the original bug-report 
seems to reboot before the kernel has really even booted - it apparently 
hasn't done the graphics mode switch by the early bootloader)

		Linus

^ permalink raw reply

* Re: [PATCH] compat-wireless: updates for orinoco
From: Luis R. Rodriguez @ 2010-05-04 23:26 UTC (permalink / raw)
  To: Hauke Mehrtens, Stephen Hemminger, David Miller
  Cc: linux-wireless, mcgrof, netdev
In-Reply-To: <1273012850-8359-1-git-send-email-hauke@hauke-m.de>

First of all, thanks a lot! Some comments below.

On Tue, May 4, 2010 at 3:40 PM, Hauke Mehrtens <hauke@hauke-m.de> wrote:
> * Make all the patches apply again.
> * rename read_pda to avoid conflicts with definitions in kernel <= 2.6.29

I'm going to apply these two changes, if you get time can you send a
patch to rename read_pda upstream as well, that way we don't have to
carry this?

> * add orinoco usb

Thanks for this but I've grown tired of updating these netdev ops and
I think we can do better. I'll add a netdev_attach_ops() which would
simply do all the backport stuff for us, this way for backporting
purposes all we have to do is replace the old lines with a
netdev_attach_ops() call. In fact if we *really* wanted to we could
add a dummy netdev_attach_ops() upstream and just backport that on
older kernels, this would mean 0 line changes to backport a newer
driver.

Something like this maybe on the generic compat module, it builds for
me, will commit soon.

/*
 * Expand this as drivers require more ops, for now this
 * only sets the ones we need.
 */
void netdev_attach_ops(struct net_device *dev,
                      const struct net_device_ops *ops)
{
#define SET_NETDEVOP(_op) (_op ? _op : NULL)
       dev->open = SET_NETDEVOP(ops->ndo_open);
       dev->stop = SET_NETDEVOP(ops->ndo_stop);
       dev->hard_start_xmit = SET_NETDEVOP(ops->ndo_start_xmit);
       dev->set_multicast_list = SET_NETDEVOP(ops->ndo_set_multicast_list);
       dev->change_mtu = SET_NETDEVOP(ops->ndo_change_mtu);
       dev->set_mac_address = SET_NETDEVOP(ops->ndo_set_mac_address);
       dev->tx_timeout = SET_NETDEVOP(ops->ndo_tx_timeout);
       dev->get_stats = SET_NETDEVOP(ops->ndo_get_stats);
#undef SET_NETDEVOP
}
EXPORT_SYMBOL(netdev_attach_ops);

For newer kernels then this would just be:

static inline void netdev_attach_ops(struct net_device *dev,
                      const struct net_device_ops *ops)
{
       dev->netdev_ops = ops;
}

Stephen, would the above be acceptable upstream in netdevice.h? It
would eliminate the need to #ifdef network drivers when backporting.
If so, I can send a patch for it and, I think, spatch all the
setters. An example of the nasty ifdef crap we have to do for
the current backport of netdevop'able drivers is below.

  Luis

> Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
> ---
>  config.mk                                 |    2 +
>  patches/01-netdev.patch                   |   51 +++++++++++++++++++++-----
>  patches/24-pcmcia.patch                   |   10 +++---
>  patches/27-hermes-read-pda-conflict.patch |   56 +++++++++++++++++++++++++++++
>  4 files changed, 104 insertions(+), 15 deletions(-)
>  create mode 100644 patches/27-hermes-read-pda-conflict.patch
>
> diff --git a/config.mk b/config.mk
> index 6a7c5c9..176c0af 100644
> --- a/config.mk
> +++ b/config.mk
> @@ -388,6 +388,8 @@ CONFIG_LIBERTAS_USB=m
>  NEED_LIBERTAS=y
>  endif
>
> +CONFIG_ORINOCO_USB=m
> +
>  endif # end of USB driver list
>
>  ifneq ($(CONFIG_SPI_MASTER),)
> diff --git a/patches/01-netdev.patch b/patches/01-netdev.patch
> index 01dbbce..51d12c4 100644
> --- a/patches/01-netdev.patch
> +++ b/patches/01-netdev.patch
> @@ -575,7 +575,7 @@ without creating a headache on maintenance of the pathes.
>        dev->tx_queue_len = 0;
>  --- a/drivers/net/wireless/orinoco/main.c
>  +++ b/drivers/net/wireless/orinoco/main.c
> -@@ -2078,6 +2078,7 @@ int orinoco_init(struct orinoco_private
> +@@ -2087,6 +2087,7 @@ int orinoco_init(struct orinoco_private
>  }
>  EXPORT_SYMBOL(orinoco_init);
>
> @@ -583,7 +583,7 @@ without creating a headache on maintenance of the pathes.
>  static const struct net_device_ops orinoco_netdev_ops = {
>        .ndo_open               = orinoco_open,
>        .ndo_stop               = orinoco_stop,
> -@@ -2089,6 +2090,7 @@ static const struct net_device_ops orino
> +@@ -2098,6 +2099,7 @@ static const struct net_device_ops orino
>        .ndo_tx_timeout         = orinoco_tx_timeout,
>        .ndo_get_stats          = orinoco_get_stats,
>  };
> @@ -591,12 +591,15 @@ without creating a headache on maintenance of the pathes.
>
>  /* Allocate private data.
>   *
> -@@ -2211,7 +2213,18 @@ int orinoco_if_add(struct orinoco_privat
> -
> -       /* Setup / override net_device fields */
> -       dev->ieee80211_ptr = wdev;
> +@@ -2227,10 +2229,21 @@ int orinoco_if_add(struct orinoco_privat
> +       dev->wireless_data = &priv->wireless_data;
> + #endif
> +       /* Default to standard ops if not set */
>  +#if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,29))
> -       dev->netdev_ops = &orinoco_netdev_ops;
> +       if (ops)
> +               dev->netdev_ops = ops;
> +       else
> +               dev->netdev_ops = &orinoco_netdev_ops;
>  +#else
>  +      dev->open = orinoco_open;
>  +      dev->stop = orinoco_stop;
> @@ -607,9 +610,37 @@ without creating a headache on maintenance of the pathes.
>  +      dev->tx_timeout = orinoco_tx_timeout;
>  +      dev->get_stats = orinoco_get_stats;
>  +#endif
> -       dev->watchdog_timeo = HZ; /* 1 second timeout */
> -       dev->wireless_handlers = &orinoco_handler_def;
> - #ifdef WIRELESS_SPY
> +
> +       /* we use the default eth_mac_addr for setting the MAC addr */
> +
> +--- a/drivers/net/wireless/orinoco/orinoco_usb.c
> ++++ b/drivers/net/wireless/orinoco/orinoco_usb.c
> +@@ -1566,6 +1566,7 @@ static const struct hermes_ops ezusb_ops
> +       .unlock_irq = ezusb_unlock_irq,
> + };
> +
> ++#if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,29))
> + static const struct net_device_ops ezusb_netdev_ops = {
> +       .ndo_open               = orinoco_open,
> +       .ndo_stop               = orinoco_stop,
> +@@ -1577,6 +1578,7 @@ static const struct net_device_ops ezusb
> +       .ndo_tx_timeout         = orinoco_tx_timeout,
> +       .ndo_get_stats          = orinoco_get_stats,
> + };
> ++#endif
> +
> + static int ezusb_probe(struct usb_interface *interface,
> +                      const struct usb_device_id *id)
> +@@ -1722,6 +1724,9 @@ static int ezusb_probe(struct usb_interf
> +               err("%s: orinoco_if_add() failed", __func__);
> +               goto error;
> +       }
> ++#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,29))
> ++      priv->ndev->hard_start_xmit = ezusb_xmit;
> ++#endif
> +       upriv->dev = priv->ndev;
> +
> +       goto exit;
>  --- a/net/bluetooth/bnep/netdev.c
>  +++ b/net/bluetooth/bnep/netdev.c
>  @@ -168,8 +168,12 @@ static inline int bnep_net_proto_filter(
> diff --git a/patches/24-pcmcia.patch b/patches/24-pcmcia.patch
> index 283b30d..3bc395d 100644
> --- a/patches/24-pcmcia.patch
> +++ b/patches/24-pcmcia.patch
> @@ -251,9 +251,9 @@
>        /* Register an interface with the stack */
>        if (orinoco_if_add(priv, link->io.BasePort1,
>  +#if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,35))
> -                          link->irq) != 0) {
> +                          link->irq, NULL) != 0) {
>  +#else
> -+                         link->irq.AssignedIRQ) != 0) {
> ++                         link->irq.AssignedIRQ, NULL) != 0) {
>  +#endif
>                printk(KERN_ERR PFX "orinoco_if_add() failed\n");
>                goto failed;
> @@ -285,14 +285,14 @@
>        if (ret)
>                goto failed;
>
> -@@ -359,7 +369,11 @@ spectrum_cs_config(struct pcmcia_device
> +@@ -360,7 +370,11 @@ spectrum_cs_config(struct pcmcia_device
>
>        /* Register an interface with the stack */
>        if (orinoco_if_add(priv, link->io.BasePort1,
>  +#if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,35))
> -                          link->irq) != 0) {
> +                          link->irq, NULL) != 0) {
>  +#else
> -+                         link->irq.AssignedIRQ) != 0) {
> ++                         link->irq.AssignedIRQ, NULL) != 0) {
>  +#endif
>                printk(KERN_ERR PFX "orinoco_if_add() failed\n");
>                goto failed;
> diff --git a/patches/27-hermes-read-pda-conflict.patch b/patches/27-hermes-read-pda-conflict.patch
> new file mode 100644
> index 0000000..fe6b181
> --- /dev/null
> +++ b/patches/27-hermes-read-pda-conflict.patch
> @@ -0,0 +1,56 @@
> +Rename read_pda to something else because this symbol is used in a
> +define for something else in arch/um/include/asm/pda.h on older kernels.
> +
> +--- a/drivers/net/wireless/orinoco/fw.c
> ++++ b/drivers/net/wireless/orinoco/fw.c
> +@@ -122,7 +122,7 @@ orinoco_dl_firmware(struct orinoco_priva
> +       dev_dbg(dev, "Attempting to download firmware %s\n", firmware);
> +
> +       /* Read current plug data */
> +-      err = hw->ops->read_pda(hw, pda, fw->pda_addr, fw->pda_size);
> ++      err = hw->ops->read_pda_h(hw, pda, fw->pda_addr, fw->pda_size);
> +       dev_dbg(dev, "Read PDA returned %d\n", err);
> +       if (err)
> +               goto free;
> +@@ -224,7 +224,7 @@ symbol_dl_image(struct orinoco_private *
> +               if (!pda)
> +                       return -ENOMEM;
> +
> +-              ret = hw->ops->read_pda(hw, pda, fw->pda_addr, fw->pda_size);
> ++              ret = hw->ops->read_pda_h(hw, pda, fw->pda_addr, fw->pda_size);
> +               if (ret)
> +                       goto free;
> +       }
> +--- a/drivers/net/wireless/orinoco/hermes.c
> ++++ b/drivers/net/wireless/orinoco/hermes.c
> +@@ -765,7 +765,7 @@ static const struct hermes_ops hermes_op
> +       .write_ltv = hermes_write_ltv,
> +       .bap_pread = hermes_bap_pread,
> +       .bap_pwrite = hermes_bap_pwrite,
> +-      .read_pda = hermes_read_pda,
> ++      .read_pda_h = hermes_read_pda,
> +       .program_init = hermesi_program_init,
> +       .program_end = hermesi_program_end,
> +       .program = hermes_program_bytes,
> +--- a/drivers/net/wireless/orinoco/hermes.h
> ++++ b/drivers/net/wireless/orinoco/hermes.h
> +@@ -393,7 +393,7 @@ struct hermes_ops {
> +                        u16 id, u16 offset);
> +       int (*bap_pwrite)(struct hermes *hw, int bap, const void *buf,
> +                         int len, u16 id, u16 offset);
> +-      int (*read_pda)(struct hermes *hw, __le16 *pda,
> ++      int (*read_pda_h)(struct hermes *hw, __le16 *pda,
> +                       u32 pda_addr, u16 pda_len);
> +       int (*program_init)(struct hermes *hw, u32 entry_point);
> +       int (*program_end)(struct hermes *hw);
> +--- a/drivers/net/wireless/orinoco/orinoco_usb.c
> ++++ b/drivers/net/wireless/orinoco/orinoco_usb.c
> +@@ -1556,7 +1556,7 @@ static const struct hermes_ops ezusb_ops
> +       .read_ltv = ezusb_read_ltv,
> +       .write_ltv = ezusb_write_ltv,
> +       .bap_pread = ezusb_bap_pread,
> +-      .read_pda = ezusb_read_pda,
> ++      .read_pda_h = ezusb_read_pda,
> +       .program_init = ezusb_program_init,
> +       .program_end = ezusb_program_end,
> +       .program = ezusb_program,
> --
> 1.7.0.4
>

^ permalink raw reply

* Re: [PATCH] bonding: fix arp_validate on bonds inside a bridge
From: David Miller @ 2010-05-04 23:18 UTC (permalink / raw)
  To: fubar; +Cc: jbohac, bonding-devel, netdev
In-Reply-To: <17907.1272935182@death.nxdomain.ibm.com>

From: Jay Vosburgh <fubar@us.ibm.com>
Date: Mon, 03 May 2010 18:06:22 -0700

> 	Tested and it looks to work as advertised.  I see only one minor
> nit, there's a pr_debug that missed being renamed to the new function
> name; here's the whole patch with that fixed.

I don't think you need the ugly arp hook.

Instead, it's much cleaner to provide a way for packet type taps to
see the packet before bridge et al. decapsulation.  In fact this makes
a lot of sense, wanting to see the device as __netif_receive_skb() saw
it, with no changes whatsoever.

In fact ptype_all runs before bridging, ING, and MACVLAN decap the
thing, so we could have a 'ptype_base_predecap[]' that we run over
right after those.
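
A rough userspace model of that ordering, only to illustrate the idea.
Nothing here is kernel code: struct pkt, the tap arrays, decap() and
model_receive() are all hypothetical stand-ins, and predecap_taps merely
plays the role a 'ptype_base_predecap[]' might play.

```c
#include <stddef.h>

/* Stand-in for an skb: only tracks which device the packet belongs to. */
struct pkt {
	int dev_id;
};

typedef void (*tap_fn)(const struct pkt *p);

#define MAX_TAPS 4
static tap_fn predecap_taps[MAX_TAPS];	/* hypothetical ptype_base_predecap[] */
static tap_fn postdecap_taps[MAX_TAPS];	/* stand-in for normal protocol handlers */

static void run_taps(tap_fn *taps, const struct pkt *p)
{
	for (size_t i = 0; i < MAX_TAPS; i++)
		if (taps[i])
			taps[i](p);
}

/* Stand-in for bridge/macvlan handling: retarget the packet at the master. */
static void decap(struct pkt *p, int master_id)
{
	p->dev_id = master_id;
}

static void model_receive(struct pkt *p, int master_id)
{
	run_taps(predecap_taps, p);	/* sees the original receiving device */
	decap(p, master_id);
	run_taps(postdecap_taps, p);	/* sees the device after decap */
}

/* Example taps that record which device they observed. */
static int seen_pre = -1, seen_post = -1;
static void record_pre(const struct pkt *p)  { seen_pre = p->dev_id; }
static void record_post(const struct pkt *p) { seen_post = p->dev_id; }
```

A tap registered in the pre-decap list observes the slave device id,
while one in the regular list only sees the master.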

^ permalink raw reply

