Netdev List
* [PATCH] staging: octeon-ethernet: remove .get_sg, etc. ethtool_ops
From: Michał Mirosław @ 2011-04-20 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Greg Kroah-Hartman, devel

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/staging/octeon/ethernet-mdio.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/octeon/ethernet-mdio.c b/drivers/staging/octeon/ethernet-mdio.c
index 10a82ef..8a11ffc 100644
--- a/drivers/staging/octeon/ethernet-mdio.c
+++ b/drivers/staging/octeon/ethernet-mdio.c
@@ -91,8 +91,6 @@ const struct ethtool_ops cvm_oct_ethtool_ops = {
 	.set_settings = cvm_oct_set_settings,
 	.nway_reset = cvm_oct_nway_reset,
 	.get_link = ethtool_op_get_link,
-	.get_sg = ethtool_op_get_sg,
-	.get_tx_csum = ethtool_op_get_tx_csum,
 };
 
 /**
-- 
1.7.2.5



* Re: [Bugme-new] [Bug 33502] New: Caught 64-bit read from uninitialized memory in __alloc_skb
From: Christoph Lameter @ 2011-04-20 14:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Pekka Enberg, casteyde.christian, Andrew Morton, netdev,
	bugzilla-daemon, bugme-daemon, Vegard Nossum
In-Reply-To: <1303293765.3186.74.camel@edumazet-laptop>

On Wed, 20 Apr 2011, Eric Dumazet wrote:

> > Then, just disable SLUB_CMPXCHG_DOUBLE if KMEMCHECK is defined, as I did in my first patch.

Ok your first patch seems to be the sanest approach.

>  {
> @@ -1889,16 +1895,18 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
>  	struct kmem_cache_cpu *c;
>  #ifdef CONFIG_CMPXCHG_LOCAL
>  	unsigned long tid;
> -#else
> +#endif
> +#ifdef MASK_IRQ_IN_SLAB_ALLOC
>  	unsigned long flags;
>  #endif
>

Yea well that does not bring us much.


* Re: [Bugme-new] [Bug 33502] New: Caught 64-bit read from uninitialized memory in __alloc_skb
From: Christoph Lameter @ 2011-04-20 14:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, netdev, bugzilla-daemon, bugme-daemon,
	casteyde.christian, Vegard Nossum, Pekka Enberg
In-Reply-To: <1303275858.2756.8.camel@edumazet-laptop>


On Wed, 20 Apr 2011, Eric Dumazet wrote:

> On Tuesday 19 April 2011 at 16:18 -0500, Christoph Lameter wrote:
> > On Tue, 19 Apr 2011, Eric Dumazet wrote:
> >
> > > > Ugg.. Isnt there some way to indicate to kmemcheck that a speculative
> > > > access is occurring?
> > >
> > > Yes, here is a totally untested patch (only compiled here), to keep
> > > kmemcheck & cmpxchg_double together.
> >
> > Ok looks somewhat saner. Why does DEBUG_PAGEALLOC need this? Pages are
> > never freed as long as a page is frozen and a page needs to be frozen to
> > be allocated from. I dont see how we could reference a free page.
>
> We run lockless and with IRQs enabled. Are you sure the following cannot happen at this point:
>
> CPU0 - Receive an IRQ
> CPU0 - Allocate this same object, consuming the page(s) containing this object
> CPU0 - Pass this object/memory to another CPU (CPU1)
> CPU1 - Free the memory and free the page(s)
> CPU0 - Return from IRQ
> CPU0 - Access now unreachable memory?

True this can occur.
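
To make the window concrete, here is a deterministic userspace model of that
sequence. It is only a sketch under stated assumptions: `sim_slab`,
`sim_irq_steals_page` and `sim_alloc` are made-up names, the "IRQ" is just a
function call placed inside the window, and a plain comparison stands in for
the cmpxchg_double on (freelist, tid). It is not the code in mm/slub.c.

```c
#include <assert.h>
#include <stdbool.h>

#define NOBJ 4

/* Hypothetical model of one slab page; not the kernel's data structures. */
struct sim_slab {
	int next[NOBJ];      /* next[i]: index of the next free object, -1 = end */
	bool page_mapped;    /* false once the page has been freed */
	int freelist;        /* per-cpu freelist head (object index) */
	unsigned long tid;   /* transaction id, bumped on every freelist change */
	int bad_reads;       /* reads of object->next that hit a freed page */
};

static void sim_init(struct sim_slab *s)
{
	int i;

	for (i = 0; i < NOBJ; i++)
		s->next[i] = (i + 1 < NOBJ) ? i + 1 : -1;
	s->page_mapped = true;
	s->freelist = 0;
	s->tid = 0;
	s->bad_reads = 0;
}

/* The IRQ consumes the page; another CPU then frees the objects and the page. */
static void sim_irq_steals_page(struct sim_slab *s)
{
	s->freelist = -1;
	s->tid++;
	s->page_mapped = false;
}

/*
 * Lockless fast path. If irq_in_window is set, the IRQ hits between the
 * snapshot and the read of object->next. Returns the allocated object
 * index, or -1 when the fast path has nothing to hand out.
 */
static int sim_alloc(struct sim_slab *s, bool irq_in_window)
{
	for (;;) {
		int object = s->freelist;        /* snapshot */
		unsigned long tid = s->tid;
		int next;

		if (object < 0)
			return -1;               /* would take the slow path */
		assert(object < NOBJ);

		if (irq_in_window) {
			sim_irq_steals_page(s);
			irq_in_window = false;
		}

		/* Speculative read of object->next: may touch a freed page. */
		if (!s->page_mapped)
			s->bad_reads++;
		next = s->next[object];

		/* Stand-in for the cmpxchg_double on (freelist, tid). */
		if (s->freelist == object && s->tid == tid) {
			s->freelist = next;
			s->tid++;
			return object;
		}
		/* Lost the race: the stale 'next' is discarded, redo. */
	}
}
```

In this model the stale value is never used because the compare fails and the
loop retries, but the read itself still touched the freed page. That read is
what a checker that unmaps or tracks freed memory would complain about.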


* Re: Add NAPI support to ll_temac driver
From: Ben Hutchings @ 2011-04-20 12:47 UTC (permalink / raw)
  To: monstr; +Cc: Eric Dumazet, netdev
In-Reply-To: <4DAEBE44.4060801@monstr.eu>

On Wed, 2011-04-20 at 13:06 +0200, Michal Simek wrote:
[...]
> I have measured the TX path and found that the driver design is not so good.
> It always creates one BD for each SKB and starts DMA to copy the packet to the
> controller and send it.

You will always get a single packet at a time to push to the hardware.
The only improvement you can make on this is to implement segmentation
offload, but if the hardware doesn't do this then... well, it's not
easy.

> On a 66MHz CPU it takes approximately 800 CPU cycles (not
> 800 instructions) to send a 1.5k packet.
> The current driver also enables the IRQ for TX, and when the packet is sent an
> interrupt is generated and the skb is freed.
> I see that it takes more time to handle the IRQ than to busy-wait until the DMA is
> done. I looked at the sfc driver and there is some TX queue and some notifier. How
> does it work? Does it require any hw support?

The principle of NAPI is that once you receive an IRQ you mask it and
poll until there are no more completions to handle.  So it greatly
reduces this IRQ overhead.
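
As a rough illustration of that principle, here is a small userspace
simulation. It is a sketch, not driver code: `sim_dev`, `sim_complete`,
`sim_poll` and `sim_burst` are hypothetical names, and the race between
unmasking the IRQ and a new completion arriving is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulated device; not a kernel structure. */
struct sim_dev {
	int pending;          /* completions waiting in the ring */
	bool irq_enabled;     /* device may raise an interrupt */
	bool poll_scheduled;  /* poll routine has been scheduled */
	int irq_count;        /* interrupts actually taken */
	int handled;          /* completions processed so far */
};

/* A completion arrives; an interrupt fires only if the IRQ is unmasked. */
static void sim_complete(struct sim_dev *d)
{
	d->pending++;
	if (d->irq_enabled) {
		/* "IRQ handler": count it, mask the IRQ, schedule polling. */
		d->irq_count++;
		d->irq_enabled = false;
		d->poll_scheduled = true;
	}
}

/*
 * Process up to 'budget' completions. If the ring was drained before the
 * budget ran out, stop polling and unmask the IRQ again.
 */
static int sim_poll(struct sim_dev *d, int budget)
{
	int done = 0;

	assert(budget > 0);
	while (done < budget && d->pending > 0) {
		d->pending--;
		d->handled++;
		done++;
	}
	if (done < budget) {
		d->poll_scheduled = false;
		d->irq_enabled = true;
	}
	return done;
}

/* n completions arrive back to back, then we poll until idle. */
static int sim_burst(int n, int budget)
{
	struct sim_dev d = { .irq_enabled = true };
	int i;

	for (i = 0; i < n; i++)
		sim_complete(&d);
	while (d.poll_scheduled)
		sim_poll(&d, budget);
	assert(d.handled == n);
	return d.irq_count;   /* interrupts taken for the whole burst */
}
```

With this model a burst of ten completions costs one interrupt, where an
interrupt per completion would cost ten.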

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: [PATCH 3/3] IPVS: init and cleanup restructuring.
From: Hans Schillstrom @ 2011-04-20 12:00 UTC (permalink / raw)
  To: Simon Horman
  Cc: ja, ebiederm, lvs-devel, netdev, netfilter-devel,
	hans.schillstrom
In-Reply-To: <20110419231234.GE6418@verge.net.au>

On Wednesday, April 20, 2011 01:12:34 Simon Horman wrote:
> On Tue, Apr 19, 2011 at 05:25:05PM +0200, Hans Schillstrom wrote:
> > This patch tries to restore the initial init and cleanup
> > sequences that was before name space patch.
[snip]
> perhaps enable or active would be names that fit better with the
> semantics used.  Using a bool might also make things more obvious.

I'll use enable
> 
[snip]
> 
> Can we just remove ip_vs_app_init() and ip_vs_app_cleanup() as
> they no longer do anything? Likewise with other init and cleanup
> functions below.

I will add a "final" patch that removes empty functions,
(They are nice to have during the review, to keep track of the order in different contexts)

> 
> > diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> > index 36cd5ea..f8d6702 100644
> > --- a/net/netfilter/ipvs/ip_vs_conn.c
> > +++ b/net/netfilter/ipvs/ip_vs_conn.c
> > @@ -1251,30 +1251,30 @@ int __net_init __ip_vs_conn_init(struct net *net)
> >  {
> >  	struct netns_ipvs *ipvs = net_ipvs(net);
> > 
> > +	EnterFunction(2);
> >  	atomic_set(&ipvs->conn_count, 0);
> > 
> >  	proc_net_fops_create(net, "ip_vs_conn", 0, &ip_vs_conn_fops);
> >  	proc_net_fops_create(net, "ip_vs_conn_sync", 0, &ip_vs_conn_sync_fops);
> > +	LeaveFunction(2);
> >  	return 0;
> >  }
> 
> Does adding these EnterFunction() and LeaveFunction() calls
> restore some previous behaviour? If not, I think they should at the very
> least be in a separate patch. Likewise for similar changes below.
> 

I can remove them if you want, (but they are nice for debugging)

[snip]
> 
> While I do prefer labels to be in column 0, putting those changes
> here is rather a lot of noise. Could you put them in a separate patch?

OK it will be patch no 1 later on

Regards
Hans


* [PATCH v2] can: add missing socket check in can/raw release
From: Oliver Hartkopp @ 2011-04-20 11:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Dave Jones

v2: added space after 'if' according to the coding style.

We can get here with a NULL socket argument passed from userspace,
so we need to handle it accordingly.

Thanks to Dave Jones for pointing out this issue in net/can/bcm.c

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>

---

diff --git a/net/can/raw.c b/net/can/raw.c
index 649acfa..8f215e6 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -305,7 +305,12 @@ static int raw_init(struct sock *sk)
 static int raw_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
-	struct raw_sock *ro = raw_sk(sk);
+	struct raw_sock *ro;
+
+	if (!sk)
+		return 0;
+
+	ro = raw_sk(sk);

 	unregister_netdevice_notifier(&ro->notifier);



* can: add missing socket check in can/raw release.
From: Oliver Hartkopp @ 2011-04-20 11:46 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Dave Jones

We can get here with a NULL socket argument passed from userspace,
so we need to handle it accordingly.

Thanks to Dave Jones for pointing out this issue in net/can/bcm.c

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>

---

diff --git a/net/can/raw.c b/net/can/raw.c
index 649acfa..8f215e6 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -305,7 +305,12 @@ static int raw_init(struct sock *sk)
 static int raw_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
-	struct raw_sock *ro = raw_sk(sk);
+	struct raw_sock *ro;
+
+	if(!sk)
+		return 0;
+
+	ro = raw_sk(sk);

 	unregister_netdevice_notifier(&ro->notifier);



* Re: [patch net-next-2.6] bonding: move processing of recv handlers into handle_frame()
From: Jiri Pirko @ 2011-04-20 11:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, davem, shemminger, kaber, fubar, nicolas.2p.debian, andy
In-Reply-To: <1303221752.3480.217.camel@edumazet-laptop>

Tue, Apr 19, 2011 at 04:02:32PM CEST, eric.dumazet@gmail.com wrote:
>On Tuesday 19 April 2011 at 15:48 +0200, Jiri Pirko wrote:
>> Since now when bonding uses rx_handler, all traffic going into bond
>> device goes thru bond_handle_frame. So there's no need to go back into
>> bonding code later via ptype handlers. This patch converts
>> original ptype handlers into "bonding receive probes". These functions
>> are called from bond_handle_frame and they are registered per-mode.
>> 
>> Note that vlan packets are also handled because they are always untagged
>> thanks to vlan_untag()
>> 
>> Note that this also allows arpmon for eth-bond-bridge-vlan topology.
>> 
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>
>
>>  			}
>> diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>> index 6126c6a..85fb822 100644
>> --- a/drivers/net/bonding/bonding.h
>> +++ b/drivers/net/bonding/bonding.h
>> @@ -226,6 +226,8 @@ struct bonding {
>>  	struct   slave *primary_slave;
>>  	bool     force_primary;
>>  	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
>> +	void     (*recv_probe)(struct sk_buff *, struct bonding *,
>> +			       struct slave *);
>>  	rwlock_t lock;
>>  	rwlock_t curr_slave_lock;
>>  	s8       kill_timers;
>
>
>Any performance numbers ?

Getting these. (Hitting nasty panic on another machine running pktgen @
vlan dev).

>
>I am asking because recv_probe sits in an often-dirtied cache line...
>
>It would be nice to separate rx & tx path needs
>
>
>


* Re: Add NAPI support to ll_temac driver
From: Michal Simek @ 2011-04-20 11:06 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Eric Dumazet, netdev
In-Reply-To: <1303218898.3464.59.camel@localhost>

Hi,

Ben Hutchings wrote:
> On Tue, 2011-04-19 at 14:48 +0200, Michal Simek wrote:
>> Ben Hutchings wrote:
>>> On Tue, 2011-04-19 at 12:43 +0200, Eric Dumazet wrote:
>>> [...]
>>>> One possible way to get better performance is to change driver to
>>>> allocate skbs only right before calling netif_rx(), so that you dont
>>>> have to access cold sk_buff data twice (once when allocating skb and put
>>>> it in ring buffer, a second time when receiving frame)
>>>>
>>>> drivers/net/niu.c is a good example for this (NAPI + netdev_alloc_skb()
>>>> just in time + pull in skbhead only first cache line of packet)
>>> [...]
>>>
>>> If the hardware can do RX checksumming (it's not clear) then the driver
>>> should pass the paged buffers into GRO and that will take care of skb
>>> allocation as necessary.
>> Hardware supports RX and TX partial checksumming. I can enable it. The driver
>> also has this option, and from my tests there is of course some performance
>> improvement.
>>
>> Just for sure - here are links on documentation.
>> http://www.xilinx.com/support/documentation/ip_documentation/xps_ll_temac.pdf
>> or
>> http://www.xilinx.com/support/documentation/ip_documentation/axi_ethernet/v2_01_a/ds759_axi_ethernet.pdf
> 
> I'm not going to read those.  Just providing brief advice.
> 
>> About SKB allocation: I fixed our non-mainline driver to allocate skbs based on
>> the current MTU size. The mainline driver allocates for the max MTU (9k). This also
>> has an impact on performance because Microblaze works with smaller SKBs.
>>
>> Can you please be more specific about passing the paged buffers into GRO?
>> Or point me to any documentation or code which can help me to understand what 
>> that means.
> 
> You would use napi_get_frags() to get a new or recycled skb, fill in
> skb->frags, then call napi_gro_frags() to pass it into GRO.  The benet,
> cxgb3 and sfc drivers do this.

I have measured the TX path and found that the driver design is not so good.
It always creates one BD for each SKB and starts DMA to copy the packet to the
controller and send it. On a 66MHz CPU it takes approximately 800 CPU cycles (not
800 instructions) to send a 1.5k packet.
The current driver also enables the IRQ for TX, and when the packet is sent an
interrupt is generated and the skb is freed.
I see that it takes more time to handle the IRQ than to busy-wait until the DMA is
done. I looked at the sfc driver and there is some TX queue and some notifier. How
does it work? Does it require any hw support?

Thanks,
Michal



-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian


* Re: [RFC v3 5/6] j1939: rename NAME to UUID?
From: Oliver Hartkopp @ 2011-04-20 10:59 UTC (permalink / raw)
  To: Kurt Van Dijck
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20110420072439.GB332-ozGf4kBk5synFtIcQ8t7k3L8HoS0Hn3T@public.gmane.org>

On 20.04.2011 09:24, Kurt Van Dijck wrote:
> On Fri, Apr 15, 2011 at 07:57:25PM +0200, Oliver Hartkopp wrote:
>> On 13.04.2011 06:49, Kurt Van Dijck wrote:
>>> Oliver et.al.,
>>
>> Thinking about the approach to implement the j1939 address claiming (AC) in
>> userspace, i discovered two ways which could both be hidden inside some
>> easy-to-use helper functions:
>>
>> 1. implement a thread (e.g. within a library) which opens a CAN_RAW socket on
>> a specific CAN-interface and takes care of the AC procedure and monitors
>> ongoing AC procedures on the bus. In this case every j1939 application
>> requiring AC internally would monitor all the AC handling on itself (which
>> should be no general problem - written only once).
>>
>> 2. create j1939ac daemon(s) using PF_UNIX-sockets to be named e.g.
>> j1939ac_can0, j1939ac_can1, etc. - these daemons take care for all AC
>> requirements of the host it is running on. The PF_UNIX-sockets are used in
>> SOCK_DGRAM mode and only the j1939 processes that need AC can then register
>> their NAME by sending a request datagram, and get back the j1939-address once
>> it is claimed (and all the updates on changes). As the j1939ac daemons are
>> running on the same host as the j1939 application processes, optional the
>> process' PID could be provided to the daemon during the registering process,
>> so that the daemon can send a signal to a signal handler of the application
>> process (if you would like to omit the select() syscall to handle both the
>> j1939 and PF_UNIX sockets).
>>
>> ->   <Req><Name="A3B5667799332242" PID="12345">
>> <-   <Resp><ACState="claimed" Name="A3B5667799332242" Address="1B">
>> (some time)
>> <-   <Resp><ACState="changed" Name="A3B5667799332242" Address="1C">
>>
>> This is a sketch that could be put into simple C-structs that are sent via the
>> PF_UNIX DGRAM socket.
>>
>> In all suggested cases (using a thread, daemon with/without signal) the AC
>> procedure can be managed in userspace without real pain.
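
To illustrate the datagram sketch quoted above, the request and response could
be put into plain C structs roughly like this. All names here (`ac_req`,
`ac_resp`, `ac_demo_daemon`, `ac_demo_roundtrip`, the state enum) are
hypothetical, and a socketpair() stands in for the named PF_UNIX socket of a
real j1939ac daemon, so this only shows the shape of the exchange:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum ac_state { AC_CLAIMED = 1, AC_CHANGED = 2 };   /* hypothetical values */

struct ac_req {
	uint64_t name;   /* 64-bit j1939 NAME to register */
	pid_t pid;       /* optional: process to signal on changes */
};

struct ac_resp {
	uint32_t state;  /* one of enum ac_state */
	uint64_t name;   /* NAME this response refers to */
	uint8_t addr;    /* currently claimed j1939 address */
};

/* Daemon side of the demo: read one request, answer with a fixed address. */
static int ac_demo_daemon(int fd, uint8_t addr)
{
	struct ac_req req;
	struct ac_resp resp;

	if (recv(fd, &req, sizeof(req), 0) != (ssize_t)sizeof(req))
		return -1;
	memset(&resp, 0, sizeof(resp));
	resp.state = AC_CLAIMED;
	resp.name = req.name;
	resp.addr = addr;
	if (send(fd, &resp, sizeof(resp), 0) != (ssize_t)sizeof(resp))
		return -1;
	return 0;
}

/* Application side: register a NAME and get back the claimed address. */
static int ac_demo_roundtrip(uint64_t name, uint8_t daemon_addr,
			     struct ac_resp *out)
{
	struct ac_req req;
	int sv[2];
	int ret = -1;

	assert(out != NULL);
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	memset(&req, 0, sizeof(req));   /* do not send stale padding bytes */
	req.name = name;
	req.pid = getpid();

	if (send(sv[0], &req, sizeof(req), 0) == (ssize_t)sizeof(req) &&
	    ac_demo_daemon(sv[1], daemon_addr) == 0 &&
	    recv(sv[0], out, sizeof(*out), 0) == (ssize_t)sizeof(*out))
		ret = 0;

	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Because both ends live on the same host, the structs can be sent as-is without
worrying about byte order. A later "changed" update would be one more ac_resp
datagram with state set to AC_CHANGED.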
> I seriously doubt this statement.
> define 'real pain'.

The same pain as writing a DNS or DHCP daemon which is doing a similar job ...

>> But especially with
>> less pain than putting the AC process into kernelspace and provide your
>> suggested socket API with bind/connect/... in very different manners.
> * You're only counting LOC in kernel, and not in userspace.

Yes! Things that can be left out of the kernel, should be implemented and
maintained in userspace - at least to reduce complexity and potential security
issues.

> * The constructs you present create some kind of infrastructure. You will
>   need thorough documentation of the 'good practice' since that's crucial
>   to the correct operation of the stack. Did you ever consider the impact
>   on the userspace application side?

Yes.

>   Remember that userspace programs should be easy to write.
>   I found your proposals impact application development very hard.

define 'very hard' ;-)

> I still think my passive support in kernel performs better and gives an
> easier API to get things done with.

Kurt, the problem for me is, that you constantly state that your approach is
the best. For me it is not.

The major issue in your implementation is the lack of the possibility to
simulate several j1939 ECUs on one Linux host talking to each other via
virtual CAN busses to create a complete j1939 network. And so far you did not
address this request.

There are several j1939 implementations that are running (or can be made
running) completely in userspace using raw-sockets. Obviously many people are
comfortable with these implementations.

I'm fine to place the j1939 data transfer part (supporting the segmented
transfer of long j1939 PDUs) into the kernel - but not all the address
claiming and the binding of j1939 addresses to network interfaces that also
kills the requested feature of simulating a complete j1939 network.

As an amicable approach I would suggest to proceed in two steps:

1. post and mainline the j1939 bits that deal with the data transfer only
   - no address claiming / j1939 name handling
   - no binding of j1939 addresses to CAN network devices
   - no extensions in the current af_can.c
   - etc.

2. discuss the implementation of the (optional) j1939 address claiming

For me this approach makes sense for j1939 newbies and also experienced j1939
users that may become interested in the Linux mainline implementation.

Then - discussing with a larger number of potential and real j1939 users - we
should face the address claiming and its possible implementation options.

Best regards,
Oliver


* Re: [PATCH BUG-FIX] ipv6: udp: fix the wrong headroom check
From: Herbert Xu @ 2011-04-20 10:50 UTC (permalink / raw)
  To: Shan Wei
  Cc: kuznet, David Miller, pekkas, jmorris,
	yoshfuji@linux-ipv6.org >> YOSHIFUJI Hideaki,
	Patrick McHardy, netdev
In-Reply-To: <4DAE9EE1.1050405@cn.fujitsu.com>

On Wed, Apr 20, 2011 at 04:52:49PM +0800, Shan Wei wrote:
> At this point, skb->data points to skb_transport_header.
> So, headroom check is wrong. 
> 
> For some cases, e.g. bridge (UFO is on) + eth device (UFO is off),
> there is not enough headroom for the IPv6 frag header.
> But the headroom check is always false.
> 
> This causes data to be moved to a location before skb->head
> when adding the IPv6 frag header to the skb.
> 
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>

Ouch.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH 3/3] IPVS: init and cleanup restructuring.
From: Hans Schillstrom @ 2011-04-20 10:41 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: horms, ebiederm, lvs-devel, netdev, netfilter-devel,
	hans.schillstrom
In-Reply-To: <alpine.LFD.2.00.1104200207340.1724@ja.ssi.bg>

On Wednesday, April 20, 2011 01:19:38 Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Tue, 19 Apr 2011, Hans Schillstrom wrote:
> 
> > This patch tries to restore the initial init and cleanup
> > sequences that was before name space patch.
> > 
> > The number of calls to register_pernet_device have been
> > reduced to one for the ip_vs.ko
> > Schedulers still have their own calls.
> > 
> > This patch adds a function __ip_vs_service_cleanup()
> > and a throttle or actually on/off switch for
> > the netfilter hooks.
> > 
> > The nf hooks will be enabled when the first service is loaded
> > and disabled when the last service is removed or when a
> > name space exit starts.
> 
> 	For me using _net suffix is more clear compared
> to __ prefix for the pernet methods.
> 
Sure I'll do that.

> 	For ip_vs_in: may be we can move the check here:
> 
> +	net = skb_net(skb);
> +	if (net->ipvs->throttle)
> +		return NF_ACCEPT;
>  	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
>  
>  	/* Bad... Do not break raw sockets */

Done

> 
> 	It will save such checks later in ICMP funcs. But this
> throttle idea looks dangerous for cleanup. It does not use RCU.
> The readers can cache the 0 in throttle for long time.
> May be by using register_pernet_device we are in list with other
> devices and it is still possible some device used by
> our dst_cache to be unregistered before IPVS or we to be
> unregistered before such device and some race with throttle
> to happen. throttle is good when enabling traffic with
> the first virtual service, later it can slowly stop the traffic
> but we can not rely on it during netns cleanup.

OK, I will revert back to register_pernet_subsys(),
and use one single register_pernet_device() exit hook for:
 - the throttle to disable traffic
 - stopping the kernel threads.

> 
> 	So, there are 2 problems with the devices:
> 
> - if we use _device pernet registration we can see packets
> in our netns during cleanup. I assume this is possible
> when IPVS is unregistered before such devices.
> 
> - dests can cache dst and to hold the device after it is
> unregistered in netns, obviously for very short time until
> IPVS is later unregistered from netns. And for long time
> if device is unregistered but netns remains.
> 
> 	Also, in most of the cases svc->refcnt is above 0 because
> dests can be in trash list. You should be lucky to delete the
> service without any connections, only then ip_vs_svc_unhash can
> see refcnt == 0.
> 
> 	So, may be we have to use register_pernet_subsys (not
> _device). We need just to register notifier with
> register_netdevice_notifier and to catch NETDEV_UNREGISTER,
> so that if any dest uses this device we have to release the dst:
> 
I made a quick test of the concept and it seems to work, 
(A mix of new and old connections at 1GB/s into 4 namespaces when killing them all)

> 	- lock mutex
> 	- for every dest (also in trash):
> 	spin_lock_bh(&dest->dst_lock);
> 	if (dest->dst_cache && dest->dst_cache->dev == unregistered_dev)
> 		ip_vs_dst_reset(dest);
> 	spin_unlock_bh(&dest->dst_lock);


Here is what I did:

static inline void
__ip_vs_dev_reset(struct ip_vs_dest *dest, struct net_device *dev)
{
	spin_lock_bh(&dest->dst_lock);
	if (dest->dst_cache && dest->dst_cache->dev == dev)
		ip_vs_dst_reset(dest);
	spin_unlock_bh(&dest->dst_lock);
}

static void ip_vs_svc_reset(struct ip_vs_service *svc, struct net_device *dev)
{
	struct ip_vs_dest *dest;

	list_for_each_entry(dest, &svc->destinations, n_list) {
		__ip_vs_dev_reset(dest, dev);
	}
}

static int ip_vs_dst_event(struct notifier_block *this, unsigned long event,
			    void *ptr)
{
	struct net_device *dev = ptr;
	struct net *net = dev_net(dev);
	struct ip_vs_service *svc;
	struct ip_vs_dest *dest;
	unsigned int idx;

	if (event != NETDEV_UNREGISTER)
		return NOTIFY_DONE;

	EnterFunction(2);
	mutex_lock(&__ip_vs_mutex);
	for (idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
		write_lock_bh(&__ip_vs_svc_lock);
		list_for_each_entry(svc, &ip_vs_svc_table[idx], s_list) {
			if (net_eq(svc->net, net))
				ip_vs_svc_reset(svc, dev);
		}
		list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
			if (net_eq(svc->net, net))
				ip_vs_svc_reset(svc, dev);
		}
		write_unlock_bh(&__ip_vs_svc_lock);
	}

	list_for_each_entry(dest, &net_ipvs(net)->dest_trash, n_list) {
		__ip_vs_dev_reset(dest, dev);
	}
	mutex_unlock(&__ip_vs_mutex);
	LeaveFunction(2);
	return NOTIFY_DONE;
}

static struct notifier_block ip_vs_dst_notifier = {
	.notifier_call = ip_vs_dst_event,
};


int __init ip_vs_control_init(void)
...
	at the end
	ret = register_netdevice_notifier(&ip_vs_dst_notifier);
...

and an unregister_ of course.


> 
> 	There are many examples for this, eg. net/core/fib_rules.c
> Then we are sure on cleanup we can not see traffic for our net
> because all devices are unregistered before us. We don't have
> to rely on throttle to stop the traffic during cleanup. And
> we do not hold devices after NETDEV_UNREGISTER.
> 
> 	I can prepare such patch but in next days. We need such
> code anyways because the dests can hold such dsts when no
> traffic is present and we can see again this "waiting for %s" ...
> message.
> 
> 	throttle still can be used but now it can not stop
> the traffic if connections exist.
> 
> 	For __ip_vs_service_cleanup: it still has to use mutex.

OK I will try to use unlock methods, if it doesn't work use the mutex.

> Or we can avoid it by introducing ip_vs_unlink_service_nolock:
> ip_vs_flush will look like your __ip_vs_service_cleanup and
> will call ip_vs_unlink_service_nolock. ip_vs_unlink_service_nolock
> will be called by ip_vs_flush and by ip_vs_unlink_service.
> You can try such changes, if not, I'll prepare some patches
> after 2-3 days.
> 

Regards
Hans


* Re: [Bugme-new] [Bug 33502] New: Caught 64-bit read from uninitialized memory in __alloc_skb
From: Eric Dumazet @ 2011-04-20 10:02 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: casteyde.christian, Christoph Lameter, Andrew Morton, netdev,
	bugzilla-daemon, bugme-daemon, Vegard Nossum
In-Reply-To: <1303290464.3186.32.camel@edumazet-laptop>

On Wednesday 20 April 2011 at 11:07 +0200, Eric Dumazet wrote:
> I had one splat (before applying any patch), adding CONFIG_PREEMPT to my
> build :
> 
> [   84.629936] WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88011a7376f0)
> [   84.629939] 5868ba1a0188ffff5868ba1a0188ffff7078731a0188ffffe0e1181a0188ffff
> [   84.629952]  i i i i i i i i i i i i i i i i u u u u u u u u u u u u u u u u
> [   84.629963]                                  ^
> [   84.629964] 
> [   84.629966] Pid: 2060, comm: 05-wait_for_sys Not tainted 2.6.39-rc4-00237-ge3de956-dirty #550 HP ProLiant BL460c G6
> [   84.629969] RIP: 0010:[<ffffffff810f0e80>]  [<ffffffff810f0e80>] kmem_cache_alloc+0x50/0x190
> [   84.629977] RSP: 0018:ffff88011a0b9988  EFLAGS: 00010286
> [   84.629979] RAX: 0000000000000000 RBX: ffff88011a739a98 RCX: 0000000000620780
> [   84.629980] RDX: 0000000000620740 RSI: 0000000000017400 RDI: ffffffff810db8f5
> [   84.629982] RBP: ffff88011a0b99b8 R08: ffff88011a261dd8 R09: 0000000000000009
> [   84.629983] R10: 0000000000000002 R11: ffff88011fffce00 R12: ffff88011b00a600
> [   84.629985] R13: ffff88011a7376f0 R14: 00000000000000d0 R15: ffff88011abbb0c0
> [   84.629987] FS:  0000000000000000(0000) GS:ffff88011fc00000(0063) knlGS:00000000f76ff6c0
> [   84.629989] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
> [   84.629991] CR2: ffff88011998dab8 CR3: 000000011a13f000 CR4: 00000000000006f0
> [   84.629992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   84.629994] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
> [   84.629995]  [<ffffffff810db8f5>] anon_vma_prepare+0x55/0x190
> [   84.630000]  [<ffffffff810cfe90>] handle_pte_fault+0x470/0x6e0
> [   84.630002]  [<ffffffff810d1654>] handle_mm_fault+0x124/0x1a0
> [   84.630005]  [<ffffffff8147a510>] do_page_fault+0x160/0x560
> [   84.630008]  [<ffffffff81477fcf>] page_fault+0x1f/0x30
> [   84.630013]  [<ffffffff810b31f8>] generic_file_aio_read+0x528/0x740
> [   84.630017]  [<ffffffff810f5461>] do_sync_read+0xd1/0x110
> [   84.630021]  [<ffffffff810f5c36>] vfs_read+0xc6/0x160
> [   84.630023]  [<ffffffff810f60b0>] sys_read+0x50/0x90
> [   84.630025]  [<ffffffff8147f4e9>] sysenter_dispatch+0x7/0x27
> [   84.630030]  [<ffffffffffffffff>] 0xffffffffffffffff
> 
> ffffffff810f0e6d:       0f 84 b5 00 00 00       je     ffffffff810f0f28 <kmem_cache_alloc+0xf8>
> ffffffff810f0e73:       49 63 44 24 20          movslq 0x20(%r12),%rax
> ffffffff810f0e78:       48 8d 4a 40             lea    0x40(%rdx),%rcx
> ffffffff810f0e7c:       49 8b 34 24             mov    (%r12),%rsi
> >>ffffffff810f0e80:       49 8b 5c 05 00          mov    0x0(%r13,%rax,1),%rbx
> ffffffff810f0e85:       4c 89 e8                mov    %r13,%rax
> ffffffff810f0e88:       e8 83 19 0e 00          callq  ffffffff811d2810 <this_cpu_cmpxchg16b_emu>
> ffffffff810f0e8d:       0f 1f 40 00             nopl   0x0(%rax)
> ffffffff810f0e91:       84 c0                   test   %al,%al
> ffffffff810f0e93:       74 c1                   je     ffffffff810f0e56 <kmem_cache_alloc+0x26>
> 
> 
> So this is definitely the access to object->next that triggers the fault.
> 
> Problem is : my (2nd) patch only reduces the window of occurrence of the bug, since we can
> be preempted/interrupted between the kmemcheck_mark_initialized() and access to object->next :
> 
> The preempt/irq could allocate the object and free it again.
> 
> So KMEMCHECK would require us to block IRQ to be safe.
> 
> Then, just disable SLUB_CMPXCHG_DOUBLE if KMEMCHECK is defined, as I did in my first patch.
> 
> Christoph, please reconsider first version of patch (disabling SLUB_CMPXCHG_DOUBLE if
> either KMEMCHECK or DEBUG_PAGEALLOC is defined)

Or this alternate one: keep the cmpxchg_double() stuff but block IRQs in
slab_alloc() if KMEMCHECK or DEBUG_PAGEALLOC is defined.

I tested it and got no kmemcheck splat

Thanks

[PATCH] slub: block IRQ in slab_alloc() if KMEMCHECK or DEBUG_PAGEALLOC

Christian Casteyde reported a KMEMCHECK splat in slub code.

The problem is that, now that we are lockless and allow IRQs in slab_alloc(),
the object we manipulate from the freelist can be allocated and freed right
before we try to read object->next.

The same problem can happen with DEBUG_PAGEALLOC.

Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Christoph Lameter <cl@linux.com>
---
 mm/slub.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 94d2a33..9c34a2b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1881,7 +1881,13 @@ debug:
  * If not then __slab_alloc is called for slow processing.
  *
  * Otherwise we can simply pick the next object from the lockless free list.
+ * Since we access object->next, we must disable irqs if KMEMCHECK or
+ * DEBUG_PAGEALLOC is defined, since object could be allocated and freed
+ * under us.
  */
+#if !defined(CONFIG_CMPXCHG_LOCAL) || defined(CONFIG_KMEMCHECK) || defined(CONFIG_DEBUG_PAGEALLOC)
+#define MASK_IRQ_IN_SLAB_ALLOC
+#endif
 static __always_inline void *slab_alloc(struct kmem_cache *s,
 		gfp_t gfpflags, int node, unsigned long addr)
 {
@@ -1889,16 +1895,18 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
 	struct kmem_cache_cpu *c;
 #ifdef CONFIG_CMPXCHG_LOCAL
 	unsigned long tid;
-#else
+#endif
+#ifdef MASK_IRQ_IN_SLAB_ALLOC
 	unsigned long flags;
 #endif
 
 	if (slab_pre_alloc_hook(s, gfpflags))
 		return NULL;
 
-#ifndef CONFIG_CMPXCHG_LOCAL
+#ifdef MASK_IRQ_IN_SLAB_ALLOC
 	local_irq_save(flags);
-#else
+#endif
+#ifdef CONFIG_CMPXCHG_LOCAL
 redo:
 #endif
 
@@ -1954,7 +1962,7 @@ redo:
 		stat(s, ALLOC_FASTPATH);
 	}
 
-#ifndef CONFIG_CMPXCHG_LOCAL
+#ifdef MASK_IRQ_IN_SLAB_ALLOC
 	local_irq_restore(flags);
 #endif
 




* Re: [PATCH 1/3] IPVS: Change of socket usage to enable name space exit.
From: Eric W. Biederman @ 2011-04-20 10:00 UTC (permalink / raw)
  To: Simon Horman
  Cc: Hans Schillstrom, ja, lvs-devel, netdev, netfilter-devel,
	hans.schillstrom
In-Reply-To: <20110419221145.GC12922@verge.net.au>

Simon Horman <horms@verge.net.au> writes:

> On Tue, Apr 19, 2011 at 05:25:03PM +0200, Hans Schillstrom wrote:
>> This is the first patch in a series of three.
>> The cleanup doesn't work when not exit in a clean way by using ipvsadm.
>> Killing of a namespace causes a hanging ipvs, this series will cure that.
>> 
>> If the sync daemons run in a namespace while it crashes
>> or get killed, there is no way to stop them except for a reboot.
>> 
>> Kernel threads should not increment the use count of a socket.
>> By calling sk_change_net() after creating a socket this is avoided.
>> sock_release cant be used, instead sk_release_kernel() should be used.
>> 
>> Thanks to Eric W Biederman.
>> 
>> This patch is based on net-next-2.6  ver 2.6.39-rc2
>
> Thanks Hans and Eric.
>
> Is it only this 1st patch that is intended for 2.6.39?
> The entire series feels a bit long to be applied
> this late in the rc series.

Hans was tracking two bugs that I gave him some advice on.
The first was the threads keeping namespaces from exiting
at the proper time, which the first patch appears to almost
fix.  I didn't see the code in that to ensure the threads
were gone.

The second was something that sounded like packets flying
around during network namespace subsys cleanup.  Simply
not having network devices and sockets open in the namespace
should be enough to guarantee you don't have packets flying
around.

My feeling is that the patches are in the right neighbourhood
but not quite there yet.

Eric

^ permalink raw reply

* Re: [PATCH 3/3] IPVS: init and cleanup restructuring.
From: Hans Schillstrom @ 2011-04-20  9:56 UTC (permalink / raw)
  To: Julian Anastasov, ebiederm
  Cc: horms, lvs-devel, netdev, netfilter-devel, hans.schillstrom
In-Reply-To: <alpine.LFD.2.00.1104200207340.1724@ja.ssi.bg>

Thanks, everyone.
I will send a new patch series today based on your comments.

Regards
Hans

^ permalink raw reply

* Re: [PATCH 2/3] IPVS: Change of register_pernet_subsys to register_pernet_device
From: Eric W. Biederman @ 2011-04-20  9:46 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: horms, ja, lvs-devel, netdev, netfilter-devel, hans.schillstrom
In-Reply-To: <1303226705-29178-2-git-send-email-hans@schillstrom.com>

Hans Schillstrom <hans@schillstrom.com> writes:

> This is part 1 of a makeover of the init and cleanup
> functions in ip_vs using name space.

That this fixes problems for you is great.  But why does this
fix the problems?  Does ip_vs really act more as a
network device passing packets than as a protocol implementation
or a library?

register_pernet_subsys already gives you the guarantee that
cleanup is in the reverse order of initialization.

I expect you are on the right track, but it isn't clear to me from
reading the patch and its description why this code should work.

Eric


> Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
> ---
>  net/netfilter/ipvs/ip_vs_app.c   |    4 ++--
>  net/netfilter/ipvs/ip_vs_conn.c  |    4 ++--
>  net/netfilter/ipvs/ip_vs_core.c  |    6 +++---
>  net/netfilter/ipvs/ip_vs_ctl.c   |    6 +++---
>  net/netfilter/ipvs/ip_vs_est.c   |    4 ++--
>  net/netfilter/ipvs/ip_vs_ftp.c   |    4 ++--
>  net/netfilter/ipvs/ip_vs_lblc.c  |    6 +++---
>  net/netfilter/ipvs/ip_vs_lblcr.c |    6 +++---
>  net/netfilter/ipvs/ip_vs_proto.c |    4 ++--
>  net/netfilter/ipvs/ip_vs_sync.c  |    4 ++--
>  10 files changed, 24 insertions(+), 24 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_app.c b/net/netfilter/ipvs/ip_vs_app.c
> index 2dc6de1..7e8e769 100644
> --- a/net/netfilter/ipvs/ip_vs_app.c
> +++ b/net/netfilter/ipvs/ip_vs_app.c
> @@ -599,12 +599,12 @@ int __init ip_vs_app_init(void)
>  {
>  	int rv;
>  
> -	rv = register_pernet_subsys(&ip_vs_app_ops);
> +	rv = register_pernet_device(&ip_vs_app_ops);
>  	return rv;
>  }
>  
>  
>  void ip_vs_app_cleanup(void)
>  {
> -	unregister_pernet_subsys(&ip_vs_app_ops);
> +	unregister_pernet_device(&ip_vs_app_ops);
>  }
> diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> index c97bd45..36cd5ea 100644
> --- a/net/netfilter/ipvs/ip_vs_conn.c
> +++ b/net/netfilter/ipvs/ip_vs_conn.c
> @@ -1309,7 +1309,7 @@ int __init ip_vs_conn_init(void)
>  		rwlock_init(&__ip_vs_conntbl_lock_array[idx].l);
>  	}
>  
> -	retc = register_pernet_subsys(&ipvs_conn_ops);
> +	retc = register_pernet_device(&ipvs_conn_ops);
>  
>  	/* calculate the random value for connection hash */
>  	get_random_bytes(&ip_vs_conn_rnd, sizeof(ip_vs_conn_rnd));
> @@ -1319,7 +1319,7 @@ int __init ip_vs_conn_init(void)
>  
>  void ip_vs_conn_cleanup(void)
>  {
> -	unregister_pernet_subsys(&ipvs_conn_ops);
> +	unregister_pernet_device(&ipvs_conn_ops);
>  	/* Release the empty cache */
>  	kmem_cache_destroy(ip_vs_conn_cachep);
>  	vfree(ip_vs_conn_tab);
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index 07accf6..a7bb81d 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -1913,7 +1913,7 @@ static int __init ip_vs_init(void)
>  {
>  	int ret;
>  
> -	ret = register_pernet_subsys(&ipvs_core_ops);	/* Alloc ip_vs struct */
> +	ret = register_pernet_device(&ipvs_core_ops);	/* Alloc ip_vs struct */
>  	if (ret < 0)
>  		return ret;
>  
> @@ -1964,7 +1964,7 @@ cleanup_sync:
>  	ip_vs_control_cleanup();
>    cleanup_estimator:
>  	ip_vs_estimator_cleanup();
> -	unregister_pernet_subsys(&ipvs_core_ops);	/* free ip_vs struct */
> +	unregister_pernet_device(&ipvs_core_ops);	/* free ip_vs struct */
>  	return ret;
>  }
>  
> @@ -1977,7 +1977,7 @@ static void __exit ip_vs_cleanup(void)
>  	ip_vs_protocol_cleanup();
>  	ip_vs_control_cleanup();
>  	ip_vs_estimator_cleanup();
> -	unregister_pernet_subsys(&ipvs_core_ops);	/* free ip_vs struct */
> +	unregister_pernet_device(&ipvs_core_ops);	/* free ip_vs struct */
>  	pr_info("ipvs unloaded.\n");
>  }
>  
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index ae47090..08715d8 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -3657,7 +3657,7 @@ int __init ip_vs_control_init(void)
>  		INIT_LIST_HEAD(&ip_vs_svc_fwm_table[idx]);
>  	}
>  
> -	ret = register_pernet_subsys(&ipvs_control_ops);
> +	ret = register_pernet_device(&ipvs_control_ops);
>  	if (ret) {
>  		pr_err("cannot register namespace.\n");
>  		goto err;
> @@ -3682,7 +3682,7 @@ int __init ip_vs_control_init(void)
>  	return 0;
>  
>  err_net:
> -	unregister_pernet_subsys(&ipvs_control_ops);
> +	unregister_pernet_device(&ipvs_control_ops);
>  err:
>  	return ret;
>  }
> @@ -3691,7 +3691,7 @@ err:
>  void ip_vs_control_cleanup(void)
>  {
>  	EnterFunction(2);
> -	unregister_pernet_subsys(&ipvs_control_ops);
> +	unregister_pernet_device(&ipvs_control_ops);
>  	ip_vs_genl_unregister();
>  	nf_unregister_sockopt(&ip_vs_sockopts);
>  	LeaveFunction(2);
> diff --git a/net/netfilter/ipvs/ip_vs_est.c b/net/netfilter/ipvs/ip_vs_est.c
> index 8c8766c..759163e 100644
> --- a/net/netfilter/ipvs/ip_vs_est.c
> +++ b/net/netfilter/ipvs/ip_vs_est.c
> @@ -216,11 +216,11 @@ int __init ip_vs_estimator_init(void)
>  {
>  	int rv;
>  
> -	rv = register_pernet_subsys(&ip_vs_app_ops);
> +	rv = register_pernet_device(&ip_vs_app_ops);
>  	return rv;
>  }
>  
>  void ip_vs_estimator_cleanup(void)
>  {
> -	unregister_pernet_subsys(&ip_vs_app_ops);
> +	unregister_pernet_device(&ip_vs_app_ops);
>  }
> diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
> index 6b5dd6d..dfa04d3 100644
> --- a/net/netfilter/ipvs/ip_vs_ftp.c
> +++ b/net/netfilter/ipvs/ip_vs_ftp.c
> @@ -451,7 +451,7 @@ int __init ip_vs_ftp_init(void)
>  {
>  	int rv;
>  
> -	rv = register_pernet_subsys(&ip_vs_ftp_ops);
> +	rv = register_pernet_device(&ip_vs_ftp_ops);
>  	return rv;
>  }
>  
> @@ -460,7 +460,7 @@ int __init ip_vs_ftp_init(void)
>   */
>  static void __exit ip_vs_ftp_exit(void)
>  {
> -	unregister_pernet_subsys(&ip_vs_ftp_ops);
> +	unregister_pernet_device(&ip_vs_ftp_ops);
>  }
>  
>  
> diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
> index 87e40ea..96765d0 100644
> --- a/net/netfilter/ipvs/ip_vs_lblc.c
> +++ b/net/netfilter/ipvs/ip_vs_lblc.c
> @@ -603,20 +603,20 @@ static int __init ip_vs_lblc_init(void)
>  {
>  	int ret;
>  
> -	ret = register_pernet_subsys(&ip_vs_lblc_ops);
> +	ret = register_pernet_device(&ip_vs_lblc_ops);
>  	if (ret)
>  		return ret;
>  
>  	ret = register_ip_vs_scheduler(&ip_vs_lblc_scheduler);
>  	if (ret)
> -		unregister_pernet_subsys(&ip_vs_lblc_ops);
> +		unregister_pernet_device(&ip_vs_lblc_ops);
>  	return ret;
>  }
>  
>  static void __exit ip_vs_lblc_cleanup(void)
>  {
>  	unregister_ip_vs_scheduler(&ip_vs_lblc_scheduler);
> -	unregister_pernet_subsys(&ip_vs_lblc_ops);
> +	unregister_pernet_device(&ip_vs_lblc_ops);
>  }
>  
>  
> diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
> index 90f618a..5de425f 100644
> --- a/net/netfilter/ipvs/ip_vs_lblcr.c
> +++ b/net/netfilter/ipvs/ip_vs_lblcr.c
> @@ -799,20 +799,20 @@ static int __init ip_vs_lblcr_init(void)
>  {
>  	int ret;
>  
> -	ret = register_pernet_subsys(&ip_vs_lblcr_ops);
> +	ret = register_pernet_device(&ip_vs_lblcr_ops);
>  	if (ret)
>  		return ret;
>  
>  	ret = register_ip_vs_scheduler(&ip_vs_lblcr_scheduler);
>  	if (ret)
> -		unregister_pernet_subsys(&ip_vs_lblcr_ops);
> +		unregister_pernet_device(&ip_vs_lblcr_ops);
>  	return ret;
>  }
>  
>  static void __exit ip_vs_lblcr_cleanup(void)
>  {
>  	unregister_ip_vs_scheduler(&ip_vs_lblcr_scheduler);
> -	unregister_pernet_subsys(&ip_vs_lblcr_ops);
> +	unregister_pernet_device(&ip_vs_lblcr_ops);
>  }
>  
>  
> diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
> index 17484a4..f7021fc 100644
> --- a/net/netfilter/ipvs/ip_vs_proto.c
> +++ b/net/netfilter/ipvs/ip_vs_proto.c
> @@ -382,7 +382,7 @@ int __init ip_vs_protocol_init(void)
>  	REGISTER_PROTOCOL(&ip_vs_protocol_esp);
>  #endif
>  	pr_info("Registered protocols (%s)\n", &protocols[2]);
> -	return register_pernet_subsys(&ipvs_proto_ops);
> +	return register_pernet_device(&ipvs_proto_ops);
>  
>  	return 0;
>  }
> @@ -393,7 +393,7 @@ void ip_vs_protocol_cleanup(void)
>  	struct ip_vs_protocol *pp;
>  	int i;
>  
> -	unregister_pernet_subsys(&ipvs_proto_ops);
> +	unregister_pernet_device(&ipvs_proto_ops);
>  	/* unregister all the ipvs protocols */
>  	for (i = 0; i < IP_VS_PROTO_TAB_SIZE; i++) {
>  		while ((pp = ip_vs_proto_table[i]) != NULL)
> diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
> index 3f87555..1aeca1d 100644
> --- a/net/netfilter/ipvs/ip_vs_sync.c
> +++ b/net/netfilter/ipvs/ip_vs_sync.c
> @@ -1692,10 +1692,10 @@ static struct pernet_operations ipvs_sync_ops = {
>  
>  int __init ip_vs_sync_init(void)
>  {
> -	return register_pernet_subsys(&ipvs_sync_ops);
> +	return register_pernet_device(&ipvs_sync_ops);
>  }
>  
>  void ip_vs_sync_cleanup(void)
>  {
> -	unregister_pernet_subsys(&ipvs_sync_ops);
> +	unregister_pernet_device(&ipvs_sync_ops);
>  }

^ permalink raw reply

* Re: [PATCH] bnx2x: dont dereference tcp header on non tcp frames
From: Dmitry Kravkov @ 2011-04-20  9:40 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet@gmail.com, Eilon Greenstein, netdev@vger.kernel.org
In-Reply-To: <20110420.014657.232735224.davem@davemloft.net>

On Wed, 2011-04-20 at 01:46 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 20 Apr 2011 09:31:53 +0200
> 
> > [PATCH] bnx2x: dont dereference tcp header on non tcp frames
> > 
> > bnx2x_set_pbd_csum() & bnx2x_set_pbd_csum_e2() use 
> > tcp_hdrlen(skb) even for non TCP frames
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Broadcom folks please review.
> 

The following patch fixes the UDP checksum offload flow and also addresses
the issues raised by Eric. We are testing it now.

From: Vladislav Zolotarov <vladz@broadcom.com>

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x_cmn.c |   34 ++++++++++++++++++++++++----------
 1 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_cmn.c b/drivers/net/bnx2x/bnx2x_cmn.c
index e83ac6d..16581df 100644
--- a/drivers/net/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/bnx2x/bnx2x_cmn.c
@@ -2019,15 +2019,23 @@ static inline void bnx2x_set_pbd_gso(struct sk_buff *skb,
 static inline  u8 bnx2x_set_pbd_csum_e2(struct bnx2x *bp, struct sk_buff *skb,
 	u32 *parsing_data, u32 xmit_type)
 {
-	*parsing_data |= ((tcp_hdrlen(skb)/4) <<
-		ETH_TX_PARSE_BD_E2_TCP_HDR_LENGTH_DW_SHIFT) &
-		ETH_TX_PARSE_BD_E2_TCP_HDR_LENGTH_DW;
+	*parsing_data |=
+			((((u8 *)skb_transport_header(skb) - skb->data) >> 1) <<
+			ETH_TX_PARSE_BD_E2_TCP_HDR_START_OFFSET_W_SHIFT) &
+			ETH_TX_PARSE_BD_E2_TCP_HDR_START_OFFSET_W;
 
-	*parsing_data |= ((((u8 *)tcp_hdr(skb) - skb->data) / 2) <<
-		ETH_TX_PARSE_BD_E2_TCP_HDR_START_OFFSET_W_SHIFT) &
-		ETH_TX_PARSE_BD_E2_TCP_HDR_START_OFFSET_W;
+	if (xmit_type & XMIT_CSUM_TCP) {
+		*parsing_data |= ((tcp_hdrlen(skb) / 4) <<
+			ETH_TX_PARSE_BD_E2_TCP_HDR_LENGTH_DW_SHIFT) &
+			ETH_TX_PARSE_BD_E2_TCP_HDR_LENGTH_DW;
 
-	return skb_transport_header(skb) + tcp_hdrlen(skb) - skb->data;
+		return skb_transport_header(skb) + tcp_hdrlen(skb) - skb->data;
+	} else
+		/* We support checksum offload for TCP and UDP only.
+		 * No need to pass the UDP header length - it's a constant.
+		 */
+		return skb_transport_header(skb) +
+				sizeof(struct udphdr) - skb->data;
 }
 
 /**
@@ -2043,7 +2051,7 @@ static inline u8 bnx2x_set_pbd_csum(struct bnx2x *bp, struct sk_buff *skb,
 	struct eth_tx_parse_bd_e1x *pbd,
 	u32 xmit_type)
 {
-	u8 hlen = (skb_network_header(skb) - skb->data) / 2;
+	u8 hlen = (skb_network_header(skb) - skb->data) >> 1;
 
 	/* for now NS flag is not used in Linux */
 	pbd->global_data =
@@ -2051,9 +2059,15 @@ static inline u8 bnx2x_set_pbd_csum(struct bnx2x *bp, struct sk_buff *skb,
 			 ETH_TX_PARSE_BD_E1X_LLC_SNAP_EN_SHIFT));
 
 	pbd->ip_hlen_w = (skb_transport_header(skb) -
-			skb_network_header(skb)) / 2;
+			skb_network_header(skb)) >> 1;
 
-	hlen += pbd->ip_hlen_w + tcp_hdrlen(skb) / 2;
+	hlen += pbd->ip_hlen_w;
+
+	/* We support checksum offload for TCP and UDP only */
+	if (xmit_type & XMIT_CSUM_TCP)
+		hlen += tcp_hdrlen(skb) / 2;
+	else
+		hlen += sizeof(struct udphdr) / 2;
 
 	pbd->total_hlen_w = cpu_to_le16(hlen);
 	hlen = hlen*2;
-- 
1.7.2.2
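
The header-length arithmetic in bnx2x_set_pbd_csum() above can be modelled in
user space. This is only a sketch: l4_proto, UDP_HDR_LEN and
total_hlen_words() are hypothetical names, not driver code, and the lengths
are passed in as plain integers instead of being derived from an skb. It
shows the total header length in 16-bit words being built from the MAC and
IP lengths plus either the TCP header length or the constant 8-byte UDP
header, so the TCP length is never consulted for a UDP frame.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the XMIT_CSUM_TCP bit in xmit_type. */
enum l4_proto { L4_TCP, L4_UDP };

/* The UDP header has a fixed size of 8 bytes. */
#define UDP_HDR_LEN 8u

/*
 * Total header length in 16-bit words, following the patched logic:
 * MAC words + IP words + (TCP words or constant UDP words).
 * tcp_hdr_len is ignored unless proto is L4_TCP.
 */
static uint16_t total_hlen_words(unsigned int mac_len, unsigned int ip_len,
				 enum l4_proto proto, unsigned int tcp_hdr_len)
{
	unsigned int hlen_w = mac_len >> 1;

	hlen_w += ip_len >> 1;
	if (proto == L4_TCP)
		hlen_w += tcp_hdr_len / 2;
	else
		hlen_w += UDP_HDR_LEN / 2;

	return (uint16_t)hlen_w;
}

/* Small self-check with a 14-byte Ethernet and 20-byte IPv4 header. */
static void demo_hlen(void)
{
	/* TCP without options: 7 + 10 + 10 words. */
	assert(total_hlen_words(14, 20, L4_TCP, 20) == 27);
	/* UDP: 7 + 10 + 4 words, whatever is passed as tcp_hdr_len. */
	assert(total_hlen_words(14, 20, L4_UDP, 60) == 21);
}
```

Calling demo_hlen() from a test main() exercises both branches. The second
assertion passes a bogus TCP length on purpose to show that the UDP branch
does not depend on it.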





^ permalink raw reply related

* Re: [PATCH 1/3] IPVS: Change of socket usage to enable name space exit.
From: Eric W. Biederman @ 2011-04-20  9:40 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: horms, ja, lvs-devel, netdev, netfilter-devel, hans.schillstrom
In-Reply-To: <1303226705-29178-1-git-send-email-hans@schillstrom.com>

Hans Schillstrom <hans@schillstrom.com> writes:

> This is the first patch in a series of three.
> The cleanup doesn't work when not exit in a clean way by using ipvsadm.
> Killing of a namespace causes a hanging ipvs, this series will cure that.
>
> If the sync daemons run in a namespace while it crashes
> or get killed, there is no way to stop them except for a reboot.
>
> Kernel threads should not increment the use count of a socket.
> By calling sk_change_net() after creating a socket this is avoided.
> sock_release cant be used, instead sk_release_kernel() should be used.
>
> Thanks to Eric W Biederman.
>
> This patch is based on net-next-2.6  ver 2.6.39-rc2

I am not comfortable with this patch on its own.  What kills the
daemons on network namespace exit?

I would be a little happier if the code also used sock_create_kern
instead of __sock_create, now that it uses sk_change_net, just to make
the code more idiomatic.

But not actually guaranteeing that the namespace exit kills the threads
at this point seems pretty badly broken, as this idiom is not safe
unless we have a hard guarantee that the threads are dead.

Eric

> Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
> ---
>  net/netfilter/ipvs/ip_vs_sync.c |   28 +++++++++++++++++++---------
>  1 files changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
> index 3e7961e..3f87555 100644
> --- a/net/netfilter/ipvs/ip_vs_sync.c
> +++ b/net/netfilter/ipvs/ip_vs_sync.c
> @@ -1309,7 +1309,12 @@ static struct socket *make_send_sock(struct net *net)
>  		pr_err("Error during creation of socket; terminating\n");
>  		return ERR_PTR(result);
>  	}
> -
> +	/*
> +	 * Kernel sockets that are a part of a namespace, should not
> +	 * hold a reference to a namespace in order to allow to stop it.
> +	 * After sk_change_net should be released using sk_release_kernel.
> +	 */
> +	sk_change_net(sock->sk, net);
>  	result = set_mcast_if(sock->sk, ipvs->master_mcast_ifn);
>  	if (result < 0) {
>  		pr_err("Error setting outbound mcast interface\n");
> @@ -1334,8 +1339,8 @@ static struct socket *make_send_sock(struct net *net)
>
>  	return sock;
>
> -  error:
> -	sock_release(sock);
> +error:
> +	sk_release_kernel(sock->sk);
>  	return ERR_PTR(result);
>  }
>
> @@ -1355,7 +1360,12 @@ static struct socket *make_receive_sock(struct net *net)
>  		pr_err("Error during creation of socket; terminating\n");
>  		return ERR_PTR(result);
>  	}
> -
> +	/*
> +	 * Kernel sockets that are a part of a namespace, should not
> +	 * hold a reference to a namespace in order to allow to stop it.
> +	 * After sk_change_net should be released using sk_release_kernel.
> +	 */
> +	sk_change_net(sock->sk, net);
>  	/* it is equivalent to the REUSEADDR option in user-space */
>  	sock->sk->sk_reuse = 1;
>
> @@ -1377,8 +1387,8 @@ static struct socket *make_receive_sock(struct net *net)
>
>  	return sock;
>
> -  error:
> -	sock_release(sock);
> +error:
> +	sk_release_kernel(sock->sk);
>  	return ERR_PTR(result);
>  }
>
> @@ -1473,7 +1483,7 @@ static int sync_thread_master(void *data)
>  		ip_vs_sync_buff_release(sb);
>
>  	/* release the sending multicast socket */
> -	sock_release(tinfo->sock);
> +	sk_release_kernel(tinfo->sock->sk);
>  	kfree(tinfo);
>
>  	return 0;
> @@ -1513,7 +1523,7 @@ static int sync_thread_backup(void *data)
>  	}
>
>  	/* release the sending multicast socket */
> -	sock_release(tinfo->sock);
> +	sk_release_kernel(tinfo->sock->sk);
>  	kfree(tinfo->buf);
>  	kfree(tinfo);
>
> @@ -1601,7 +1611,7 @@ outtinfo:
>  outbuf:
>  	kfree(buf);
>  outsocket:
> -	sock_release(sock);
> +	sk_release_kernel(sock->sk);
>  out:
>  	return result;
>  }
> --
> 1.7.2.3

^ permalink raw reply

* Re: A patch you wrote some time ago (aka: "[patch 41/54] ICMP: Fix icmp_errors_use_inbound_ifaddr sysctl")
From: Alexander Hoogerhuis @ 2011-04-20  9:11 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Chris Wright, linux-kernel, netdev
In-Reply-To: <4DAE9824.10802@trash.net>

Also, in a quick followup on myself, this is ambiguous, as it does not 
state that it will select the primary IP. I've

The second part is that RFC 5798 (VRRP v3) lists in section 8.1.1:

>    The IPv4 source address of an ICMP redirect should be the address
>    that the end-host used when making its next-hop routing decision.

I.e. the settings in Linux, with or without the sysctl flag set, would
run against this.

mvh,
A
-- 
Alexander Hoogerhuis | http://no.linkedin.com/in/alexh
Boxed Solutions AS   | +47 908 21 485 - alexh@boxed.no
"Given enough eyeballs, all bugs are shallow." -Eric S. Raymond

^ permalink raw reply

* Re: [Bugme-new] [Bug 33502] New: Caught 64-bit read from uninitialized memory in __alloc_skb
From: Eric Dumazet @ 2011-04-20  9:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: casteyde.christian, Christoph Lameter, Andrew Morton, netdev,
	bugzilla-daemon, bugme-daemon, Vegard Nossum
In-Reply-To: <1303286998.3186.18.camel@edumazet-laptop>


I got one splat (before applying any patch) after adding CONFIG_PREEMPT to my
build:

[   84.629936] WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88011a7376f0)
[   84.629939] 5868ba1a0188ffff5868ba1a0188ffff7078731a0188ffffe0e1181a0188ffff
[   84.629952]  i i i i i i i i i i i i i i i i u u u u u u u u u u u u u u u u
[   84.629963]                                  ^
[   84.629964] 
[   84.629966] Pid: 2060, comm: 05-wait_for_sys Not tainted 2.6.39-rc4-00237-ge3de956-dirty #550 HP ProLiant BL460c G6
[   84.629969] RIP: 0010:[<ffffffff810f0e80>]  [<ffffffff810f0e80>] kmem_cache_alloc+0x50/0x190
[   84.629977] RSP: 0018:ffff88011a0b9988  EFLAGS: 00010286
[   84.629979] RAX: 0000000000000000 RBX: ffff88011a739a98 RCX: 0000000000620780
[   84.629980] RDX: 0000000000620740 RSI: 0000000000017400 RDI: ffffffff810db8f5
[   84.629982] RBP: ffff88011a0b99b8 R08: ffff88011a261dd8 R09: 0000000000000009
[   84.629983] R10: 0000000000000002 R11: ffff88011fffce00 R12: ffff88011b00a600
[   84.629985] R13: ffff88011a7376f0 R14: 00000000000000d0 R15: ffff88011abbb0c0
[   84.629987] FS:  0000000000000000(0000) GS:ffff88011fc00000(0063) knlGS:00000000f76ff6c0
[   84.629989] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[   84.629991] CR2: ffff88011998dab8 CR3: 000000011a13f000 CR4: 00000000000006f0
[   84.629992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   84.629994] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[   84.629995]  [<ffffffff810db8f5>] anon_vma_prepare+0x55/0x190
[   84.630000]  [<ffffffff810cfe90>] handle_pte_fault+0x470/0x6e0
[   84.630002]  [<ffffffff810d1654>] handle_mm_fault+0x124/0x1a0
[   84.630005]  [<ffffffff8147a510>] do_page_fault+0x160/0x560
[   84.630008]  [<ffffffff81477fcf>] page_fault+0x1f/0x30
[   84.630013]  [<ffffffff810b31f8>] generic_file_aio_read+0x528/0x740
[   84.630017]  [<ffffffff810f5461>] do_sync_read+0xd1/0x110
[   84.630021]  [<ffffffff810f5c36>] vfs_read+0xc6/0x160
[   84.630023]  [<ffffffff810f60b0>] sys_read+0x50/0x90
[   84.630025]  [<ffffffff8147f4e9>] sysenter_dispatch+0x7/0x27
[   84.630030]  [<ffffffffffffffff>] 0xffffffffffffffff

ffffffff810f0e6d:       0f 84 b5 00 00 00       je     ffffffff810f0f28 <kmem_cache_alloc+0xf8>
ffffffff810f0e73:       49 63 44 24 20          movslq 0x20(%r12),%rax
ffffffff810f0e78:       48 8d 4a 40             lea    0x40(%rdx),%rcx
ffffffff810f0e7c:       49 8b 34 24             mov    (%r12),%rsi
>>ffffffff810f0e80:       49 8b 5c 05 00          mov    0x0(%r13,%rax,1),%rbx
ffffffff810f0e85:       4c 89 e8                mov    %r13,%rax
ffffffff810f0e88:       e8 83 19 0e 00          callq  ffffffff811d2810 <this_cpu_cmpxchg16b_emu>
ffffffff810f0e8d:       0f 1f 40 00             nopl   0x0(%rax)
ffffffff810f0e91:       84 c0                   test   %al,%al
ffffffff810f0e93:       74 c1                   je     ffffffff810f0e56 <kmem_cache_alloc+0x26>


So this is definitely the access to object->next that triggers the fault.

The problem is that my (2nd) patch only narrows the window in which the bug can occur, since we can
be preempted/interrupted between the kmemcheck_mark_initialized() call and the access to object->next:

The preempting task or IRQ could allocate the object and free it again.

So KMEMCHECK would require us to block IRQs to be safe.

In that case, just disable SLUB_CMPXCHG_DOUBLE if KMEMCHECK is defined, as I did in my first patch.

Christoph, please reconsider the first version of the patch (disabling SLUB_CMPXCHG_DOUBLE if
either KMEMCHECK or DEBUG_PAGEALLOC is defined).

Thanks
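
The window described above can be illustrated with a small single-threaded
user-space model. This is a sketch under loud assumptions: struct object,
struct cpu_slab, the uninitialized flag, cmpxchg_double() and model_alloc()
are hypothetical stand-ins, not the SLUB or kmemcheck code, and the simulated
interrupt only allocates the head object (it does not free it again). The
model shows that the stale read of ->next is discarded because the tid
compare fails, yet a checker watching the object's state would still flag
the read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the SLUB structures. */
struct object {
	struct object *next;	/* free pointer stored inside the object */
	bool uninitialized;	/* model of the checker's shadow state */
};

struct cpu_slab {
	struct object *freelist;
	unsigned long tid;	/* transaction id, bumped on every update */
};

static int flagged_reads;	/* reads the model checker would report */

/* Single-threaded model of a double-word compare-and-swap. */
static bool cmpxchg_double(struct cpu_slab *c,
			   struct object *old_obj, unsigned long old_tid,
			   struct object *new_obj, unsigned long new_tid)
{
	if (c->freelist != old_obj || c->tid != old_tid)
		return false;
	c->freelist = new_obj;
	c->tid = new_tid;
	return true;
}

/* irq, if non-NULL, runs once between loading the head and reading ->next. */
static struct object *model_alloc(struct cpu_slab *c,
				  void (*irq)(struct cpu_slab *))
{
	for (;;) {
		unsigned long tid = c->tid;
		struct object *obj = c->freelist;
		struct object *next;

		if (!obj)
			return NULL;
		if (irq) {
			void (*fire)(struct cpu_slab *) = irq;

			irq = NULL;
			fire(c);	/* simulated interrupt */
		}
		if (obj->uninitialized)
			flagged_reads++;	/* object was handed out under us */
		next = obj->next;
		if (cmpxchg_double(c, obj, tid, next, tid + 1)) {
			obj->uninitialized = true;
			return obj;
		}
		/* tid or freelist changed: drop the stale read and retry */
	}
}

/* The simulated interrupt allocates the current head object. */
static void irq_allocates_head(struct cpu_slab *c)
{
	model_alloc(c, NULL);
}

/* Returns how many reads were flagged for one allocation. */
static int demo_flagged_reads(bool irqs_masked)
{
	struct object objs[3];
	struct cpu_slab c;
	struct object *got;
	int i;

	for (i = 0; i < 3; i++) {
		objs[i].next = (i < 2) ? &objs[i + 1] : NULL;
		objs[i].uninitialized = false;
	}
	c.freelist = &objs[0];
	c.tid = 0;
	flagged_reads = 0;

	got = model_alloc(&c, irqs_masked ? NULL : irq_allocates_head);
	/* With the interrupt, the retry hands out the second object. */
	assert(got == (irqs_masked ? &objs[0] : &objs[1]));
	return flagged_reads;
}
```

In this model, demo_flagged_reads(false) reports one flagged read and
demo_flagged_reads(true) reports none, which mirrors the argument that
masking interrupts around the read closes the window.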



^ permalink raw reply

* [PATCH BUG-FIX] ipv6: udp: fix the wrong headroom check
From: Shan Wei @ 2011-04-20  8:52 UTC (permalink / raw)
  To: kuznet, David Miller, pekkas, jmorris,
	yoshfuji@linux-ipv6.org >> YOSHIFUJI Hideaki, Patrick 

At this point, skb->data points to the transport header, so
skb_headroom() also counts the MAC and network headers and the
headroom check is wrong.

In some setups, e.g. a bridge (UFO on) on top of an eth device (UFO off),
there is not enough headroom for the IPv6 fragment header,
but the headroom check still evaluates to false.

As a result, data is moved to an address before skb->head
when the IPv6 fragment header is added to the skb.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
 net/ipv6/udp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 15c3774..9e305d7 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1335,7 +1335,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, u32 features)
 	skb->ip_summed = CHECKSUM_NONE;
 
 	/* Check if there is enough headroom to insert fragment header. */
-	if ((skb_headroom(skb) < frag_hdr_sz) &&
+	if ((skb_mac_header(skb) < skb->head + frag_hdr_sz) &&
 	    pskb_expand_head(skb, frag_hdr_sz, 0, GFP_ATOMIC))
 		goto out;
 
-- 
1.6.3.3
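
The difference between the old and the new check can be sketched in user
space. The struct and helpers below (model_skb, model_headroom(),
old_check_needs_expand(), new_check_needs_expand(), make_example_skb()) are
hypothetical stand-ins, not kernel code; the example assumes a 14-byte
Ethernet header, a 40-byte IPv6 header and an 8-byte fragment header, with
only 2 bytes free in front of the MAC header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct sk_buff. */
struct model_skb {
	unsigned char *head;		/* start of the allocated buffer */
	unsigned char *data;		/* current data pointer */
	unsigned char *mac_header;	/* start of the link-layer header */
};

/* Same idea as skb_headroom(): bytes between head and data. */
static size_t model_headroom(const struct model_skb *skb)
{
	return (size_t)(skb->data - skb->head);
}

/* Old check: measures from data, which sits at the transport header. */
static bool old_check_needs_expand(const struct model_skb *skb,
				   size_t frag_hdr_sz)
{
	return model_headroom(skb) < frag_hdr_sz;
}

/* New check: measures the free space in front of the MAC header. */
static bool new_check_needs_expand(const struct model_skb *skb,
				   size_t frag_hdr_sz)
{
	return skb->mac_header < skb->head + frag_hdr_sz;
}

/* Lay out: 2 free bytes, 14-byte Ethernet, 40-byte IPv6, then transport. */
static struct model_skb make_example_skb(unsigned char *buf)
{
	struct model_skb skb;

	skb.head = buf;
	skb.mac_header = buf + 2;
	skb.data = skb.mac_header + 14 + 40;
	return skb;
}

static void demo_headroom(void)
{
	unsigned char buf[128];
	struct model_skb skb = make_example_skb(buf);

	/* data - head is 56, so the old check sees plenty of room. */
	assert(!old_check_needs_expand(&skb, 8));
	/* Only 2 bytes precede the MAC header, so expansion is needed. */
	assert(new_check_needs_expand(&skb, 8));
}
```

In this layout the old check never requests an expansion even though moving
the MAC and network headers back by 8 bytes would land before head, which is
the situation the commit message describes.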
 

^ permalink raw reply related

* Re: [PATCH net-next-2.6 0/9] SCTP updates for net-next-2.6
From: David Miller @ 2011-04-20  8:55 UTC (permalink / raw)
  To: yjwei; +Cc: netdev, linux-sctp
In-Reply-To: <4DAE8A27.3040007@cn.fujitsu.com>

From: Wei Yongjun <yjwei@cn.fujitsu.com>
Date: Wed, 20 Apr 2011 15:24:23 +0800

> Here is a set of SCTP patches for net-next-2.6, also from vlad's
> lksctp-dev tree.  Please apply.
> 
> Shan Wei (4):
>       sctp: check parameter value of length in ERROR chunk
>       sctp: check invalid value of length parameter in error cause
>       sctp: remove redundant check when walking through a list of TLV parameters
>       sctp: handle ootb packet in chunk order as defined
> 
> Vlad Yasevich (2):
>       sctp: remove completely unsed EMPTY state
>       sctp: bail from sctp_endpoint_lookup_assoc() if not bound.
> 
> Wei Yongjun (3):
>       sctp: fix to check the source address of COOKIE-ECHO chunk
>       sctp: make heartbeat information in sctp_make_heartbeat()
>       sctp: move chunk from retransmit queue to abandoned list

Applied, thanks for doing these backports from Vlad's tree.

^ permalink raw reply

* [PATCH 1/1] powerpc: Fix multicast problem in fs_enet driver
From: Andrea Galbusera @ 2011-04-20  8:55 UTC (permalink / raw)
  To: Pantelis Antoniou, Vitaly Bordug
  Cc: linuxppc-dev, netdev, linux-kernel, Andrea Galbusera

mac-fec.c was setting individual UDP address registers instead of multicast
group address registers when joining a multicast group.
This prevented UDP multicast packets from being received correctly.
According to the datasheet, replace hash_table_high and hash_table_low
with grp_hash_table_high and grp_hash_table_low respectively.

Tested on a MPC5121 based board.

Signed-off-by: Andrea Galbusera <gizero@gmail.com>
---
 drivers/net/fs_enet/mac-fec.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/fs_enet/mac-fec.c b/drivers/net/fs_enet/mac-fec.c
index 61035fc..b9fbc83 100644
--- a/drivers/net/fs_enet/mac-fec.c
+++ b/drivers/net/fs_enet/mac-fec.c
@@ -226,8 +226,8 @@ static void set_multicast_finish(struct net_device *dev)
 	}
 
 	FC(fecp, r_cntrl, FEC_RCNTRL_PROM);
-	FW(fecp, hash_table_high, fep->fec.hthi);
-	FW(fecp, hash_table_low, fep->fec.htlo);
+	FW(fecp, grp_hash_table_high, fep->fec.hthi);
+	FW(fecp, grp_hash_table_low, fep->fec.htlo);
 }
 
 static void set_multicast_list(struct net_device *dev)
@@ -273,8 +273,8 @@ static void restart(struct net_device *dev)
 	/*
 	 * Reset all multicast.
 	 */
-	FW(fecp, hash_table_high, fep->fec.hthi);
-	FW(fecp, hash_table_low, fep->fec.htlo);
+	FW(fecp, grp_hash_table_high, fep->fec.hthi);
+	FW(fecp, grp_hash_table_low, fep->fec.htlo);
 
 	/*
 	 * Set maximum receive buffer size.
-- 
1.7.0.4

^ permalink raw reply related

* Re: [RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)
From: Krishna Kumar2 @ 2011-04-20  8:52 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, mst, netdev, rusty, qemu-devel
In-Reply-To: <20110420082706.32157.59668.stgit@dhcp-91-7.nay.redhat.com.englab.nay.redhat.com>

Thanks Jason!

So I can use my virtio-net guest driver and test with this patch?
Please provide the script you use to start MQ guest.

Regards,

- KK

Jason Wang <jasowang@redhat.com> wrote on 04/20/2011 02:03:07 PM:

> Jason Wang <jasowang@redhat.com>
> 04/20/2011 02:03 PM
>
> To
>
> Krishna Kumar2/India/IBM@IBMIN, kvm@vger.kernel.org, mst@redhat.com,
> netdev@vger.kernel.org, rusty@rustcorp.com.au, qemu-
> devel@nongnu.org, anthony@codemonkey.ws
>
> cc
>
> Subject
>
> [RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)
>
> Inspired by Krishna's patch
> (http://www.spinics.net/lists/kvm/msg52098.html) and Michael's
> suggestions.  The following series adds multiqueue support for
> qemu and enables it for virtio-net (both userspace and vhost).
>
> The aim of this series is to simplify the management and achieve the
> same performance with less code.
>
> The differences between this series and Krishna's are as follows:
>
> - Add multiqueue support for qemu and also for userspace virtio-net.
> - Instead of hacking the vhost module to manipulate kthreads, this patch
>   just implements userspace-based multiqueue and thus can re-use the
>   existing vhost kernel-side code without any modification.
> - Use a 1:1 mapping between TX/RX pairs and vhost kthreads because the
>   implementation is based on userspace.
> - The CLI is also changed to make management easier: the -netdev option
>   of qdev can now accept more than one id. You can start a multiqueue
>   virtio-net device through:
> ./qemu-system-x86_64 -netdev tap,id=hn0,vhost=on,fd=X -netdev
> tap,id=hn1,vhost=on,fd=Y -device
> virtio-net-pci,netdev=hn0#hn1,queues=2 ...
>
> The series is very primitive and still needs polishing.
>
> Suggestions are welcome.
> ---
>
> Jason Wang (2):
>       net: Add multiqueue support
>       virtio-net: add multiqueue support
>
>
>  hw/qdev-properties.c |   37 ++++-
>  hw/qdev.h            |    3
>  hw/vhost.c           |   26 ++-
>  hw/vhost.h           |    1
>  hw/vhost_net.c       |    7 +
>  hw/vhost_net.h       |    2
>  hw/virtio-net.c      |  409 ++++++++++++++++++++++++++++++++------------------
>  hw/virtio-net.h      |    2
>  hw/virtio-pci.c      |    1
>  hw/virtio.h          |    1
>  net.c                |   34 +++-
>  net.h                |   15 +-
>  12 files changed, 353 insertions(+), 185 deletions(-)
>
> --
> Jason Wang

^ permalink raw reply

* Re: [PATCH] bnx2x: dont dereference tcp header on non tcp frames
From: David Miller @ 2011-04-20  8:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: dmitry, eilong, netdev
In-Reply-To: <1303284713.3186.13.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Apr 2011 09:31:53 +0200

> [PATCH] bnx2x: dont dereference tcp header on non tcp frames
> 
> bnx2x_set_pbd_csum() & bnx2x_set_pbd_csum_e2() use 
> tcp_hdrlen(skb) even for non TCP frames
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Broadcom folks please review.

^ permalink raw reply

