Netdev List


* [GIT PULL] first round of vhost-net enhancements for net-next
From: Michael S. Tsirkin @ 2010-05-03 21:32 UTC (permalink / raw)
  To: David Miller; +Cc: kvm, virtualization, netdev, linux-kernel

David,
The following tree includes a couple of enhancements that help vhost-net.
Please pull them for net-next. Another set of patches is under
debugging/testing and I hope to get them ready in time for 2.6.35,
so there may be another pull request later.

Thanks!

The following changes since commit 7ef527377b88ff05fb122a47619ea506c631c914:

  Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (2010-05-02 22:02:06 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost

Michael S. Tsirkin (2):
      tun: add ioctl to modify vnet header size
      macvtap: add ioctl to modify vnet header size

 drivers/net/macvtap.c  |   31 +++++++++++++++++++++++++++----
 drivers/net/tun.c      |   32 ++++++++++++++++++++++++++++----
 include/linux/if_tun.h |    2 ++
 3 files changed, 57 insertions(+), 8 deletions(-)

-- 
MST


* Re: [PATCH] net: show stopped status in sysfs
From: Ben Hutchings @ 2010-05-03 21:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David S. Miller, Eric Dumazet, Eric W. Biederman, Johannes Berg,
	Tom Herbert, netdev, linux-kernel
In-Reply-To: <20100503212423.GA15998@redhat.com>

On Tue, 2010-05-04 at 00:24 +0300, Michael S. Tsirkin wrote:
> When debugging faulty hardware (in case of virt, faulty
> emulation) I found it helpful to be able to examine
> stopped status of the interface. The following patch makes
> this visible in sysfs.
[...]

This is a per-queue attribute and should not be associated directly with
the netdev.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH] net: show stopped status in sysfs
From: David Miller @ 2010-05-03 22:04 UTC (permalink / raw)
  To: bhutchings
  Cc: mst, eric.dumazet, ebiederm, johannes, therbert, netdev,
	linux-kernel
In-Reply-To: <1272923093.27948.63.camel@localhost>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 03 May 2010 22:44:53 +0100

> On Tue, 2010-05-04 at 00:24 +0300, Michael S. Tsirkin wrote:
>> When debugging faulty hardware (in case of virt, faulty
>> emulation) I found it helpful to be able to examine
>> stopped status of the interface. The following patch makes
>> this visible in sysfs.
> [...]
> 
> This is a per-queue attribute and should not be associated directly with
> the netdev.

Right.


* Re: [PATCH] net: show stopped status in sysfs
From: David Miller @ 2010-05-03 22:05 UTC (permalink / raw)
  To: mst; +Cc: eric.dumazet, ebiederm, johannes, therbert, netdev, linux-kernel
In-Reply-To: <20100503212423.GA15998@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 4 May 2010 00:24:25 +0300

> @@ -303,6 +313,7 @@ static struct device_attribute net_class_attributes[] = {
>  	__ATTR(address, S_IRUGO, show_address, NULL),
>  	__ATTR(broadcast, S_IRUGO, show_broadcast, NULL),
>  	__ATTR(carrier, S_IRUGO, show_carrier, NULL),
> +	__ATTR(carrier, S_IRUGO, show_stopped, NULL),

Besides the fact that you have to publish this as a per-queue attribute,
you're also erroneously naming it 'carrier' here.


* Re: [GIT PULL] first round of vhost-net enhancements for net-next
From: David Miller @ 2010-05-03 22:07 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20100503213244.GA16006@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 4 May 2010 00:32:45 +0300

> The following tree includes a couple of enhancements that help vhost-net.
> Please pull them for net-next. Another set of patches is under
> debugging/testing and I hope to get them ready in time for 2.6.35,
> so there may be another pull request later.

Pulled, thanks.


* Re: [GIT PULL] first round of vhost-net enhancements for net-next
From: David Miller @ 2010-05-03 22:08 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20100503.150729.00474027.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Mon, 03 May 2010 15:07:29 -0700 (PDT)

> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Tue, 4 May 2010 00:32:45 +0300
> 
>> The following tree includes a couple of enhancements that help vhost-net.
>> Please pull them for net-next. Another set of patches is under
>> debugging/testing and I hope to get them ready in time for 2.6.35,
>> so there may be another pull request later.
> 
> Pulled, thanks.

Nevermind, reverted.

Do you even compile test what you send to people?

drivers/net/macvtap.c: In function ‘macvtap_ioctl’:
drivers/net/macvtap.c:713: warning: control reaches end of non-void function

You're really batting 1000 today Michael...
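
For readers unfamiliar with the warning quoted above: gcc reports
"control reaches end of non-void function" when some path through a
function that returns a value can fall off the end without a return
statement. The sketch below is a generic userspace illustration, not the
macvtap code; demo_ioctl() and the CMD_* values are made up for the
example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers, for illustration only. */
enum { CMD_GET = 1, CMD_SET = 2 };

/* Hypothetical stand-in for an ioctl handler returning long. */
long demo_ioctl(unsigned int cmd, int *value, int arg)
{
	assert(value != NULL);

	switch (cmd) {
	case CMD_GET:
		return *value;
	case CMD_SET:
		*value = arg;
		return 0;
	default:
		/* Every path must return a value. If this return were
		 * missing, unknown commands would fall off the end of
		 * the function and gcc -Wall would emit the warning. */
		return -ENOTTY;
	}
}
```

Building with -Wall (and ideally -Werror for this class of warning)
before sending a pull request catches the problem locally.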


* Re: [PATCH net-next-2.6] net: if6_get_next() fix
From: David Miller @ 2010-05-03 22:17 UTC (permalink / raw)
  To: eric.dumazet
  Cc: paulmck, shemminger, Valdis.Kletnieks, akpm, peterz, kaber,
	linux-kernel, netfilter-devel, netdev
In-Reply-To: <1272919814.2407.149.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 03 May 2010 22:50:14 +0200

> Paul, David, here the patch I was thinking about :
> 
> Feel free to split it in two parts if you like, I am too tired and must
> sleep now ;)
 ...
> [PATCH net-next-2.6] net: rcu fixes
> 
> Add hlist_for_each_entry_rcu_bh() and
> hlist_for_each_entry_continue_rcu_bh() macros, and use them in
> ipv6_get_ifaddr(), if6_get_first() and if6_get_next() to fix lockdep
> warnings.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Paul, let me know if you want to handle these separately (one commit
in your tree for the rculist.h bit and one for the ipv6 change) or to
put it all at once into net-next-2.6, I'm happy either way.


* Re: [net-next-2.6 PATCH v3] ixgbe: disable MSI-X by default on certain Cisco adapters
From: David Miller @ 2010-05-03 22:18 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, nicholasx.d.nunley, john.ronciak
In-Reply-To: <w2u9929d2391005031516m6ae1bb3bpb9b4b9dc84bf4b1d@mail.gmail.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 3 May 2010 15:16:07 -0700

> Dave please revert the following patch because it is being fixed
> without a software workaround:
> 
> commit d5ffd75a27fade39ba5df3b07290c5a2c297b9bd

Done.


* Re: [GIT PULL] first round of vhost-net enhancements for net-next
From: Michael S. Tsirkin @ 2010-05-03 22:20 UTC (permalink / raw)
  To: David Miller; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20100503.150829.254849182.davem@davemloft.net>

On Mon, May 03, 2010 at 03:08:29PM -0700, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Mon, 03 May 2010 15:07:29 -0700 (PDT)
> 
> > From: "Michael S. Tsirkin" <mst@redhat.com>
> > Date: Tue, 4 May 2010 00:32:45 +0300
> > 
> >> The following tree includes a couple of enhancements that help vhost-net.
> >> Please pull them for net-next. Another set of patches is under
> >> debugging/testing and I hope to get them ready in time for 2.6.35,
> >> so there may be another pull request later.
> > 
> > Pulled, thanks.
> 
> Nevermind, reverted.
> 
> Do you even compile test what you send to people?
> 
> drivers/net/macvtap.c: In function ‘macvtap_ioctl’:
> drivers/net/macvtap.c:713: warning: control reaches end of non-void function
> 
> You're really batting 1000 today Michael...

Ouch. Should teach me not to send out stuff after midnight. Sorry about
the noise.

-- 
MST


* Re: [net-next-2.6 PATCH v3] ixgbe: disable MSI-X by default on certain Cisco adapters
From: Jeff Kirsher @ 2010-05-03 22:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Nicholas Nunley, John Ronciak, Jeff Kirsher
In-Reply-To: <20100428024521.28991.37874.stgit@localhost.localdomain>

Dave please revert the following patch because it is being fixed
without a software workaround:

commit d5ffd75a27fade39ba5df3b07290c5a2c297b9bd
Author: Nicholas Nunley <nicholasx.d.nunley@intel.com>
Date:   Tue Apr 27 19:47:49 2010 -0700

    ixgbe: disable MSI-X by default on certain Cisco adapters

    Due to an errata in 82598 parts MSI-X needs to be disabled
    in certain ixgbe devices designed to transfer peer-to-peer
    traffic on the PCIe bus. This patch sets the default
    interrupt type to MSI rather than MSI-X for specific Cisco
    ixgbe adapters.

    Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com>
    Acked-by: John Ronciak <john.ronciak@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

-- 
Cheers,
Jeff


* Re: OOP in ip_cmsg_recv (net-next)
From: David Miller @ 2010-05-03 22:23 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, netdev
In-Reply-To: <1272907269.2226.111.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 03 May 2010 19:21:09 +0200

>  
> -	/* skb is now orphaned, might be freed outside of locked section */
> -	consume_skb(skb);
> +	/* skb is now orphaned, can be freed outside of locked section */
> +	__kfree_skb(skb);
>  }
>  EXPORT_SYMBOL(skb_free_datagram_locked);

Eric, if you do this you undo the utility of the SKB packet drop tracing
that Neil wrote.

consume_skb() says that the application actually took in the packet and
we didn't drop it due to some error or similar.

Whereas __kfree_skb() is going to be tagged as a packet drop and the
data didn't reach the application.

So if you need to use __kfree_skb() to fix this you'll need to somehow
add some appropriate annotations for the tracer.  Perhaps add a
__consume_skb() that is marked for the tracing stuff and does what
you need.
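
The distinction described above (a buffer freed because it was consumed
versus one freed because it was dropped) can be modelled in userspace.
This is only a sketch of the idea, not kernel code: struct buf,
struct free_stats and buf_free_reason() are hypothetical names, and the
counters stand in for whatever the tracer would record.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for a packet buffer; not an sk_buff. */
struct buf {
	int len;
};

/* What a tracer would observe, reduced to two counters. */
struct free_stats {
	unsigned long consumed;	/* data reached the application */
	unsigned long dropped;	/* data was discarded */
};

enum free_reason { FREE_CONSUMED, FREE_DROPPED };

struct buf *buf_alloc(int len)
{
	struct buf *b = malloc(sizeof(*b));

	if (b)
		b->len = len;
	return b;
}

/* One release path for both cases; the caller states the reason, so the
 * accounting stays correct even if the free itself is identical. */
void buf_free_reason(struct free_stats *st, struct buf *b,
		     enum free_reason why)
{
	assert(st != NULL);

	if (why == FREE_CONSUMED)
		st->consumed++;
	else
		st->dropped++;
	free(b);
}
```

The point of the sketch is that the annotation travels with the call
site, so switching the underlying free routine does not turn delivered
packets into apparent drops.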


* Re: [PATCH] dm9601: fix phy/eeprom write routine
From: David Miller @ 2010-05-03 22:27 UTC (permalink / raw)
  To: jacmet; +Cc: netdev, michael.planes, stable
In-Reply-To: <1272916886-8841-1-git-send-email-jacmet@sunsite.dk>

From: Peter Korsgaard <jacmet@sunsite.dk>
Date: Mon,  3 May 2010 22:01:26 +0200

> Use correct bit positions in DM_SHARED_CTRL register for writes.
> 
> Michael Planes recently encountered a 'KY-RS9600 USB-LAN converter', which
> came with a driver CD containing a Linux driver. This driver turns out to
> be a copy of dm9601.c with symbols renamed and my copyright stripped.
> That aside, it did contain 1 functional change in dm_write_shared_word(),
> and after checking the datasheet the original value was indeed wrong
> (read versus write bits).
> 
> On Michael's HW, this change bumps receive speed from ~30KB/s to ~900KB/s.
> On other devices the difference is less spectacular, but still significant
> (~30%).
> 
> Reported-by: Michael Planes <michael.planes@free.fr>
> CC: stable@kernel.org
> Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>

Applied, thanks!


* Re: [PATCH] net/gianfar: drop recycled skbs on MTU change
From: David Miller @ 2010-05-03 22:29 UTC (permalink / raw)
  To: sebastian; +Cc: afleming, netdev
In-Reply-To: <20100503151745.GA17997@Chamillionaire.breakpoint.cc>

From: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Date: Mon, 3 May 2010 17:17:45 +0200

> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> The size for skb which is added to the recycled list is using the
> current descriptor size which is current MTU. gfar_new_skb() is also
> using this size. So after changing or at least increasing the MTU all
> recycled skbs should be dropped.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> I'm not 100% sure but it looks like it is wrong.

It looks right to me, can I get an ACK from gianfar developers?
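
As an illustration of the reasoning in the quoted changelog, here is a
minimal userspace model of a recycle list whose cached buffers were
sized for the old MTU. It is not the gianfar code: struct pool,
pool_put() and pool_set_size() are hypothetical, and the sketch simply
purges the list on any size change.

```c
#include <assert.h>
#include <stdlib.h>

/* A cached buffer, remembered together with the size it was made for. */
struct rbuf {
	size_t size;
	struct rbuf *next;
};

/* Recycle list plus the buffer size currently in use. */
struct pool {
	struct rbuf *head;
	size_t buf_size;
	unsigned int count;
};

/* Put a buffer of the current size on the recycle list. */
int pool_put(struct pool *p)
{
	struct rbuf *b;

	assert(p != NULL);
	b = malloc(sizeof(*b));
	if (!b)
		return -1;
	b->size = p->buf_size;
	b->next = p->head;
	p->head = b;
	p->count++;
	return 0;
}

/* Change the buffer size. Cached buffers were sized for the old value,
 * so they are freed rather than reused; returns how many were dropped. */
unsigned int pool_set_size(struct pool *p, size_t new_size)
{
	unsigned int dropped = 0;

	assert(p != NULL);
	if (new_size != p->buf_size) {
		while (p->head) {
			struct rbuf *b = p->head;

			p->head = b->next;
			free(b);
			dropped++;
		}
		p->count = 0;
	}
	p->buf_size = new_size;
	return dropped;
}
```

Without the purge, a buffer allocated for the smaller size could be
handed back out after the size grows, which is the hazard the patch
description points at.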


* Re: OOP in ip_cmsg_recv (net-next)
From: David Miller @ 2010-05-03 22:30 UTC (permalink / raw)
  To: shemminger; +Cc: eric.dumazet, netdev
In-Reply-To: <20100503140048.30aedad7@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 3 May 2010 14:00:48 -0700

> On Mon, 03 May 2010 19:04:26 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> diff --git a/net/core/datagram.c b/net/core/datagram.c
>> index 95b851f..88949b0 100644
>> --- a/net/core/datagram.c
>> +++ b/net/core/datagram.c
>> @@ -230,12 +230,8 @@ EXPORT_SYMBOL(skb_free_datagram);
>>  void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
>>  {
>>  	lock_sock_bh(sk);
>> -	skb_orphan(skb);
>> -	sk_mem_reclaim_partial(sk);
>> +	skb_free_datagram(sk, skb);
>>  	unlock_sock_bh(sk);
>> -
>> -	/* skb is now orphaned, might be freed outside of locked section */
>> -	consume_skb(skb);
>>  }
>>  EXPORT_SYMBOL(skb_free_datagram_locked);
> 
> This works great for me. No messages for several hours.

Eric if we can't refine properly your other approach to fixing this
I'd like to apply this version meanwhile...


* Re: [PATCH] skge: use the DMA state API instead of the pci equivalents
From: David Miller @ 2010-05-03 22:32 UTC (permalink / raw)
  To: fujita.tomonori; +Cc: netdev, shemminger
In-Reply-To: <20100428095730K.fujita.tomonori@lab.ntt.co.jp>

From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Wed, 28 Apr 2010 09:57:04 +0900

> This replaces the PCI DMA state API (include/linux/pci-dma.h) with the
> DMA equivalents since the PCI DMA state API will be obsolete.
> 
> No functional change.
> 
> For further information about the background:
> 
> http://marc.info/?l=linux-netdev&m=127037540020276&w=2
> 
> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

Stephen have you had a chance to smoke test this yet?
I'd like to apply it as it's been rotting in patchwork
for almost a week now.


* Re: [PATCH] sky2: Avoid race in sky2_change_mtu
From: David Miller @ 2010-05-03 22:37 UTC (permalink / raw)
  To: mikem; +Cc: shemminger, netdev
In-Reply-To: <4BDEDB50.7000707@ring3k.org>

From: Mike McCormack <mikem@ring3k.org>
Date: Mon, 03 May 2010 23:18:56 +0900

> netif_stop_queue does not ensure all in-progress transmits are complete,
>  so use netif_tx_disable() instead.
> 
> Make sure NAPI polls are disabled, otherwise NAPI might trigger a TX
>  restart between when we stop the queue and NAPI is disabled.
> 
> Signed-off-by: Mike McCormack <mikem@ring3k.org>

This looks quite reasonable, Stephen please review.


* Re: [PATCH] unix/garbage: kill copy of the skb queue walker
From: David Miller @ 2010-05-03 22:40 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev
In-Reply-To: <alpine.DEB.2.00.1005031620130.7041@wel-95.cs.helsinki.fi>

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Mon, 3 May 2010 16:22:18 +0300 (EEST)

> Worse yet, it seems that its arguments were in reverse order. Also
> remove one related helper which seems hardly worth keeping.
> 
> Compile tested.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied, thanks Ilpo.


* Re: [PATCH] IPv4: unresolved multicast route cleanup
From: David Miller @ 2010-05-03 22:41 UTC (permalink / raw)
  To: andreas.meissner; +Cc: netdev
In-Reply-To: <4BDE9BCB.8040803@indakom.de>

From: Andreas Meißner <andreas.meissner@indakom.de>
Date: Mon, 03 May 2010 11:47:55 +0200

> --- net/ipv4/ipmr.c.orig    2010-05-03 10:55:06.000000000 +0200
> +++ net/ipv4/ipmr.c    2010-05-03 10:58:30.000000000 +0200
> @@ -753,7 +753,8 @@ ipmr_cache_unresolved(struct net *net, v
>         c->next = mfc_unres_queue;
>         mfc_unres_queue = c;
> 
> -        mod_timer(&ipmr_expire_timer, c->mfc_un.unres.expires);
> +        if (atomic_read(&net->ipv4.cache_resolve_queue_len) == 1)
> +            mod_timer(&ipmr_expire_timer, c->mfc_un.unres.expires);
>     }

Your email client has corrupted tabs into space characters, and
the new code in your patch is not indented properly.

Please read Documentation/SubmittingPatches and
Documentation/email-clients.txt for help.

Thanks.


* Re: [PATCH linux-2.6.34-rc5] drivers/net/phy: micrel phy driver
From: David Miller @ 2010-05-03 22:44 UTC (permalink / raw)
  To: David.Choi; +Cc: netdev
In-Reply-To: <C43529A246480145B0A6D0234BDB0F0D02129E@MELANITE.micrel.com>

From: "Choi, David" <David.Choi@Micrel.Com>
Date: Thu, 29 Apr 2010 09:12:41 -0700

> To whom it may have concerned:
> 
> From: David J. Choi <david.choi@micrel.com>
> Body of the explanation: This is the first version of phy driver from Micrel Inc.
> Signed-off-by: David J. Choi <david.choi@micrel.com>

Applied, thank you.


* Re: [PATCH net-next-2.6] net: if6_get_next() fix
From: Paul E. McKenney @ 2010-05-03 22:48 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, shemminger, Valdis.Kletnieks, akpm, peterz, kaber,
	linux-kernel, netfilter-devel, netdev
In-Reply-To: <20100503.151725.94856059.davem@davemloft.net>

On Mon, May 03, 2010 at 03:17:25PM -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 03 May 2010 22:50:14 +0200
> 
> > Paul, David, here the patch I was thinking about :
> > 
> > Feel free to split it in two parts if you like, I am too tired and must
> > sleep now ;)
>  ...
> > [PATCH net-next-2.6] net: rcu fixes
> > 
> > Add hlist_for_each_entry_rcu_bh() and
> > hlist_for_each_entry_continue_rcu_bh() macros, and use them in
> > > ipv6_get_ifaddr(), if6_get_first() and if6_get_next() to fix lockdep
> > warnings.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Paul, let me know if you want to handle these separately (one commit
> in your tree for the rculist.h bit and one for the ipv6 change) or to
> put it all at once into net-next-2.6, I'm happy either way.

These changes look pretty closely integrated, so it is probably better
if they go up your tree with the related networking changes.  I will
take a look at them.

							Thanx, Paul


* Re: [PATCHv7] add mergeable buffers support to vhost_net
From: Michael S. Tsirkin @ 2010-05-03 22:48 UTC (permalink / raw)
  To: David L Stevens; +Cc: netdev, kvm, virtualization
In-Reply-To: <1272488232.11307.4.camel@w-dls.beaverton.ibm.com>

On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote:
> This patch adds mergeable receive buffer support to vhost_net.
> 
> Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
> 
> diff -ruNp net-next-v0/drivers/vhost/net.c net-next-v7/drivers/vhost/net.c
> --- net-next-v0/drivers/vhost/net.c	2010-04-24 21:36:54.000000000 -0700
> +++ net-next-v7/drivers/vhost/net.c	2010-04-28 12:26:18.000000000 -0700
> @@ -74,6 +74,23 @@ static int move_iovec_hdr(struct iovec *
>  	}
>  	return seg;
>  }
> +/* Copy iovec entries for len bytes from iovec. Return segments used. */
> +static int copy_iovec_hdr(const struct iovec *from, struct iovec *to,
> +			  size_t len, int iovcount)
> +{
> +	int seg = 0;
> +	size_t size;
> +	while (len && seg < iovcount) {
> +		size = min(from->iov_len, len);
> +		to->iov_base = from->iov_base;
> +		to->iov_len = size;
> +		len -= size;
> +		++from;
> +		++to;
> +		++seg;
> +	}
> +	return seg;
> +}
>  
>  /* Caller must have TX VQ lock */
>  static void tx_poll_stop(struct vhost_net *net)
> @@ -109,7 +126,7 @@ static void handle_tx(struct vhost_net *
>  	};
>  	size_t len, total_len = 0;
>  	int err, wmem;
> -	size_t hdr_size;
> +	size_t vhost_hlen;
>  	struct socket *sock = rcu_dereference(vq->private_data);
>  	if (!sock)
>  		return;
> @@ -128,13 +145,13 @@ static void handle_tx(struct vhost_net *
>  
>  	if (wmem < sock->sk->sk_sndbuf / 2)
>  		tx_poll_stop(net);
> -	hdr_size = vq->hdr_size;
> +	vhost_hlen = vq->vhost_hlen;
>  
>  	for (;;) {
> -		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
> -					 ARRAY_SIZE(vq->iov),
> -					 &out, &in,
> -					 NULL, NULL);
> +		head = vhost_get_desc(&net->dev, vq, vq->iov,
> +				      ARRAY_SIZE(vq->iov),
> +				      &out, &in,
> +				      NULL, NULL);
>  		/* Nothing new?  Wait for eventfd to tell us they refilled. */
>  		if (head == vq->num) {
>  			wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> @@ -155,20 +172,20 @@ static void handle_tx(struct vhost_net *
>  			break;
>  		}
>  		/* Skip header. TODO: support TSO. */
> -		s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
> +		s = move_iovec_hdr(vq->iov, vq->hdr, vhost_hlen, out);
>  		msg.msg_iovlen = out;
>  		len = iov_length(vq->iov, out);
>  		/* Sanity check */
>  		if (!len) {
>  			vq_err(vq, "Unexpected header len for TX: "
>  			       "%zd expected %zd\n",
> -			       iov_length(vq->hdr, s), hdr_size);
> +			       iov_length(vq->hdr, s), vhost_hlen);
>  			break;
>  		}
>  		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(NULL, sock, &msg, len);
>  		if (unlikely(err < 0)) {
> -			vhost_discard_vq_desc(vq);
> +			vhost_discard_desc(vq, 1);
>  			tx_poll_start(net, sock);
>  			break;
>  		}
> @@ -187,12 +204,25 @@ static void handle_tx(struct vhost_net *
>  	unuse_mm(net->dev.mm);
>  }
>  
> +static int vhost_head_len(struct vhost_virtqueue *vq, struct sock *sk)
> +{
> +	struct sk_buff *head;
> +	int len = 0;
> +
> +	lock_sock(sk);
> +	head = skb_peek(&sk->sk_receive_queue);
> +	if (head)
> +		len = head->len + vq->sock_hlen;
> +	release_sock(sk);
> +	return len;
> +}
> +
>  /* Expects to be always run from workqueue - which acts as
>   * read-size critical section for our kind of RCU. */
>  static void handle_rx(struct vhost_net *net)
>  {
>  	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
> -	unsigned head, out, in, log, s;
> +	unsigned in, log, s;
>  	struct vhost_log *vq_log;
>  	struct msghdr msg = {
>  		.msg_name = NULL,
> @@ -203,14 +233,14 @@ static void handle_rx(struct vhost_net *
>  		.msg_flags = MSG_DONTWAIT,
>  	};
>  
> -	struct virtio_net_hdr hdr = {
> -		.flags = 0,
> -		.gso_type = VIRTIO_NET_HDR_GSO_NONE
> +	struct virtio_net_hdr_mrg_rxbuf hdr = {
> +		.hdr.flags = 0,
> +		.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
>  	};
>  
>  	size_t len, total_len = 0;
> -	int err;
> -	size_t hdr_size;
> +	int err, headcount, datalen;
> +	size_t vhost_hlen;
>  	struct socket *sock = rcu_dereference(vq->private_data);
>  	if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
>  		return;
> @@ -218,18 +248,19 @@ static void handle_rx(struct vhost_net *
>  	use_mm(net->dev.mm);
>  	mutex_lock(&vq->mutex);
>  	vhost_disable_notify(vq);
> -	hdr_size = vq->hdr_size;
> +	vhost_hlen = vq->vhost_hlen;
>  
>  	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
>  		vq->log : NULL;
>  
> -	for (;;) {
> -		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
> -					 ARRAY_SIZE(vq->iov),
> -					 &out, &in,
> -					 vq_log, &log);
> +	while ((datalen = vhost_head_len(vq, sock->sk))) {
> +		headcount = vhost_get_desc_n(vq, vq->heads,
> +					     datalen + vhost_hlen,
> +					     &in, vq_log, &log);
> +		if (headcount < 0)
> +			break;
>  		/* OK, now we need to know about added descriptors. */
> -		if (head == vq->num) {
> +		if (!headcount) {
>  			if (unlikely(vhost_enable_notify(vq))) {
>  				/* They have slipped one in as we were
>  				 * doing that: check again. */
> @@ -241,46 +272,53 @@ static void handle_rx(struct vhost_net *
>  			break;
>  		}
>  		/* We don't need to be notified again. */
> -		if (out) {
> -			vq_err(vq, "Unexpected descriptor format for RX: "
> -			       "out %d, int %d\n",
> -			       out, in);
> -			break;
> -		}
> -		/* Skip header. TODO: support TSO/mergeable rx buffers. */
> -		s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, in);
> +		if (vhost_hlen)
> +			/* Skip header. TODO: support TSO. */
> +			s = move_iovec_hdr(vq->iov, vq->hdr, vhost_hlen, in);
> +		else
> +			s = copy_iovec_hdr(vq->iov, vq->hdr, vq->sock_hlen, in);
>  		msg.msg_iovlen = in;
>  		len = iov_length(vq->iov, in);
>  		/* Sanity check */
>  		if (!len) {
>  			vq_err(vq, "Unexpected header len for RX: "
>  			       "%zd expected %zd\n",
> -			       iov_length(vq->hdr, s), hdr_size);
> +			       iov_length(vq->hdr, s), vhost_hlen);
>  			break;
>  		}
>  		err = sock->ops->recvmsg(NULL, sock, &msg,
>  					 len, MSG_DONTWAIT | MSG_TRUNC);
>  		/* TODO: Check specific error and bomb out unless EAGAIN? */
>  		if (err < 0) {
> -			vhost_discard_vq_desc(vq);
> +			vhost_discard_desc(vq, headcount);
>  			break;
>  		}
> -		/* TODO: Should check and handle checksum. */
> -		if (err > len) {
> -			pr_err("Discarded truncated rx packet: "
> -			       " len %d > %zd\n", err, len);
> -			vhost_discard_vq_desc(vq);
> +		if (err != datalen) {
> +			pr_err("Discarded rx packet: "
> +			       " len %d, expected %zd\n", err, datalen);
> +			vhost_discard_desc(vq, headcount);
>  			continue;
>  		}
>  		len = err;
> -		err = memcpy_toiovec(vq->hdr, (unsigned char *)&hdr, hdr_size);
> -		if (err) {
> -			vq_err(vq, "Unable to write vnet_hdr at addr %p: %d\n",
> -			       vq->iov->iov_base, err);
> +		if (vhost_hlen &&
> +		    memcpy_toiovecend(vq->hdr, (unsigned char *)&hdr, 0,
> +				      vhost_hlen)) {
> +			vq_err(vq, "Unable to write vnet_hdr at addr %p\n",
> +			       vq->iov->iov_base);
>  			break;
>  		}
> -		len += hdr_size;
> -		vhost_add_used_and_signal(&net->dev, vq, head, len);
> +		/* TODO: Should check and handle checksum. */
> +		if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF) &&
> +		    memcpy_toiovecend(vq->hdr, (unsigned char *)&headcount,
> +				      offsetof(typeof(hdr), num_buffers),
> +				      sizeof(hdr.num_buffers))) {
> +			vq_err(vq, "Failed num_buffers write");
> +			vhost_discard_desc(vq, headcount);
> +			break;
> +		}
> +		len += vhost_hlen;
> +		vhost_add_used_and_signal_n(&net->dev, vq, vq->heads,
> +					    headcount);
>  		if (unlikely(vq_log))
>  			vhost_log_write(vq, vq_log, log, len);
>  		total_len += len;
> @@ -561,9 +599,21 @@ done:
>  
>  static int vhost_net_set_features(struct vhost_net *n, u64 features)
>  {
> -	size_t hdr_size = features & (1 << VHOST_NET_F_VIRTIO_NET_HDR) ?
> -		sizeof(struct virtio_net_hdr) : 0;
> +	size_t vhost_hlen, sock_hlen, hdr_len;
>  	int i;
> +
> +	hdr_len = (features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ?
> +			sizeof(struct virtio_net_hdr_mrg_rxbuf) :
> +			sizeof(struct virtio_net_hdr);
> +	if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
> +		/* vhost provides vnet_hdr */
> +		vhost_hlen = hdr_len;
> +		sock_hlen = 0;
> +	} else {
> +		/* socket provides vnet_hdr */
> +		vhost_hlen = 0;
> +		sock_hlen = hdr_len;
> +	}
>  	mutex_lock(&n->dev.mutex);
>  	if ((features & (1 << VHOST_F_LOG_ALL)) &&
>  	    !vhost_log_access_ok(&n->dev)) {
> @@ -574,7 +624,8 @@ static int vhost_net_set_features(struct
>  	smp_wmb();
>  	for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
>  		mutex_lock(&n->vqs[i].mutex);
> -		n->vqs[i].hdr_size = hdr_size;
> +		n->vqs[i].vhost_hlen = vhost_hlen;
> +		n->vqs[i].sock_hlen = sock_hlen;
>  		mutex_unlock(&n->vqs[i].mutex);
>  	}
>  	vhost_net_flush(n);
> diff -ruNp net-next-v0/drivers/vhost/vhost.c net-next-v7/drivers/vhost/vhost.c
> --- net-next-v0/drivers/vhost/vhost.c	2010-04-22 11:31:57.000000000 -0700
> +++ net-next-v7/drivers/vhost/vhost.c	2010-04-28 11:16:13.000000000 -0700
> @@ -114,7 +114,8 @@ static void vhost_vq_reset(struct vhost_
>  	vq->used_flags = 0;
>  	vq->log_used = false;
>  	vq->log_addr = -1ull;
> -	vq->hdr_size = 0;
> +	vq->vhost_hlen = 0;
> +	vq->sock_hlen = 0;
>  	vq->private_data = NULL;
>  	vq->log_base = NULL;
>  	vq->error_ctx = NULL;
> @@ -861,6 +862,53 @@ static unsigned get_indirect(struct vhos
>  	return 0;
>  }
>  
> +/* This is a multi-buffer version of vhost_get_desc
> + * @vq		- the relevant virtqueue
> + * datalen	- data length we'll be reading
> + * @iovcount	- returned count of io vectors we fill
> + * @log		- vhost log
> + * @log_num	- log offset
> + *	returns number of buffer heads allocated, negative on error
> + */
> +int vhost_get_desc_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
> +		     int datalen, unsigned *iovcount, struct vhost_log *log,
> +		     unsigned int *log_num)
> +{
> +	unsigned int out, in;
> +	int seg = 0;
> +	int headcount = 0;
> +	int r;
> +
> +	while (datalen > 0) {
> +		if (headcount >= VHOST_NET_MAX_SG) {
> +			r = -ENOBUFS;
> +			goto err;
> +		}
> +		heads[headcount].id = vhost_get_desc(vq->dev, vq, vq->iov + seg,
> +					      ARRAY_SIZE(vq->iov) - seg, &out,
> +					      &in, log, log_num);

By the way, logging here looks broken to me.
Does live migration work for you?

log_num gets zeroed out on each call to vhost_get_desc.
I guess we could just change vhost_get_desc not to zero out
log_num at the start, do it here instead,
and pass in log + *log_num instead of log_num.

Need to also document the API for vhost_get_desc noting
that log_num is incremented, not stored to.

Pls think about it.
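
One possible reading of the suggestion above, as a userspace sketch
rather than vhost code: the per-buffer routine appends at the current
log position and increments the count, and the multi-buffer caller
zeroes the count once before the loop. All names here (struct log_entry,
get_desc_log(), get_desc_n_log()) and the addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for one log record. */
struct log_entry {
	unsigned long addr;
	unsigned long len;
};

/* Per-buffer step: appends at log[*log_num] and increments *log_num.
 * It deliberately does not reset *log_num. */
int get_desc_log(struct log_entry *log, unsigned int *log_num,
		 unsigned int cap, unsigned long addr, unsigned long len)
{
	assert(log != NULL && log_num != NULL);

	if (*log_num >= cap)
		return -1;
	log[*log_num].addr = addr;
	log[*log_num].len = len;
	++*log_num;
	return 0;
}

/* Multi-buffer caller: zeroes the count once, then accumulates the
 * entries of every buffer so that none are overwritten. */
int get_desc_n_log(struct log_entry *log, unsigned int *log_num,
		   unsigned int cap, const unsigned long *lens,
		   unsigned int nbufs)
{
	unsigned int i;

	assert(log_num != NULL && lens != NULL);

	*log_num = 0;
	for (i = 0; i < nbufs; i++)
		if (get_desc_log(log, log_num, cap, 0x1000UL * (i + 1),
				 lens[i]))
			return -1;
	return 0;
}
```

If the per-buffer step reset the count itself, only the entries of the
last buffer would survive, which is the breakage described above.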


> +		if (heads[headcount].id == vq->num) {
> +			r = 0;
> +			goto err;
> +		}
> +		if (out || in <= 0) {
> +			vq_err(vq, "unexpected descriptor format for RX: "
> +				"out %d, in %d\n", out, in);
> +			r = -EINVAL;
> +			goto err;
> +		}
> +		heads[headcount].len = iov_length(vq->iov + seg, in);
> +		datalen -= heads[headcount].len;
> +		++headcount;
> +		seg += in;
> +	}
> +	*iovcount = seg;
> +	return headcount;
> +err:
> +	vhost_discard_desc(vq, headcount);
> +	return r;
> +}
> +
>  /* This looks in the virtqueue and for the first available buffer, and converts
>   * it to an iovec for convenient access.  Since descriptors consist of some
>   * number of output then some number of input descriptors, it's actually two
> @@ -868,7 +916,7 @@ static unsigned get_indirect(struct vhos
>   *
>   * This function returns the descriptor number found, or vq->num (which
>   * is never a valid descriptor number) if none was found. */
> -unsigned vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
> +unsigned vhost_get_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
>  			   struct iovec iov[], unsigned int iov_size,
>  			   unsigned int *out_num, unsigned int *in_num,
>  			   struct vhost_log *log, unsigned int *log_num)
> @@ -986,9 +1034,9 @@ unsigned vhost_get_vq_desc(struct vhost_
>  }
>  
>  /* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
> -void vhost_discard_vq_desc(struct vhost_virtqueue *vq)
> +void vhost_discard_desc(struct vhost_virtqueue *vq, int n)
>  {
> -	vq->last_avail_idx--;
> +	vq->last_avail_idx -= n;
>  }
>  
>  /* After we've used one of their buffers, we tell them about it.  We'll then
> @@ -1033,6 +1081,68 @@ int vhost_add_used(struct vhost_virtqueu
>  	return 0;
>  }
>  
> +static void vhost_log_used(struct vhost_virtqueue *vq,
> +			   struct vring_used_elem __user *used)
> +{
> +	/* Make sure data is seen before log. */
> +	smp_wmb();
> +	/* Log used ring entry write. */
> +	log_write(vq->log_base,
> +		  vq->log_addr +
> +		   ((void __user *)used - (void __user *)vq->used),
> +		  sizeof *used);
> +	/* Log used index update. */
> +	log_write(vq->log_base,
> +		  vq->log_addr + offsetof(struct vring_used, idx),
> +		  sizeof vq->used->idx);
> +	if (vq->log_ctx)
> +		eventfd_signal(vq->log_ctx, 1);
> +}
> +
> +static int __vhost_add_used_n(struct vhost_virtqueue *vq,
> +			    struct vring_used_elem *heads,
> +			    unsigned count)
> +{
> +	struct vring_used_elem __user *used;
> +	int start;
> +
> +	start = vq->last_used_idx % vq->num;
> +	used = vq->used->ring + start;
> +	if (copy_to_user(used, heads, count * sizeof *used)) {
> +		vq_err(vq, "Failed to write used");
> +		return -EFAULT;
> +	}
> +	/* Make sure buffer is written before we update index. */
> +	smp_wmb();
> +	if (put_user(vq->last_used_idx + count, &vq->used->idx)) {
> +		vq_err(vq, "Failed to increment used idx");
> +		return -EFAULT;
> +	}
> +	if (unlikely(vq->log_used))
> +		vhost_log_used(vq, used);
> +	vq->last_used_idx += count;
> +	return 0;
> +}
> +
> +/* After we've used one of their buffers, we tell them about it.  We'll then
> + * want to notify the guest, using eventfd. */
> +int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
> +		     unsigned count)
> +{
> +	int start, n, r;
> +
> +	start = vq->last_used_idx % vq->num;
> +	n = vq->num - start;
> +	if (n < count) {
> +		r = __vhost_add_used_n(vq, heads, n);
> +		if (r < 0)
> +			return r;
> +		heads += n;
> +		count -= n;
> +	}
> +	return __vhost_add_used_n(vq, heads, count);
> +}
> +
>  /* This actually signals the guest, using eventfd. */
>  void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  {
> @@ -1062,6 +1172,15 @@ void vhost_add_used_and_signal(struct vh
>  	vhost_signal(dev, vq);
>  }
>  
> +/* multi-buffer version of vhost_add_used_and_signal */
> +void vhost_add_used_and_signal_n(struct vhost_dev *dev,
> +				 struct vhost_virtqueue *vq,
> +				 struct vring_used_elem *heads, unsigned count)
> +{
> +	vhost_add_used_n(vq, heads, count);
> +	vhost_signal(dev, vq);
> +}
> +
>  /* OK, now we need to know about added descriptors. */
>  bool vhost_enable_notify(struct vhost_virtqueue *vq)
>  {
> @@ -1086,7 +1205,7 @@ bool vhost_enable_notify(struct vhost_vi
>  		return false;
>  	}
>  
> -	return avail_idx != vq->last_avail_idx;
> +	return avail_idx != vq->avail_idx;
>  }
>  
>  /* We don't need to be notified again. */
> diff -ruNp net-next-v0/drivers/vhost/vhost.h net-next-v7/drivers/vhost/vhost.h
> --- net-next-v0/drivers/vhost/vhost.h	2010-04-24 21:37:41.000000000 -0700
> +++ net-next-v7/drivers/vhost/vhost.h	2010-04-26 10:35:25.000000000 -0700
> @@ -84,7 +84,9 @@ struct vhost_virtqueue {
>  	struct iovec indirect[VHOST_NET_MAX_SG];
>  	struct iovec iov[VHOST_NET_MAX_SG];
>  	struct iovec hdr[VHOST_NET_MAX_SG];
> -	size_t hdr_size;
> +	size_t vhost_hlen;
> +	size_t sock_hlen;
> +	struct vring_used_elem heads[VHOST_NET_MAX_SG];
>  	/* We use a kind of RCU to access private pointer.
>  	 * All readers access it from workqueue, which makes it possible to
>  	 * flush the workqueue instead of synchronize_rcu. Therefore readers do
> @@ -120,16 +122,23 @@ long vhost_dev_ioctl(struct vhost_dev *,
>  int vhost_vq_access_ok(struct vhost_virtqueue *vq);
>  int vhost_log_access_ok(struct vhost_dev *);
>  
> -unsigned vhost_get_vq_desc(struct vhost_dev *, struct vhost_virtqueue *,
> +int vhost_get_desc_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
> +		     int datalen, unsigned int *iovcount, struct vhost_log *log,
> +		     unsigned int *log_num);
> +unsigned vhost_get_desc(struct vhost_dev *, struct vhost_virtqueue *,
>  			   struct iovec iov[], unsigned int iov_count,
>  			   unsigned int *out_num, unsigned int *in_num,
>  			   struct vhost_log *log, unsigned int *log_num);
> -void vhost_discard_vq_desc(struct vhost_virtqueue *);
> +void vhost_discard_desc(struct vhost_virtqueue *, int);
>  
>  int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
> -void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
> +int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
> +		     unsigned count);
>  void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
> -			       unsigned int head, int len);
> +			       unsigned int id, int len);
> +void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
> +			       struct vring_used_elem *heads, unsigned count);
> +void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
>  void vhost_disable_notify(struct vhost_virtqueue *);
>  bool vhost_enable_notify(struct vhost_virtqueue *);
>  
> @@ -149,7 +158,8 @@ enum {
>  	VHOST_FEATURES = (1 << VIRTIO_F_NOTIFY_ON_EMPTY) |
>  			 (1 << VIRTIO_RING_F_INDIRECT_DESC) |
>  			 (1 << VHOST_F_LOG_ALL) |
> -			 (1 << VHOST_NET_F_VIRTIO_NET_HDR),
> +			 (1 << VHOST_NET_F_VIRTIO_NET_HDR) |
> +			 (1 << VIRTIO_NET_F_MRG_RXBUF),
>  };
>  
>  static inline int vhost_has_feature(struct vhost_dev *dev, int bit)
> 
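
To illustrate the wrap-around handling in vhost_add_used_n() quoted above:
when a batch of used entries would run past the end of the ring, it is
written as two contiguous runs. The following is a minimal user-space sketch
of that split, not the vhost code itself. The names ring, used_elem,
add_used_run, add_used_n and RING_SIZE are made up for the example, memcpy()
stands in for copy_to_user(), and the memory barriers, dirty logging and
error returns of the real functions are left out. It assumes count never
exceeds the ring size.

```c
#include <assert.h>
#include <string.h>

#define RING_SIZE 8u	/* hypothetical ring size; stands in for vq->num */

struct used_elem {
	unsigned id;
	unsigned len;
};

struct ring {
	struct used_elem slots[RING_SIZE];
	unsigned last_used_idx;	/* free-running index, like vq->last_used_idx */
};

/* Copy one contiguous run; the caller guarantees it does not cross the end. */
static void add_used_run(struct ring *r, const struct used_elem *heads,
			 unsigned count)
{
	unsigned start = r->last_used_idx % RING_SIZE;

	assert(start + count <= RING_SIZE);
	memcpy(&r->slots[start], heads, count * sizeof(*heads));
	r->last_used_idx += count;
}

/* Split a batch in two when it would run past the end of the ring. */
static void add_used_n(struct ring *r, const struct used_elem *heads,
		       unsigned count)
{
	unsigned start = r->last_used_idx % RING_SIZE;
	unsigned n = RING_SIZE - start;	/* slots left before the wrap */

	assert(count <= RING_SIZE);	/* assumption of this sketch */
	if (n < count) {
		add_used_run(r, heads, n);
		heads += n;
		count -= n;
	}
	add_used_run(r, heads, count);
}
```

With the index at 6 in an 8-slot ring, a batch of four entries lands in
slots 6, 7, 0 and 1, and the free-running index ends at 10.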

* Re: [PATCH linux-2.6.34-rc5] drivers/net/phy: micrel phy driver
From: David Miller @ 2010-05-03 22:49 UTC (permalink / raw)
  To: David.Choi; +Cc: netdev
In-Reply-To: <20100503.154415.160071433.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Mon, 03 May 2010 15:44:15 -0700 (PDT)

> From: "Choi, David" <David.Choi@Micrel.Com>
> Date: Thu, 29 Apr 2010 09:12:41 -0700
> 
>> To whom it may concern:
>> 
>> From: David J. Choi <david.choi@micrel.com>
>> Body of the explanation: This is the first version of the PHY driver from Micrel Inc.
>> Signed-off-by: David J. Choi <david.choi@micrel.com>
> 
> Applied, thank you.

When I merged this into net-next-2.6 from net-2.6, I added the
appropriate module device table to the driver.

phy/micrel: Add module device ID table for autoloading.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/phy/micrel.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 0cd80e4..68dd107 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -102,3 +102,12 @@ module_exit(ksphy_exit);
 MODULE_DESCRIPTION("Micrel PHY driver");
 MODULE_AUTHOR("David J. Choi");
 MODULE_LICENSE("GPL");
+
+static struct mdio_device_id micrel_tbl[] = {
+	{ PHY_ID_KSZ9021, 0x000fff10 },
+	{ PHY_ID_VSC8201, 0x00fffff0 },
+	{ PHY_ID_KS8001, 0x00fffff0 },
+	{ }
+};
+
+MODULE_DEVICE_TABLE(mdio, micrel_tbl);
-- 
1.7.0.4
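
For readers unfamiliar with the { id, mask } pairs in the table above: as far
as I understand phylib, a PHY ID is considered a match when it equals the
table entry on every bit that is set in the mask. The sketch below shows that
comparison in plain user-space C. It is an illustration only: id_entry,
example_tbl and match_id are hypothetical names, the ID values are invented
for the example rather than taken from the driver headers, and the real
module autoloading path goes through generated module aliases, not a lookup
function like this one.

```c
#include <stddef.h>
#include <stdint.h>

struct id_entry {
	uint32_t phy_id;
	uint32_t phy_id_mask;
};

/* Invented IDs for illustration; a zero mask terminates the table,
 * playing the role of the empty { } entry in the patch. */
static const struct id_entry example_tbl[] = {
	{ 0x00221610u, 0x000fff10u },
	{ 0x000fc410u, 0x00fffff0u },
	{ 0, 0 },
};

/* Return the first entry whose masked ID equals the masked probe ID. */
static const struct id_entry *match_id(const struct id_entry *tbl, uint32_t id)
{
	for (; tbl->phy_id_mask != 0; tbl++)
		if ((id & tbl->phy_id_mask) ==
		    (tbl->phy_id & tbl->phy_id_mask))
			return tbl;
	return NULL;
}
```

Bits cleared in the mask are ignored, so a single entry can cover several
silicon revisions that differ only in those bits.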


* Re: [PATCH net-next-2.6] net: if6_get_next() fix
From: Paul E. McKenney @ 2010-05-03 22:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, shemminger, Valdis.Kletnieks, akpm, peterz, kaber,
	linux-kernel, netfilter-devel, netdev
In-Reply-To: <1272919814.2407.149.camel@edumazet-laptop>

On Mon, May 03, 2010 at 10:50:14PM +0200, Eric Dumazet wrote:
> Paul, David, here is the patch I was thinking about:
> 
> Feel free to split it into two parts if you like, I am too tired and must
> sleep now ;)
> 
> Thanks
> 
> [PATCH net-next-2.6] net: rcu fixes
> 
> Add hlist_for_each_entry_rcu_bh() and
> hlist_for_each_entry_continue_rcu_bh() macros, and use them in
> ipv6_get_ifaddr(), if6_get_first() and if6_get_next() to fix lockdep
> warnings.

Looks good!!!

It will collide with Arnd's sparse-based changes, but that will be
easy to fix, so no problem.

Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  include/linux/rculist.h |   29 +++++++++++++++++++++++++++++
>  net/ipv6/addrconf.c     |   16 ++++++++--------
>  2 files changed, 37 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 004908b..4ec3b38 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -429,6 +429,23 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
>  		pos = rcu_dereference_raw(pos->next))
> 
>  /**
> + * hlist_for_each_entry_rcu_bh - iterate over rcu list of given type
> + * @tpos:	the type * to use as a loop cursor.
> + * @pos:	the &struct hlist_node to use as a loop cursor.
> + * @head:	the head for your list.
> + * @member:	the name of the hlist_node within the struct.
> + *
> + * This list-traversal primitive may safely run concurrently with
> + * the _rcu list-mutation primitives such as hlist_add_head_rcu()
> + * as long as the traversal is guarded by rcu_read_lock_bh().
> + */
> +#define hlist_for_each_entry_rcu_bh(tpos, pos, head, member)		 \
> +	for (pos = rcu_dereference_bh((head)->first);			 \
> +		pos && ({ prefetch(pos->next); 1; }) &&			 \
> +		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
> +		pos = rcu_dereference_bh(pos->next))
> +
> +/**
>   * hlist_for_each_entry_continue_rcu - iterate over a hlist continuing after current point
>   * @tpos:	the type * to use as a loop cursor.
>   * @pos:	the &struct hlist_node to use as a loop cursor.
> @@ -440,6 +457,18 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
>  	     ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });  \
>  	     pos = rcu_dereference(pos->next))
> 
> +/**
> + * hlist_for_each_entry_continue_rcu_bh - iterate over a hlist continuing after current point
> + * @tpos:	the type * to use as a loop cursor.
> + * @pos:	the &struct hlist_node to use as a loop cursor.
> + * @member:	the name of the hlist_node within the struct.
> + */
> +#define hlist_for_each_entry_continue_rcu_bh(tpos, pos, member)		\
> +	for (pos = rcu_dereference_bh((pos)->next);			\
> +	     pos && ({ prefetch(pos->next); 1; }) &&			\
> +	     ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });  \
> +	     pos = rcu_dereference_bh(pos->next))
> +
> 
>  #endif	/* __KERNEL__ */
>  #endif
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 34d2d64..3984f52 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1346,7 +1346,7 @@ struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *add
>  	struct hlist_node *node;
> 
>  	rcu_read_lock_bh();
> -	hlist_for_each_entry_rcu(ifp, node, &inet6_addr_lst[hash], addr_lst) {
> +	hlist_for_each_entry_rcu_bh(ifp, node, &inet6_addr_lst[hash], addr_lst) {
>  		if (!net_eq(dev_net(ifp->idev->dev), net))
>  			continue;
>  		if (ipv6_addr_equal(&ifp->addr, addr)) {
> @@ -2959,7 +2959,7 @@ static struct inet6_ifaddr *if6_get_first(struct seq_file *seq)
> 
>  	for (state->bucket = 0; state->bucket < IN6_ADDR_HSIZE; ++state->bucket) {
>  		struct hlist_node *n;
> -		hlist_for_each_entry_rcu(ifa, n, &inet6_addr_lst[state->bucket],
> +		hlist_for_each_entry_rcu_bh(ifa, n, &inet6_addr_lst[state->bucket],
>  					 addr_lst)
>  			if (net_eq(dev_net(ifa->idev->dev), net))
>  				return ifa;
> @@ -2974,12 +2974,12 @@ static struct inet6_ifaddr *if6_get_next(struct seq_file *seq,
>  	struct net *net = seq_file_net(seq);
>  	struct hlist_node *n = &ifa->addr_lst;
> 
> -	hlist_for_each_entry_continue_rcu(ifa, n, addr_lst)
> +	hlist_for_each_entry_continue_rcu_bh(ifa, n, addr_lst)
>  		if (net_eq(dev_net(ifa->idev->dev), net))
>  			return ifa;
> 
>  	while (++state->bucket < IN6_ADDR_HSIZE) {
> -		hlist_for_each_entry(ifa, n,
> +		hlist_for_each_entry_rcu_bh(ifa, n,
>  				     &inet6_addr_lst[state->bucket], addr_lst) {
>  			if (net_eq(dev_net(ifa->idev->dev), net))
>  				return ifa;
> @@ -3000,7 +3000,7 @@ static struct inet6_ifaddr *if6_get_idx(struct seq_file *seq, loff_t pos)
>  }
> 
>  static void *if6_seq_start(struct seq_file *seq, loff_t *pos)
> -	__acquires(rcu)
> +	__acquires(rcu_bh)
>  {
>  	rcu_read_lock_bh();
>  	return if6_get_idx(seq, *pos);
> @@ -3016,7 +3016,7 @@ static void *if6_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>  }
> 
>  static void if6_seq_stop(struct seq_file *seq, void *v)
> -	__releases(rcu)
> +	__releases(rcu_bh)
>  {
>  	rcu_read_unlock_bh();
>  }
> @@ -3093,7 +3093,7 @@ int ipv6_chk_home_addr(struct net *net, struct in6_addr *addr)
>  	unsigned int hash = ipv6_addr_hash(addr);
> 
>  	rcu_read_lock_bh();
> -	hlist_for_each_entry_rcu(ifp, n, &inet6_addr_lst[hash], addr_lst) {
> +	hlist_for_each_entry_rcu_bh(ifp, n, &inet6_addr_lst[hash], addr_lst) {
>  		if (!net_eq(dev_net(ifp->idev->dev), net))
>  			continue;
>  		if (ipv6_addr_equal(&ifp->addr, addr) &&
> @@ -3127,7 +3127,7 @@ static void addrconf_verify(unsigned long foo)
> 
>  	for (i = 0; i < IN6_ADDR_HSIZE; i++) {
>  restart:
> -		hlist_for_each_entry_rcu(ifp, node,
> +		hlist_for_each_entry_rcu_bh(ifp, node,
>  					 &inet6_addr_lst[i], addr_lst) {
>  			unsigned long age;
> 
> 
> 

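The shape of the two iteration macros in the patch quoted above can be tried
outside the kernel. The following is a rough user-space sketch, assuming a
simplified hlist_node with only a next pointer. The names load_ptr and the
*_sketch macros are made up here; load_ptr() is a plain volatile load that
stands in for rcu_dereference_bh(), so this shows only the loop structure and
gives none of the protection that rcu_read_lock_bh() provides in the kernel.
The prefetch() hint is omitted.

```c
#include <stddef.h>

struct hlist_node { struct hlist_node *next; };	/* simplified: no pprev */
struct hlist_head { struct hlist_node *first; };

/* Stand-in for rcu_dereference_bh(): just a volatile pointer load. */
static struct hlist_node *load_ptr(struct hlist_node *const *p)
{
	return *(struct hlist_node *const volatile *)p;
}

#define hlist_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Walk a whole bucket, starting at the head. */
#define hlist_for_each_entry_sketch(tpos, pos, head, member)		\
	for (pos = load_ptr(&(head)->first);				\
	     pos && ((tpos) = hlist_entry(pos, __typeof__(*(tpos)),	\
					  member), 1);			\
	     pos = load_ptr(&(pos)->next))

/* Resume after the node that pos currently points at. */
#define hlist_for_each_entry_continue_sketch(tpos, pos, member)	\
	for (pos = load_ptr(&(pos)->next);				\
	     pos && ((tpos) = hlist_entry(pos, __typeof__(*(tpos)),	\
					  member), 1);			\
	     pos = load_ptr(&(pos)->next))

/* Hypothetical entry type, loosely modelled on the if6_get_*() callers. */
struct addr {
	int net;
	struct hlist_node addr_lst;
};

static struct addr *first_in_net(struct hlist_head *head, int net)
{
	struct addr *a;
	struct hlist_node *n;

	hlist_for_each_entry_sketch(a, n, head, addr_lst)
		if (a->net == net)
			return a;
	return NULL;
}

static struct addr *next_in_net(struct addr *a, int net)
{
	struct hlist_node *n = &a->addr_lst;

	hlist_for_each_entry_continue_sketch(a, n, addr_lst)
		if (a->net == net)
			return a;
	return NULL;
}
```

The continue variant starts from the node after the current one, which is
why if6_get_next() can pick up where the previous seq_file step left off.
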
* Re: [PATCH net-next-2.6] net: if6_get_next() fix
From: David Miller @ 2010-05-03 22:54 UTC (permalink / raw)
  To: paulmck
  Cc: eric.dumazet, shemminger, Valdis.Kletnieks, akpm, peterz, kaber,
	linux-kernel, netfilter-devel, netdev
In-Reply-To: <20100503225229.GO2597@linux.vnet.ibm.com>

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Mon, 3 May 2010 15:52:29 -0700

> On Mon, May 03, 2010 at 10:50:14PM +0200, Eric Dumazet wrote:
>> Paul, David, here is the patch I was thinking about:
>> 
>> Feel free to split it into two parts if you like, I am too tired and must
>> sleep now ;)
>> 
>> Thanks
>> 
>> [PATCH net-next-2.6] net: rcu fixes
>> 
>> Add hlist_for_each_entry_rcu_bh() and
>> hlist_for_each_entry_continue_rcu_bh() macros, and use them in
>> ipv6_get_ifaddr(), if6_get_first() and if6_get_next() to fix lockdep
>> warnings.
> 
> Looks good!!!
> 
> It will collide with Arnd's sparse-based changes, but that will be
> easy to fix, so no problem.
> 
> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Applied, thanks!

* Re: [RFC PATCH v2] sctp: fix sctp to work with ipv6 source address routing
From: David Miller @ 2010-05-03 22:56 UTC (permalink / raw)
  To: vladislav.yasevich; +Cc: Weixing.Shi, yjwei, netdev
In-Reply-To: <1272906432-6237-1-git-send-email-vladislav.yasevich@hp.com>

From: Vlad Yasevich <vladislav.yasevich@hp.com>
Date: Mon,  3 May 2010 13:07:12 -0400

> From: Weixing Shi <Weixing.Shi@windriver.com>
> 
> <vlad>
> Ok, updated to be a bit more correct.  Only leave the function early if the
> first lookup succeeds.
> </vlad>

This patch looks fine to me.
