* Re: [RFC PATCH] virtio-net: reset virtqueue affinity when doing cpu hotplug
From: Michael S. Tsirkin @ 2012-12-27 11:51 UTC
To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <50DBC1B8.9050406@redhat.com>
On Thu, Dec 27, 2012 at 11:34:16AM +0800, Jason Wang wrote:
> On 12/26/2012 06:46 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 26, 2012 at 03:06:54PM +0800, Wanlong Gao wrote:
> >> Add a cpu notifier to virtio-net, so that we can reset the
> >> virtqueue affinity if the cpu hotplug happens. It improves
> >> the performance through enabling or disabling the virtqueue
> >> affinity after doing cpu hotplug.
> >>
> >> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >> Cc: Jason Wang <jasowang@redhat.com>
> >> Cc: virtualization@lists.linux-foundation.org
> >> Cc: netdev@vger.kernel.org
> >> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> > Thanks for looking into this.
> > Some comments:
> >
> > 1. Looks like the logic in
> > virtnet_set_affinity (and in virtnet_select_queue)
> > will not work very well when CPU IDs are not
> > consecutive. This can happen with hot unplug.
> >
> > Maybe we should add a VQ allocator, and define
> > a per-cpu variable specifying the VQ instead
> > of using CPU ID.
>
> Yes, and generate the affinity hint based on the mapping. Btw, what does
> VQ allocator mean here?
Some logic to generate CPU to VQ mapping.
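As a rough illustration of what such mapping logic could look like, here
is a minimal user-space sketch in C. It hands out queues round-robin to
the CPUs that are actually online, so gaps in the CPU ID space after hot
unplug do not skew the assignment. The names build_cpu_vq_map, MAX_CPUS
and the online[] array are hypothetical and not taken from the patch; a
real driver would presumably walk the online CPUs with
for_each_online_cpu() and keep the result in a per-cpu variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 64	/* hypothetical upper bound for this sketch */

/*
 * Fill cpu_to_vq[] so that the online CPUs are spread round-robin over
 * nr_vqs queues, regardless of gaps in the CPU ID space. Offline CPUs
 * get -1. Returns the number of online CPUs seen.
 */
static int build_cpu_vq_map(const bool online[MAX_CPUS], int nr_vqs,
			    int cpu_to_vq[MAX_CPUS])
{
	int next = 0;	/* counts online CPUs handled so far */

	assert(nr_vqs > 0);

	for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
		if (online[cpu]) {
			cpu_to_vq[cpu] = next % nr_vqs;
			next++;
		} else {
			cpu_to_vq[cpu] = -1;
		}
	}
	return next;
}
```

With CPUs 0, 2 and 5 online and two queues, this assigns VQ 0, 1 and 0
respectively. Rebuilding the map whenever a CPU comes or goes would keep
both the affinity hint and the TX queue selection consistent.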
> >
> >
> > 2. The below code seems racy e.g. when CPU is added
> > during device init.
> >
> > 3. Using a global cpu_hotplug seems inelegant.
> > In any case we should document the
> > meaning of this variable.
> >
> >> ---
> >> drivers/net/virtio_net.c | 39 ++++++++++++++++++++++++++++++++++++++-
> >> 1 file changed, 38 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index a6fcf15..9710cf4 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -26,6 +26,7 @@
> >> #include <linux/scatterlist.h>
> >> #include <linux/if_vlan.h>
> >> #include <linux/slab.h>
> >> +#include <linux/cpu.h>
> >>
> >> static int napi_weight = 128;
> >> module_param(napi_weight, int, 0444);
> >> @@ -34,6 +35,8 @@ static bool csum = true, gso = true;
> >> module_param(csum, bool, 0444);
> >> module_param(gso, bool, 0444);
> >>
> >> +static bool cpu_hotplug = false;
> >> +
> >> /* FIXME: MTU in config. */
> >> #define MAX_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> >> #define GOOD_COPY_LEN 128
> >> @@ -1041,6 +1044,26 @@ static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
> >> vi->affinity_hint_set = false;
> >> }
> >>
> >> +static int virtnet_cpu_callback(struct notifier_block *nfb,
> >> + unsigned long action, void *hcpu)
> >> +{
> >> + switch(action) {
> >> + case CPU_ONLINE:
> >> + case CPU_ONLINE_FROZEN:
> >> + case CPU_DEAD:
> >> + case CPU_DEAD_FROZEN:
> >> + cpu_hotplug = true;
> >> + break;
> >> + default:
> >> + break;
> >> + }
> >> + return NOTIFY_OK;
> >> +}
> >> +
> >> +static struct notifier_block virtnet_cpu_notifier = {
> >> + .notifier_call = virtnet_cpu_callback,
> >> +};
> >> +
> >> static void virtnet_get_ringparam(struct net_device *dev,
> >> struct ethtool_ringparam *ring)
> >> {
> >> @@ -1131,7 +1154,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> >> */
> >> static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
> >> {
> >> - int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> >> + int txq;
> >> +
> >> + if (unlikely(cpu_hotplug == true)) {
> >> + virtnet_set_affinity(netdev_priv(dev), true);
> >> + cpu_hotplug = false;
> >> + }
> >> +
> >> + txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> >> smp_processor_id();
> >>
> >> while (unlikely(txq >= dev->real_num_tx_queues))
> >> @@ -1248,6 +1278,8 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
> >> {
> >> struct virtio_device *vdev = vi->vdev;
> >>
> >> + unregister_hotcpu_notifier(&virtnet_cpu_notifier);
> >> +
> >> virtnet_set_affinity(vi, false);
> >>
> >> vdev->config->del_vqs(vdev);
> >> @@ -1372,6 +1404,11 @@ static int init_vqs(struct virtnet_info *vi)
> >> goto err_free;
> >>
> >> virtnet_set_affinity(vi, true);
> >> +
> >> + ret = register_hotcpu_notifier(&virtnet_cpu_notifier);
> >> + if (ret)
> >> + goto err_free;
> >> +
> >> return 0;
> >>
> >> err_free:
> >> --
> >> 1.8.0
* echo 0 > /proc/sys/net/ipv6/conf/lo/disable_ipv6 broken re-investigation
From: Balakumaran Kannan @ 2012-12-27 12:30 UTC
To: netdev
Dear All,
Please have a look at the references below.
Ref1: http://markmail.org/message/q2u2eik3pvm6qprg#query:+page:1+mid:q2u2eik3pvm6qprg+state:results
Ref2: http://markmail.org/message/ikfz2iiuyimwp35l#query:+page:1+mid:ikfz2iiuyimwp35l+state:results
Ref1 reports the issue "disable_ipv6 for lo broken in 2.6.37-rc4".
In Ref2, this issue seems to have been fixed by the "reverting
'administrative down' address handling" commits.
That fix seems to solve only the "ping6 ::1" issue; other problems
remain, such as the following:
1. ping6 to link-local addresses of the host fails.
2. ping6 to link-local addresses of other hosts fails.
3. Link-local addresses of other interfaces seem to have become
non-functional, so DHCPv6 fails to assign a dynamic IPv6 address because
dhcp6c uses the link-local address of the interface.
These observations are from a 2.6.35 kernel, but the problem seems to be
reproducible on recent kernel versions as well (3.6).
Could anybody verify and confirm this issue?
It seems the fix provided in Ref2 adds a route only for ::1 but not for
other local addresses (the host's own addresses). Disabling IPv6 on lo
removes all entries for lo from the routing table. Enabling IPv6 on lo
does not re-create these entries, so pinging local addresses of
other interfaces (for example eth0) fails. Pinging other machines
connected to the host fails as well. It seems like a total network
communication failure happens after disabling and enabling IPv6 on lo.
Regards,
K.Balakumaran
* Re: [PATCH 1/2] vhost_net: correct error hanlding in vhost_net_set_backend()
From: Michael S. Tsirkin @ 2012-12-27 13:03 UTC
To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <1356590360-32770-1-git-send-email-jasowang@redhat.com>
On Thu, Dec 27, 2012 at 02:39:20PM +0800, Jason Wang wrote:
> Currently, polling errors are ignored in vhost. This may lead to some issues (e.g.
> kernel crash when passing a tap fd to vhost before calling TUNSETIFF). Fix this
> by:
>
> - extend the idea of vhost_net_poll_state to all vhost_polls
> - change the state only when polling succeeds
> - make vhost_poll_start() report errors to the caller, which could be used by
> the caller or userspace.
Maybe it could, but this patch just ignores these errors.
And it's not clear how userspace would handle these errors.
Also, since we have a reference on the fd, it would seem
that once poll succeeds it can't fail in the future.
So two other options would make more sense to me:
- if vhost is bound to tun without SETIFF, fail this immediately
- if vhost is bound to tun without SETIFF, defer polling
until SETIFF
Option 1 would seem much easier to implement, so I think it's
preferable.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vhost/net.c | 75 +++++++++++++++++--------------------------------
> drivers/vhost/vhost.c | 16 +++++++++-
> drivers/vhost/vhost.h | 11 ++++++-
> 3 files changed, 50 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 629d6b5..56e7f5a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -64,20 +64,10 @@ enum {
> VHOST_NET_VQ_MAX = 2,
> };
>
> -enum vhost_net_poll_state {
> - VHOST_NET_POLL_DISABLED = 0,
> - VHOST_NET_POLL_STARTED = 1,
> - VHOST_NET_POLL_STOPPED = 2,
> -};
> -
> struct vhost_net {
> struct vhost_dev dev;
> struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
> struct vhost_poll poll[VHOST_NET_VQ_MAX];
> - /* Tells us whether we are polling a socket for TX.
> - * We only do this when socket buffer fills up.
> - * Protected by tx vq lock. */
> - enum vhost_net_poll_state tx_poll_state;
> /* Number of TX recently submitted.
> * Protected by tx vq lock. */
> unsigned tx_packets;
> @@ -155,24 +145,6 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
> }
> }
>
> -/* Caller must have TX VQ lock */
> -static void tx_poll_stop(struct vhost_net *net)
> -{
> - if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
> - return;
> - vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
> - net->tx_poll_state = VHOST_NET_POLL_STOPPED;
> -}
> -
> -/* Caller must have TX VQ lock */
> -static void tx_poll_start(struct vhost_net *net, struct socket *sock)
> -{
> - if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
> - return;
> - vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
> - net->tx_poll_state = VHOST_NET_POLL_STARTED;
> -}
> -
> /* In case of DMA done not in order in lower device driver for some reason.
> * upend_idx is used to track end of used idx, done_idx is used to track head
> * of used idx. Once lower device DMA done contiguously, we will signal KVM
> @@ -252,7 +224,7 @@ static void handle_tx(struct vhost_net *net)
> wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> if (wmem >= sock->sk->sk_sndbuf) {
> mutex_lock(&vq->mutex);
> - tx_poll_start(net, sock);
> + vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
> mutex_unlock(&vq->mutex);
> return;
> }
> @@ -261,7 +233,7 @@ static void handle_tx(struct vhost_net *net)
> vhost_disable_notify(&net->dev, vq);
>
> if (wmem < sock->sk->sk_sndbuf / 2)
> - tx_poll_stop(net);
> + vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
> hdr_size = vq->vhost_hlen;
> zcopy = vq->ubufs;
>
> @@ -283,7 +255,8 @@ static void handle_tx(struct vhost_net *net)
>
> wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
> - tx_poll_start(net, sock);
> + vhost_poll_start(net->poll + VHOST_NET_VQ_TX,
> + sock->file);
> set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
> break;
> }
> @@ -294,7 +267,8 @@ static void handle_tx(struct vhost_net *net)
> (vq->upend_idx - vq->done_idx) :
> (vq->upend_idx + UIO_MAXIOV - vq->done_idx);
> if (unlikely(num_pends > VHOST_MAX_PEND)) {
> - tx_poll_start(net, sock);
> + vhost_poll_start(net->poll + VHOST_NET_VQ_TX,
> + sock->file);
> set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
> break;
> }
> @@ -360,7 +334,8 @@ static void handle_tx(struct vhost_net *net)
> }
> vhost_discard_vq_desc(vq, 1);
> if (err == -EAGAIN || err == -ENOBUFS)
> - tx_poll_start(net, sock);
> + vhost_poll_start(net->poll + VHOST_NET_VQ_TX,
> + sock->file);
> break;
> }
> if (err != len)
> @@ -623,7 +598,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>
> vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
> vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
> - n->tx_poll_state = VHOST_NET_POLL_DISABLED;
>
> f->private_data = n;
>
> @@ -635,27 +609,26 @@ static void vhost_net_disable_vq(struct vhost_net *n,
> {
> if (!vq->private_data)
> return;
> - if (vq == n->vqs + VHOST_NET_VQ_TX) {
> - tx_poll_stop(n);
> - n->tx_poll_state = VHOST_NET_POLL_DISABLED;
> - } else
> + if (vq == n->vqs + VHOST_NET_VQ_TX)
> + vhost_poll_stop(n->poll + VHOST_NET_VQ_TX);
> + else
> vhost_poll_stop(n->poll + VHOST_NET_VQ_RX);
> }
>
> -static void vhost_net_enable_vq(struct vhost_net *n,
> - struct vhost_virtqueue *vq)
> +static int vhost_net_enable_vq(struct vhost_net *n,
> + struct vhost_virtqueue *vq)
> {
> + int err, index = vq - n->vqs;
> struct socket *sock;
>
> sock = rcu_dereference_protected(vq->private_data,
> lockdep_is_held(&vq->mutex));
> if (!sock)
> - return;
> - if (vq == n->vqs + VHOST_NET_VQ_TX) {
> - n->tx_poll_state = VHOST_NET_POLL_STOPPED;
> - tx_poll_start(n, sock);
> - } else
> - vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
> + return 0;
> +
> + n->poll[index].state = VHOST_POLL_STOPPED;
> + err = vhost_poll_start(n->poll + index, sock->file);
> + return err;
> }
>
> static struct socket *vhost_net_stop_vq(struct vhost_net *n,
> @@ -831,12 +804,16 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> vq->ubufs = ubufs;
> vhost_net_disable_vq(n, vq);
> rcu_assign_pointer(vq->private_data, sock);
> - vhost_net_enable_vq(n, vq);
> + r = vhost_net_enable_vq(n, vq);
> + if (r) {
> + sock = NULL;
> + goto err_enable;
> + }
>
> r = vhost_init_used(vq);
> if (r) {
> sock = NULL;
> - goto err_used;
> + goto err_enable;
> }
>
> n->tx_packets = 0;
> @@ -861,7 +838,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> mutex_unlock(&n->dev.mutex);
> return 0;
>
> -err_used:
> +err_enable:
> if (oldubufs)
> vhost_ubuf_put_and_wait(oldubufs);
> if (oldsock)
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 34389f7..1cb2604 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -77,26 +77,36 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> init_poll_funcptr(&poll->table, vhost_poll_func);
> poll->mask = mask;
> poll->dev = dev;
> + poll->state = VHOST_POLL_DISABLED;
>
> vhost_work_init(&poll->work, fn);
> }
>
> /* Start polling a file. We add ourselves to file's wait queue. The caller must
> * keep a reference to a file until after vhost_poll_stop is called. */
> -void vhost_poll_start(struct vhost_poll *poll, struct file *file)
> +int vhost_poll_start(struct vhost_poll *poll, struct file *file)
> {
> unsigned long mask;
> + if (unlikely(poll->state != VHOST_POLL_STOPPED))
> + return 0;
>
> mask = file->f_op->poll(file, &poll->table);
> + if (mask & POLLERR)
> + return -EINVAL;
> if (mask)
> vhost_poll_wakeup(&poll->wait, 0, 0, (void *)mask);
> + poll->state = VHOST_POLL_STARTED;
> + return 0;
> }
>
Hmm, interesting. I note that tun has this:
if (tun->dev->reg_state != NETREG_REGISTERED)
mask = POLLERR;
So apparently tun sometimes returns POLLERR when poll
did succeed, and then the test below wouldn't remove us
from wqh in this case. Maybe it's a bug in tun,
need to look into this.
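To make the concern concrete, here is a small user-space model in C of
the start/stop state machine from the patch. It is only a sketch under
one assumption: that the file's poll method queues the waiter first and
only then reports POLLERR, which is what the tun snippet above suggests.
All names (model_poll, model_file_poll, MY_POLLERR and so on) are
hypothetical stand-ins, not vhost or tun code.

```c
#include <assert.h>
#include <stdbool.h>

#define MY_POLLERR 0x008	/* hypothetical stand-in for POLLERR */

enum poll_state { POLL_DISABLED, POLL_STARTED, POLL_STOPPED };

struct model_poll {
	enum poll_state state;
	bool on_waitqueue;	/* models being linked into the file's wqh */
};

/* Models a ->poll() that always queues the waiter, then reports
 * POLLERR when the device is not registered yet. */
static unsigned long model_file_poll(struct model_poll *p, bool registered)
{
	p->on_waitqueue = true;
	return registered ? 0 : MY_POLLERR;
}

static int model_poll_start(struct model_poll *p, bool registered)
{
	unsigned long mask;

	if (p->state != POLL_STOPPED)
		return 0;

	mask = model_file_poll(p, registered);
	if (mask & MY_POLLERR)
		return -1;	/* state stays POLL_STOPPED */

	p->state = POLL_STARTED;
	return 0;
}

static void model_poll_stop(struct model_poll *p)
{
	if (p->state != POLL_STARTED)
		return;		/* skipped, so the waiter is never removed */

	p->on_waitqueue = false;
	p->state = POLL_STOPPED;
}
```

In this model, a start that hits the error path followed by a stop
leaves on_waitqueue set, which corresponds to a stale entry on wqh.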
> /* Stop polling a file. After this function returns, it becomes safe to drop the
> * file reference. You must also flush afterwards. */
> void vhost_poll_stop(struct vhost_poll *poll)
> {
> + if (likely(poll->state != VHOST_POLL_STARTED))
> + return;
> remove_wait_queue(poll->wqh, &poll->wait);
> + poll->state = VHOST_POLL_STOPPED;
> }
>
> static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work *work,
> @@ -791,8 +801,10 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
> if (filep)
> fput(filep);
>
> - if (pollstart && vq->handle_kick)
> + if (pollstart && vq->handle_kick) {
> + vq->poll.state = VHOST_POLL_STOPPED;
> vhost_poll_start(&vq->poll, vq->kick);
> + }
>
> mutex_unlock(&vq->mutex);
>
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 2639c58..98861d9 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -26,6 +26,12 @@ struct vhost_work {
> unsigned done_seq;
> };
>
> +enum vhost_poll_state {
> + VHOST_POLL_DISABLED = 0,
> + VHOST_POLL_STARTED = 1,
> + VHOST_POLL_STOPPED = 2,
> +};
> +
> /* Poll a file (eventfd or socket) */
> /* Note: there's nothing vhost specific about this structure. */
> struct vhost_poll {
> @@ -35,6 +41,9 @@ struct vhost_poll {
> struct vhost_work work;
> unsigned long mask;
> struct vhost_dev *dev;
> + /* Tells us whether we are polling a file.
> + * Protected by tx vq lock. */
tx vq lock does not make sense in this context.
> + enum vhost_poll_state state;
> };
>
> void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
> @@ -42,7 +51,7 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
>
> void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> unsigned long mask, struct vhost_dev *dev);
> -void vhost_poll_start(struct vhost_poll *poll, struct file *file);
> +int vhost_poll_start(struct vhost_poll *poll, struct file *file);
> void vhost_poll_stop(struct vhost_poll *poll);
> void vhost_poll_flush(struct vhost_poll *poll);
> void vhost_poll_queue(struct vhost_poll *poll);
> --
> 1.7.1
* Re: [PATCH 1/2] vhost_net: correct error hanlding in vhost_net_set_backend()
From: Michael S. Tsirkin @ 2012-12-27 13:14 UTC
To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <1356590360-32770-1-git-send-email-jasowang@redhat.com>
On Thu, Dec 27, 2012 at 02:39:19PM +0800, Jason Wang wrote:
> Fix the leaking of oldubufs and fd refcnt when failing to initialize the used ring.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vhost/net.c | 14 +++++++++++---
> 1 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index ebd08b2..629d6b5 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -834,8 +834,10 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> vhost_net_enable_vq(n, vq);
>
> r = vhost_init_used(vq);
> - if (r)
> - goto err_vq;
> + if (r) {
> + sock = NULL;
> + goto err_used;
> + }
>
> n->tx_packets = 0;
> n->tx_zcopy_err = 0;
> @@ -859,8 +861,14 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> mutex_unlock(&n->dev.mutex);
> return 0;
>
> +err_used:
> + if (oldubufs)
> + vhost_ubuf_put_and_wait(oldubufs);
> + if (oldsock)
> + fput(oldsock->file);
> err_ubufs:
> - fput(sock->file);
> + if (sock)
> + fput(sock->file);
> err_vq:
> mutex_unlock(&vq->mutex);
> err:
I think it's a real bug, but I don't see how the fix
makes sense.
We are returning an error, so we ideally
revert to the state before the faulty
operation. So this should put sock and ubufs,
not oldsock/oldubufs.
The best way is probably to change
vhost_init_used so that it takes the private data
pointer as a parameter.
We can then call it before ubuf alloc.
You can then add err_used right after err_ubufs
with no extra logic.
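The general shape of that unwinding can be sketched in user-space C as
below. This is not the vhost code: struct res, get_res(), put_res() and
set_backend() are hypothetical stand-ins, and the two boolean parameters
merely simulate failures. The point it illustrates is that when the
fallible init step runs before the ubufs allocation, both failure paths
only have to drop the new socket reference, and the old state is never
touched on error.

```c
#include <assert.h>
#include <stdbool.h>

struct res {
	int refs;	/* stand-in for a refcounted object */
};

static void get_res(struct res *r) { r->refs++; }
static void put_res(struct res *r) { r->refs--; }

/*
 * Model of a "set backend" step: take a reference on the new socket,
 * run a fallible init step, then allocate ubufs. On error only the
 * new objects are released.
 */
static int set_backend(struct res *sock, struct res *ubufs,
		       bool init_fails, bool ubufs_fail)
{
	int err;

	get_res(sock);

	if (init_fails) {	/* fallible init runs before ubufs alloc */
		err = -1;
		goto err_sock;
	}

	if (ubufs_fail) {	/* simulated allocation failure */
		err = -2;
		goto err_sock;
	}
	get_res(ubufs);

	return 0;

err_sock:
	put_res(sock);
	return err;
}
```

Both error exits leave the reference counts exactly as they were on
entry, which is the "revert to the state before the faulty operation"
property.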
--
MST
* netlink NLM_F_DUMP for unsupported address family / PF_UNSPEC
From: Timo Teras @ 2012-12-27 15:19 UTC
To: netdev
It seems that currently PF_UNSPEC is overloaded when dumping rtnetlink
things. It is used for two purposes: as wildcard entry to dump
all protocol families, and as fallback for unsupported families.
On my system with IPv[46] only and no IPX, running "ip -f ipx
route" prints the IPv4 and IPv6 routes instead of an "unsupported" or
"not implemented" error, which is rather confusing and unexpected.
Just removing the fallback from rtnl_get_dumpit() does not sound right
since some commands seem to rely on this behaviour, e.g. RTM_GETQDISC.
Perhaps rtnl_dump_all should check that the requested family truly was
PF_UNSPEC and error out if not?
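A user-space model of that idea, as a sketch only: the handler table,
the FAM_* constants and the function names below are hypothetical and do
not mirror the rtnetlink data structures. The lookup keeps its fallback
to the unspecified-family entry, and the wildcard handler itself rejects
requests whose family was not really the wildcard.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical family numbers for this sketch. */
enum { FAM_UNSPEC = 0, FAM_INET = 2, FAM_IPX = 4, FAM_INET6 = 10, FAM_MAX = 16 };

typedef int (*dump_fn)(int req_family);

static dump_fn handlers[FAM_MAX];

/* Per-family dumps; each returns the number of tables it dumped. */
static int dump_inet(int req_family)  { (void)req_family; return 1; }
static int dump_inet6(int req_family) { (void)req_family; return 1; }

/* Wildcard handler: dump every registered family, but only when the
 * request really asked for FAM_UNSPEC. */
static int dump_all(int req_family)
{
	int n = 0;

	if (req_family != FAM_UNSPEC)
		return -EAFNOSUPPORT;

	for (int f = 1; f < FAM_MAX; f++)
		if (handlers[f])
			n += handlers[f](f);
	return n;
}

static void register_handlers(void)
{
	handlers[FAM_UNSPEC] = dump_all;
	handlers[FAM_INET] = dump_inet;
	handlers[FAM_INET6] = dump_inet6;
}

/* Lookup with fallback to the FAM_UNSPEC entry. */
static int do_dump(int req_family)
{
	dump_fn fn;

	if (req_family < 0 || req_family >= FAM_MAX)
		return -EAFNOSUPPORT;

	fn = handlers[req_family];
	if (!fn)
		fn = handlers[FAM_UNSPEC];
	return fn ? fn(req_family) : -EOPNOTSUPP;
}
```

Because the check lives in the wildcard handler and not in the lookup,
a family-agnostic handler that is registered only under the unspecified
family would still be reached through the fallback for any requested
family.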
- Timo
* Re: Linux kernel handling of IPv6 temporary addresses
From: George Kargiotakis @ 2012-12-27 15:57 UTC
To: netdev
In-Reply-To: <20121114.180824.1930899985436392426.davem@davemloft.net>
Hello all,
I had previously informed this list about an issue where a local LAN
attacker can make the Linux kernel lose IPv6 privacy extensions.
Recently I've found that there's actually another issue, more serious in
my opinion, that follows from the previous one. If the user tries to
disconnect/reconnect the network device/connection for whatever reason
(e.g. thinking he might gain back privacy extensions), then the device
gets IPs from SLAAC that have the "tentative" flag and never lose
it. That means that IPv6 functionality for that device is from then
on completely lost. I haven't been able to bring the kernel back to a
working IPv6 state without a reboot.
This is definitely a DoS situation and it needs fixing.
Here are the steps to reproduce:
== Step 1. Boot Ubuntu 12.10 (kernel 3.5.0-17-generic) ==
ubuntu@ubuntu:~$ ip a ls dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:54:00:8b:99:5d brd ff:ff:ff:ff:ff:ff
inet 192.168.1.96/24 brd 192.168.1.255 scope global eth0
inet6 2001:db8:f00:f00:ad1f:9166:93d4:fd6d/64 scope global temporary dynamic
valid_lft 86379sec preferred_lft 3579sec
inet6 2001:db8:f00:f00:5054:ff:fe8b:995d/64 scope global dynamic
valid_lft 86379sec preferred_lft 3579sec
inet6 fdbb:aaaa:bbbb:cccc:ad1f:9166:93d4:fd6d/64 scope global temporary dynamic
valid_lft 86379sec preferred_lft 3579sec
inet6 fdbb:aaaa:bbbb:cccc:5054:ff:fe8b:995d/64 scope global dynamic
valid_lft 86379sec preferred_lft 3579sec
inet6 fe80::5054:ff:fe8b:995d/64 scope link
valid_lft forever preferred_lft forever
ubuntu@ubuntu:~$ sysctl -a | grep use_tempaddr
net.ipv6.conf.all.use_tempaddr = 2
net.ipv6.conf.default.use_tempaddr = 2
net.ipv6.conf.eth0.use_tempaddr = 2
net.ipv6.conf.lo.use_tempaddr = 2
ubuntu@ubuntu:~$ nmcli con status
NAME UUID DEVICES DEFAULT VPN MASTER-PATH
Wired connection 1 923e6729-74a7-4389-9dbd-43ed7db3d1b8 eth0 yes no --
ubuntu@ubuntu:~$ nmcli dev status
DEVICE TYPE STATE
eth0 802-3-ethernet connected
//ping6 2a00:1450:4002:800::100e while in another terminal: tcpdump -ni eth0 ip6
ubuntu@ubuntu:~$ ping6 2a00:1450:4002:800::100e -c1
PING 2a00:1450:4002:800::100e(2a00:1450:4002:800::100e) 56 data bytes
64 bytes from 2a00:1450:4002:800::100e: icmp_seq=1 ttl=53 time=70.9 ms
--- 2a00:1450:4002:800::100e ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 70.994/70.994/70.994/0.000 ms
# tcpdump -ni eth0 host 2a00:1450:4002:800::100e
17:57:37.784658 IP6 2001:db8:f00:f00:ad1f:9166:93d4:fd6d > 2a00:1450:4002:800::100e: ICMP6, echo request, seq 1, length 64
17:57:37.855257 IP6 2a00:1450:4002:800::100e > 2001:db8:f00:f00:ad1f:9166:93d4:fd6d: ICMP6, echo reply, seq 1, length 64
== Step 2. flood RAs on the LAN ==
$ dmesg | tail
[ 1093.642053] IPv6: ipv6_create_tempaddr: retry temporary address regeneration
[ 1093.642062] IPv6: ipv6_create_tempaddr: retry temporary address regeneration
[ 1093.642065] IPv6: ipv6_create_tempaddr: retry temporary address regeneration
[ 1093.642067] IPv6: ipv6_create_tempaddr: regeneration time exceeded - disabled temporary address support
ubuntu@ubuntu:~$ sysctl -a | grep use_tempaddr
net.ipv6.conf.all.use_tempaddr = 2
net.ipv6.conf.default.use_tempaddr = 2
net.ipv6.conf.eth0.use_tempaddr = -1
net.ipv6.conf.lo.use_tempaddr = 2
//ping6 2a00:1450:4002:800::100e while in another terminal: tcpdump -ni eth0 ip6
ubuntu@ubuntu:~$ ping6 2a00:1450:4002:800::100e -c1
PING 2a00:1450:4002:800::100e(2a00:1450:4002:800::100e) 56 data bytes
64 bytes from 2a00:1450:4002:800::100e: icmp_seq=1 ttl=53 time=77.5 ms
--- 2a00:1450:4002:800::100e ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 77.568/77.568/77.568/0.000 ms
# tcpdump -ni eth0 host 2a00:1450:4002:800::100e
17:59:38.204173 IP6 2001:db8:f00:f00:5054:ff:fe8b:995d > 2a00:1450:4002:800::100e: ICMP6, echo request, seq 1, length 64
17:59:38.281437 IP6 2a00:1450:4002:800::100e > 2001:db8:f00:f00:5054:ff:fe8b:995d: ICMP6, echo reply, seq 1, length 64
//notice the change of IPv6 address to the one not using privacy extensions even after the flooding has finished long ago.
== Step 3. Disconnect/Reconnect connection ==
// restoring net.ipv6.conf.eth0.use_tempaddr to value '2' makes no difference at all for the rest of the process
# nmcli dev disconnect iface eth0
# nmcli con up uuid 923e6729-74a7-4389-9dbd-43ed7db3d1b8
ubuntu@ubuntu:~$ ip a ls dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:54:00:8b:99:5d brd ff:ff:ff:ff:ff:ff
inet 192.168.1.96/24 brd 192.168.1.255 scope global eth0
inet6 2001:db8:f00:f00:5054:ff:fe8b:995d/64 scope global tentative dynamic
valid_lft 86400sec preferred_lft 3600sec
inet6 fdbb:aaaa:bbbb:cccc:5054:ff:fe8b:995d/64 scope global tentative dynamic
valid_lft 86400sec preferred_lft 3600sec
inet6 fe80::5054:ff:fe8b:995d/64 scope link tentative
valid_lft forever preferred_lft forever
//Notice the "tentative" flag of the IPs on the device
//ping6 2a00:1450:4002:800::100e while in another terminal: tcpdump -ni eth0 ip6
ubuntu@ubuntu:~$ ping6 2a00:1450:4002:800::100e -c1
PING 2a00:1450:4002:800::100e(2a00:1450:4002:800::100e) 56 data bytes
^C
--- 2a00:1450:4002:800::100e ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
# tcpdump -ni eth0 host 2a00:1450:4002:800::100e
18:01:45.264194 IP6 ::1 > 2a00:1450:4002:800::100e: ICMP6, echo request, seq 1, length 64
Summary:
Before flooding it uses IP: 2001:db8:f00:f00:ad1f:9166:93d4:fd6d
After flooding it uses IP: 2001:db8:f00:f00:5054:ff:fe8b:995d --> it has lost privacy extensions
After disconnect/reconnect it tries to use IP: ::1 --> it has lost IPv6 connectivity
Best regards,
--
George Kargiotakis
https://void.gr
GPG KeyID: 0xE4F4FFE6
GPG Fingerprint: 9EB8 31BE C618 07CE 1B51 818D 4A0A 1BC8 E4F4 FFE6
* [PATCH 3.8 1/5 V2] rtlwifi: Fix warning for unchecked pci_map_single() call
From: Larry Finger @ 2012-12-27 16:37 UTC
To: linville; +Cc: linux-wireless, Larry Finger, netdev
In-Reply-To: <1356626252-4058-1-git-send-email-Larry.Finger@lwfinger.net>
Kernel 3.8 implements checking of all DMA mapping calls and issues
a WARNING for the first one it finds that is not checked.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
drivers/net/wireless/rtlwifi/pci.c | 6 ++++++
1 file changed, 6 insertions(+)
No change for V2.
diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c
index 3deacaf..4261e8e 100644
--- a/drivers/net/wireless/rtlwifi/pci.c
+++ b/drivers/net/wireless/rtlwifi/pci.c
@@ -743,6 +743,8 @@ static void _rtl_pci_rx_interrupt(struct ieee80211_hw *hw)
done:
bufferaddress = (*((dma_addr_t *)skb->cb));
+ if (pci_dma_mapping_error(rtlpci->pdev, bufferaddress))
+ return;
tmp_one = 1;
rtlpriv->cfg->ops->set_desc((u8 *) pdesc, false,
HW_DESC_RXBUFF_ADDR,
@@ -1115,6 +1117,10 @@ static int _rtl_pci_init_rx_ring(struct ieee80211_hw *hw)
PCI_DMA_FROMDEVICE);
bufferaddress = (*((dma_addr_t *)skb->cb));
+ if (pci_dma_mapping_error(rtlpci->pdev, bufferaddress)) {
+ dev_kfree_skb_any(skb);
+ return 1;
+ }
rtlpriv->cfg->ops->set_desc((u8 *)entry, false,
HW_DESC_RXBUFF_ADDR,
(u8 *)&bufferaddress);
--
1.7.10.4
* [PATCH 3.8 0/5 V2] Fixes for WARNINGS due to unchecked DMA mapping calls
From: Larry Finger @ 2012-12-27 16:37 UTC
To: linville; +Cc: linux-wireless, Larry Finger, netdev
Beginning with kernel 3.8, the DMA mapping system begins issuing a WARNING for
the first call that maps DMA and fails to call dma_mapping_error() to check
the result. This set of patches adds the appropriate checks to the rtlwifi
family of drivers.
V2 fixes a copy-and-paste error in patch 4/5. All others are unchanged.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Larry Finger (5):
rtlwifi: Fix warning for unchecked pci_map_single() call
rtlwifi: rtl8192ce: Fix warning for unchecked pci_map_single() call
rtlwifi: rtl8192de: Fix warning for unchecked pci_map_single() call
rtlwifi: rtl8192se: Fix warning for unchecked pci_map_single() call
rtlwifi: rtl8723ae: Fix warning for unchecked pci_map_single() call
drivers/net/wireless/rtlwifi/pci.c | 6 ++++++
drivers/net/wireless/rtlwifi/rtl8192ce/trx.c | 11 +++++++++++
drivers/net/wireless/rtlwifi/rtl8192de/trx.c | 10 ++++++++++
drivers/net/wireless/rtlwifi/rtl8192se/trx.c | 13 ++++++++++++-
drivers/net/wireless/rtlwifi/rtl8723ae/trx.c | 12 ++++++++++++
5 files changed, 51 insertions(+), 1 deletion(-)
--
1.7.10.4
* [PATCH 3.8 3/5 V2] rtlwifi: rtl8192de: Fix warning for unchecked pci_map_single() call
From: Larry Finger @ 2012-12-27 16:37 UTC
To: linville; +Cc: linux-wireless, Larry Finger, netdev
In-Reply-To: <1356626252-4058-1-git-send-email-Larry.Finger@lwfinger.net>
Kernel 3.8 implements checking of all DMA mapping calls and issues
a WARNING for the first one it finds that is not checked.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
drivers/net/wireless/rtlwifi/rtl8192de/trx.c | 10 ++++++++++
1 file changed, 10 insertions(+)
No change for V2.
diff --git a/drivers/net/wireless/rtlwifi/rtl8192de/trx.c b/drivers/net/wireless/rtlwifi/rtl8192de/trx.c
index f9f3861..a0fbf28 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192de/trx.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192de/trx.c
@@ -587,6 +587,11 @@ void rtl92de_tx_fill_desc(struct ieee80211_hw *hw,
buf_len = skb->len;
mapping = pci_map_single(rtlpci->pdev, skb->data, skb->len,
PCI_DMA_TODEVICE);
+ if (pci_dma_mapping_error(rtlpci->pdev, mapping)) {
+ RT_TRACE(rtlpriv, COMP_SEND, DBG_TRACE,
+ "DMA mapping error");
+ return;
+ }
CLEAR_PCI_TX_DESC_CONTENT(pdesc, sizeof(struct tx_desc_92d));
if (ieee80211_is_nullfunc(fc) || ieee80211_is_ctl(fc)) {
firstseg = true;
@@ -740,6 +745,11 @@ void rtl92de_tx_fill_cmddesc(struct ieee80211_hw *hw,
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)(skb->data);
__le16 fc = hdr->frame_control;
+ if (pci_dma_mapping_error(rtlpci->pdev, mapping)) {
+ RT_TRACE(rtlpriv, COMP_SEND, DBG_TRACE,
+ "DMA mapping error");
+ return;
+ }
CLEAR_PCI_TX_DESC_CONTENT(pdesc, TX_DESC_SIZE);
if (firstseg)
SET_TX_DESC_OFFSET(pdesc, USB_HWDESC_HEADER_LEN);
--
1.7.10.4
* [PATCH 3.8 4/5 V2] rtlwifi: rtl8192se: Fix warning for unchecked pci_map_single() call
From: Larry Finger @ 2012-12-27 16:37 UTC
To: linville; +Cc: linux-wireless, Larry Finger, netdev
In-Reply-To: <1356626252-4058-1-git-send-email-Larry.Finger@lwfinger.net>
Kernel 3.8 implements checking of all DMA mapping calls and issues
a WARNING for the first one it finds that is not checked.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
drivers/net/wireless/rtlwifi/rtl8192se/trx.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
v2 - Copy-and-paste error fixed.
diff --git a/drivers/net/wireless/rtlwifi/rtl8192se/trx.c b/drivers/net/wireless/rtlwifi/rtl8192se/trx.c
index 0e9f6eb..206561d 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192se/trx.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192se/trx.c
@@ -611,6 +611,11 @@ void rtl92se_tx_fill_desc(struct ieee80211_hw *hw,
PCI_DMA_TODEVICE);
u8 bw_40 = 0;
+ if (pci_dma_mapping_error(rtlpci->pdev, mapping)) {
+ RT_TRACE(rtlpriv, COMP_SEND, DBG_TRACE,
+ "DMA mapping error");
+ return;
+ }
if (mac->opmode == NL80211_IFTYPE_STATION) {
bw_40 = mac->bw_40;
} else if (mac->opmode == NL80211_IFTYPE_AP ||
@@ -763,6 +768,7 @@ void rtl92se_tx_fill_desc(struct ieee80211_hw *hw,
void rtl92se_tx_fill_cmddesc(struct ieee80211_hw *hw, u8 *pdesc,
bool firstseg, bool lastseg, struct sk_buff *skb)
{
+ struct rtl_priv *rtlpriv = rtl_priv(hw);
struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
struct rtl_hal *rtlhal = rtl_hal(rtl_priv(hw));
struct rtl_tcb_desc *tcb_desc = (struct rtl_tcb_desc *)(skb->cb);
@@ -770,7 +776,12 @@ void rtl92se_tx_fill_cmddesc(struct ieee80211_hw *hw, u8 *pdesc,
dma_addr_t mapping = pci_map_single(rtlpci->pdev, skb->data, skb->len,
PCI_DMA_TODEVICE);
- /* Clear all status */
+ if (pci_dma_mapping_error(rtlpci->pdev, mapping)) {
+ RT_TRACE(rtlpriv, COMP_SEND, DBG_TRACE,
+ "DMA mapping error");
+ return;
+ }
+ /* Clear all status */
CLEAR_PCI_TX_DESC_CONTENT(pdesc, TX_CMDDESC_SIZE_RTL8192S);
/* This bit indicate this packet is used for FW download. */
--
1.7.10.4
* [PATCH 3.8 5/5 V2] rtlwifi: rtl8723ae: Fix warning for unchecked pci_map_single() call
From: Larry Finger @ 2012-12-27 16:37 UTC (permalink / raw)
To: linville; +Cc: linux-wireless, Larry Finger, netdev
In-Reply-To: <1356626252-4058-1-git-send-email-Larry.Finger@lwfinger.net>
Kernel 3.8 implements checking of all DMA mapping calls and issues
a WARNING for the first one it finds that is not checked.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
drivers/net/wireless/rtlwifi/rtl8723ae/trx.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
No change in V2
diff --git a/drivers/net/wireless/rtlwifi/rtl8723ae/trx.c b/drivers/net/wireless/rtlwifi/rtl8723ae/trx.c
index 87331d8..7ddd517 100644
--- a/drivers/net/wireless/rtlwifi/rtl8723ae/trx.c
+++ b/drivers/net/wireless/rtlwifi/rtl8723ae/trx.c
@@ -387,6 +387,11 @@ void rtl8723ae_tx_fill_desc(struct ieee80211_hw *hw,
PCI_DMA_TODEVICE);
u8 bw_40 = 0;
+ if (pci_dma_mapping_error(rtlpci->pdev, mapping)) {
+ RT_TRACE(rtlpriv, COMP_SEND, DBG_TRACE,
+ "DMA mapping error");
+ return;
+ }
if (mac->opmode == NL80211_IFTYPE_STATION) {
bw_40 = mac->bw_40;
} else if (mac->opmode == NL80211_IFTYPE_AP ||
@@ -542,6 +548,11 @@ void rtl8723ae_tx_fill_cmddesc(struct ieee80211_hw *hw,
PCI_DMA_TODEVICE);
__le16 fc = hdr->frame_control;
+ if (pci_dma_mapping_error(rtlpci->pdev, mapping)) {
+ RT_TRACE(rtlpriv, COMP_SEND, DBG_TRACE,
+ "DMA mapping error");
+ return;
+ }
CLEAR_PCI_TX_DESC_CONTENT(pdesc, TX_DESC_SIZE);
if (firstseg)
--
1.7.10.4
^ permalink raw reply related
* [PATCH 3.8 2/5 V2] rtlwifi: rtl8192ce: Fix warning for unchecked pci_map_single() call
From: Larry Finger @ 2012-12-27 16:37 UTC (permalink / raw)
To: linville; +Cc: linux-wireless, Larry Finger, netdev
In-Reply-To: <1356626252-4058-1-git-send-email-Larry.Finger@lwfinger.net>
Kernel 3.8 implements checking of all DMA mapping calls and issues
a WARNING for the first one it finds that is not checked.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
drivers/net/wireless/rtlwifi/rtl8192ce/trx.c | 11 +++++++++++
1 file changed, 11 insertions(+)
No change for V2.
diff --git a/drivers/net/wireless/rtlwifi/rtl8192ce/trx.c b/drivers/net/wireless/rtlwifi/rtl8192ce/trx.c
index 1734247..c31795e 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192ce/trx.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192ce/trx.c
@@ -611,8 +611,14 @@ void rtl92ce_tx_fill_desc(struct ieee80211_hw *hw,
dma_addr_t mapping = pci_map_single(rtlpci->pdev,
skb->data, skb->len,
PCI_DMA_TODEVICE);
+
u8 bw_40 = 0;
+ if (pci_dma_mapping_error(rtlpci->pdev, mapping)) {
+ RT_TRACE(rtlpriv, COMP_SEND, DBG_TRACE,
+ "DMA mapping error");
+ return;
+ }
rcu_read_lock();
sta = get_sta(hw, mac->vif, mac->bssid);
if (mac->opmode == NL80211_IFTYPE_STATION) {
@@ -774,6 +780,11 @@ void rtl92ce_tx_fill_cmddesc(struct ieee80211_hw *hw,
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)(skb->data);
__le16 fc = hdr->frame_control;
+ if (pci_dma_mapping_error(rtlpci->pdev, mapping)) {
+ RT_TRACE(rtlpriv, COMP_SEND, DBG_TRACE,
+ "DMA mapping error");
+ return;
+ }
CLEAR_PCI_TX_DESC_CONTENT(pdesc, TX_DESC_SIZE);
if (firstseg)
--
1.7.10.4
^ permalink raw reply related
* Re: Linux kernel handling of IPv6 temporary addresses
From: Eric Dumazet @ 2012-12-27 16:54 UTC (permalink / raw)
To: George Kargiotakis; +Cc: netdev
In-Reply-To: <20121227175735.26c70cbb@lola.kot>
On Thu, 2012-12-27 at 17:57 +0200, George Kargiotakis wrote:
> Hello all,
>
> I had previously informed this list about the issue of the linux kernel
> losing IPv6 privacy extensions by a local LAN attacker.
> Recently I've found that there's actually another, more serious in my
> opinion, issue that follows the previous one. If the user tries to
> disconnect/reconnect the network device/connection for whatever reason
> (e.g. thinking he might gain back privacy extensions), then the device
> gets IPs from SLAAC that have the "tentative" flag and never loses
> that. That means that IPv6 functionality for that device is from then
> on completely lost. I haven't been able to bring back the kernel to a
> working IPv6 state without a reboot.
>
> This is definitely a DoS situation and it needs fixing.
>
> Here are the steps to reproduce:
>
> == Step 1. Boot Ubuntu 12.10 (kernel 3.5.0-17-generic) ==
> ubuntu@ubuntu:~$ ip a ls dev eth0
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
> link/ether 52:54:00:8b:99:5d brd ff:ff:ff:ff:ff:ff
> inet 192.168.1.96/24 brd 192.168.1.255 scope global eth0
> inet6 2001:db8:f00:f00:ad1f:9166:93d4:fd6d/64 scope global temporary dynamic
> valid_lft 86379sec preferred_lft 3579sec
> inet6 2001:db8:f00:f00:5054:ff:fe8b:995d/64 scope global dynamic
> valid_lft 86379sec preferred_lft 3579sec
> inet6 fdbb:aaaa:bbbb:cccc:ad1f:9166:93d4:fd6d/64 scope global temporary dynamic
> valid_lft 86379sec preferred_lft 3579sec
> inet6 fdbb:aaaa:bbbb:cccc:5054:ff:fe8b:995d/64 scope global dynamic
> valid_lft 86379sec preferred_lft 3579sec
> inet6 fe80::5054:ff:fe8b:995d/64 scope link
> valid_lft forever preferred_lft forever
>
> ubuntu@ubuntu:~$ sysctl -a | grep use_tempaddr
> net.ipv6.conf.all.use_tempaddr = 2
> net.ipv6.conf.default.use_tempaddr = 2
> net.ipv6.conf.eth0.use_tempaddr = 2
> net.ipv6.conf.lo.use_tempaddr = 2
>
> ubuntu@ubuntu:~$ nmcli con status
> NAME UUID DEVICES DEFAULT VPN MASTER-PATH
> Wired connection 1 923e6729-74a7-4389-9dbd-43ed7db3d1b8 eth0 yes no --
> ubuntu@ubuntu:~$ nmcli dev status
> DEVICE TYPE STATE
> eth0 802-3-ethernet connected
>
> //ping6 2a00:1450:4002:800::100e while in another terminal: tcpdump -ni eth0 ip6
>
> ubuntu@ubuntu:~$ ping6 2a00:1450:4002:800::100e -c1
> PING 2a00:1450:4002:800::100e(2a00:1450:4002:800::100e) 56 data bytes
> 64 bytes from 2a00:1450:4002:800::100e: icmp_seq=1 ttl=53 time=70.9 ms
>
> --- 2a00:1450:4002:800::100e ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 70.994/70.994/70.994/0.000 ms
>
> # tcpdump -ni eth0 host 2a00:1450:4002:800::100e
> 17:57:37.784658 IP6 2001:db8:f00:f00:ad1f:9166:93d4:fd6d > 2a00:1450:4002:800::100e: ICMP6, echo request, seq 1, length 64
> 17:57:37.855257 IP6 2a00:1450:4002:800::100e > 2001:db8:f00:f00:ad1f:9166:93d4:fd6d: ICMP6, echo reply, seq 1, length 64
>
> == Step 2. flood RAs on the LAN ==
>
> $ dmesg | tail
> [ 1093.642053] IPv6: ipv6_create_tempaddr: retry temporary address regeneration
> [ 1093.642062] IPv6: ipv6_create_tempaddr: retry temporary address regeneration
> [ 1093.642065] IPv6: ipv6_create_tempaddr: retry temporary address regeneration
> [ 1093.642067] IPv6: ipv6_create_tempaddr: regeneration time exceeded - disabled temporary address support
>
> ubuntu@ubuntu:~$ sysctl -a | grep use_tempaddr
> net.ipv6.conf.all.use_tempaddr = 2
> net.ipv6.conf.default.use_tempaddr = 2
> net.ipv6.conf.eth0.use_tempaddr = -1
> net.ipv6.conf.lo.use_tempaddr = 2
>
> //ping6 2a00:1450:4002:800::100e while in another terminal: tcpdump -ni eth0 ip6
>
> ubuntu@ubuntu:~$ ping6 2a00:1450:4002:800::100e -c1
> PING 2a00:1450:4002:800::100e(2a00:1450:4002:800::100e) 56 data bytes
> 64 bytes from 2a00:1450:4002:800::100e: icmp_seq=1 ttl=53 time=77.5 ms
>
> --- 2a00:1450:4002:800::100e ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 77.568/77.568/77.568/0.000 ms
>
> # tcpdump -ni eth0 host 2a00:1450:4002:800::100e
> 17:59:38.204173 IP6 2001:db8:f00:f00:5054:ff:fe8b:995d > 2a00:1450:4002:800::100e: ICMP6, echo request, seq 1, length 64
> 17:59:38.281437 IP6 2a00:1450:4002:800::100e > 2001:db8:f00:f00:5054:ff:fe8b:995d: ICMP6, echo reply, seq 1, length 64
>
> //notice the change of IPv6 address to the one not using privacy extensions even after the flooding has finished long ago.
>
> == Step 3. Disconnect/Reconnect connection ==
> // restoring net.ipv6.conf.eth0.use_tempaddr to value '2' makes no difference at all for the rest of the process
>
> # nmcli dev disconnect iface eth0
> # nmcli con up uuid 923e6729-74a7-4389-9dbd-43ed7db3d1b8
>
> ubuntu@ubuntu:~$ ip a ls dev eth0
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
> link/ether 52:54:00:8b:99:5d brd ff:ff:ff:ff:ff:ff
> inet 192.168.1.96/24 brd 192.168.1.255 scope global eth0
> inet6 2001:db8:f00:f00:5054:ff:fe8b:995d/64 scope global tentative dynamic
> valid_lft 86400sec preferred_lft 3600sec
> inet6 fdbb:aaaa:bbbb:cccc:5054:ff:fe8b:995d/64 scope global tentative dynamic
> valid_lft 86400sec preferred_lft 3600sec
> inet6 fe80::5054:ff:fe8b:995d/64 scope link tentative
> valid_lft forever preferred_lft forever
>
> //Notice the "tentative" flag of the IPs on the device
>
> //ping6 2a00:1450:4002:800::100e while in another terminal: tcpdump -ni eth0 ip6
>
> ubuntu@ubuntu:~$ ping6 2a00:1450:4002:800::100e -c1
> PING 2a00:1450:4002:800::100e(2a00:1450:4002:800::100e) 56 data bytes
> ^C
> --- 2a00:1450:4002:800::100e ping statistics ---
> 1 packets transmitted, 0 received, 100% packet loss, time 0ms
>
> # tcpdump -ni eth0 host 2a00:1450:4002:800::100e
> 18:01:45.264194 IP6 ::1 > 2a00:1450:4002:800::100e: ICMP6, echo request, seq 1, length 64
>
> Summary:
> Before flooding it uses IP: 2001:db8:f00:f00:ad1f:9166:93d4:fd6d
> After flooding it uses IP: 2001:db8:f00:f00:5054:ff:fe8b:995d --> it has lost privacy extensions
> After disconnect/reconnect it tries to use IP: ::1 --> it has lost IPv6 connectivity
>
> Best regards,
We should only rate limit, not disable forever.
If I cook up a patch, are you willing to compile a kernel and test it?
Thanks
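To illustrate the difference between rate limiting and disabling forever, here is a minimal userspace sketch. It is not the patch discussed above and not kernel code: the struct, its fields, and regen_allowed() are hypothetical names chosen for the example. The idea is that the retry budget refills once a time window has passed, so a flood of RAs can only throttle temporary-address regeneration for a while instead of latching use_tempaddr to -1.

```c
#include <stdbool.h>

/* Hypothetical per-interface state; none of these names exist in the kernel. */
struct regen_limiter {
	unsigned int max_per_window;  /* regenerations allowed per window */
	unsigned int window_secs;     /* window length in seconds */
	unsigned long window_start;   /* start of the current window */
	unsigned int used;            /* regenerations used in this window */
};

/* Return true if a regeneration may proceed at time `now` (in seconds). */
static bool regen_allowed(struct regen_limiter *rl, unsigned long now)
{
	if (now - rl->window_start >= rl->window_secs) {
		rl->window_start = now;   /* window expired: the budget refills */
		rl->used = 0;
	}
	if (rl->used >= rl->max_per_window)
		return false;             /* throttled, but only until the window ends */
	rl->used++;
	return true;
}
```

With a budget of 3 per 60 seconds, a fourth attempt inside the window is refused, and attempts are accepted again once the window has expired.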
^ permalink raw reply
* RE: testing
From: Qin, Xiaohong @ 2012-12-27 18:38 UTC (permalink / raw)
To: netdev@vger.kernel.org
^ permalink raw reply
* vxlan in Linux kernel 3.7
From: Qin, Xiaohong @ 2012-12-27 18:42 UTC (permalink / raw)
To: netdev@vger.kernel.org
Hi All,
I have installed kernel 3.7 on my Linux box; see the following uname -a output:
uname -a
Linux c210-m2-sib-3 3.7.0-030700-generic #201212102335 SMP Tue Dec 11 04:36:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Does that mean I've got the VXLAN module loaded, or do I still need to go through some extra steps to enable or configure it? Do you have any VXLAN setup or configuration documentation by chance?
Thanks.
Dennis Qin
P.S. If this is not the right place to ask this kind of question, please let me know which mailing list I should use.
^ permalink raw reply
* Re: vxlan in Linux kernel 3.7
From: Stephen Hemminger @ 2012-12-27 18:49 UTC (permalink / raw)
To: Qin, Xiaohong; +Cc: netdev@vger.kernel.org
In-Reply-To: <A3CA455BB4F1DA4E92CB43AAF0E4BB1D0667E12B@MX01A.corp.emc.com>
On Thu, 27 Dec 2012 13:42:51 -0500
"Qin, Xiaohong" <Xiaohong.Qin@emc.com> wrote:
> Hi All,
>
> I have installed kernel 3.7 on my Linux box, see the following uname -a output,
>
> uname -a
> Linux c210-m2-sib-3 3.7.0-030700-generic #201212102335 SMP Tue Dec 11 04:36:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> Does that mean I've got VXLAN module loaded or I still need to go through some extra steps to enable or configure it? Do you have any VXLAN setup or configuration document by chance?
>
> Thanks.
>
> Dennis Qin
>
> P.S. If this is not the right place to ask this kind of questions, please let me know which mailing list I should use.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
The VXLAN driver is a kernel config option.
If you are using a vendor-supplied kernel, it is most likely available as a module.
Try:
/sbin/modinfo vxlan
If you see 'ERROR: Module vxlan not found' then vxlan was not configured.
You will also need a current version of the iproute2 utilities.
$ ip -V
ip utility, iproute2-ss121211
^ permalink raw reply
* RE: vxlan in Linux kernel 3.7
From: Qin, Xiaohong @ 2012-12-27 18:56 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20121227104916.4fb2af04@nehalam.linuxnetplumber.net>
Hi Stephen,
Thanks very much for the tip. Here is the output on my box,
/sbin/modinfo vxlan
filename: /lib/modules/3.7.0-030700-generic/kernel/drivers/net/vxlan.ko
alias: rtnl-link-vxlan
author: Stephen Hemminger <shemminger@vyatta.com>
version: 0.1
license: GPL
srcversion: D5253D8FFAF3FEF6A3A3026
depends:
intree: Y
vermagic: 3.7.0-030700-generic SMP mod_unload modversions
parm: udp_port:Destination UDP port (uint)
parm: log_ecn_error:Log packets received with corrupted ECN (bool)
# ip -V
ip utility, iproute2-ss111117
So I think I'm all set to give it a try?
Thanks.
Dennis Qin
^ permalink raw reply
* Re: [PATCH 4/4 v2] net/smsc911x: Provide common clock functionality
From: Lee Jones @ 2012-12-27 19:31 UTC (permalink / raw)
To: Linus Walleij
Cc: Steve Glendinning, Robert Marklund, linus.walleij, arnd, netdev,
linux-kernel, Russell King - ARM Linux, linux-arm-kernel
In-Reply-To: <CACRpkdZGC=f0X1wLt1QLMB=x+ba6VRDAbRcW6wR_2gvNLuGw1g@mail.gmail.com>
No, you're right, I'm a moron.
Will fix up and resend when I'm back to work.
Sent from my mobile Linux device.
On Dec 26, 2012 12:51 AM, "Linus Walleij" <linus.walleij@linaro.org> wrote:
> On Fri, Dec 21, 2012 at 12:41 PM, Lee Jones <lee.jones@linaro.org> wrote:
>
> > + if (IS_ERR(pdata->clk)) {
> > + ret = clk_prepare_enable(pdata->clk);
> > + if (ret < 0)
> > + netdev_err(ndev, "failed to enable clock %d\n", ret);
> > + }
>
> I think you got all of these backwards now, shouldn't it be if
> (!IS_ERR(pdata->clk)) { } ...?
>
> It's late here but enlighten me if I don't get it.
>
> > + if (IS_ERR(pdata->clk))
> > + clk_disable_unprepare(pdata->clk);
>
> Ditto.
>
> > + /* Request clock */
> > + pdata->clk = clk_get(&pdev->dev, NULL);
> > + if (IS_ERR(pdata->clk))
> > + netdev_warn(ndev, "couldn't get clock %li\n", PTR_ERR(pdata->clk));
>
> This one seems correct though.
>
> > + /* Free clock */
> > + if (IS_ERR(pdata->clk)) {
> > + clk_put(pdata->clk);
> > + pdata->clk = NULL;
> > + }
>
> Should be !IS_ERR()
>
> Yours,
> Linus Walleij
>
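For reference, the guard asked for in the review can be sketched in userspace as follows. This is only an illustration under stated assumptions: ERR_PTR() and IS_ERR() are simplified copies of the kernel helpers, and the fake_* types and functions plus enable_resources()/disable_resources() are hypothetical stand-ins, not the smsc911x driver code. The point is that the clock is touched only when clk_get() did not return an error pointer.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace copies of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-ins for struct clk and the driver's private data. */
struct fake_clk { int enable_count; };
struct fake_pdata { struct fake_clk *clk; };

static int fake_clk_prepare_enable(struct fake_clk *clk)
{
	clk->enable_count++;
	return 0;
}

static void fake_clk_disable_unprepare(struct fake_clk *clk)
{
	clk->enable_count--;
}

/* Enable the clock only if one was actually obtained. */
static int enable_resources(struct fake_pdata *pdata)
{
	int ret = 0;

	if (!IS_ERR(pdata->clk))
		ret = fake_clk_prepare_enable(pdata->clk);
	return ret;
}

/* Same guard on the disable path. */
static void disable_resources(struct fake_pdata *pdata)
{
	if (!IS_ERR(pdata->clk))
		fake_clk_disable_unprepare(pdata->clk);
}
```

With the inverted test from the patch, the enable path would dereference the error pointer and skip every valid clock, which is exactly the problem pointed out in the review.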
^ permalink raw reply
* [PATCH] forcedeth: Fix WARNINGS that result when DMA mapping is not checked
From: Larry Finger @ 2012-12-27 19:42 UTC (permalink / raw)
To: linville-2XuSBdqkA4R54TAoqtyWWQ, davem-fT/PcQaiUtIeIZ0/mPfg9Q
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA, Larry Finger,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
With 3.8-rc1, the first call of pci_map_single() that is not checked
with a corresponding pci_dma_mapping_error() call results in a warning
with a splat as follows:
WARNING: at lib/dma-debug.c:933 check_unmap+0x480/0x950()
Hardware name: HP Pavilion dv2700 Notebook PC
forcedeth 0000:00:0a.0: DMA-API: device driver failed to check
map error[device address=0x00000000b176e002] [size=90 bytes] [mapped as single]
Signed-off-by: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
---
drivers/net/ethernet/nvidia/forcedeth.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 653487d..de39cf2 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -1821,6 +1821,11 @@ static int nv_alloc_rx(struct net_device *dev)
skb->data,
skb_tailroom(skb),
PCI_DMA_FROMDEVICE);
+ if (pci_dma_mapping_error(np->pci_dev,
+ np->put_rx_ctx->dma)) {
+ dev_kfree_skb_any(skb);
+ goto packet_dropped;
+ }
np->put_rx_ctx->dma_len = skb_tailroom(skb);
np->put_rx.orig->buf = cpu_to_le32(np->put_rx_ctx->dma);
wmb();
@@ -1830,6 +1835,7 @@ static int nv_alloc_rx(struct net_device *dev)
if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
np->put_rx_ctx = np->first_rx_ctx;
} else {
+packet_dropped:
u64_stats_update_begin(&np->swstats_rx_syncp);
np->stat_rx_dropped++;
u64_stats_update_end(&np->swstats_rx_syncp);
@@ -1856,6 +1862,11 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
skb->data,
skb_tailroom(skb),
PCI_DMA_FROMDEVICE);
+ if (pci_dma_mapping_error(np->pci_dev,
+ np->put_rx_ctx->dma)) {
+ dev_kfree_skb_any(skb);
+ goto packet_dropped;
+ }
np->put_rx_ctx->dma_len = skb_tailroom(skb);
np->put_rx.ex->bufhigh = cpu_to_le32(dma_high(np->put_rx_ctx->dma));
np->put_rx.ex->buflow = cpu_to_le32(dma_low(np->put_rx_ctx->dma));
@@ -1866,6 +1877,7 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
np->put_rx_ctx = np->first_rx_ctx;
} else {
+packet_dropped:
u64_stats_update_begin(&np->swstats_rx_syncp);
np->stat_rx_dropped++;
u64_stats_update_end(&np->swstats_rx_syncp);
@@ -2217,6 +2229,9 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
bcnt = (size > NV_TX2_TSO_MAX_SIZE) ? NV_TX2_TSO_MAX_SIZE : size;
np->put_tx_ctx->dma = pci_map_single(np->pci_dev, skb->data + offset, bcnt,
PCI_DMA_TODEVICE);
+ if (pci_dma_mapping_error(np->pci_dev,
+ np->put_tx_ctx->dma))
+ return NETDEV_TX_BUSY;
np->put_tx_ctx->dma_len = bcnt;
np->put_tx_ctx->dma_single = 1;
put_tx->buf = cpu_to_le32(np->put_tx_ctx->dma);
@@ -2337,6 +2352,9 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb,
bcnt = (size > NV_TX2_TSO_MAX_SIZE) ? NV_TX2_TSO_MAX_SIZE : size;
np->put_tx_ctx->dma = pci_map_single(np->pci_dev, skb->data + offset, bcnt,
PCI_DMA_TODEVICE);
+ if (pci_dma_mapping_error(np->pci_dev,
+ np->put_tx_ctx->dma))
+ return NETDEV_TX_BUSY;
np->put_tx_ctx->dma_len = bcnt;
np->put_tx_ctx->dma_single = 1;
put_tx->bufhigh = cpu_to_le32(dma_high(np->put_tx_ctx->dma));
@@ -5003,6 +5021,11 @@ static int nv_loopback_test(struct net_device *dev)
test_dma_addr = pci_map_single(np->pci_dev, tx_skb->data,
skb_tailroom(tx_skb),
PCI_DMA_FROMDEVICE);
+ if (pci_dma_mapping_error(np->pci_dev,
+ test_dma_addr)) {
+ dev_kfree_skb_any(tx_skb);
+ goto out;
+ }
pkt_data = skb_put(tx_skb, pkt_len);
for (i = 0; i < pkt_len; i++)
pkt_data[i] = (u8)(i & 0xff);
--
1.7.10.4
^ permalink raw reply related
* Re: vxlan in Linux kernel 3.7
From: Stephen Hemminger @ 2012-12-27 19:43 UTC (permalink / raw)
To: Qin, Xiaohong; +Cc: netdev@vger.kernel.org
In-Reply-To: <A3CA455BB4F1DA4E92CB43AAF0E4BB1D0667E12C@MX01A.corp.emc.com>
On Thu, 27 Dec 2012 13:56:51 -0500
"Qin, Xiaohong" <Xiaohong.Qin@emc.com> wrote:
> # ip -V
> ip utility, iproute2-ss111117
>
> So I think I'm all set to give it a try?
That version is way too old to know about vxlan configuration.
The last digits of the iproute version are the date it was
released. That date is November 17, 2011, which is the 3.1 version.
You need 3.7, which was released on December 11, 2012, or you can
use the latest from the iproute2 git repository.
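As a small illustration of reading that date, here is a sketch that parses the "ssYYMMDD" suffix from the `ip -V` output shown in this thread. The function name iproute2_snapshot_date() is made up for the example, it assumes a 20YY year, and it only handles the "iproute2-ssYYMMDD" format seen here; other iproute2 releases may print a different version string.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse the "ssYYMMDD" snapshot suffix printed by `ip -V`.
 * Returns 0 on success, -1 if the string has no usable suffix.
 * Hypothetical helper; assumes the year is 20YY.
 */
static int iproute2_snapshot_date(const char *version,
				  int *year, int *month, int *day)
{
	const char *p = strstr(version, "iproute2-ss");
	int yy, mm, dd;

	if (!p)
		return -1;
	p += strlen("iproute2-ss");
	if (sscanf(p, "%2d%2d%2d", &yy, &mm, &dd) != 3)
		return -1;
	if (mm < 1 || mm > 12 || dd < 1 || dd > 31)
		return -1;
	*year = 2000 + yy;
	*month = mm;
	*day = dd;
	return 0;
}
```

Fed the two strings from this thread, "iproute2-ss111117" parses as 2011-11-17 and "iproute2-ss121211" as 2012-12-11.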
^ permalink raw reply
* Re: [PATCH] forcedeth: Fix WARNINGS that result when DMA mapping is not checked
From: Eric Dumazet @ 2012-12-27 20:05 UTC (permalink / raw)
To: Larry Finger; +Cc: linville, davem, linux-wireless, netdev, linux-kernel
In-Reply-To: <1356637327-4884-1-git-send-email-Larry.Finger@lwfinger.net>
On Thu, 2012-12-27 at 13:42 -0600, Larry Finger wrote:
> With 3.8-rc1, the first call of pci_map_single() that is not checked
> with a corresponding pci_dma_mapping_error() call results in a warning
> with a splat as follows:
>
> WARNING: at lib/dma-debug.c:933 check_unmap+0x480/0x950()
> Hardware name: HP Pavilion dv2700 Notebook PC
> forcedeth 0000:00:0a.0: DMA-API: device driver failed to check
> map error[device address=0x00000000b176e002] [size=90 bytes] [mapped as single]
>
> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
> ---
> drivers/net/ethernet/nvidia/forcedeth.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
> index 653487d..de39cf2 100644
> --- a/drivers/net/ethernet/nvidia/forcedeth.c
> +++ b/drivers/net/ethernet/nvidia/forcedeth.c
> @@ -1821,6 +1821,11 @@ static int nv_alloc_rx(struct net_device *dev)
> skb->data,
> skb_tailroom(skb),
> PCI_DMA_FROMDEVICE);
> + if (pci_dma_mapping_error(np->pci_dev,
> + np->put_rx_ctx->dma)) {
> + dev_kfree_skb_any(skb);
skb has no destructor yet, kfree_skb(skb) should be fine
> + goto packet_dropped;
> + }
> np->put_rx_ctx->dma_len = skb_tailroom(skb);
> np->put_rx.orig->buf = cpu_to_le32(np->put_rx_ctx->dma);
> wmb();
> @@ -1830,6 +1835,7 @@ static int nv_alloc_rx(struct net_device *dev)
> if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
> np->put_rx_ctx = np->first_rx_ctx;
> } else {
> +packet_dropped:
> u64_stats_update_begin(&np->swstats_rx_syncp);
> np->stat_rx_dropped++;
> u64_stats_update_end(&np->swstats_rx_syncp);
> @@ -1856,6 +1862,11 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
> skb->data,
> skb_tailroom(skb),
> PCI_DMA_FROMDEVICE);
> + if (pci_dma_mapping_error(np->pci_dev,
> + np->put_rx_ctx->dma)) {
> + dev_kfree_skb_any(skb);
> + goto packet_dropped;
> + }
> np->put_rx_ctx->dma_len = skb_tailroom(skb);
> np->put_rx.ex->bufhigh = cpu_to_le32(dma_high(np->put_rx_ctx->dma));
> np->put_rx.ex->buflow = cpu_to_le32(dma_low(np->put_rx_ctx->dma));
> @@ -1866,6 +1877,7 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
> if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
> np->put_rx_ctx = np->first_rx_ctx;
> } else {
> +packet_dropped:
> u64_stats_update_begin(&np->swstats_rx_syncp);
> np->stat_rx_dropped++;
> u64_stats_update_end(&np->swstats_rx_syncp);
> @@ -2217,6 +2229,9 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
> bcnt = (size > NV_TX2_TSO_MAX_SIZE) ? NV_TX2_TSO_MAX_SIZE : size;
> np->put_tx_ctx->dma = pci_map_single(np->pci_dev, skb->data + offset, bcnt,
> PCI_DMA_TODEVICE);
> + if (pci_dma_mapping_error(np->pci_dev,
> + np->put_tx_ctx->dma))
> + return NETDEV_TX_BUSY;
Really this is not going to work very well : caller will call this in a
loop.
> np->put_tx_ctx->dma_len = bcnt;
> np->put_tx_ctx->dma_single = 1;
> put_tx->buf = cpu_to_le32(np->put_tx_ctx->dma);
> @@ -2337,6 +2352,9 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb,
> bcnt = (size > NV_TX2_TSO_MAX_SIZE) ? NV_TX2_TSO_MAX_SIZE : size;
> np->put_tx_ctx->dma = pci_map_single(np->pci_dev, skb->data + offset, bcnt,
> PCI_DMA_TODEVICE);
> + if (pci_dma_mapping_error(np->pci_dev,
> + np->put_tx_ctx->dma))
> + return NETDEV_TX_BUSY;
same problem here.
> np->put_tx_ctx->dma_len = bcnt;
> np->put_tx_ctx->dma_single = 1;
> put_tx->bufhigh = cpu_to_le32(dma_high(np->put_tx_ctx->dma));
> @@ -5003,6 +5021,11 @@ static int nv_loopback_test(struct net_device *dev)
> test_dma_addr = pci_map_single(np->pci_dev, tx_skb->data,
> skb_tailroom(tx_skb),
> PCI_DMA_FROMDEVICE);
> + if (pci_dma_mapping_error(np->pci_dev,
> + test_dma_addr)) {
> + dev_kfree_skb_any(tx_skb);
kfree_skb(tx_skb);
> + goto out;
> + }
> pkt_data = skb_put(tx_skb, pkt_len);
> for (i = 0; i < pkt_len; i++)
> pkt_data[i] = (u8)(i & 0xff);
^ permalink raw reply
* Re: [PATCH] forcedeth: Fix WARNINGS that result when DMA mapping is not checked
From: Larry Finger @ 2012-12-27 20:38 UTC (permalink / raw)
To: Eric Dumazet
Cc: linville-2XuSBdqkA4R54TAoqtyWWQ, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1356638715.30414.1349.camel@edumazet-glaptop>
On 12/27/2012 02:05 PM, Eric Dumazet wrote:
> On Thu, 2012-12-27 at 13:42 -0600, Larry Finger wrote:
>> With 3.8-rc1, the first call of pci_map_single() that is not checked
>> with a corresponding pci_dma_mapping_error() call results in a warning
>> with a splat as follows:
>>
>> WARNING: at lib/dma-debug.c:933 check_unmap+0x480/0x950()
>> Hardware name: HP Pavilion dv2700 Notebook PC
>> forcedeth 0000:00:0a.0: DMA-API: device driver failed to check
>> map error[device address=0x00000000b176e002] [size=90 bytes] [mapped as single]
>>
>> Signed-off-by: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
>> ---
>> drivers/net/ethernet/nvidia/forcedeth.c | 23 +++++++++++++++++++++++
>> 1 file changed, 23 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
>> index 653487d..de39cf2 100644
>> --- a/drivers/net/ethernet/nvidia/forcedeth.c
>> +++ b/drivers/net/ethernet/nvidia/forcedeth.c
>> @@ -1821,6 +1821,11 @@ static int nv_alloc_rx(struct net_device *dev)
>> skb->data,
>> skb_tailroom(skb),
>> PCI_DMA_FROMDEVICE);
>> + if (pci_dma_mapping_error(np->pci_dev,
>> + np->put_rx_ctx->dma)) {
>> + dev_kfree_skb_any(skb);
>
> skb has no destructor yet, kfree_skb(skb) should be fine
OK.
>
>> + goto packet_dropped;
>> + }
>> np->put_rx_ctx->dma_len = skb_tailroom(skb);
>> np->put_rx.orig->buf = cpu_to_le32(np->put_rx_ctx->dma);
>> wmb();
>> @@ -1830,6 +1835,7 @@ static int nv_alloc_rx(struct net_device *dev)
>> if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
>> np->put_rx_ctx = np->first_rx_ctx;
>> } else {
>> +packet_dropped:
>> u64_stats_update_begin(&np->swstats_rx_syncp);
>> np->stat_rx_dropped++;
>> u64_stats_update_end(&np->swstats_rx_syncp);
>> @@ -1856,6 +1862,11 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
>> skb->data,
>> skb_tailroom(skb),
>> PCI_DMA_FROMDEVICE);
>> + if (pci_dma_mapping_error(np->pci_dev,
>> + np->put_rx_ctx->dma)) {
>> + dev_kfree_skb_any(skb);
>> + goto packet_dropped;
>> + }
>> np->put_rx_ctx->dma_len = skb_tailroom(skb);
>> np->put_rx.ex->bufhigh = cpu_to_le32(dma_high(np->put_rx_ctx->dma));
>> np->put_rx.ex->buflow = cpu_to_le32(dma_low(np->put_rx_ctx->dma));
>> @@ -1866,6 +1877,7 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
>> if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
>> np->put_rx_ctx = np->first_rx_ctx;
>> } else {
>> +packet_dropped:
>> u64_stats_update_begin(&np->swstats_rx_syncp);
>> np->stat_rx_dropped++;
>> u64_stats_update_end(&np->swstats_rx_syncp);
>> @@ -2217,6 +2229,9 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
>> bcnt = (size > NV_TX2_TSO_MAX_SIZE) ? NV_TX2_TSO_MAX_SIZE : size;
>> np->put_tx_ctx->dma = pci_map_single(np->pci_dev, skb->data + offset, bcnt,
>> PCI_DMA_TODEVICE);
>> + if (pci_dma_mapping_error(np->pci_dev,
>> + np->put_tx_ctx->dma))
>> + return NETDEV_TX_BUSY;
>
> Really this is not going to work very well : caller will call this in a
> loop.
Any suggestions on what value should be returned, or does the caller need to be
modified?
Thanks for the review,
Larry
^ permalink raw reply
* Re: [PATCH] forcedeth: Fix WARNINGS that result when DMA mapping is not checked
From: Eric Dumazet @ 2012-12-27 21:03 UTC (permalink / raw)
To: Larry Finger; +Cc: linville, davem, linux-wireless, netdev, linux-kernel
In-Reply-To: <50DCB1D9.50906@lwfinger.net>
On Thu, 2012-12-27 at 14:38 -0600, Larry Finger wrote:
> On 12/27/2012 02:05 PM, Eric Dumazet wrote:
> > On Thu, 2012-12-27 at 13:42 -0600, Larry Finger wrote:
> >> + if (pci_dma_mapping_error(np->pci_dev,
> >> + np->put_tx_ctx->dma))
> >> + return NETDEV_TX_BUSY;
> >
> > Really this is not going to work very well : caller will call this in a
> > loop.
>
> Any suggestions on what value should be returned, or does the caller need to be
> modified?
NETDEV_TX_BUSY is really obsolete; see
Documentation/networking/driver.txt
In case of a mapping error, I would drop the packet
(kfree_skb() it, increment a device tx_dropped counter, and return
NETDEV_TX_OK).
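The drop-on-error pattern described above can be sketched in userspace like this. It is not the forcedeth fix: the fake_* types, the map_fails knob, and the TX_OK/TX_BUSY enum are hypothetical stand-ins for struct sk_buff, struct net_device, pci_dma_mapping_error() and the NETDEV_TX_* return codes, kept only so the control flow can be compiled and exercised outside the kernel.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for NETDEV_TX_OK / NETDEV_TX_BUSY; names are hypothetical. */
enum tx_status { TX_OK, TX_BUSY };

struct fake_skb { size_t len; };

struct fake_dev {
	unsigned long tx_dropped;  /* drop counter, like the device stats */
	unsigned long tx_packets;
	bool map_fails;            /* test knob: force the mapping to fail */
};

/* Stand-in for pci_map_single() followed by pci_dma_mapping_error(). */
static bool fake_map_failed(const struct fake_dev *dev,
			    const struct fake_skb *skb)
{
	(void)skb;
	return dev->map_fails;
}

/* Transmit path: on a mapping error, consume the skb and report OK. */
static enum tx_status fake_start_xmit(struct fake_skb *skb,
				      struct fake_dev *dev)
{
	if (fake_map_failed(dev, skb)) {
		free(skb);         /* kfree_skb() in a real driver */
		dev->tx_dropped++;
		return TX_OK;      /* skb is consumed, so nothing gets requeued */
	}
	dev->tx_packets++;
	free(skb);                 /* pretend the hardware completed the send */
	return TX_OK;
}
```

Returning the OK status after freeing the skb tells the stack the packet has been consumed, so it does not retry the same skb in a loop the way it would after a busy return.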
^ permalink raw reply
* Re: [PATCH] forcedeth: Fix WARNINGS that result when DMA mapping is not checked
From: David Miller @ 2012-12-27 21:32 UTC (permalink / raw)
To: eric.dumazet; +Cc: Larry.Finger, linville, linux-wireless, netdev, linux-kernel
In-Reply-To: <1356642209.30414.1411.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 27 Dec 2012 13:03:29 -0800
> On Thu, 2012-12-27 at 14:38 -0600, Larry Finger wrote:
>> On 12/27/2012 02:05 PM, Eric Dumazet wrote:
>> > On Thu, 2012-12-27 at 13:42 -0600, Larry Finger wrote:
>
>> >> + if (pci_dma_mapping_error(np->pci_dev,
>> >> + np->put_tx_ctx->dma))
>> >> + return NETDEV_TX_BUSY;
>> >
>> > Really this is not going to work very well : caller will call this in a
>> > loop.
>>
>> Any suggestions on what value should be returned, or does the caller need to be
>> modified?
>
> NETDEV_TX_BUSY is really obsolete
>
> Documentation/networking/driver.txt
>
> In case of mapping error, I would drop the packet.
Agreed.
^ permalink raw reply
* Re: Is keepalive behaving as expected in 3.7.0+/net-next?
From: Eric Dumazet @ 2012-12-27 21:54 UTC (permalink / raw)
To: Rick Jones; +Cc: netdev, Jamie Gloudon
In-Reply-To: <50D4DD28.30903@hp.com>
On Fri, 2012-12-21 at 14:05 -0800, Rick Jones wrote:
> I was looking to do a bit more documentation clean-up and thought I
> would work on the descriptions of the "keepalive" sysctls, but first I
> wanted to see if they behaved as the existing descriptions suggested:
>
> > tcp_keepalive_time - INTEGER
> > How often TCP sends out keepalive messages when keepalive is enabled.
> > Default: 2hours.
> >
> > tcp_keepalive_probes - INTEGER
> > How many keepalive probes TCP sends out, until it decides that the
> > connection is broken. Default value: 9.
> >
> > tcp_keepalive_intvl - INTEGER
> > How frequently the probes are send out. Multiplied by
> > tcp_keepalive_probes it is time to kill not responding connection,
> > after probes started. Default value: 75sec i.e. connection
> > will be aborted after ~11 minutes of retries.
>
> I interpreted all that as: When a connection is idle, TCP will
> send a keepalive probe every tcp_keepalive_time seconds. If a response
> to a keepalive probe is not received, TCP will resend (retransmit) it
> every tcp_keepalive_intvl seconds.
>
> However, what I see is that on a connection where the remote is indeed
> still there, only the first keepalive probe is sent after
> tcp_keepalive_time, and thereafter it is sent every tcp_keepalive_intvl
> seconds.
>
> Now, some of this may relate to my being impatient - rather than wait
> two hours for the first probe, I set tcp_keepalive_time to 3 seconds,
> and tcp_keepalive_intvl to 7 seconds. I then kicked-off a ./configure
> --intervals-enable netperf TCP_RR test with a burst of one and a wait
> time of 90 seconds and got the following (trimmed) trace:
>
> 13:43:46.879133 IP netnextraj.43054 > netnextraj2.srvr: Flags [S], seq
> 807869796, win 14600, options [mss 1460,sackOK,TS val 133470 ecr
> 0,nop,wscale 7], length 0
> 13:43:46.880091 IP netnextraj2.srvr > netnextraj.43054: Flags [S.], seq
> 1522345902, ack 807869797, win 14480, options [mss 1460,sackOK,TS val
> 136186 ecr 133470,nop,wscale 4], length 0
> 13:43:46.880114 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 1, win 115, options [nop,nop,TS val 133470 ecr 136186], length 0
> 13:43:46.880306 IP netnextraj.43054 > netnextraj2.srvr: Flags [P.], seq
> 1:11, ack 1, win 115, options [nop,nop,TS val 133470 ecr 136186], length 10
> 13:43:46.880948 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 11, win 905, options [nop,nop,TS val 136187 ecr 133470], length 0
> 13:43:46.880964 IP netnextraj2.srvr > netnextraj.43054: Flags [P.], seq
> 1:11, ack 11, win 905, options [nop,nop,TS val 136187 ecr 133470], length 10
> 13:43:46.881161 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 11, win 115, options [nop,nop,TS val 133470 ecr 136187], length 0
>
> The first probe above comes after 3 seconds - tcp_keepalive_time - at
> 13:43:49
>
> 13:43:49.886752 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 11, win 115, options [nop,nop,TS val 134222 ecr 136187], length 0
>
> And it does seem to elicit a response:
>
> 13:43:49.887530 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 11, win 905, options [nop,nop,TS val 136938 ecr 133470], length 0
>
> Now it starts sending probes every 7 seconds (tcp_keepalive_intvl):
>
> 13:43:56.903576 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 11, win 115, options [nop,nop,TS val 135976 ecr 136938], length 0
> 13:43:56.904480 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 11, win 905, options [nop,nop,TS val 138693 ecr 133470], length 0
> 13:44:03.910744 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 11, win 115, options [nop,nop,TS val 137728 ecr 138693], length 0
> 13:44:03.911623 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 11, win 905, options [nop,nop,TS val 140444 ecr 133470], length 0
>
> I've deleted the next 9 or so probes... It continues, and doesn't
> terminate the connection, so I assume it was happy with the responses to
> the probes.
>
> 13:45:13.990746 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 11, win 115, options [nop,nop,TS val 155248 ecr 156213], length 0
> 13:45:13.991578 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 11, win 905, options [nop,nop,TS val 157965 ecr 133470], length 0
>
> Now the next netperf transaction happens:
>
> 13:45:16.879222 IP netnextraj.43054 > netnextraj2.srvr: Flags [P.], seq
> 11:21, ack 11, win 115, options [nop,nop,TS val 155970 ecr 157965],
> length 10
> 13:45:16.880033 IP netnextraj2.srvr > netnextraj.43054: Flags [P.], seq
> 11:21, ack 21, win 905, options [nop,nop,TS val 158687 ecr 155970],
> length 10
> 13:45:16.880220 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 21, win 115, options [nop,nop,TS val 155970 ecr 158687], length 0
>
> But the next keepalive probe is tcp_keepalive_intvl seconds after the
> last one, rather than that many, or tcp_keepalive_time seconds after the
> connection was last "active."
>
> 13:45:20.998739 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 21, win 115, options [nop,nop,TS val 157000 ecr 158687], length 0
> 13:45:20.999754 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 21, win 905, options [nop,nop,TS val 159717 ecr 155970], length 0
> 13:45:28.006747 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 21, win 115, options [nop,nop,TS val 158752 ecr 159717], length 0
> 13:45:28.007624 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 21, win 905, options [nop,nop,TS val 161469 ecr 155970], length 0
>
> Is this the expected behaviour? If I reverse the values - make
> tcp_keepalive_time 7 and tcp_keepalive_intvl 3, it seems that all the
> probes are after 7 seconds.
>
> rick jones
Not sure if it makes sense to have
tcp_keepalive_intvl > tcp_keepalive_time.
time should be an order of magnitude bigger than intvl.

The keepalive timer is not reset each time we receive a valid frame;
that would be very expensive.
It's a long-period timer.

The first interval is tcp_keepalive_time, and subsequent intervals are
tcp_keepalive_intvl.

Each time the timer fires (once every 7200 seconds by default), we re-arm
it based on the observed elapsed time (keepalive_time_elapsed).

Fixing this would require adding a timestamp to the inet socket, to
remember the time of the next/last probe, and firing the timer using
min(keepalive_time_when(tp), keepalive_intvl_when(tp)).
Probably not worth it.
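
To make the re-arm behaviour described above concrete, here is a minimal user-space sketch. It is a simplified model, not the kernel's timer code: the names ka_model, ka_timer_fired and ka_simulate are hypothetical, times are whole seconds, and it assumes an otherwise idle connection whose peer answers every probe immediately (so the answer counts as "hearing from the peer").

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two sysctls, in seconds. */
struct ka_model {
    long time;   /* stands in for tcp_keepalive_time  */
    long intvl;  /* stands in for tcp_keepalive_intvl */
};

/*
 * Decision taken when the (single, long-period) timer fires.
 * `idle` is the number of seconds since the peer was last heard from.
 * Sets *send_probe and returns the delay until the timer fires again.
 */
static long ka_timer_fired(const struct ka_model *m, long idle, int *send_probe)
{
    if (idle >= m->time) {
        /* Idle long enough: probe now, come back after intvl. */
        *send_probe = 1;
        return m->intvl;
    }
    /* Not idle long enough yet: just wait out the remainder. */
    *send_probe = 0;
    return m->time - idle;
}

/*
 * Simulate an idle connection from t = 0 up to `horizon` seconds.
 * Assumption: the peer's ACK to each probe arrives right away.
 * Stores the send time of each probe in probe_at[] and returns the count.
 */
static size_t ka_simulate(const struct ka_model *m, long horizon,
                          long *probe_at, size_t max)
{
    long last_heard = 0;
    long next_fire;
    size_t n = 0;

    assert(m->time > 0 && m->intvl > 0);
    next_fire = m->time;  /* first interval is `time` */

    while (next_fire <= horizon && n < max) {
        long now = next_fire;
        int probe;
        long delay = ka_timer_fired(m, now - last_heard, &probe);

        if (probe) {
            probe_at[n++] = now;
            last_heard = now;  /* the probe was answered */
        }
        next_fire = now + delay;
    }
    return n;
}
```

Under these assumptions, time=3/intvl=7 gives probes every 7 seconds after the first one, and time=7/intvl=3 also gives probes every 7 seconds, because the timer firing 3 seconds after an answered probe finds the connection not yet idle for 7 seconds and only re-arms. That is consistent with both observations in Rick's report.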