Netdev List


* Re: hackbench regression due to commit 9dfc6e68bfe6e
From: Eric Dumazet @ 2010-04-06 22:10 UTC
  To: Christoph Lameter, netdev
  Cc: Zhang, Yanmin, Tejun Heo, Pekka Enberg, alex.shi,
	linux-kernel@vger.kernel.org, Ma, Ling, Chen, Tim C,
	Andrew Morton
In-Reply-To: <alpine.DEB.2.00.1004061552500.19151@router.home>

On Tuesday, 06 April 2010 at 15:55 -0500, Christoph Lameter wrote:
> We cannot reproduce the issue here. Our tests here (dual quad dell) show a
> performance increase in hackbench instead.
> 
> Linux 2.6.33.2 #2 SMP Mon Apr 5 11:30:56 CDT 2010 x86_64 GNU/Linux
> ./hackbench 100 process 200000
> Running with 100*40 (== 4000) tasks.
> Time: 3102.142
> ./hackbench 100 process 20000
> Running with 100*40 (== 4000) tasks.
> Time: 308.731
> ./hackbench 100 process 20000
> Running with 100*40 (== 4000) tasks.
> Time: 311.591
> ./hackbench 100 process 20000
> Running with 100*40 (== 4000) tasks.
> Time: 310.200
> ./hackbench 10 process 20000
> Running with 10*40 (== 400) tasks.
> Time: 38.048
> ./hackbench 10 process 20000
> Running with 10*40 (== 400) tasks.
> Time: 44.711
> ./hackbench 10 process 20000
> Running with 10*40 (== 400) tasks.
> Time: 39.407
> ./hackbench 1 process 20000
> Running with 1*40 (== 40) tasks.
> Time: 9.411
> ./hackbench 1 process 20000
> Running with 1*40 (== 40) tasks.
> Time: 8.765
> ./hackbench 1 process 20000
> Running with 1*40 (== 40) tasks.
> Time: 8.822
> 
> Linux 2.6.34-rc3 #1 SMP Tue Apr 6 13:30:34 CDT 2010 x86_64 GNU/Linux
> ./hackbench 100 process 200000
> Running with 100*40 (== 4000) tasks.
> Time: 3003.578
> ./hackbench 100 process 20000
> Running with 100*40 (== 4000) tasks.
> Time: 300.289
> ./hackbench 100 process 20000
> Running with 100*40 (== 4000) tasks.
> Time: 301.462
> ./hackbench 100 process 20000
> Running with 100*40 (== 4000) tasks.
> Time: 301.173
> ./hackbench 10 process 20000
> Running with 10*40 (== 400) tasks.
> Time: 41.191
> ./hackbench 10 process 20000
> Running with 10*40 (== 400) tasks.
> Time: 41.964
> ./hackbench 10 process 20000
> Running with 10*40 (== 400) tasks.
> Time: 41.470
> ./hackbench 1 process 20000
> Running with 1*40 (== 40) tasks.
> Time: 8.829
> ./hackbench 1 process 20000
> Running with 1*40 (== 40) tasks.
> Time: 9.166
> ./hackbench 1 process 20000
> Running with 1*40 (== 40) tasks.
> Time: 8.681
> 
> 


Well, your config might be very different... and hackbench results can
vary by 10% on the same machine with the same kernel.

This is not a reliable benchmark, because af_unix is not designed to handle
such a lazy workload.

We really should warn people about this.



# hackbench 25 process 3000
Running with 25*40 (== 1000) tasks.
Time: 12.922
# hackbench 25 process 3000
Running with 25*40 (== 1000) tasks.
Time: 12.696
# hackbench 25 process 3000
Running with 25*40 (== 1000) tasks.
Time: 13.060
# hackbench 25 process 3000
Running with 25*40 (== 1000) tasks.
Time: 14.108
# hackbench 25 process 3000
Running with 25*40 (== 1000) tasks.
Time: 13.165
# hackbench 25 process 3000
Running with 25*40 (== 1000) tasks.
Time: 13.310
# hackbench 25 process 3000 
Running with 25*40 (== 1000) tasks.
Time: 12.530
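
To put a number on the run-to-run variation, the seven timings above can be summarized with a small helper. This is only a sketch: the names summarize_runs() and struct run_stats are made up for illustration and are not part of hackbench or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of repeated benchmark timings, in seconds. */
struct run_stats {
	double mean;
	double min;
	double max;
	double spread;	/* (max - min) / mean, as a fraction */
};

/* Hypothetical helper: compute mean, extremes and relative spread. */
static struct run_stats summarize_runs(const double *t, size_t n)
{
	struct run_stats s;
	double sum = 0.0;
	size_t i;

	assert(t != NULL && n > 0);
	s.min = s.max = t[0];
	for (i = 0; i < n; i++) {
		sum += t[i];
		if (t[i] < s.min)
			s.min = t[i];
		if (t[i] > s.max)
			s.max = t[i];
	}
	s.mean = sum / (double)n;
	s.spread = (s.max - s.min) / s.mean;
	return s;
}

/* The seven "hackbench 25 process 3000" timings quoted above. */
static const double hackbench_25[] = {
	12.922, 12.696, 13.060, 14.108, 13.165, 13.310, 12.530
};
```

Calling summarize_runs(hackbench_25, 7) gives a mean of about 13.11 s and a min-to-max spread of about 12% of the mean, which is in line with the "vary by 10%" remark.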


Booting with slub_min_order=3 does change hackbench results, for example ;)

All writers can compete on the spinlock for a target UNIX socket, so we spend a _lot_ of time spinning.

If we _really_ want to speed up hackbench, we would have to change unix_state_lock()
to use a non-spinning locking primitive (aka lock_sock()), and slow down the normal path.
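
The trade-off can be pictured with a user-space analogy. This is a sketch using POSIX threads, not kernel code. Many writers hammer one counter, protected either by a pthread spinlock (waiters burn CPU, roughly like a contended spin_lock()) or by a pthread mutex (waiters may sleep, roughly like lock_sock()). All names here (run_contended(), struct shared, MAX_WRITERS) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WRITERS 64

enum lock_kind { LOCK_SPIN, LOCK_SLEEP };

struct shared {
	enum lock_kind kind;
	pthread_spinlock_t spin;	/* waiters busy-wait */
	pthread_mutex_t mutex;		/* waiters may block */
	long counter;
	long iters;
};

/* Each writer takes the lock once per increment of the shared counter. */
static void *writer(void *arg)
{
	struct shared *s = arg;
	long i;

	for (i = 0; i < s->iters; i++) {
		if (s->kind == LOCK_SPIN) {
			pthread_spin_lock(&s->spin);
			s->counter++;
			pthread_spin_unlock(&s->spin);
		} else {
			pthread_mutex_lock(&s->mutex);
			s->counter++;
			pthread_mutex_unlock(&s->mutex);
		}
	}
	return NULL;
}

/* Run nthreads writers against one lock and return the final counter. */
static long run_contended(enum lock_kind kind, int nthreads, long iters)
{
	pthread_t tid[MAX_WRITERS];
	struct shared s;
	int i;

	assert(nthreads > 0 && nthreads <= MAX_WRITERS);
	s.kind = kind;
	s.counter = 0;
	s.iters = iters;
	pthread_spin_init(&s.spin, PTHREAD_PROCESS_PRIVATE);
	pthread_mutex_init(&s.mutex, NULL);

	for (i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tid[i], NULL, writer, &s);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);

	pthread_spin_destroy(&s.spin);
	pthread_mutex_destroy(&s.mutex);
	return s.counter;
}
```

Both variants produce the same final count. The difference is where waiting writers spend their time, which is what a profiler such as perf would show. Timing run_contended(LOCK_SPIN, ...) against run_contended(LOCK_SLEEP, ...) with more writers than CPUs is one way to explore the effect, although user-space results do not transfer directly to af_unix.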


# perf record -f hackbench 25 process 3000 
Running with 25*40 (== 1000) tasks.
Time: 13.330
[ perf record: Woken up 289 times to write data ]
[ perf record: Captured and wrote 54.312 MB perf.data (~2372928 samples) ]
# perf report
# Samples: 2370135
#
# Overhead    Command                 Shared Object  Symbol
# ........  .........  ............................  ......
#
     9.68%  hackbench  [kernel]                      [k] do_raw_spin_lock
     6.50%  hackbench  [kernel]                      [k] schedule
     4.38%  hackbench  [kernel]                      [k] __kmalloc_track_caller
     3.95%  hackbench  [kernel]                      [k] copy_to_user
     3.86%  hackbench  [kernel]                      [k] __alloc_skb
     3.77%  hackbench  [kernel]                      [k] unix_stream_recvmsg
     3.12%  hackbench  [kernel]                      [k] sock_alloc_send_pskb
     2.75%  hackbench  [vdso]                        [.] 0x000000ffffe425
     2.28%  hackbench  [kernel]                      [k] sysenter_past_esp
     2.03%  hackbench  [kernel]                      [k] __mutex_lock_common
     2.00%  hackbench  [kernel]                      [k] kfree
     2.00%  hackbench  [kernel]                      [k] delay_tsc
     1.75%  hackbench  [kernel]                      [k] update_curr
     1.70%  hackbench  [kernel]                      [k] kmem_cache_alloc
     1.69%  hackbench  [kernel]                      [k] do_raw_spin_unlock
     1.60%  hackbench  [kernel]                      [k] unix_stream_sendmsg
     1.54%  hackbench  [kernel]                      [k] sched_clock_local
     1.46%  hackbench  [kernel]                      [k] __slab_free
     1.37%  hackbench  [kernel]                      [k] do_raw_read_lock
     1.34%  hackbench  [kernel]                      [k] __switch_to
     1.24%  hackbench  [kernel]                      [k] select_task_rq_fair
     1.23%  hackbench  [kernel]                      [k] sock_wfree
     1.21%  hackbench  [kernel]                      [k] _raw_spin_unlock_irqrestore
     1.19%  hackbench  [kernel]                      [k] __mutex_unlock_slowpath
     1.05%  hackbench  [kernel]                      [k] trace_hardirqs_off
     0.99%  hackbench  [kernel]                      [k] __might_sleep
     0.93%  hackbench  [kernel]                      [k] do_raw_read_unlock
     0.93%  hackbench  [kernel]                      [k] _raw_spin_lock
     0.91%  hackbench  [kernel]                      [k] try_to_wake_up
     0.81%  hackbench  [kernel]                      [k] sched_clock
     0.80%  hackbench  [kernel]                      [k] trace_hardirqs_on




* [PATCH 1/1] NET: usb: Adding URB_ZERO_PACKET flag to usbnet.c
From: Elina Pasheva @ 2010-04-07  0:23 UTC
  To: dbrownell; +Cc: epasheva, rfiler, netdev, linux-usb

Subject: [PATCH 1/1] NET: usb: Adding URB_ZERO_PACKET flag to usbnet.c
From: Elina Pasheva <epasheva@sierrawireless.com>
This patch sets the URB transfer flag URB_ZERO_PACKET before
submitting a URB for drivers that have requested it (by advertising the flag
FLAG_SEND_ZLP).
The modification is in the usbnet.c function usbnet_start_xmit().
This patch only adds the zero-length-packet flag.
A subsequent patch will address the buggy code we found for devices that do not
advertise FLAG_SEND_ZLP, in which case packets of non-deterministic length
can be transferred.

This patch has been tested on kernel-2.6.34-rc3.
This patch has been checked against net-2.6 tree.
Signed-off-by: Elina Pasheva <epasheva@sierrawireless.com>
Signed-off-by: Rory Filer <rfiler@sierrawireless.com>
---

 drivers/net/usb/usbnet.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

--- a/drivers/net/usb/usbnet.c	2010-04-06 10:52:54.000000000 -0700
+++ b/drivers/net/usb/usbnet.c	2010-04-06 16:54:44.000000000 -0700
@@ -1068,12 +1068,15 @@ netdev_tx_t usbnet_start_xmit (struct sk
 	 * NOTE:  strictly conforming cdc-ether devices should expect
 	 * the ZLP here, but ignore the one-byte packet.
 	 */
-	if (!(info->flags & FLAG_SEND_ZLP) && (length % dev->maxpacket) == 0) {
-		urb->transfer_buffer_length++;
-		if (skb_tailroom(skb)) {
-			skb->data[skb->len] = 0;
-			__skb_put(skb, 1);
-		}
+	if (length % dev->maxpacket == 0) {
+		if (!(info->flags & FLAG_SEND_ZLP)) {
+			urb->transfer_buffer_length++;
+			if (skb_tailroom(skb)) {
+				skb->data[skb->len] = 0;
+				__skb_put(skb, 1);
+			}
+		} else
+			urb->transfer_flags |= URB_ZERO_PACKET;
 	}

 	spin_lock_irqsave(&dev->txq.lock, flags);
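
The decision made by the patched hunk can be modelled outside the kernel as a pure function. The sketch below is illustrative only: SKETCH_FLAG_SEND_ZLP is a placeholder bit rather than the value from the kernel headers, and enum tail_action and choose_tail_action() are hypothetical names.

```c
#include <assert.h>

/* Placeholder bit for illustration; not copied from usbnet.h. */
#define SKETCH_FLAG_SEND_ZLP 0x1u

enum tail_action {
	TAIL_NONE,		/* length is not a multiple of maxpacket */
	TAIL_PAD_BYTE,		/* extend the transfer by one byte */
	TAIL_ZERO_PACKET	/* request a zero-length packet instead */
};

/*
 * Mirror of the patched logic: a transfer whose length is an exact
 * multiple of the endpoint's max packet size needs a terminator.
 * Drivers that advertise the ZLP flag get a zero-length packet,
 * all others get the one-byte padding.
 */
static enum tail_action choose_tail_action(unsigned int length,
					   unsigned int maxpacket,
					   unsigned int driver_flags)
{
	assert(maxpacket > 0);
	if (length % maxpacket != 0)
		return TAIL_NONE;
	if (driver_flags & SKETCH_FLAG_SEND_ZLP)
		return TAIL_ZERO_PACKET;
	return TAIL_PAD_BYTE;
}
```

For a 512-byte max packet size, a 1024-byte frame yields TAIL_PAD_BYTE without the flag and TAIL_ZERO_PACKET with it, while a 100-byte frame yields TAIL_NONE in both cases.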


* RE: [PATCH v1 2/3] Provides multiple submits and asynchronous notifications.
From: Xin, Xiaohui @ 2010-04-07  1:36 UTC
  To: Michael S. Tsirkin
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu, jdike@addtoit.com
In-Reply-To: <20100406075103.GC9662@redhat.com>

Michael,
> > >>>> For the write logging, do you have a function at hand that we can use to
> > >>>> recompute the log? If so, I think I can use it to recompute the
> > >>>> log info when logging is suddenly enabled.
> > >>>> For the outstanding requests, do you mean all the user buffers that were
> > >>>> submitted before the logging ioctl changed? That may be a lot, and
> > >>>> some of them are still in NIC ring descriptors. Waiting for them to
> > >>>> finish may take some time. I think it is also reasonable for the logging
> > >>>> change to take effect just after the logging ioctl changes.

> > >>>The key point is that after the logging ioctl returns, any
> > >>>subsequent change to memory must be logged. It does not
> > >>>matter when the request was submitted, otherwise we will
> > >>>get memory corruption on migration.

> > >>The change to memory happens in vhost_add_used_and_signal(), right?
> > >>So after the ioctl returns, it is OK to just recompute the log info for the events
> > >>in the async queue, since the ioctl and write log operations are all protected by vq->mutex.

> >>> Thanks
> >>> Xiaohui

> >>Yes, I think this will work.

>> Thanks. So do you have a function at hand to recompute the log info that I can
>> use? I vaguely remember that you mentioned it some time ago.

>Doesn't just rerunning vhost_get_vq_desc work?

Am I missing something here?
vhost_get_vq_desc() looks in the vq, finds the first available buffer, and converts it
to an iovec. I think the first available buffer is not one of the buffers in the async queue,
so I think rerunning vhost_get_vq_desc() cannot work.

Thanks
Xiaohui
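
The objection can be illustrated with a toy model. It is a deliberately simplified sketch with hypothetical names (struct toy_vq, toy_get_desc() and so on), not the vhost data structures. The queue hands out the next head the host has not yet taken, while heads already submitted to the async backend sit in a separate in-flight list.

```c
#include <assert.h>

#define TOY_RING_SIZE 8
#define TOY_NO_DESC   (-1)

/* Toy virtqueue; field names are illustrative only. */
struct toy_vq {
	int avail[TOY_RING_SIZE];	/* heads published by the guest */
	unsigned int avail_idx;		/* how many heads were published */
	unsigned int last_avail_idx;	/* how many heads the host has taken */
	int inflight[TOY_RING_SIZE];	/* heads submitted, not yet completed */
	unsigned int inflight_cnt;
};

/* Guest side: make one more head available. */
static void toy_publish(struct toy_vq *vq, int head)
{
	assert(vq->avail_idx - vq->last_avail_idx < TOY_RING_SIZE);
	vq->avail[vq->avail_idx++ % TOY_RING_SIZE] = head;
}

/* Host side: take the next head not yet taken, or TOY_NO_DESC. */
static int toy_get_desc(struct toy_vq *vq)
{
	if (vq->last_avail_idx == vq->avail_idx)
		return TOY_NO_DESC;
	return vq->avail[vq->last_avail_idx++ % TOY_RING_SIZE];
}

/* Record a head as handed to the async backend. */
static void toy_submit_async(struct toy_vq *vq, int head)
{
	assert(vq->inflight_cnt < TOY_RING_SIZE);
	vq->inflight[vq->inflight_cnt++] = head;
}

/* Is this head still waiting for completion? */
static int toy_is_inflight(const struct toy_vq *vq, int head)
{
	unsigned int i;

	for (i = 0; i < vq->inflight_cnt; i++)
		if (vq->inflight[i] == head)
			return 1;
	return 0;
}
```

In this model, once two heads have been taken and submitted, another toy_get_desc() call returns a third, new head rather than either in-flight one. Under that assumption, recomputing log info for in-flight requests would need to walk the in-flight list itself.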

> > > Thanks
> > > Xiaohui
> > >
> > >  drivers/vhost/net.c   |  189 +++++++++++++++++++++++++++++++++++++++++++++++--
> > >  drivers/vhost/vhost.h |   10 +++
> > >  2 files changed, 192 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > index 22d5fef..2aafd90 100644
> > > --- a/drivers/vhost/net.c
> > > +++ b/drivers/vhost/net.c
> > > @@ -17,11 +17,13 @@
> > >  #include <linux/workqueue.h>
> > >  #include <linux/rcupdate.h>
> > >  #include <linux/file.h>
> > > +#include <linux/aio.h>
> > >
> > >  #include <linux/net.h>
> > >  #include <linux/if_packet.h>
> > >  #include <linux/if_arp.h>
> > >  #include <linux/if_tun.h>
> > > +#include <linux/mpassthru.h>
> > >
> > >  #include <net/sock.h>
> > >
> > > @@ -47,6 +49,7 @@ struct vhost_net {
> > >   struct vhost_dev dev;
> > >   struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
> > >   struct vhost_poll poll[VHOST_NET_VQ_MAX];
> > > + struct kmem_cache       *cache;
> > >   /* Tells us whether we are polling a socket for TX.
> > >    * We only do this when socket buffer fills up.
> > >    * Protected by tx vq lock. */
> > > @@ -91,11 +94,88 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
> > >   net->tx_poll_state = VHOST_NET_POLL_STARTED;
> > >  }
> > >
> > > +struct kiocb *notify_dequeue(struct vhost_virtqueue *vq)
> > > +{
> > > + struct kiocb *iocb = NULL;
> > > + unsigned long flags;
> > > +
> > > + spin_lock_irqsave(&vq->notify_lock, flags);
> > > + if (!list_empty(&vq->notifier)) {
> > > +         iocb = list_first_entry(&vq->notifier,
> > > +                         struct kiocb, ki_list);
> > > +         list_del(&iocb->ki_list);
> > > + }
> > > + spin_unlock_irqrestore(&vq->notify_lock, flags);
> > > + return iocb;
> > > +}
> > > +
> > > +static void handle_async_rx_events_notify(struct vhost_net *net,
> > > +                                 struct vhost_virtqueue *vq)
> > > +{
> > > + struct kiocb *iocb = NULL;
> > > + struct vhost_log *vq_log = NULL;
> > > + int rx_total_len = 0;
> > > + int log, size;
> > > +
> > > + if (vq->link_state != VHOST_VQ_LINK_ASYNC)
> > > +         return;
> > > +
> > > + if (vq->receiver)
> > > +         vq->receiver(vq);
> > > +
> > > + vq_log = unlikely(vhost_has_feature(
> > > +                         &net->dev, VHOST_F_LOG_ALL)) ? vq->log : NULL;
> > > + while ((iocb = notify_dequeue(vq)) != NULL) {
> > > +         vhost_add_used_and_signal(&net->dev, vq,
> > > +                         iocb->ki_pos, iocb->ki_nbytes);
> > > +         log = (int)iocb->ki_user_data;
> > > +         size = iocb->ki_nbytes;
> > > +         rx_total_len += iocb->ki_nbytes;
> > > +
> > > +         if (iocb->ki_dtor)
> > > +                 iocb->ki_dtor(iocb);
> > > +         kmem_cache_free(net->cache, iocb);
> > > +
> > > +         if (unlikely(vq_log))
> > > +                 vhost_log_write(vq, vq_log, log, size);
> > > +         if (unlikely(rx_total_len >= VHOST_NET_WEIGHT)) {
> > > +                 vhost_poll_queue(&vq->poll);
> > > +                 break;
> > > +         }
> > > + }
> > > +}
> > > +
> > > +static void handle_async_tx_events_notify(struct vhost_net *net,
> > > +                                 struct vhost_virtqueue *vq)
> > > +{
> > > + struct kiocb *iocb = NULL;
> > > + int tx_total_len = 0;
> > > +
> > > + if (vq->link_state != VHOST_VQ_LINK_ASYNC)
> > > +         return;
> > > +
> > > + while ((iocb = notify_dequeue(vq)) != NULL) {
> > > +         vhost_add_used_and_signal(&net->dev, vq,
> > > +                         iocb->ki_pos, 0);
> > > +         tx_total_len += iocb->ki_nbytes;
> > > +
> > > +         if (iocb->ki_dtor)
> > > +                 iocb->ki_dtor(iocb);
> > > +
> > > +         kmem_cache_free(net->cache, iocb);
> > > +         if (unlikely(tx_total_len >= VHOST_NET_WEIGHT)) {
> > > +                 vhost_poll_queue(&vq->poll);
> > > +                 break;
> > > +         }
> > > + }
> > > +}
> > > +
> > >  /* Expects to be always run from workqueue - which acts as
> > >   * read-size critical section for our kind of RCU. */
> > >  static void handle_tx(struct vhost_net *net)
> > >  {
> > >   struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
> > > + struct kiocb *iocb = NULL;
> > >   unsigned head, out, in, s;
> > >   struct msghdr msg = {
> > >           .msg_name = NULL,
> > > @@ -124,6 +204,8 @@ static void handle_tx(struct vhost_net *net)
> > >           tx_poll_stop(net);
> > >   hdr_size = vq->hdr_size;
> > >
> > > + handle_async_tx_events_notify(net, vq);
> > > +
> > >   for (;;) {
> > >           head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
> > >                                    ARRAY_SIZE(vq->iov),
> > > @@ -151,6 +233,15 @@ static void handle_tx(struct vhost_net *net)
> > >           /* Skip header. TODO: support TSO. */
> > >           s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
> > >           msg.msg_iovlen = out;
> > > +
> > > +         if (vq->link_state == VHOST_VQ_LINK_ASYNC) {
> > > +                 iocb = kmem_cache_zalloc(net->cache, GFP_KERNEL);
> > > +                 if (!iocb)
> > > +                         break;
> > > +                 iocb->ki_pos = head;
> > > +                 iocb->private = (void *)vq;
> > > +         }
> > > +
> > >           len = iov_length(vq->iov, out);
> > >           /* Sanity check */
> > >           if (!len) {
> > > @@ -160,12 +251,16 @@ static void handle_tx(struct vhost_net *net)
> > >                   break;
> > >           }
> > >           /* TODO: Check specific error and bomb out unless ENOBUFS? */
> > > -         err = sock->ops->sendmsg(NULL, sock, &msg, len);
> > > +         err = sock->ops->sendmsg(iocb, sock, &msg, len);
> > >           if (unlikely(err < 0)) {
> > >                   vhost_discard_vq_desc(vq);
> > >                   tx_poll_start(net, sock);
> > >                   break;
> > >           }
> > > +
> > > +         if (vq->link_state == VHOST_VQ_LINK_ASYNC)
> > > +                 continue;
> > > +
> > >           if (err != len)
> > >                   pr_err("Truncated TX packet: "
> > >                          " len %d != %zd\n", err, len);
> > > @@ -177,6 +272,8 @@ static void handle_tx(struct vhost_net *net)
> > >           }
> > >   }
> > >
> > > + handle_async_tx_events_notify(net, vq);
> > > +
> > >   mutex_unlock(&vq->mutex);
> > >   unuse_mm(net->dev.mm);
> > >  }
> > > @@ -186,6 +283,7 @@ static void handle_tx(struct vhost_net *net)
> > >  static void handle_rx(struct vhost_net *net)
> > >  {
> > >   struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
> > > + struct kiocb *iocb = NULL;
> > >   unsigned head, out, in, log, s;
> > >   struct vhost_log *vq_log;
> > >   struct msghdr msg = {
> > > @@ -206,7 +304,8 @@ static void handle_rx(struct vhost_net *net)
> > >   int err;
> > >   size_t hdr_size;
> > >   struct socket *sock = rcu_dereference(vq->private_data);
> > > - if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
> > > + if (!sock || (skb_queue_empty(&sock->sk->sk_receive_queue) &&
> > > +                 vq->link_state == VHOST_VQ_LINK_SYNC))
> > >           return;
> > >
> > >   use_mm(net->dev.mm);
> > > @@ -214,9 +313,18 @@ static void handle_rx(struct vhost_net *net)
> > >   vhost_disable_notify(vq);
> > >   hdr_size = vq->hdr_size;
> > >
> > > - vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
> > > + /* In async cases, for write logging, the simple way is to get
> > > +  * the log info always, and really logging is decided later.
> > > +  * Thus, when logging enabled, we can get log, and when logging
> > > +  * disabled, we can get log disabled accordingly.
> > > +  */
> > > +
> > > + vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) |
> > > +         (vq->link_state == VHOST_VQ_LINK_ASYNC) ?
> > >           vq->log : NULL;
> > >
> > > + handle_async_rx_events_notify(net, vq);
> > > +
> > >   for (;;) {
> > >           head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
> > >                                    ARRAY_SIZE(vq->iov),
> > > @@ -245,6 +353,14 @@ static void handle_rx(struct vhost_net *net)
> > >           s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, in);
> > >           msg.msg_iovlen = in;
> > >           len = iov_length(vq->iov, in);
> > > +         if (vq->link_state == VHOST_VQ_LINK_ASYNC) {
> > > +                 iocb = kmem_cache_zalloc(net->cache, GFP_KERNEL);
> > > +                 if (!iocb)
> > > +                         break;
> > > +                 iocb->private = vq;
> > > +                 iocb->ki_pos = head;
> > > +                 iocb->ki_user_data = log;
> > > +         }
> > >           /* Sanity check */
> > >           if (!len) {
> > >                   vq_err(vq, "Unexpected header len for RX: "
> > > @@ -252,13 +368,18 @@ static void handle_rx(struct vhost_net *net)
> > >                          iov_length(vq->hdr, s), hdr_size);
> > >                   break;
> > >           }
> > > -         err = sock->ops->recvmsg(NULL, sock, &msg,
> > > +
> > > +         err = sock->ops->recvmsg(iocb, sock, &msg,
> > >                                    len, MSG_DONTWAIT | MSG_TRUNC);
> > >           /* TODO: Check specific error and bomb out unless EAGAIN? */
> > >           if (err < 0) {
> > >                   vhost_discard_vq_desc(vq);
> > >                   break;
> > >           }
> > > +
> > > +         if (vq->link_state == VHOST_VQ_LINK_ASYNC)
> > > +                 continue;
> > > +
> > >           /* TODO: Should check and handle checksum. */
> > >           if (err > len) {
> > >                   pr_err("Discarded truncated rx packet: "
> > > @@ -284,10 +405,13 @@ static void handle_rx(struct vhost_net *net)
> > >           }
> > >   }
> > >
> > > + handle_async_rx_events_notify(net, vq);
> > > +
> > >   mutex_unlock(&vq->mutex);
> > >   unuse_mm(net->dev.mm);
> > >  }
> > >
> > > +
> > >  static void handle_tx_kick(struct work_struct *work)
> > >  {
> > >   struct vhost_virtqueue *vq;
> > > @@ -338,6 +462,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
> > >   vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT);
> > >   vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN);
> > >   n->tx_poll_state = VHOST_NET_POLL_DISABLED;
> > > + n->cache = NULL;
> > >   return 0;
> > >  }
> > >
> > > @@ -398,6 +523,17 @@ static void vhost_net_flush(struct vhost_net *n)
> > >   vhost_net_flush_vq(n, VHOST_NET_VQ_RX);
> > >  }
> > >
> > > +static void vhost_notifier_cleanup(struct vhost_net *n)
> > > +{
> > > + struct vhost_virtqueue *vq = &n->dev.vqs[VHOST_NET_VQ_RX];
> > > + struct kiocb *iocb = NULL;
> > > + if (n->cache) {
> > > +         while ((iocb = notify_dequeue(vq)) != NULL)
> > > +                 kmem_cache_free(n->cache, iocb);
> > > +         kmem_cache_destroy(n->cache);
> > > + }
> > > +}
> > > +
> > >  static int vhost_net_release(struct inode *inode, struct file *f)
> > >  {
> > >   struct vhost_net *n = f->private_data;
> > > @@ -414,6 +550,7 @@ static int vhost_net_release(struct inode *inode, struct file *f)
> > >   /* We do an extra flush before freeing memory,
> > >    * since jobs can re-queue themselves. */
> > >   vhost_net_flush(n);
> > > + vhost_notifier_cleanup(n);
> > >   kfree(n);
> > >   return 0;
> > >  }
> > > @@ -462,7 +599,19 @@ static struct socket *get_tun_socket(int fd)
> > >   return sock;
> > >  }
> > >
> > > -static struct socket *get_socket(int fd)
> > > +static struct socket *get_mp_socket(int fd)
> > > +{
> > > + struct file *file = fget(fd);
> > > + struct socket *sock;
> > > + if (!file)
> > > +         return ERR_PTR(-EBADF);
> > > + sock = mp_get_socket(file);
> > > + if (IS_ERR(sock))
> > > +         fput(file);
> > > + return sock;
> > > +}
> > > +
> > > +static struct socket *get_socket(struct vhost_virtqueue *vq, int fd)
> > >  {
> > >   struct socket *sock;
> > >   if (fd == -1)
> > > @@ -473,9 +622,31 @@ static struct socket *get_socket(int fd)
> > >   sock = get_tun_socket(fd);
> > >   if (!IS_ERR(sock))
> > >           return sock;
> > > + sock = get_mp_socket(fd);
> > > + if (!IS_ERR(sock)) {
> > > +         vq->link_state = VHOST_VQ_LINK_ASYNC;
> > > +         return sock;
> > > + }
> > >   return ERR_PTR(-ENOTSOCK);
> > >  }
> > >
> > > +static void vhost_init_link_state(struct vhost_net *n, int index)
> > > +{
> > > + struct vhost_virtqueue *vq = n->vqs + index;
> > > +
> > > + WARN_ON(!mutex_is_locked(&vq->mutex));
> > > + if (vq->link_state == VHOST_VQ_LINK_ASYNC) {
> > > +         vq->receiver = NULL;
> > > +         INIT_LIST_HEAD(&vq->notifier);
> > > +         spin_lock_init(&vq->notify_lock);
> > > +         if (!n->cache) {
> > > +                 n->cache = kmem_cache_create("vhost_kiocb",
> > > +                                 sizeof(struct kiocb), 0,
> > > +                                 SLAB_HWCACHE_ALIGN, NULL);
> > > +         }
> > > + }
> > > +}
> > > +
> > >  static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> > >  {
> > >   struct socket *sock, *oldsock;
> > > @@ -493,12 +664,15 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> > >   }
> > >   vq = n->vqs + index;
> > >   mutex_lock(&vq->mutex);
> > > - sock = get_socket(fd);
> > > + vq->link_state = VHOST_VQ_LINK_SYNC;
> > > + sock = get_socket(vq, fd);
> > >   if (IS_ERR(sock)) {
> > >           r = PTR_ERR(sock);
> > >           goto err;
> > >   }
> > >
> > > + vhost_init_link_state(n, index);
> > > +
> > >   /* start polling new socket */
> > >   oldsock = vq->private_data;
> > >   if (sock == oldsock)
> > > @@ -507,8 +681,8 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> > >   vhost_net_disable_vq(n, vq);
> > >   rcu_assign_pointer(vq->private_data, sock);
> > >   vhost_net_enable_vq(n, vq);
> > > - mutex_unlock(&vq->mutex);
> > >  done:
> > > + mutex_unlock(&vq->mutex);
> > >   mutex_unlock(&n->dev.mutex);
> > >   if (oldsock) {
> > >           vhost_net_flush_vq(n, index);
> > > @@ -516,6 +690,7 @@ done:
> > >   }
> > >   return r;
> > >  err:
> > > + mutex_unlock(&vq->mutex);
> > >   mutex_unlock(&n->dev.mutex);
> > >   return r;
> > >  }
> > > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> > > index d1f0453..cffe39a 100644
> > > --- a/drivers/vhost/vhost.h
> > > +++ b/drivers/vhost/vhost.h
> > > @@ -43,6 +43,11 @@ struct vhost_log {
> > >   u64 len;
> > >  };
> > >
> > > +enum vhost_vq_link_state {
> > > + VHOST_VQ_LINK_SYNC =    0,
> > > + VHOST_VQ_LINK_ASYNC =   1,
> > > +};
> > > +
> > >  /* The virtqueue structure describes a queue attached to a device. */
> > >  struct vhost_virtqueue {
> > >   struct vhost_dev *dev;
> > > @@ -96,6 +101,11 @@ struct vhost_virtqueue {
> > >   /* Log write descriptors */
> > >   void __user *log_base;
> > >   struct vhost_log log[VHOST_NET_MAX_SG];
> > > + /* Differentiate async socket for 0-copy from normal */
> > > + enum vhost_vq_link_state link_state;
> > > + struct list_head notifier;
> > > + spinlock_t notify_lock;
> > > + void (*receiver)(struct vhost_virtqueue *);
> > >  };
> > >
> > >  struct vhost_dev {
> > > --
> > > 1.5.4.4


* [PATCH] PCI: Disable MSI for MCP55 on P5N32-E SLI
From: Ben Hutchings @ 2010-04-07  2:27 UTC
  To: linux-pci, netdev; +Cc: 552299

As reported in <http://bugs.debian.org/552299>, MSI appears to be
broken for this on-board device.  We already have a quirk for the
P5N32-SLI Premium; extend it to cover both variants of the board.

Reported-by: Romain DEGEZ <romain.degez@smartjog.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@kernel.org
---
 drivers/pci/quirks.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 27c0e6e..4807825 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2218,15 +2218,16 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SERVERWORKS,
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_8132_BRIDGE,
 			 ht_enable_msi_mapping);
 
-/* The P5N32-SLI Premium motherboard from Asus has a problem with msi
+/* The P5N32-SLI motherboards from Asus have a problem with msi
  * for the MCP55 NIC. It is not yet determined whether the msi problem
  * also affects other devices. As for now, turn off msi for this device.
  */
 static void __devinit nvenet_msi_disable(struct pci_dev *dev)
 {
-	if (dmi_name_in_vendors("P5N32-SLI PREMIUM")) {
+	if (dmi_name_in_vendors("P5N32-SLI PREMIUM") ||
+	    dmi_name_in_vendors("P5N32-E SLI")) {
 		dev_info(&dev->dev,
-			 "Disabling msi for MCP55 NIC on P5N32-SLI Premium\n");
+			 "Disabling msi for MCP55 NIC on P5N32-SLI\n");
 		dev->no_msi = 1;
 	}
 }
-- 
1.7.0.3
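
Why a second string is needed can be seen with a small stand-alone sketch. It assumes the board name is matched by substring search, which is my understanding of dmi_name_in_vendors() and should be treated as an assumption. board_needs_msi_quirk() and the table are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Board name fragments from the patch above. */
static const char *const msi_quirk_boards[] = {
	"P5N32-SLI PREMIUM",
	"P5N32-E SLI",
};

/* Return 1 if the given DMI string contains any listed board name. */
static int board_needs_msi_quirk(const char *dmi_string)
{
	size_t i;

	assert(dmi_string != NULL);
	for (i = 0; i < sizeof(msi_quirk_boards) / sizeof(msi_quirk_boards[0]); i++)
		if (strstr(dmi_string, msi_quirk_boards[i]) != NULL)
			return 1;
	return 0;
}
```

With substring matching, "P5N32-E SLI" does not contain "P5N32-SLI PREMIUM" (the "-E" sits in the middle of the name), so the original single check would never fire on that board. Listing both strings covers both variants.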




* Re: [v2 Patch 3/3] bonding: make bonding support netpoll
From: Cong Wang @ 2010-04-07  2:32 UTC
  To: Andy Gospodarek
  Cc: linux-kernel, Matt Mackall, netdev, bridge, Andy Gospodarek,
	Neil Horman, Jeff Moyer, Stephen Hemminger, bonding-devel,
	Jay Vosburgh, David Miller
In-Reply-To: <20100406144824.GB10488@gospo.rdu.redhat.com>

Andy Gospodarek wrote:
> On Tue, Apr 06, 2010 at 12:38:16PM +0800, Cong Wang wrote:
>> Cong Wang wrote:
>>> Before I try to reproduce it, could you please try replacing the 'read_lock()'
>>> in slaves_support_netpoll() with 'read_lock_bh()' (and read_unlock() too)?
>>> See if this helps.
>>>
>> Confirmed. Please use the attached patch instead, for your testing.
>>
>> Thanks!
>>
> 
> Moving those locks to bh-locks will not resolve this.  I tried that
> yesterday and tried your new patch today without success.  That warning
> is a WARN_ON_ONCE so you need to reboot to see that it is still a
> problem.  Simply unloading and loading the new module is not an accurate
> test.
> 
> Also, my system still hangs when removing the bonding module.  I do not
> think you intended to fix this with the patch, but wanted it to be clear
> to everyone on the list.


Actually I did reboot and then tested the module. I didn't get any warning.
I just tried again today, and no warnings at all.

For removing the bonding module, you may need another fix of mine,
which fixes a potential workqueue deadlock. Try:

http://lkml.org/lkml/2010/4/1/58

> 
> You should also configure your kernel with some of the lock debugging
> options enabled.  I've been using the following:
> 
> CONFIG_DETECT_HUNG_TASK=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_PROVE_LOCKING=y
> CONFIG_LOCKDEP=y
> CONFIG_LOCK_STAT=y
> CONFIG_DEBUG_LOCKDEP=y


Sure, I always keep these.

> 
> Here is the output when I remove a slave from the bond.  My
> xmit_roundrobin patch from earlier (replacing read_lock with
> read_trylock) was applied.  It might be helpful for you when debugging
> these issues.


I didn't apply your patch; I only tested mine.

> 
> Dead loop on virtual device bond0, fix it urgently!
> 

Please provide your bonding configuration and steps to reproduce it.

What I did is:

1. Load bonding module with "mode=0 miimon=100"
2. Enslave eth0 and activate bond0
3. Load netconsole and send messages via bond0
4. Remove eth0 from bond0
5. Remove bonding module
6. Remove netconsole module

And no deadlocks, no warnings.

Thanks.


* Re: hackbench regression due to commit 9dfc6e68bfe6e
From: Zhang, Yanmin @ 2010-04-07  2:34 UTC
  To: Eric Dumazet
  Cc: Christoph Lameter, netdev, Tejun Heo, Pekka Enberg, alex.shi,
	linux-kernel@vger.kernel.org, Ma, Ling, Chen, Tim C,
	Andrew Morton
In-Reply-To: <1270591841.2091.170.camel@edumazet-laptop>

On Wed, 2010-04-07 at 00:10 +0200, Eric Dumazet wrote:
> On Tuesday, 06 April 2010 at 15:55 -0500, Christoph Lameter wrote:
> > We cannot reproduce the issue here. Our tests here (dual quad dell) show a
> > performance increase in hackbench instead.
> > 
> > Linux 2.6.33.2 #2 SMP Mon Apr 5 11:30:56 CDT 2010 x86_64 GNU/Linux
> > ./hackbench 100 process 200000
> > Running with 100*40 (== 4000) tasks.
> > Time: 3102.142
> > ./hackbench 100 process 20000
> > Running with 100*40 (== 4000) tasks.
> > Time: 308.731
> > ./hackbench 100 process 20000
> > Running with 100*40 (== 4000) tasks.
> > Time: 311.591
> > ./hackbench 100 process 20000
> > Running with 100*40 (== 4000) tasks.
> > Time: 310.200
> > ./hackbench 10 process 20000
> > Running with 10*40 (== 400) tasks.
> > Time: 38.048
> > ./hackbench 10 process 20000
> > Running with 10*40 (== 400) tasks.
> > Time: 44.711
> > ./hackbench 10 process 20000
> > Running with 10*40 (== 400) tasks.
> > Time: 39.407
> > ./hackbench 1 process 20000
> > Running with 1*40 (== 40) tasks.
> > Time: 9.411
> > ./hackbench 1 process 20000
> > Running with 1*40 (== 40) tasks.
> > Time: 8.765
> > ./hackbench 1 process 20000
> > Running with 1*40 (== 40) tasks.
> > Time: 8.822
> > 
> > Linux 2.6.34-rc3 #1 SMP Tue Apr 6 13:30:34 CDT 2010 x86_64 GNU/Linux
> > ./hackbench 100 process 200000
> > Running with 100*40 (== 4000) tasks.
> > Time: 3003.578
> > ./hackbench 100 process 20000
> > Running with 100*40 (== 4000) tasks.
> > Time: 300.289
> > ./hackbench 100 process 20000
> > Running with 100*40 (== 4000) tasks.
> > Time: 301.462
> > ./hackbench 100 process 20000
> > Running with 100*40 (== 4000) tasks.
> > Time: 301.173
> > ./hackbench 10 process 20000
> > Running with 10*40 (== 400) tasks.
> > Time: 41.191
> > ./hackbench 10 process 20000
> > Running with 10*40 (== 400) tasks.
> > Time: 41.964
> > ./hackbench 10 process 20000
> > Running with 10*40 (== 400) tasks.
> > Time: 41.470
> > ./hackbench 1 process 20000
> > Running with 1*40 (== 40) tasks.
> > Time: 8.829
> > ./hackbench 1 process 20000
> > Running with 1*40 (== 40) tasks.
> > Time: 9.166
> > ./hackbench 1 process 20000
> > Running with 1*40 (== 40) tasks.
> > Time: 8.681
> > 
> > 
> 
> 
> Well, your config might be very different... and hackbench results can
> vary by 10% on same machine, same kernel.
> 
> This is not a reliable bench, because af_unix is not prepared to get
> such a lazy workload.
Thanks, I found that as well. Normally my script runs hackbench 3 times and
takes the average. To reduce the variation, I use
'./hackbench 100 process 200000', which gives a more stable result.


> 
> We really should warn people about this.
> 
> 
> 
> # hackbench 25 process 3000
> Running with 25*40 (== 1000) tasks.
> Time: 12.922
> # hackbench 25 process 3000
> Running with 25*40 (== 1000) tasks.
> Time: 12.696
> # hackbench 25 process 3000
> Running with 25*40 (== 1000) tasks.
> Time: 13.060
> # hackbench 25 process 3000
> Running with 25*40 (== 1000) tasks.
> Time: 14.108
> # hackbench 25 process 3000
> Running with 25*40 (== 1000) tasks.
> Time: 13.165
> # hackbench 25 process 3000
> Running with 25*40 (== 1000) tasks.
> Time: 13.310
> # hackbench 25 process 3000 
> Running with 25*40 (== 1000) tasks.
> Time: 12.530
> 
> 
> booting with slub_min_order=3 do change hackbench results for example ;)
By default, slub_min_order=3 on my Nehalem machines. I also tried several
larger slub_min_order values, and they did not help.


> 
> All writers can compete on spinlock for a target UNIX socket, we spend _lot_ of time spinning.
> 
> If we _really_ want to speedup hackbench, we would have to change unix_state_lock()
> to use a non spinning locking primitive (aka lock_sock()), and slowdown normal path.
> 
> 
> # perf record -f hackbench 25 process 3000 
> Running with 25*40 (== 1000) tasks.
> Time: 13.330
> [ perf record: Woken up 289 times to write data ]
> [ perf record: Captured and wrote 54.312 MB perf.data (~2372928 samples) ]
> # perf report
> # Samples: 2370135
> #
> # Overhead    Command                 Shared Object  Symbol
> # ........  .........  ............................  ......
> #
>      9.68%  hackbench  [kernel]                      [k] do_raw_spin_lock
>      6.50%  hackbench  [kernel]                      [k] schedule
>      4.38%  hackbench  [kernel]                      [k] __kmalloc_track_caller
>      3.95%  hackbench  [kernel]                      [k] copy_to_user
>      3.86%  hackbench  [kernel]                      [k] __alloc_skb
>      3.77%  hackbench  [kernel]                      [k] unix_stream_recvmsg
>      3.12%  hackbench  [kernel]                      [k] sock_alloc_send_pskb
>      2.75%  hackbench  [vdso]                        [.] 0x000000ffffe425
>      2.28%  hackbench  [kernel]                      [k] sysenter_past_esp
>      2.03%  hackbench  [kernel]                      [k] __mutex_lock_common
>      2.00%  hackbench  [kernel]                      [k] kfree
>      2.00%  hackbench  [kernel]                      [k] delay_tsc
>      1.75%  hackbench  [kernel]                      [k] update_curr
>      1.70%  hackbench  [kernel]                      [k] kmem_cache_alloc
>      1.69%  hackbench  [kernel]                      [k] do_raw_spin_unlock
>      1.60%  hackbench  [kernel]                      [k] unix_stream_sendmsg
>      1.54%  hackbench  [kernel]                      [k] sched_clock_local
>      1.46%  hackbench  [kernel]                      [k] __slab_free
>      1.37%  hackbench  [kernel]                      [k] do_raw_read_lock
>      1.34%  hackbench  [kernel]                      [k] __switch_to
>      1.24%  hackbench  [kernel]                      [k] select_task_rq_fair
>      1.23%  hackbench  [kernel]                      [k] sock_wfree
>      1.21%  hackbench  [kernel]                      [k] _raw_spin_unlock_irqrestore
>      1.19%  hackbench  [kernel]                      [k] __mutex_unlock_slowpath
>      1.05%  hackbench  [kernel]                      [k] trace_hardirqs_off
>      0.99%  hackbench  [kernel]                      [k] __might_sleep
>      0.93%  hackbench  [kernel]                      [k] do_raw_read_unlock
>      0.93%  hackbench  [kernel]                      [k] _raw_spin_lock
>      0.91%  hackbench  [kernel]                      [k] try_to_wake_up
>      0.81%  hackbench  [kernel]                      [k] sched_clock
>      0.80%  hackbench  [kernel]                      [k] trace_hardirqs_on

I collected retired-instruction, dTLB-miss and LLC-miss counts.
Below is the LLC-miss data.

Kernel 2.6.33:
# Samples: 11639436896 LLC-load-misses
#
# Overhead          Command                                           Shared Object  Symbol
# ........  ...............  ......................................................  ......
#
    20.94%        hackbench  [kernel.kallsyms]                                       [k] copy_user_generic_string
    14.56%        hackbench  [kernel.kallsyms]                                       [k] unix_stream_recvmsg
    12.88%        hackbench  [kernel.kallsyms]                                       [k] kfree
     7.37%        hackbench  [kernel.kallsyms]                                       [k] kmem_cache_free
     7.18%        hackbench  [kernel.kallsyms]                                       [k] kmem_cache_alloc_node
     6.78%        hackbench  [kernel.kallsyms]                                       [k] kfree_skb
     6.27%        hackbench  [kernel.kallsyms]                                       [k] __kmalloc_node_track_caller
     2.73%        hackbench  [kernel.kallsyms]                                       [k] __slab_free
     2.21%        hackbench  [kernel.kallsyms]                                       [k] get_partial_node
     2.01%        hackbench  [kernel.kallsyms]                                       [k] _raw_spin_lock
     1.59%        hackbench  [kernel.kallsyms]                                       [k] schedule
     1.27%        hackbench  hackbench                                               [.] receiver
     0.99%        hackbench  libpthread-2.9.so                                       [.] __read
     0.87%        hackbench  [kernel.kallsyms]                                       [k] unix_stream_sendmsg




Kernel 2.6.34-rc3:
# Samples: 13079611308 LLC-load-misses
#
# Overhead          Command                                                         Shared Object  Symbol
# ........  ...............  ....................................................................  ......
#
    18.55%        hackbench  [kernel.kallsyms]                                                     [k] copy_user_generic_string
    13.19%        hackbench  [kernel.kallsyms]                                                     [k] unix_stream_recvmsg
    11.62%        hackbench  [kernel.kallsyms]                                                     [k] kfree
     8.54%        hackbench  [kernel.kallsyms]                                                     [k] kmem_cache_free
     7.88%        hackbench  [kernel.kallsyms]                                                     [k] __kmalloc_node_track_caller
     6.54%        hackbench  [kernel.kallsyms]                                                     [k] kmem_cache_alloc_node
     5.94%        hackbench  [kernel.kallsyms]                                                     [k] kfree_skb
     3.48%        hackbench  [kernel.kallsyms]                                                     [k] __slab_free
     2.15%        hackbench  [kernel.kallsyms]                                                     [k] _raw_spin_lock
     1.83%        hackbench  [kernel.kallsyms]                                                     [k] schedule
     1.82%        hackbench  [kernel.kallsyms]                                                     [k] get_partial_node
     1.59%        hackbench  hackbench                                                             [.] receiver
     1.37%        hackbench  libpthread-2.9.so                                                     [.] __read




* RE: [PATCH 1/3] A device for zero-copy based on KVM virtio-net.
From: Xin, Xiaohui @ 2010-04-07  2:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu,
	jdike@c2.user-mode-linux.org, yzhao81@gmail.com
In-Reply-To: <20100401110841.GE3323@redhat.com>

Michael,
 
>> Qemu needs a userspace write, is that a synchronous one or
>>asynchronous one?

>It's a synchronous non-blocking write.
Sorry, why does Qemu live migration need the device to support a userspace write?
How does the write operation work? And why does a read operation not matter here?

Thanks
Xiaohui


* Re: [PATCH 1/1] NET: usb: Adding URB_ZERO_PACKET flag to usbnet.c
From: David Miller @ 2010-04-07  2:50 UTC (permalink / raw)
  To: epasheva; +Cc: dbrownell, rfiler, netdev, linux-usb
In-Reply-To: <1270599787.8900.8.camel@Linuxdev4-laptop>

From: Elina Pasheva <epasheva@sierrawireless.com>
Date: Tue, 6 Apr 2010 17:23:07 -0700

> Subject: [PATCH 1/1] NET: usb: Adding URB_ZERO_PACKET flag to usbnet.c
> From: Elina Pasheva <epasheva@sierrawireless.com>
> This patch adds setting of the urb transfer flag URB_ZERO_PACKET  before
> submitting an urb for drivers that have requested it (by advertising flag
> FLAG_SEND_ZLP).
> The modification is in usbnet.c function usbnet_start_xmit().
> This patch only adds the zero length flag.
> A subsequent patch will address the buggy code we found when devices do not
> advertise FLAG_SEND_ZLP in which case there is a possibility of transferring
> packets with non-deterministic length.
> 
> This patch has been tested on kernel-2.6.34-rc3.
> This patch has been checked against net-2.6 tree.
> Signed-off-by: Elina Pasheva <epasheva@sierrawireless.com>
> Signed-off-by: Rory Filer <rfiler@sierrawireless.com>

Applied to net-next-2.6, thanks.


* Re: [PATCH 1/1] TIPC: Updated topology subscription protocol according to latest spec
From: David Miller @ 2010-04-07  2:51 UTC (permalink / raw)
  To: jon.maloy; +Cc: netdev, tipc-discussion
In-Reply-To: <1270590052-975-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Tue, 6 Apr 2010 17:40:52 -0400

> This patch makes it explicit in the API that all fields in subscriptions and events exchanged with the Topology Server must be in
> network byte order.
> It also ensures that all fields of a subscription are compared when cancelling a subscription, in order to avoid inadvertent
> cancelling of the wrong subscription.
> Finally, the tipc module version is updated to 2.0.0, to reflect the API change.
> 
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

Applied to net-next-2.6, thanks.


* Re: [PATCH] socket: remove duplicate declaration of struct timespec
From: David Miller @ 2010-04-07  2:51 UTC (permalink / raw)
  To: hagen; +Cc: netdev
In-Reply-To: <1270568392-22030-1-git-send-email-hagen@jauu.net>

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Tue,  6 Apr 2010 17:39:52 +0200

> struct timespec ts was already defined. Reuse the previously
> defined one and reduce the memory footprint on the stack by
> 16 bytes.
> 
> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>

Applied, thanks.


* Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: David Miller @ 2010-04-07  2:52 UTC (permalink / raw)
  To: eric.dumazet
  Cc: john.linn, netdev, linuxppc-dev, grant.likely, jwboyer,
	john.williams, michal.simek, jtyner
In-Reply-To: <1270502993.9013.36.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 05 Apr 2010 23:29:53 +0200

> So, I ask, can't you use netdev_alloc_skb_ip_align() in this driver?

Thanks to everyone for getting this patch into shape.

I applied version 4 of the patch to net-next-2.6, thanks!


* newbie question about qdisc_watchdog_schedule
From: Yongjian Zhang @ 2010-04-07  2:52 UTC (permalink / raw)
  To: netdev

Hi all,

I have a newbie question about qdisc_watchdog_schedule(). This
function appears in the source code of several qdiscs and it seems to
be used when the qdisc is asked to dequeue a packet but cannot.

I tried to trace the source code but only found some function calls to
start a timer. No callback function seems to be involved...

Google did not turn up any documentation for this function either...

Can anyone explain what qdisc_watchdog_schedule() does and/or
why we have to call it when there's no packet to dequeue?

Thanks a lot!

Jack


* Re: [PATCH 0/2] net/irda: sh_sir: Bug fix patches
From: David Miller @ 2010-04-07  2:52 UTC (permalink / raw)
  To: kuninori.morimoto.gx; +Cc: netdev, samuel
In-Reply-To: <u39z96vk2.wl%kuninori.morimoto.gx@renesas.com>

From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Date: Tue, 06 Apr 2010 13:42:39 +0900 (JST)

> 
> Dear David
> 
> Kuninori Morimoto (2):
>       net/irda: sh_sir: fixup err return value on sh_sir_open
>       net/irda: sh_sir: Modify iounmap wrong execution
> 
> These 2 patches are bug fixes for the sh_sir driver.

Both applied to net-next-2.6, thank you.


* Re: [PATCH] net/irda: Add SuperH IrDA driver support
From: David Miller @ 2010-04-07  2:52 UTC (permalink / raw)
  To: kuninori.morimoto.gx; +Cc: netdev, samuel
In-Reply-To: <uy6h15gtn.wl%kuninori.morimoto.gx@renesas.com>

From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Date: Tue, 06 Apr 2010 13:46:12 +0900 (JST)

> This is a very simple driver for SuperH Mobile IrDA,
> which supports SIR/MIR/FIR.
> This patch adds only SIR support for now.
> 
> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>

Applied to net-next-2.6, thank you.


* linux-next: build failure after merge of the net tree
From: Stephen Rothwell @ 2010-04-07  2:58 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, Dimitris Michailidis, Jiri Pirko


Hi Dave,

After merging the wireless tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

drivers/net/cxgb4/cxgb4_main.c: In function 'set_addr_filters':
drivers/net/cxgb4/cxgb4_main.c:264: error: dereferencing pointer to incomplete type
drivers/net/cxgb4/cxgb4_main.c:265: error: dereferencing pointer to incomplete type

Caused by commit b8ff05a9c3237f694a1c3bf8ceec3bf6c3c14b15 ("cxgb4: Add
main driver file and driver Makefile") from Linus' tree interacting with
commit 22bedad3ce112d5ca1eaf043d4990fa2ed698c87 ("net: convert multicast
list to list_head").  The latter removed struct dev_addr_list.

I have reverted commit b8ff05a9c3237f694a1c3bf8ceec3bf6c3c14b15 (and
43e9da8d782b8a40d5127fcc59ac2e543cf16d7d ("net: Hook up cxgb4 to Kconfig
and Makefile") that depended on it) for today.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/



* linux-next: manual merge of the net tree with Linus' tree
From: Stephen Rothwell @ 2010-04-07  2:58 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: linux-next, linux-kernel, Amerigo Wang, Jiri Pirko

Hi all,

Today's linux-next merge of the net tree got a conflict in
drivers/net/bonding/bond_main.c between commit
9e2e61fbf8ad016d24e4af0afff13505f3dd2a2a ("bonding: fix potential
deadlock in bond_uninit()") from Linus' tree and commit
22bedad3ce112d5ca1eaf043d4990fa2ed698c87 ("net: convert multicast list to
list_head") from the net tree.

Just context changes.  I fixed it up (see below) and can carry the fix
for a while.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc drivers/net/bonding/bond_main.c
index 0075514,22682f1..0000000
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@@ -4550,9 -4476,10 +4508,7 @@@ static void bond_uninit(struct net_devi
  
  	bond_remove_proc_entry(bond);
  
- 	netif_addr_lock_bh(bond_dev);
- 	bond_mc_list_destroy(bond);
- 	netif_addr_unlock_bh(bond_dev);
 -	if (bond->wq)
 -		destroy_workqueue(bond->wq);
 -
+ 	__hw_addr_flush(&bond->mc_list);
  }
  
  /*------------------------- Module initialization ---------------------------*/


* linux-next: manual merge of the net tree with Linus' tree
From: Stephen Rothwell @ 2010-04-07  2:58 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: linux-next, linux-kernel, Francois Romieu

Hi all,

Today's linux-next merge of the net tree got a conflict in
drivers/net/via-velocity.c between commit
bcbe53682f65330bdd9ad7eed9575d2ff536353a ("via-velocity: Fix
FLOW_CNTL_TX_RX handling in set_mii_flow_control()") from Linus' tree and
commit 3a7f8681ffb27bcc540fb74cda15e39c9395737b ("via-velocity: remove
private #define") from the net tree.

I fixed it up (see below) and can carry the fix for a while.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc drivers/net/via-velocity.c
index bc278d4,078903f..0000000
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@@ -811,8 -811,8 +811,8 @@@ static void set_mii_flow_control(struc
  		break;
  
  	case FLOW_CNTL_TX_RX:
- 		MII_REG_BITS_ON(ANAR_PAUSE, MII_REG_ANAR, vptr->mac_regs);
- 		MII_REG_BITS_OFF(ANAR_ASMDIR, MII_REG_ANAR, vptr->mac_regs);
+ 		MII_REG_BITS_ON(ADVERTISE_PAUSE_CAP, MII_ADVERTISE, vptr->mac_regs);
 -		MII_REG_BITS_ON(ADVERTISE_PAUSE_ASYM, MII_ADVERTISE, vptr->mac_regs);
++		MII_REG_BITS_OFF(ADVERTISE_PAUSE_ASYM, MII_ADVERTISE, vptr->mac_regs);
  		break;
  
  	case FLOW_CNTL_DISABLE:


* Re: linux-next: build failure after merge of the net tree
From: David Miller @ 2010-04-07  3:12 UTC (permalink / raw)
  To: sfr; +Cc: netdev, linux-next, linux-kernel, dm, jpirko
In-Reply-To: <20100407125817.e8e1d174.sfr@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Wed, 7 Apr 2010 12:58:17 +1000

> After merging the wireless tree, today's linux-next build (x86_64 allmodconfig)
> failed like this:
> 
> drivers/net/cxgb4/cxgb4_main.c: In function 'set_addr_filters':
> drivers/net/cxgb4/cxgb4_main.c:264: error: dereferencing pointer to incomplete type
> drivers/net/cxgb4/cxgb4_main.c:265: error: dereferencing pointer to incomplete type
> 
> Caused by commit b8ff05a9c3237f694a1c3bf8ceec3bf6c3c14b15 ("cxgb4: Add
> main driver file and driver Makefile") from Linus' tree interacting with
> commit 22bedad3ce112d5ca1eaf043d4990fa2ed698c87 ("net: convert multicast
> list to list_head").  The latter removed struct dev_addr_list.
> 
> I have reverted commit b8ff05a9c3237f694a1c3bf8ceec3bf6c3c14b15 (and
> 43e9da8d782b8a40d5127fcc59ac2e543cf16d7d ("net: Hook up cxgb4 to Kconfig
> and Makefile") that depended on it) for today.

Thanks, I'll merge net-2.6 into net-next-2.6 and fix this build
failure in the latter.


* Re: linux-next: manual merge of the net tree with Linus' tree
From: Cong Wang @ 2010-04-07  3:20 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, netdev, linux-next, linux-kernel, Jiri Pirko
In-Reply-To: <20100407125836.3c7f1095.sfr@canb.auug.org.au>

Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the net tree got a conflict in
> drivers/net/bonding/bond_main.c between commit
> 9e2e61fbf8ad016d24e4af0afff13505f3dd2a2a ("bonding: fix potential
> deadlock in bond_uninit()") from Linus' tree and commit
> 22bedad3ce112d5ca1eaf043d4990fa2ed698c87 ("net: convert multicast list to
> list_head") from the net tree.
> 
> Just context changes.  I fixed it up (see below) and can carry the fix
> for a while.

Hi, Stephen,

If I read the combined diff correctly, it looks fine to me.

Thanks for letting me know!


* linux-next: manual merge of the rr tree with the net tree
From: Stephen Rothwell @ 2010-04-07  3:51 UTC (permalink / raw)
  To: Rusty Russell
  Cc: linux-next, linux-kernel, David Woodhouse, Ben Hutchings,
	David Miller, netdev, Ondrej Zary

Hi Rusty,

Today's linux-next merge of the rr tree got a conflict in
include/linux/mod_devicetable.h scripts/mod/file2alias.c between commit
8626d3b4328061f5b82b11ae1d6918a0c3602f42 ("phylib: Support phy module
autoloading") from the net tree and commits
df73f69d6a29eb5a22327009627f23c7ae4be007
("modules:isapnp-mod_devicetable.h") and
a8e67afa03daca8b570115ba9b33f508cc5a9133 ("MODULE_DEVICE_TABLE(isapnp,
...) does nothing") from the rr tree.

Just overlapping additions.  I fixed it up (see below) and can carry the
fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc include/linux/mod_devicetable.h
index 55f1f9c,e69d69f..0000000
--- a/include/linux/mod_devicetable.h
+++ b/include/linux/mod_devicetable.h
@@@ -474,30 -474,11 +474,37 @@@ struct platform_device_id 
  			__attribute__((aligned(sizeof(kernel_ulong_t))));
  };
  
 +#define MDIO_MODULE_PREFIX	"mdio:"
 +
 +#define MDIO_ID_FMT "%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d"
 +#define MDIO_ID_ARGS(_id) \
 +	(_id)>>31, ((_id)>>30) & 1, ((_id)>>29) & 1, ((_id)>>28) & 1,	\
 +	((_id)>>27) & 1, ((_id)>>26) & 1, ((_id)>>25) & 1, ((_id)>>24) & 1, \
 +	((_id)>>23) & 1, ((_id)>>22) & 1, ((_id)>>21) & 1, ((_id)>>20) & 1, \
 +	((_id)>>19) & 1, ((_id)>>18) & 1, ((_id)>>17) & 1, ((_id)>>16) & 1, \
 +	((_id)>>15) & 1, ((_id)>>14) & 1, ((_id)>>13) & 1, ((_id)>>12) & 1, \
 +	((_id)>>11) & 1, ((_id)>>10) & 1, ((_id)>>9) & 1, ((_id)>>8) & 1, \
 +	((_id)>>7) & 1, ((_id)>>6) & 1, ((_id)>>5) & 1, ((_id)>>4) & 1, \
 +	((_id)>>3) & 1, ((_id)>>2) & 1, ((_id)>>1) & 1, (_id) & 1
 +
 +/**
 + * struct mdio_device_id - identifies PHY devices on an MDIO/MII bus
 + * @phy_id: The result of
 + *     (mdio_read(&MII_PHYSID1) << 16 | mdio_read(&PHYSID2)) & @phy_id_mask
 + *     for this PHY type
 + * @phy_id_mask: Defines the significant bits of @phy_id.  A value of 0
 + *     is used to terminate an array of struct mdio_device_id.
 + */
 +struct mdio_device_id {
 +	__u32 phy_id;
 +	__u32 phy_id_mask;
 +};
 +
+ #define ISAPNP_ANY_ID		0xffff
+ struct isapnp_device_id {
+ 	unsigned short card_vendor, card_device;
+ 	unsigned short vendor, function;
+ 	kernel_ulong_t driver_data;	/* data private to the driver */
+ };
+ 
  #endif /* LINUX_MOD_DEVICETABLE_H */
diff --cc scripts/mod/file2alias.c
index 36a60a8,6494f5b..0000000
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@@ -796,28 -796,19 +796,41 @@@ static int do_platform_entry(const cha
  	return 1;
  }
  
 +static int do_mdio_entry(const char *filename,
 +			 struct mdio_device_id *id, char *alias)
 +{
 +	int i;
 +
 +	alias += sprintf(alias, MDIO_MODULE_PREFIX);
 +
 +	for (i = 0; i < 32; i++) {
 +		if (!((id->phy_id_mask >> (31-i)) & 1))
 +			*(alias++) = '?';
 +		else if ((id->phy_id >> (31-i)) & 1)
 +			*(alias++) = '1';
 +		else
 +			*(alias++) = '0';
 +	}
 +
 +	/* Terminate the string */
 +	*alias = 0;
 +
 +	return 1;
 +}
 +
+ /* looks like: "pnp:dD" */
+ static int do_isapnp_entry(const char *filename,
+ 			   struct isapnp_device_id *id, char *alias)
+ {
+ 	sprintf(alias, "pnp:d%c%c%c%x%x%x%x*",
+ 		'A' + ((id->vendor >> 2) & 0x3f) - 1,
+ 		'A' + (((id->vendor & 3) << 3) | ((id->vendor >> 13) & 7)) - 1,
+ 		'A' + ((id->vendor >> 8) & 0x1f) - 1,
+ 		(id->function >> 4) & 0x0f, id->function & 0x0f,
+ 		(id->function >> 12) & 0x0f, (id->function >> 8) & 0x0f);
+ 	return 1;
+ }
+ 
  /* Ignore any prefix, eg. some architectures prepend _ */
  static inline int sym_is(const char *symbol, const char *name)
  {
@@@ -965,10 -956,10 +978,14 @@@ void handle_moddevtable(struct module *
  		do_table(symval, sym->st_size,
  			 sizeof(struct platform_device_id), "platform",
  			 do_platform_entry, mod);
 +	else if (sym_is(symname, "__mod_mdio_device_table"))
 +		do_table(symval, sym->st_size,
 +			 sizeof(struct mdio_device_id), "mdio",
 +			 do_mdio_entry, mod);
+ 	else if (sym_is(symname, "__mod_isapnp_device_table"))
+ 		do_table(symval, sym->st_size,
+ 			sizeof(struct isapnp_device_id), "isa",
+ 			do_isapnp_entry, mod);
  	free(zeros);
  }
  


* Re: [v2 Patch 3/3] bonding: make bonding support netpoll
From: Andy Gospodarek @ 2010-04-07  4:02 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jay Vosburgh, Neil Horman, netdev@vger.kernel.org, Matt Mackall,
	bridge@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	David Miller, Jeff Moyer, Andy Gospodarek,
	bonding-devel@lists.sourceforge.net
In-Reply-To: <4BBBEEAA.1050100@redhat.com>

On Apr 6, 2010, at 10:32 PM, Cong Wang <amwang@redhat.com> wrote:

> Andy Gospodarek wrote:
>> On Tue, Apr 06, 2010 at 12:38:16PM +0800, Cong Wang wrote:
>>> Cong Wang wrote:
>>>> Before I try to reproduce it, could you please try to replace
>>>> the  'read_lock()'
>>>> in slaves_support_netpoll() with 'read_lock_bh()'? (read_unlock()
>>>> too)  Try if this helps.
>>>>
>>> Confirmed. Please use the attached patch instead, for your testing.
>>>
>>> Thanks!
>>>
>> Moving those locks to bh-locks will not resolve this.  I tried that
>> yesterday and tried your new patch today without success.  That
>> warning
>> is a WARN_ON_ONCE so you need to reboot to see that it is still a
>> problem.  Simply unloading and loading the new module is not an
>> accurate
>> test.
>> Also, my system still hangs when removing the bonding module.  I do
>> not
>> think you intended to fix this with the patch, but wanted it to be
>> clear
>> to everyone on the list.
>
>
> Actually I did reboot and then tested the module. I didn't get any
> warning.
> I just tried again today, and no warnings at all.
>
> For removing bonding module, you may need another fix of mine,
> which is to fix a potential deadlock of workqueue. Try:
>
> http://lkml.org/lkml/2010/4/1/58
>
>> You should also configure your kernel with a some of the lock
>> debugging
>> enabled.  I've been using the following:
>> CONFIG_DETECT_HUNG_TASK=y
>> CONFIG_DEBUG_SPINLOCK=y
>> CONFIG_DEBUG_MUTEXES=y
>> CONFIG_DEBUG_LOCK_ALLOC=y
>> CONFIG_PROVE_LOCKING=y
>> CONFIG_LOCKDEP=y
>> CONFIG_LOCK_STAT=y
>> CONFIG_DEBUG_LOCKDEP=y
>
>
> Sure, I always keep these.
>
>> Here is the output when I remove a slave from the bond.  My
>> xmit_roundrobin patch from earlier (replacing read_lock with
>> read_trylock) was applied.  It might be helpful for you when
>> debugging
>> these issues.
>
>
> I don't apply your patch, just tested my patch.
>
>> Dead loop on virtual device bond0, fix it urgently!
>
> Please provide your bonding configuration and steps to reproduce it.
>

My first response in this thread provides the commands and
configuration needed to reproduce this.

> What I did is:
>
> 1. Load bonding module with "mode=0 miimon=100"
> 2. Enslave eth0 and active bond0
> 3. Load netconsole and send messages via bond0
> 4. Remove eth0 from bond0
> 5. Remove bonding module
> 6. Remove netconsole module

Thanks for sending your configuration.

What values are in /proc/sys/kernel/printk?

> And no deadlocks, no warnings.
>
> Thanks.


* linux-next: manual merge of the pcmcia tree with the net tree
From: Stephen Rothwell @ 2010-04-07  4:17 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: linux-next, linux-kernel, Alexander Kurz, David Miller, netdev

Hi Dominik,

Today's linux-next merge of the pcmcia tree got a conflict in
drivers/net/pcmcia/3c589_cs.c between commit
f64e96973a1fa885ce6e4f7e3fdbae83de98fcab ("net/pcmcia/3c589_cs: using
netdev_info and friends where appropriate") from the net tree and commits
cbd8ca0260b78f3b4157dc8ff23edbd211c4eef5 ("pcmcia: re-work
pcmcia_request_irq()") and cc25a704447fad12205ca8c7c42d6bba1c1590b0
("pcmcia: dev_node removal (drivers with unregister_netdev check)") from
the pcmcia tree.

Most of these conflicts are just due to unrelated white space changes in
the net tree commit.  I fixed it up (see below) and can carry the fix as
necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc drivers/net/pcmcia/3c589_cs.c
index 580977f,5ab589d..0000000
--- a/drivers/net/pcmcia/3c589_cs.c
+++ b/drivers/net/pcmcia/3c589_cs.c
@@@ -129,13 -106,12 +129,12 @@@ enum RxFilter 
  
  struct el3_private {
  	struct pcmcia_device	*p_dev;
- 	dev_node_t		node;
 -    /* For transceiver monitoring */
 -    struct timer_list	media;
 -    u16			media_status;
 -    u16			fast_poll;
 -    unsigned long	last_irq;
 -    spinlock_t		lock;
 +	/* For transceiver monitoring */
 +	struct timer_list	media;
 +	u16			media_status;
 +	u16			fast_poll;
 +	unsigned long		last_irq;
 +	spinlock_t		lock;
  };
  
  static const char *if_names[] = { "auto", "10baseT", "10base2", "AUI" };
@@@ -301,8 -274,8 +297,8 @@@ static int tc589_config(struct pcmcia_d
      ret = pcmcia_request_configuration(link, &link->conf);
      if (ret)
  	    goto failed;
 -	
 +
-     dev->irq = link->irq.AssignedIRQ;
+     dev->irq = link->irq;
      dev->base_addr = link->io.BasePort1;
      ioaddr = dev->base_addr;
      EL3WINDOW(0);
@@@ -335,8 -308,7 +331,7 @@@
  	dev->if_port = if_port;
      else
  	printk(KERN_ERR "3c589_cs: invalid if_port requested\n");
 -    
 +
-     link->dev_node = &lp->node;
      SET_NETDEV_DEV(dev, &link->dev);
  
      if (register_netdev(dev) != 0) {
@@@ -344,15 -316,13 +339,12 @@@
  	goto failed;
      }
  
-     strcpy(lp->node.dev_name, dev->name);
- 
 -    printk(KERN_INFO "%s: 3Com 3c%s, io %#3lx, irq %d, "
 -	   "hw_addr %pM\n",
 -	   dev->name, (multi ? "562" : "589"), dev->base_addr, dev->irq,
 -	   dev->dev_addr);
 -    printk(KERN_INFO "  %dK FIFO split %s Rx:Tx, %s xcvr\n",
 -	   (fifo & 7) ? 32 : 8, ram_split[(fifo >> 16) & 3],
 -	   if_names[dev->if_port]);
 +    netdev_info(dev, "3Com 3c%s, io %#3lx, irq %d, hw_addr %pM\n",
 +		(multi ? "562" : "589"), dev->base_addr, dev->irq,
 +		dev->dev_addr);
 +    netdev_info(dev, "  %dK FIFO split %s Rx:Tx, %s xcvr\n",
 +		(fifo & 7) ? 32 : 8, ram_split[(fifo >> 16) & 3],
 +		if_names[dev->if_port]);
      return 0;
  
  failed:


* Re: [v2 Patch 3/3] bonding: make bonding support netpoll
From: Cong Wang @ 2010-04-07  4:20 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: linux-kernel@vger.kernel.org, Matt Mackall,
	netdev@vger.kernel.org, bridge@lists.linux-foundation.org,
	Andy Gospodarek, Neil Horman, Jeff Moyer, Stephen Hemminger,
	bonding-devel@lists.sourceforge.net, Jay Vosburgh, David Miller
In-Reply-To: <70501020920527933@unknownmsgid>

Andy Gospodarek wrote:
> On Apr 6, 2010, at 10:32 PM, Cong Wang <amwang@redhat.com> wrote:
> 
>> Andy Gospodarek wrote:
>>> On Tue, Apr 06, 2010 at 12:38:16PM +0800, Cong Wang wrote:
>>>> Cong Wang wrote:
>>>>> Before I try to reproduce it, could you please try to replace
>>>>> the  'read_lock()'
>>>>> in slaves_support_netpoll() with 'read_lock_bh()'? (read_unlock()
>>>>> too)  Try if this helps.
>>>>>
>>>> Confirmed. Please use the attached patch instead, for your testing.
>>>>
>>>> Thanks!
>>>>
>>> Moving those locks to bh-locks will not resolve this.  I tried that
>>> yesterday and tried your new patch today without success.  That
>>> warning
>>> is a WARN_ON_ONCE so you need to reboot to see that it is still a
>>> problem.  Simply unloading and loading the new module is not an
>>> accurate
>>> test.
>>> Also, my system still hangs when removing the bonding module.  I do
>>> not
>>> think you intended to fix this with the patch, but wanted it to be
>>> clear
>>> to everyone on the list.
>>
>> Actually I did reboot and then tested the module. I didn't get any
>> warning.
>> I just tried again today, and no warnings at all.
>>
>> For removing bonding module, you may need another fix of mine,
>> which is to fix a potential deadlock of workqueue. Try:
>>
>> http://lkml.org/lkml/2010/4/1/58
>>
>>> You should also configure your kernel with a some of the lock
>>> debugging
>>> enabled.  I've been using the following:
>>> CONFIG_DETECT_HUNG_TASK=y
>>> CONFIG_DEBUG_SPINLOCK=y
>>> CONFIG_DEBUG_MUTEXES=y
>>> CONFIG_DEBUG_LOCK_ALLOC=y
>>> CONFIG_PROVE_LOCKING=y
>>> CONFIG_LOCKDEP=y
>>> CONFIG_LOCK_STAT=y
>>> CONFIG_DEBUG_LOCKDEP=y
>>
>> Sure, I always keep these.
>>
>>> Here is the output when I remove a slave from the bond.  My
>>> xmit_roundrobin patch from earlier (replacing read_lock with
>>> read_trylock) was applied.  It might be helpful for you when
>>> debugging these issues.
>>
>> I didn't apply your patch; I only tested mine.
>>
>>> Dead loop on virtual device bond0, fix it urgently!
>> Please provide your bonding configuration and steps to reproduce it.
>>
> 
> My first response in this thread provides the commands and
> configuration needed to reproduce this.


Then I should do the right thing.

> 
>> What I did is:
>>
>> 1. Load bonding module with "mode=0 miimon=100"
>> 2. Enslave eth0 and activate bond0
>> 3. Load netconsole and send messages via bond0
>> 4. Remove eth0 from bond0
>> 5. Remove bonding module
>> 6. Remove netconsole module
> 
> Thanks for sending your configuration.
> 
> What values are in /proc/sys/kernel/printk?
> 

I use default values on RHEL5:

6 4 1 7

I don't think this is related to the loglevel; what I checked
is dmesg, not just the console screen.
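
For reference, a minimal sketch of how the four values in
/proc/sys/kernel/printk are usually read. The field names follow the
kernel's sysctl documentation; the helper names parse_printk() and
shown_on_console() are hypothetical and exist only for this illustration.
It assumes the usual rule that a message reaches the console only when its
level is numerically lower than console_loglevel, while the kernel log
buffer that dmesg reads is not filtered by this setting.

```python
# Field order of /proc/sys/kernel/printk, per the kernel sysctl documentation.
FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)

# Standard kernel log levels used in the example below.
KERN_WARNING = 4
KERN_INFO = 6


def parse_printk(text):
    """Map the four whitespace-separated integers to their field names."""
    values = [int(v) for v in text.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected %d values, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def shown_on_console(levels, msg_level):
    """Assumed rule: printed on the console only if below console_loglevel."""
    return msg_level < levels["console_loglevel"]


# The values quoted above; on a live system the text would come from
# open("/proc/sys/kernel/printk").read() instead.
levels = parse_printk("6 4 1 7")
print(levels)
print("KERN_WARNING on console:", shown_on_console(levels, KERN_WARNING))
print("KERN_INFO on console:", shown_on_console(levels, KERN_INFO))
```

With these settings only the console output is filtered, which is why
checking dmesg rather than the console screen sidesteps the loglevel
question.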

Thanks.


* Re: linux-next: build failure after merge of the net tree
From: Stephen Rothwell @ 2010-04-07  5:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-next, linux-kernel, dm, jpirko
In-Reply-To: <20100406.201220.195404728.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 319 bytes --]

Hi Dave,

On Tue, 06 Apr 2010 20:12:20 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>
> Thanks, I'll merge net-2.6 into net-next-2.6 and fix this build
> failure in the latter.

Thanks for that.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]


* Re: linux-next: manual merge of the net tree with Linus' tree
From: Stephen Rothwell @ 2010-04-07  5:28 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, netdev, linux-next, linux-kernel, Jiri Pirko
In-Reply-To: <4BBBF9E9.4080504@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 288 bytes --]

Hi,

On Wed, 07 Apr 2010 11:20:09 +0800 Cong Wang <amwang@redhat.com> wrote:
>
> If I read the combined diff correctly, it looks fine to me.

Thanks for the confirmation.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

