Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: John Fastabend @ 2011-01-18  3:16 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Oleg V. Ukhno, netdev@vger.kernel.org, David S. Miller
In-Reply-To: <26330.1295049912@death>

On 1/14/2011 4:05 PM, Jay Vosburgh wrote:
> Oleg V. Ukhno <olegu@yandex-team.ru> wrote:
>> Jay Vosburgh wrote:
>>
>>> 	This is a violation of the 802.3ad (now 802.1ax) standard, 5.2.1
>>> (f), which requires that all frames of a given "conversation" are passed
>>> to a single port.
>>>
>>> 	The existing layer3+4 hash has a similar problem (that it may
>>> send packets from a conversation to multiple ports), but for that case
>>> it's an unlikely exception (only in the case of IP fragmentation), but
>>> here it's the norm.  At a minimum, this must be clearly documented.
>>>
>>> 	Also, what does a round robin in 802.3ad provide that the
>>> existing round robin does not?  My presumption is that you're looking to
>>> get the aggregator autoconfiguration that 802.3ad provides, but you
>>> don't say.
> 
> 	I'm still curious about this question.  Given the rather
> intricate setup of your particular network (described below), I'm not
> sure why 802.3ad is of benefit over traditional etherchannel
> (balance-rr / balance-xor).
> 
>>> 	I don't necessarily think this is a bad cheat (round robining on
>>> 802.3ad as an explicit non-standard extension), since everybody wants to
>>> stripe their traffic across multiple slaves.  I've given some thought to
>>> making round robin into just another hash mode, but this also does some
>>> magic to the MAC addresses of the outgoing frames (more on that below).
>> Yes, I am resetting MAC addresses when transmitting packets to have switch
>> to put packets into different ports of the receiving etherchannel.
> 
> 	By "etherchannel" do you really mean "Cisco switch with a
> port-channel group using LACP"?
> 
>> I am using this patch to provide full-mesh ISCSI connectivity between at
>> least 4 hosts (all hosts of course are in same ethernet segment) and every
>> host is connected with aggregate link with 4 slaves(usually).
>> Using round-robin I provide near-equal load striping when transmitting,
>> using MAC address magic I force switch to stripe packets over all slave
>> links in destination port-channel(when number of rx-ing slaves is equal to
>> number ot tx-ing slaves and is even).
> 
> 	By "MAC address magic" do you mean that you're assigning
> specifically chosen MAC addresses to the slaves so that the switch's
> hash is essentially "assigning" the bonding slaves to particular ports
> on the outgoing port-channel group?
> 
> 	Assuming that this is the case, it's an interesting idea, but
> I'm unconvinced that it's better on 802.3ad vs. balance-rr.  Unless I'm
> missing something, you can get everything you need from an option to
> have balance-rr / balance-xor utilize the slave's permanent address as
> the source address for outgoing traffic.
> 
>> [...] So I am able to utilize all slaves
>> for tx and for rx up to maximum capacity; besides I am getting L2 link
>> failure detection (and load rebalancing), which is (in my opinion) much
>> faster and robust than L3 or than dm-multipath provides.
>> It's my idea with the patch
> 
> 	Can somebody (John?) more knowledgable than I about dm-multipath
> comment on the above?

Here I'll give it a go.

I don't think detecting L2 link failure this way is very robust. If there
is a failure farther away then your immediate link your going to break
completely? Your bonding hash will continue to round robin the iscsi
packets and half them will get dropped on the floor. dm-multipath handles
this reasonably gracefully. Also in this bonding environment you seem to
be very sensitive to RTT times on the network. Maybe not bad out right but
I wouldn't consider this robust either.

You could tweak your scsi timeout values and fail_fast values, set the io
retry to 0 to cause the fail over to occur faster. I suspect you already
did this and still it is too slow? Maybe adding a checker in multipathd to
listen for link events would be fast enough. The checker could then fail
the path immediately.

I'll try to address your comments from the other thread here. In general I
wonder if it would be better to solve the problems in dm-multipath rather than
add another bonding mode?

OVU - it is slow(I am using ISCSI for Oracle , so I need to minimize latency)

The dm-multipath layer is adding latency? How much? If this is really true
maybe its best to the address the real issue here and not avoid it by
using the bonding layer.

OVU - it handles any link failures bad, because of it's command queue 
limitation(all queued commands above 32 are discarded in case of path 
failure, as I remember)

Maybe true but only link failures with the immediate peer are handled
with a bonding strategy. By working at the block layer we can detect
failures throughout the path. I would need to look into this again I
know when we were looking at this sometime ago there was some talk about
improving this behavior. I need to take some time to go back through the
error recovery stuff to remember how this works.

OVU - it performs very bad when there are many devices and maтy paths(I was 
unable to utilize more that 2Gbps of 4 even with 100 disks with 4 paths 
per each disk)

Hmm well that seems like something is broken. I'll try this setup when
I get some time next few days. This really shouldn't be the case dm-multipath
should not add a bunch of extra latency or effect throughput significantly.
By the way what are you seeing without mpio?

Thanks,
John

^ permalink raw reply

* Re: [PATCH 1/3] vhost-net: check the support of mergeable buffer outside the receive loop
From: Jason Wang @ 2011-01-18  4:26 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, virtualization, netdev, kvm, linux-kernel
In-Reply-To: <20110117084644.GB23479@redhat.com>

Michael S. Tsirkin writes:
 > On Mon, Jan 17, 2011 at 04:10:59PM +0800, Jason Wang wrote:
 > > No need to check the support of mergeable buffer inside the recevie
 > > loop as the whole handle_rx()_xx is in the read critical region.  So
 > > this patch move it ahead of the receiving loop.
 > > 
 > > Signed-off-by: Jason Wang <jasowang@redhat.com>
 > 
 > Well feature check is mostly just features & bit
 > so why would it be slower? Because of the rcu
 > adding memory barriers? Do you observe a
 > measureable speedup with this patch?
 > 

I do not measure the performance for just this patch, maybe not obvious. And it
can also help the code unification.

 > Apropos, I noticed that the check in vhost_has_feature
 > is wrong: it uses the same kind of RCU as the
 > private pointer. So we'll have to fix that properly
 > by adding more lockdep classes, but for now
 > we'll need to make
 > the check 1 || lockdep_is_held(&dev->mutex);
 > and add a TODO.
 > 

Not sure, lockdep_is_head(&dev->mutex) maybe not accurate but sufficient, as it
was always held in the read critical region.

 > > ---
 > >  drivers/vhost/net.c |    5 +++--
 > >  1 files changed, 3 insertions(+), 2 deletions(-)
 > > 
 > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 > > index 14fc189..95e49de 100644
 > > --- a/drivers/vhost/net.c
 > > +++ b/drivers/vhost/net.c
 > > @@ -411,7 +411,7 @@ static void handle_rx_mergeable(struct vhost_net *net)
 > >  	};
 > >  
 > >  	size_t total_len = 0;
 > > -	int err, headcount;
 > > +	int err, headcount, mergeable;
 > >  	size_t vhost_hlen, sock_hlen;
 > >  	size_t vhost_len, sock_len;
 > >  	struct socket *sock = rcu_dereference(vq->private_data);
 > > @@ -425,6 +425,7 @@ static void handle_rx_mergeable(struct vhost_net *net)
 > >  
 > >  	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
 > >  		vq->log : NULL;
 > > +	mergeable = vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF);
 > >  
 > >  	while ((sock_len = peek_head_len(sock->sk))) {
 > >  		sock_len += sock_hlen;
 > > @@ -474,7 +475,7 @@ static void handle_rx_mergeable(struct vhost_net *net)
 > >  			break;
 > >  		}
 > >  		/* TODO: Should check and handle checksum. */
 > > -		if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF) &&
 > > +		if (likely(mergeable) &&
 > >  		    memcpy_toiovecend(vq->hdr, (unsigned char *)&headcount,
 > >  				      offsetof(typeof(hdr), num_buffers),
 > >  				      sizeof hdr.num_buffers)) {

^ permalink raw reply

* Re: [PATCH 1/3] vhost-net: check the support of mergeable buffer outside the receive loop
From: Michael S. Tsirkin @ 2011-01-18  4:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, netdev, kvm, linux-kernel
In-Reply-To: <19765.5737.352179.50100@gargle.gargle.HOWL>

On Tue, Jan 18, 2011 at 12:26:17PM +0800, Jason Wang wrote:
> Michael S. Tsirkin writes:
>  > On Mon, Jan 17, 2011 at 04:10:59PM +0800, Jason Wang wrote:
>  > > No need to check the support of mergeable buffer inside the recevie
>  > > loop as the whole handle_rx()_xx is in the read critical region.  So
>  > > this patch move it ahead of the receiving loop.
>  > > 
>  > > Signed-off-by: Jason Wang <jasowang@redhat.com>
>  > 
>  > Well feature check is mostly just features & bit
>  > so why would it be slower? Because of the rcu
>  > adding memory barriers? Do you observe a
>  > measureable speedup with this patch?
>  > 
> 
> I do not measure the performance for just this patch, maybe not obvious. And it
> can also help the code unification.
> 
>  > Apropos, I noticed that the check in vhost_has_feature
>  > is wrong: it uses the same kind of RCU as the
>  > private pointer. So we'll have to fix that properly
>  > by adding more lockdep classes, but for now
>  > we'll need to make
>  > the check 1 || lockdep_is_held(&dev->mutex);
>  > and add a TODO.
>  > 
> 
> Not sure, lockdep_is_head(&dev->mutex) maybe not accurate but sufficient, as it
> was always held in the read critical region.

Not really, when we call vhost_has_feature from the vq handling thread
it's not.

>  > > ---
>  > >  drivers/vhost/net.c |    5 +++--
>  > >  1 files changed, 3 insertions(+), 2 deletions(-)
>  > > 
>  > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>  > > index 14fc189..95e49de 100644
>  > > --- a/drivers/vhost/net.c
>  > > +++ b/drivers/vhost/net.c
>  > > @@ -411,7 +411,7 @@ static void handle_rx_mergeable(struct vhost_net *net)
>  > >  	};
>  > >  
>  > >  	size_t total_len = 0;
>  > > -	int err, headcount;
>  > > +	int err, headcount, mergeable;
>  > >  	size_t vhost_hlen, sock_hlen;
>  > >  	size_t vhost_len, sock_len;
>  > >  	struct socket *sock = rcu_dereference(vq->private_data);
>  > > @@ -425,6 +425,7 @@ static void handle_rx_mergeable(struct vhost_net *net)
>  > >  
>  > >  	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
>  > >  		vq->log : NULL;
>  > > +	mergeable = vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF);
>  > >  
>  > >  	while ((sock_len = peek_head_len(sock->sk))) {
>  > >  		sock_len += sock_hlen;
>  > > @@ -474,7 +475,7 @@ static void handle_rx_mergeable(struct vhost_net *net)
>  > >  			break;
>  > >  		}
>  > >  		/* TODO: Should check and handle checksum. */
>  > > -		if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF) &&
>  > > +		if (likely(mergeable) &&
>  > >  		    memcpy_toiovecend(vq->hdr, (unsigned char *)&headcount,
>  > >  				      offsetof(typeof(hdr), num_buffers),
>  > >  				      sizeof hdr.num_buffers)) {

^ permalink raw reply

* Re: [PATCH 2/3] vhost-net: Unify the code of mergeable and big buffer handling
From: Michael S. Tsirkin @ 2011-01-18  4:37 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, netdev, kvm, linux-kernel
In-Reply-To: <19765.893.776528.869640@gargle.gargle.HOWL>

On Tue, Jan 18, 2011 at 11:05:33AM +0800, Jason Wang wrote:
> Michael S. Tsirkin writes:
>  > On Mon, Jan 17, 2011 at 04:11:08PM +0800, Jason Wang wrote:
>  > > Codes duplication were found between the handling of mergeable and big
>  > > buffers, so this patch tries to unify them. This could be easily done
>  > > by adding a quota to the get_rx_bufs() which is used to limit the
>  > > number of buffers it returns (for mergeable buffer, the quota is
>  > > simply UIO_MAXIOV, for big buffers, the quota is just 1), and then the
>  > > previous handle_rx_mergeable() could be resued also for big buffers.
>  > > 
>  > > Signed-off-by: Jason Wang <jasowang@redhat.com>
>  > 
>  > We actually started this way. However the code turned out
>  > to have measureable overhead when handle_rx_mergeable
>  > is called with quota 1.
>  > So before applying this I'd like to see some data
>  > to show this is not the case anymore.
>  > 
> 
> I've run a round of test (Host->Guest) for these three patches on my desktop:

Yes but what if you only apply patch 3?

> Without these patches
> 
> mergeable buffers:
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.66.91.42 (10.66.91.42) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> 
>  87380  16384     64    60.00       575.87   69.20    26.36    39.375  7.499  
>  87380  16384    256    60.01      1123.57   73.16    34.73    21.335  5.064  
>  87380  16384    512    60.00      1351.12   75.26    35.80    18.251  4.341  
>  87380  16384   1024    60.00      1955.31   74.73    37.67    12.523  3.156  
>  87380  16384   2048    60.01      3411.92   74.82    39.49    7.186   1.896  
> 
> bug buffers:
> Netperf test results
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.66.91.109 (10.66.91.109) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> 
>  87380  16384     64    60.00       567.10   72.06    26.13    41.638  7.550  
>  87380  16384    256    60.00      1143.69   74.66    32.58    21.392  4.667  
>  87380  16384    512    60.00      1460.92   73.94    33.40    16.585  3.746  
>  87380  16384   1024    60.00      3454.85   77.49    33.89    7.349   1.607  
>  87380  16384   2048    60.00      3781.11   76.51    38.38    6.631   1.663  
> 
> With these patches:
> 
> mergeable buffers:
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.66.91.236 (10.66.91.236) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> 
>  87380  16384     64    60.00       657.53   71.27    26.42    35.517  6.583  
>  87380  16384    256    60.00      1217.73   74.34    34.67    20.004  4.665  
>  87380  16384    512    60.00      1575.25   75.27    37.12    15.658  3.861  
>  87380  16384   1024    60.00      2416.07   74.77    37.20    10.140  2.522  
>  87380  16384   2048    60.00      3702.29   77.31    36.29    6.842   1.606  
> 
> big buffers:
> Netperf test results
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.66.91.202 (10.66.91.202) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> 
>  87380  16384     64    60.00       647.67   71.86    27.26    36.356  6.895  
>  87380  16384    256    60.00      1265.82   76.19    36.54    19.724  4.729  
>  87380  16384    512    60.00      1796.64   76.06    39.48    13.872  3.601  
>  87380  16384   1024    60.00      4008.37   77.05    36.47    6.299   1.491  
>  87380  16384   2048    60.00      4468.56   75.18    41.79    5.513   1.532 
> 
> Looks like the unification does not hurt the performance, and with those patches
> we can get some improvement. BTW, the regression of mergeable buffer still
> exist.
> 
>  > > ---
>  > >  drivers/vhost/net.c |  128 +++------------------------------------------------
>  > >  1 files changed, 7 insertions(+), 121 deletions(-)
>  > > 
>  > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>  > > index 95e49de..c32a2e4 100644
>  > > --- a/drivers/vhost/net.c
>  > > +++ b/drivers/vhost/net.c
>  > > @@ -227,6 +227,7 @@ static int peek_head_len(struct sock *sk)
>  > >   * @iovcount	- returned count of io vectors we fill
>  > >   * @log		- vhost log
>  > >   * @log_num	- log offset
>  > > + * @quota       - headcount quota, 1 for big buffer
>  > >   *	returns number of buffer heads allocated, negative on error
>  > >   */
>  > >  static int get_rx_bufs(struct vhost_virtqueue *vq,
>  > > @@ -234,7 +235,8 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
>  > >  		       int datalen,
>  > >  		       unsigned *iovcount,
>  > >  		       struct vhost_log *log,
>  > > -		       unsigned *log_num)
>  > > +		       unsigned *log_num,
>  > > +		       unsigned int quota)
>  > >  {
>  > >  	unsigned int out, in;
>  > >  	int seg = 0;
>  > > @@ -242,7 +244,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
>  > >  	unsigned d;
>  > >  	int r, nlogs = 0;
>  > >  
>  > > -	while (datalen > 0) {
>  > > +	while (datalen > 0 && headcount < quota) {
>  > >  		if (unlikely(seg >= UIO_MAXIOV)) {
>  > >  			r = -ENOBUFS;
>  > >  			goto err;
>  > > @@ -282,116 +284,7 @@ err:
>  > >  
>  > >  /* Expects to be always run from workqueue - which acts as
>  > >   * read-size critical section for our kind of RCU. */
>  > > -static void handle_rx_big(struct vhost_net *net)
>  > > -{
>  > > -	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
>  > > -	unsigned out, in, log, s;
>  > > -	int head;
>  > > -	struct vhost_log *vq_log;
>  > > -	struct msghdr msg = {
>  > > -		.msg_name = NULL,
>  > > -		.msg_namelen = 0,
>  > > -		.msg_control = NULL, /* FIXME: get and handle RX aux data. */
>  > > -		.msg_controllen = 0,
>  > > -		.msg_iov = vq->iov,
>  > > -		.msg_flags = MSG_DONTWAIT,
>  > > -	};
>  > > -
>  > > -	struct virtio_net_hdr hdr = {
>  > > -		.flags = 0,
>  > > -		.gso_type = VIRTIO_NET_HDR_GSO_NONE
>  > > -	};
>  > > -
>  > > -	size_t len, total_len = 0;
>  > > -	int err;
>  > > -	size_t hdr_size;
>  > > -	struct socket *sock = rcu_dereference(vq->private_data);
>  > > -	if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
>  > > -		return;
>  > > -
>  > > -	mutex_lock(&vq->mutex);
>  > > -	vhost_disable_notify(vq);
>  > > -	hdr_size = vq->vhost_hlen;
>  > > -
>  > > -	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
>  > > -		vq->log : NULL;
>  > > -
>  > > -	for (;;) {
>  > > -		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
>  > > -					 ARRAY_SIZE(vq->iov),
>  > > -					 &out, &in,
>  > > -					 vq_log, &log);
>  > > -		/* On error, stop handling until the next kick. */
>  > > -		if (unlikely(head < 0))
>  > > -			break;
>  > > -		/* OK, now we need to know about added descriptors. */
>  > > -		if (head == vq->num) {
>  > > -			if (unlikely(vhost_enable_notify(vq))) {
>  > > -				/* They have slipped one in as we were
>  > > -				 * doing that: check again. */
>  > > -				vhost_disable_notify(vq);
>  > > -				continue;
>  > > -			}
>  > > -			/* Nothing new?  Wait for eventfd to tell us
>  > > -			 * they refilled. */
>  > > -			break;
>  > > -		}
>  > > -		/* We don't need to be notified again. */
>  > > -		if (out) {
>  > > -			vq_err(vq, "Unexpected descriptor format for RX: "
>  > > -			       "out %d, int %d\n",
>  > > -			       out, in);
>  > > -			break;
>  > > -		}
>  > > -		/* Skip header. TODO: support TSO/mergeable rx buffers. */
>  > > -		s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, in);
>  > > -		msg.msg_iovlen = in;
>  > > -		len = iov_length(vq->iov, in);
>  > > -		/* Sanity check */
>  > > -		if (!len) {
>  > > -			vq_err(vq, "Unexpected header len for RX: "
>  > > -			       "%zd expected %zd\n",
>  > > -			       iov_length(vq->hdr, s), hdr_size);
>  > > -			break;
>  > > -		}
>  > > -		err = sock->ops->recvmsg(NULL, sock, &msg,
>  > > -					 len, MSG_DONTWAIT | MSG_TRUNC);
>  > > -		/* TODO: Check specific error and bomb out unless EAGAIN? */
>  > > -		if (err < 0) {
>  > > -			vhost_discard_vq_desc(vq, 1);
>  > > -			break;
>  > > -		}
>  > > -		/* TODO: Should check and handle checksum. */
>  > > -		if (err > len) {
>  > > -			pr_debug("Discarded truncated rx packet: "
>  > > -				 " len %d > %zd\n", err, len);
>  > > -			vhost_discard_vq_desc(vq, 1);
>  > > -			continue;
>  > > -		}
>  > > -		len = err;
>  > > -		err = memcpy_toiovec(vq->hdr, (unsigned char *)&hdr, hdr_size);
>  > > -		if (err) {
>  > > -			vq_err(vq, "Unable to write vnet_hdr at addr %p: %d\n",
>  > > -			       vq->iov->iov_base, err);
>  > > -			break;
>  > > -		}
>  > > -		len += hdr_size;
>  > > -		vhost_add_used_and_signal(&net->dev, vq, head, len);
>  > > -		if (unlikely(vq_log))
>  > > -			vhost_log_write(vq, vq_log, log, len);
>  > > -		total_len += len;
>  > > -		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
>  > > -			vhost_poll_queue(&vq->poll);
>  > > -			break;
>  > > -		}
>  > > -	}
>  > > -
>  > > -	mutex_unlock(&vq->mutex);
>  > > -}
>  > > -
>  > > -/* Expects to be always run from workqueue - which acts as
>  > > - * read-size critical section for our kind of RCU. */
>  > > -static void handle_rx_mergeable(struct vhost_net *net)
>  > > +static void handle_rx(struct vhost_net *net)
>  > >  {
>  > >  	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
>  > >  	unsigned uninitialized_var(in), log;
>  > > @@ -431,7 +324,8 @@ static void handle_rx_mergeable(struct vhost_net *net)
>  > >  		sock_len += sock_hlen;
>  > >  		vhost_len = sock_len + vhost_hlen;
>  > >  		headcount = get_rx_bufs(vq, vq->heads, vhost_len,
>  > > -					&in, vq_log, &log);
>  > > +					&in, vq_log, &log,
>  > > +					likely(mergeable) ? UIO_MAXIOV : 1);
>  > >  		/* On error, stop handling until the next kick. */
>  > >  		if (unlikely(headcount < 0))
>  > >  			break;
>  > > @@ -497,14 +391,6 @@ static void handle_rx_mergeable(struct vhost_net *net)
>  > >  	mutex_unlock(&vq->mutex);
>  > >  }
>  > >  
>  > > -static void handle_rx(struct vhost_net *net)
>  > > -{
>  > > -	if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF))
>  > > -		handle_rx_mergeable(net);
>  > > -	else
>  > > -		handle_rx_big(net);
>  > > -}
>  > > -
>  > >  static void handle_tx_kick(struct vhost_work *work)
>  > >  {
>  > >  	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,

^ permalink raw reply

* Re: [BUG] bnx2 + vlan + TSO : doesnt work
From: David Miller @ 2011-01-18  6:09 UTC (permalink / raw)
  To: jesse; +Cc: bhutchings, eric.dumazet, netdev
In-Reply-To: <AANLkTincFt4fKLvtity=MgQ2+cuvNFrBHqu3+bMLsypc@mail.gmail.com>

From: Jesse Gross <jesse@nicira.com>
Date: Mon, 17 Jan 2011 16:13:18 -0800

> I think it is better for netif_skb_features() to actually return the
> correct features rather than bypass it here.  NETIF_F_HW_VLAN_TX
> doesn't depend on any other offloads, so we can always include it if
> it is in dev->features.
> 
> Separately, this means there is a problem with bnx2 because it allows
> vlan insertion to be turned off, which would have the same effect.
> Maybe it is looking directly at skb->protocol or similar for TSO.

Please, someone cons up an acceptable fix fast.

Thanks.

^ permalink raw reply

* Re: [BUG] bnx2 + vlan + TSO : doesnt work
From: Eric Dumazet @ 2011-01-18  6:21 UTC (permalink / raw)
  To: David Miller; +Cc: jesse, bhutchings, netdev
In-Reply-To: <20110117.220927.189715926.davem@davemloft.net>

Le lundi 17 janvier 2011 à 22:09 -0800, David Miller a écrit :
> From: Jesse Gross <jesse@nicira.com>
> Date: Mon, 17 Jan 2011 16:13:18 -0800
> 
> > I think it is better for netif_skb_features() to actually return the
> > correct features rather than bypass it here.  NETIF_F_HW_VLAN_TX
> > doesn't depend on any other offloads, so we can always include it if
> > it is in dev->features.
> > 
> > Separately, this means there is a problem with bnx2 because it allows
> > vlan insertion to be turned off, which would have the same effect.
> > Maybe it is looking directly at skb->protocol or similar for TSO.
> 
> Please, someone cons up an acceptable fix fast.
> 

I just woke up, and honestly dont understand why only bnx2 is affected
by this problem of masking NETIF_F_HW_VLAN_TX

And I dont understand all this netif_skb_features() stuff : if we want
to actually test dev->features & NETIF_F_HW_VLAN_TX, and this flag
doesnt depend on other offloads, why are we doing features &
vlan_features.

Jesse, I dont understand why you say "bnx2 allows vlan insertion to be
turned off". Really.




^ permalink raw reply

* Re: [BUG] bnx2 + vlan + TSO : doesnt work
From: Jesse Gross @ 2011-01-18  6:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, bhutchings, netdev
In-Reply-To: <1295331690.3362.522.camel@edumazet-laptop>

On Tue, Jan 18, 2011 at 1:21 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le lundi 17 janvier 2011 à 22:09 -0800, David Miller a écrit :
>> From: Jesse Gross <jesse@nicira.com>
>> Date: Mon, 17 Jan 2011 16:13:18 -0800
>>
>> > I think it is better for netif_skb_features() to actually return the
>> > correct features rather than bypass it here.  NETIF_F_HW_VLAN_TX
>> > doesn't depend on any other offloads, so we can always include it if
>> > it is in dev->features.
>> >
>> > Separately, this means there is a problem with bnx2 because it allows
>> > vlan insertion to be turned off, which would have the same effect.
>> > Maybe it is looking directly at skb->protocol or similar for TSO.
>>
>> Please, someone cons up an acceptable fix fast.
>>
>
> I just woke up, and honestly dont understand why only bnx2 is affected
> by this problem of masking NETIF_F_HW_VLAN_TX

If NETIF_F_HW_VLAN_TX is masked then the tag will be inserted in
software, which is generally OK.  The problem is that some drivers
assume that if they can do vlan tagging in hardware it will always be
used and therefore don't expect vlan tags when setting up TSO, etc.

>
> And I dont understand all this netif_skb_features() stuff : if we want
> to actually test dev->features & NETIF_F_HW_VLAN_TX, and this flag
> doesnt depend on other offloads, why are we doing features &
> vlan_features.

The idea is to put all of the logic in one place rather than having
pieces that are really interdependent scattered around in the
different offloads.  So we could test dev->features directly for vlans
but I would rather just have netif_skb_features() return the right
values to start off with.

>
> Jesse, I dont understand why you say "bnx2 allows vlan insertion to be
> turned off". Really.

You can disable it using Ethtool, which will turn off
NETIF_F_HW_VLAN_TX the same as this bug.

I'm running a quick test on a patch to always allow NETIF_F_HW_VLAN_TX
to be returned from netif_skb_features().

^ permalink raw reply

* Re: [BUG] bnx2 + vlan + TSO : doesnt work
From: Eric Dumazet @ 2011-01-18  6:38 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, bhutchings, netdev
In-Reply-To: <AANLkTi=zvsmnemO+9OggD01i==3mZnb_-09CjAcRS94w@mail.gmail.com>

Le mardi 18 janvier 2011 à 01:32 -0500, Jesse Gross a écrit :
> On Tue, Jan 18, 2011 at 1:21 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le lundi 17 janvier 2011 à 22:09 -0800, David Miller a écrit :
> >> From: Jesse Gross <jesse@nicira.com>
> >> Date: Mon, 17 Jan 2011 16:13:18 -0800
> >>
> >> > I think it is better for netif_skb_features() to actually return the
> >> > correct features rather than bypass it here.  NETIF_F_HW_VLAN_TX
> >> > doesn't depend on any other offloads, so we can always include it if
> >> > it is in dev->features.
> >> >
> >> > Separately, this means there is a problem with bnx2 because it allows
> >> > vlan insertion to be turned off, which would have the same effect.
> >> > Maybe it is looking directly at skb->protocol or similar for TSO.
> >>
> >> Please, someone cons up an acceptable fix fast.
> >>
> >
> > I just woke up, and honestly dont understand why only bnx2 is affected
> > by this problem of masking NETIF_F_HW_VLAN_TX
> 
> If NETIF_F_HW_VLAN_TX is masked then the tag will be inserted in
> software, which is generally OK.  The problem is that some drivers
> assume that if they can do vlan tagging in hardware it will always be
> used and therefore don't expect vlan tags when setting up TSO, etc.
> 

OK, but then this driver gave us the hint core network was actually
always doing software vlan, tagging ;)

> >
> > And I dont understand all this netif_skb_features() stuff : if we want
> > to actually test dev->features & NETIF_F_HW_VLAN_TX, and this flag
> > doesnt depend on other offloads, why are we doing features &
> > vlan_features.
> 
> The idea is to put all of the logic in one place rather than having
> pieces that are really interdependent scattered around in the
> different offloads.  So we could test dev->features directly for vlans
> but I would rather just have netif_skb_features() return the right
> values to start off with.
> 
> >
> > Jesse, I dont understand why you say "bnx2 allows vlan insertion to be
> > turned off". Really.
> 
> You can disable it using Ethtool, which will turn off
> NETIF_F_HW_VLAN_TX the same as this bug.
> 
> I'm running a quick test on a patch to always allow NETIF_F_HW_VLAN_TX
> to be returned from netif_skb_features().

Thanks




^ permalink raw reply

* [PATCH] net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan.
From: Jesse Gross @ 2011-01-18  6:46 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, netdev

In netif_skb_features() we return only the features that are valid for vlans
if we have a vlan packet.  However, we should not mask out NETIF_F_HW_VLAN_TX
since it enables transmission of vlan tags and is obviously valid.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 net/core/dev.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 83507c2..4c58d11 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2023,13 +2023,13 @@ int netif_skb_features(struct sk_buff *skb)
 		return harmonize_features(skb, protocol, features);
 	}
 
-	features &= skb->dev->vlan_features;
+	features &= (skb->dev->vlan_features | NETIF_F_HW_VLAN_TX);
 
 	if (protocol != htons(ETH_P_8021Q)) {
 		return harmonize_features(skb, protocol, features);
 	} else {
 		features &= NETIF_F_SG | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST |
-				NETIF_F_GEN_CSUM;
+				NETIF_F_GEN_CSUM | NETIF_F_HW_VLAN_TX;
 		return harmonize_features(skb, protocol, features);
 	}
 }
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH] net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan.
From: Eric Dumazet @ 2011-01-18  6:55 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev
In-Reply-To: <1295333160-1667-1-git-send-email-jesse@nicira.com>

Le lundi 17 janvier 2011 à 22:46 -0800, Jesse Gross a écrit :
> In netif_skb_features() we return only the features that are valid for vlans
> if we have a vlan packet.  However, we should not mask out NETIF_F_HW_VLAN_TX
> since it enables transmission of vlan tags and is obviously valid.
> 
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Jesse Gross <jesse@nicira.com>

Thanks Jesse

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Now back to the "ethtool -K eth0 txvlan off" problem on bnx2

Is it a driver/software problem or hardware/firmware one ?




^ permalink raw reply

* Re: [PATCH] net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan.
From: Jesse Gross @ 2011-01-18  7:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Michael Chan
In-Reply-To: <1295333714.3362.561.camel@edumazet-laptop>

On Tue, Jan 18, 2011 at 1:55 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le lundi 17 janvier 2011 à 22:46 -0800, Jesse Gross a écrit :
>> In netif_skb_features() we return only the features that are valid for vlans
>> if we have a vlan packet.  However, we should not mask out NETIF_F_HW_VLAN_TX
>> since it enables transmission of vlan tags and is obviously valid.
>>
>> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
>> Signed-off-by: Jesse Gross <jesse@nicira.com>
>
> Thanks Jesse
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> Now back to the "ethtool -K eth0 txvlan off" problem on bnx2
>
> Is it a driver/software problem or hardware/firmware one ?

CC'ing Michael Chan

It looks like bnx2 is storing the offsets of various headers so the
hardware can find them for TSO.  The parsing logic doesn't do anything
for vlan tags, so the hardware gets confused if one is present in the
packet itself.

Quick fix is to simply disallow disabling TX vlan offload or disable
TSO at the same time or some other Ethtool game.  However, if the
hardware supports it then it would be nicer to fix up the TSO setup
logic.  Maybe we can just add the size of the vlan tag to the offset
but I am not certain.  Michael, do you know if this is possible?

^ permalink raw reply

* Re: [PATCH 2/3] vhost-net: Unify the code of mergeable and big buffer handling
From: Jason Wang @ 2011-01-18  7:41 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, virtualization, netdev, kvm, linux-kernel
In-Reply-To: <20110118043725.GB2233@redhat.com>

Michael S. Tsirkin writes:
 > On Tue, Jan 18, 2011 at 11:05:33AM +0800, Jason Wang wrote:
 > > Michael S. Tsirkin writes:
 > >  > On Mon, Jan 17, 2011 at 04:11:08PM +0800, Jason Wang wrote:
 > >  > > Codes duplication were found between the handling of mergeable and big
 > >  > > buffers, so this patch tries to unify them. This could be easily done
 > >  > > by adding a quota to the get_rx_bufs() which is used to limit the
 > >  > > number of buffers it returns (for mergeable buffer, the quota is
 > >  > > simply UIO_MAXIOV, for big buffers, the quota is just 1), and then the
 > >  > > previous handle_rx_mergeable() could be resued also for big buffers.
 > >  > > 
 > >  > > Signed-off-by: Jason Wang <jasowang@redhat.com>
 > >  > 
 > >  > We actually started this way. However the code turned out
 > >  > to have measureable overhead when handle_rx_mergeable
 > >  > is called with quota 1.
 > >  > So before applying this I'd like to see some data
 > >  > to show this is not the case anymore.
 > >  > 
 > > 
 > > I've run a round of test (Host->Guest) for these three patches on my desktop:
 > 
 > Yes but what if you only apply patch 3?
 > 

Result here, slightly better than without it but worse than applying all the three
patches except for big buffer size at 2048 but the differnece could be neglected.

mergeable buffers:
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.66.91.161 (10.66.91.161) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384     64    60.00       584.91   69.41    26.10    38.882  7.310  
 87380  16384    256    60.00      1194.05   72.81    34.82    19.980  4.778  
 87380  16384    512    60.00      1503.93   74.23    36.86    16.174  4.016  
 87380  16384   1024    60.00      2004.57   73.53    37.59    12.019  3.073  
 87380  16384   2048    60.00      3445.96   73.76    38.88    7.014   1.849  

big buffers:
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.66.91.129 (10.66.91.129) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384     64    60.00       646.95   72.25    27.05    36.595  6.850  
 87380  16384    256    60.00      1258.61   74.88    33.01    19.495  4.297  
 87380  16384    512    60.00      1655.95   74.14    33.96    14.671  3.360  
 87380  16384   1024    60.00      3220.32   74.24    33.31    7.555   1.695  
 87380  16384   2048    60.00      4516.40   73.88    42.10    5.360   1.527  

 > > Without these patches
 > > 
 > > mergeable buffers:
 > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.66.91.42 (10.66.91.42) port 0 AF_INET
 > > Recv   Send    Send                          Utilization       Service Demand
 > > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
 > > Size   Size    Size     Time     Throughput  local    remote   local   remote
 > > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
 > > 
 > >  87380  16384     64    60.00       575.87   69.20    26.36    39.375  7.499  
 > >  87380  16384    256    60.01      1123.57   73.16    34.73    21.335  5.064  
 > >  87380  16384    512    60.00      1351.12   75.26    35.80    18.251  4.341  
 > >  87380  16384   1024    60.00      1955.31   74.73    37.67    12.523  3.156  
 > >  87380  16384   2048    60.01      3411.92   74.82    39.49    7.186   1.896  
 > > 
 > > bug buffers:
 > > Netperf test results
 > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.66.91.109 (10.66.91.109) port 0 AF_INET
 > > Recv   Send    Send                          Utilization       Service Demand
 > > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
 > > Size   Size    Size     Time     Throughput  local    remote   local   remote
 > > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
 > > 
 > >  87380  16384     64    60.00       567.10   72.06    26.13    41.638  7.550  
 > >  87380  16384    256    60.00      1143.69   74.66    32.58    21.392  4.667  
 > >  87380  16384    512    60.00      1460.92   73.94    33.40    16.585  3.746  
 > >  87380  16384   1024    60.00      3454.85   77.49    33.89    7.349   1.607  
 > >  87380  16384   2048    60.00      3781.11   76.51    38.38    6.631   1.663  
 > > 
 > > With these patches:
 > > 
 > > mergeable buffers:
 > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.66.91.236 (10.66.91.236) port 0 AF_INET
 > > Recv   Send    Send                          Utilization       Service Demand
 > > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
 > > Size   Size    Size     Time     Throughput  local    remote   local   remote
 > > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
 > > 
 > >  87380  16384     64    60.00       657.53   71.27    26.42    35.517  6.583  
 > >  87380  16384    256    60.00      1217.73   74.34    34.67    20.004  4.665  
 > >  87380  16384    512    60.00      1575.25   75.27    37.12    15.658  3.861  
 > >  87380  16384   1024    60.00      2416.07   74.77    37.20    10.140  2.522  
 > >  87380  16384   2048    60.00      3702.29   77.31    36.29    6.842   1.606  
 > > 
 > > big buffers:
 > > Netperf test results
 > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.66.91.202 (10.66.91.202) port 0 AF_INET
 > > Recv   Send    Send                          Utilization       Service Demand
 > > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
 > > Size   Size    Size     Time     Throughput  local    remote   local   remote
 > > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
 > > 
 > >  87380  16384     64    60.00       647.67   71.86    27.26    36.356  6.895  
 > >  87380  16384    256    60.00      1265.82   76.19    36.54    19.724  4.729  
 > >  87380  16384    512    60.00      1796.64   76.06    39.48    13.872  3.601  
 > >  87380  16384   1024    60.00      4008.37   77.05    36.47    6.299   1.491  
 > >  87380  16384   2048    60.00      4468.56   75.18    41.79    5.513   1.532 
 > > 
 > > Looks like the unification does not hurt the performance, and with those patches
 > > we can get some improvement. BTW, the regression of mergeable buffer still
 > > exist.
 > > 
 > >  > > ---
 > >  > >  drivers/vhost/net.c |  128 +++------------------------------------------------
 > >  > >  1 files changed, 7 insertions(+), 121 deletions(-)
 > >  > > 
 > >  > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 > >  > > index 95e49de..c32a2e4 100644
 > >  > > --- a/drivers/vhost/net.c
 > >  > > +++ b/drivers/vhost/net.c
 > >  > > @@ -227,6 +227,7 @@ static int peek_head_len(struct sock *sk)
 > >  > >   * @iovcount	- returned count of io vectors we fill
 > >  > >   * @log		- vhost log
 > >  > >   * @log_num	- log offset
 > >  > > + * @quota       - headcount quota, 1 for big buffer
 > >  > >   *	returns number of buffer heads allocated, negative on error
 > >  > >   */
 > >  > >  static int get_rx_bufs(struct vhost_virtqueue *vq,
 > >  > > @@ -234,7 +235,8 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 > >  > >  		       int datalen,
 > >  > >  		       unsigned *iovcount,
 > >  > >  		       struct vhost_log *log,
 > >  > > -		       unsigned *log_num)
 > >  > > +		       unsigned *log_num,
 > >  > > +		       unsigned int quota)
 > >  > >  {
 > >  > >  	unsigned int out, in;
 > >  > >  	int seg = 0;
 > >  > > @@ -242,7 +244,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 > >  > >  	unsigned d;
 > >  > >  	int r, nlogs = 0;
 > >  > >  
 > >  > > -	while (datalen > 0) {
 > >  > > +	while (datalen > 0 && headcount < quota) {
 > >  > >  		if (unlikely(seg >= UIO_MAXIOV)) {
 > >  > >  			r = -ENOBUFS;
 > >  > >  			goto err;
 > >  > > @@ -282,116 +284,7 @@ err:
 > >  > >  
 > >  > >  /* Expects to be always run from workqueue - which acts as
 > >  > >   * read-size critical section for our kind of RCU. */
 > >  > > -static void handle_rx_big(struct vhost_net *net)
 > >  > > -{
 > >  > > -	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
 > >  > > -	unsigned out, in, log, s;
 > >  > > -	int head;
 > >  > > -	struct vhost_log *vq_log;
 > >  > > -	struct msghdr msg = {
 > >  > > -		.msg_name = NULL,
 > >  > > -		.msg_namelen = 0,
 > >  > > -		.msg_control = NULL, /* FIXME: get and handle RX aux data. */
 > >  > > -		.msg_controllen = 0,
 > >  > > -		.msg_iov = vq->iov,
 > >  > > -		.msg_flags = MSG_DONTWAIT,
 > >  > > -	};
 > >  > > -
 > >  > > -	struct virtio_net_hdr hdr = {
 > >  > > -		.flags = 0,
 > >  > > -		.gso_type = VIRTIO_NET_HDR_GSO_NONE
 > >  > > -	};
 > >  > > -
 > >  > > -	size_t len, total_len = 0;
 > >  > > -	int err;
 > >  > > -	size_t hdr_size;
 > >  > > -	struct socket *sock = rcu_dereference(vq->private_data);
 > >  > > -	if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
 > >  > > -		return;
 > >  > > -
 > >  > > -	mutex_lock(&vq->mutex);
 > >  > > -	vhost_disable_notify(vq);
 > >  > > -	hdr_size = vq->vhost_hlen;
 > >  > > -
 > >  > > -	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
 > >  > > -		vq->log : NULL;
 > >  > > -
 > >  > > -	for (;;) {
 > >  > > -		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
 > >  > > -					 ARRAY_SIZE(vq->iov),
 > >  > > -					 &out, &in,
 > >  > > -					 vq_log, &log);
 > >  > > -		/* On error, stop handling until the next kick. */
 > >  > > -		if (unlikely(head < 0))
 > >  > > -			break;
 > >  > > -		/* OK, now we need to know about added descriptors. */
 > >  > > -		if (head == vq->num) {
 > >  > > -			if (unlikely(vhost_enable_notify(vq))) {
 > >  > > -				/* They have slipped one in as we were
 > >  > > -				 * doing that: check again. */
 > >  > > -				vhost_disable_notify(vq);
 > >  > > -				continue;
 > >  > > -			}
 > >  > > -			/* Nothing new?  Wait for eventfd to tell us
 > >  > > -			 * they refilled. */
 > >  > > -			break;
 > >  > > -		}
 > >  > > -		/* We don't need to be notified again. */
 > >  > > -		if (out) {
 > >  > > -			vq_err(vq, "Unexpected descriptor format for RX: "
 > >  > > -			       "out %d, int %d\n",
 > >  > > -			       out, in);
 > >  > > -			break;
 > >  > > -		}
 > >  > > -		/* Skip header. TODO: support TSO/mergeable rx buffers. */
 > >  > > -		s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, in);
 > >  > > -		msg.msg_iovlen = in;
 > >  > > -		len = iov_length(vq->iov, in);
 > >  > > -		/* Sanity check */
 > >  > > -		if (!len) {
 > >  > > -			vq_err(vq, "Unexpected header len for RX: "
 > >  > > -			       "%zd expected %zd\n",
 > >  > > -			       iov_length(vq->hdr, s), hdr_size);
 > >  > > -			break;
 > >  > > -		}
 > >  > > -		err = sock->ops->recvmsg(NULL, sock, &msg,
 > >  > > -					 len, MSG_DONTWAIT | MSG_TRUNC);
 > >  > > -		/* TODO: Check specific error and bomb out unless EAGAIN? */
 > >  > > -		if (err < 0) {
 > >  > > -			vhost_discard_vq_desc(vq, 1);
 > >  > > -			break;
 > >  > > -		}
 > >  > > -		/* TODO: Should check and handle checksum. */
 > >  > > -		if (err > len) {
 > >  > > -			pr_debug("Discarded truncated rx packet: "
 > >  > > -				 " len %d > %zd\n", err, len);
 > >  > > -			vhost_discard_vq_desc(vq, 1);
 > >  > > -			continue;
 > >  > > -		}
 > >  > > -		len = err;
 > >  > > -		err = memcpy_toiovec(vq->hdr, (unsigned char *)&hdr, hdr_size);
 > >  > > -		if (err) {
 > >  > > -			vq_err(vq, "Unable to write vnet_hdr at addr %p: %d\n",
 > >  > > -			       vq->iov->iov_base, err);
 > >  > > -			break;
 > >  > > -		}
 > >  > > -		len += hdr_size;
 > >  > > -		vhost_add_used_and_signal(&net->dev, vq, head, len);
 > >  > > -		if (unlikely(vq_log))
 > >  > > -			vhost_log_write(vq, vq_log, log, len);
 > >  > > -		total_len += len;
 > >  > > -		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 > >  > > -			vhost_poll_queue(&vq->poll);
 > >  > > -			break;
 > >  > > -		}
 > >  > > -	}
 > >  > > -
 > >  > > -	mutex_unlock(&vq->mutex);
 > >  > > -}
 > >  > > -
 > >  > > -/* Expects to be always run from workqueue - which acts as
 > >  > > - * read-size critical section for our kind of RCU. */
 > >  > > -static void handle_rx_mergeable(struct vhost_net *net)
 > >  > > +static void handle_rx(struct vhost_net *net)
 > >  > >  {
 > >  > >  	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
 > >  > >  	unsigned uninitialized_var(in), log;
 > >  > > @@ -431,7 +324,8 @@ static void handle_rx_mergeable(struct vhost_net *net)
 > >  > >  		sock_len += sock_hlen;
 > >  > >  		vhost_len = sock_len + vhost_hlen;
 > >  > >  		headcount = get_rx_bufs(vq, vq->heads, vhost_len,
 > >  > > -					&in, vq_log, &log);
 > >  > > +					&in, vq_log, &log,
 > >  > > +					likely(mergeable) ? UIO_MAXIOV : 1);
 > >  > >  		/* On error, stop handling until the next kick. */
 > >  > >  		if (unlikely(headcount < 0))
 > >  > >  			break;
 > >  > > @@ -497,14 +391,6 @@ static void handle_rx_mergeable(struct vhost_net *net)
 > >  > >  	mutex_unlock(&vq->mutex);
 > >  > >  }
 > >  > >  
 > >  > > -static void handle_rx(struct vhost_net *net)
 > >  > > -{
 > >  > > -	if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF))
 > >  > > -		handle_rx_mergeable(net);
 > >  > > -	else
 > >  > > -		handle_rx_big(net);
 > >  > > -}
 > >  > > -
 > >  > >  static void handle_tx_kick(struct vhost_work *work)
 > >  > >  {
 > >  > >  	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,

^ permalink raw reply

* Re: rps testing questions
From: mi wake @ 2011-01-18  8:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1295257994.3335.6.camel@edumazet-laptop>

2011/1/17 Eric Dumazet <eric.dumazet@gmail.com>:
> Le lundi 17 janvier 2011 à 17:43 +0800, mi wake a écrit :
>> I do a rps(Receive Packet Steering) testing on centos 5.5 with  kernel 2.6.37.
>> cpu: 8 core Intel.
>> ethernet adapter: bnx2x
>>
>> Problem statement:
>> enable rps with:
>> echo "ff" > /sys/class/net/eth2/queues/rx-0/rps_cpus.
>>
>
> bnx2x with one queue only ?
>
>> running 1 instances of netperf TCP_RR: netperf  -t TCP_RR -H 192.168.0.1 -c -C
>> without rps: 9963.48(Trans Rate per sec)
>> with rps:  9387.59(Trans Rate per sec)
>>
>> I do ab and tbench testing also find there is less tps with enable
>> rps.but,there is more cpu using when with enable rps.when with enable
>> rps ,softirqs is blanced  on cpus.
>
> Really ? that seems unlikely with your one flow test, unless you _also_
> have hardware IRQS hitting all your cpus. (That would be very bad)
>
>>
>> is there something wrong with my test?
>> --
>
> If you test with one flow, RPS brings nothing at all. Better handle the
> packet directly from the cpu handling the hardware IRQ (and NAPI)
>
> You better make sure hardware IRQ are on one cpu, instead of many cpus.
>
>
 I have checked that bnx2x with one queue only and hardware IRQ are on one cpu.
 I do testing again with more flows.when I use ip range from 192.x.x.1
to 192.x.x.200 to send syn packets.
Server can deal with :
 without rps + rfs : 18M/s
 with     rps +rfs : 21M/s.
Maybe  in previous tests,there are less flow. I will continue testing.
Thank you!

^ permalink raw reply

* [PATCH v5 1/4] bt hidp: Move hid_add_device() call to after hidp_session() has started.
From: Alan Ott @ 2011-01-18  8:04 UTC (permalink / raw)
  To: Jiri Kosina, Marcel Holtmann, Gustavo F. Padovan, David S. Miller,
	Alan Ott
  Cc: Alan Ott
In-Reply-To: <1295337880-12452-1-git-send-email-alan@signal11.us>

Move the call to hid_add_device() (which calls a device's probe() function)
to after the kernel_thread() call which starts the hidp_session() thread.
This ensures the Bluetooth receive socket is fully running by the time a
device's probe() function is called. This way, a device can communicate
(send and receive) with the Bluetooth device from its probe() function.

Signed-off-by: Alan Ott <alan@signal11.us>
---
 net/bluetooth/hidp/core.c |   28 ++++++++++++++++++++--------
 net/bluetooth/hidp/hidp.h |    3 +++
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index 29544c2..67cc4bc 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -563,6 +563,8 @@ static int hidp_session(void *arg)
 	init_waitqueue_entry(&intr_wait, current);
 	add_wait_queue(sk_sleep(ctrl_sk), &ctrl_wait);
 	add_wait_queue(sk_sleep(intr_sk), &intr_wait);
+	session->waiting_for_startup = 0;
+	wake_up_interruptible(&session->startup_queue);
 	while (!atomic_read(&session->terminate)) {
 		set_current_state(TASK_INTERRUPTIBLE);
 
@@ -754,6 +756,8 @@ static struct hid_ll_driver hidp_hid_driver = {
 	.hidinput_input_event = hidp_hidinput_event,
 };
 
+/* This function sets up the hid device. It does not add it
+   to the HID system. That is done in hidp_add_connection(). */
 static int hidp_setup_hid(struct hidp_session *session,
 				struct hidp_connadd_req *req)
 {
@@ -795,16 +799,8 @@ static int hidp_setup_hid(struct hidp_session *session,
 
 	hid->hid_output_raw_report = hidp_output_raw_report;
 
-	err = hid_add_device(hid);
-	if (err < 0)
-		goto failed;
-
 	return 0;
 
-failed:
-	hid_destroy_device(hid);
-	session->hid = NULL;
-
 fault:
 	kfree(session->rd_data);
 	session->rd_data = NULL;
@@ -853,6 +849,8 @@ int hidp_add_connection(struct hidp_connadd_req *req, struct socket *ctrl_sock,
 	skb_queue_head_init(&session->ctrl_transmit);
 	skb_queue_head_init(&session->intr_transmit);
 
+	init_waitqueue_head(&session->startup_queue);
+	session->waiting_for_startup = 1;
 	session->flags   = req->flags & (1 << HIDP_BLUETOOTH_VENDOR_ID);
 	session->idle_to = req->idle_to;
 
@@ -875,6 +873,14 @@ int hidp_add_connection(struct hidp_connadd_req *req, struct socket *ctrl_sock,
 	err = kernel_thread(hidp_session, session, CLONE_KERNEL);
 	if (err < 0)
 		goto unlink;
+	while (session->waiting_for_startup) {
+		wait_event_interruptible(session->startup_queue,
+			!session->waiting_for_startup);
+	}
+
+	err = hid_add_device(session->hid);
+	if (err < 0)
+		goto err_add_device;
 
 	if (session->input) {
 		hidp_send_ctrl_message(session,
@@ -888,6 +894,12 @@ int hidp_add_connection(struct hidp_connadd_req *req, struct socket *ctrl_sock,
 	up_write(&hidp_session_sem);
 	return 0;
 
+err_add_device:
+	hid_destroy_device(session->hid);
+	session->hid = NULL;
+	atomic_inc(&session->terminate);
+	hidp_schedule(session);
+
 unlink:
 	hidp_del_timer(session);
 
diff --git a/net/bluetooth/hidp/hidp.h b/net/bluetooth/hidp/hidp.h
index 8d934a1..2cc35dc 100644
--- a/net/bluetooth/hidp/hidp.h
+++ b/net/bluetooth/hidp/hidp.h
@@ -157,6 +157,9 @@ struct hidp_session {
 	/* Report descriptor */
 	__u8 *rd_data;
 	uint rd_size;
+
+	wait_queue_head_t startup_queue;
+	int waiting_for_startup;
 };
 
 static inline void hidp_schedule(struct hidp_session *session)
-- 
1.7.0.4



^ permalink raw reply related

* [PATCH v5 0/4] Adding HID Feature Report Support to hidraw
From: Alan Ott @ 2011-01-18  8:04 UTC (permalink / raw)
  To: Jiri Kosina, Marcel Holtmann, Gustavo F. Padovan, David S. Miller,
	Alan Ott
  Cc: Alan Ott

This patch adds Feature Report support for USB and Bluetooth HID devices
through hidraw.

The first two patches prepare the bluetooth side for the change.
	a. Make sure the hidp_session() thread is started before
	   device's probe() functions are called.
	b. Wait for ACK/NAK on sent reports, and return proper
	   error codes.
The third patch is the hidraw core and USB changes.
The fourth patch is the Bluetooth changes.

Thanks to Antonio Ospite and Bill Good for providing testing and feedback.


Alan Ott (4):
  bt hidp: Move hid_add_device() call to after hidp_session() has
    started.
  bt hidp: Wait for ACK on Sent Reports
  HID: Add Support for Setting and Getting Feature Reports from hidraw
  Bluetooth hidp: Add support for hidraw HIDIOCGFEATURE and
    HIDIOCSFEATURE

 drivers/hid/hidraw.c          |  106 +++++++++++++++++++-
 drivers/hid/usbhid/hid-core.c |   35 +++++++
 include/linux/hid.h           |    3 +
 include/linux/hidraw.h        |    3 +
 net/bluetooth/hidp/core.c     |  214 ++++++++++++++++++++++++++++++++++++++---
 net/bluetooth/hidp/hidp.h     |   15 +++
 6 files changed, 355 insertions(+), 21 deletions(-)



^ permalink raw reply

* [PATCH v5 4/4] bt hidp: Add support for hidraw HIDIOCGFEATURE and HIDIOCSFEATURE
From: Alan Ott @ 2011-01-18  8:04 UTC (permalink / raw)
  To: Jiri Kosina, Marcel Holtmann, Gustavo F. Padovan, David S. Miller,
	Alan Ott
  Cc: Alan Ott
In-Reply-To: <1295337880-12452-1-git-send-email-alan@signal11.us>

This patch adds support or getting and setting feature reports for bluetooth
HID devices from HIDRAW.

Signed-off-by: Alan Ott <alan@signal11.us>
---
 net/bluetooth/hidp/core.c |  113 +++++++++++++++++++++++++++++++++++++++++++--
 net/bluetooth/hidp/hidp.h |    8 +++
 2 files changed, 117 insertions(+), 4 deletions(-)

diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index 5383e6c..6df8ea1 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -36,6 +36,7 @@
 #include <linux/file.h>
 #include <linux/init.h>
 #include <linux/wait.h>
+#include <linux/mutex.h>
 #include <net/sock.h>
 
 #include <linux/input.h>
@@ -313,6 +314,86 @@ static int hidp_send_report(struct hidp_session *session, struct hid_report *rep
 	return hidp_queue_report(session, buf, rsize);
 }
 
+static int hidp_get_raw_report(struct hid_device *hid,
+		unsigned char report_number,
+		unsigned char *data, size_t count,
+		unsigned char report_type)
+{
+	struct hidp_session *session = hid->driver_data;
+	struct sk_buff *skb;
+	size_t len;
+	int numbered_reports = hid->report_enum[report_type].numbered;
+
+	switch (report_type) {
+	case HID_FEATURE_REPORT:
+		report_type = HIDP_TRANS_GET_REPORT | HIDP_DATA_RTYPE_FEATURE;
+		break;
+	case HID_INPUT_REPORT:
+		report_type = HIDP_TRANS_GET_REPORT | HIDP_DATA_RTYPE_INPUT;
+		break;
+	case HID_OUTPUT_REPORT:
+		report_type = HIDP_TRANS_GET_REPORT | HIDP_DATA_RTYPE_OUPUT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (mutex_lock_interruptible(&session->report_mutex))
+		return -ERESTARTSYS;
+
+	/* Set up our wait, and send the report request to the device. */
+	session->waiting_report_type = report_type & HIDP_DATA_RTYPE_MASK;
+	session->waiting_report_number = numbered_reports ? report_number : -1;
+	set_bit(HIDP_WAITING_FOR_RETURN, &session->flags);
+	data[0] = report_number;
+	if (hidp_send_ctrl_message(hid->driver_data, report_type, data, 1))
+		goto err_eio;
+
+	/* Wait for the return of the report. The returned report
+	   gets put in session->report_return.  */
+	while (test_bit(HIDP_WAITING_FOR_RETURN, &session->flags)) {
+		int res;
+
+		res = wait_event_interruptible_timeout(session->report_queue,
+			!test_bit(HIDP_WAITING_FOR_RETURN, &session->flags),
+			5*HZ);
+		if (res == 0) {
+			/* timeout */
+			goto err_eio;
+		}
+		if (res < 0) {
+			/* signal */
+			goto err_restartsys;
+		}
+	}
+
+	skb = session->report_return;
+	if (skb) {
+		len = skb->len < count ? skb->len : count;
+		memcpy(data, skb->data, len);
+
+		kfree_skb(skb);
+		session->report_return = NULL;
+	} else {
+		/* Device returned a HANDSHAKE, indicating  protocol error. */
+		len = -EIO;
+	}
+
+	clear_bit(HIDP_WAITING_FOR_RETURN, &session->flags);
+	mutex_unlock(&session->report_mutex);
+
+	return len;
+
+err_restartsys:
+	clear_bit(HIDP_WAITING_FOR_RETURN, &session->flags);
+	mutex_unlock(&session->report_mutex);
+	return -ERESTARTSYS;
+err_eio:
+	clear_bit(HIDP_WAITING_FOR_RETURN, &session->flags);
+	mutex_unlock(&session->report_mutex);
+	return -EIO;
+}
+
 static int hidp_output_raw_report(struct hid_device *hid, unsigned char *data, size_t count,
 		unsigned char report_type)
 {
@@ -409,6 +490,10 @@ static void hidp_process_handshake(struct hidp_session *session,
 	case HIDP_HSHK_ERR_INVALID_REPORT_ID:
 	case HIDP_HSHK_ERR_UNSUPPORTED_REQUEST:
 	case HIDP_HSHK_ERR_INVALID_PARAMETER:
+		if (test_bit(HIDP_WAITING_FOR_RETURN, &session->flags)) {
+			clear_bit(HIDP_WAITING_FOR_RETURN, &session->flags);
+			wake_up_interruptible(&session->report_queue);
+		}
 		/* FIXME: Call into SET_ GET_ handlers here */
 		break;
 
@@ -451,9 +536,11 @@ static void hidp_process_hid_control(struct hidp_session *session,
 	}
 }
 
-static void hidp_process_data(struct hidp_session *session, struct sk_buff *skb,
+/* Returns true if the passed-in skb should be freed by the caller. */
+static int hidp_process_data(struct hidp_session *session, struct sk_buff *skb,
 				unsigned char param)
 {
+	int done_with_skb = 1;
 	BT_DBG("session %p skb %p len %d param 0x%02x", session, skb, skb->len, param);
 
 	switch (param) {
@@ -465,7 +552,6 @@ static void hidp_process_data(struct hidp_session *session, struct sk_buff *skb,
 
 		if (session->hid)
 			hid_input_report(session->hid, HID_INPUT_REPORT, skb->data, skb->len, 0);
-
 		break;
 
 	case HIDP_DATA_RTYPE_OTHER:
@@ -477,12 +563,27 @@ static void hidp_process_data(struct hidp_session *session, struct sk_buff *skb,
 		__hidp_send_ctrl_message(session,
 			HIDP_TRANS_HANDSHAKE | HIDP_HSHK_ERR_INVALID_PARAMETER, NULL, 0);
 	}
+
+	if (test_bit(HIDP_WAITING_FOR_RETURN, &session->flags) &&
+				param == session->waiting_report_type) {
+		if (session->waiting_report_number < 0 ||
+		    session->waiting_report_number == skb->data[0]) {
+			/* hidp_get_raw_report() is waiting on this report. */
+			session->report_return = skb;
+			done_with_skb = 0;
+			clear_bit(HIDP_WAITING_FOR_RETURN, &session->flags);
+			wake_up_interruptible(&session->report_queue);
+		}
+	}
+
+	return done_with_skb;
 }
 
 static void hidp_recv_ctrl_frame(struct hidp_session *session,
 					struct sk_buff *skb)
 {
 	unsigned char hdr, type, param;
+	int free_skb = 1;
 
 	BT_DBG("session %p skb %p len %d", session, skb, skb->len);
 
@@ -502,7 +603,7 @@ static void hidp_recv_ctrl_frame(struct hidp_session *session,
 		break;
 
 	case HIDP_TRANS_DATA:
-		hidp_process_data(session, skb, param);
+		free_skb = hidp_process_data(session, skb, param);
 		break;
 
 	default:
@@ -511,7 +612,8 @@ static void hidp_recv_ctrl_frame(struct hidp_session *session,
 		break;
 	}
 
-	kfree_skb(skb);
+	if (free_skb)
+		kfree_skb(skb);
 }
 
 static void hidp_recv_intr_frame(struct hidp_session *session,
@@ -845,6 +947,7 @@ static int hidp_setup_hid(struct hidp_session *session,
 	hid->dev.parent = hidp_get_device(session);
 	hid->ll_driver = &hidp_hid_driver;
 
+	hid->hid_get_raw_report = hidp_get_raw_report;
 	hid->hid_output_raw_report = hidp_output_raw_report;
 
 	return 0;
@@ -897,6 +1000,8 @@ int hidp_add_connection(struct hidp_connadd_req *req, struct socket *ctrl_sock,
 	skb_queue_head_init(&session->ctrl_transmit);
 	skb_queue_head_init(&session->intr_transmit);
 
+	mutex_init(&session->report_mutex);
+	init_waitqueue_head(&session->report_queue);
 	init_waitqueue_head(&session->startup_queue);
 	session->waiting_for_startup = 1;
 	session->flags   = req->flags & (1 << HIDP_BLUETOOTH_VENDOR_ID);
diff --git a/net/bluetooth/hidp/hidp.h b/net/bluetooth/hidp/hidp.h
index 92e093e..13de5fa 100644
--- a/net/bluetooth/hidp/hidp.h
+++ b/net/bluetooth/hidp/hidp.h
@@ -80,6 +80,7 @@
 #define HIDP_VIRTUAL_CABLE_UNPLUG	0
 #define HIDP_BOOT_PROTOCOL_MODE		1
 #define HIDP_BLUETOOTH_VENDOR_ID	9
+#define	HIDP_WAITING_FOR_RETURN		10
 #define HIDP_WAITING_FOR_SEND_ACK	11
 
 struct hidp_connadd_req {
@@ -155,6 +156,13 @@ struct hidp_session {
 	struct sk_buff_head ctrl_transmit;
 	struct sk_buff_head intr_transmit;
 
+	/* Used in hidp_get_raw_report() */
+	int waiting_report_type; /* HIDP_DATA_RTYPE_* */
+	int waiting_report_number; /* -1 for not numbered */
+	struct mutex report_mutex;
+	struct sk_buff *report_return;
+	wait_queue_head_t report_queue;
+
 	/* Used in hidp_output_raw_report() */
 	int output_report_success; /* boolean */
 
-- 
1.7.0.4



^ permalink raw reply related

* [PATCH v5 3/4] hid: Add Support for Setting and Getting Feature Reports from hidraw
From: Alan Ott @ 2011-01-18  8:04 UTC (permalink / raw)
  To: Jiri Kosina, Marcel Holtmann, Gustavo F. Padovan, David S. Miller,
	Alan Ott
  Cc: Alan Ott, Antonio Ospite
In-Reply-To: <1295337880-12452-1-git-send-email-alan@signal11.us>

Per the HID Specification, Feature reports must be sent and received on
the Configuration endpoint (EP 0) through the Set_Report/Get_Report
interfaces.  This patch adds two ioctls to hidraw to set and get feature
reports to and from the device.  Modifications were made to hidraw and
usbhid.

New hidraw ioctls:
  HIDIOCSFEATURE - Perform a Set_Report transfer of a Feature report.
  HIDIOCGFEATURE - Perform a Get_Report transfer of a Feature report.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: Antonio Ospite <ospite@studenti.unina.it>
---
 drivers/hid/hidraw.c          |  106 ++++++++++++++++++++++++++++++++++++++--
 drivers/hid/usbhid/hid-core.c |   35 +++++++++++++
 include/linux/hid.h           |    3 +
 include/linux/hidraw.h        |    3 +
 4 files changed, 141 insertions(+), 6 deletions(-)

diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
index 468e87b..8f06044 100644
--- a/drivers/hid/hidraw.c
+++ b/drivers/hid/hidraw.c
@@ -102,15 +102,14 @@ out:
 }
 
 /* the first byte is expected to be a report number */
-static ssize_t hidraw_write(struct file *file, const char __user *buffer, size_t count, loff_t *ppos)
+/* This function is to be called with the minors_lock mutex held */
+static ssize_t hidraw_send_report(struct file *file, const char __user *buffer, size_t count, unsigned char report_type)
 {
 	unsigned int minor = iminor(file->f_path.dentry->d_inode);
 	struct hid_device *dev;
 	__u8 *buf;
 	int ret = 0;
 
-	mutex_lock(&minors_lock);
-
 	if (!hidraw_table[minor]) {
 		ret = -ENODEV;
 		goto out;
@@ -148,14 +147,92 @@ static ssize_t hidraw_write(struct file *file, const char __user *buffer, size_t
 		goto out_free;
 	}
 
-	ret = dev->hid_output_raw_report(dev, buf, count, HID_OUTPUT_REPORT);
+	ret = dev->hid_output_raw_report(dev, buf, count, report_type);
 out_free:
 	kfree(buf);
 out:
+	return ret;
+}
+
+/* the first byte is expected to be a report number */
+static ssize_t hidraw_write(struct file *file, const char __user *buffer, size_t count, loff_t *ppos)
+{
+	ssize_t ret;
+	mutex_lock(&minors_lock);
+	ret = hidraw_send_report(file, buffer, count, HID_OUTPUT_REPORT);
 	mutex_unlock(&minors_lock);
 	return ret;
 }
 
+
+/* This function performs a Get_Report transfer over the control endpoint
+   per section 7.2.1 of the HID specification, version 1.1.  The first byte
+   of buffer is the report number to request, or 0x0 if the defice does not
+   use numbered reports. The report_type parameter can be HID_FEATURE_REPORT
+   or HID_INPUT_REPORT.  This function is to be called with the minors_lock
+   mutex held.  */
+static ssize_t hidraw_get_report(struct file *file, char __user *buffer, size_t count, unsigned char report_type)
+{
+	unsigned int minor = iminor(file->f_path.dentry->d_inode);
+	struct hid_device *dev;
+	__u8 *buf;
+	int ret = 0, len;
+	unsigned char report_number;
+
+	dev = hidraw_table[minor]->hid;
+
+	if (!dev->hid_get_raw_report) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	if (count > HID_MAX_BUFFER_SIZE) {
+		printk(KERN_WARNING "hidraw: pid %d passed too large report\n",
+				task_pid_nr(current));
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (count < 2) {
+		printk(KERN_WARNING "hidraw: pid %d passed too short report\n",
+				task_pid_nr(current));
+		ret = -EINVAL;
+		goto out;
+	}
+
+	buf = kmalloc(count * sizeof(__u8), GFP_KERNEL);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Read the first byte from the user. This is the report number,
+	   which is passed to dev->hid_get_raw_report(). */
+	if (copy_from_user(&report_number, buffer, 1)) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	ret = dev->hid_get_raw_report(dev, report_number, buf, count, report_type);
+
+	if (ret < 0)
+		goto out_free;
+
+	len = (ret < count) ? ret : count;
+
+	if (copy_to_user(buffer, buf, len)) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	ret = len;
+
+out_free:
+	kfree(buf);
+out:
+	return ret;
+}
+
 static unsigned int hidraw_poll(struct file *file, poll_table *wait)
 {
 	struct hidraw_list *list = file->private_data;
@@ -295,7 +372,24 @@ static long hidraw_ioctl(struct file *file, unsigned int cmd,
 		default:
 			{
 				struct hid_device *hid = dev->hid;
-				if (_IOC_TYPE(cmd) != 'H' || _IOC_DIR(cmd) != _IOC_READ) {
+				if (_IOC_TYPE(cmd) != 'H') {
+					ret = -EINVAL;
+					break;
+				}
+
+				if (_IOC_NR(cmd) == _IOC_NR(HIDIOCSFEATURE(0))) {
+					int len = _IOC_SIZE(cmd);
+					ret = hidraw_send_report(file, user_arg, len, HID_FEATURE_REPORT);
+					break;
+				}
+				if (_IOC_NR(cmd) == _IOC_NR(HIDIOCGFEATURE(0))) {
+					int len = _IOC_SIZE(cmd);
+					ret = hidraw_get_report(file, user_arg, len, HID_FEATURE_REPORT);
+					break;
+				}
+
+				/* Begin Read-only ioctls. */
+				if (_IOC_DIR(cmd) != _IOC_READ) {
 					ret = -EINVAL;
 					break;
 				}
@@ -327,7 +421,7 @@ static long hidraw_ioctl(struct file *file, unsigned int cmd,
 						-EFAULT : len;
 					break;
 				}
-		}
+			}
 
 		ret = -ENOTTY;
 	}
diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
index b336dd8..38c261a 100644
--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -799,6 +799,40 @@ static int hid_alloc_buffers(struct usb_device *dev, struct hid_device *hid)
 	return 0;
 }
 
+static int usbhid_get_raw_report(struct hid_device *hid,
+		unsigned char report_number, __u8 *buf, size_t count,
+		unsigned char report_type)
+{
+	struct usbhid_device *usbhid = hid->driver_data;
+	struct usb_device *dev = hid_to_usb_dev(hid);
+	struct usb_interface *intf = usbhid->intf;
+	struct usb_host_interface *interface = intf->cur_altsetting;
+	int skipped_report_id = 0;
+	int ret;
+
+	/* Byte 0 is the report number. Report data starts at byte 1.*/
+	buf[0] = report_number;
+	if (report_number == 0x0) {
+		/* Offset the return buffer by 1, so that the report ID
+		   will remain in byte 0. */
+		buf++;
+		count--;
+		skipped_report_id = 1;
+	}
+	ret = usb_control_msg(dev, usb_rcvctrlpipe(dev, 0),
+		HID_REQ_GET_REPORT,
+		USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_INTERFACE,
+		((report_type + 1) << 8) | report_number,
+		interface->desc.bInterfaceNumber, buf, count,
+		USB_CTRL_SET_TIMEOUT);
+
+	/* count also the report id */
+	if (ret > 0 && skipped_report_id)
+		ret++;
+
+	return ret;
+}
+
 static int usbhid_output_raw_report(struct hid_device *hid, __u8 *buf, size_t count,
 		unsigned char report_type)
 {
@@ -1139,6 +1173,7 @@ static int usbhid_probe(struct usb_interface *intf, const struct usb_device_id *
 
 	usb_set_intfdata(intf, hid);
 	hid->ll_driver = &usb_hid_driver;
+	hid->hid_get_raw_report = usbhid_get_raw_report;
 	hid->hid_output_raw_report = usbhid_output_raw_report;
 	hid->ff_init = hid_pidff_init;
 #ifdef CONFIG_USB_HIDDEV
diff --git a/include/linux/hid.h b/include/linux/hid.h
index d91c25e..e8ee0a9 100644
--- a/include/linux/hid.h
+++ b/include/linux/hid.h
@@ -504,6 +504,9 @@ struct hid_device {							/* device report descriptor */
 				  struct hid_usage *, __s32);
 	void (*hiddev_report_event) (struct hid_device *, struct hid_report *);
 
+	/* handler for raw input (Get_Report) data, used by hidraw */
+	int (*hid_get_raw_report) (struct hid_device *, unsigned char, __u8 *, size_t, unsigned char);
+
 	/* handler for raw output data, used by hidraw */
 	int (*hid_output_raw_report) (struct hid_device *, __u8 *, size_t, unsigned char);
 
diff --git a/include/linux/hidraw.h b/include/linux/hidraw.h
index dd8d692..4b88e69 100644
--- a/include/linux/hidraw.h
+++ b/include/linux/hidraw.h
@@ -35,6 +35,9 @@ struct hidraw_devinfo {
 #define HIDIOCGRAWINFO		_IOR('H', 0x03, struct hidraw_devinfo)
 #define HIDIOCGRAWNAME(len)     _IOC(_IOC_READ, 'H', 0x04, len)
 #define HIDIOCGRAWPHYS(len)     _IOC(_IOC_READ, 'H', 0x05, len)
+/* The first byte of SFEATURE and GFEATURE is the report number */
+#define HIDIOCSFEATURE(len)    _IOC(_IOC_WRITE|_IOC_READ, 'H', 0x06, len)
+#define HIDIOCGFEATURE(len)    _IOC(_IOC_WRITE|_IOC_READ, 'H', 0x07, len)
 
 #define HIDRAW_FIRST_MINOR 0
 #define HIDRAW_MAX_DEVICES 64
-- 
1.7.0.4



^ permalink raw reply related

* [PATCH v5 2/4] bt hidp: Wait for ACK on Sent Reports
From: Alan Ott @ 2011-01-18  8:04 UTC (permalink / raw)
  To: Jiri Kosina, Marcel Holtmann, Gustavo F. Padovan, David S. Miller,
	Alan Ott
  Cc: Alan Ott
In-Reply-To: <1295337880-12452-1-git-send-email-alan@signal11.us>

Wait for an ACK from the device before returning from
hidp_output_raw_report(). This way, failures can be returned to the user
application. Also, it prevents ACK/NAK packets from an output packet from
being confused with ACK/NAK packets from an input request packet.

Signed-off-by: Alan Ott <alan@signal11.us>
---
 net/bluetooth/hidp/core.c |   54 ++++++++++++++++++++++++++++++++++++++++++--
 net/bluetooth/hidp/hidp.h |    4 +++
 2 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index 67cc4bc..5383e6c 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -316,6 +316,9 @@ static int hidp_send_report(struct hidp_session *session, struct hid_report *rep
 static int hidp_output_raw_report(struct hid_device *hid, unsigned char *data, size_t count,
 		unsigned char report_type)
 {
+	struct hidp_session *session = hid->driver_data;
+	int ret;
+
 	switch (report_type) {
 	case HID_FEATURE_REPORT:
 		report_type = HIDP_TRANS_SET_REPORT | HIDP_DATA_RTYPE_FEATURE;
@@ -327,10 +330,47 @@ static int hidp_output_raw_report(struct hid_device *hid, unsigned char *data, s
 		return -EINVAL;
 	}
 
+	if (mutex_lock_interruptible(&session->report_mutex))
+		return -ERESTARTSYS;
+
+	/* Set up our wait, and send the report request to the device. */
+	set_bit(HIDP_WAITING_FOR_SEND_ACK, &session->flags);
 	if (hidp_send_ctrl_message(hid->driver_data, report_type,
-			data, count))
-		return -ENOMEM;
-	return count;
+			data, count)) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	/* Wait for the ACK from the device. */
+	while (test_bit(HIDP_WAITING_FOR_SEND_ACK, &session->flags)) {
+		int res;
+
+		res = wait_event_interruptible_timeout(session->report_queue,
+			!test_bit(HIDP_WAITING_FOR_SEND_ACK, &session->flags),
+			10*HZ);
+		if (res == 0) {
+			/* timeout */
+			ret = -EIO;
+			goto err;
+		}
+		if (res < 0) {
+			/* signal */
+			ret = -ERESTARTSYS;
+			goto err;
+		}
+	}
+
+	if (!session->output_report_success) {
+		ret = -EIO;
+		goto err;
+	}
+
+	ret = count;
+
+err:
+	clear_bit(HIDP_WAITING_FOR_SEND_ACK, &session->flags);
+	mutex_unlock(&session->report_mutex);
+	return ret;
 }
 
 static void hidp_idle_timeout(unsigned long arg)
@@ -357,10 +397,12 @@ static void hidp_process_handshake(struct hidp_session *session,
 					unsigned char param)
 {
 	BT_DBG("session %p param 0x%02x", session, param);
+	session->output_report_success = 0; /* default condition */
 
 	switch (param) {
 	case HIDP_HSHK_SUCCESSFUL:
 		/* FIXME: Call into SET_ GET_ handlers here */
+		session->output_report_success = 1;
 		break;
 
 	case HIDP_HSHK_NOT_READY:
@@ -385,6 +427,12 @@ static void hidp_process_handshake(struct hidp_session *session,
 			HIDP_TRANS_HANDSHAKE | HIDP_HSHK_ERR_INVALID_PARAMETER, NULL, 0);
 		break;
 	}
+
+	/* Wake up the waiting thread. */
+	if (test_bit(HIDP_WAITING_FOR_SEND_ACK, &session->flags)) {
+		clear_bit(HIDP_WAITING_FOR_SEND_ACK, &session->flags);
+		wake_up_interruptible(&session->report_queue);
+	}
 }
 
 static void hidp_process_hid_control(struct hidp_session *session,
diff --git a/net/bluetooth/hidp/hidp.h b/net/bluetooth/hidp/hidp.h
index 2cc35dc..92e093e 100644
--- a/net/bluetooth/hidp/hidp.h
+++ b/net/bluetooth/hidp/hidp.h
@@ -80,6 +80,7 @@
 #define HIDP_VIRTUAL_CABLE_UNPLUG	0
 #define HIDP_BOOT_PROTOCOL_MODE		1
 #define HIDP_BLUETOOTH_VENDOR_ID	9
+#define HIDP_WAITING_FOR_SEND_ACK	11
 
 struct hidp_connadd_req {
 	int   ctrl_sock;	// Connected control socket
@@ -154,6 +155,9 @@ struct hidp_session {
 	struct sk_buff_head ctrl_transmit;
 	struct sk_buff_head intr_transmit;
 
+	/* Used in hidp_output_raw_report() */
+	int output_report_success; /* boolean */
+
 	/* Report descriptor */
 	__u8 *rd_data;
 	uint rd_size;
-- 
1.7.0.4



^ permalink raw reply related

* dev_queue_xmit and sockets
From: Marjan Schiller @ 2011-01-18  9:04 UTC (permalink / raw)
  To: netdev

Hi!

I am currently developing a netfilter module which passes incoming and outgoing ethernet packages from one interface to an another interface ( a tun device ). This works very well, the tcpdump shows correct packages on the destination device and PING and ARP works good.

But i get into trouble if i want to create a socket connection with these packages. It seams to me that the packages are only "useable" within the kernel space and not the user space. Did i miss something there? Do i have to prepare the SKB for user mode or something like that?

I can establish a connection between the client socket and the server socket, but the server socket does not read any data ( But the packet with the data has reached the interface ).

Here is an example how i pass the packaged to the tun device:

static unsigned int send_ethernet_package( struct sk_buff * in_skb, void * data, int len, struct net_device * dev )
{
  struct sk_buff *skb;

  skb = alloc_skb( len , GFP_ATOMIC);
  skb_put( skb, len );
  skb->pkt_type = PACKET_HOST;
  skb->dev = dev;
  memcpy(skb->head, data , len );
  dev_queue_xmit( skb );

  return 0;
}

The data parameter is a well formed ethernet package.

Someone how has a guess whats wrong with this?

Thanks for help
    
    Marjan

-- 
Neu: GMX De-Mail - Einfach wie E-Mail, sicher wie ein Brief!  
Jetzt De-Mail-Adresse reservieren: http://portal.gmx.net/de/go/demail

^ permalink raw reply

* Re: [PATCH 1/3] vhost-net: check the support of mergeable buffer outside the receive loop
From: Jason Wang @ 2011-01-18  9:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, virtualization, netdev, kvm, linux-kernel
In-Reply-To: <20110118043650.GA2233@redhat.com>

Michael S. Tsirkin writes:
 > On Tue, Jan 18, 2011 at 12:26:17PM +0800, Jason Wang wrote:
 > > Michael S. Tsirkin writes:
 > >  > On Mon, Jan 17, 2011 at 04:10:59PM +0800, Jason Wang wrote:
 > >  > > No need to check the support of mergeable buffer inside the recevie
 > >  > > loop as the whole handle_rx()_xx is in the read critical region.  So
 > >  > > this patch move it ahead of the receiving loop.
 > >  > > 
 > >  > > Signed-off-by: Jason Wang <jasowang@redhat.com>
 > >  > 
 > >  > Well feature check is mostly just features & bit
 > >  > so why would it be slower? Because of the rcu
 > >  > adding memory barriers? Do you observe a
 > >  > measureable speedup with this patch?
 > >  > 
 > > 
 > > I do not measure the performance for just this patch, maybe not obvious. And it
 > > can also help the code unification.
 > > 
 > >  > Apropos, I noticed that the check in vhost_has_feature
 > >  > is wrong: it uses the same kind of RCU as the
 > >  > private pointer. So we'll have to fix that properly
 > >  > by adding more lockdep classes, but for now
 > >  > we'll need to make
 > >  > the check 1 || lockdep_is_held(&dev->mutex);
 > >  > and add a TODO.
 > >  > 
 > > 
 > > Not sure, lockdep_is_head(&dev->mutex) maybe not accurate but sufficient, as it
 > > was always held in the read critical region.
 > 
 > Not really, when we call vhost_has_feature from the vq handling thread
 > it's not.
 > 

Seems not, see vhost_net_ioctl().

 > >  > > ---
 > >  > >  drivers/vhost/net.c |    5 +++--
 > >  > >  1 files changed, 3 insertions(+), 2 deletions(-)
 > >  > > 
 > >  > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 > >  > > index 14fc189..95e49de 100644
 > >  > > --- a/drivers/vhost/net.c
 > >  > > +++ b/drivers/vhost/net.c
 > >  > > @@ -411,7 +411,7 @@ static void handle_rx_mergeable(struct vhost_net *net)
 > >  > >  	};
 > >  > >  
 > >  > >  	size_t total_len = 0;
 > >  > > -	int err, headcount;
 > >  > > +	int err, headcount, mergeable;
 > >  > >  	size_t vhost_hlen, sock_hlen;
 > >  > >  	size_t vhost_len, sock_len;
 > >  > >  	struct socket *sock = rcu_dereference(vq->private_data);
 > >  > > @@ -425,6 +425,7 @@ static void handle_rx_mergeable(struct vhost_net *net)
 > >  > >  
 > >  > >  	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
 > >  > >  		vq->log : NULL;
 > >  > > +	mergeable = vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF);
 > >  > >  
 > >  > >  	while ((sock_len = peek_head_len(sock->sk))) {
 > >  > >  		sock_len += sock_hlen;
 > >  > > @@ -474,7 +475,7 @@ static void handle_rx_mergeable(struct vhost_net *net)
 > >  > >  			break;
 > >  > >  		}
 > >  > >  		/* TODO: Should check and handle checksum. */
 > >  > > -		if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF) &&
 > >  > > +		if (likely(mergeable) &&
 > >  > >  		    memcpy_toiovecend(vq->hdr, (unsigned char *)&headcount,
 > >  > >  				      offsetof(typeof(hdr), num_buffers),
 > >  > >  				      sizeof hdr.num_buffers)) {

^ permalink raw reply

* Re: [PATCH v2 1/3] can: at91_can: clean up usage of AT91_MB_RX_FIRST and AT91_MB_RX_NUM
From: Wolfgang Grandegger @ 2011-01-18  9:26 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4D349D8C.3010403-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

On 01/17/2011 08:50 PM, Marc Kleine-Budde wrote:
> On 01/11/2011 02:21 PM, Marc Kleine-Budde wrote:
>> This patch cleans up the usage of two macros which specify the mailbox
>> usage. AT91_MB_RX_FIRST and AT91_MB_RX_NUM define the first and the
>> number of RX mailboxes. The current driver uses these variables in an
>> unclean way; assuming that AT91_MB_RX_FIRST is 0;
>>
>> This patch cleans up the usage of these macros, no longer assuming
>> AT91_MB_RX_FIRST == 0.
>>
>> Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
> 
> Any comments on this?

Looks fine too me. You can add my:

Acked-by: Wolfgang Grandegger <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>

Thanks,

Wolfgang.

^ permalink raw reply

* Re: inbound connection problems when "netlink: test for all flags of the NLM_F_DUMP composite" commit applied
From: Jarek Poplawski @ 2011-01-18  9:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Arthur Marsh, Jan Engelhardt, Eric Dumazet, Linux Netdev List,
	Jamal Hadi Salim
In-Reply-To: <4D336050.9030602@netfilter.org>

On 2011-01-16 22:17, Pablo Neira Ayuso wrote:
> On 16/01/11 13:25, Arthur Marsh wrote:
>> Jan Engelhardt wrote, on 16/01/11 21:20:
>>>
>>> Le dimanche 16 janvier 2011 Ă  19:24 +1030, Arthur Marsh a ĂŠcrit :
>>>>
>>>>> With kernels up to and including 2.6.37-git7, inbound telnetd-ssl
>>>>> connections worked fine. With kernel 2.6.37-git9 and later inbound
>>>>> telnetd-ssl connections failed, and on machine shut-down, there
>>>>> were warning messages about daemons not return status.
>>>
>>> Which daemons are these? For reference, what distro do you happen
>>> to use?
>>
>> avahi-daemon (which gave multiple warning messages, hence I thought it
>> may have been multiple packages)
>>
>> I'm running Debian unstable with kernel.org kernels.
>>
>>>
>>>>> commit 0ab03c2b1478f2438d2c80204f7fef65b1bca9cf
>>>>>      netlink: test for all flags of the NLM_F_DUMP composite
>>>
>>> Each of the hunks in this commit is independent of another.
>>> Would you mind bisecting these too?
>>
>> Recompiling with the only the first patch (attached) resulted in a
>> repeat of the problem.
>>
>> I've removed one person from the cc: list as they did not want to
>> receive email about this even though they signed off the commit.
> 
> Please, pass this patch to the avahi-daemon developers. They use an
> invalid netlink flag combination for dump operations.

Nothing in RFC suggests this is an invalid netlink flag combination,
and author's implementation had suported it:
ftp://ftp.rfc-editor.org/in-notes/rfc3549.txt

NLM_F_DUMP is called a convenience macro only, so might be interpreted
as a special kind of dump. BTW, isn't NLM_F_ATOMIC flag with
NLM_F_DUMP treated as invalid now either?

Even if I'm wrong, this change added to stable will break many configs.
My proposal is to revert commit 0ab03c2b147 until proper fix is found.

Cc: Jamal Hadi Salim <hadi@cyberus.ca>

Jarek P.

^ permalink raw reply

* Re: inbound connection problems when "netlink: test for all flags of the NLM_F_DUMP composite" commit applied
From: David Miller @ 2011-01-18 10:07 UTC (permalink / raw)
  To: jarkao2; +Cc: pablo, arthur.marsh, jengelh, eric.dumazet, netdev, hadi
In-Reply-To: <20110118093811.GA7520@ff.dom.local>

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 18 Jan 2011 09:38:11 +0000

> Even if I'm wrong, this change added to stable will break many configs.
> My proposal is to revert commit 0ab03c2b147 until proper fix is found.

The flag combination is, at best ambiguous, it has no proper
definition without the check we added.

^ permalink raw reply

* Re: inbound connection problems when "netlink: test for all flags of the NLM_F_DUMP composite" commit applied
From: Jarek Poplawski @ 2011-01-18 10:24 UTC (permalink / raw)
  To: David Miller; +Cc: pablo, arthur.marsh, jengelh, eric.dumazet, netdev, hadi
In-Reply-To: <20110118.020702.115924992.davem@davemloft.net>

On Tue, Jan 18, 2011 at 02:07:02AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Tue, 18 Jan 2011 09:38:11 +0000
> 
> > Even if I'm wrong, this change added to stable will break many configs.
> > My proposal is to revert commit 0ab03c2b147 until proper fix is found.
> 
> The flag combination is, at best ambiguous, it has no proper
> definition without the check we added.

Do you all expect all users manage to upgrade avahi app before
changing their stable kernel? I mean "own distro" users especially.

Jarek P.

^ permalink raw reply

* bnx2 cards intermittantly going offline
From: Mills, Tony @ 2011-01-18 10:54 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hi, 

I was running Debian lenny 64bit with a 2.6.24 kernel which seemed to have a rather old version of the bnx2 driver, I have been getting reports that there have been issues with connectivity, this seems to happen randomly across many different servers in different data centres. 

Further investigation showed that the interfaces become completely unresponsive for periods of time, whereby machines are unable to ping the host with the problem and the server  with the problem is unable to ping out, our tcp application which is time critical will kick off connections. 

The network cards are Broadcom NetXtreme II BCM5708 Gigabit Ethernet cards on Dell 2950's plugged into  Cisco 3750E's. 

Reading various posts indicated that I might be experiencing a problem that may have already been solved so attempted to build  the drivers from the Broadcom website into the 2.4.24 kernel without success, eventually compiling against a 2.3.32 kernel worked great. 

I have installed this on 4 machines in different data centres and followed some of the other posts I have found, in an attempt to fix the issues, however none of the things i have tried appear to be affective :-

1.	 Was seeing rx_fw_discards so upped rx ring buffer to both 1020 and 4080, this stopped the rx_fw_discards but not the "unresponsiveness". 
2.	Have enabled flow control on one of the machines, this still has the unresponsive behaviour and the port on the switch shows 0 pause frames received. 
3.	Upgraded the kernel and driver to latest available. 

Last night i setup a machine to monitor overnight and at 3:52 this morning it became unresponsive. 

The switch was setup to "flowcontrol desired", and the machine had the following settings:-

ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:                         4080
RX Mini:               0
RX Jumbo:           16320
TX:                          255
Current hardware settings:
RX:                         4080
RX Mini:               0
RX Jumbo:           0
TX:                          255

ethtool -a eth0
Pause parameters for eth0:
Autonegotiate: on
RX:                         on
TX:                          on

The output from ethtool -S eth0

NIC statistics:
     rx_bytes: 65832403312
     rx_error_bytes: 0
     tx_bytes: 141615699363
     tx_error_bytes: 0
     rx_ucast_packets: 565468011
     rx_mcast_packets: 3
     rx_bcast_packets: 193008
     tx_ucast_packets: 768277404
     tx_mcast_packets: 8
     tx_bcast_packets: 657
     tx_mac_errors: 0
     tx_carrier_errors: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     tx_deferred: 0
     tx_excess_collisions: 0
     tx_late_collisions: 0
     tx_total_collisions: 0
     rx_fragments: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_oversize_packets: 0
     rx_64_byte_packets: 398958533
     rx_65_to_127_byte_packets: 125222178
     rx_128_to_255_byte_packets: 16962519
     rx_256_to_511_byte_packets: 6100929
     rx_512_to_1023_byte_packets: 2314593
     rx_1024_to_1522_byte_packets: 16102270
     rx_1523_to_9022_byte_packets: 0
     tx_64_byte_packets: 331974057
     tx_65_to_127_byte_packets: 239480821
     tx_128_to_255_byte_packets: 78102231
     tx_256_to_511_byte_packets: 33163946
     tx_512_to_1023_byte_packets: 57321357
     tx_1024_to_1522_byte_packets: 28235657
     tx_1523_to_9022_byte_packets: 0
     rx_xon_frames: 0
     rx_xoff_frames: 0
     tx_xon_frames: 0
     tx_xoff_frames: 0
     rx_mac_ctrl_frames: 0
     rx_filtered_packets: 43417
     rx_ftq_discards: 0
     rx_discards: 0
     rx_fw_discards: 0

The switch port showed no pause frames

show interfaces gigabitEthernet X/X/X flowcontrol 
Port       Send FlowControl  Receive FlowControl  RxPause TxPause
           admin    oper     admin    oper                       
---------  -------- -------- -------- --------    ------- -------
GiX/X/X   Unsupp.  Unsupp.  desired  on          0       0   

(The switch is unable to send flow control packets but can receive).

Would someone be able to point me at anything else that may help identify/fix the issue. 

Best Regards

Tony Mills
-- 
IMPORTANT NOTICE

The sender does not guarantee that this message, including any attachment, is secure
or virus free. Also, it is confidential and may be privileged or otherwise protected
from disclosure. If you are not the intended recipient, do not disclose or copy it
or its contents. Please telephone or email the sender and delete the message
entirely from your system.
Jagex Limited is a company registered in England & Wales with company number
03982706 and a registered office at St John's Innovation Centre, Cowley Road, 
Cambridge, CB4 0WS, UK.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox