Netdev List


* [PATCH 0/7] net: Convert aligned memcpy to ether_addr_copy
From: Joe Perches @ 2014-01-20 17:52 UTC
  To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20140115.164540.450877805879786067.davem@davemloft.net>

On Wed, 2014-01-15 at 16:45 -0800, David Miller wrote:
> From: Joe Perches <joe@perches.com>
> Date: Wed, 15 Jan 2014 16:07:58 -0800
> 
> > If you want the ones for net-next net/ now (but not
> > for batman-adv, that maybe could use a new function like
> > ether_addr_copy_unaligned) here's a changestat.
> > 
> > Otherwise, I'll wait for the next cycle.
> 
> This looks fine, why don't you toss it my way over the weekend as I
> still have some backlog to process at the moment?
 
I didn't get that done this weekend, so next cycle
for most of net/.

I don't want to introduce any breakage this late and
there are possible unaligned memcpy(foo, bar, ETH_ALEN)
where one or both of foo/bar are stack pointers where
the alignment is hard to verify.

There are also statics declared without __aligned(2)
that will need updating.

Maybe ether_addr_copy_unaligned should be added too.

Here are the ones I could easily verify...
No worries if it's this cycle or next.
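
To make the alignment concern concrete, here is a minimal user-space sketch of the distinction being discussed. It is not the kernel implementation: the names addr_copy_aligned, addr_copy_unaligned, is_u16_aligned and bcast_addr are hypothetical, and the only assumption taken from the thread is that the aligned variant needs both addresses on a 2-byte boundary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical helper: true when p sits on a 2-byte boundary. */
static int is_u16_aligned(const void *p)
{
	return ((uintptr_t)p & 1u) == 0;
}

/*
 * Aligned copy: the caller must guarantee that both addresses are
 * 2-byte aligned. The address is moved as three 16-bit words; the
 * 2-byte memcpy keeps this user-space sketch free of aliasing issues.
 */
static void addr_copy_aligned(uint8_t *dst, const uint8_t *src)
{
	assert(is_u16_aligned(dst) && is_u16_aligned(src));
	for (int i = 0; i < ETH_ALEN; i += 2)
		memcpy(dst + i, src + i, sizeof(uint16_t));
}

/* Fallback for buffers whose alignment cannot be verified. */
static void addr_copy_unaligned(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, ETH_ALEN);
}

/* A static address with its alignment forced, in the spirit of __aligned(2). */
static const uint8_t bcast_addr[ETH_ALEN] __attribute__((aligned(2))) = {
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};
```

A stack buffer or a field at an odd offset inside a packed structure is the kind of case where only the unaligned variant is safe, which is why those call sites are left out of this series.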

Joe Perches (7):
  8021q: Use ether_addr_copy
  appletalk: Use ether_addr_copy
  atm: Use ether_addr_copy
  caif_usb: Use ether_addr_copy
  netpoll: Use ether_addr_copy
  pktgen: Use ether_addr_copy
  dsa: Use ether_addr_copy

 net/8021q/vlan.c     |  2 +-
 net/8021q/vlan_dev.c |  6 +++---
 net/appletalk/aarp.c | 12 ++++++------
 net/atm/lec.c        |  9 +++++----
 net/atm/mpc.c        |  2 +-
 net/caif/caif_usb.c  |  4 ++--
 net/core/netpoll.c   |  4 ++--
 net/core/pktgen.c    |  8 ++++----
 net/dsa/slave.c      |  2 +-
 9 files changed, 25 insertions(+), 24 deletions(-)

-- 
1.8.1.2.459.gbcd45b4.dirty


* Re: [PATCH net-next v4 8/9] xen-netback: Timeout packets in RX path
From: Zoltan Kiss @ 2014-01-20 17:47 UTC
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies
In-Reply-To: <20140120165356.GF11681@zion.uk.xensource.com>

On 20/01/14 16:53, Wei Liu wrote:
>>>> @@ -559,7 +579,7 @@ void xenvif_free(struct xenvif *vif)
>>>>   		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
>>>>   			unmap_timeout++;
>>>>   			schedule_timeout(msecs_to_jiffies(1000));
>>>> -			if (unmap_timeout > 9 &&
>>>> +			if (unmap_timeout > ((rx_drain_timeout_msecs/1000) * DIV_ROUND_UP(XENVIF_QUEUE_LENGTH, (XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS))) &&
>>>
>>> This line is really too long. And what's the rationale behind this long
>>> expression?
>> It calculates how many times you should ditch the internal queue of
>> another (possibly stuck) vif before Qdisc empties its actual
>> content. After that there shouldn't be any mapped handle left, so we
>> should start printing these messages. Actually it should use
>> vif->dev->tx_queue_len, and yes, it is probably better to move it to
>> the beginning of the function into a new variable, and use that
>> here.
>>
>
> Why is it relative to tx queue length?
>
> What's the meaning of drain_timeout multiplied by the last part
> (DIV_ROUND_UP)?
>
> If you proposed to use vif->dev->tx_queue_len to replace DIV_ROUND_UP
> then ignore the above question. But I still don't understand the
> rationale behind this. Could you elaborate a bit more? Wouldn't
> rx_drain_timeout_msecs/1000 alone suffice?

Here we want to avoid timeout messages if an skb can be legitimately
stuck somewhere else. As we discussed earlier, realistically this could
be another vif's internal or QDisc queue. That other vif also has
this rx_drain_timeout_msecs timeout, but now with Paul's recent changes
the timer only ditches the internal queue. After that, the QDisc queue
can put in the worst case XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS skbs into
that other vif's internal queue, so we need several rounds of such
timeouts until we can be sure that no other vif should still hold skbs
from us. We are not sending more skbs, so newly stuck packets are not
interesting for us here.
But actually using the current vif's queue length is not relevant in
this calculation, as it doesn't mean other vifs have the same. I think
it is better to stick with XENVIF_QUEUE_LENGTH.
I've added this explanation as a comment and moved the calculation into
a separate variable, so it doesn't cause such long lines.
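
The calculation described above can be sketched as a small standalone function. This is only an illustration of the arithmetic, not the code from the patch: the function name worst_case_unmap_timeout is hypothetical, and the constant values below (queue length 32, ring size 256, 17 frags) as well as the 10000 ms drain timeout used in the example are assumptions that should be checked against the actual tree.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Assumed values, for illustration only. */
#define XENVIF_QUEUE_LENGTH     32
#define XEN_NETIF_RX_RING_SIZE  256
#define MAX_SKB_FRAGS           17

/*
 * Number of one-second waits before it is reasonable to start
 * complaining about handles that are still mapped.
 */
static unsigned int worst_case_unmap_timeout(unsigned int rx_drain_timeout_msecs)
{
	/* Worst case: skbs the QDisc can push into the internal queue per round. */
	unsigned int skbs_per_round = XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS;
	unsigned int rounds;

	assert(skbs_per_round > 0);

	/* Drain-timeout rounds needed to flush a full queue of that length. */
	rounds = DIV_ROUND_UP(XENVIF_QUEUE_LENGTH, skbs_per_round);

	/* Each round lasts one drain timeout, expressed here in seconds. */
	return (rx_drain_timeout_msecs / 1000) * rounds;
}
```

With the assumed values, each round moves at most 256 / 17 = 15 skbs, a queue of 32 needs 3 rounds, and a 10000 ms drain timeout gives a threshold of 30 seconds instead of the previous fixed 9.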

Zoli


* Re: [PATCH v2] socket.7: add description for SO_BUSY_POLL
From: Eliezer Tamir @ 2014-01-20 17:28 UTC
  To: Michael Kerrisk (man-pages)
  Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, David Miller,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Andrew Morton, Eliezer Tamir
In-Reply-To: <52DD4EC6.7080208-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On 20/01/2014 18:28, Michael Kerrisk (man-pages) wrote:
> On 07/10/2013 04:18 PM, Eliezer Tamir wrote:
>> Add description for the SO_BUSY_POLL socket option to the socket(7) manpage.
> 
> Long after the fact, I've applied this. Thanks, Eliezer.
> 
> Would you be willing also to write a patch for the POLL_BUSY_LOOP flag of 
> poll()?

Yes, I or someone from our team will do that.

-Eliezer
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH net-next v4 6/9] xen-netback: Handle guests with too many frags
From: Zoltan Kiss @ 2014-01-20 17:26 UTC
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies
In-Reply-To: <20140116000314.GD5331@zion.uk.xensource.com>

On 16/01/14 00:03, Wei Liu wrote:
> On Tue, Jan 14, 2014 at 08:39:52PM +0000, Zoltan Kiss wrote:
> [...]
>>   	/* Skip first skb fragment if it is on same page as header fragment. */
>> @@ -832,6 +851,29 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
>>
>>   	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
>>
>> +	if (frag_overflow) {
>> +		struct sk_buff *nskb = xenvif_alloc_skb(0);
>> +		if (unlikely(nskb == NULL)) {
>> +			netdev_err(vif->dev,
>> +				   "Can't allocate the frag_list skb.\n");
>
> This, and other occurrences of netdev_* logs, need to be rate limited.
> Otherwise you risk flooding the kernel log when the system is under memory
> pressure.
Done.

>> @@ -1537,6 +1613,32 @@ static int xenvif_tx_submit(struct xenvif *vif)
>>   				  pending_idx :
>>   				  INVALID_PENDING_IDX);
>>
>> +		if (skb_shinfo(skb)->frag_list) {
>> +			nskb = skb_shinfo(skb)->frag_list;
>> +			xenvif_fill_frags(vif, nskb, INVALID_PENDING_IDX);
>> +			skb->len += nskb->len;
>> +			skb->data_len += nskb->len;
>> +			skb->truesize += nskb->truesize;
>> +			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
>> +			skb_shinfo(nskb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
>> +			vif->tx_zerocopy_sent += 2;
>> +			nskb = skb;
>> +
>> +			skb = skb_copy_expand(skb,
>> +					      0,
>> +					      0,
>> +					      GFP_ATOMIC | __GFP_NOWARN);
>> +			if (!skb) {
>> +				netdev_dbg(vif->dev,
>> +					   "Can't consolidate skb with too many fragments\n");
>
> Rate limit.
>
>> +				if (skb_shinfo(nskb)->destructor_arg)
>> +					skb_shinfo(nskb)->tx_flags |=
>> +						SKBTX_DEV_ZEROCOPY;
>
> Why is this needed? nskb is the saved pointer to original skb, which has
> already had SKBTX_DEV_ZEROCOPY in tx_flags. Did I miss something?
Indeed. This actually belongs to the header grant copy patches I've sent
in as well. I'll move it there.


* Re: [PATCH net-next v4 2/9] xen-netback: Change TX path from grant copy to mapping
From: Zoltan Kiss @ 2014-01-20 17:04 UTC
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies
In-Reply-To: <20140116000107.GB5331@zion.uk.xensource.com>

On 16/01/14 00:01, Wei Liu wrote:
> On Tue, Jan 14, 2014 at 08:39:48PM +0000, Zoltan Kiss wrote:
>> v3:
>> - delete a surplus checking from tx_action
>> - remove stray line
>> - squash xenvif_idx_unmap changes into the first patch
>> - init spinlocks
>> - call map hypercall directly instead of gnttab_map_refs()
>> - fix unmapping timeout in xenvif_free()
>>
>> v4:
>> - fix indentations and comments
>> - handle errors of set_phys_to_machine
>
> There's no call to set_phys_to_machine in this patch. Did I miss
> something?
I made several changes to the grant mapping code between v3 and v4;
this changelog entry describes an earlier concept, not the one I finally
sent in. It should be the same comment as in the first patch: "go back to
gnttab_map_refs, now we rely on API changes"

>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -123,7 +123,9 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   	BUG_ON(skb->dev != dev);
>>
>>   	/* Drop the packet if vif is not ready */
>> -	if (vif->task == NULL || !xenvif_schedulable(vif))
>> +	if (vif->task == NULL ||
>> +	    vif->dealloc_task == NULL ||
>> +	    !xenvif_schedulable(vif))
>>   		goto drop;
>>
>>   	/* At best we'll need one slot for the header and one for each
>> @@ -345,8 +347,26 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>
> At the beginning of the function there's BUG_ON checks for vif->task. I
> would suggest you do the same for vif->dealloc_task, just to be
> consistent.
I guess you mean in xenvif_connect. Applied.

>> @@ -431,6 +452,16 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
>>   		goto err_rx_unbind;
>>   	}
>>
>> +	vif->dealloc_task = kthread_create(xenvif_dealloc_kthread,
>> +					   (void *)vif,
>> +					   "%s-dealloc",
>> +					   vif->dev->name);
>> +	if (IS_ERR(vif->dealloc_task)) {
>> +		pr_warn("Could not allocate kthread for %s\n", vif->dev->name);
>> +		err = PTR_ERR(vif->dealloc_task);
>> +		goto err_rx_unbind;
>> +	}
>> +
>>   	vif->task = task;
>
> Please move this line before the above hunk. Don't separate it from
> corresponding kthread_create.
Done, I've also used task for dealloc thread creation, the same way as 
the rx thread does.

> Last but not least, though I've looked at this patch for several rounds
> and the basic logic looks correct to me, I would like it to go
> through XenRT tests if possible -- eye inspection is error-prone to such
> complicated change. (If I'm not mistaken you once told me you've done
> regression tests already. That would be neat!)
Yes, that's ongoing, I don't expect the patches to be accepted before 
they pass XenRT.


* Re: [PATCH net-next v4 8/9] xen-netback: Timeout packets in RX path
From: Wei Liu @ 2014-01-20 16:53 UTC
  To: Zoltan Kiss
  Cc: Wei Liu, ian.campbell, xen-devel, netdev, linux-kernel,
	jonathan.davies
In-Reply-To: <52D98427.1060103@citrix.com>

On Fri, Jan 17, 2014 at 07:27:35PM +0000, Zoltan Kiss wrote:
> On 16/01/14 00:03, Wei Liu wrote:
> >On Tue, Jan 14, 2014 at 08:39:54PM +0000, Zoltan Kiss wrote:
> >[...]
> >>diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> >>index 109c29f..d1cd8ce 100644
> >>--- a/drivers/net/xen-netback/common.h
> >>+++ b/drivers/net/xen-netback/common.h
> >>@@ -129,6 +129,9 @@ struct xenvif {
> >>  	struct xen_netif_rx_back_ring rx;
> >>  	struct sk_buff_head rx_queue;
> >>  	RING_IDX rx_last_skb_slots;
> >
> >Hmm... You seemed to mix your other patch with this series. :-)
> Yep, this series doesn't work without that patch (actually that is a
> bug in netback even without my series), so at the moment it is based
> on it.
> 
> >
> >>+	bool rx_queue_purge;
> >>+
> >>+	struct timer_list wake_queue;
> >>
> >>  	/* This array is allocated seperately as it is large */
> >>  	struct gnttab_copy *grant_copy_op;
> >>@@ -225,4 +228,7 @@ void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx);
> >>
> >>  extern bool separate_tx_rx_irq;
> >>
> >[...]
> >>@@ -559,7 +579,7 @@ void xenvif_free(struct xenvif *vif)
> >>  		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
> >>  			unmap_timeout++;
> >>  			schedule_timeout(msecs_to_jiffies(1000));
> >>-			if (unmap_timeout > 9 &&
> >>+			if (unmap_timeout > ((rx_drain_timeout_msecs/1000) * DIV_ROUND_UP(XENVIF_QUEUE_LENGTH, (XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS))) &&
> >
> >This line is really too long. And what's the rationale behind this long
> >expression?
> It calculates how many times you should ditch the internal queue of
> another (possibly stuck) vif before Qdisc empties its actual
> content. After that there shouldn't be any mapped handle left, so we
> should start printing these messages. Actually it should use
> vif->dev->tx_queue_len, and yes, it is probably better to move it to
> the beginning of the function into a new variable, and use that
> here.
> 

Why is it relative to tx queue length?

What's the meaning of drain_timeout multiplied by the last part
(DIV_ROUND_UP)?

If you proposed to use vif->dev->tx_queue_len to replace DIV_ROUND_UP
then ignore the above question. But I still don't understand the
rationale behind this. Could you elaborate a bit more? Wouldn't
rx_drain_timeout_msecs/1000 alone suffice?

Wei.

> Zoli


* Re: [PATCH net-next v4 1/9] xen-netback: Introduce TX grant map definitions
From: Zoltan Kiss @ 2014-01-20 16:53 UTC
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies
In-Reply-To: <20140116000040.GA5331@zion.uk.xensource.com>

On 16/01/14 00:00, Wei Liu wrote:
> There is a stray blank line change in xenvif_tx_create_gop. (I removed
> that part too early and didn't bother to paste it back...)
Ok, fixed

>> +static inline bool tx_dealloc_work_todo(struct xenvif *vif)
>> +{
>> +	if (vif->dealloc_cons != vif->dealloc_prod)
>> +		return true;
>> +
>> +	return false;
>
> This can be simplified as
>    return vif->dealloc_cons != vif->dealloc_prod;
Indeed, done.


* Re: [PATCH v2 net-next] tcp: TCP_NOTSENT_LOWAT socket option
From: Michael Kerrisk (man-pages) @ 2014-01-20 16:45 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev, Neal Cardwell, Yuchung Cheng
In-Reply-To: <1374521803.4990.38.camel@edumazet-glaptop>

Hi Eric,

On Mon, Jul 22, 2013 at 9:36 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Idea of this patch is to add optional limitation of number of
> unsent bytes in TCP sockets, to reduce usage of kernel memory.
>
> TCP receiver might announce a big window, and TCP sender autotuning
> might allow a large amount of bytes in write queue, but this has little
> performance impact if a large part of this buffering is wasted :
>
> Write queue needs to be large only to deal with large BDP, not
> necessarily to cope with scheduling delays (incoming ACKS make room
> for the application to queue more bytes)
>
> For most workloads, using a value of 128 KB or less is OK to give
> applications enough time to react to POLLOUT events in time
> (or being awaken in a blocking sendmsg())
>
> This patch adds two ways to set the limit :
> 1) Per socket option TCP_NOTSENT_LOWAT
>
> 2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
> not using TCP_NOTSENT_LOWAT socket option (or setting a zero value)
> Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect.
>
>
> This changes poll()/select()/epoll() to report POLLOUT
> only if number of unsent bytes is below tp->notsent_lowat
>
> Note this might increase number of sendmsg() calls when using non
> blocking sockets, and increase number of context switches for
> blocking sockets.

Would you be willing to write a patch to the tcp(7) man page [1] that
describes the user-space API aspects of TCP_NOTSENT_LOWAT /
/proc/sys/net/ipv4/tcp_notsent_lowat and their effect on
poll()/select()? If the *roff markup is too much of a hassle, I'd be
happy enough to get some plain text that I'll then integrate into the
man page.

Cheers,

Michael




[1] https://www.kernel.org/doc/man-pages/download.html

> Tested:
>
> netperf sessions, and watching /proc/net/protocols "memory" column for TCP
>
> Even in the absence of shallow queues, we get a benefit.
>
> With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory
> used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458)
>
> lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
> lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
> TCPv6     1880      2   45458   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
> TCP       1696    508   45458   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
>
> lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
> lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
> TCPv6     1880      2   20567   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
> TCP       1696    508   20567   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
>
> Using 128KB has no bad effect on the throughput of a single flow, although
> there is an increase of cpu time as sendmsg() calls trigger more
> context switches. A bonus is that we hold socket lock for a shorter amount
> of time and should improve latencies.
>
> lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
> lpq83:~# perf stat -e context-switches ./netperf -H lpq84 -t omni -l 20 -Cc
> OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84 () port 0 AF_INET
> Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
> Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
> Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
> Final       Final                                             %     Method %      Method
> 2097152     6000000     16384  20.00   16509.68   10^6bits/s  3.05  S      4.50   S      0.363   0.536   usec/KB
>
>  Performance counter stats for './netperf -H lpq84 -t omni -l 20 -Cc':
>
>             30,141 context-switches
>
>       20.006308407 seconds time elapsed
>
> lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
> lpq83:~# perf stat -e context-switches ./netperf -H lpq84 -t omni -l 20 -Cc
> OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84 () port 0 AF_INET
> Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
> Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
> Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
> Final       Final                                             %     Method %      Method
> 1911888     6000000     16384  20.00   17412.51   10^6bits/s  3.94  S      4.39   S      0.444   0.496   usec/KB
>
>  Performance counter stats for './netperf -H lpq84 -t omni -l 20 -Cc':
>
>            284,669 context-switches
>
>       20.005294656 seconds time elapsed
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> ---
> v2: title/changelog fix (TCP_NOSENT_LOWAT -> TCP_NOTSENT_LOWAT)
>
>  Documentation/networking/ip-sysctl.txt |   13 +++++++++++++
>  include/linux/tcp.h                    |    1 +
>  include/net/sock.h                     |   15 ++++++++++-----
>  include/net/tcp.h                      |   14 ++++++++++++++
>  include/uapi/linux/tcp.h               |    1 +
>  net/ipv4/sysctl_net_ipv4.c             |    7 +++++++
>  net/ipv4/tcp.c                         |   12 ++++++++++--
>  net/ipv4/tcp_ipv4.c                    |    1 +
>  net/ipv4/tcp_output.c                  |    3 +++
>  net/ipv6/tcp_ipv6.c                    |    1 +
>  10 files changed, 61 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index 1074290..53cea9b 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -516,6 +516,19 @@ tcp_wmem - vector of 3 INTEGERs: min, default, max
>         this value is ignored.
>         Default: between 64K and 4MB, depending on RAM size.
>
> +tcp_notsent_lowat - UNSIGNED INTEGER
> +       A TCP socket can control the amount of unsent bytes in its write queue,
> +       thanks to TCP_NOTSENT_LOWAT socket option. poll()/select()/epoll()
> +       reports POLLOUT events if the amount of unsent bytes is below a per
> +       socket value, and if the write queue is not full. sendmsg() will
> +       also not add new buffers if the limit is hit.
> +
> +       This global variable controls the amount of unsent data for
> +       sockets not using TCP_NOTSENT_LOWAT. For these sockets, a change
> +       to the global variable has immediate effect.
> +
> +       Default: UINT_MAX (0xFFFFFFFF)
> +
>  tcp_workaround_signed_windows - BOOLEAN
>         If set, assume no receipt of a window scaling option means the
>         remote TCP is broken and treats the window as a signed quantity.
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index 472120b..9640803 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -238,6 +238,7 @@ struct tcp_sock {
>
>         u32     rcv_wnd;        /* Current receiver window              */
>         u32     write_seq;      /* Tail(+1) of data held in tcp send buffer */
> +       u32     notsent_lowat;  /* TCP_NOTSENT_LOWAT */
>         u32     pushed_seq;     /* Last pushed seq, required to talk to windows */
>         u32     lost_out;       /* Lost packets                 */
>         u32     sacked_out;     /* SACK'd packets                       */
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 95a5a2c..7be0b22 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -746,11 +746,6 @@ static inline int sk_stream_wspace(const struct sock *sk)
>
>  extern void sk_stream_write_space(struct sock *sk);
>
> -static inline bool sk_stream_memory_free(const struct sock *sk)
> -{
> -       return sk->sk_wmem_queued < sk->sk_sndbuf;
> -}
> -
>  /* OOB backlog add */
>  static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
>  {
> @@ -950,6 +945,7 @@ struct proto {
>         unsigned int            inuse_idx;
>  #endif
>
> +       bool                    (*stream_memory_free)(const struct sock *sk);
>         /* Memory pressure */
>         void                    (*enter_memory_pressure)(struct sock *sk);
>         atomic_long_t           *memory_allocated;      /* Current allocated memory. */
> @@ -1089,6 +1085,15 @@ static inline struct cg_proto *parent_cg_proto(struct proto *proto,
>  #endif
>
>
> +static inline bool sk_stream_memory_free(const struct sock *sk)
> +{
> +       if (sk->sk_wmem_queued >= sk->sk_sndbuf)
> +               return false;
> +
> +       return sk->sk_prot->stream_memory_free ?
> +               sk->sk_prot->stream_memory_free(sk) : true;
> +}
> +
>  static inline bool sk_has_memory_pressure(const struct sock *sk)
>  {
>         return sk->sk_prot->memory_pressure != NULL;
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index d198005..ff58714 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -284,6 +284,7 @@ extern int sysctl_tcp_thin_dupack;
>  extern int sysctl_tcp_early_retrans;
>  extern int sysctl_tcp_limit_output_bytes;
>  extern int sysctl_tcp_challenge_ack_limit;
> +extern unsigned int sysctl_tcp_notsent_lowat;
>
>  extern atomic_long_t tcp_memory_allocated;
>  extern struct percpu_counter tcp_sockets_allocated;
> @@ -1549,6 +1550,19 @@ extern int tcp_gro_complete(struct sk_buff *skb);
>  extern void __tcp_v4_send_check(struct sk_buff *skb, __be32 saddr,
>                                 __be32 daddr);
>
> +static inline u32 tcp_notsent_lowat(const struct tcp_sock *tp)
> +{
> +       return tp->notsent_lowat ?: sysctl_tcp_notsent_lowat;
> +}
> +
> +static inline bool tcp_stream_memory_free(const struct sock *sk)
> +{
> +       const struct tcp_sock *tp = tcp_sk(sk);
> +       u32 notsent_bytes = tp->write_seq - tp->snd_nxt;
> +
> +       return notsent_bytes < tcp_notsent_lowat(tp);
> +}
> +
>  #ifdef CONFIG_PROC_FS
>  extern int tcp4_proc_init(void);
>  extern void tcp4_proc_exit(void);
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 8d776eb..377f1e5 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -111,6 +111,7 @@ enum {
>  #define TCP_REPAIR_OPTIONS     22
>  #define TCP_FASTOPEN           23      /* Enable FastOpen on listeners */
>  #define TCP_TIMESTAMP          24
> +#define TCP_NOTSENT_LOWAT      25      /* limit number of unsent bytes in write queue */
>
>  struct tcp_repair_opt {
>         __u32   opt_code;
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index b2c123c..69ed203 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -555,6 +555,13 @@ static struct ctl_table ipv4_table[] = {
>                 .extra1         = &one,
>         },
>         {
> +               .procname       = "tcp_notsent_lowat",
> +               .data           = &sysctl_tcp_notsent_lowat,
> +               .maxlen         = sizeof(sysctl_tcp_notsent_lowat),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec,
> +       },
> +       {
>                 .procname       = "tcp_rmem",
>                 .data           = &sysctl_tcp_rmem,
>                 .maxlen         = sizeof(sysctl_tcp_rmem),
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 5423223..5792302 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -499,7 +499,8 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
>                         mask |= POLLIN | POLLRDNORM;
>
>                 if (!(sk->sk_shutdown & SEND_SHUTDOWN)) {
> -                       if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)) {
> +                       if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk) &&
> +                           tcp_stream_memory_free(sk)) {
>                                 mask |= POLLOUT | POLLWRNORM;
>                         } else {  /* send SIGIO later */
>                                 set_bit(SOCK_ASYNC_NOSPACE,
> @@ -510,7 +511,8 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
>                                  * wspace test but before the flags are set,
>                                  * IO signal will be lost.
>                                  */
> -                               if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk))
> +                               if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk) &&
> +                                   tcp_stream_memory_free(sk))
>                                         mask |= POLLOUT | POLLWRNORM;
>                         }
>                 } else
> @@ -2631,6 +2633,9 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>                 else
>                         tp->tsoffset = val - tcp_time_stamp;
>                 break;
> +       case TCP_NOTSENT_LOWAT:
> +               tp->notsent_lowat = val;
> +               break;
>         default:
>                 err = -ENOPROTOOPT;
>                 break;
> @@ -2847,6 +2852,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
>         case TCP_TIMESTAMP:
>                 val = tcp_time_stamp + tp->tsoffset;
>                 break;
> +       case TCP_NOTSENT_LOWAT:
> +               val = tp->notsent_lowat;
> +               break;
>         default:
>                 return -ENOPROTOOPT;
>         }
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index b74628e..8390bff 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -2806,6 +2806,7 @@ struct proto tcp_prot = {
>         .unhash                 = inet_unhash,
>         .get_port               = inet_csk_get_port,
>         .enter_memory_pressure  = tcp_enter_memory_pressure,
> +       .stream_memory_free     = tcp_stream_memory_free,
>         .sockets_allocated      = &tcp_sockets_allocated,
>         .orphan_count           = &tcp_orphan_count,
>         .memory_allocated       = &tcp_memory_allocated,
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 92fde8d..884efff 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -65,6 +65,9 @@ int sysctl_tcp_base_mss __read_mostly = TCP_BASE_MSS;
>  /* By default, RFC2861 behavior.  */
>  int sysctl_tcp_slow_start_after_idle __read_mostly = 1;
>
> +unsigned int sysctl_tcp_notsent_lowat __read_mostly = UINT_MAX;
> +EXPORT_SYMBOL(sysctl_tcp_notsent_lowat);
> +
>  static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
>                            int push_one, gfp_t gfp);
>
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index f0d6363..0030cfd 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -1927,6 +1927,7 @@ struct proto tcpv6_prot = {
>         .unhash                 = inet_unhash,
>         .get_port               = inet_csk_get_port,
>         .enter_memory_pressure  = tcp_enter_memory_pressure,
> +       .stream_memory_free     = tcp_stream_memory_free,
>         .sockets_allocated      = &tcp_sockets_allocated,
>         .memory_allocated       = &tcp_memory_allocated,
>         .memory_pressure        = &tcp_memory_pressure,
>
>



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [PATCH net-next v2] xen-netback: Rework rx_work_todo
From: Wei Liu @ 2014-01-20 16:38 UTC
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel,
	jonathan.davies
In-Reply-To: <1389805867-22409-1-git-send-email-zoltan.kiss@citrix.com>

On Wed, Jan 15, 2014 at 05:11:07PM +0000, Zoltan Kiss wrote:
> The recent patch to fix receive side flow control (11b57f) solved the spinning
> thread problem, but caused another one. The receive side can stall if:
> - [THREAD] xenvif_rx_action sets rx_queue_stopped to true
> - [INTERRUPT] interrupt happens, and sets rx_event to true
> - [THREAD] then xenvif_kthread sets rx_event to false
> - [THREAD] rx_work_todo doesn't return true anymore
>
> Also, if an interrupt is sent but there is still no room in the ring, it takes
> quite a long time until xenvif_rx_action realizes it. This patch ditches those
> two variables and reworks rx_work_todo. If the thread finds it can't fit more
> skbs into the ring, it saves the last slot estimation into rx_last_skb_slots;
> otherwise it's kept as 0. Then rx_work_todo will check if:
> - there is something to send to the ring (like before)
> - there is space for the topmost packet in the queue
>
> I think that's a more natural and optimal thing to test than two bools which
> are set somewhere else.
> 
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>

Sorry for the delay.

Paul, thanks for reviewing.

Acked-by: Wei Liu <wei.liu2@citrix.com>

Wei.


* Re: [PATCH v2] socket.7: add description for SO_BUSY_POLL
From: Michael Kerrisk (man-pages) @ 2014-01-20 16:28 UTC
  To: Eliezer Tamir
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	linux-man-u79uwXL29TY76Z2rM5mHXA, David Miller,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Andrew Morton, Eliezer Tamir
In-Reply-To: <20130710141835.15799.61657.stgit-vd+Q285fAQQAP1VksDEriRL4W9x8LtSr@public.gmane.org>

On 07/10/2013 04:18 PM, Eliezer Tamir wrote:
> Add description for the SO_BUSY_POLL socket option to the socket(7) manpage.

Long after the fact, I've applied this. Thanks, Eliezer.

Would you be willing also to write a patch for the POLL_BUSY_LOOP flag of 
poll()?

Cheers,

Michael


> v2
> fixed typos reported by Rasmus Villemoes
> 
> Signed-off-by: Eliezer Tamir <eliezer.tamir-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> ---
> 
>  man7/socket.7 |   25 +++++++++++++++++++++++++
>  1 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/man7/socket.7 b/man7/socket.7
> index f2213eb..5edcb09 100644
> --- a/man7/socket.7
> +++ b/man7/socket.7
> @@ -694,6 +694,31 @@ for details on control messages.
>  Gets the socket type as an integer (e.g.,
>  .BR SOCK_STREAM ).
>  This socket option is read-only.
> +.TP
> +.B SO_BUSY_POLL
> +Sets the approximate time in microseconds to busy poll on a blocking receive
> +when there is no data. Increasing this value requires
> +.BR CAP_NET_ADMIN . 
> +The default for this option is controlled by the
> +.I /proc/sys/net/core/busy_read
> +file. 
> +
> +The value in the  
> +.I /proc/sys/net/core/busy_poll
> +file determines how long 
> +.BR select (2)
> +and 
> +.BR poll (2)
> +will busy poll when they operate on sockets with 
> +.BR SO_BUSY_POLL
> +set and no events to report are found.
> +
> +In both cases busy polling will only be done when the socket last received data
> +from a network device that supports this option.
> +
> +While busy polling may improve latency of some applications, care must be
> +taken when using it since this will increase both CPU utilization and power usage.
> +
>  .SS Signals
>  When writing onto a connection-oriented socket that has been shut down
>  (by the local or the remote end)
> 
> 
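
As an illustration of the option documented in the quoted patch, a program might set and read back SO_BUSY_POLL roughly as follows. This is a minimal sketch: the helper names `set_busy_poll()` and `get_busy_poll()` are made up for the example, and the fallback value 46 is the asm-generic definition, which is an assumption that does not hold on every architecture.

```c
#include <sys/socket.h>

#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46	/* asm-generic value; assumed, some architectures differ */
#endif

/* Ask for busy polling of up to `usecs` microseconds on blocking receives.
 * Returns 0 on success, -1 with errno set on failure. Raising the value
 * needs CAP_NET_ADMIN, so unprivileged callers should expect EPERM.
 */
int set_busy_poll(int fd, int usecs)
{
	return setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs));
}

/* Read back the current busy-poll time in microseconds, or -1 on failure. */
int get_busy_poll(int fd)
{
	int usecs = 0;
	socklen_t len = sizeof(usecs);

	if (getsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, &len) < 0)
		return -1;
	return usecs;
}
```

A caller would typically create the socket, call `set_busy_poll(fd, 50)`, and fall back to ordinary blocking receives if the call fails, since the kernel or the network device may not support busy polling.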


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next v2] ipv6: enable anycast addresses as source addresses in ICMPv6 error messages
From: Hannes Frederic Sowa @ 2014-01-20 15:54 UTC (permalink / raw)
  To: Francois-Xavier Le Bail
  Cc: netdev, David Stevens, Bill Fink, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki Yoshifuji,
	Patrick McHardy
In-Reply-To: <1390147236-3660-1-git-send-email-fx.lebail@yahoo.com>

On Sun, Jan 19, 2014 at 05:00:36PM +0100, Francois-Xavier Le Bail wrote:
> - Uses ipv6_anycast_destination() in icmp6_send().
> 
> Suggested-by: Bill Fink <billfink@mindspring.com>
> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
> ---
> v2: Addressed Hannes's concern: no sysctl is needed for this change.
>     No need for a new check function.

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

^ permalink raw reply

* Re: [PATCH v3 3/7] net: moxa: connect to PHY
From: Rob Herring @ 2014-01-20 14:57 UTC (permalink / raw)
  To: Jonas Jensen
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Florian Fainelli,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Ben Hutchings, David Miller,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1390216399-27028-3-git-send-email-jonas.jensen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Mon, Jan 20, 2014 at 5:13 AM, Jonas Jensen <jonas.jensen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> The kernel now has a MDIO bus driver and a phy_driver (RTL8201CP),
> connect to this PHY using OF.
>
> Signed-off-by: Jonas Jensen <jonas.jensen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>
> Notes:
>     Applies to next-20140120
>
>  .../devicetree/bindings/net/moxa,moxart-mac.txt    | 47 ++++++++++-
>  drivers/net/ethernet/moxa/moxart_ether.c           | 92 +++++++++++++++++++++-
>  drivers/net/ethernet/moxa/moxart_ether.h           |  2 +
>  3 files changed, 138 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/net/moxa,moxart-mac.txt b/Documentation/devicetree/bindings/net/moxa,moxart-mac.txt
> index 583418b..94c1f3b 100644
> --- a/Documentation/devicetree/bindings/net/moxa,moxart-mac.txt
> +++ b/Documentation/devicetree/bindings/net/moxa,moxart-mac.txt
> @@ -1,21 +1,64 @@
>  MOXA ART Ethernet Controller
>
> +Integrated MDIO bus node:
> +
> +- compatible: "moxa,moxart-mdio"
> +- Inherits from MDIO bus node binding[1]
> +
> +[1] Documentation/devicetree/bindings/net/phy.txt
> +
> +
> +Ethernet node:
> +
>  Required properties:
>
>  - compatible : Must be "moxa,moxart-mac"
>  - reg : Should contain register location and length
>  - interrupts : Should contain the mac interrupt number
>
> +Optional Properties:
> +
> +- phy-handle : the phandle to a PHY node
> +
> +
>  Example:
>
> +       mdio0: mdio@90900090 {
> +               compatible = "moxa,moxart-mdio";
> +               reg = <0x90900090 0x8>;
> +               #address-cells = <1>;
> +               #size-cells = <0>;
> +
> +               ethphy0: ethernet-phy@1 {
> +                       device_type = "ethernet-phy";

Drop this. device_type is only for real OpenFirmware.

> +                       compatible = "moxa,moxart-rtl8201cp", "ethernet-phy-ieee802.3-c22";
> +                       reg = <1>;
> +               };
> +       };
> +
> +       mdio1: mdio@92000090 {
> +               compatible = "moxa,moxart-mdio";
> +               reg = <0x92000090 0x8>;
> +               #address-cells = <1>;
> +               #size-cells = <0>;
> +
> +               ethphy1: ethernet-phy@1 {
> +                       device_type = "ethernet-phy";
> +                       compatible = "moxa,moxart-rtl8201cp", "ethernet-phy-ieee802.3-c22";
> +                       reg = <1>;
> +               };
> +       };
> +
>         mac0: mac@90900000 {

Not part of this patch, but this should really be ethernet@...

>                 compatible = "moxa,moxart-mac";
> -               reg =   <0x90900000 0x100>;
> +               reg = <0x90900000 0x90>;
>                 interrupts = <25 0>;
> +               phy-handle = <&ethphy0>;
>         };
>
>         mac1: mac@92000000 {
>                 compatible = "moxa,moxart-mac";
> -               reg =   <0x92000000 0x100>;
> +               reg = <0x92000000 0x90>;
>                 interrupts = <27 0>;
> +               phy-handle = <&ethphy1>;
>         };

^ permalink raw reply

* Re: Fwd: [RFC PATCH net-next 0/3] virtio_net: add aRFS support
From: Ben Hutchings @ 2014-01-20 14:36 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Tom Herbert, Zhi Yong Wu, Linux Netdev List, Eric Dumazet,
	David S. Miller, Zhi Yong Wu, Michael S. Tsirkin, Rusty Russell,
	Jason Wang
In-Reply-To: <20140117052229.GE16061@stefanha-thinkpad.redhat.com>

On Fri, 2014-01-17 at 13:22 +0800, Stefan Hajnoczi wrote:
> On Thu, Jan 16, 2014 at 09:12:29AM -0800, Tom Herbert wrote:
> > On Thu, Jan 16, 2014 at 12:52 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
[...]
> > > If it's not possible or too hard to implement aRFS down the entire
> > > stack, we won't be able to process the packet on the right CPU.
> > > Then we might as well not bother with aRFS and just distribute uniformly
> > > across the rx virtqueues.
> > >
> > > Please post an outline of how rx packets will be steered up the stack so
> > > we can discuss whether aRFS can bring any benefit.
> > >
> > 1. The aRFS interface for the guest to specify which virtual queue to
> > receive a packet on is fairly straight forward.
> > 2. To hook into RFS, we need to match the virtual queue to the real
> > CPU it will processed on, and then program the RFS table for that flow
> > and CPU.
> > 3. NIC aRFS keys off the RFS tables so it can program the HW with the
> > correct queue for the CPU.
> 
> There are a lot of details that are not yet worked out:
> 
> If you want to implement aRFS down the vhost_net + macvtap path
> (probably easiest?) how will Step 2 work?  Do the necessary kernel
> interfaces exist to take the flow information in vhost_net, give them to
> macvtap, and finally push them down to the physical NIC?
>
> Not sure if aRFS will work down the full stack with vhost_net + tap +
> bridge.  Any ideas?
[...]

Currently ARFS identifies the flow to be steered by passing an skb from
that flow to the driver, not just a hash.  This is important for the sfc
driver because we're using perfect filters for ARFS and we need to pick
out the relevant header fields instead of the RSS hash.

If you try to set an ARFS filter synchronously from the guest, this
would require a different driver operation, and if the guest only
provides a hash then sfc probably would not be able to support it.

An alternative would be that the hash is used to update the
rps_flow_table for the physical RX queue and then ARFS on the host can
insert a filter after the *next* packet.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH 2/2] net/cxgb4: Don't retrieve stats during recovery
From: Sergei Shtylyov @ 2014-01-20 14:35 UTC (permalink / raw)
  To: Gavin Shan, netdev; +Cc: dm
In-Reply-To: <1390187144-15495-2-git-send-email-shangw@linux.vnet.ibm.com>

Hello.

On 20-01-2014 7:05, Gavin Shan wrote:

> We possiblly retrieve the adapter's statistics during EEH recovery

    Only "possibly".

> and that should be disallowed. Otherwise, it could trigger
> another EEH error, and EEH recovery would eventually fail.
> The patch checks whether the PCI device is off-line before
> retrieving statistics.

> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>   drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   11 +++++++++++
>   1 file changed, 11 insertions(+)
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> index c8eafbf..b0e72fb 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> @@ -4288,6 +4288,17 @@ static struct rtnl_link_stats64 *cxgb_get_stats(struct net_device *dev,
>   	struct port_info *p = netdev_priv(dev);
>   	struct adapter *adapter = p->adapter;
>
> +	/*
> +	 * We possibly retrieve the statistics while the PCI
> +	 * device is off-line. That would cause the recovery
> +	 * on off-lined PCI device going to fail. So it's
> +	 * reasonable to block it during the recovery period.
> +	 */

    The multi-line comment style in the networking code is somewhat special:

/* bla
 * bla
 */

WBR, Sergei
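
To make the style point concrete, here is a small user-space sketch of an early-return guard carrying a comment in the networking style. The types `adapter_model` and `stats_model`, the `pci_offline` flag and the function `get_stats()` are hypothetical stand-ins for illustration; they are not the cxgb4 driver's definitions, and the actual condition used by the patch is not shown in this message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's adapter and statistics types. */
struct adapter_model {
	bool pci_offline;                 /* true while error recovery runs */
	unsigned long long hw_rx_packets; /* pretend hardware counter */
};

struct stats_model {
	unsigned long long rx_packets;
};

struct stats_model *get_stats(const struct adapter_model *adapter,
			      struct stats_model *ns)
{
	/* Do not touch the hardware while the PCI device is
	 * off-line: reading statistics then could trigger
	 * another error and make the recovery fail.
	 */
	if (adapter->pci_offline)
		return ns;

	ns->rx_packets = adapter->hw_rx_packets;
	return ns;
}
```

The comment text starts on the opening line and the closing marker sits on a line of its own, which is the layout shown above.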

^ permalink raw reply

* Re: [PATCH] SUNRPC: Allow one callback request to be received from two sk_buff
From: Sergei Shtylyov @ 2014-01-20 14:27 UTC (permalink / raw)
  To: shaobingqing, trond.myklebust, bfields, davem
  Cc: linux-nfs, netdev, linux-kernel
In-Reply-To: <1390201154-20815-1-git-send-email-shaobingqing@bwstor.com.cn>

Hello.

On 20-01-2014 10:59, shaobingqing wrote:

> In the current code, only one struct rpc_rqst is preallocated. If one
> callback request is received in two sk_buffs, xprt_alloc_bc_request
> is executed twice with the same transport->xid. The first time,
> xprt_alloc_bc_request allocates one struct rpc_rqst and the TCP_RCV_COPY_DATA
> bit of transport->tcp_flags is not cleared. The second time,
> xprt_alloc_bc_request cannot allocate another struct rpc_rqst and returns a
> NULL pointer, after which xprt_force_disconnect occurs. I think one
> callback request should be allowed to be received from two sk_buffs.

> Signed-off-by: shaobingqing <shaobingqing@bwstor.com.cn>
> ---
>   net/sunrpc/xprtsock.c |   11 +++++++++--
>   1 files changed, 9 insertions(+), 2 deletions(-)

> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index ee03d35..606950d 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
[...]
> @@ -1297,7 +1303,8 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
>   		list_add(&req->rq_bc_list, &bc_serv->sv_cb_list);
>   		spin_unlock(&bc_serv->sv_cb_lock);
>   		wake_up(&bc_serv->sv_cb_waitq);
> -	}
> +	} else
> +		req_partial = req;

    {} is needed in the *else* branch since it's already used in another 
branch of *if* -- see Documentation/CodingStyle.

WBR, Sergei
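
The brace rule can be shown with a stand-alone sketch. The types `request_model` and `transport_model` and the function `finish_or_park()` are invented for the example, and modelling `req_partial` as a structure field is an assumption; the real patch context is elided in the quote above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the SUNRPC structures. */
struct request_model {
	int xid;
};

struct transport_model {
	struct request_model *req_partial; /* request still waiting for data */
	int completed;                     /* number of fully received requests */
};

void finish_or_park(struct transport_model *t, struct request_model *req,
		    bool copy_done)
{
	if (copy_done) {
		t->completed++;
		t->req_partial = NULL;
	} else {
		/* Single statement, but braced because the other
		 * branch of the conditional uses braces.
		 */
		t->req_partial = req;
	}
}
```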

^ permalink raw reply

* Re: [PATCH] DT: net: document Ethernet bindings in one place
From: Rob Herring @ 2014-01-20 14:06 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: netdev, Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell,
	Kumar Gala, devicetree@vger.kernel.org, Rob Landley,
	linux-doc@vger.kernel.org
In-Reply-To: <201401180405.07859.sergei.shtylyov@cogentembedded.com>

On Fri, Jan 17, 2014 at 7:05 PM, Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> This patch is an attempt to gather the Ethernet related bindings in one file,
> like it's done in the MMC and some other subsystems. It should save the trouble
> of documenting several properties over and over in each binding document.
>
> I have used the Embedded Power Architecture(TM) Platform Requirements (ePAPR)
> standard as a base for the properties description, also documenting some ad-hoc
> properties that have been introduced over time despite having direct analogs in
> ePAPR.
>
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
>
> ---
> The patch is against DaveM's 'net-next.git' repo and the DaVinci EMAC bindings
> fix I posted yesterday:
>
> http://patchwork.ozlabs.org/patch/311854/
>
>  Documentation/devicetree/bindings/net/allwinner,sun4i-emac.txt    |    5 --
>  Documentation/devicetree/bindings/net/arc_emac.txt                |   12 ------
>  Documentation/devicetree/bindings/net/cavium-mix.txt              |    7 ---
>  Documentation/devicetree/bindings/net/cavium-pip.txt              |    7 ---
>  Documentation/devicetree/bindings/net/cdns-emac.txt               |    5 --
>  Documentation/devicetree/bindings/net/cpsw.txt                    |    3 -
>  Documentation/devicetree/bindings/net/davicom-dm9000.txt          |    2 -
>  Documentation/devicetree/bindings/net/davinci_emac.txt            |    4 --
>  Documentation/devicetree/bindings/net/ethernet.txt                |   20 ++++++++++
>  Documentation/devicetree/bindings/net/fsl-fec.txt                 |    4 --
>  Documentation/devicetree/bindings/net/fsl-tsec-phy.txt            |   11 +----
>  Documentation/devicetree/bindings/net/lpc-eth.txt                 |    4 --
>  Documentation/devicetree/bindings/net/macb.txt                    |    5 --
>  Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt |    4 --
>  Documentation/devicetree/bindings/net/marvell-orion-net.txt       |    3 -
>  Documentation/devicetree/bindings/net/micrel-ks8851.txt           |    3 -
>  Documentation/devicetree/bindings/net/smsc-lan91c111.txt          |    1
>  Documentation/devicetree/bindings/net/smsc911x.txt                |    4 --
>  Documentation/devicetree/bindings/net/stmmac.txt                  |    5 --
>  19 files changed, 25 insertions(+), 84 deletions(-)
>
> Index: net-next/Documentation/devicetree/bindings/net/allwinner,sun4i-emac.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/allwinner,sun4i-emac.txt
> +++ net-next/Documentation/devicetree/bindings/net/allwinner,sun4i-emac.txt
> @@ -4,13 +4,8 @@ Required properties:
>  - compatible: should be "allwinner,sun4i-emac".
>  - reg: address and length of the register set for the device.
>  - interrupts: interrupt for the device
> -- phy: A phandle to a phy node defining the PHY address (as the reg
> -  property, a single integer).
>  - clocks: A phandle to the reference clock for this device
>
> -Optional properties:
> -- (local-)mac-address: mac address to be used by this driver
> -

You should reference that this binding uses the common ethernet binding doc.

>  Example:
>
>  emac: ethernet@01c0b000 {
> Index: net-next/Documentation/devicetree/bindings/net/arc_emac.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/arc_emac.txt
> +++ net-next/Documentation/devicetree/bindings/net/arc_emac.txt
> @@ -6,18 +6,6 @@ Required properties:
>  - interrupts: Should contain the EMAC interrupts
>  - clock-frequency: CPU frequency. It is needed to calculate and set polling
>  period of EMAC.
> -- max-speed: Maximum supported data-rate in Mbit/s. In some HW configurations
> -bandwidth of external memory controller might be a limiting factor. That's why
> -it's required to specify which data-rate is supported on current SoC or FPGA.
> -For example if only 10 Mbit/s is supported (10BASE-T) set "10". If 100 Mbit/s is
> -supported (100BASE-TX) set "100".
> -- phy: PHY device attached to the EMAC via MDIO bus
> -
> -Child nodes of the driver are the individual PHY devices connected to the
> -MDIO bus. They must have a "reg" property given the PHY address on the MDIO bus.
> -
> -Optional properties:
> -- mac-address: 6 bytes, mac address
>
>  Examples:
>
> Index: net-next/Documentation/devicetree/bindings/net/cavium-mix.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/cavium-mix.txt
> +++ net-next/Documentation/devicetree/bindings/net/cavium-mix.txt
> @@ -18,13 +18,6 @@ Properties:
>  - interrupts: Two interrupt specifiers.  The first is the MIX
>    interrupt routing and the second the routing for the AGL interrupts.
>
> -- mac-address: Optional, the MAC address to assign to the device.
> -
> -- local-mac-address: Optional, the MAC address to assign to the device
> -  if mac-address is not specified.
> -
> -- phy-handle: Optional, a phandle for the PHY device connected to this device.
> -
>  Example:
>         ethernet@1070000100800 {
>                 compatible = "cavium,octeon-5750-mix";
> Index: net-next/Documentation/devicetree/bindings/net/cavium-pip.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/cavium-pip.txt
> +++ net-next/Documentation/devicetree/bindings/net/cavium-pip.txt
> @@ -35,13 +35,6 @@ Properties for PIP port which is a child
>
>  - reg: The port number within the interface group.
>
> -- mac-address: Optional, the MAC address to assign to the device.
> -
> -- local-mac-address: Optional, the MAC address to assign to the device
> -  if mac-address is not specified.
> -
> -- phy-handle: Optional, a phandle for the PHY device connected to this device.
> -
>  Example:
>
>         pip@11800a0000000 {
> Index: net-next/Documentation/devicetree/bindings/net/cdns-emac.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/cdns-emac.txt
> +++ net-next/Documentation/devicetree/bindings/net/cdns-emac.txt
> @@ -6,11 +6,6 @@ Required properties:
>    or the generic form: "cdns,emac".
>  - reg: Address and length of the register set for the device
>  - interrupts: Should contain macb interrupt
> -- phy-mode: String, operation mode of the PHY interface.
> -  Supported values are: "mii", "rmii".
> -
> -Optional properties:
> -- local-mac-address: 6 bytes, mac address
>
>  Examples:
>
> Index: net-next/Documentation/devicetree/bindings/net/cpsw.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/cpsw.txt
> +++ net-next/Documentation/devicetree/bindings/net/cpsw.txt
> @@ -28,9 +28,6 @@ Optional properties:
>  Slave Properties:
>  Required properties:
>  - phy_id               : Specifies slave phy id
> -- phy-mode             : The interface between the SoC and the PHY (a string
> -                         that of_get_phy_mode() can understand)
> -- mac-address          : Specifies slave MAC address
>
>  Optional properties:
>  - dual_emac_res_vlan   : Specifies VID to be used to segregate the ports
> Index: net-next/Documentation/devicetree/bindings/net/davicom-dm9000.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/davicom-dm9000.txt
> +++ net-next/Documentation/devicetree/bindings/net/davicom-dm9000.txt
> @@ -9,8 +9,6 @@ Required properties:
>  - interrupts : interrupt specifier specific to interrupt controller
>
>  Optional properties:
> -- local-mac-address : A bytestring of 6 bytes specifying Ethernet MAC address
> -    to use (from firmware or bootloader)
>  - davicom,no-eeprom : Configuration EEPROM is not available
>  - davicom,ext-phy : Use external PHY
>
> Index: net-next/Documentation/devicetree/bindings/net/davinci_emac.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/davinci_emac.txt
> +++ net-next/Documentation/devicetree/bindings/net/davinci_emac.txt
> @@ -19,9 +19,7 @@ Required properties:
>                           Miscellaneous Interrupt>
>
>  Optional properties:
> -- phy-handle: Contains a phandle to an Ethernet PHY.
> -              If absent, davinci_emac driver defaults to 100/FULL.
> -- local-mac-address : 6 bytes, mac address
> +- phy-handle: If absent, davinci_emac driver defaults to 100/FULL.
>
>  Example (enbw_cmc board):
>         eth0: emac@1e20000 {
> Index: net-next/Documentation/devicetree/bindings/net/ethernet.txt
> ===================================================================
> --- /dev/null
> +++ net-next/Documentation/devicetree/bindings/net/ethernet.txt
> @@ -0,0 +1,20 @@
> +The following properties are common to the Ethernet controllers:
> +
> +- local-mac-address: array of 6 bytes, specifies the MAC address that was
> +  assigned to the network device;
> +- mac-address: array of 6 bytes, specifies the MAC address that was last used by
> +  the boot program; should be used in cases where the MAC address assigned to
> +  the device by the boot program is different from the "local-mac-address"
> +  property;
> +- max-speed: number, specifies maximum speed in Mbit/s supported by the device;
> +- phy-mode: string, operation mode of the PHY interface; supported values are
> +  "mii", "gmii", "sgmii", "tbi", "rev-mii", "rmii", "rgmii", "rgmii-id",
> +  "rgmii-rxid", "rgmii-txid", "rtbi", "smii", "xgmii";

Mark this as deprecated in favor of phy-connection-type so its use
does not spread.

> +- phy-connection-type: the same as "phy-mode" property (but described in ePAPR);
> +- phy-handle: phandle, specifies a reference to a node representing a PHY
> +  device (this property is described in ePAPR);
> +- phy: the same as "phy-handle" property (but actually ad-hoc one).

Mark this as deprecated in favor of phy-handle.

> +
> +Child nodes of the Ethernet controller are typically the individual PHY devices
> +connected via the MDIO bus (sometimes the MDIO bus controller is separate).
> +They are described in the phy.txt file in this same directory.
> Index: net-next/Documentation/devicetree/bindings/net/fsl-fec.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/fsl-fec.txt
> +++ net-next/Documentation/devicetree/bindings/net/fsl-fec.txt
> @@ -4,12 +4,8 @@ Required properties:
>  - compatible : Should be "fsl,<soc>-fec"
>  - reg : Address and length of the register set for the device
>  - interrupts : Should contain fec interrupt
> -- phy-mode : String, operation mode of the PHY interface.
> -  Supported values are: "mii", "gmii", "sgmii", "tbi", "rmii",
> -  "rgmii", "rgmii-id", "rgmii-rxid", "rgmii-txid", "rtbi", "smii".
>
>  Optional properties:
> -- local-mac-address : 6 bytes, mac address
>  - phy-reset-gpios : Should specify the gpio for phy reset
>  - phy-reset-duration : Reset duration in milliseconds.  Should present
>    only if property "phy-reset-gpios" is available.  Missing the property
> Index: net-next/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
> +++ net-next/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
> @@ -38,22 +38,15 @@ Properties:
>    - model : Model of the device.  Can be "TSEC", "eTSEC", or "FEC"
>    - compatible : Should be "gianfar"
>    - reg : Offset and length of the register set for the device
> -  - local-mac-address : List of bytes representing the ethernet address of
> -    this controller
>    - interrupts : For FEC devices, the first interrupt is the device's
>      interrupt.  For TSEC and eTSEC devices, the first interrupt is
>      transmit, the second is receive, and the third is error.
> -  - phy-handle : The phandle for the PHY connected to this ethernet
> -    controller.
>    - fixed-link : <a b c d e> where a is emulated phy id - choose any,
>      but unique to the all specified fixed-links, b is duplex - 0 half,
>      1 full, c is link speed - d#10/d#100/d#1000, d is pause - 0 no
>      pause, 1 pause, e is asym_pause - 0 no asym_pause, 1 asym_pause.
> -  - phy-connection-type : a string naming the controller/PHY interface type,
> -    i.e., "mii" (default), "rmii", "gmii", "rgmii", "rgmii-id", "sgmii",
> -    "tbi", or "rtbi".  This property is only really needed if the connection
> -    is of type "rgmii-id", as all other connection types are detected by
> -    hardware.
> +  - phy-connection-type : only really needed if the connection is of type
> +    "rgmii-id", as all other connection types are detected by hardware.
>    - fsl,magic-packet : If present, indicates that the hardware supports
>      waking up via magic packet.
>    - bd-stash : If present, indicates that the hardware supports stashing
> Index: net-next/Documentation/devicetree/bindings/net/lpc-eth.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/lpc-eth.txt
> +++ net-next/Documentation/devicetree/bindings/net/lpc-eth.txt
> @@ -6,10 +6,8 @@ Required properties:
>  - interrupts: Should contain ethernet controller interrupt
>
>  Optional properties:
> -- phy-mode: String, operation mode of the PHY interface.
> -  Supported values are: "mii", "rmii" (default)
> +- phy-mode: if absent, "rmii" is assumed.
>  - use-iram: Use LPC32xx internal SRAM (IRAM) for DMA buffering
> -- local-mac-address : 6 bytes, mac address
>
>  Example:
>
> Index: net-next/Documentation/devicetree/bindings/net/macb.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/macb.txt
> +++ net-next/Documentation/devicetree/bindings/net/macb.txt
> @@ -8,11 +8,6 @@ Required properties:
>    the Cadence GEM, or the generic form: "cdns,gem".
>  - reg: Address and length of the register set for the device
>  - interrupts: Should contain macb interrupt
> -- phy-mode: String, operation mode of the PHY interface.
> -  Supported values are: "mii", "rmii", "gmii", "rgmii".
> -
> -Optional properties:
> -- local-mac-address: 6 bytes, mac address
>
>  Examples:
>
> Index: net-next/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
> +++ net-next/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
> @@ -4,10 +4,6 @@ Required properties:
>  - compatible: should be "marvell,armada-370-neta".
>  - reg: address and length of the register set for the device.
>  - interrupts: interrupt for the device
> -- phy: A phandle to a phy node defining the PHY address (as the reg
> -  property, a single integer).
> -- phy-mode: The interface between the SoC and the PHY (a string that
> -  of_get_phy_mode() can understand)
>  - clocks: a pointer to the reference clock for this device.
>
>  Example:
> Index: net-next/Documentation/devicetree/bindings/net/marvell-orion-net.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/marvell-orion-net.txt
> +++ net-next/Documentation/devicetree/bindings/net/marvell-orion-net.txt
> @@ -37,7 +37,6 @@ Required port properties:
>        "marvell,kirkwood-eth-port".
>   - reg: port number relative to ethernet controller, shall be 0, 1, or 2.
>   - interrupts: port interrupt.
> - - local-mac-address: 6 bytes MAC address.
>
>  Optional port properties:
>   - marvell,tx-queue-size: size of the transmit ring buffer.
> @@ -49,7 +48,7 @@ Optional port properties:
>
>  and
>
> - - phy-handle: phandle reference to ethernet PHY.
> + - phy-handle: if a PHY is connected.
>
>  or
>
> Index: net-next/Documentation/devicetree/bindings/net/micrel-ks8851.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/micrel-ks8851.txt
> +++ net-next/Documentation/devicetree/bindings/net/micrel-ks8851.txt
> @@ -4,6 +4,3 @@ Required properties:
>  - compatible = "micrel,ks8851-ml" of parallel interface
>  - reg : 2 physical address and size of registers for data and command
>  - interrupts : interrupt connection
> -
> -Optional properties:
> -- local-mac-address : Ethernet mac address to use
> Index: net-next/Documentation/devicetree/bindings/net/smsc-lan91c111.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/smsc-lan91c111.txt
> +++ net-next/Documentation/devicetree/bindings/net/smsc-lan91c111.txt
> @@ -7,7 +7,6 @@ Required properties:
>
>  Optional properties:
>  - phy-device : phandle to Ethernet phy
> -- local-mac-address : Ethernet mac address to use
>  - reg-io-width : Mask of sizes (in bytes) of the IO accesses that
>    are supported on the device.  Valid value for SMSC LAN91c111 are
>    1, 2 or 4.  If it's omitted or invalid, the size would be 2 meaning
> Index: net-next/Documentation/devicetree/bindings/net/smsc911x.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/smsc911x.txt
> +++ net-next/Documentation/devicetree/bindings/net/smsc911x.txt
> @@ -6,9 +6,6 @@ Required properties:
>  - interrupts : Should contain SMSC LAN interrupt line
>  - interrupt-parent : Should be the phandle for the interrupt controller
>    that services interrupts for this device
> -- phy-mode : String, operation mode of the PHY interface.
> -  Supported values are: "mii", "gmii", "sgmii", "tbi", "rmii",
> -  "rgmii", "rgmii-id", "rgmii-rxid", "rgmii-txid", "rtbi", "smii".
>
>  Optional properties:
>  - reg-shift : Specify the quantity to shift the register offsets by
> @@ -23,7 +20,6 @@ Optional properties:
>    external PHY
>  - smsc,save-mac-address : Indicates that mac address needs to be saved
>    before resetting the controller
> -- local-mac-address : 6 bytes, mac address
>
>  Examples:
>
> Index: net-next/Documentation/devicetree/bindings/net/stmmac.txt
> ===================================================================
> --- net-next.orig/Documentation/devicetree/bindings/net/stmmac.txt
> +++ net-next/Documentation/devicetree/bindings/net/stmmac.txt
> @@ -10,8 +10,6 @@ Required properties:
>  - interrupt-names: Should contain the interrupt names "macirq"
>    "eth_wake_irq" if this interrupt is supported in the "interrupts"
>    property
> -- phy-mode: String, operation mode of the PHY interface.
> -  Supported values are: "mii", "rmii", "gmii", "rgmii".
>  - snps,phy-addr                phy address to connect to.
>  - snps,reset-gpio      gpio number for phy reset.
>  - snps,reset-active-low boolean flag to indicate if phy reset is active low.
> @@ -28,9 +26,6 @@ Required properties:
>                                 mode for both tx and rx. This flag is
>                                 ignored if force_thresh_dma_mode is set.
>
> -Optional properties:
> -- mac-address: 6 bytes, mac address
> -
>  Examples:
>
>         gmac0: ethernet@e0800000 {

^ permalink raw reply

* Re: [PATCH 2/4 ethtool] ethtool: Support for configurable RSS hash key.
From: Ben Hutchings @ 2014-01-20 13:43 UTC (permalink / raw)
  To: Venkata Duvvuru; +Cc: netdev@vger.kernel.org
In-Reply-To: <BF3270C86E8B1349A26C34E4EC1C44CB2C84C38B@CMEXMB1.ad.emulex.com>

[-- Attachment #1: Type: text/plain, Size: 1436 bytes --]

On Mon, 2014-01-20 at 13:28 +0000, Venkata Duvvuru wrote:
> Ben, Please ignore my previous reply. My reply options were screwed up in that.
> 
> > -----Original Message-----
> > From: Ben Hutchings [mailto:ben@decadent.org.uk]
> > Sent: Monday, January 20, 2014 12:06 AM
> > To: Venkata Duvvuru
> > Cc: netdev@vger.kernel.org
> > Subject: Re: [PATCH 2/4 ethtool] ethtool: Support for configurable RSS hash
> > key.
> > 
> > On Fri, 2014-01-17 at 13:02 +0000, Venkata Duvvuru wrote:
> > > This ethtool patch will primarily implement the parser for the options
> > provided by the user for set and get hashkey before invoking the ioctl.
> > > This patch also has Ethtool man page changes which describes the Usage of
> > set and get hashkey options.
> > 
> > I'd prefer to have this combined with the -x/-X options (and add new long
> > options to reflect that they cover the key as well).
> 
> If we add hashkey options to the existing -x/-X (--show-rxfh-indir/--set-rxfh-indir), I think that won't be appropriate, going by the command names.
> We could change the command name to something like --show-rssconfig/--rss-config, but would that be backward compatible?
[...]

That's why I said 'add new long options'.  The ethtool argument parser
allows arbitrarily many aliases for each sub-command.

Ben.

-- 
Ben Hutchings
One of the nice things about standards is that there are so many of them.


^ permalink raw reply

* RE: [PATCH 2/4 ethtool] ethtool: Support for configurable RSS hash key.
From: Venkata Duvvuru @ 2014-01-20 13:28 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev@vger.kernel.org
In-Reply-To: <1390156549.16433.119.camel@deadeye.wl.decadent.org.uk>

Ben, Please ignore my previous reply. My reply options were screwed up in that.

> -----Original Message-----
> From: Ben Hutchings [mailto:ben@decadent.org.uk]
> Sent: Monday, January 20, 2014 12:06 AM
> To: Venkata Duvvuru
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH 2/4 ethtool] ethtool: Support for configurable RSS hash
> key.
> 
> On Fri, 2014-01-17 at 13:02 +0000, Venkata Duvvuru wrote:
> > This ethtool patch will primarily implement the parser for the options
> provided by the user for set and get hashkey before invoking the ioctl.
> > This patch also has Ethtool man page changes which describes the Usage of
> set and get hashkey options.
> 
> I'd prefer to have this combined with the -x/-X options (and add new long
> options to reflect that they cover the key as well).

If we add hashkey options to the existing -x/-X (--show-rxfh-indir/--set-rxfh-indir), I think that won't be appropriate, going by the command names.
We could change the command name to something like --show-rssconfig/--rss-config, but would that be backward compatible?

> 
> [...]
> > diff --git a/ethtool.c b/ethtool.c
> > index b06dfa3..4b05b0c 100644
> > --- a/ethtool.c
> > +++ b/ethtool.c
> > @@ -471,6 +471,59 @@ static int rxflow_str_to_type(const char *str)
> >  	return flow_type;
> >  }
> >
> > +static inline int is_hkey_char_valid(const char rss_hkey_string) {
> 
> A char is not a string.
> 
> > +	/* Are there any invalid characters in the string */
> > +	return ((rss_hkey_string >= '0' && rss_hkey_string <= '9') ||
> > +	       (rss_hkey_string >= 'a' && rss_hkey_string <= 'f') ||
> > +	       (rss_hkey_string >= 'A' && rss_hkey_string <= 'F')); }
> 
> Braces are in the wrong places.  And the whole function is redundant with
> isxdigit() anyway.
> 
> > +static int convert_string_to_hashkey(struct ethtool_rss_hkey *rss_hkey,
> > +				      const char *rss_hkey_string) {
> > +	int i = 0;
> > +	int hex_byte;
> > +
> > +	do {
> > +		if (i > (RSS_HASH_KEY_LEN - 1)) {
> 
> Comparing with the wrong limit.
> 
> [...]
> > +static int get_hashkey(struct cmd_context *ctx) {
> 
> Brace in the wrong place.
> 
> [...]
> > +	for (i = 0; i < RSS_HASH_KEY_LEN; i++) {
> > +		if (i == (RSS_HASH_KEY_LEN - 1))
> 
> Wrong length.
> 
> > +			printf("%02x\n", rss_hkey->data[i]);
> > +		else
> > +			printf("%02x:", rss_hkey->data[i]);
> > +	}
> > +
> > +done:
> > +	free(rss_hkey);
> > +	return rc;
> > +}
> [...]
> 
> --
> Ben Hutchings
> friends: People who know you well, but like you anyway.
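
The review comments quoted above (use isxdigit() rather than a hand-rolled range check, and compare against the right limit) could be addressed along the following lines. This is only a sketch: parse_hkey(), hex_nibble() and the colon-separated input format are assumptions made for illustration, not code from the patch, and the key length is passed in by the caller rather than taken from RSS_HASH_KEY_LEN.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert one hex digit to its value; the caller has already
 * validated the character with isxdigit(). */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower((unsigned char)c) - 'a' + 10;
}

/*
 * Hypothetical helper: parse "xx:xx:...:xx" into exactly len bytes.
 * Returns 0 on success, -1 if the string is too short, too long,
 * or contains anything other than hex digit pairs separated by ':'.
 */
static int parse_hkey(const char *s, unsigned char *key, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		/* Short-circuit keeps us from reading past the terminator. */
		if (!isxdigit((unsigned char)s[0]) ||
		    !isxdigit((unsigned char)s[1]))
			return -1;
		key[i] = (unsigned char)(hex_nibble(s[0]) << 4 |
					 hex_nibble(s[1]));
		s += 2;
		if (i + 1 < len) {
			if (*s != ':')
				return -1;
			s++;
		}
	}
	/* Reject trailing input so over-long keys are not silently cut. */
	return *s == '\0' ? 0 : -1;
}
```

The loop bound is the number of bytes in the key, so the comparison never needs a "- 1" adjustment, and both under-length and over-length strings are rejected.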

^ permalink raw reply

* Re: [PATCH net-next 3/4] ethtool: Support for configurable RSS hash key.
From: Ben Hutchings @ 2014-01-20 13:20 UTC (permalink / raw)
  To: Venkata Duvvuru; +Cc: netdev@vger.kernel.org
In-Reply-To: <BF3270C86E8B1349A26C34E4EC1C44CB2C84C327@CMEXMB1.ad.emulex.com>

[-- Attachment #1: Type: text/plain, Size: 1644 bytes --]

On Mon, 2014-01-20 at 12:23 +0000, Venkata Duvvuru wrote:
[...]
> > > +/* RSS Hash key */
> > > +struct ethtool_rss_hkey {
> > > +	__u32   cmd;            /* ETHTOOL_SET/GET_RSS_HKEY */
> > > +	__u8    data[RSS_HASH_KEY_LEN];
> > > +	__u32	data_len;
> > > +};
> > [...]
> > 
> > How about putting data after the data_len and giving it a length of 0, so this is
> > extensible to an arbitrary length key?
> > 
> > If we're extending the RSS configuration interface, there are a few other
> > things that might be worth doing at the same time:
> > 
> > - Single commands to get/set both the key and the indirection table at the
> > same time
> > - Add a field to distinguish multiple RSS contexts (some hardware can use RSS
> > contexts together with filters, though RX NFC does not support that yet)
> Are you referring to the filter-id that is created at the time of config-nfc? Pls clarify.

No, what I mean is:

1. An RX flow steering filter can specify use of RSS, in which case the
value looked up in the indirection is added to the queue number
specified in the filter.  This is not yet controllable through RX NFC
though there is room for extension there.

2. Multi-function controllers need multiple RSS contexts (key +
indirection table) to support independent use of RSS on each function.
But it may also be possible to allocate multiple contexts to a single
function.  This could be useful in conjunction with 1.  But there would
need to be a way to allocate and configure extra contexts first.
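
For reference, the layout suggested in the quoted text earlier in this message (data placed after data_len, with no fixed length) might look roughly like this in userspace C. The struct and function names are hypothetical and this is not the ABI that was eventually proposed; it uses a C99 flexible array member where kernel headers of the time would typically write data[0].

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout for illustration only. */
struct example_rss_hkey {
	uint32_t cmd;		/* command number */
	uint32_t data_len;	/* number of key bytes that follow */
	uint8_t  data[];	/* variable-length key, sized at allocation */
};

/* Allocate a request large enough for a key of len bytes and copy it in. */
static struct example_rss_hkey *example_hkey_alloc(uint32_t cmd,
						    const uint8_t *key,
						    uint32_t len)
{
	struct example_rss_hkey *h = calloc(1, sizeof(*h) + len);

	if (!h)
		return NULL;
	h->cmd = cmd;
	h->data_len = len;
	if (len)
		memcpy(h->data, key, len);
	return h;
}
```

Because the length travels with the request, the same structure can carry keys of different sizes without changing the header.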

Ben.

-- 
Ben Hutchings
One of the nice things about standards is that there are so many of them.


^ permalink raw reply

* [ANNOUNCE]: Release of nftables 0.099
From: Patrick McHardy @ 2014-01-20 13:11 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netfilter, announce, netdev, coreteam

[-- Attachment #1: Type: text/plain, Size: 6730 bytes --]

The Netfilter project presents:

        nftables 0.099

With the release of Linux 3.13 and almost 5 years after the last nftables
release, the time has come to finally get this code out to our users.

Since this is the first regular release intended for users, I'm including
a bit of extra information.

Overview
========

nf_tables is the new firewalling infrastructure in the Linux kernel,
intended to replace ip_tables, ip6_tables, arp_tables and ebtables
in the long term. nftables is the corresponding userspace frontend,
replacing their respective userspace utilities.

nftables features native support for sets and dictionaries of arbitrary
types, support for many different protocols, meta data types, connection
tracking, NAT, logging, atomic incremental and full ruleset updates,
a netlink API with notification support, a formal grammar, a compatibility
layer for iptables/ip6tables and more.

While the internal architecture is fundamentally different from
ip_tables etc, many of the well proven concepts like tables and chains
have been retained. The syntax differs significantly from iptables
and friends, most notably, the options style parsing has been replaced
by a formal grammar and a set of keywords. For anyone familiar with
BPF the syntax should be quite easy to learn.

Architecture
============

As mentioned previously, the architecture differs significantly from
the existing packet filtering mechanisms. While ip_tables etc. include
special modules for each and every protocol they support, for each meta
data type etc., and each of these modules implements a set of usually
similar operations on this data, nftables contains a small evaluation
engine (sometimes called a virtual machine) with extensions to support
getting packet payload data, meta data, ... and performing operations
with this data, altering flow control and so on.

The userspace frontend performs parsing of the ruleset and compiles it
into instructions for the virtual machine. For instance, while an iptables tcp
dport match would instruct the xt_tcpudp module to compare the TCP port
number, nftables userspace emits instructions to load 2 bytes at the
position transport header + 2 into a so-called register and a second
instruction to compare that register to a given value. IOW, the kernel
doesn't require knowledge of particular protocols, support for them
can in most cases be added completely in the nftables frontend.
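
To make the idea concrete, here is a toy model of that two-instruction program: one instruction loads 2 bytes at offset 2 of the TCP header (where the destination port lives) into a register, the next compares the register to a constant. This is a sketch only; the toy_* names, the single 16-bit register and the instruction encoding are invented for illustration and do not reflect the real nf_tables kernel API or bytecode format.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_op { TOY_PAYLOAD_LOAD, TOY_CMP_EQ };

struct toy_insn {
	enum toy_op op;
	size_t offset;	/* TOY_PAYLOAD_LOAD: byte offset into the header */
	uint16_t value;	/* TOY_CMP_EQ: value the register must equal */
};

/*
 * Run the program against a transport header of th_len bytes.
 * Returns 1 if every instruction succeeds (the rule matches), 0 otherwise.
 */
static int toy_eval(const struct toy_insn *prog, size_t n,
		    const uint8_t *th, size_t th_len)
{
	uint16_t reg = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		switch (prog[i].op) {
		case TOY_PAYLOAD_LOAD:
			/* Refuse loads that would run past the header. */
			if (th_len < 2 || prog[i].offset > th_len - 2)
				return 0;
			/* Ports are big-endian on the wire. */
			reg = (uint16_t)(th[prog[i].offset] << 8 |
					 th[prog[i].offset + 1]);
			break;
		case TOY_CMP_EQ:
			if (reg != prog[i].value)
				return 0;
			break;
		}
	}
	return 1;
}
```

Note that nothing in toy_eval() knows about TCP: the offset and the constant come entirely from the program, which is the property the announcement describes.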

Data gathered from the packet (or elsewhere) can not only be used for
matches (called relational expressions in nftables), but for dynamically
parameterizing other extensions. For instance, the following expression would
select the DNAT destination address based on the source address of the
packet:

... dnat ip saddr map {
	192.168.0.0/24 : 10.0.0.1,
	192.168.1.0/24 : 10.0.0.2,
	* : 10.0.0.3 
    }

while the following expression would store the input interface index
in the upper 8 bits of the packet mark to be used in the POSTROUTING 
hook where it is not available anymore:

... mark set iif

Similar to ip_tables, rules are organized in address family specific
tables and chains. The kernel doesn't include any pre-defined tables
anymore, they can be created at will from userspace. Special features
of tables like the NAT table and mangle table are available as
so-called "chain types", which instruct nftables to perform operations
like setting up NAT mappings or rerouting packets after remarking.
A set of predefined tables corresponding to the tables existing in
ip_tables etc is contained in nftables.

Dictionaries, as shown in the previous dnat example, can not only
be used for parameterizing different extensions, but also to alter
control flow, making it possible to build match trees with efficient branching:

... iif vmap {
	eth0 : jump from_lan,
	eth1 : jump from_dmz,
	eth2 : jump from_wan,
	*    : drop,
    }

Status
======

There are still a few rough edges, but we believe the code is ready
to be used for testing and personal usage. It is not ready for
production use, but we should be getting there quickly. Userspace
may occasionally produce an unexpected error for uncommon cases,
the kernel side is expected to be pretty much solid. Any bugs
reported will be fixed quickly.

While trying to avoid it when possible, until the 0.1 release we may
still change the grammar or other things in incompatible ways. This
should result in only small impact though, most of the grammar is
expected to stay as it is.

Naming
======

nftables releases have names. The last release v0.01-alpha1 was named
schäublefilter, honoring the Minister of the Interior of Germany,
Wolfgang Schäuble, and his attempts to introduce legislation to allow
the state to crack computers.

Owing to the fact that his term ended more than four years ago and that,
in retrospect, his attempts really seem only alpha, the new release
is named keith-alexander-filter, in celebration of not being backdoored
by the NSA so far.

Resources
=========

The nftables code can be obtained from:

* http://netfilter.org/projects/nftables/downloads.html
* ftp://ftp.netfilter.org/pub/nftables
* git://git.netfilter.org/nftables

To build the code, libnftnl and libmnl are required:

* http://netfilter.org/projects/libnftnl/index.html
* http://netfilter.org/projects/libmnl/index.html

The iptables compatibility layer is available at:

* git://git.netfilter.org/iptables-nftables

The code should appear on the website and FTP shortly.

Further reading
===============

While documentation is still scarce at the moment, the next release
will include a full command reference and further documentation.

The project page on netfilter.org contains some further pointers:

  http://netfilter.org/projects/nftables/index.html

Eric Leblond has written a short howto:

  https://home.regit.org/netfilter-en/nftables-quick-howto/

and has given a presentation on nftables:

  https://home.regit.org/wp-content/uploads/2013/09/2013_kernel_recipes_nftables.pdf

My first presentation on nftables during NFWS 2008 in Paris:

  http://people.netfilter.org/kaber/nfws2008/nftables.odp

And there's a Wiki-page with some further information on the basic
building blocks, the syntax ...:

  http://people.netfilter.org/wiki-nftables/index.php/Main_Page

Thanks
======

A lot of people have started contributing to nftables during the past
1.5 years and helped to get both the kernel and userspace components in
shape for merging and release. Pablo revived the project after I stopped
working on it for quite a while, Eric Leblond, Tomasz Bursztyka, Arturo
Borrero, Alvaro Neira and Giuseppe Longo all made important contributions
to nftables and the surrounding infrastructure.


On behalf of the Netfilter Core Team,
Happy bytecode execution :)



[-- Attachment #2: changes.txt --]
[-- Type: text/plain, Size: 11351 bytes --]

Ana Rey (1):
      nft: scanner: fixed problem with ipv6 address

Arturo Borrero (2):
      nftables: delete debian/ directory
      mnl: fix inconsistent name usage in nft_*_nlmsg_build_hdr calls

Arturo Borrero Gonzalez (2):
      src: fix return code
      files: replace interpreter during installation

Eric Leblond (23):
      rule: add flag to display rule handle as comment
      doc: fix inversion of operator and object.
      rule: list elements in set in any case
      cli: add quit command
      cli: reset terminal when CTRL+d is pressed
      rule: display hook info
      src: fix counter restoration
      src: Add support for insertion inside rule list
      src: Add icmpv6 support
      nat: add mandatory family attribute
      Suppress non working examples.
      Update chain creation format.
      display family in table listing
      netlink: fix IPv6 prefix computation
      src: Add support for IPv6 NAT
      mnl: fix typo in comment
      netlink: suppress useless variable
      netlink: only flush asked table/chain
      netlink: fix nft flush operation
      expression: fix indent
      jump: fix logic in netlink linearize
      verdict: fix delinearize in case of jump
      netlink: only display wanted chain in listing

Florian Westphal (3):
      log: s/threshold/queue-threshold/
      meta: iif/oifname should be host byte order
      statement: avoid huge rodata array

Kevin Fenzi (1):
      nftables: drop hard coded install using root user owner and group

Pablo Neira (1):
      expression: fix output of verdict maps

Pablo Neira Ayuso (63):
      tests: fix test, commands now comes before the family and table name
      rule: allow to list of existing tables
      rule: fix nft list chain
      netlink: return error if chain not found
      main: fix error checking in nft_parse
      tests: family-ipv4: update test to use current syntax
      tests: expr-ct: update examples to use the current syntax
      src: fix crash if nft -f wrong_file is passed
      tests: family-ipv6: update to use the current syntax
      payload: accept ethertype in hexadecimal
      tests: family-bridge: update to use the current syntax
      tests: feat-adjancent-load-merging: remove ip protocol from rule
      meta: accept uid/gid in numerical
      tests: expr-meta: update examples to use the current syntax
      tests: obj-chain: update examples to use the current syntax
      tests: dictionary: update examples to use the current syntax
      tests: set: update examples to use the current syntax
      tests: obj-table: update examples to use the current syntax
      cli: complete basic functionality of the interactive mode
      datatype: concat expression only releases dynamically allocated datatype
      evaluate: fix range and comparison evaluation
      src: get it sync with current include/linux/netfilter/nf_tables.h
      rule: family field in struct handle is unsigned
      meta: use if_nametoindex and if_indextoname
      meta: replace rtnl_tc_handle2str and rtnl_tc_str2handle
      src: use libnftables
      netlink: fix network address prefix
      datatype: fix table listing if name resolution is not available
      mnl: use nft_*_list_add_tail
      datatype: fix crash if wrong integer type is passed
      log: convert group and qthreshold to use u16
      datatype: fix wrong endianess in numeric ports
      src: allow to specify the base chain type
      meta: fix output display of meta length
      datatype: fix mark parsing if string is used
      payload: fix endianess of ARP operation code
      netlink: use uint32_t instead of size_t for attribute length
      src: add rule batching support
      netlink_linearize: finish reject support
      payload: fix ethernet type protocol matching
      parser: fix warning on deprecated directive in bison
      build: relax compilation not to break on warning
      datatype: fix missing nul-terminated string in string_type_print
      netlink: improve rule deletion per chain
      meta: fix endianness in UID/GID
      meta: relax restriction on UID/GID parsing
      src: fix rule flushing atomically
      mnl: don't set NLM_F_ACK flag in mnl_nft_rule_batch_[add|del]
      mnl: print netlink message if if --debug=netlink in mnl_talk()
      netlink: fix dictionary feature with data mappings
      netlink: fix wrong type in attributes
      scanner: rename address selector from 'eth' to 'ether'
      scanner: add aliases to symbols for easier interaction with most shells
      segtree: add new segtree debugging option
      netlink: use stdout for debugging
      parser: fix parsing of ethernet protocol types
      payload: fix crash when wrong ethernet protocol type is used
      payload: fix inconsistency in ethertype output
      src: add new --debug=mnl option to enable libmnl debugging
      src: use ':' instead of '=>' in dictionaries
      datatype: add time type parser and adapt output
      mnl: fix chain type autoloading
      use new libnftnl library name

Patrick McHardy (96):
      build: work around docbook2x-man inability to specify output file
      templates: add IPv6 raw table template
      netlink: wrap libnl object dumping in #ifdef DEBUG
      lexer: fix some whitespace errors
      Fix use of reserved names in header sandwich
      kill obsolete TODO item
      Allow newlines in sets and maps
      Allow newlines in regular maps
      build: remove double subdir in build output
      build: fix installation when docs are not built
      Add installation instructions
      parser: fix common_block usage in chain and table blocks
      parser: consistently use $@ for location of entire grouping
      Add support for scoping and symbol binding
      Add support for user-defined symbolic constants
      Add more notes to INSTALL
      expr: add support for cloning expressions
      Fix multiple references to the same user defined symbolic expression
      Release scopes during cleanup
      Fix some memory leaks
      netlink_linearize: remove two debugging printfs
      ct: resync netlink header and properly add ct l3protocol support
      netlink: add helper function for socket callback modification
      netlink: consistent naming fixes
      netlink: use libnl OBJ_CAST macro
      netlink: move data related functions to netlink.c
      datatype: maintain table of all datatypes and add registration/lookup function
      datatype: add/move size and byte order information into data types
      expressions: kill seperate sym_type datatype for symbols
      add support for new set API and standalone sets
      debug: allow runtime control of debugging output
      netlink: fix bitmask element reconstruction
      netlink: dump all chains when listing rules
      netlink: fix binop RHS byteorder
      payload: add DCCP packet type definitions
      payload: fix two datatypes
      parser: support bison >= 2.4
      build: add 'archive' target
      build: fix endless recursion with SUBDIRS=...
      debug: properly parse debug levels
      netlink: fix byteorder of RHS of relational meta expression
      utils: fix invalid assertion in xrealloc()
      netlink: fix creation of base chains with hooknum and priority 0
      payload: fix crash with uncombinable protocols
      netlink: fix nat stmt linearization/parsing
      nat: validate protocol context when performing transport protocol mappings
      netlink: add debugging for missing objects
      don't use internal_location for files specified on command line
      datatype: reject incompletely parsed integers in integer_type_parse()
      add bridge filter table definitions
      parser: fix parsing protocol names for protocols which are also keywords
      evaluate: reintroduce type chekcs for relational expressions
      segtree: fix segtree to properly support mappings
      tests: add verdict map test
      seqtree: update mapping data when keeping the base
      payload: kill redundant payload protocol expressions during netlink postprocessing
      expression: fix constant expression splicing
      rules: change rule handle to 64 bit
      netlink: fix endless loop on 64 bit when parsing binops
      sets: fix sets using intervals
      rule: reenable adjacent payload merging
      cmd: fix handle use after free for implicit set declarations
      tests: add loop detection tests
      netlink: fix query requests
      chains: add chain rename support
      rule: add rule insertion (prepend) support
      chains: add rename testcases
      netlink_delinearize: don't reset source register after read
      expr: kill EXPR_F_PRIMARY
      datatype: parse/print in all basetypes subsequently
      types: add ethernet address type
      expr: fix concat expression type propagation
      cmd/netlink: make sure we always have a location in netlink operations
      mark: fix numeric mark value parsing
      expr: catch missing and excess elements in concatenations
      parser: include leading '.' in concat subexpression location
      parser: fix size of internet protocol expressions matching keywords
      nftables: fix supression of "permission denied" errors
      nftables: shorten "could not process rule in batch" message
      erec: fix error markup for errors starting at column 0
      datatype: revert "fix crash if wrong integer type is passed"
      meta: fix crash when parsing unresolvable mark values
      parser: replace "vmap" keyword by "map"
      Revert "parser: replace "vmap" keyword by "map""
      expr: remove secmark from ct and meta expression
      meta: don't require "meta" keyword for a subset of meta expressions
      meta: fix mismerge
      payload: fix name of eth_proto
      expr: relational: don't surpress '==' for LHS binops in output
      parser: fix compilation breakage
      segtree: only use prefix expressions for ranges for selected datatypes
      segtree: fix decomposition of unclosed intervals
      build: fix recursive parser.h inclusion
      set: make set flags output parsable
      set: make set initializer parsable
      nftables: version 0.099

Phil Oester (8):
      datatype: validate port number in inet_service_type_parse
      datatype: allow protocols by number in inet_protocol_type_parse
      nftables: add additional --numeric level
      src: operational limit match
      parser: segfault in top scope define
      examples: adjust new chain type syntax in sets_and_maps file
      rule: missing set cleanup in do_command_list
      parser: add 'delete map' syntax

Romain Bignon (1):
      help: fix of the -I option in help display

Tomasz Bursztyka (11):
      netlink: Use the right datatype for verdict
      evaluate: Remove useless variable in expr_evaluate_bitwise()
      erec: Handle returned value properly in erec_print
      expression: Differentiate expr among anonymous structures in struct expr
      src: Fix base chain printing
      INSTALL: Update dependency list and repository URLs
      src: Wrap netfilter hooks around human readable strings
      src: Add priority keyword on base chain description
      tests: Update bate chain creation according to latest syntax changes
      src: Better error reporting if chain type is invalid
      include: cache a copy of nfnetlink.h

root (1):
      debug: include verbose message in all BUG statements


^ permalink raw reply

* RE: [PATCH net-next v2] xen-netback: Rework rx_work_todo
From: Paul Durrant @ 2014-01-20 13:02 UTC (permalink / raw)
  To: Zoltan Kiss, Ian Campbell, Wei Liu,
	xen-devel@lists.xenproject.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Jonathan Davies
In-Reply-To: <52DD1540.7000503@citrix.com>

> -----Original Message-----
> From: Zoltan Kiss
> Sent: 20 January 2014 12:23
> To: Ian Campbell; Wei Liu; xen-devel@lists.xenproject.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jonathan Davies
> Cc: Paul Durrant
> Subject: Re: [PATCH net-next v2] xen-netback: Rework rx_work_todo
> 
> Any reviews on this one? It fixes an important lockup situation, so
> either this or some other fix should go in soon.
> 
> On 15/01/14 17:11, Zoltan Kiss wrote:
> > The recent patch to fix receive side flow control (11b57f) solved the
> > spinning thread problem, but caused another one. The receive side can
> > stall if:
> > - [THREAD] xenvif_rx_action sets rx_queue_stopped to true
> > - [INTERRUPT] interrupt happens, and sets rx_event to true
> > - [THREAD] then xenvif_kthread sets rx_event to false
> > - [THREAD] rx_work_todo doesn't return true anymore
> >
> > Also, if an interrupt is sent but there is still no room in the ring, it takes
> > quite a long time until xenvif_rx_action realizes it. This patch ditches those
> > two variables and reworks rx_work_todo. If the thread finds it can't fit more
> > skbs into the ring, it saves the last slot estimation into rx_last_skb_slots;
> > otherwise it's kept as 0. Then rx_work_todo will check if:
> > - there is something to send to the ring (like before)
> > - there is space for the topmost packet in the queue
> >
> > I think that's a more natural and optimal thing to test than two bools which
> > are set somewhere else.
> >
> > Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> > ---
> >   drivers/net/xen-netback/common.h    |    6 +-----
> >   drivers/net/xen-netback/interface.c |    1 -
> >   drivers/net/xen-netback/netback.c   |   16 ++++++----------
> >   3 files changed, 7 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> > index 4c76bcb..ae413a2 100644
> > --- a/drivers/net/xen-netback/common.h
> > +++ b/drivers/net/xen-netback/common.h
> > @@ -143,11 +143,7 @@ struct xenvif {
> >   	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
> >   	struct xen_netif_rx_back_ring rx;
> >   	struct sk_buff_head rx_queue;
> > -	bool rx_queue_stopped;
> > -	/* Set when the RX interrupt is triggered by the frontend.
> > -	 * The worker thread may need to wake the queue.
> > -	 */
> > -	bool rx_event;
> > +	RING_IDX rx_last_skb_slots;
> >
> >   	/* This array is allocated seperately as it is large */
> >   	struct gnttab_copy *grant_copy_op;
> > diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> > index b9de31e..7669d49 100644
> > --- a/drivers/net/xen-netback/interface.c
> > +++ b/drivers/net/xen-netback/interface.c
> > @@ -100,7 +100,6 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
> >   {
> >   	struct xenvif *vif = dev_id;
> >
> > -	vif->rx_event = true;
> >   	xenvif_kick_thread(vif);
> >
> >   	return IRQ_HANDLED;
> > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > index 2738563..bb241d0 100644
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -477,7 +477,6 @@ static void xenvif_rx_action(struct xenvif *vif)
> >   	unsigned long offset;
> >   	struct skb_cb_overlay *sco;
> >   	bool need_to_notify = false;
> > -	bool ring_full = false;
> >
> >   	struct netrx_pending_operations npo = {
> >   		.copy  = vif->grant_copy_op,
> > @@ -487,7 +486,7 @@ static void xenvif_rx_action(struct xenvif *vif)
> >   	skb_queue_head_init(&rxq);
> >
> >   	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
> > -		int max_slots_needed;
> > +		RING_IDX max_slots_needed;
> >   		int i;
> >
> >   		/* We need a cheap worse case estimate for the number of
> > @@ -510,9 +509,10 @@ static void xenvif_rx_action(struct xenvif *vif)
> >   		if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
> >   			skb_queue_head(&vif->rx_queue, skb);
> >   			need_to_notify = true;
> > -			ring_full = true;
> > +			vif->rx_last_skb_slots = max_slots_needed;
> >   			break;
> > -		}
> > +		} else
> > +			vif->rx_last_skb_slots = 0;
> >
> >   		sco = (struct skb_cb_overlay *)skb->cb;
> >   		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> > @@ -523,8 +523,6 @@ static void xenvif_rx_action(struct xenvif *vif)
> >
> >   	BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta));
> >
> > -	vif->rx_queue_stopped = !npo.copy_prod && ring_full;
> > -
> >   	if (!npo.copy_prod)
> >   		goto done;
> >
> > @@ -1727,8 +1725,8 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
> >
> >   static inline int rx_work_todo(struct xenvif *vif)
> >   {
> > -	return (!skb_queue_empty(&vif->rx_queue) && !vif->rx_queue_stopped) ||
> > -		vif->rx_event;
> > +	return !skb_queue_empty(&vif->rx_queue) &&
> > +	       xenvif_rx_ring_slots_available(vif, vif->rx_last_skb_slots);
> >   }
> >
> >   static inline int tx_work_todo(struct xenvif *vif)
> > @@ -1814,8 +1812,6 @@ int xenvif_kthread(void *data)
> >   		if (!skb_queue_empty(&vif->rx_queue))
> >   			xenvif_rx_action(vif);
> >
> > -		vif->rx_event = false;
> > -

The minimal patch is to simply move this line up above the previous if clause, but I'm happy with your patch as it stands so

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> >   		if (skb_queue_empty(&vif->rx_queue) &&
> >   		    netif_queue_stopped(vif->dev))
> >   			xenvif_start_queue(vif);
> >
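
The reworked check described in the quoted commit message can be modelled in isolation as follows. This is a deliberately simplified sketch: the toy_* names and the plain counters are stand-ins invented for illustration, and the real xenvif_rx_ring_slots_available() inspects the shared ring rather than a single integer.

```c
#include <stdbool.h>

/* Toy stand-in for the relevant parts of struct xenvif. */
struct toy_vif {
	unsigned int queued_skbs;	/* packets waiting in rx_queue */
	unsigned int ring_free_slots;	/* free slots in the RX ring */
	unsigned int rx_last_skb_slots;	/* 0, or the estimate for the skb
					 * that did not fit last time */
};

static bool toy_slots_available(const struct toy_vif *vif,
				unsigned int needed)
{
	return vif->ring_free_slots >= needed;
}

/* There is work only if something is queued and the head packet fits. */
static bool toy_rx_work_todo(const struct toy_vif *vif)
{
	return vif->queued_skbs > 0 &&
	       toy_slots_available(vif, vif->rx_last_skb_slots);
}
```

With rx_last_skb_slots at 0 the second condition is trivially true, so the thread behaves as before; once a packet has failed to fit, the thread stays idle until the ring has room for that packet.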

^ permalink raw reply

* Re: [Patch net-next 3/6] net_sched: act: hide struct tcf_common from API
From: Jamal Hadi Salim @ 2014-01-20 13:01 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: David S. Miller
In-Reply-To: <1389987427-14085-4-git-send-email-xiyou.wangcong@gmail.com>

On 01/17/14 14:37, Cong Wang wrote:
> Now we can totally hide it from modules. tcf_hash_*() API's
> will operate on struct tc_action, modules don't need to care about
> the details.
>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Had to stare at this a bit longer - I am afraid
this and the rest look a little suspect.
Can you run some tests for me after your patches?
I could do it in about 1 day if you don't have time.

---
#add a couple of tests
sudo tc actions add action drop index 10
sudo tc actions add action drop index 12
# now list them - should see both.
sudo tc actions ls action gact
#now flush them
sudo tc actions flush action gact
# now list them - should see them gone
sudo tc actions ls action gact
---

And for the last patch, in particular
---
#add two actions by index
sudo tc actions add action drop index 10
sudo tc actions add action ok index 12
# we need an ingress qdisc to attach filter to
sudo tc qdisc del dev lo parent ffff:
sudo tc qdisc add dev lo ingress
#use existing action index 10
sudo tc filter add dev lo parent ffff: protocol ip prio 8 \
u32 match ip dst 127.0.0.8/32 flowid 1:10 action gact index 10
#double bind
sudo tc filter add dev lo parent ffff: protocol ip prio 7 \
u32 match ip src 127.0.0.10/32 flowid 1:11 action gact index 10
# now let's see the filters..
sudo tc filter ls dev lo parent ffff: protocol ip
#display the actions and pay attention to the bind count
sudo tc actions ls action gact
#try to re-add an existing action
sudo tc actions add action ok index 12
#it should be rejected - now list it and make sure refcnt doesn't go up
sudo tc actions ls action gact
#delete action index 12 (which is not bound)
sudo tc actions del action gact index 12
#list and make sure index 12 is gone
sudo tc actions ls action gact
#delete action index 10 (which is bound)
sudo tc actions del action gact index 10
#display to see it is still there ..
sudo tc actions ls action gact
#Repeat above two steps several times and make sure action 10 stays
# action should not be deleted...
#
# delete qdisc - which should delete all filters but not
# actions that were not created by filters
sudo tc qdisc del dev lo parent ffff:
#ok now that the filters are gone, let's see the actions ..
#pay attention to binds and references
sudo tc actions ls action gact
#
#delete action index 10 (which is no longer bound)
sudo tc actions del action gact index 10
#display to see it is gone
sudo tc actions ls action gact
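
The expectations spelled out in the script comments above can be restated as a toy reference-count model. This is only a sketch of what the tests are meant to observe, not the kernel's implementation: struct toy_action and the toy_*() helpers are hypothetical names, and the rule "a bind takes one reference and one bind count" is an assumption made for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ACTIONS 16

/* Toy model only: names and layout are hypothetical, not the kernel's. */
struct toy_action {
	bool in_use;
	unsigned int index;
	int refcnt;	/* every reference, binds included */
	int bindcnt;	/* references held by filters */
};

static struct toy_action toy_table[TOY_MAX_ACTIONS];

static struct toy_action *toy_lookup(unsigned int index)
{
	for (size_t i = 0; i < TOY_MAX_ACTIONS; i++)
		if (toy_table[i].in_use && toy_table[i].index == index)
			return &toy_table[i];
	return NULL;
}

/* Models "tc actions add ... index N": an existing index is rejected. */
static int toy_add(unsigned int index)
{
	if (toy_lookup(index))
		return -1;
	for (size_t i = 0; i < TOY_MAX_ACTIONS; i++) {
		if (!toy_table[i].in_use) {
			toy_table[i] = (struct toy_action){ true, index, 1, 0 };
			return 0;
		}
	}
	return -1;	/* table full */
}

/* Models a filter attaching to an existing action by index. */
static int toy_bind(unsigned int index)
{
	struct toy_action *a = toy_lookup(index);

	if (!a)
		return -1;
	a->refcnt++;
	a->bindcnt++;
	return 0;
}

/* Models a filter going away (for example when the qdisc is deleted). */
static int toy_unbind(unsigned int index)
{
	struct toy_action *a = toy_lookup(index);

	if (!a || a->bindcnt == 0)
		return -1;
	a->bindcnt--;
	if (--a->refcnt == 0)
		a->in_use = false;
	return 0;
}

/* Models "tc actions del": refused, with counts untouched, while bound. */
static int toy_del(unsigned int index)
{
	struct toy_action *a = toy_lookup(index);

	if (!a || a->bindcnt > 0)
		return -1;
	if (--a->refcnt == 0)
		a->in_use = false;
	return 0;
}
```

In this model, repeating toy_del(10) while two filters are bound leaves the counts unchanged, and the action disappears only once both binds are dropped and a final toy_del(10) is issued, which is the sequence the script walks through.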


cheers,
jamal

^ permalink raw reply

* Re: [Patch net-next 3/6] net_sched: act: hide struct tcf_common from API
From: Jamal Hadi Salim @ 2014-01-20 12:44 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: David S. Miller
In-Reply-To: <1389987427-14085-4-git-send-email-xiyou.wangcong@gmail.com>

On 01/17/14 14:37, Cong Wang wrote:
> Now we can totally hide it from modules. tcf_hash_*() API's
> will operate on struct tc_action, modules don't need to care about
> the details.
>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Looks good.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

^ permalink raw reply

* Re: [PATCH v5] can: add Renesas R-Car CAN driver
From: Marc Kleine-Budde @ 2014-01-20 12:35 UTC (permalink / raw)
  To: David Laight, 'Ben Dooks'
  Cc: Geert Uytterhoeven, Sergei Shtylyov, netdev@vger.kernel.org,
	wg@grandegger.com, linux-can@vger.kernel.org, Linux-sh list,
	vksavl@gmail.com
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D460082@AcuExch.aculab.com>

[-- Attachment #1: Type: text/plain, Size: 2265 bytes --]

On 01/20/2014 01:13 PM, David Laight wrote:
> From: 
>> On 20/01/14 11:58, Marc Kleine-Budde wrote:
>>> On 01/20/2014 12:52 PM, Geert Uytterhoeven wrote:
>>>> On Mon, Jan 20, 2014 at 12:47 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>>>>> On 01/20/2014 12:43 PM, Geert Uytterhoeven wrote:
>>>>>> On Thu, Dec 26, 2013 at 10:37 PM, Sergei Shtylyov
>>>>>> <sergei.shtylyov@cogentembedded.com> wrote:
>>>>>>> Changes in version 3:
>>>>>>
>>>>>>> - added '__packed' to 'struct rcar_can_mbox_regs' and 'struct rcar_can_regs';
>>>>>>> - removed unneeded type cast in the probe() method.
>>>>>>
>>>>>>> +/* Mailbox registers structure */
>>>>>>> +struct rcar_can_mbox_regs {
>>>>>>> +       u32 id;         /* IDE and RTR bits, SID and EID */
>>>>>>> +       u8 stub;        /* Not used */
>>>>>>> +       u8 dlc;         /* Data Length Code - bits [0..3] */
>>>>>>> +       u8 data[8];     /* Data Bytes */
>>>>>>> +       u8 tsh;         /* Time Stamp Higher Byte */
>>>>>>> +       u8 tsl;         /* Time Stamp Lower Byte */
>>>>>>> +} __packed;
>>>>>>
>>>>>> Sorry, I missed the earlier discussion, but why the _packed?
>>>>>> One u32 and 12 bytes makes a nice multiple of 4.
>>>>>
>>>>> Better safe than sorry, it's the layout of the registers in hardware,
>>>>> don't let the compiler optimize here.
> 
> Why not just add a compile time check against the size of the
> structure - that will ensure that no padding is accidentally added.

And what do we do when the check fails? Add the __packed (again)? :D
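
For reference, a minimal sketch of the kind of compile-time check being suggested, written as a user-space stand-in (uint32_t/uint8_t in place of u32/u8, C11 static_assert in place of the kernel's BUILD_BUG_ON()). The expected size and offsets are simply derived from the field order in the quoted struct; they are not taken from the hardware manual.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>

/* User-space stand-in for the mailbox layout quoted above, without __packed. */
struct rcar_can_mbox_regs {
	uint32_t id;		/* IDE and RTR bits, SID and EID */
	uint8_t stub;		/* Not used */
	uint8_t dlc;		/* Data Length Code - bits [0..3] */
	uint8_t data[8];	/* Data Bytes */
	uint8_t tsh;		/* Time Stamp Higher Byte */
	uint8_t tsl;		/* Time Stamp Lower Byte */
};

/* Fail the build if the compiler inserted padding anywhere. */
static_assert(sizeof(struct rcar_can_mbox_regs) == 16,
	      "unexpected padding in mailbox registers");
static_assert(offsetof(struct rcar_can_mbox_regs, data) == 6,
	      "data[] must start at offset 6");
static_assert(offsetof(struct rcar_can_mbox_regs, tsl) == 15,
	      "tsl must be the last byte");
```

If such a check ever failed, the build would stop at that line rather than silently producing a struct that no longer matches the register layout.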

>>>> Actually __packed makes it less safe, as the compiler now assumes
>>>> the u32 id member is unaligned, and thus may turn 32-bit accesses into 4
>>>> byte accesses.
>>>>
>>>> Fortunately it won't happen in this case as the code uses writel/readl to
>>>> access the id member.
> 
> Which means that it will be aligned (and must be aligned).
> So the packed is completely useless and pointless.

Then we'll remove the packed.
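
To illustrate the alignment point made above, a small sketch assuming GCC or Clang (the packed attribute is a compiler extension; struct mbox_plain and struct mbox_packed are made-up names with the same 4 + 12 byte shape as the mailbox struct). Both variants have the same size, but the packed one only promises byte alignment, which is what allows the compiler to split a 32-bit access into byte accesses.

```c
#include <stdint.h>

/* Natural layout: one 32-bit word followed by 12 bytes. */
struct mbox_plain {
	uint32_t id;
	uint8_t rest[12];
};

/* Same members, packed (GCC/Clang extension). */
struct mbox_packed {
	uint32_t id;
	uint8_t rest[12];
} __attribute__((packed));

/* Packing changes nothing about the size here... */
_Static_assert(sizeof(struct mbox_plain) == sizeof(struct mbox_packed),
	       "sizes should match");
/* ...but it drops the alignment guarantee to a single byte. */
_Static_assert(_Alignof(struct mbox_packed) == 1,
	       "packed struct is byte-aligned");
_Static_assert(_Alignof(struct mbox_plain) == _Alignof(uint32_t),
	       "plain struct keeps the u32 alignment");
```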

Marc
-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 242 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 0/2] sctp: some small clean ups
From: Wang Weidong @ 2014-01-20 12:32 UTC (permalink / raw)
  To: Neil Horman, Daniel Borkmann; +Cc: davem, vyasevich, netdev, linux-sctp
In-Reply-To: <20140120122055.GA22690@hmsreliant.think-freely.org>

On 2014/1/20 20:20, Neil Horman wrote:
> On Mon, Jan 20, 2014 at 12:37:06PM +0100, Daniel Borkmann wrote:
>> On 01/20/2014 12:27 PM, Wang Weidong wrote:
>>> We have the macros in sctp.h, so use them for coding accordance
>>> in sctp.
>>
>> Thanks for doing this Wang.
>>
>> I am actually wondering why we have these macro locking wrappers
>> and not use these functions directly? Hm, any reasons? Maybe we
>> should rather go in the other direction with this?
>>
> It's because in the original implementation of the sctp protocol, there was a
> user space test harness which built the kernel module for userspace execution to
> carry out some unit testing on the code.  It did so by redefining some of those
> locking macros to user space friendly code.  IIRC we haven't used those unit
> tests in years, and so should be removing them, not adding them to other
> locations.
> 

Thanks for your answers.

I will send patches removing these macros soon.
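
As a rough illustration of the mechanism Neil describes, here is a sketch of how such wrapper macros can be pointed at user-space stubs. Everything here is hypothetical (the TOY_USERSPACE_HARNESS switch, the toy_* names and the depth counter); it only shows the pattern, not the actual sctp.h definitions or the old harness.

```c
#define TOY_USERSPACE_HARNESS 1	/* hypothetical build switch */

#ifdef TOY_USERSPACE_HARNESS
/* User-space build: no bottom halves exist, so just track nesting depth. */
static int toy_bh_depth;
#define toy_local_bh_disable()	((void)++toy_bh_depth)
#define toy_local_bh_enable()	((void)--toy_bh_depth)
#else
/* Kernel build: the wrappers are plain aliases for the real primitives. */
#define toy_local_bh_disable()	local_bh_disable()
#define toy_local_bh_enable()	local_bh_enable()
#endif

/* Protocol code calls the wrapper and never needs to know which build it is in. */
static int toy_critical_section(int value)
{
	int result;

	toy_local_bh_disable();
	result = value * 2;	/* placeholder for the protected work */
	toy_local_bh_enable();
	return result;
}
```

Without a harness that redefines them, the wrappers add only a layer of indirection, which is the argument for calling the underlying functions directly.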

Regards,
Wang

> Neil
> 
>>> Wang Weidong (2):
>>>   sctp: use sctp_local_bh_{disable|enable} instead
>>>     local_bh_{disable|enable}
>>>   sctp: use sctp_read_[un]lock instead of read_[un]lock
>>>
>>>  net/sctp/endpointola.c   |  4 ++--
>>>  net/sctp/input.c         | 10 +++++-----
>>>  net/sctp/proc.c          | 12 ++++++------
>>>  net/sctp/sm_make_chunk.c |  8 ++++----
>>>  net/sctp/socket.c        |  8 ++++----
>>>  5 files changed, 21 insertions(+), 21 deletions(-)
>>>
>>
> 
> .
> 

^ permalink raw reply
