Netdev List
 help / color / mirror / Atom feed
* Re: network namespace and DNS lookups
From: Ravi Aysola @ 2012-12-18 15:49 UTC (permalink / raw)
  To: netdev
In-Reply-To: <CAFaHj6GVgDf2QXUWyaKYnvOqLJRjG+kA32eEk5psPkmc-RPY1Q@mail.gmail.com>

I think I sent my earlier email a bit prematurely.  I do have
/etc/netns/<namespace-name>/resolv.conf
files under each of my namespaces. Now the question is, how does a
user space process
(say bind)  look at a namespace specific resolv.conf instead of
default one?  Have any of
these standard applications been modified to work with namespace
specific config files?

thanks again
ravi/

On Tue, Dec 18, 2012 at 10:09 AM, Ravi Aysola <ravi.mlists@gmail.com> wrote:
> Has there been any work in any of the recent kernels to limit the DNS lookup
> to a particular network namespace?  Do we have any facility to specify the
> DNS resolvers on network namespace basis (such as /etc/ns/resolv.conf)?
>
> thank you
> ravi/

^ permalink raw reply

* Re: [PATCH V4 0/5] Add LE hash collision bug fix for active and passive offloaded connections
From: Vipul Pandya @ 2012-12-18 16:07 UTC (permalink / raw)
  To: roland@purestorage.com, davem@davemloft.net
  Cc: Vipul Pandya, linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
	Divy Le Ray, Dimitrios Michailidis, Kumar A S, Steve Wise,
	Abhishek Agrawal
In-Reply-To: <1355131856-29142-1-git-send-email-vipul@chelsio.com>

Hi Roland and David,

Any feedback on this?

Thanks,
Vipul

On 10-12-2012 15:00, Vipul Pandya wrote:
> This patch series fixes the LE hash collision issue in cxgb4 and RDMA/cxgb4
> drivers in kernel.org.
> 
> If the hash functionality is enabled in T4 then tuple information of active and
> passive offloaded connections are stored in DDR3 memory. LE (Lookup Engine)
> implements the interface to search this tuple entries using hash algorithm. To
> avoid LE hash collision, firmware provides new mechanisms to setup active and
> passive open connections.
> 
> In case of active open connection, firmware detects LE hash collision situation
> and notifies driver. Driver uses fw_ofld_connection work request to offload
> that connection. Its tuple information will be stored in TCAM memory array. 
> 
> In case of passive open connection, server filter region is created in TCAM.
> This region stores the filter which will redirect the incoming SYN packet to
> offload queues. After this driver tries to establish the connection using
> firmware work request.
> 
> This patch series also adds framework for managing filters and to use T4's
> filter capabilities.
> 
> Following testing has been carried out successfully on this patch series.
> 1. 1000 active open connections created and saw that tcam_full counter is
> getting incremented through debugfs file.
> 2. Multiple passive open connections were created and ran the same test
> repetatively to verify server filter is getting created and deleted properly.
> 
> The patch-series is based on Roland's infiniband tree (for-next branch),
> and involves changes on cxgb4 and RDMA/cxgb4 drivers. Both linux-rdma and netdev
> are included in this post for review.
> 
> We have some more bug fixes in RDMA/cxgb4 after this patch series. So, we would
> like to request this series to get mereged through Roland's infiniband for-next
> tree.
> 
> V2: Changed commenting style from kernel-doc format('/**') to normal
> V3: Resending the whole patch series again with the missing patches in V2
> V4: Changed comments for networking from '/*' to '/* Comment' format.
> 
> Thanks,
> Vipul Pandya
> 
> Vipul Pandya (5):
>   cxgb4: Add T4 filter support
>   cxgb4: Add LE hash collision bug fix path in LLD driver
>   RDMA/cxgb4: Fix LE hash collision bug for active open connection
>   RDMA/cxgb4: Fix LE hash collision bug for passive open connection
>   RDMA/cxgb4: Fix bug for active and passive LE hash collision path
> 
>  drivers/infiniband/hw/cxgb4/cm.c                |  789 +++++++++++++++++++---
>  drivers/infiniband/hw/cxgb4/device.c            |  210 ++++++-
>  drivers/infiniband/hw/cxgb4/iw_cxgb4.h          |   33 +
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |  146 +++++
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  473 +++++++++++++-
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h  |   23 +-
>  drivers/net/ethernet/chelsio/cxgb4/l2t.c        |   34 +
>  drivers/net/ethernet/chelsio/cxgb4/l2t.h        |    3 +
>  drivers/net/ethernet/chelsio/cxgb4/t4_hw.c      |   23 +-
>  drivers/net/ethernet/chelsio/cxgb4/t4_msg.h     |   66 ++
>  drivers/net/ethernet/chelsio/cxgb4/t4_regs.h    |   37 ++
>  drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h   |  388 +++++++++++
>  12 files changed, 2100 insertions(+), 125 deletions(-)
> 

^ permalink raw reply

* Re: [PATCH V4 0/5] Add LE hash collision bug fix for active and passive offloaded connections
From: Steve Wise @ 2012-12-18 16:11 UTC (permalink / raw)
  To: Vipul Pandya
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	roland-BHEL68pLQRGGvPXPguhicg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	divy-ut6Up61K2wZBDgjK7y7TUQ, dm-ut6Up61K2wZBDgjK7y7TUQ,
	kumaras-ut6Up61K2wZBDgjK7y7TUQ, abhishek-ut6Up61K2wZBDgjK7y7TUQ
In-Reply-To: <1355131856-29142-1-git-send-email-vipul-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

Reviewed-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>

On 12/10/2012 3:30 AM, Vipul Pandya wrote:
> This patch series fixes the LE hash collision issue in cxgb4 and RDMA/cxgb4
> drivers in kernel.org.
>
> If the hash functionality is enabled in T4 then tuple information of active and
> passive offloaded connections are stored in DDR3 memory. LE (Lookup Engine)
> implements the interface to search this tuple entries using hash algorithm. To
> avoid LE hash collision, firmware provides new mechanisms to setup active and
> passive open connections.
>
> In case of active open connection, firmware detects LE hash collision situation
> and notifies driver. Driver uses fw_ofld_connection work request to offload
> that connection. Its tuple information will be stored in TCAM memory array.
>
> In case of passive open connection, server filter region is created in TCAM.
> This region stores the filter which will redirect the incoming SYN packet to
> offload queues. After this driver tries to establish the connection using
> firmware work request.
>
> This patch series also adds framework for managing filters and to use T4's
> filter capabilities.
>
> Following testing has been carried out successfully on this patch series.
> 1. 1000 active open connections created and saw that tcam_full counter is
> getting incremented through debugfs file.
> 2. Multiple passive open connections were created and ran the same test
> repetatively to verify server filter is getting created and deleted properly.
>
> The patch-series is based on Roland's infiniband tree (for-next branch),
> and involves changes on cxgb4 and RDMA/cxgb4 drivers. Both linux-rdma and netdev
> are included in this post for review.
>
> We have some more bug fixes in RDMA/cxgb4 after this patch series. So, we would
> like to request this series to get mereged through Roland's infiniband for-next
> tree.
>
> V2: Changed commenting style from kernel-doc format('/**') to normal
> V3: Resending the whole patch series again with the missing patches in V2
> V4: Changed comments for networking from '/*' to '/* Comment' format.
>
> Thanks,
> Vipul Pandya
>
> Vipul Pandya (5):
>    cxgb4: Add T4 filter support
>    cxgb4: Add LE hash collision bug fix path in LLD driver
>    RDMA/cxgb4: Fix LE hash collision bug for active open connection
>    RDMA/cxgb4: Fix LE hash collision bug for passive open connection
>    RDMA/cxgb4: Fix bug for active and passive LE hash collision path
>
>   drivers/infiniband/hw/cxgb4/cm.c                |  789 +++++++++++++++++++---
>   drivers/infiniband/hw/cxgb4/device.c            |  210 ++++++-
>   drivers/infiniband/hw/cxgb4/iw_cxgb4.h          |   33 +
>   drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |  146 +++++
>   drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  473 +++++++++++++-
>   drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h  |   23 +-
>   drivers/net/ethernet/chelsio/cxgb4/l2t.c        |   34 +
>   drivers/net/ethernet/chelsio/cxgb4/l2t.h        |    3 +
>   drivers/net/ethernet/chelsio/cxgb4/t4_hw.c      |   23 +-
>   drivers/net/ethernet/chelsio/cxgb4/t4_msg.h     |   66 ++
>   drivers/net/ethernet/chelsio/cxgb4/t4_regs.h    |   37 ++
>   drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h   |  388 +++++++++++
>   12 files changed, 2100 insertions(+), 125 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] xen/netfront: improve truesize tracking
From: Eric Dumazet @ 2012-12-18 16:13 UTC (permalink / raw)
  To: Ian Campbell
  Cc: netdev@vger.kernel.org, Sander Eikelenboom, Konrad Rzeszutek Wilk,
	annie li, xen-devel@lists.xensource.com
In-Reply-To: <1355844398.14620.254.camel@zakaz.uk.xensource.com>

On Tue, 2012-12-18 at 15:26 +0000, Ian Campbell wrote:

> So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?
> 

I dont know what are the real frag sizes in your case.

Some drivers allocate a full page for an ethernet frame, others use half
of a page, it really depends.

As the frag ABI doesnt contain real size, its ok in this case to account
the actual frag size.

(skb->data_len in your driver)

^ permalink raw reply

* Re: [PATCH] xen/netfront: improve truesize tracking
From: Ian Campbell @ 2012-12-18 16:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, Sander Eikelenboom, Konrad Rzeszutek Wilk,
	annie li, xen-devel@lists.xensource.com
In-Reply-To: <1355847180.9380.21.camel@edumazet-glaptop>

On Tue, 2012-12-18 at 16:13 +0000, Eric Dumazet wrote:
> On Tue, 2012-12-18 at 15:26 +0000, Ian Campbell wrote:
> 
> > So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?
> > 
> 
> I dont know what are the real frag sizes in your case.

I think it's a page, see xennet_alloc_rx_buffers and the alloc_page
therein.

> Some drivers allocate a full page for an ethernet frame, others use half
> of a page, it really depends.
> 
> As the frag ABI doesnt contain real size, its ok in this case to account
> the actual frag size.
> 
> (skb->data_len in your driver)

I guess I'm a bit confused by what truesize means again then ;-),
because in that case the original patch is correct although it would
have been less confusing to do:
	skb->truesize += skb->data_len; 
in xennet_poll() and then do the subtraction of
NETFRONT_SKB_CB(skb)->pull_to in handle_incoming_queue() where we
actually do the pull up.

Unless __pskb_pull_tail does that adjustment for us, but if it does I
can't see where.

Ian.

^ permalink raw reply

* Re: inaccurate packet scheduling
From: Stephen Hemminger @ 2012-12-18 16:26 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: jhs, davem, edumazet, tgraf, netdev
In-Reply-To: <20121218150407.GB1533@minipsycho.orion>



----- Original Message -----
> Hi all.
> 
> Run one of the following 2 scripts on machine A:
> 
> #!/bin/bash
> tc qdisc del dev eth0 root
> sleep 1
> tc -batch << EOF
> qdisc add dev eth0 root handle 1: prio bands 2 priomap 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0
> qdisc add dev eth0 parent 1:1 handle 10: pfifo limit 50
> qdisc add dev eth0 parent 1:2 handle 20 tbf latency 100ms rate 4mbit
> burst 2m
> filter add dev eth0 parent 1: protocol ip u32 match ip dst
> $machineB_ip flowid 1:2
> EOF
> 
> #!/bin/bash
> tc qdisc del dev eth0 root
> sleep 1
> tc -batch << EOF
> qdisc add dev eth0 root handle 1: prio bands 2 priomap 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0
> qdisc add dev eth0 parent 1:1 handle 10: pfifo limit 20
> qdisc add dev eth0 parent 1:2 handle 20: pfifo limit 20
> filter add dev eth0 parent 1: protocol ip pref 10 \
> u32 match ip dst $machineB_ip \
> flowid 1:2 \
> police rate 4Mbit burst 2m conform-exceed drop
> EOF
> 
> And run:
> [machineB ~]# iperf -s
> [machineA ~]# iperf -c machineB_ip -t 60
> 
> Expected results are: ~3.8-4.2 Mbits/s
> But actual results are: ~130-170 Kbits/s with tbf, ~70-300 Kbits/s
> with policy rate
> 
> [machineA ~]# tc -s qdisc list dev eth0
> qdisc prio 1: root refcnt 9 bands 2 priomap  0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0
>  Sent 1512384 bytes 1032 pkt (dropped 729, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> qdisc pfifo 10: parent 1:1 limit 50p
>  Sent 4560 bytes 32 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> qdisc tbf 20: parent 1:2 rate 4000Kbit burst 2Mb lat 100.0ms
>  Sent 1507824 bytes 1000 pkt (dropped 729, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> 
> 
> Tested with kernel pulled from linus's git today. This happens with
> older
> kernels as well (I tried 2.6.32-based rhel6 kernels).
> 
> This happens to me on following machines:
> HP DL360G8 (x86_64) http://people.redhat.com/jpirko/aThoo2Ei/dl380g8/
> HP DL360G3 (i686)
> IBM JS22 (ppc64) http://people.redhat.com/jpirko/aThoo2Ei/ibmjs22/
> 
> On following machines, I do not observe this issue:
> qemu kvm (x86_64)
> IBM Zseries (s390x) http://people.redhat.com/jpirko/aThoo2Ei/ibmz/
> 
> Please ask in case you need me to provide any other details.
> 
> Thanks.

Check kernel log for messages about clock. It could be that on the
machines with issues TSC is not usable for kernel clock.
Also turn off TSO since it screws up any form of rate control.

^ permalink raw reply

* Re: [PATCH v2] netlink: align attributes on 64-bits
From: Nicolas Dichtel @ 2012-12-18 16:23 UTC (permalink / raw)
  To: Thomas Graf; +Cc: bhutchings, netdev, davem, David.Laight
In-Reply-To: <20121218125735.GG27746@casper.infradead.org>

Le 18/12/2012 13:57, Thomas Graf a écrit :
> On 12/17/12 at 05:49pm, Nicolas Dichtel wrote:
>> @@ -492,6 +492,15 @@ static inline struct nlmsghdr *nlmsg_put_answer(struct sk_buff *skb,
>>    */
>>   static inline struct sk_buff *nlmsg_new(size_t payload, gfp_t flags)
>>   {
>> +	/* Because attributes may be aligned on 64-bits boundary with fake
>> +	 * attribute (type 0, size 4 (attributes are 32-bits align by default)),
>> +	 * an exact payload size cannot be calculated. Hence, we need to reserve
>> +	 * more space for these attributes.
>> +	 * 128 is arbitrary: it allows to align up to 32 attributes.
>> +	 */
>> +	if (payload < NLMSG_DEFAULT_SIZE)
>> +		payload = min(payload + 128, (size_t)NLMSG_DEFAULT_SIZE);
>
> This is doomed to fail eventually. A netlink message may carry
> hundreds of attributes eventually. See my suggestion below.
>
>> diff --git a/lib/nlattr.c b/lib/nlattr.c
>> index 18eca78..7440a80 100644
>> --- a/lib/nlattr.c
>> +++ b/lib/nlattr.c
>> @@ -450,9 +450,18 @@ EXPORT_SYMBOL(__nla_put_nohdr);
>>    */
>>   int nla_put(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
>>   {
>> -	if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen)))
>> +	int align = IS_ALIGNED((unsigned long)skb_tail_pointer(skb), 8) ? 0 : 4;
>
> This does not look right. In order for the attribute data to be
> aligned properly you would need to check skb_tail_pointer(skb) +
> NLA_HDRLEN for proper alignment or you end up aligning the
> attribute header.
Good point.

>
> How about we change nla_total_size() to return the size with
> needed padding taken into account. That should fix the message
> size caluclation problem and we only need to reserve room for
> the initial padding to align the very first attribute.
>
> Below is an untested patch that does this. What do you think?
I still have some doubts about the size calculation (see bellow).
For the rest of the patch, it seems ok (except some minor point). I will test it.

>
> diff --git a/include/net/netlink.h b/include/net/netlink.h
> index 9690b0f..7ce8e76 100644
> --- a/include/net/netlink.h
> +++ b/include/net/netlink.h
> @@ -114,7 +114,6 @@
>    * Attribute Length Calculations:
>    *   nla_attr_size(payload)		length of attribute w/o padding
>    *   nla_total_size(payload)		length of attribute w/ padding
> - *   nla_padlen(payload)		length of padding
>    *
>    * Attribute Payload Access:
>    *   nla_data(nla)			head of attribute payload
> @@ -492,6 +491,14 @@ static inline struct nlmsghdr *nlmsg_put_answer(struct sk_buff *skb,
>    */
>   static inline struct sk_buff *nlmsg_new(size_t payload, gfp_t flags)
>   {
> +	/* If an exact size if specified, reserve some additional space to
> +	 * align the first attribute, all subsequent attributes should have
> +	 * padding accounted for.
> +	 */
> +	if (payload != NLMSG_DEFAULT_SIZE)
> +		payload = min_t(size_t, payload + NLA_ATTR_ALIGN,
> +				NLMSG_DEFAULT_SIZE);
> +
>   	return alloc_skb(nlmsg_total_size(payload), flags);
>   }
>
> @@ -653,16 +660,12 @@ static inline int nla_attr_size(int payload)
>    */
>   static inline int nla_total_size(int payload)
>   {
> -	return NLA_ALIGN(nla_attr_size(payload));
> -}
> +	size_t len = NLA_ALIGN(nla_attr_size(payload));
>
> -/**
> - * nla_padlen - length of padding at the tail of attribute
> - * @payload: length of payload
> - */
> -static inline int nla_padlen(int payload)
> -{
> -	return nla_total_size(payload) - nla_attr_size(payload);
> +	if (!IS_ALIGNED(len, NLA_ATTR_ALIGN))
> +		len = ALIGN(len + NLA_HDRLEN, NLA_ATTR_ALIGN);
Two comments:
1/ should it be ALIGN(len, NLA_ATTR_ALIGN)? If we want to add a __u64:
    => nla_attr_size(sizeof(__u64)) = 12
    => NLA_ALIGN(nla_attr_size(sizeof(__u64))) => 12 (= len)
    => ALIGN(len + NLA_HDRLEN, NLA_ATTR_ALIGN) = 0 but it should be 4

2/ Suppose that the attribute is:

   struct foo {
   	__u64 bar1;
   	__u32 bar2;
   }
   => sizeof(struct foo) = 12 (= payload)
   => nla_attr_size(payload) = 16
   => NLA_ALIGN(nla_attr_size(payload)) = 16 (= len)
   => IS_ALIGNED(len, NLA_ATTR_ALIGN) = true
   => extra room is not reserved
   But it's not guaranteed that bar1 is aligned on 8 bytes, only on 4 bytes.

> +
> +	return len;
>   }
>
>   /**
> diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
> index 78d5b8a..1856729 100644
> --- a/include/uapi/linux/netlink.h
> +++ b/include/uapi/linux/netlink.h
> @@ -149,5 +149,10 @@ struct nlattr {
>   #define NLA_ALIGN(len)		(((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
>   #define NLA_HDRLEN		((int) NLA_ALIGN(sizeof(struct nlattr)))
>
> +/* Padding attribute type, added automatically to align attributes,
> + * must be ignored by readers. */
> +#define NLA_PADDING		0
> +#define NLA_ATTR_ALIGN		8
> +
>
>   #endif /* _UAPI__LINUX_NETLINK_H */
> diff --git a/lib/nlattr.c b/lib/nlattr.c
> index 18eca78..b09473c 100644
> --- a/lib/nlattr.c
> +++ b/lib/nlattr.c
> @@ -322,18 +322,36 @@ int nla_strcmp(const struct nlattr *nla, const char *str)
>    * Adds a netlink attribute header to a socket buffer and reserves
>    * room for the payload but does not copy it.
>    *
> + * May add a padding attribute of type NLA_PADDING before the
> + * real attribute to ensure proper alignment.
> + *
>    * The caller is responsible to ensure that the skb provides enough
>    * tailroom for the attribute header and payload.
>    */
>   struct nlattr *__nla_reserve(struct sk_buff *skb, int attrtype, int attrlen)
>   {
>   	struct nlattr *nla;
> +	size_t offset;
> +
> +	offset = (size_t) skb_tail_pointer(skb);
> +	if (!IS_ALIGNED(offset + NLA_HDRLEN, NLA_ATTR_ALIGN)) {
With the previous struct foo, this test may be true even if we don't have 
reserved extra room. This test depends on previous attribute.
I think the exact size of the netlink message depends on the order of 
attributes, not only on the attribute itself.
What about taking the assumption that the start will never be aligned and always 
allocating extra room: ALIGN(NLA_ALIGNTO, NLA_ATTR_ALIGN) (= 4)?

> +		struct nlattr *pad;
> +		size_t padlen;
> +
> +		padlen = nla_total_size(offset) - offset - NLA_HDRLEN;
> +		pad = (struct nlattr *) skb_put(skb, nla_attr_size(padlen));
> +		pad->nla_type = 0;
> +		pad->nla_len = nla_attr_size(padlen);
> +
> +		memset((unsigned char *) pad + NLA_HDRLEN, 0, padlen);
> +	}
>
> -	nla = (struct nlattr *) skb_put(skb, nla_total_size(attrlen));
> +	nla = (struct nlattr *) skb_put(skb, NLA_ALIGN(nla_attr_size(attrlen)));
>   	nla->nla_type = attrtype;
>   	nla->nla_len = nla_attr_size(attrlen);
>
> -	memset((unsigned char *) nla + nla->nla_len, 0, nla_padlen(attrlen));
> +	memset((unsigned char *) nla + nla->nla_len, 0,
> +	       NLA_ALIGN(nla->nla_len) - nla->nla_len);
>
>   	return nla;
>   }
> @@ -360,6 +378,23 @@ void *__nla_reserve_nohdr(struct sk_buff *skb, int attrlen)
>   }
>   EXPORT_SYMBOL(__nla_reserve_nohdr);
>
> +static size_t nla_pre_padlen(struct sk_buff *skb)
> +{
> +	size_t offset = (size_t) skb_tail_pointer(skb);
> +
> +	if (!IS_ALIGNED(offset + NLA_HDRLEN, NLA_ATTR_ALIGN))
> +		return nla_total_size(offset) - offset - NLA_HDRLEN;
> +
> +	return 0;
> +}
> +
> +static bool nla_insufficient_space(struct sk_buff *skb, int attrlen)
> +{
> +	size_t needed = nla_pre_padlen(skb) + nla_total_size(attrlen);
If nla_total_size() was right, nla_pre_padlen(skb) should already be included. 
Am I wrong?

> +
> +	return (skb_tailroom(skb) < needed);
> +}
> +
>   /**
>    * nla_reserve - reserve room for attribute on the skb
>    * @skb: socket buffer to reserve room on
> @@ -374,7 +409,7 @@ EXPORT_SYMBOL(__nla_reserve_nohdr);
>    */
>   struct nlattr *nla_reserve(struct sk_buff *skb, int attrtype, int attrlen)
>   {
> -	if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen)))
> +	if (unlikely(nla_insufficient_space(skb, attrlen)))
>   		return NULL;
>
>   	return __nla_reserve(skb, attrtype, attrlen);
> @@ -450,7 +485,7 @@ EXPORT_SYMBOL(__nla_put_nohdr);
>    */
>   int nla_put(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
>   {
> -	if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen)))
> +	if (unlikely(nla_insufficient_space(skb, attrlen)))
>   		return -EMSGSIZE;
>
>   	__nla_put(skb, attrtype, attrlen, data);
>

^ permalink raw reply

* Re: [PATCH] xen/netfront: improve truesize tracking
From: Eric Dumazet @ 2012-12-18 16:35 UTC (permalink / raw)
  To: Ian Campbell
  Cc: netdev@vger.kernel.org, Sander Eikelenboom, Konrad Rzeszutek Wilk,
	annie li, xen-devel@lists.xensource.com
In-Reply-To: <1355847758.14620.265.camel@zakaz.uk.xensource.com>

On Tue, 2012-12-18 at 16:22 +0000, Ian Campbell wrote:
> On Tue, 2012-12-18 at 16:13 +0000, Eric Dumazet wrote:
> > On Tue, 2012-12-18 at 15:26 +0000, Ian Campbell wrote:
> > 
> > > So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?
> > > 
> > 
> > I dont know what are the real frag sizes in your case.
> 
> I think it's a page, see xennet_alloc_rx_buffers and the alloc_page
> therein.
> 

If they are order-0 pages, then PAGE_SIZE * nr_frags  is OK.


> > Some drivers allocate a full page for an ethernet frame, others use half
> > of a page, it really depends.
> > 

> > As the frag ABI doesnt contain real size, its ok in this case to account
> > the actual frag size.
> > 
> > (skb->data_len in your driver)
> 
> I guess I'm a bit confused by what truesize means again then ;-),
> because in that case the original patch is correct although it would
> have been less confusing to do:
> 	skb->truesize += skb->data_len; 
> in xennet_poll() and then do the subtraction of
> NETFRONT_SKB_CB(skb)->pull_to in handle_incoming_queue() where we
> actually do the pull up.
> 
> Unless __pskb_pull_tail does that adjustment for us, but if it does I
> can't see where.

Thats because skb frags only contain :

a page pointer.
An offset
A size. (Exact number or used bytes in this frag)

And not the 'originally allocated size. It could be 256, 768, 2048,
4096, 65536 bytes, nobody but the driver really knows.

So when we pull X bytes from a fragment to skb->head, there is no way to
remember what was the original size of the fragment.

Only the driver allocating the frag knows its truesize.

Once skb is given to the stack, we lose this information, and rely on
skb->truesize being an accurate estimation.

^ permalink raw reply

* Re: TCP delayed ACK heuristic
From: Eric Dumazet @ 2012-12-18 16:30 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, Ben Greear, David Miller, Stephen Hemminger, Rick Jones,
	Thomas Graf
In-Reply-To: <2088500005.27728019.1355843484620.JavaMail.root@redhat.com>

On Tue, 2012-12-18 at 10:11 -0500, Cong Wang wrote:
> Hello, TCP experts!
> 
> Some time ago, Ben sent a patch [1] to add some knobs for
> tuning TCP delayed ACK, but rejected by David.
> 
> David's point is that we can do some heuristics for TCP
> delayed ACK, so the question is that what kind of heuristics
> can we use?
> 
> RFC1122 explicitly mentions:
> 
>             A TCP SHOULD implement a delayed ACK, but an ACK should not
>             be excessively delayed; in particular, the delay MUST be
>             less than 0.5 seconds, and in a stream of full-sized
>             segments there SHOULD be an ACK for at least every second
>             segment.
> 
> so this prevents us from using any heuristic for the number
> of coalesced delayed ACK.
> 
> For the timeout of a delayed ACK, my idea is guessing how many
> packet we suppose to receive is the TCP stream is fully utilized,
> something like below:
> 
> +static inline u32 tcp_expect_packets(struct sock *sk)
> +{
> +       struct tcp_sock *tp = tcp_sk(sk);
> +       int rtt = tp->srtt >> 3;
> +       u32 idle = tcp_time_stamp - inet_csk(sk)->icsk_ack.lrcvtime;
> +
> +       return idle * 2 / rtt;
> +}
> ...
> +       ato -= tcp_expect_packets(sk) * delta;
> 
> 
> The more we expect, the less we should delay. However this is
> not accurate due to congestion control.
> 
> Meanwhile, we can also check how many packets are pending in TCP
> sending queue, the more we pend, the more we can piggyback with
> a single ACK, but not beyond how much we are able to send at
> that time.
> 
> Comments? Ideas?
> 

ACKS might also be delayed because of bidirectional traffic, and is more
controlled by the application response time. TCP stack can not easily
estimate it.

If you focus on bulk receive, LRO/GRO should already lower number of
ACKS to an acceptable level and without major disruption.

Stretch acks are not only the receiver concern, there are issues for the
sender that you cannot always control/change.

I recommend reading RFC2525 2.13

^ permalink raw reply

* RE: TCP delayed ACK heuristic
From: David Laight @ 2012-12-18 16:39 UTC (permalink / raw)
  To: Cong Wang, netdev
  Cc: Ben Greear, David Miller, Eric Dumazet, Stephen Hemminger,
	Rick Jones, Thomas Graf
In-Reply-To: <2088500005.27728019.1355843484620.JavaMail.root@redhat.com>

> David's point is that we can do some heuristics for TCP
> delayed ACK, so the question is that what kind of heuristics
> can we use?
> 
> RFC1122 explicitly mentions:
> 
>             A TCP SHOULD implement a delayed ACK, but an ACK should not
>             be excessively delayed; in particular, the delay MUST be
>             less than 0.5 seconds, and in a stream of full-sized
>             segments there SHOULD be an ACK for at least every second
>             segment.
> 
> so this prevents us from using any heuristic for the number
> of coalesced delayed ACK.

There are problems with only implementing the acks
specified by RFC1122.

I've seen problems when the sending side is doing (I think)
'slow start' with Nagle disabled.
The sender would only send 4 segments before waiting for an
ACK - even when it had more than a full sized segment waiting.
Sender was Linux 2.6.something (probably low 20s).
I changed the application flow to send data in the reverse
direction to avoid the problem.
That was on a ~0 delay local connection - which means that
there is almost never outstanding data, and the 'slow start'
happened almost all the time.
Nagle is completely the wrong algorithm for the data flow.

	David

^ permalink raw reply

* Re: inaccurate packet scheduling
From: Eric Dumazet @ 2012-12-18 16:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jiri Pirko, jhs, davem, edumazet, tgraf, netdev
In-Reply-To: <e2f71dc7-bee5-41ca-bd64-f4569fa953da@tahiti.vyatta.com>

On Tue, 2012-12-18 at 08:26 -0800, Stephen Hemminger wrote:

> Check kernel log for messages about clock. It could be that on the
> machines with issues TSC is not usable for kernel clock.
> Also turn off TSO since it screws up any form of rate control.

Yes, but we should fix it eventually. I'll take a look.

^ permalink raw reply

* RE: [PATCH v2] netlink: align attributes on 64-bits
From: David Laight @ 2012-12-18 16:50 UTC (permalink / raw)
  To: nicolas.dichtel, Thomas Graf; +Cc: bhutchings, netdev, davem
In-Reply-To: <50D09897.8030508@6wind.com>

> 2/ Suppose that the attribute is:
> 
>    struct foo {
>    	__u64 bar1;
>    	__u32 bar2;
>    }
>    => sizeof(struct foo) = 12 (= payload)

That is only true if the host architecture aligns 64bit items
on 32 it boundaries (as i386 does).
Otherwise there are 4 bytes of padding at the end and the
size is 16.

Actually it is worse than that.
Consider the structure:
	struct bar {
		__u32 foo1;
		__u64 foo2;
	}
On i386 it will have size 12 and foo2 will be at offset 4.
On sparc32 (and most 64bit) it will have size 16 with foo2
at offset 8 (and 4 bytes of pad after foo1).

Do these messages move between systems?
If they do then any 64bit items need an explicit alignment
eg tag with __attribute__((aligned(8))) (or aligned(4)).

	David

^ permalink raw reply

* Re: [PATCH 11/11] bridge: Dump vlan information from a bridge port
From: Stephen Hemminger @ 2012-12-18 17:03 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, davem, mst, john.r.fastabend
In-Reply-To: <1355342477-4971-12-git-send-email-vyasevic@redhat.com>

On Wed, 12 Dec 2012 15:01:17 -0500
Vlad Yasevich <vyasevic@redhat.com> wrote:

> Using the RTM_GETLINK dump the vlan filter list of a given
> bridge port.  The information depends on setting the filter
> flag similar to how nic VF info is dumped.
> 
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>

I will put these in the bridge tree with other patches that
are being staged for net-next.

^ permalink raw reply

* Re: [PATCH 04/11] bridge: Cache vlan in the cb for faster egress lookup.
From: Stephen Hemminger @ 2012-12-18 17:04 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, davem, mst, john.r.fastabend
In-Reply-To: <1355342477-4971-5-git-send-email-vyasevic@redhat.com>

On Wed, 12 Dec 2012 15:01:10 -0500
Vlad Yasevich <vyasevic@redhat.com> wrote:

> On input, cache the pointer to the bridge vlan info, so that
> on egress, we have can simply look at the port bitmap instead
> of traversing a vlan list.
> 
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>

This isn't going to be safe. Once packet is passed up, the cb[]
can get overwritten by other things.

^ permalink raw reply

* Re: [PATCH v2] netlink: align attributes on 64-bits
From: Thomas Graf @ 2012-12-18 17:08 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: bhutchings, netdev, davem, David.Laight
In-Reply-To: <50D09897.8030508@6wind.com>

On 12/18/12 at 05:23pm, Nicolas Dichtel wrote:
> Le 18/12/2012 13:57, Thomas Graf a écrit :
> >-static inline int nla_padlen(int payload)
> >-{
> >-	return nla_total_size(payload) - nla_attr_size(payload);
> >+	if (!IS_ALIGNED(len, NLA_ATTR_ALIGN))
> >+		len = ALIGN(len + NLA_HDRLEN, NLA_ATTR_ALIGN);
> Two comments:
> 1/ should it be ALIGN(len, NLA_ATTR_ALIGN)? If we want to add a __u64:
>    => nla_attr_size(sizeof(__u64)) = 12
>    => NLA_ALIGN(nla_attr_size(sizeof(__u64))) => 12 (= len)
>    => ALIGN(len + NLA_HDRLEN, NLA_ATTR_ALIGN) = 0 but it should be 4

We can't add 1-3 bytes of padding, therefore we need to add
NLA_HDRLEN to len before aligning it to enforce a minimal
padding. We can't hit it right now because 4 byte alignment
of the previous attribute is a given but if we ever change
the alignment it could become an issue and the above should
be bullet proof.

Your example would come out like this:
  nla_attr_size(8) = 12
  ALIGN(12 + 4, 8) = 16

> 2/ Suppose that the attribute is:
> 
>   struct foo {
>   	__u64 bar1;
>   	__u32 bar2;
>   }
>   => sizeof(struct foo) = 12 (= payload)
>   => nla_attr_size(payload) = 16
>   => NLA_ALIGN(nla_attr_size(payload)) = 16 (= len)
>   => IS_ALIGNED(len, NLA_ATTR_ALIGN) = true
>   => extra room is not reserved
>   But it's not guaranteed that bar1 is aligned on 8 bytes, only on 4 bytes.

That's correct, that's why I have added the additional
NLA_ATTR_ALIGN of room in nlmsg_new(). It will account
for the one time padding that is needed before we add
the very first attribute.

If all attributes after that have a size aligned to 8
bytes no padding is needed. Padding will only be needed
again if a struct is missized in which case we reserve
room with the above. Correct?

> >+	offset = (size_t) skb_tail_pointer(skb);
> >+	if (!IS_ALIGNED(offset + NLA_HDRLEN, NLA_ATTR_ALIGN)) {
> With the previous struct foo, this test may be true even if we don't
> have reserved extra room. This test depends on previous attribute.
> I think the exact size of the netlink message depends on the order
> of attributes, not only on the attribute itself.
> What about taking the assumption that the start will never be
> aligned and always allocating extra room: ALIGN(NLA_ALIGNTO,
> NLA_ATTR_ALIGN) (= 4)?

See my explanation above. I think this works. The order does not
matter, the sum of all padding required will always be the same.

> >+static bool nla_insufficient_space(struct sk_buff *skb, int attrlen)
> >+{
> >+	size_t needed = nla_pre_padlen(skb) + nla_total_size(attrlen);
> If nla_total_size() was right, nla_pre_padlen(skb) should already be
> included. Am I wrong?

No, nla_pre_padlen() contains the number of bytes needed to align
skb_tail_pointer() to an alignment of 8. If that is > 0 but the
attribute to follow is already aligned.

The tricky part here is that accounting for padding in
nla_total_size() only works for the sum of all attributes.
It does not account for the specific padding required for the
previous attribute.

Therefore the above check. The above could be changed to
nla_attr_size() theoretically as we don't need space for the
final padding eventually but we checked for space before so I
kept it that way.

I realize it's slightly confusign and needs better documentation
and please double check my thinking :-)

^ permalink raw reply

* Re: [PATCH v2] netlink: align attributes on 64-bits
From: Thomas Graf @ 2012-12-18 17:11 UTC (permalink / raw)
  To: David Laight; +Cc: nicolas.dichtel, bhutchings, netdev, davem
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70F5@saturn3.aculab.com>

On 12/18/12 at 04:50pm, David Laight wrote:
> > 2/ Suppose that the attribute is:
> > 
> >    struct foo {
> >    	__u64 bar1;
> >    	__u32 bar2;
> >    }
> >    => sizeof(struct foo) = 12 (= payload)
> 
> That is only true if the host architecture aligns 64bit items
> on 32 it boundaries (as i386 does).
> Otherwise there are 4 bytes of padding at the end and the
> size is 16.
> 
> Actually it is worse than that.
> Consider the structure:
> 	struct bar {
> 		__u32 foo1;
> 		__u64 foo2;
> 	}
> On i386 it will have size 12 and foo2 will be at offset 4.
> On sparc32 (and most 64bit) it will have size 16 with foo2
> at offset 8 (and 4 bytes of pad after foo1).

This is a known problem and I can't think of anything
that can be done about it except for memcpy()ing the
data before accessing it.

If you have ideas, I'm more that willing to listen :)

> Do these messages move between systems?
> If they do then any 64bit items need an explicit alignment
> eg tag with __attribute__((aligned(8))) (or aligned(4)).

They don't. Netlink has and will be host bound. It also
uses host byte order for that reason.

^ permalink raw reply

* Re: network namespace and DNS lookups
From: Dan Williams @ 2012-12-18 17:22 UTC (permalink / raw)
  To: Ravi Aysola; +Cc: netdev
In-Reply-To: <CAFaHj6HT0c59us_7F9Uh7BYh_YXGRt30=yLOrV=hcrG29YO-qA@mail.gmail.com>

On Tue, 2012-12-18 at 10:49 -0500, Ravi Aysola wrote:
> I think I sent my earlier email a bit prematurely.  I do have
> /etc/netns/<namespace-name>/resolv.conf
> files under each of my namespaces. Now the question is, how does a
> user space process
> (say bind)  look at a namespace specific resolv.conf instead of
> default one?  Have any of
> these standard applications been modified to work with namespace
> specific config files?

Wouldn't that be the glibc resolver's domain?  DNS lookups aren't done
by the kernel, but by glibc in userspace.  And glibc is also what reads
resolv.conf, so most things DNS-namespace related would need to happen
there.

Dan

> thanks again
> ravi/
> 
> On Tue, Dec 18, 2012 at 10:09 AM, Ravi Aysola <ravi.mlists@gmail.com> wrote:
> > Has there been any work in any of the recent kernels to limit the DNS lookup
> > to a particular network namespace?  Do we have any facility to specify the
> > DNS resolvers on network namespace basis (such as /etc/ns/resolv.conf)?
> >
> > thank you
> > ravi/
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] build: unbreak linkage of m_xt.so
From: Stephen Hemminger @ 2012-12-18 17:21 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: stephen.hemminger, vapier, netdev, jhs, urykhy, shemonc, pablo,
	netfilter-devel
In-Reply-To: <1355617968-26138-1-git-send-email-jengelh@inai.de>

On Sun, 16 Dec 2012 01:32:48 +0100
Jan Engelhardt <jengelh@inai.de> wrote:

> Commit v3.7.0~10 caused the variable new PKG_CONFIG variable never
> to be present at the time of calling make, leading to tc/m_xt.so
> not linked with -lxtables (result from pkg-config xtables --libs),
> that in turn leading to
> 
> tc: symbol lookup error: /usr/lib64/tc//m_xt.so: undefined symbol:
> xtables_init_all
> 
> Fixing that.
> 
> Signed-off-by: Jan Engelhardt <jengelh@inai.de>
> ---
>  configure |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index 9912114..573ee55 100755
> --- a/configure
> +++ b/configure
> @@ -4,7 +4,6 @@
>  INCLUDE=${1:-"$PWD/include"}
>  : ${PKG_CONFIG:=pkg-config}
>  : ${CC=gcc}
> -echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
>  
>  # Make a temp directory in build tree.
>  TMPDIR=$(mktemp -d config.XXXXXX)
> @@ -224,6 +223,7 @@ rm -f $TMPDIR/ipsettest.c $TMPDIR/ipsettest
>  }
>  
>  echo "# Generated config based on" $INCLUDE >Config
> +echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
>  
>  echo "TC schedulers"
>  

Ok, manually did the diff (conflicted with other previous changes).

^ permalink raw reply

* Re: [PATCH 04/11] bridge: Cache vlan in the cb for faster egress lookup.
From: Vlad Yasevich @ 2012-12-18 17:50 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, davem, mst, john.r.fastabend
In-Reply-To: <20121218090448.69c21daa@nehalam.linuxnetplumber.net>

On 12/18/2012 12:04 PM, Stephen Hemminger wrote:
> On Wed, 12 Dec 2012 15:01:10 -0500
> Vlad Yasevich <vyasevic@redhat.com> wrote:
>
>> On input, cache the pointer to the bridge vlan info, so that
>> on egress, we have can simply look at the port bitmap instead
>> of traversing a vlan list.
>>
>> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>
> This isn't going to be safe. Once packet is passed up, the cb[]
> can get overwritten by other things.
>

Right, but only care about it while in bridging code.  We don't look
at it anywhere else...

Or are you saying that cb is not guaranteed to be preserved between 
br_handle_frame_finish and br_forward?

-vlad

^ permalink raw reply

* Re: [PATCH 11/11] bridge: Dump vlan information from a bridge port
From: Vlad Yasevich @ 2012-12-18 17:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, davem, mst, john.r.fastabend
In-Reply-To: <20121218090332.2f18f393@nehalam.linuxnetplumber.net>

On 12/18/2012 12:03 PM, Stephen Hemminger wrote:
> On Wed, 12 Dec 2012 15:01:17 -0500
> Vlad Yasevich <vyasevic@redhat.com> wrote:
>
>> Using the RTM_GETLINK dump the vlan filter list of a given
>> bridge port.  The information depends on setting the filter
>> flag similar to how nic VF info is dumped.
>>
>> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>
> I will put these in the bridge tree with other patches that
> are being staged for net-next.
>

Hold on a bit.  I was running some more tests and found some
scenarios that didn't work right.

I am preparing v2 of the patches.

Thanks
-vlad

^ permalink raw reply

* Re: TCP delayed ACK heuristic
From: Rick Jones @ 2012-12-18 17:54 UTC (permalink / raw)
  To: David Laight
  Cc: Cong Wang, netdev, Ben Greear, David Miller, Eric Dumazet,
	Stephen Hemminger, Thomas Graf
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70F4@saturn3.aculab.com>

On 12/18/2012 08:39 AM, David Laight wrote:
> There are problems with only implementing the acks
> specified by RFC1122.
>
> I've seen problems when the sending side is doing (I think)
> 'slow start' with Nagle disabled.
> The sender would only send 4 segments before waiting for an
> ACK - even when it had more than a full sized segment waiting.
> Sender was Linux 2.6.something (probably low 20s).
> I changed the application flow to send data in the reverse
> direction to avoid the problem.
> That was on a ~0 delay local connection - which means that
> there is almost never outstanding data, and the 'slow start'
> happened almost all the time.
> Nagle is completely the wrong algorithm for the data flow.

If Nagle was already disabled, why the last sentence?  And from your 
description, even if Nagle were enabled, I would think that it was 
remote ACK+cwnd behaviour getting in your way, not Nagle, given that 
Nagle is to be decided on a user-send by user-send basis and release 
queued data (to the mercies of other heuristics) when it gets to be an 
MSS-worth.

The joys of intertwined heuristics I suppose.

Personally, I would love for there to be a way to have a cwnd's 
byte-limit's-worth of small segments outstanding at one time - it would 
make my netperf-life much easier as I could get rid of the netperf-level 
congestion window intended to keep successive requests (with Nagle 
already disabled) from getting coalesced by cwnd in a "burst-mode" test. 
* And perhaps make things nicer for the test when there is the 
occasional retransmission.  I used to think that netperf was just 
"unique" in that regard, but it sounds like you have an actual 
application looking to do that??

rick jones

* because I am trying to (ab)use the burst mode TCP_RR test for a 
maximum packets per second through the stack+NIC measurement that isn't 
also a context  switching benchmark. But I cannot really come-up with a 
real-world rationale to support further cwnd behaviour changes. 
Allowing a byte-limit-cwnd's worth of single-byte-payload TCP segments 
could easily be seen as being rather anti-social :)  And 
forcing/maintaining the original segment boundaries in retransmissions 
for small packets isn't such a hot idea either.

^ permalink raw reply

* Re: inaccurate packet scheduling
From: Eric Dumazet @ 2012-12-18 18:06 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Stephen Hemminger, jhs, davem, tgraf, netdev
In-Reply-To: <20121218175409.GA1690@minipsycho.orion>

> No such messages in log
>
>>Also turn off TSO since it screws up any form of rate control.
>
> Turning off TSO did not help. However, turning off scatter-gather
> helped, on both HP DL360G8 (igb) and IBM JS22 (ehea).
>

Think to also disable GSO

Alternatively, you can use "mtu 65536" to allow TBF to send packets
bigger than 2048 bytes

^ permalink raw reply

* [PATCH 0/2] qlcnic: fix warnings
From: Sony Chacko @ 2012-12-18 17:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Dept_NX_Linux_NIC_Driver, Sony Chacko

From: Sony Chacko <sony.chacko@qlogic.com>

Please apply to net-next.

Thanks,
Sony

^ permalink raw reply

* [PATCH 1/2] qlcnic: fix unused variable warnings
From: Sony Chacko @ 2012-12-18 17:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Dept_NX_Linux_NIC_Driver, Shahed Shaikh
In-Reply-To: <1355853591-29917-1-git-send-email-sony.chacko@qlogic.com>

From: Shahed Shaikh <shahed.shaikh@qlogic.com>

qlcnic_hw.c:370: warning: variable cmd_desc set but not used
qlcnic_hw.c:368: warning: variable consumer set but not used
qlcnic_main.c:448: warning: variable ref_count set but not used
qlcnic_main.c:534: warning: variable mem_base set but not used
qlcnic_ctx.c:137: warning: variable tmp_tmpl set but not used
qlcnic_ctx.c:133: warning: variable version set but not used
qlcnic_minidump.c:200: warning: variable opcode set but not used

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c    |    5 +++--
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c     |    5 ++---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c   |    5 -----
 .../net/ethernet/qlogic/qlcnic/qlcnic_minidump.c   |    3 +--
 4 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c
index 58f094c..b14b8f0 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c
@@ -134,7 +134,7 @@ int qlcnic_fw_cmd_get_minidump_temp(struct qlcnic_adapter *adapter)
 	__le32 *tmp_buf;
 	struct qlcnic_cmd_args cmd;
 	struct qlcnic_hardware_context *ahw;
-	struct qlcnic_dump_template_hdr *tmpl_hdr, *tmp_tmpl;
+	struct qlcnic_dump_template_hdr *tmpl_hdr;
 	dma_addr_t tmp_addr_t = 0;
 
 	ahw = adapter->ahw;
@@ -150,6 +150,8 @@ int qlcnic_fw_cmd_get_minidump_temp(struct qlcnic_adapter *adapter)
 	}
 	temp_size = cmd.rsp.arg2;
 	version = cmd.rsp.arg3;
+	dev_info(&adapter->pdev->dev,
+		 "minidump template version = 0x%x", version);
 	if (!temp_size)
 		return -EIO;
 
@@ -174,7 +176,6 @@ int qlcnic_fw_cmd_get_minidump_temp(struct qlcnic_adapter *adapter)
 		err = -EIO;
 		goto error;
 	}
-	tmp_tmpl = tmp_addr;
 	ahw->fw_dump.tmpl_hdr = vzalloc(temp_size);
 	if (!ahw->fw_dump.tmpl_hdr) {
 		err = -EIO;
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c
index fc48e00..7a6d5eb 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c
@@ -365,7 +365,7 @@ static int
 qlcnic_send_cmd_descs(struct qlcnic_adapter *adapter,
 		struct cmd_desc_type0 *cmd_desc_arr, int nr_desc)
 {
-	u32 i, producer, consumer;
+	u32 i, producer;
 	struct qlcnic_cmd_buffer *pbuf;
 	struct cmd_desc_type0 *cmd_desc;
 	struct qlcnic_host_tx_ring *tx_ring;
@@ -379,7 +379,6 @@ qlcnic_send_cmd_descs(struct qlcnic_adapter *adapter,
 	__netif_tx_lock_bh(tx_ring->txq);
 
 	producer = tx_ring->producer;
-	consumer = tx_ring->sw_consumer;
 
 	if (nr_desc >= qlcnic_tx_avail(tx_ring)) {
 		netif_tx_stop_queue(tx_ring->txq);
@@ -402,7 +401,7 @@ qlcnic_send_cmd_descs(struct qlcnic_adapter *adapter,
 		pbuf->frag_count = 0;
 
 		memcpy(&tx_ring->desc_head[producer],
-			&cmd_desc_arr[i], sizeof(struct cmd_desc_type0));
+		       cmd_desc, sizeof(struct cmd_desc_type0));
 
 		producer = get_next_index(producer, tx_ring->num_desc);
 		i++;
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index a7554d9..d833f59 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -445,13 +445,10 @@ static int
 qlcnic_set_function_modes(struct qlcnic_adapter *adapter)
 {
 	u8 id;
-	u32 ref_count;
 	int i, ret = 1;
 	u32 data = QLCNIC_MGMT_FUNC;
 	struct qlcnic_hardware_context *ahw = adapter->ahw;
 
-	/* If other drivers are not in use set their privilege level */
-	ref_count = QLCRD32(adapter, QLCNIC_CRB_DRV_ACTIVE);
 	ret = qlcnic_api_lock(adapter);
 	if (ret)
 		goto err_lock;
@@ -531,11 +528,9 @@ static int qlcnic_setup_pci_map(struct pci_dev *pdev,
 {
 	u32 offset;
 	void __iomem *mem_ptr0 = NULL;
-	resource_size_t mem_base;
 	unsigned long mem_len, pci_len0 = 0, bar0_len;
 
 	/* remap phys address */
-	mem_base = pci_resource_start(pdev, 0);	/* 0 is for BAR 0 */
 	mem_len = pci_resource_len(pdev, 0);
 
 	qlcnic_get_bar_length(pdev->device, &bar0_len);
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c
index 12ff292..0b8d862 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c
@@ -197,7 +197,7 @@ static u32 qlcnic_dump_ctrl(struct qlcnic_adapter *adapter,
 	int i, k, timeout = 0;
 	void __iomem *base = adapter->ahw->pci_base0;
 	u32 addr, data;
-	u8 opcode, no_ops;
+	u8 no_ops;
 	struct __ctrl *ctr = &entry->region.ctrl;
 	struct qlcnic_dump_template_hdr *t_hdr = adapter->ahw->fw_dump.tmpl_hdr;
 
@@ -206,7 +206,6 @@ static u32 qlcnic_dump_ctrl(struct qlcnic_adapter *adapter,
 
 	for (i = 0; i < no_ops; i++) {
 		k = 0;
-		opcode = 0;
 		for (k = 0; k < 8; k++) {
 			if (!(ctr->opcode & (1 << k)))
 				continue;
-- 
1.7.1

^ permalink raw reply related

* [PATCH 2/2] qlcnic: update driver version
From: Sony Chacko @ 2012-12-18 17:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Dept_NX_Linux_NIC_Driver, Signed-off-by: Sony Chacko
In-Reply-To: <1355853591-29917-1-git-send-email-sony.chacko@qlogic.com>

From: Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>

Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
index 5379024..bc7ec64 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
@@ -36,8 +36,8 @@
 
 #define _QLCNIC_LINUX_MAJOR 5
 #define _QLCNIC_LINUX_MINOR 0
-#define _QLCNIC_LINUX_SUBVERSION 29
-#define QLCNIC_LINUX_VERSIONID  "5.0.29"
+#define _QLCNIC_LINUX_SUBVERSION 30
+#define QLCNIC_LINUX_VERSIONID  "5.0.30"
 #define QLCNIC_DRV_IDC_VER  0x01
 #define QLCNIC_DRIVER_VERSION  ((_QLCNIC_LINUX_MAJOR << 16) |\
 		 (_QLCNIC_LINUX_MINOR << 8) | (_QLCNIC_LINUX_SUBVERSION))
-- 
1.7.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox