Netdev List

* [PATCH net] ipx: restore token ring define to include/linux/ipx.h
From: Paul Gortmaker @ 2012-05-23 14:43 UTC (permalink / raw)
  To: davem; +Cc: netdev, Paul Gortmaker, Stephen Hemminger

Commit 211ed865108e24697b44bee5daac502ee6bdd4a4

    "net: delete all instances of special processing for token ring"

removed the define for IPX_FRAME_TR_8022.

While it is unlikely, we can't be 100% sure that there aren't
random userspace consumers of this value, so restore it.

The only instance I could find was in ncpfs-2.2.6, and it was
safe as-is, since it used #ifdef IPX_FRAME_TR_8022 around the
two use cases it had, but there may be other userspace packages
without similar ifdefs.

Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

diff --git a/include/linux/ipx.h b/include/linux/ipx.h
index 8f02439..3d48014 100644
--- a/include/linux/ipx.h
+++ b/include/linux/ipx.h
@@ -38,7 +38,7 @@ struct ipx_interface_definition {
 #define IPX_FRAME_8022		2
 #define IPX_FRAME_ETHERII	3
 #define IPX_FRAME_8023		4
-/* obsolete token ring was	5 */
+#define IPX_FRAME_TR_8022       5 /* obsolete */
 	unsigned char ipx_special;
 #define IPX_SPECIAL_NONE	0
 #define IPX_PRIMARY		1
-- 
1.7.9.1
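
The #ifdef guard described in the commit message can be sketched in userspace as below. This is only an illustration: the constants are local copies of the values visible in the diff so that the snippet builds without the kernel header, and `ipx_frame_name` with its strings is a hypothetical helper, not code taken from ncpfs.

```c
/* Local copies of the values shown in the include/linux/ipx.h diff. */
#define IPX_FRAME_8022     2
#define IPX_FRAME_ETHERII  3
#define IPX_FRAME_8023     4
#define IPX_FRAME_TR_8022  5 /* obsolete; the define the patch restores */

/* Hypothetical helper: map a frame type to a printable name. */
static const char *ipx_frame_name(int frame)
{
	switch (frame) {
	case IPX_FRAME_8022:
		return "802.2";
	case IPX_FRAME_ETHERII:
		return "EtherII";
	case IPX_FRAME_8023:
		return "802.3";
#ifdef IPX_FRAME_TR_8022
	/* Compiled only when the header still provides the define. */
	case IPX_FRAME_TR_8022:
		return "TR-802.2";
#endif
	default:
		return "unknown";
	}
}
```

A consumer written this way builds with or without the define; one that uses IPX_FRAME_TR_8022 unguarded fails to compile once the define is gone, which is the case the patch protects against.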

* Re: Kernel doesn't propagate DNSSL to userspace (e.g. NetworkManager)
From: Dan Williams @ 2012-05-23 14:46 UTC (permalink / raw)
  To: Pavel Simerda; +Cc: netdev, Dan Williams, danw
In-Reply-To: <1337783065.25721.1.camel@dcbw.foobar.com>

On Wed, 2012-05-23 at 09:24 -0500, Dan Williams wrote:
> On Wed, 2012-05-23 at 04:16 -0400, Pavel Simerda wrote:
> > Hi,
> > 
> > I filed a bug report about the lack of DNSSL support in the kernel:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=824121
> > 
> > While NetworkManager receives the RDNSS neighbor discovery user option from the kernel, it doesn't receive DNSSL at all. This can be debugged with NetworkManager
> > (or hopefully some better testing tool) and radvdump (to check if DNSSL is present).
> > 
> > radvdump reports DNSSL is there, NetworkManager gets no netlink message from kernel.
> > 
> > kernel-3.3.4-4.fc17.x86_64
> > NetworkManager-0.9.4.0-7.git20120403.fc17.x86_64
> > 
> > Dave Jones asked me to contact this ML directly. I'm offlist.
> 
> So Pierre Ossman sent some RFC patches for DNSSL back in December 2010,
> but Dave Miller wanted actual formal submissions which Pierre never got
> around to doing.  I'll resubmit Pierre's second patch, which exports all
> options the kernel doesn't care about to userspace.
> 
> Sun, 12 Dec 2010:
> "[RFC][PATCH] Export all RA options that we don't handle to userspace"

Well, it appears that e35f30c1 from 2012-04-06 (should be in the 3.4
kernel?) actually adds support for passing DNSSL to userspace, basically
what Pierre's first patch from 2010 did.  So thanks Alexey :)  I think
we should be good now for DNSSL?

Dan
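
For readers unfamiliar with the mechanism discussed above, here is a rough userspace sketch of an option-type filter that decides which router advertisement options get passed on to userspace. It is not the kernel's code: the struct, the function name and the `dnssl_supported` flag (standing in for "a kernel that forwards DNSSL") are assumptions made for illustration. Only the option type numbers, 25 for RDNSS and 31 for DNSSL, come from RFC 6106.

```c
#include <stdbool.h>
#include <stdint.h>

/* Option type numbers assigned by RFC 6106. */
#define ND_OPT_RDNSS 25
#define ND_OPT_DNSSL 31

/* Minimal ND option header: type, then length in units of 8 octets. */
struct nd_opt_hdr_sketch {
	uint8_t type;
	uint8_t len;
};

/* Hypothetical filter: should this RA option be forwarded to userspace? */
static bool is_user_option(const struct nd_opt_hdr_sketch *opt,
			   bool dnssl_supported)
{
	if (opt->type == ND_OPT_RDNSS)
		return true;
	return dnssl_supported && opt->type == ND_OPT_DNSSL;
}
```

With `dnssl_supported` false, a DNSSL option is silently skipped, which matches the symptom in the report: radvdump sees the option on the wire, but no netlink message reaches NetworkManager.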

* Re: [RFC:kvm] export host NUMA info to guest & make emulated device NUMA attr
From: Andrew Theurer @ 2012-05-23 14:52 UTC (permalink / raw)
  To: Liu ping fan
  Cc: Shirley Ma, kvm, netdev, linux-kernel, qemu-devel, Avi Kivity,
	Michael S. Tsirkin, Srivatsa Vaddagiri, Rusty Russell,
	Anthony Liguori, Ryan Harper, Shirley Ma, Krishna Kumar,
	Tom Lendacky
In-Reply-To: <CAFgQCTsdXqitVGKm5jQjtC--yvZy6x04zRduJ+_tXqUnkncdog@mail.gmail.com>

On 05/22/2012 04:28 AM, Liu ping fan wrote:
> On Sat, May 19, 2012 at 12:14 AM, Shirley Ma<mashirle@us.ibm.com>  wrote:
>> On Thu, 2012-05-17 at 17:20 +0800, Liu Ping Fan wrote:
>>> Currently, the guest can not know the NUMA info of the vcpu, which
>>> will
>>> result in performance drawback.
>>>
>>> This was discovered through experiments by
>>>          Shirley Ma<xma@us.ibm.com>
>>>          Krishna Kumar<krkumar2@in.ibm.com>
>>>          Tom Lendacky<toml@us.ibm.com>
>>> Refer to -
>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html
>>> we can see the big performance gap between NUMA aware and unaware.
>>>
>>> Enlightened by their discovery, I think, we can do more work -- that
>>> is to
>>> export NUMA info of host to guest.
>>
>> There are three problems we've found:
>>
>> 1. KVM doesn't support a NUMA load balancer. Even if there are no other
>> workloads in the system, and the number of vcpus on the guest is smaller
>> than the number of cpus per node, the vcpus could be scheduled on
>> different nodes.
>>
>> Someone is working on an in-kernel solution. Andrew Theurer has a working
>> user-space NUMA-aware VM balancer; it requires libvirt and cgroups
>> (which is the default for RHEL6 systems).
>>
> Interesting, and I found that "sched/numa: Introduce
> sys_numa_{t,m}bind()" committed by Peter and Ingo may help.
> But I think that, from the guest's view, it cannot tell whether two vcpus
> are on the same host node. For example, if
> vcpu-a is in node-A and vcpu-b is in node-B, the guest lb will be more
> expensive if it does pull_task from vcpu-a and
> chooses vcpu-b to push.  My idea is to export such info to the guest;
> I am still working on it.

The long-term solution is two-fold:
1) Guests that are quite large (in that they cannot fit in a host NUMA
node) must have a static multi-node NUMA topology implemented by Qemu.
That is here today, but we do not do it automatically, which is probably
going to be a VM management responsibility.
2) Host scheduler and NUMA code must be enhanced to get better placement
of Qemu memory and threads.  For single-node vNUMA guests, this is easy:
put it all in one node.  For multi-node vNUMA guests, the host must
understand that some Qemu memory belongs with certain vCPU threads
(which make up one of the guest's vNUMA nodes), and then place that
memory/threads in a specific host node (and continue for other
memory/threads for each Qemu vNUMA node).

Note that even if a guest's memory/threads for a vNUMA node are 
relocated to another host node (which will be necessary) the NUMA 
characteristics of guest are still maintained (as all those vCPUs and 
memory are still "close" to each other).

The problem with exposing the host's NUMA info directly to the guest is
that (1) vCPUs will get relocated, so their topology info in the guest
will have to change over time. IMO that is a bad idea.  We have a hard
enough time getting applications to work with static NUMA info.
Getting applications to react to a changing NUMA topology is not going
to turn out well. (2) Every single guest would have to have the same
number of NUMA nodes defined as the host.  That is overkill, especially
for small guests.
>
>
>> 2. The host scheduler is not aware of the relationship between guest vCPUs
>> and vhost. So it's possible for the host scheduler to schedule the per-device
>> vhost thread on the same cpu on which the vCPU kicks a TX packet, or to
>> schedule the vhost thread on a different node than the vCPU. For RX packets,
>> it's possible for vhost to deliver an RX packet to a vCPU running on a
>> different node too.
>>
> Yes. I notice this point in your original patch.
>
>> 3. The per-device vhost thread does not scale.
>>
> What about the scalability of per-vm * host_NUMA_NODE? When we take
> advantage of multi-core, we produce multiple vcpu threads for one VM.
> So what about the emulated device? Is it acceptable to scale it to take
> advantage of the host NUMA attr?  After all, how many nodes the VM
> can run on is under the user's control.  It is a balance of
> scalability and performance.
>
>> So the problems are in host scheduling and vhost thread scalability. I
>> am not sure how much exposing NUMA info from host to guest would help.
>>
>> Have you tested these patches? How much performance gain is there?
>>
> Sorry, not yet.  As you have mentioned, the vhost thread scalability
> is a big problem. So I want to see others' opinions before going on.
>
> Thanks and regards,
> pingfan
>
>
>> Thanks
>> Shirley
>>
>>> So here comes the idea:
>>> 1. export host numa info through guest's sched domain to its scheduler
>>>    Export vcpu's NUMA info to guest scheduler(I think mem NUMA problem
>>>    has been handled by host).  So the guest's lb will consider the
>>> cost.
>>>    I am still working on this, and my original idea is to export these
>>> info
>>>    through "static struct sched_domain_topology_level
>>> *sched_domain_topology"
>>>    to guest.
>>>
>>> 2. Do a better emulation of the virtual machine exported to the guest.
>>>    In the real world, devices are limited for various reasons in how
>>> they can own a NUMA
>>>    property. But in Qemu, the device is emulated by a thread, which
>>> naturally inherits
>>>    the NUMA attr.  We can implement the device as a set
>>> of many
>>>    logic units, each of which is backed by a thread in a different
>>> host node.
>>>    Currently, I want to start the work on vhost. But I think, maybe in
>>>    the future, the iothread in Qemu can also have such an attr.
>>>
>>>
>>> Forgive me: given the limited time, I do not yet have a good
>>> understanding of the
>>> vhost/virtio_net drivers. These patches are just a draft, _FAR_, _FAR_
>>> from working.
>>> I will do more detailed work on them in the future.
>>>
>>> To ease the review, the following is a summary of the 2nd point of
>>> the idea.
>>> The 1st point of the idea is not reflected in the patches.
>>>
>>> --spread/shrink the vhost_workers over the host nodes as demanded by
>>> Qemu.
>>>    We can consider each vhost_worker as an independent net logic
>>> device
>>>    embedded in the physical device "vhost_net".  Meanwhile, we spread
>>> the vcpu
>>>    threads over the host nodes.
>>>    The vrings on the guest are allocated separately with PAGE_SIZE
>>> alignment, so they can
>>>    be mapped into different host nodes, and a vhost_worker in the
>>> same
>>>    node can access them with the least cost. The same goes for the vq on the guest.
>>>
>>> --the virtio_net driver will change and talk with the logic device.
>>> Which
>>>    logic device it talks to is determined by which vcpu it is
>>> scheduled on.
>>>
>>> --the binding of vcpus and vhost_worker is implemented by:
>>>    for call direction, vq-a in the node-A will have a dedicated irq-a.
>>> And
>>>    we set the irq-a's affinity to vcpus in node-A.
>>>    for kick direction, kick register-b trigger different eventfd-b
>>> which wake up
>>>    vhost_worker-b.
>>>
-Andrew Theurer
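
The first point of the long-term solution, that a guest too large for one host node needs a multi-node vNUMA topology, can be illustrated with a small sizing helper. This is a hypothetical sketch, not anything Qemu or libvirt implements: the function names, the parameters and the rule itself (take the larger of the CPU-driven and memory-driven node counts) are assumptions.

```c
/* Ceiling division for positive integers. */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/*
 * Hypothetical sizing rule: how many virtual NUMA nodes a guest needs so
 * that each one fits inside a single host node.
 */
static unsigned int vnuma_nodes_needed(unsigned int guest_vcpus,
				       unsigned int guest_mem_gb,
				       unsigned int host_node_cpus,
				       unsigned int host_node_mem_gb)
{
	unsigned int by_cpu = div_round_up(guest_vcpus, host_node_cpus);
	unsigned int by_mem = div_round_up(guest_mem_gb, host_node_mem_gb);

	return by_cpu > by_mem ? by_cpu : by_mem;
}
```

A result of 1 corresponds to the easy single-node case described above; anything larger means the topology would have to be defined statically when the VM is created.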

* Re: WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()
From: Eric Dumazet @ 2012-05-23 15:08 UTC (permalink / raw)
  To: Sergio Correia; +Cc: netdev
In-Reply-To: <CAJyhjX2f_XFgVZpS1hLy5CWKY7eYA5MzQSHsfRgc40-2Rxztkw@mail.gmail.com>

On Tue, 2012-05-22 at 11:47 -0300, Sergio Correia wrote:
> Hi Eric,
...
> Yes, it's an Atheros AR9285 adapter.
> This morning I did a make mrproper before rebuilding the kernel
> (should I always do that?), but the warning has just appeared again.

OK, I am taking a look at this problem, thanks.

* Re: [RFC:kvm] export host NUMA info to guest & make emulated device NUMA attr
From: Michael S. Tsirkin @ 2012-05-23 15:16 UTC (permalink / raw)
  To: Andrew Theurer
  Cc: Krishna Kumar, Rusty Russell, Shirley Ma, kvm, netdev, Shirley Ma,
	qemu-devel, Liu ping fan, linux-kernel, Tom Lendacky, Ryan Harper,
	Avi Kivity, Anthony Liguori, Srivatsa Vaddagiri
In-Reply-To: <4FBCF99F.4070409@linux.vnet.ibm.com>

On Wed, May 23, 2012 at 09:52:15AM -0500, Andrew Theurer wrote:
> On 05/22/2012 04:28 AM, Liu ping fan wrote:
> >On Sat, May 19, 2012 at 12:14 AM, Shirley Ma<mashirle@us.ibm.com>  wrote:
> >>On Thu, 2012-05-17 at 17:20 +0800, Liu Ping Fan wrote:
> >>>Currently, the guest can not know the NUMA info of the vcpu, which
> >>>will
> >>>result in performance drawback.
> >>>
> >>>This was discovered through experiments by
> >>>         Shirley Ma<xma@us.ibm.com>
> >>>         Krishna Kumar<krkumar2@in.ibm.com>
> >>>         Tom Lendacky<toml@us.ibm.com>
> >>>Refer to -
> >>>http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html
> >>>we can see the big performance gap between NUMA aware and unaware.
> >>>
> >>>Enlightened by their discovery, I think, we can do more work -- that
> >>>is to
> >>>export NUMA info of host to guest.
> >>
> >>There are three problems we've found:
> >>
> >>1. KVM doesn't support a NUMA load balancer. Even if there are no other
> >>workloads in the system, and the number of vcpus on the guest is smaller
> >>than the number of cpus per node, the vcpus could be scheduled on
> >>different nodes.
> >>
> >>Someone is working on an in-kernel solution. Andrew Theurer has a working
> >>user-space NUMA-aware VM balancer; it requires libvirt and cgroups
> >>(which is the default for RHEL6 systems).
> >>
> >Interesting, and I found that "sched/numa: Introduce
> >sys_numa_{t,m}bind()" committed by Peter and Ingo may help.
> >But I think that, from the guest's view, it cannot tell whether two vcpus
> >are on the same host node. For example, if
> >vcpu-a is in node-A and vcpu-b is in node-B, the guest lb will be more
> >expensive if it does pull_task from vcpu-a and
> >chooses vcpu-b to push.  My idea is to export such info to the guest;
> >I am still working on it.
> 
> The long-term solution is two-fold:
> 1) Guests that are quite large (in that they cannot fit in a host
> NUMA node) must have a static multi-node NUMA topology implemented by
> Qemu. That is here today, but we do not do it automatically, which
> is probably going to be a VM management responsibility.
> 2) Host scheduler and NUMA code must be enhanced to get better
> placement of Qemu memory and threads.  For single-node vNUMA guests,
> this is easy: put it all in one node.  For multi-node vNUMA guests,
> the host must understand that some Qemu memory belongs with certain
> vCPU threads (which make up one of the guest's vNUMA nodes), and then
> place that memory/threads in a specific host node (and continue for
> other memory/threads for each Qemu vNUMA node).

And for IO, we need multiqueue devices such that each
node can have its own queue in its local memory.

-- 
MST
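
The per-node queue idea can be pictured with a tiny lookup. The structure and function below are hypothetical and do not correspond to any virtio or vhost interface; they only show the selection step, assuming the caller already knows which host node the current vCPU thread runs on and that at least one node is configured.

```c
#define MAX_NODES 8

/* Hypothetical multiqueue device: one queue index per host NUMA node. */
struct mq_device {
	int nr_nodes;
	int queue_of_node[MAX_NODES];
};

/* Pick the queue whose memory is local to the given host node. */
static int mq_select_queue(const struct mq_device *dev, int node)
{
	if (node < 0 || node >= dev->nr_nodes)
		return dev->queue_of_node[0]; /* fall back to the first queue */
	return dev->queue_of_node[node];
}
```

The point of the lookup is that the ring memory and the thread servicing it stay on the same node as the vCPU that produced the request.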

* [PATCH] dca: check against empty dca_domains list before unregister provider
From: Maciej Sosnowski @ 2012-05-23 15:27 UTC (permalink / raw)
  To: dan.j.williams; +Cc: jiang.liu, chenkeping, linux-kernel, netdev, linux-pci

When providers get blocked, unregister_dca_providers() is called, leaving the
dca_providers and dca_domains lists empty. DCA should not try to unregister
any provider if the dca_domains list is found empty.

Reported-by: Jiang Liu <jiang.liu@huawei.com>
Tested-by: Gaohuai Han <hangaohuai@huawei.com>
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
---

 drivers/dca/dca-core.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/dca/dca-core.c b/drivers/dca/dca-core.c
index bc6f5fa..819dfda 100644
--- a/drivers/dca/dca-core.c
+++ b/drivers/dca/dca-core.c
@@ -420,6 +420,11 @@ void unregister_dca_provider(struct dca_
 
 	raw_spin_lock_irqsave(&dca_lock, flags);
 
+	if (list_empty(&dca_domains)) {
+		raw_spin_unlock_irqrestore(&dca_lock, flags);
+		return;
+	}
+
 	list_del(&dca->node);
 
 	pci_rc = dca_pci_rc_from_dev(dev);
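
The guard added by the patch can be reproduced in a standalone sketch. The list type below is a minimal userspace imitation of the kernel's `list_head`, and `unregister_provider` is a hypothetical stand-in for the real function; locking is omitted, whereas the patch performs the check while holding dca_lock.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_node(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void list_del_node(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = node;
}

/*
 * Hypothetical unregister path: bail out if the domain list has already
 * been emptied, otherwise unlink the provider. Returns true if unlinked.
 */
static bool unregister_provider(struct list_node *domains,
				struct list_node *provider)
{
	if (list_is_empty(domains))
		return false;
	list_del_node(provider);
	return true;
}
```

Without the early return, the unlink would run on a provider whose lists were already torn down, which is the situation the commit message describes.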

* Re: WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()
From: Sergio Correia @ 2012-05-23 15:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1337785681.3361.2923.camel@edumazet-glaptop>

On Wed, May 23, 2012 at 12:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2012-05-22 at 11:47 -0300, Sergio Correia wrote:
>> Hi Eric,
> ...
>> Yes, it's an Atheros AR9285 adapter.
>> This morning I did a make mrproper before rebuilding the kernel
>> (should I always do that?), but the warning has just appeared again.
>
> OK, I am taking a look at this problem, thanks.
>

Thanks. Let me know if you need additional info. As of now, my dmesg
basically shows only those warnings.

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Alexander Duyck @ 2012-05-23 16:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Kieran Mansley, Jeff Kirsher, Ben Hutchings, netdev
In-Reply-To: <1337774978.3361.2744.camel@edumazet-glaptop>

On 05/23/2012 05:09 AM, Eric Dumazet wrote:
> On Wed, 2012-05-23 at 11:44 +0200, Eric Dumazet wrote:
>
>> I believe that as soon as ixgbe can use build_skb() and avoid the 1024
>> bytes overhead per skb, it should go away.
>
> Here is the patch for ixgbe, for reference.
I'm confused as to what this is trying to accomplish.

Currently the way the ixgbe driver works is that we allocate the
skb->head using netdev_alloc_skb, which after your recent changes should
be using a head frag.  If the buffer is less than 256 bytes we have
pushed the entire buffer into the head frag, and if it is more we only
pull everything up to the end of the TCP header.  In either case if we
are merging TCP flows we should be able to drop one page or the other
along with the sk_buff giving us a total truesize addition after merge
of ~1K for less than 256 bytes or 2K for a full sized frame.

I'll try to take a look at this today as it is in our interest to have
TCP performing as well as possible on ixgbe.

Thanks,

Alex


 
>
> My machine is now able to receive a netperf TCP_STREAM full speed
> (10Gb), even with GRO/LRO off. (TCPRcvCoalesce counter increasing very
> fast too)
>
> It's not an official patch yet, because:
>
> 1) I need to properly align DMA buffers to reserve NET_SKB_PAD bytes
> (not all workloads are like TCP, and some headroom is needed for
> tunnels)
>
> 2) Must cope with MTU > 1500 cases
>
> 3) Should not be done if NET_IP_ALIGN is nonzero (I don't know if ixgbe
> hardware can DMA to a non-aligned area on receive)
>
> This patch saves 1024 bytes per incoming skb. (skb->head directly mapped
> to the frag containing the frame, instead of a separate memory area)
>
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   82 ++++++++--------
>  1 file changed, 46 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index bf20457..d05693a 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -1511,39 +1511,41 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
>  		return true;
>  	}
>  
> -	/*
> -	 * it is valid to use page_address instead of kmap since we are
> -	 * working with pages allocated out of the lomem pool per
> -	 * alloc_page(GFP_ATOMIC)
> -	 */
> -	va = skb_frag_address(frag);
> +	if (!skb_headlen(skb)) {
> +		/*
> +		 * it is valid to use page_address instead of kmap since we are
> +		 * working with pages allocated out of the lowmem pool per
> +		 * alloc_page(GFP_ATOMIC)
> +		 */
> +		va = skb_frag_address(frag);
>  
> -	/*
> -	 * we need the header to contain the greater of either ETH_HLEN or
> -	 * 60 bytes if the skb->len is less than 60 for skb_pad.
> -	 */
> -	pull_len = skb_frag_size(frag);
> -	if (pull_len > 256)
> -		pull_len = ixgbe_get_headlen(va, pull_len);
> +		/*
> +		 * we need the header to contain the greater of either ETH_HLEN or
> +		 * 60 bytes if the skb->len is less than 60 for skb_pad.
> +		 */
> +		pull_len = skb_frag_size(frag);
> +		if (pull_len > 256)
> +			pull_len = ixgbe_get_headlen(va, pull_len);
>  
> -	/* align pull length to size of long to optimize memcpy performance */
> -	skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
> +		/* align pull length to size of long to optimize memcpy performance */
> +		skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
>  
> -	/* update all of the pointers */
> -	skb_frag_size_sub(frag, pull_len);
> -	frag->page_offset += pull_len;
> -	skb->data_len -= pull_len;
> -	skb->tail += pull_len;
> +		/* update all of the pointers */
> +		skb_frag_size_sub(frag, pull_len);
> +		frag->page_offset += pull_len;
> +		skb->data_len -= pull_len;
> +		skb->tail += pull_len;
>  
> -	/*
> -	 * if we sucked the frag empty then we should free it,
> -	 * if there are other frags here something is screwed up in hardware
> -	 */
> -	if (skb_frag_size(frag) == 0) {
> -		BUG_ON(skb_shinfo(skb)->nr_frags != 1);
> -		skb_shinfo(skb)->nr_frags = 0;
> -		__skb_frag_unref(frag);
> -		skb->truesize -= ixgbe_rx_bufsz(rx_ring);
> +		/*
> +		 * if we sucked the frag empty then we should free it,
> +		 * if there are other frags here something is screwed up in hardware
> +		 */
> +		if (skb_frag_size(frag) == 0) {
> +			BUG_ON(skb_shinfo(skb)->nr_frags != 1);
> +			skb_shinfo(skb)->nr_frags = 0;
> +			__skb_frag_unref(frag);
> +			skb->truesize -= ixgbe_rx_bufsz(rx_ring);
> +		}
>  	}
>  
>  	/* if skb_pad returns an error the skb was freed */
> @@ -1662,6 +1664,8 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>  		struct sk_buff *skb;
>  		struct page *page;
>  		u16 ntc;
> +		unsigned int len;
> +		bool addfrag = true;
>  
>  		/* return some buffers to hardware, one at a time is too slow */
>  		if (cleaned_count >= IXGBE_RX_BUFFER_WRITE) {
> @@ -1687,7 +1691,7 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>  		prefetchw(page);
>  
>  		skb = rx_buffer->skb;
> -
> +		len = le16_to_cpu(rx_desc->wb.upper.length);
>  		if (likely(!skb)) {
>  			void *page_addr = page_address(page) +
>  					  rx_buffer->page_offset;
> @@ -1698,9 +1702,14 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>  			prefetch(page_addr + L1_CACHE_BYTES);
>  #endif
>  
> -			/* allocate a skb to store the frags */
> -			skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
> -							IXGBE_RX_HDR_SIZE);
> +			/* allocate a skb to store the frag */
> +			if (len <= 256) {
> +				skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
> +								256);
> +			} else {
> +				skb = build_skb(page_addr, ixgbe_rx_bufsz(rx_ring));
> +				addfrag = false;
> +			}
>  			if (unlikely(!skb)) {
>  				rx_ring->rx_stats.alloc_rx_buff_failed++;
>  				break;
> @@ -1729,9 +1738,10 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>  						      DMA_FROM_DEVICE);
>  		}
>  
> -		/* pull page into skb */
> -		ixgbe_add_rx_frag(rx_ring, rx_buffer, skb,
> -				  le16_to_cpu(rx_desc->wb.upper.length));
> +		if (addfrag)
> +			ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, len);
> +		else
> +			__skb_put(skb, len);
>  
>  		if (ixgbe_can_reuse_page(rx_buffer)) {
>  			/* hand second half of page back to the ring */
>
>

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Eric Dumazet @ 2012-05-23 16:12 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Kieran Mansley, Jeff Kirsher, Ben Hutchings, netdev
In-Reply-To: <4FBD0A85.4040407@intel.com>

On Wed, 2012-05-23 at 09:04 -0700, Alexander Duyck wrote:
> On 05/23/2012 05:09 AM, Eric Dumazet wrote:
> > On Wed, 2012-05-23 at 11:44 +0200, Eric Dumazet wrote:
> >
> >> I believe that as soon as ixgbe can use build_skb() and avoid the 1024
> >> bytes overhead per skb, it should go away.
> >
> > Here is the patch for ixgbe, for reference.
> I'm confused as to what this is trying to accomplish.
> 
> Currently the way the ixgbe driver works is that we allocate the
> skb->head using netdev_alloc_skb, which after your recent changes should
> be using a head frag.  If the buffer is less than 256 bytes we have
> pushed the entire buffer into the head frag, and if it is more we only
> pull everything up to the end of the TCP header.  In either case if we
> are merging TCP flows we should be able to drop one page or the other
> along with the sk_buff giving us a total truesize addition after merge
> of ~1K for less than 256 bytes or 2K for a full sized frame.
> 
> I'll try to take a look at this today as it is in our interest to have
> TCP performing as well as possible on ixgbe.

With the current driver, an MTU=1500 frame uses:

sk_buff (256 bytes)
skb->head : 1024 bytes  (or more exactly now : 512 + 384)
one fragment of 2048 bytes

At skb free time,  one kfree(sk_buff) and two put_page().

After this patch :

sk_buff (256 bytes)
skb->head : 2048 bytes 

At skb free time, one kfree(sk_buff) and only one put_page().

Note that my patch doesn't change the 256-byte threshold: small frames
won't have a fragment, and their usage is:

sk_buff (256 bytes)
skb->head : 512 + 384 bytes 
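
The accounting above for a full-sized frame can be written out as a short calculation. The sizes are the rounded figures quoted in the message; the struct and function names are made up for this sketch.

```c
/* Sizes quoted in the message above, in bytes. */
#define SK_BUFF_SIZE   256u   /* struct sk_buff */
#define OLD_HEAD_SIZE 1024u   /* separate skb->head area (rounded figure) */
#define RX_BUF_SIZE   2048u   /* half-page receive buffer */

struct rx_cost {
	unsigned int bytes;      /* memory charged to one MTU=1500 frame */
	unsigned int put_pages;  /* put_page() calls at skb free time */
};

/* Current driver: sk_buff + separate head + one 2048-byte fragment. */
static struct rx_cost cost_current_driver(void)
{
	struct rx_cost c = { SK_BUFF_SIZE + OLD_HEAD_SIZE + RX_BUF_SIZE, 2 };
	return c;
}

/* With build_skb(): the receive buffer itself becomes skb->head. */
static struct rx_cost cost_with_build_skb(void)
{
	struct rx_cost c = { SK_BUFF_SIZE + RX_BUF_SIZE, 1 };
	return c;
}
```

The difference between the two byte counts is the 1024 bytes per skb that the patch description says it saves.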

* Re: WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()
From: Eric Dumazet @ 2012-05-23 16:36 UTC (permalink / raw)
  To: Sergio Correia; +Cc: netdev
In-Reply-To: <CAJyhjX02wjsVE2pM26a+7xTrAJwvLZo0UL-W6h8zMKEiC5qTug@mail.gmail.com>

On Wed, 2012-05-23 at 12:56 -0300, Sergio Correia wrote:
> On Wed, May 23, 2012 at 12:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2012-05-22 at 11:47 -0300, Sergio Correia wrote:
> >> Hi Eric,
> > ...
> >> Yes, it's an Atheros AR9285 adapter.
> >> This morning I did a make mrproper before rebuilding the kernel
> >> (should I always do that?), but the warning has just appeared again.
> >
> > OK, I am taking a look at this problem, thanks.
> >
> 
> Thanks. Let me know if you need additional info. As of now, my dmesg
> basically shows only those warnings.

I believe I found the bug and am testing a fix right now.

By the way, we might have the same problem in tcp collapses.

TCP coalescing (introduced in linux-3.5) triggers the problem faster.

Please test the following patch:

 net/ipv4/tcp_input.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index cfa2aa1..b224eb8 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4555,6 +4555,11 @@ static bool tcp_try_coalesce(struct sock *sk,
 
 	if (tcp_hdr(from)->fin)
 		return false;
+
+	/* Its possible this segment overlaps with prior segment in queue */
+	if (TCP_SKB_CB(from)->seq != TCP_SKB_CB(to)->end_seq)
+		return false;
+
 	if (!skb_try_coalesce(to, from, fragstolen, &delta))
 		return false;
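
The check added by the patch can be shown in isolation. This is a simplified model, assuming a segment is just a pair of sequence numbers; `struct seg` and `try_coalesce` are illustrative names, not the kernel's types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified segment: covers sequence numbers [seq, end_seq). */
struct seg {
	uint32_t seq;
	uint32_t end_seq;
};

/*
 * Merge 'from' into 'to' only when 'from' starts exactly where 'to' ends.
 * An overlapping or non-adjacent segment is left alone.
 */
static bool try_coalesce(struct seg *to, const struct seg *from)
{
	if (from->seq != to->end_seq)
		return false;
	to->end_seq = from->end_seq;
	return true;
}
```

If an overlapping segment were merged anyway, the combined length would no longer match the sequence range it claims to cover, which is the kind of inconsistency a sanity check such as the one in tcp_cleanup_rbuf() would be expected to catch.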
 

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Eric Dumazet @ 2012-05-23 16:39 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Kieran Mansley, Jeff Kirsher, Ben Hutchings, netdev
In-Reply-To: <1337789530.3361.2992.camel@edumazet-glaptop>

On Wed, 2012-05-23 at 18:12 +0200, Eric Dumazet wrote:

> With current driver, a MTU=1500 frame uses :
> 
> sk_buff (256 bytes)
> skb->head : 1024 bytes  (or more exactly now : 512 + 384)

By the way, NET_SKB_PAD adds 64 bytes so it's 64 + 512 + 384 = 960

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Alexander Duyck @ 2012-05-23 16:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Kieran Mansley, Jeff Kirsher, Ben Hutchings, netdev
In-Reply-To: <1337789530.3361.2992.camel@edumazet-glaptop>

On 05/23/2012 09:12 AM, Eric Dumazet wrote:
> On Wed, 2012-05-23 at 09:04 -0700, Alexander Duyck wrote:
>> On 05/23/2012 05:09 AM, Eric Dumazet wrote:
>>> On Wed, 2012-05-23 at 11:44 +0200, Eric Dumazet wrote:
>>>
>>>> I believe that as soon as ixgbe can use build_skb() and avoid the 1024
>>>> bytes overhead per skb, it should go away.
>>> Here is the patch for ixgbe, for reference.
>> I'm confused as to what this is trying to accomplish.
>>
>> Currently the way the ixgbe driver works is that we allocate the
>> skb->head using netdev_alloc_skb, which after your recent changes should
>> be using a head frag.  If the buffer is less than 256 bytes we have
>> pushed the entire buffer into the head frag, and if it is more we only
>> pull everything up to the end of the TCP header.  In either case if we
>> are merging TCP flows we should be able to drop one page or the other
>> along with the sk_buff giving us a total truesize addition after merge
>> of ~1K for less than 256 bytes or 2K for a full sized frame.
>>
>> I'll try to take a look at this today as it is in our interest to have
>> TCP performing as well as possible on ixgbe.
> With current driver, a MTU=1500 frame uses :
>
> sk_buff (256 bytes)
> skb->head : 1024 bytes  (or more exactly now : 512 + 384)
> one fragment of 2048 bytes
>
> At skb free time,  one kfree(sk_buff) and two put_page().
>
> After this patch :
>
> sk_buff (256 bytes)
> skb->head : 2048 bytes 
>
> At skb free time, one kfree(sk_buff) and only one put_page().
>
> Note that my patch doesn't change the 256-byte threshold: small frames
> won't have a fragment, and their usage is:
>
> sk_buff (256 bytes)
> skb->head : 512 + 384 bytes 
>
>
Right, but the problem is that in order to make this work we are
dropping the padding for head and hoping to have room for shared info.
This is going to kill performance for things like routing workloads
since the entire head is going to have to be copied over to make space
for NET_SKB_PAD.  Also this assumes no RSC being enabled.  RSC is
normally enabled by default.  If it is turned on we are going to start
receiving full 2K buffers which will cause even more issues since there
wouldn't be any room for shared info in the 2K frame.

The way the driver is currently written probably provides the optimal
setup for truesize given the circumstances.  In order to support
receiving at least 1 full 1500 byte frame per fragment, and supporting
RSC I have to support receiving up to 2K of data.  If we try to make it
all part of one paged receive we would then have to either reduce the
receive buffer size to 1K in hardware and span multiple fragments for a
1.5K frame or allocate a 3K buffer so we would have room to add
NET_SKB_PAD and the shared info on the end.  At which point we are back
to the extra 1K again, only in that case we cannot trim it off later via
skb_try_coalesce.  In the 3K buffer case we would be over a 1/2 page
which means we can only get one buffer per page instead of 2 in which
case we might as well just round it up to 4K and be honest.

The reason I am confused is that I thought the skb_try_coalesce function
was supposed to be what addressed these types of issues.  If these
packets go through that function they should be stripping the sk_buff
and possibly even the skb->head if we used the fragment since the only
thing that is going to end up in the head would be the TCP header which
should have been pulled prior to trying to coalesce.

I will need to investigate this further to understand what is going on. 
I realize that dealing with 3K of memory for buffer storage is not
ideal, but all of the alternatives lean more toward 4K when fully
implemented.  I'll try and see what alternative solutions we might have
available.

Thanks,

Alex
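
The buffer-size trade-off described above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: a 4K page, NET_SKB_PAD of 64 bytes and a 320-byte skb_shared_info (both figures quoted elsewhere in the thread), and rounding up to the 1K hardware granularity. The helper names are invented for the example.

```c
#define PAGE_SIZE_BYTES 4096u
#define HW_GRANULARITY  1024u  /* receive buffer size step, per the thread */
#define NET_SKB_PAD_SZ    64u  /* figure quoted in the thread */
#define SHINFO_SZ        320u  /* skb_shared_info per pahole, per the thread */

static unsigned int round_up_to(unsigned int n, unsigned int step)
{
	return (n + step - 1) / step * step;
}

/* Buffer needed if padding, data and shared info share one allocation. */
static unsigned int single_buffer_size(unsigned int data_bytes)
{
	return round_up_to(NET_SKB_PAD_SZ + data_bytes + SHINFO_SZ,
			   HW_GRANULARITY);
}

static unsigned int buffers_per_page(unsigned int buf_size)
{
	return PAGE_SIZE_BYTES / buf_size;
}
```

With 2K of receive data the single allocation comes to 3K, so only one buffer fits in a page, compared with two when the buffer is a plain 2K half page.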

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Alexander Duyck @ 2012-05-23 17:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Kieran Mansley, Jeff Kirsher, Ben Hutchings, netdev
In-Reply-To: <1337791189.3361.3029.camel@edumazet-glaptop>

On 05/23/2012 09:39 AM, Eric Dumazet wrote:
> On Wed, 2012-05-23 at 18:12 +0200, Eric Dumazet wrote:
>
>> With current driver, a MTU=1500 frame uses :
>>
>> sk_buff (256 bytes)
>> skb->head : 1024 bytes  (or more exactly now : 512 + 384)
> By the way, NET_SKB_PAD adds 64 bytes so its 64 + 512 + 384 = 960
Actually pahole seems to be indicating to me the size of skb_shared_info
is 320, unless something has changed in the last few days.

When I get a chance I will try to remember to reduce the ixgbe header
size to 256 which should also help.  The only reason it is set to 512
was to deal with the fact that the old alloc_skb code wasn't aligning
the shared info with the end of whatever size was allocated and so the
512 was an approximation to make better use of the 1K slab allocation
back when we still were using hardware packet split.  That should help
to improve the page utilization for the headers since that would
increase the uses of a page from 4 to 6 for the skb head frag, and it
would drop truesize by another 256 bytes.

Thanks,

Alex
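
The "4 to 6" figure above follows from simple arithmetic, sketched here. The constants are the values mentioned in the thread (a 4K page, 64 bytes of NET_SKB_PAD, 320 bytes of skb_shared_info); the function names are hypothetical, and any extra alignment the allocator may apply is ignored.

```c
#define PAGE_SIZE_BYTES 4096u
#define NET_SKB_PAD_SZ    64u  /* figure quoted in the thread */
#define SHINFO_SZ        320u  /* skb_shared_info per pahole, per the thread */

/* Bytes consumed by one skb head: padding + header area + shared info. */
static unsigned int head_frag_size(unsigned int hdr_size)
{
	return NET_SKB_PAD_SZ + hdr_size + SHINFO_SZ;
}

/* How many such heads can be carved out of a single page. */
static unsigned int heads_per_page(unsigned int hdr_size)
{
	return PAGE_SIZE_BYTES / head_frag_size(hdr_size);
}
```

A 512-byte header area gives 896 bytes per head and 4 heads per page; 256 bytes gives 640 bytes per head and 6 per page, a reduction of 256 bytes in truesize.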

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Eric Dumazet @ 2012-05-23 17:24 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Kieran Mansley, Jeff Kirsher, Ben Hutchings, netdev
In-Reply-To: <4FBD1740.1020304@intel.com>

On Wed, 2012-05-23 at 09:58 -0700, Alexander Duyck wrote:

> Right, but the problem is that in order to make this work we are
> dropping the padding for head and hoping to have room for shared info. 
> This is going to kill performance for things like routing workloads
> since the entire head is going to have to be copied over to make space
> for NET_SKB_PAD. 

Hey, I said that is one of the points I have to add to my patch. Please read
it again ;)

By the way, we can also add code doing the skb->head upgrade to fragment
again in case we need to add a tunnel header, instead of full copy.

So maybe the NET_SKB_PAD is not really needed anymore.

Anyway, a router host could use a different allocation strategy (going
back to current one)

>  Also this assumes no RSC being enabled.  RSC is
> normally enabled by default.  If it is turned on we are going to start
> receiving full 2K buffers which will cause even more issues since there
> wouldn't be any room for shared info in the 2K frame.
> 

Hey, this is one of the points I have to address, also mentioned.

It's almost trivial to check len (if we have room for shared info, take
it, if not allocate the head as before)
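
A minimal sketch of that length check, assuming a 2K receive buffer and the
320-byte skb_shared_info figure from earlier in the thread. The helper name
and constants are hypothetical and not taken from any posted patch.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RX_BUF_BYTES 2048u /* 2K hardware receive buffer */
#define SKETCH_SHINFO_BYTES  320u /* assumed skb_shared_info size */

/*
 * Hypothetical helper: true if a frame of len bytes leaves enough tail
 * room in the receive buffer to place skb_shared_info behind it.
 * When it returns false the driver would allocate the head as before.
 */
static bool rx_buf_has_shinfo_room(unsigned int len)
{
	/* The hardware never writes more than one buffer's worth. */
	assert(len <= SKETCH_RX_BUF_BYTES);
	return len + SKETCH_SHINFO_BYTES <= SKETCH_RX_BUF_BYTES;
}
```

Under these assumptions a 1514-byte Ethernet frame passes the check, while a
full 2K RSC buffer does not and falls back to the separate head.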


> The way the driver is currently written probably provides the optimal
> setup for truesize given the circumstances.

It is unfortunate the hardware has 1KB granularity.


>   In order to support
> receiving at least 1 full 1500 byte frame per fragment, and supporting
> RSC I have to support receiving up to 2K of data.  If we try to make it
> all part of one paged receive we would then have to either reduce the
> receive buffer size to 1K in hardware and span multiple fragments for a
> 1.5K frame or allocate a 3K buffer so we would have room to add
> NET_SKB_PAD and the shared info on the end.  At which point we are back
> to the extra 1K again, only in that case we cannot trim it off later via
> skb_try_coalesce.  In the 3K buffer case we would be over a 1/2 page
> which means we can only get one buffer per page instead of 2 in which
> case we might as well just round it up to 4K and be honest.
> 
> The reason I am confused is that I thought the skb_try_coalesce function
> was supposed to be what addressed these types of issues.  If these
> packets go through that function they should be stripping the sk_buff
> and possibly even the skb->head if we used the fragment since the only
> thing that is going to end up in the head would be the TCP header which
> should have been pulled prior to trying to coalesce.
> 
> I will need to investigate this further to understand what is going on. 
> I realize that dealing with 3K of memory for buffer storage is not
> ideal, but all of the alternatives lean more toward 4K when fully
> implemented.  I'll try and see what alternative solutions we might have
> available.

Problem is skb_try_coalesce() is not used when we store packets in
socket backlog, and only used for TCP at this moment.

^ permalink raw reply

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: David Miller @ 2012-05-23 17:34 UTC (permalink / raw)
  To: eric.dumazet; +Cc: kmansley, bhutchings, netdev
In-Reply-To: <1337766246.3361.2447.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 23 May 2012 11:44:06 +0200

> Locking the socket for the whole operation (including copyout to user)
> is not very good. It was good enough years ago with small receive
> window.
> 
> With a potentially huge backlog, it means user process has to process
> it, regardless of its latency constraints. CPU caches are also
> completely destroyed because of the huge amount of data included in
> thousands of skbs.

But it is the only way we can have TCP processing scheduled and
accounted to user processes.  That does have value when you have lots
of flows active.

The scheduler's ability to give the process cpu time influences
TCP's behavior, and under load if the process can't get enough
cpu time then TCP will back off.  We want that.

^ permalink raw reply

* Re: [PATCH 1/1] net: add dev_loopback_xmit() to avoid duplicate code
From: David Miller @ 2012-05-23 17:41 UTC (permalink / raw)
  To: michel
  Cc: netdev, linux-kernel, kuznet, jmorris, yoshfuji, kaber, edumazet,
	jpirko, mirq-linux, bhutchings
In-Reply-To: <1337782819.2779.20.camel@Thor>

I'm getting really tired of saying this.

As I announced several days ago, it is absolutely not appropriate
to submit patches other than bug fixes at this time because we are
in the merge window and the net-next tree is frozen.

And even once I do announce here that the net-next tree is open
once more, and patches like this one are appropriate, you must
indicate in the subject line which of the 'net' or 'net-next'
tree you are targeting your patch at.

Please pay attention to what's going on, and what state the networking
development trees are in, before submitting changes.

^ permalink raw reply

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Eric Dumazet @ 2012-05-23 17:46 UTC (permalink / raw)
  To: David Miller; +Cc: kmansley, bhutchings, netdev
In-Reply-To: <20120523.133401.915684077769386834.davem@davemloft.net>

On Wed, 2012-05-23 at 13:34 -0400, David Miller wrote:

> But it is the only way we can have TCP processing scheduled and
> accounted to user processes.  That does have value when you have lots
> of flows active.
> 
> The scheduler's ability to give the process cpu time influences
> TCP's behavior, and under load if the process can't get enough
> cpu time then TCP will back off.  We want that.

But TCP already backs off if user process is not blocked on socket
input.

Modern applications use select()/poll()/epoll() on many sockets in parallel.

Only old ones still block on recv().
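
For readers less familiar with the pattern, here is a minimal userspace
sketch of the poll()-then-recv() style being described. The function name,
the 16-socket limit, and the single read per ready socket are choices made
for this illustration only.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for any of the nfds sockets to become readable,
 * then do one non-blocking recv() on each ready socket.
 * Returns the total number of bytes read, or -1 if poll() fails.
 */
static ssize_t drain_ready(const int *fds, int nfds, char *buf,
			   size_t buflen, int timeout_ms)
{
	struct pollfd pfds[16];
	ssize_t total = 0;
	int i;

	assert(nfds > 0 && nfds <= 16);
	for (i = 0; i < nfds; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
	}

	if (poll(pfds, nfds, timeout_ms) < 0)
		return -1;

	for (i = 0; i < nfds; i++) {
		if (pfds[i].revents & POLLIN) {
			/* MSG_DONTWAIT: never sleep inside recv() itself. */
			ssize_t n = recv(pfds[i].fd, buf, buflen, MSG_DONTWAIT);

			if (n > 0)
				total += n;
		}
	}
	return total;
}
```

The process sleeps in poll() rather than in recv(), so it is not blocked on
any single socket's input when segments for that socket arrive.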

^ permalink raw reply

* Re: [PATCH net] ipx: restore token ring define to include/linux/ipx.h
From: David Miller @ 2012-05-23 17:49 UTC (permalink / raw)
  To: paul.gortmaker; +Cc: netdev, shemminger
In-Reply-To: <1337784225-30641-1-git-send-email-paul.gortmaker@windriver.com>

From: Paul Gortmaker <paul.gortmaker@windriver.com>
Date: Wed, 23 May 2012 10:43:45 -0400

> Commit 211ed865108e24697b44bee5daac502ee6bdd4a4
> 
>     "net: delete all instances of special processing for token ring"
> 
> removed the define for IPX_FRAME_TR_8022.
> 
> While it is unlikely, we can't be 100% sure that there aren't
> random userspace consumers of this value, so restore it.
> 
> The only instance I could find was in ncpfs-2.2.6, and it was
> safe as-is, since it used #ifdef IPX_FRAME_TR_8022 around the
> two use cases it had, but there may be other userspace packages
> without similar ifdefs.
> 
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Applied, thanks Paul.
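
To illustrate the kind of userspace guard the changelog refers to, here is a
small sketch. The frame type values are copied from the patch hunk and
defined locally so that it builds without the kernel header; the helper and
its strings are hypothetical and are not taken from ncpfs.

```c
#include <assert.h>
#include <string.h>

/* Values as shown in the include/linux/ipx.h hunk of the patch. */
#define IPX_FRAME_8022     2
#define IPX_FRAME_ETHERII  3
#define IPX_FRAME_8023     4
#define IPX_FRAME_TR_8022  5 /* obsolete; delete to mimic the broken header */

/* Hypothetical helper: map a frame type to a printable name. */
static const char *ipx_frame_name(int type)
{
	switch (type) {
	case IPX_FRAME_8022:
		return "802.2";
	case IPX_FRAME_ETHERII:
		return "EtherII";
	case IPX_FRAME_8023:
		return "802.3";
#ifdef IPX_FRAME_TR_8022
	/* Guarded, so the code still builds if the define goes away. */
	case IPX_FRAME_TR_8022:
		return "802.2TR";
#endif
	default:
		return "unknown";
	}
}
```

Code written like this survives the removal of the define. Code that uses
IPX_FRAME_TR_8022 without the #ifdef would fail to compile, which is the
case the patch protects against.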

^ permalink raw reply

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: David Miller @ 2012-05-23 17:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: kmansley, bhutchings, netdev
In-Reply-To: <1337795210.3361.3118.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 23 May 2012 19:46:50 +0200

> But TCP already backs off if user process is not blocked on socket
> input.
> 
> Modern applications use select()/poll()/epoll() on many sockets in parallel.
> 
> Only old ones still block on recv().

These arguments seem circular.

Those modern applications still trigger enough TCP work during their
recv() calls that it's significant enough for scheduling purposes, and
to me being able to account that TCP work as process time is still
extremely beneficial.

^ permalink raw reply

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Alexander Duyck @ 2012-05-23 17:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Kieran Mansley, Jeff Kirsher, Ben Hutchings, netdev
In-Reply-To: <1337793866.3361.3090.camel@edumazet-glaptop>

On 05/23/2012 10:24 AM, Eric Dumazet wrote:
> On Wed, 2012-05-23 at 09:58 -0700, Alexander Duyck wrote:
>
>> Right, but the problem is that in order to make this work we are
>> dropping the padding for head and hoping to have room for shared info. 
>> This is going to kill performance for things like routing workloads
>> since the entire head is going to have to be copied over to make space
>> for NET_SKB_PAD. 
> Hey, I said that is one of the points I have to add to my patch. Please read
> it again ;)
I'm aware of that, but still it seems like we are getting ahead of
ourselves.  This fix is so specific to just the socket backlog case that
I think we are missing the fact that it is going to have huge
performance repercussions elsewhere.

> By the way, we can also add code doing the skb->head upgrade to fragment
> again in case we need to add a tunnel header, instead of full copy.
>
> So maybe the NET_SKB_PAD is not really needed anymore.
>
> Anyway, a router host could use a different allocation strategy (going
> back to current one)
The thing I don't like is that we are adding extra memcpy calls to all
of these paths.  We cannot change the head without having to copy the
shared info and there is going to be a cost for that.  I would prefer to
only generate the shared info once and not have to relocate it every
time I want to run a router or tunnel.

>>  Also this assumes no RSC being enabled.  RSC is
>> normally enabled by default.  If it is turned on we are going to start
>> receiving full 2K buffers which will cause even more issues since there
>> wouldn't be any room for shared info in the 2K frame.
>>
> Hey, this is one of the points I have to address, also mentioned.
>
> It's almost trivial to check len (if we have room for shared info, take
> it, if not allocate the head as before)
I know you mentioned this as well.  The thing is I would prefer not to
add code where we are branching in so many different directions on what
the header actually looks like.  It tends to open a lot of opportunities
for bugs when someone makes a change and doesn't take one of the
possible head and fragment combinations into account.

>> The way the driver is currently written probably provides the optimal
>> setup for truesize given the circumstances.
> It is unfortunate the hardware has 1KB granularity.
Agreed, I would have preferred 512B granularity, but we are locked in at
1K for now..  :-/

> Problem is skb_try_coalesce() is not used when we store packets in
> socket backlog, and only used for TCP at this moment.
One piece of low hanging fruit that is available to help with some of
this is to drop the Rx header size for ixgbe to 256.  That should at
least cut the total truesize for the buffer and sk_buff to 960 or so
which is at least a step in the right direction.

Thanks,

Alex

^ permalink raw reply

* Re: [net] stmmac: fix driver Kconfig when built as module
From: David Miller @ 2012-05-23 18:01 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev, bhutchings, lliubbo, rayagond
In-Reply-To: <1337753142-13762-1-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Wed, 23 May 2012 08:05:42 +0200

> This patch fixes the driver when built as a dynamic module.
> In fact the platform part cannot be built and the probe fails
> (thanks to Bob Liu, who reported this bug).
> The patch also makes the selection of Platform and PCI parts
> mutually exclusive.
> 
> Reported-by: Bob Liu <lliubbo@gmail.com>
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>

We have drivers which support both OF (which is implemented as
platform bus) and PCI at the same time.  For example,
drivers/net/ethernet/sun/niu.c

I do not see why stmmac cannot support both at the same time as well.

I absolutely do not want such segregation unless it is absolutely
necessary.  Because it means that no matter what is chosen, a piece
of code is disabled and therefore not getting build and/or runtime
validation.

^ permalink raw reply

* Re: WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()
From: Sergio Correia @ 2012-05-23 18:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1337791018.3361.3024.camel@edumazet-glaptop>

On Wed, May 23, 2012 at 1:36 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2012-05-23 at 12:56 -0300, Sergio Correia wrote:
>> On Wed, May 23, 2012 at 12:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Tue, 2012-05-22 at 11:47 -0300, Sergio Correia wrote:
>> >> Hi Eric,
>> > ...
>> >> Yes, it's an Atheros AR9285 adapter.
>> >> This morning I did a make mrproper before rebuilding the kernel
>> >> (should I always do that?), but the warning has just appeared again.
>> >
>> > OK, I am taking a look at this problem, thanks.
>> >
>>
>> Thanks. Let me know if you need additional info. As of now, my dmesg
>> basically shows only those warnings.
>
> I believe I found the bug and am testing a fix right now.
>
> By the way, we might have the same problem in tcp collapses.
>
> TCP coalescing (introduced in linux-3.5) triggers the problem faster.
>
> Please test following patch :
>

I reverted back to 471368557a734c6c486ee757952c902b36e7fd01 and it
took almost one hour to trigger the warning. Now I have applied your
patch and will report back how it went after a few hours of testing.

^ permalink raw reply

* Re: WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()
From: Eric Dumazet @ 2012-05-23 18:37 UTC (permalink / raw)
  To: Sergio Correia; +Cc: netdev
In-Reply-To: <CAJyhjX1E_yzT1Vte+3iqc-PnfLiEEdFEyM9GcOCs-6TrUm86XQ@mail.gmail.com>

On Wed, 2012-05-23 at 15:30 -0300, Sergio Correia wrote:
> On Wed, May 23, 2012 at 1:36 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Wed, 2012-05-23 at 12:56 -0300, Sergio Correia wrote:
> >> On Wed, May 23, 2012 at 12:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> > On Tue, 2012-05-22 at 11:47 -0300, Sergio Correia wrote:
> >> >> Hi Eric,
> >> > ...
> >> >> Yes, it's an Atheros AR9285 adapter.
> >> >> This morning I did a make mrproper before rebuilding the kernel
> >> >> (should I always do that?), but the warning has just appeared again.
> >> >
> >> > OK, I am taking a look at this problem, thanks.
> >> >
> >>
> >> Thanks. Let me know if you need additional info. As of now, my dmesg
> >> basically shows only those warnings.
> >
> > I believe I found the bug and am testing a fix right now.
> >
> > By the way, we might have the same problem in tcp collapses.
> >
> > TCP coalescing (introduced in linux-3.5) triggers the problem faster.
> >
> > Please test following patch :
> >
> 
> I reverted back to 471368557a734c6c486ee757952c902b36e7fd01 and it
> took almost one hour to trigger the warning. Now I have applied your
> patch and will report back how it went after a few hours of testing.

Thanks

I triggered it very fast in my lab using the following setup

Sender machine :

# tc qdisc add dev eth0 root netem delay 1ms 3ms 20 reorder 10 20
for i in `seq 1 8`
do
  netperf -t OMNI  -C -c -H 172.30.42.8 -l 60 &
done
wait
# tc -s -d qd
qdisc netem 8002: dev eth0 root refcnt 2 limit 1000 delay 1.0ms  3.0ms 20% reorder 10% 20% gap 1
 Sent 66030032010 bytes 43992779 pkt (dropped 13846, overlimits 0 requeues 2712184)
 backlog 0b 0p requeues 2712184

The receiver machine runs a netserver and triggers the bug in a few seconds.

(receiver being a slow machine, with r8169 NIC)

^ permalink raw reply

* Re: [PATCH v6 2/2] decrement static keys on real destroy time
From: Andrew Morton @ 2012-05-23 20:33 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, cgroups, devel, kamezawa.hiroyu, netdev, Tejun Heo,
	Li Zefan, Johannes Weiner, Michal Hocko, David Miller
In-Reply-To: <4FBCAAF4.4030803@parallels.com>

On Wed, 23 May 2012 13:16:36 +0400
Glauber Costa <glommer@parallels.com> wrote:

> On 05/23/2012 02:46 AM, Andrew Morton wrote:
> > Here, we're open-coding kinda-test_bit().  Why do that?  These flags are
> > modified with set_bit() and friends, so we should read them with the
> > matching test_bit()?
> 
> My reasoning was to be as cheap as possible, as you noted yourself two
> paragraphs below.

These aren't on any fast path, are they?

Plus: you failed in that objective!  The C compiler's internal
scalar->bool conversion makes these functions no more efficient than
test_bit().
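
The patch under review is not quoted here, so the following is only a
userspace sketch of the point about scalar->bool conversion. The flag number
and both function names are hypothetical, and test_bit_sketch() is a stand-in
for the kernel's test_bit(), not its implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_FLAG_BIT 3 /* hypothetical flag number */

/* Userspace stand-in for test_bit(): shift the wanted bit down, mask it. */
static bool test_bit_sketch(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / SKETCH_BITS_PER_LONG] >>
		(nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/*
 * Open-coded variant: the masked value (here 0 or 8) is not 0 or 1, so
 * the compiler still has to normalize it to a bool on return.
 */
static bool flag_set_open_coded(const unsigned long *flags)
{
	return *flags & (1UL << SKETCH_FLAG_BIT);
}
```

Both helpers return the same result for every input, and the open-coded one
still pays for the conversion to bool, so it gains nothing over the matching
accessor.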

> > So here are suggested changes from *some* of the above discussion.
> > Please consider, incorporate, retest and send us a v7?
> 
> How do you want me to do it? Should I add your patch on top of mine,
> and then another one that tweaks whatever else is left, or should I just
> merge those changes into the patches I have?

A brand new patch, I guess.  I can sort out the what-did-he-change view
at this end.

^ permalink raw reply
