From: Eric Dumazet
Subject: Re: splice() performance for TCP socket forwarding
Date: Thu, 13 Dec 2018 05:37:11 -0800
To: Willy Tarreau, Marek Majkowski
Cc: netdev@vger.kernel.org
In-Reply-To: <20181213125553.GA16149@1wt.eu>

On 12/13/2018 04:55 AM, Willy Tarreau wrote:
>
> It's quite strange, it doesn't match at all what I'm used to. In haproxy
> we're using splicing as well between sockets, and for medium to large
> objects we always get much better performance with splicing than without.
> 3 years ago during a test, we reached 60 Gbps on a 4-core machine using
> 2 40G NICs, which is not an exceptional sizing. And between processes on
> the loopback, numbers around 100G are totally possible. By the way this
> is one test you should start with, to verify if the issue is more on the
> splice side or on the NIC's side. It might be that your network driver is
> totally inefficient when used with GRO/GSO. In my case, multi-10G using
> ixgbe and 40G using mlx5 have always shown excellent results.

Maybe the mlx5 driver is in LRO mode, packing TCP payload into 4K pages?

bnx2x GRO/LRO has this mode, meaning that around 8 pages are used for a
GRO packet of ~32 KB, while mlx4 for instance would use one page frag
for every ~1428 bytes of payload.
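
For readers following the thread, below is a minimal sketch of the
splice()-through-a-pipe forwarding pattern this discussion is about. It
is not code from haproxy or from the original poster; the descriptors
client_fd and server_fd (assumed to be already-connected TCP sockets),
the 64 KB chunk size and the forward() helper are all placeholders for
illustration, and error handling is reduced to the bare minimum.

  /* Sketch: forward bytes from one TCP socket to another with splice(2).
   * splice() cannot move data socket-to-socket directly, so a pipe is
   * used as the intermediate; it holds references to the skb pages
   * instead of copying payload through user space. */
  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <unistd.h>

  static int forward(int client_fd, int server_fd)
  {
          int pipefd[2];
          ssize_t in, out;

          if (pipe(pipefd) < 0)
                  return -1;

          for (;;) {
                  /* Pull up to 64 KB of received payload into the pipe. */
                  in = splice(client_fd, NULL, pipefd[1], NULL, 65536,
                              SPLICE_F_MOVE | SPLICE_F_MORE);
                  if (in <= 0)
                          break;

                  /* Drain the pipe into the outgoing socket; a short
                   * splice() must be retried until the pipe is empty. */
                  while (in > 0) {
                          out = splice(pipefd[0], NULL, server_fd, NULL, in,
                                       SPLICE_F_MOVE | SPLICE_F_MORE);
                          if (out <= 0)
                                  goto out;
                          in -= out;
                  }
          }
  out:
          close(pipefd[0]);
          close(pipefd[1]);
          return 0;
  }

The page-packing question above matters for this pattern because the
pipe holds page references: a driver that packs payload into full 4K
pages (LRO, or bnx2x-style GRO) moves ~4 KB per reference, while a
one-frag-per-MSS driver pins a whole page for ~1428 bytes, so the same
volume of data costs far more pipe slots and page traversals. Whether
LRO is active on a given NIC can be checked with "ethtool -k <iface>"
(look for large-receive-offload).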