Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] net: phy: fixed_phy: print gpio error only if gpio node is present
From: David Miller @ 2019-07-30 16:55 UTC (permalink / raw)
  To: h.feurstein; +Cc: netdev, linux-kernel, andrew, f.fainelli, hkallweit1
In-Reply-To: <20190730094623.31640-1-h.feurstein@gmail.com>

From: Hubert Feurstein <h.feurstein@gmail.com>
Date: Tue, 30 Jul 2019 11:46:23 +0200

> It is perfectly ok to not have an gpio attached to the fixed-link node. So
> the driver should not throw an error message when the gpio is missing.
> 
> Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-next v4 0/4] enetc: Add mdio bus driver for the PCIe MDIO endpoint
From: David Miller @ 2019-07-30 16:53 UTC (permalink / raw)
  To: claudiu.manoil
  Cc: andrew, robh+dt, leoyang.li, alexandru.marginean, netdev,
	devicetree, linux-arm-kernel, linux-kernel
In-Reply-To: <20190730.094436.855806617449032791.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 30 Jul 2019 09:44:36 -0700 (PDT)

> From: Claudiu Manoil <claudiu.manoil@nxp.com>
> Date: Tue, 30 Jul 2019 12:45:15 +0300
> 
>> First patch fixes a sparse issue and cleans up accessors to avoid
>> casting to __iomem.
>> Second patch just registers the PCIe endpoint device containing
>> the MDIO registers as a standalone MDIO bus driver, to allow
>> an alternative way to control the MDIO bus.  The same code used
>> by the ENETC ports (eth controllers) to manage MDIO via local
>> registers applies and is reused.
>> 
>> Bindings are provided for the new MDIO node, similarly to ENETC
>> port nodes bindings.
>> 
>> Last patch enables the ENETC port 1 and its RGMII PHY on the
>> LS1028A QDS board, where the MDIO muxing configuration relies
>> on the MDIO support provided in the first patch.
>  ...
> 
> Series applied, thank you.

Actually this doesn't compile, I had to revert:

In file included from ./include/linux/phy.h:20,
                 from ./include/linux/of_mdio.h:11,
                 from drivers/net/ethernet/freescale/enetc/enetc_mdio.c:5:
drivers/net/ethernet/freescale/enetc/enetc_mdio.c:284:26: error: ‘enetc_mdio_id_table’ undeclared here (not in a function); did you mean ‘enetc_pci_mdio_id_table’?
 MODULE_DEVICE_TABLE(pci, enetc_mdio_id_table);
                          ^~~~~~~~~~~~~~~~~~~
./include/linux/module.h:230:15: note: in definition of macro ‘MODULE_DEVICE_TABLE’
 extern typeof(name) __mod_##type##__##name##_device_table  \
               ^~~~
./include/linux/module.h:230:21: error: ‘__mod_pci__enetc_mdio_id_table_device_table’ aliased to undefined symbol ‘enetc_mdio_id_table’
 extern typeof(name) __mod_##type##__##name##_device_table  \
                     ^~~~~~
drivers/net/ethernet/freescale/enetc/enetc_mdio.c:284:1: note: in expansion of macro ‘MODULE_DEVICE_TABLE’
 MODULE_DEVICE_TABLE(pci, enetc_mdio_id_table);
 ^~~~~~~~~~~~~~~~~~~

^ permalink raw reply

* Re: [PATCH net-next v4 0/4] enetc: Add mdio bus driver for the PCIe MDIO endpoint
From: David Miller @ 2019-07-30 16:44 UTC (permalink / raw)
  To: claudiu.manoil
  Cc: andrew, robh+dt, leoyang.li, alexandru.marginean, netdev,
	devicetree, linux-arm-kernel, linux-kernel
In-Reply-To: <1564479919-18835-1-git-send-email-claudiu.manoil@nxp.com>

From: Claudiu Manoil <claudiu.manoil@nxp.com>
Date: Tue, 30 Jul 2019 12:45:15 +0300

> First patch fixes a sparse issue and cleans up accessors to avoid
> casting to __iomem.
> Second patch just registers the PCIe endpoint device containing
> the MDIO registers as a standalone MDIO bus driver, to allow
> an alternative way to control the MDIO bus.  The same code used
> by the ENETC ports (eth controllers) to manage MDIO via local
> registers applies and is reused.
> 
> Bindings are provided for the new MDIO node, similarly to ENETC
> port nodes bindings.
> 
> Last patch enables the ENETC port 1 and its RGMII PHY on the
> LS1028A QDS board, where the MDIO muxing configuration relies
> on the MDIO support provided in the first patch.
 ...

Series applied, thank you.

^ permalink raw reply

* Re: [net-next 08/13] net/mlx5e: Protect tc flows hashtable with rcu
From: Willem de Bruijn @ 2019-07-30 16:37 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev@vger.kernel.org, Vlad Buslov, Jianbo Liu,
	Roi Dayan
In-Reply-To: <CA+FuTSfnikCV_J2cUEeafCaui8KxrK4njRR9rqgpo+5JhBxR9g@mail.gmail.com>

On Tue, Jul 30, 2019 at 12:16 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Mon, Jul 29, 2019 at 7:50 PM Saeed Mahameed <saeedm@mellanox.com> wrote:
> >
> > From: Vlad Buslov <vladbu@mellanox.com>
> >
> > In order to remove dependency on rtnl lock, access to tc flows hashtable
> > must be explicitly protected from concurrent flows removal.
> >
> > Extend tc flow structure with rcu to allow concurrent parallel access. Use
> > rcu read lock to safely lookup flow in tc flows hash table, and take
> > reference to it. Use rcu free for flow deletion to accommodate concurrent
> > stats requests.
> >
> > Add new DELETED flow flag. Imlement new flow_flag_test_and_set() helper
> > that is used to set a flag and return its previous value. Use it to
> > atomically set the flag in mlx5e_delete_flower() to guarantee that flow can
> > only be deleted once, even when same flow is deleted concurrently by
> > multiple tasks.
> >
> > Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> > Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
> > Reviewed-by: Roi Dayan <roid@mellanox.com>
> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > ---
>
> > @@ -3492,16 +3507,32 @@ int mlx5e_delete_flower(struct net_device *dev, struct mlx5e_priv *priv,
> >  {
> >         struct rhashtable *tc_ht = get_tc_ht(priv, flags);
> >         struct mlx5e_tc_flow *flow;
> > +       int err;
> >
> > +       rcu_read_lock();
> >         flow = rhashtable_lookup_fast(tc_ht, &f->cookie, tc_ht_params);
> > -       if (!flow || !same_flow_direction(flow, flags))
> > -               return -EINVAL;
> > +       if (!flow || !same_flow_direction(flow, flags)) {
> > +               err = -EINVAL;
> > +               goto errout;
> > +       }
> >
> > +       /* Only delete the flow if it doesn't have MLX5E_TC_FLOW_DELETED flag
> > +        * set.
> > +        */
> > +       if (flow_flag_test_and_set(flow, DELETED)) {
> > +               err = -EINVAL;
> > +               goto errout;
> > +       }
> >         rhashtable_remove_fast(tc_ht, &flow->node, tc_ht_params);
> > +       rcu_read_unlock();
> >
> >         mlx5e_flow_put(priv, flow);
>
> Dereferencing flow outside rcu readside critical section? Does a build
> with lockdep not complain?

Eh no, it won't. The surprising part to me was to use a readside
critical section when performing a write action on an RCU ptr. The
DELETED flag ensures that multiple writers will not compete to call
rhashtable_remove_fast. rcu_read_lock is a common pattern to do
rhashtable lookup + delete.

>
> >
> >         return 0;
> > +
> > +errout:
> > +       rcu_read_unlock();
> > +       return err;
> >  }
> >
> >  int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
> > @@ -3517,8 +3548,10 @@ int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
> >         u64 bytes = 0;
> >         int err = 0;
> >
> > -       flow = mlx5e_flow_get(rhashtable_lookup_fast(tc_ht, &f->cookie,
> > -                                                    tc_ht_params));
> > +       rcu_read_lock();
> > +       flow = mlx5e_flow_get(rhashtable_lookup(tc_ht, &f->cookie,
> > +                                               tc_ht_params));
> > +       rcu_read_unlock();
> >         if (IS_ERR(flow))
> >                 return PTR_ERR(flow);
>
> Same, in code below this check?

Never mind, sorry. I missed that this took a reference on the ptr
returned from rhashtable_lookup.

^ permalink raw reply

* Re: [PATCH 4/4] net: dsa: mv88e6xxx: add PTP support for MV88E6250 family
From: Hubert Feurstein @ 2019-07-30 16:20 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, linux-kernel, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, David S. Miller, Rasmus Villemoes
In-Reply-To: <20190730160032.GA1251@localhost>

Hi Richard,

thank you for your comments.

Am Di., 30. Juli 2019 um 18:00 Uhr schrieb Richard Cochran
<richardcochran@gmail.com>:
[...]
> > -/* Raw timestamps are in units of 8-ns clock periods. */
> > -#define CC_SHIFT     28
> > -#define CC_MULT              (8 << CC_SHIFT)
> > -#define CC_MULT_NUM  (1 << 9)
> > -#define CC_MULT_DEM  15625ULL
> > +/* The adjfine API clamps ppb between [-32,768,000, 32,768,000], and
>
> That is not true.
>
> > + * therefore scaled_ppm between [-2,147,483,648, 2,147,483,647].
> > + * Set the maximum supported ppb to a round value smaller than the maximum.
> > + *
> > + * Percentually speaking, this is a +/- 0.032x adjustment of the
> > + * free-running counter (0.968x to 1.032x).
> > + */
> > +#define MV88E6XXX_MAX_ADJ_PPB        32000000
>
> I had set an arbitrary limit of 1000 ppm.  I can't really see any
> point in raising the limit.
>
> > +/* Family MV88E6250:
> > + * Raw timestamps are in units of 10-ns clock periods.
> > + *
> > + * clkadj = scaled_ppm * 10*2^28 / (10^6 * 2^16)
> > + * simplifies to
> > + * clkadj = scaled_ppm * 2^7 / 5^5
> > + */
> > +#define MV88E6250_CC_SHIFT   28
> > +#define MV88E6250_CC_MULT    (10 << MV88E6250_CC_SHIFT)
> > +#define MV88E6250_CC_MULT_NUM        (1 << 7)
> > +#define MV88E6250_CC_MULT_DEM        3125ULL
> > +
> > +/* Other families:
> > + * Raw timestamps are in units of 8-ns clock periods.
> > + *
> > + * clkadj = scaled_ppm * 8*2^28 / (10^6 * 2^16)
> > + * simplifies to
> > + * clkadj = scaled_ppm * 2^9 / 5^6
> > + */
> > +#define MV88E6XXX_CC_SHIFT   28
> > +#define MV88E6XXX_CC_MULT    (8 << MV88E6XXX_CC_SHIFT)
> > +#define MV88E6XXX_CC_MULT_NUM        (1 << 9)
> > +#define MV88E6XXX_CC_MULT_DEM        15625ULL
> >
> >  #define TAI_EVENT_WORK_INTERVAL msecs_to_jiffies(100)
> >
> > @@ -179,24 +206,14 @@ static void mv88e6352_tai_event_work(struct work_struct *ugly)
> >  static int mv88e6xxx_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
> >  {
> >       struct mv88e6xxx_chip *chip = ptp_to_chip(ptp);
> > -     int neg_adj = 0;
> > -     u32 diff, mult;
> > -     u64 adj;
> > +     s64 adj;
> >
> > -     if (scaled_ppm < 0) {
> > -             neg_adj = 1;
> > -             scaled_ppm = -scaled_ppm;
> > -     }
>
> Please don't re-write this logic.  It is written like that for a reason.
I used the sja1105_ptp.c as a reference. So it is also wrong there.

Hubert

^ permalink raw reply

* Re: [net-next 08/13] net/mlx5e: Protect tc flows hashtable with rcu
From: Willem de Bruijn @ 2019-07-30 16:15 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev@vger.kernel.org, Vlad Buslov, Jianbo Liu,
	Roi Dayan
In-Reply-To: <20190729234934.23595-9-saeedm@mellanox.com>

On Mon, Jul 29, 2019 at 7:50 PM Saeed Mahameed <saeedm@mellanox.com> wrote:
>
> From: Vlad Buslov <vladbu@mellanox.com>
>
> In order to remove dependency on rtnl lock, access to tc flows hashtable
> must be explicitly protected from concurrent flows removal.
>
> Extend tc flow structure with rcu to allow concurrent parallel access. Use
> rcu read lock to safely lookup flow in tc flows hash table, and take
> reference to it. Use rcu free for flow deletion to accommodate concurrent
> stats requests.
>
> Add new DELETED flow flag. Imlement new flow_flag_test_and_set() helper
> that is used to set a flag and return its previous value. Use it to
> atomically set the flag in mlx5e_delete_flower() to guarantee that flow can
> only be deleted once, even when same flow is deleted concurrently by
> multiple tasks.
>
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
> Reviewed-by: Roi Dayan <roid@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---

> @@ -3492,16 +3507,32 @@ int mlx5e_delete_flower(struct net_device *dev, struct mlx5e_priv *priv,
>  {
>         struct rhashtable *tc_ht = get_tc_ht(priv, flags);
>         struct mlx5e_tc_flow *flow;
> +       int err;
>
> +       rcu_read_lock();
>         flow = rhashtable_lookup_fast(tc_ht, &f->cookie, tc_ht_params);
> -       if (!flow || !same_flow_direction(flow, flags))
> -               return -EINVAL;
> +       if (!flow || !same_flow_direction(flow, flags)) {
> +               err = -EINVAL;
> +               goto errout;
> +       }
>
> +       /* Only delete the flow if it doesn't have MLX5E_TC_FLOW_DELETED flag
> +        * set.
> +        */
> +       if (flow_flag_test_and_set(flow, DELETED)) {
> +               err = -EINVAL;
> +               goto errout;
> +       }
>         rhashtable_remove_fast(tc_ht, &flow->node, tc_ht_params);
> +       rcu_read_unlock();
>
>         mlx5e_flow_put(priv, flow);

Dereferencing flow outside rcu readside critical section? Does a build
with lockdep not complain?

>
>         return 0;
> +
> +errout:
> +       rcu_read_unlock();
> +       return err;
>  }
>
>  int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
> @@ -3517,8 +3548,10 @@ int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
>         u64 bytes = 0;
>         int err = 0;
>
> -       flow = mlx5e_flow_get(rhashtable_lookup_fast(tc_ht, &f->cookie,
> -                                                    tc_ht_params));
> +       rcu_read_lock();
> +       flow = mlx5e_flow_get(rhashtable_lookup(tc_ht, &f->cookie,
> +                                               tc_ht_params));
> +       rcu_read_unlock();
>         if (IS_ERR(flow))
>                 return PTR_ERR(flow);

Same, in code below this check?

^ permalink raw reply

* Re: [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput
From: Stefano Garzarella @ 2019-07-30 16:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, virtualization, linux-kernel, Jason Wang, kvm,
	David S. Miller, Stefan Hajnoczi
In-Reply-To: <20190730115503-mutt-send-email-mst@kernel.org>

On Tue, Jul 30, 2019 at 11:55:09AM -0400, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2019 at 11:54:53AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Jul 30, 2019 at 05:43:29PM +0200, Stefano Garzarella wrote:
> > > This series tries to increase the throughput of virtio-vsock with slight
> > > changes.
> > > While I was testing the v2 of this series I discovered an huge use of memory,
> > > so I added patch 1 to mitigate this issue. I put it in this series in order
> > > to better track the performance trends.
> > > 
> > > v5:
> > > - rebased all patches on net-next
> > > - added Stefan's R-b and Michael's A-b
> > 
> > This doesn't solve all issues around allocation - as I mentioned I think
> > we will need to improve accounting for that,
> > and maybe add pre-allocation.

Yes, I'll work on it following your suggestions.

> > But it's a great series of steps in the right direction!
> > 

Thank you very much :)
Stefano

> 
> 
> So
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> > 
> > > v4: https://patchwork.kernel.org/cover/11047717
> > > v3: https://patchwork.kernel.org/cover/10970145
> > > v2: https://patchwork.kernel.org/cover/10938743
> > > v1: https://patchwork.kernel.org/cover/10885431
> > > 
> > > Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> > > support. As Michael suggested in the v1, I booted host and guest with 'nosmap'.
> > > 
> > > A brief description of patches:
> > > - Patches 1:   limit the memory usage with an extra copy for small packets
> > > - Patches 2+3: reduce the number of credit update messages sent to the
> > >                transmitter
> > > - Patches 4+5: allow the host to split packets on multiple buffers and use
> > >                VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> > > 
> > >                     host -> guest [Gbps]
> > > pkt_size before opt   p 1     p 2+3    p 4+5
> > > 
> > > 32         0.032     0.030    0.048    0.051
> > > 64         0.061     0.059    0.108    0.117
> > > 128        0.122     0.112    0.227    0.234
> > > 256        0.244     0.241    0.418    0.415
> > > 512        0.459     0.466    0.847    0.865
> > > 1K         0.927     0.919    1.657    1.641
> > > 2K         1.884     1.813    3.262    3.269
> > > 4K         3.378     3.326    6.044    6.195
> > > 8K         5.637     5.676   10.141   11.287
> > > 16K        8.250     8.402   15.976   16.736
> > > 32K       13.327    13.204   19.013   20.515
> > > 64K       21.241    21.341   20.973   21.879
> > > 128K      21.851    22.354   21.816   23.203
> > > 256K      21.408    21.693   21.846   24.088
> > > 512K      21.600    21.899   21.921   24.106
> > > 
> > >                     guest -> host [Gbps]
> > > pkt_size before opt   p 1     p 2+3    p 4+5
> > > 
> > > 32         0.045     0.046    0.057    0.057
> > > 64         0.089     0.091    0.103    0.104
> > > 128        0.170     0.179    0.192    0.200
> > > 256        0.364     0.351    0.361    0.379
> > > 512        0.709     0.699    0.731    0.790
> > > 1K         1.399     1.407    1.395    1.427
> > > 2K         2.670     2.684    2.745    2.835
> > > 4K         5.171     5.199    5.305    5.451
> > > 8K         8.442     8.500   10.083    9.941
> > > 16K       12.305    12.259   13.519   15.385
> > > 32K       11.418    11.150   11.988   24.680
> > > 64K       10.778    10.659   11.589   35.273
> > > 128K      10.421    10.339   10.939   40.338
> > > 256K      10.300     9.719   10.508   36.562
> > > 512K       9.833     9.808   10.612   35.979
> > > 
> > > As Stefan suggested in the v1, I measured also the efficiency in this way:
> > >     efficiency = Mbps / (%CPU_Host + %CPU_Guest)
> > > 
> > > The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> > > but it's provided for free from iperf3 and could be an indication.
> > > 
> > >         host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> > > pkt_size before opt   p 1     p 2+3    p 4+5
> > > 
> > > 32         0.35      0.45     0.79     1.02
> > > 64         0.56      0.80     1.41     1.54
> > > 128        1.11      1.52     3.03     3.12
> > > 256        2.20      2.16     5.44     5.58
> > > 512        4.17      4.18    10.96    11.46
> > > 1K         8.30      8.26    20.99    20.89
> > > 2K        16.82     16.31    39.76    39.73
> > > 4K        30.89     30.79    74.07    75.73
> > > 8K        53.74     54.49   124.24   148.91
> > > 16K       80.68     83.63   200.21   232.79
> > > 32K      132.27    132.52   260.81   357.07
> > > 64K      229.82    230.40   300.19   444.18
> > > 128K     332.60    329.78   331.51   492.28
> > > 256K     331.06    337.22   339.59   511.59
> > > 512K     335.58    328.50   331.56   504.56
> > > 
> > >         guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> > > pkt_size before opt   p 1     p 2+3    p 4+5
> > > 
> > > 32         0.43      0.43     0.53     0.56
> > > 64         0.85      0.86     1.04     1.10
> > > 128        1.63      1.71     2.07     2.13
> > > 256        3.48      3.35     4.02     4.22
> > > 512        6.80      6.67     7.97     8.63
> > > 1K        13.32     13.31    15.72    15.94
> > > 2K        25.79     25.92    30.84    30.98
> > > 4K        50.37     50.48    58.79    59.69
> > > 8K        95.90     96.15   107.04   110.33
> > > 16K      145.80    145.43   143.97   174.70
> > > 32K      147.06    144.74   146.02   282.48
> > > 64K      145.25    143.99   141.62   406.40
> > > 128K     149.34    146.96   147.49   489.34
> > > 256K     156.35    149.81   152.21   536.37
> > > 512K     151.65    150.74   151.52   519.93
> > > 
> > > [1] https://github.com/stefano-garzarella/iperf/
> > > 
> > > Stefano Garzarella (5):
> > >   vsock/virtio: limit the memory used per-socket
> > >   vsock/virtio: reduce credit update messages
> > >   vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
> > >   vhost/vsock: split packets to send using multiple buffers
> > >   vsock/virtio: change the maximum packet size allowed
> > > 
> > >  drivers/vhost/vsock.c                   | 68 ++++++++++++-----
> > >  include/linux/virtio_vsock.h            |  4 +-
> > >  net/vmw_vsock/virtio_transport.c        |  1 +
> > >  net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
> > >  4 files changed, 134 insertions(+), 38 deletions(-)
> > > 
> > > -- 
> > > 2.20.1

-- 

^ permalink raw reply

* Re: [PATCH 4/4] net: dsa: mv88e6xxx: add PTP support for MV88E6250 family
From: Richard Cochran @ 2019-07-30 16:00 UTC (permalink / raw)
  To: Hubert Feurstein
  Cc: netdev, linux-kernel, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, David S. Miller, Rasmus Villemoes
In-Reply-To: <20190730100429.32479-5-h.feurstein@gmail.com>

On Tue, Jul 30, 2019 at 12:04:29PM +0200, Hubert Feurstein wrote:
> diff --git a/drivers/net/dsa/mv88e6xxx/ptp.c b/drivers/net/dsa/mv88e6xxx/ptp.c
> index 768d256f7c9f..51cdf4712517 100644
> --- a/drivers/net/dsa/mv88e6xxx/ptp.c
> +++ b/drivers/net/dsa/mv88e6xxx/ptp.c
> @@ -15,11 +15,38 @@
>  #include "hwtstamp.h"
>  #include "ptp.h"
>  
> -/* Raw timestamps are in units of 8-ns clock periods. */
> -#define CC_SHIFT	28
> -#define CC_MULT		(8 << CC_SHIFT)
> -#define CC_MULT_NUM	(1 << 9)
> -#define CC_MULT_DEM	15625ULL
> +/* The adjfine API clamps ppb between [-32,768,000, 32,768,000], and

That is not true.

> + * therefore scaled_ppm between [-2,147,483,648, 2,147,483,647].
> + * Set the maximum supported ppb to a round value smaller than the maximum.
> + *
> + * Percentually speaking, this is a +/- 0.032x adjustment of the
> + * free-running counter (0.968x to 1.032x).
> + */
> +#define MV88E6XXX_MAX_ADJ_PPB	32000000

I had set an arbitrary limit of 1000 ppm.  I can't really see any
point in raising the limit.

> +/* Family MV88E6250:
> + * Raw timestamps are in units of 10-ns clock periods.
> + *
> + * clkadj = scaled_ppm * 10*2^28 / (10^6 * 2^16)
> + * simplifies to
> + * clkadj = scaled_ppm * 2^7 / 5^5
> + */
> +#define MV88E6250_CC_SHIFT	28
> +#define MV88E6250_CC_MULT	(10 << MV88E6250_CC_SHIFT)
> +#define MV88E6250_CC_MULT_NUM	(1 << 7)
> +#define MV88E6250_CC_MULT_DEM	3125ULL
> +
> +/* Other families:
> + * Raw timestamps are in units of 8-ns clock periods.
> + *
> + * clkadj = scaled_ppm * 8*2^28 / (10^6 * 2^16)
> + * simplifies to
> + * clkadj = scaled_ppm * 2^9 / 5^6
> + */
> +#define MV88E6XXX_CC_SHIFT	28
> +#define MV88E6XXX_CC_MULT	(8 << MV88E6XXX_CC_SHIFT)
> +#define MV88E6XXX_CC_MULT_NUM	(1 << 9)
> +#define MV88E6XXX_CC_MULT_DEM	15625ULL
>  
>  #define TAI_EVENT_WORK_INTERVAL msecs_to_jiffies(100)
>  
> @@ -179,24 +206,14 @@ static void mv88e6352_tai_event_work(struct work_struct *ugly)
>  static int mv88e6xxx_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
>  {
>  	struct mv88e6xxx_chip *chip = ptp_to_chip(ptp);
> -	int neg_adj = 0;
> -	u32 diff, mult;
> -	u64 adj;
> +	s64 adj;
>  
> -	if (scaled_ppm < 0) {
> -		neg_adj = 1;
> -		scaled_ppm = -scaled_ppm;
> -	}

Please don't re-write this logic.  It is written like that for a reason.

> -	mult = CC_MULT;
> -	adj = CC_MULT_NUM;
> -	adj *= scaled_ppm;
> -	diff = div_u64(adj, CC_MULT_DEM);

Just substitute CC_MULT* with your new chip->ptp_cc_mult*
and leave the rest alone.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH 03/12] block: bio_release_pages: use flags arg instead of bool
From: Jerome Glisse @ 2019-07-30 15:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, john.hubbard, Andrew Morton, Alexander Viro,
	Anna Schumaker, David S . Miller, Dominique Martinet,
	Eric Van Hensbergen, Jason Gunthorpe, Jason Wang, Jens Axboe,
	Latchesar Ionkov, Michael S . Tsirkin, Miklos Szeredi,
	Trond Myklebust, Matthew Wilcox, linux-mm, LKML, ceph-devel, kvm,
	linux-block, linux-cifs, linux-fsdevel, linux-nfs, linux-rdma,
	netdev, samba-technical, v9fs-developer, virtualization,
	John Hubbard, Minwoo Im
In-Reply-To: <20190730102557.GA1700@lst.de>

On Tue, Jul 30, 2019 at 12:25:57PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 29, 2019 at 04:57:21PM -0400, Jerome Glisse wrote:
> > > All pages releases by bio_release_pages should come from
> > > get_get_user_pages, so I don't really see the point here.
> > 
> > No they do not all comes from GUP for see various callers
> > of bio_check_pages_dirty() for instance iomap_dio_zero()
> > 
> > I have carefully tracked down all this and i did not do
> > anyconvertion just for the fun of it :)
> 
> Well, the point is _should_ not necessarily do.  iomap_dio_zero adds the
> ZERO_PAGE, which we by definition don't need to refcount.  So we can
> mark this bio BIO_NO_PAGE_REF safely after removing the get_page there.
> 
> Note that the equivalent in the old direct I/O code, dio_refill_pages,
> will be a little more complicated as it can match user pages and the
> ZERO_PAGE in a single bio, so a per-bio flag won't handle it easily.
> Maybe we just need to use a separate bio there as well.
> 
> In general with series like this we should not encode the status quo an
> pile new hacks upon the old one, but thing where we should be and fix
> up the old warts while having to wade through all that code.

Other user can also add page that are not coming from GUP but need to
have a reference see __blkdev_direct_IO() saddly bio get fill from many
different places and not always with GUP. So we can not say that all
pages here are coming from bio. I had a different version of the patchset
i think that was adding a new release dirty function for GUP versus non
GUP bio. I posted it a while ago, i will try to dig it up once i am
back.

Cheers,
Jérôme

^ permalink raw reply

* Re: [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput
From: Michael S. Tsirkin @ 2019-07-30 15:55 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, virtualization, linux-kernel, Jason Wang, kvm,
	David S. Miller, Stefan Hajnoczi
In-Reply-To: <20190730115339-mutt-send-email-mst@kernel.org>

On Tue, Jul 30, 2019 at 11:54:53AM -0400, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2019 at 05:43:29PM +0200, Stefano Garzarella wrote:
> > This series tries to increase the throughput of virtio-vsock with slight
> > changes.
> > While I was testing the v2 of this series I discovered an huge use of memory,
> > so I added patch 1 to mitigate this issue. I put it in this series in order
> > to better track the performance trends.
> > 
> > v5:
> > - rebased all patches on net-next
> > - added Stefan's R-b and Michael's A-b
> 
> This doesn't solve all issues around allocation - as I mentioned I think
> we will need to improve accounting for that,
> and maybe add pre-allocation.
> But it's a great series of steps in the right direction!
> 


So

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> 
> > v4: https://patchwork.kernel.org/cover/11047717
> > v3: https://patchwork.kernel.org/cover/10970145
> > v2: https://patchwork.kernel.org/cover/10938743
> > v1: https://patchwork.kernel.org/cover/10885431
> > 
> > Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> > support. As Michael suggested in the v1, I booted host and guest with 'nosmap'.
> > 
> > A brief description of patches:
> > - Patches 1:   limit the memory usage with an extra copy for small packets
> > - Patches 2+3: reduce the number of credit update messages sent to the
> >                transmitter
> > - Patches 4+5: allow the host to split packets on multiple buffers and use
> >                VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> > 
> >                     host -> guest [Gbps]
> > pkt_size before opt   p 1     p 2+3    p 4+5
> > 
> > 32         0.032     0.030    0.048    0.051
> > 64         0.061     0.059    0.108    0.117
> > 128        0.122     0.112    0.227    0.234
> > 256        0.244     0.241    0.418    0.415
> > 512        0.459     0.466    0.847    0.865
> > 1K         0.927     0.919    1.657    1.641
> > 2K         1.884     1.813    3.262    3.269
> > 4K         3.378     3.326    6.044    6.195
> > 8K         5.637     5.676   10.141   11.287
> > 16K        8.250     8.402   15.976   16.736
> > 32K       13.327    13.204   19.013   20.515
> > 64K       21.241    21.341   20.973   21.879
> > 128K      21.851    22.354   21.816   23.203
> > 256K      21.408    21.693   21.846   24.088
> > 512K      21.600    21.899   21.921   24.106
> > 
> >                     guest -> host [Gbps]
> > pkt_size before opt   p 1     p 2+3    p 4+5
> > 
> > 32         0.045     0.046    0.057    0.057
> > 64         0.089     0.091    0.103    0.104
> > 128        0.170     0.179    0.192    0.200
> > 256        0.364     0.351    0.361    0.379
> > 512        0.709     0.699    0.731    0.790
> > 1K         1.399     1.407    1.395    1.427
> > 2K         2.670     2.684    2.745    2.835
> > 4K         5.171     5.199    5.305    5.451
> > 8K         8.442     8.500   10.083    9.941
> > 16K       12.305    12.259   13.519   15.385
> > 32K       11.418    11.150   11.988   24.680
> > 64K       10.778    10.659   11.589   35.273
> > 128K      10.421    10.339   10.939   40.338
> > 256K      10.300     9.719   10.508   36.562
> > 512K       9.833     9.808   10.612   35.979
> > 
> > As Stefan suggested in the v1, I measured also the efficiency in this way:
> >     efficiency = Mbps / (%CPU_Host + %CPU_Guest)
> > 
> > The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> > but it's provided for free from iperf3 and could be an indication.
> > 
> >         host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> > pkt_size before opt   p 1     p 2+3    p 4+5
> > 
> > 32         0.35      0.45     0.79     1.02
> > 64         0.56      0.80     1.41     1.54
> > 128        1.11      1.52     3.03     3.12
> > 256        2.20      2.16     5.44     5.58
> > 512        4.17      4.18    10.96    11.46
> > 1K         8.30      8.26    20.99    20.89
> > 2K        16.82     16.31    39.76    39.73
> > 4K        30.89     30.79    74.07    75.73
> > 8K        53.74     54.49   124.24   148.91
> > 16K       80.68     83.63   200.21   232.79
> > 32K      132.27    132.52   260.81   357.07
> > 64K      229.82    230.40   300.19   444.18
> > 128K     332.60    329.78   331.51   492.28
> > 256K     331.06    337.22   339.59   511.59
> > 512K     335.58    328.50   331.56   504.56
> > 
> >         guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> > pkt_size before opt   p 1     p 2+3    p 4+5
> > 
> > 32         0.43      0.43     0.53     0.56
> > 64         0.85      0.86     1.04     1.10
> > 128        1.63      1.71     2.07     2.13
> > 256        3.48      3.35     4.02     4.22
> > 512        6.80      6.67     7.97     8.63
> > 1K        13.32     13.31    15.72    15.94
> > 2K        25.79     25.92    30.84    30.98
> > 4K        50.37     50.48    58.79    59.69
> > 8K        95.90     96.15   107.04   110.33
> > 16K      145.80    145.43   143.97   174.70
> > 32K      147.06    144.74   146.02   282.48
> > 64K      145.25    143.99   141.62   406.40
> > 128K     149.34    146.96   147.49   489.34
> > 256K     156.35    149.81   152.21   536.37
> > 512K     151.65    150.74   151.52   519.93
> > 
> > [1] https://github.com/stefano-garzarella/iperf/
> > 
> > Stefano Garzarella (5):
> >   vsock/virtio: limit the memory used per-socket
> >   vsock/virtio: reduce credit update messages
> >   vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
> >   vhost/vsock: split packets to send using multiple buffers
> >   vsock/virtio: change the maximum packet size allowed
> > 
> >  drivers/vhost/vsock.c                   | 68 ++++++++++++-----
> >  include/linux/virtio_vsock.h            |  4 +-
> >  net/vmw_vsock/virtio_transport.c        |  1 +
> >  net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
> >  4 files changed, 134 insertions(+), 38 deletions(-)
> > 
> > -- 
> > 2.20.1

^ permalink raw reply

* Re: [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput
From: Michael S. Tsirkin @ 2019-07-30 15:54 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, virtualization, linux-kernel, Jason Wang, kvm,
	David S. Miller, Stefan Hajnoczi
In-Reply-To: <20190730154334.237789-1-sgarzare@redhat.com>

On Tue, Jul 30, 2019 at 05:43:29PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered an huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.
> 
> v5:
> - rebased all patches on net-next
> - added Stefan's R-b and Michael's A-b

This doesn't solve all issues around allocation - as I mentioned I think
we will need to improve accounting for that,
and maybe add pre-allocation.
But it's a great series of steps in the right direction!



> v4: https://patchwork.kernel.org/cover/11047717
> v3: https://patchwork.kernel.org/cover/10970145
> v2: https://patchwork.kernel.org/cover/10938743
> v1: https://patchwork.kernel.org/cover/10885431
> 
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support. As Michael suggested in the v1, I booted host and guest with 'nosmap'.
> 
> A brief description of patches:
> - Patches 1:   limit the memory usage with an extra copy for small packets
> - Patches 2+3: reduce the number of credit update messages sent to the
>                transmitter
> - Patches 4+5: allow the host to split packets on multiple buffers and use
>                VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> 
>                     host -> guest [Gbps]
> pkt_size before opt   p 1     p 2+3    p 4+5
> 
> 32         0.032     0.030    0.048    0.051
> 64         0.061     0.059    0.108    0.117
> 128        0.122     0.112    0.227    0.234
> 256        0.244     0.241    0.418    0.415
> 512        0.459     0.466    0.847    0.865
> 1K         0.927     0.919    1.657    1.641
> 2K         1.884     1.813    3.262    3.269
> 4K         3.378     3.326    6.044    6.195
> 8K         5.637     5.676   10.141   11.287
> 16K        8.250     8.402   15.976   16.736
> 32K       13.327    13.204   19.013   20.515
> 64K       21.241    21.341   20.973   21.879
> 128K      21.851    22.354   21.816   23.203
> 256K      21.408    21.693   21.846   24.088
> 512K      21.600    21.899   21.921   24.106
> 
>                     guest -> host [Gbps]
> pkt_size before opt   p 1     p 2+3    p 4+5
> 
> 32         0.045     0.046    0.057    0.057
> 64         0.089     0.091    0.103    0.104
> 128        0.170     0.179    0.192    0.200
> 256        0.364     0.351    0.361    0.379
> 512        0.709     0.699    0.731    0.790
> 1K         1.399     1.407    1.395    1.427
> 2K         2.670     2.684    2.745    2.835
> 4K         5.171     5.199    5.305    5.451
> 8K         8.442     8.500   10.083    9.941
> 16K       12.305    12.259   13.519   15.385
> 32K       11.418    11.150   11.988   24.680
> 64K       10.778    10.659   11.589   35.273
> 128K      10.421    10.339   10.939   40.338
> 256K      10.300     9.719   10.508   36.562
> 512K       9.833     9.808   10.612   35.979
> 
> As Stefan suggested in the v1, I measured also the efficiency in this way:
>     efficiency = Mbps / (%CPU_Host + %CPU_Guest)
> 
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free from iperf3 and could be an indication.
> 
>         host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt   p 1     p 2+3    p 4+5
> 
> 32         0.35      0.45     0.79     1.02
> 64         0.56      0.80     1.41     1.54
> 128        1.11      1.52     3.03     3.12
> 256        2.20      2.16     5.44     5.58
> 512        4.17      4.18    10.96    11.46
> 1K         8.30      8.26    20.99    20.89
> 2K        16.82     16.31    39.76    39.73
> 4K        30.89     30.79    74.07    75.73
> 8K        53.74     54.49   124.24   148.91
> 16K       80.68     83.63   200.21   232.79
> 32K      132.27    132.52   260.81   357.07
> 64K      229.82    230.40   300.19   444.18
> 128K     332.60    329.78   331.51   492.28
> 256K     331.06    337.22   339.59   511.59
> 512K     335.58    328.50   331.56   504.56
> 
>         guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt   p 1     p 2+3    p 4+5
> 
> 32         0.43      0.43     0.53     0.56
> 64         0.85      0.86     1.04     1.10
> 128        1.63      1.71     2.07     2.13
> 256        3.48      3.35     4.02     4.22
> 512        6.80      6.67     7.97     8.63
> 1K        13.32     13.31    15.72    15.94
> 2K        25.79     25.92    30.84    30.98
> 4K        50.37     50.48    58.79    59.69
> 8K        95.90     96.15   107.04   110.33
> 16K      145.80    145.43   143.97   174.70
> 32K      147.06    144.74   146.02   282.48
> 64K      145.25    143.99   141.62   406.40
> 128K     149.34    146.96   147.49   489.34
> 256K     156.35    149.81   152.21   536.37
> 512K     151.65    150.74   151.52   519.93
> 
> [1] https://github.com/stefano-garzarella/iperf/
> 
> Stefano Garzarella (5):
>   vsock/virtio: limit the memory used per-socket
>   vsock/virtio: reduce credit update messages
>   vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
>   vhost/vsock: split packets to send using multiple buffers
>   vsock/virtio: change the maximum packet size allowed
> 
>  drivers/vhost/vsock.c                   | 68 ++++++++++++-----
>  include/linux/virtio_vsock.h            |  4 +-
>  net/vmw_vsock/virtio_transport.c        |  1 +
>  net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
>  4 files changed, 134 insertions(+), 38 deletions(-)
> 
> -- 
> 2.20.1

^ permalink raw reply

* Re: [net-next 01/13] net/mlx5e: Print a warning when LRO feature is dropped or not allowed
From: Willem de Bruijn @ 2019-07-30 15:52 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: David S. Miller, netdev@vger.kernel.org, Huy Nguyen
In-Reply-To: <20190729234934.23595-2-saeedm@mellanox.com>

On Mon, Jul 29, 2019 at 7:50 PM Saeed Mahameed <saeedm@mellanox.com> wrote:
>
> From: Huy Nguyen <huyn@mellanox.com>
>
> When user enables LRO via ethtool and if the RQ mode is legacy,
> mlx5e_fix_features drops the request without any explanation.
> Add netdev_warn to cover this case.
>
> Fixes: 6c3a823e1e9c ("net/mlx5e: RX, Remove HW LRO support in legacy RQ")
> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 47eea6b3a1c3..776eb46d263d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -3788,9 +3788,10 @@ static netdev_features_t mlx5e_fix_features(struct net_device *netdev,
>                         netdev_warn(netdev, "Dropping C-tag vlan stripping offload due to S-tag vlan\n");
>         }
>         if (!MLX5E_GET_PFLAG(params, MLX5E_PFLAG_RX_STRIDING_RQ)) {
> -               features &= ~NETIF_F_LRO;
> -               if (params->lro_en)
> +               if (features & NETIF_F_LRO) {
>                         netdev_warn(netdev, "Disabling LRO, not supported in legacy RQ\n");

This warns about "Disabling LRO" on an enable request?

More fundamentally, it appears that the device does not advertise
the feature as configurable in netdev_hw_features as of commit
6c3a823e1e9c ("net/mlx5e: RX, Remove HW LRO support in
legacy RQ"), so shouldn't this be caught by the device driver
independent ethtool code?

^ permalink raw reply

* Re: [PATCH 1/2] net: dsa: mv88e6xxx: add support to setup led-control register through device-tree
From: Hubert Feurstein @ 2019-07-30 15:50 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, linux-kernel, Vivien Didelot, Florian Fainelli,
	David S. Miller
In-Reply-To: <20190730140930.GM28552@lunn.ch>

Am Di., 30. Juli 2019 um 16:09 Uhr schrieb Andrew Lunn <andrew@lunn.ch>:
[...]
> Sorry, but this is not going to be accepted. There is an ongoing
> discussion about PHY LEDs and how they should be configured. Switch
> LEDs are no different from PHY LEDs. So they should use the same basic
> concept.
>
> Please take a look at the discussion around:
>
> [RFC] dt-bindings: net: phy: Add subnode for LED configuration
>
> Marvell designers have made this more difficult than it should be by
> moving the registers out of the PHY address space and into the switch
> address space. So we are going to have to implement this code twice
> :-(
Ok, good to know. I'll wait for the first implementation and take it
as a reference.

Hubert

^ permalink raw reply

* [PATCH net-next v5 5/5] vsock/virtio: change the maximum packet size allowed
From: Stefano Garzarella @ 2019-07-30 15:43 UTC (permalink / raw)
  To: netdev
  Cc: virtualization, linux-kernel, Jason Wang, kvm, Michael S. Tsirkin,
	David S. Miller, Stefan Hajnoczi
In-Reply-To: <20190730154334.237789-1-sgarzare@redhat.com>

Since now we are able to split packets, we can avoid limiting
their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
packet size.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 56fab3f03d0e..94cc0fa3e848 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -181,8 +181,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	vvs = vsk->trans;
 
 	/* we can send less than pkt_len bytes */
-	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
-		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
 
 	/* virtio_transport_get_credit might return less than pkt_len credit */
 	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next v5 4/5] vhost/vsock: split packets to send using multiple buffers
From: Stefano Garzarella @ 2019-07-30 15:43 UTC (permalink / raw)
  To: netdev
  Cc: virtualization, linux-kernel, Jason Wang, kvm, Michael S. Tsirkin,
	David S. Miller, Stefan Hajnoczi
In-Reply-To: <20190730154334.237789-1-sgarzare@redhat.com>

If the packets to sent to the guest are bigger than the buffer
available, we can split them, using multiple buffers and fixing
the length in the packet header.
This is safe since virtio-vsock supports only stream sockets.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/vhost/vsock.c                   | 66 ++++++++++++++++++-------
 net/vmw_vsock/virtio_transport_common.c | 15 ++++--
 2 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6c8390a2af52..9f57736fe15e 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -102,7 +102,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		struct iov_iter iov_iter;
 		unsigned out, in;
 		size_t nbytes;
-		size_t len;
+		size_t iov_len, payload_len;
 		int head;
 
 		spin_lock_bh(&vsock->send_pkt_list_lock);
@@ -147,8 +147,24 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		len = iov_length(&vq->iov[out], in);
-		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
+		iov_len = iov_length(&vq->iov[out], in);
+		if (iov_len < sizeof(pkt->hdr)) {
+			virtio_transport_free_pkt(pkt);
+			vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
+			break;
+		}
+
+		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
+		payload_len = pkt->len - pkt->off;
+
+		/* If the packet is greater than the space available in the
+		 * buffer, we split it using multiple buffers.
+		 */
+		if (payload_len > iov_len - sizeof(pkt->hdr))
+			payload_len = iov_len - sizeof(pkt->hdr);
+
+		/* Set the correct length in the header */
+		pkt->hdr.len = cpu_to_le32(payload_len);
 
 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
 		if (nbytes != sizeof(pkt->hdr)) {
@@ -157,33 +173,47 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
-		if (nbytes != pkt->len) {
+		nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len,
+				      &iov_iter);
+		if (nbytes != payload_len) {
 			virtio_transport_free_pkt(pkt);
 			vq_err(vq, "Faulted on copying pkt buf\n");
 			break;
 		}
 
-		vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
+		vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
 		added = true;
 
-		if (pkt->reply) {
-			int val;
-
-			val = atomic_dec_return(&vsock->queued_replies);
-
-			/* Do we have resources to resume tx processing? */
-			if (val + 1 == tx_vq->num)
-				restart_tx = true;
-		}
-
 		/* Deliver to monitoring devices all correctly transmitted
 		 * packets.
 		 */
 		virtio_transport_deliver_tap_pkt(pkt);
 
-		total_len += pkt->len;
-		virtio_transport_free_pkt(pkt);
+		pkt->off += payload_len;
+		total_len += payload_len;
+
+		/* If we didn't send all the payload we can requeue the packet
+		 * to send it with the next available buffer.
+		 */
+		if (pkt->off < pkt->len) {
+			spin_lock_bh(&vsock->send_pkt_list_lock);
+			list_add(&pkt->list, &vsock->send_pkt_list);
+			spin_unlock_bh(&vsock->send_pkt_list_lock);
+		} else {
+			if (pkt->reply) {
+				int val;
+
+				val = atomic_dec_return(&vsock->queued_replies);
+
+				/* Do we have resources to resume tx
+				 * processing?
+				 */
+				if (val + 1 == tx_vq->num)
+					restart_tx = true;
+			}
+
+			virtio_transport_free_pkt(pkt);
+		}
 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
 	if (added)
 		vhost_signal(&vsock->dev, vq);
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 34a2b42313b7..56fab3f03d0e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -97,8 +97,17 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
 	struct virtio_vsock_pkt *pkt = opaque;
 	struct af_vsockmon_hdr *hdr;
 	struct sk_buff *skb;
+	size_t payload_len;
+	void *payload_buf;
 
-	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len,
+	/* A packet could be split to fit the RX buffer, so we can retrieve
+	 * the payload length from the header and the buffer pointer taking
+	 * care of the offset in the original packet.
+	 */
+	payload_len = le32_to_cpu(pkt->hdr.len);
+	payload_buf = pkt->buf + pkt->off;
+
+	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len,
 			GFP_ATOMIC);
 	if (!skb)
 		return NULL;
@@ -138,8 +147,8 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
 
 	skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr));
 
-	if (pkt->len) {
-		skb_put_data(skb, pkt->buf, pkt->len);
+	if (payload_len) {
+		skb_put_data(skb, payload_buf, payload_len);
 	}
 
 	return skb;
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next v5 2/5] vsock/virtio: reduce credit update messages
From: Stefano Garzarella @ 2019-07-30 15:43 UTC (permalink / raw)
  To: netdev
  Cc: virtualization, linux-kernel, Jason Wang, kvm, Michael S. Tsirkin,
	David S. Miller, Stefan Hajnoczi
In-Reply-To: <20190730154334.237789-1-sgarzare@redhat.com>

In order to reduce the number of credit update messages,
we send them only when the space available seen by the
transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/linux/virtio_vsock.h            |  1 +
 net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 7d973903f52e..49fc9d20bc43 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -41,6 +41,7 @@ struct virtio_vsock_sock {
 
 	/* Protected by rx_lock */
 	u32 fwd_cnt;
+	u32 last_fwd_cnt;
 	u32 rx_bytes;
 	struct list_head rx_queue;
 };
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 095221f94786..a85559d4d974 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
 {
 	spin_lock_bh(&vvs->tx_lock);
+	vvs->last_fwd_cnt = vvs->fwd_cnt;
 	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
 	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
 	spin_unlock_bh(&vvs->tx_lock);
@@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	struct virtio_vsock_sock *vvs = vsk->trans;
 	struct virtio_vsock_pkt *pkt;
 	size_t bytes, total = 0;
+	u32 free_space;
 	int err = -EFAULT;
 
 	spin_lock_bh(&vvs->rx_lock);
@@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 			virtio_transport_free_pkt(pkt);
 		}
 	}
+
+	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
+
 	spin_unlock_bh(&vvs->rx_lock);
 
-	/* Send a credit pkt to peer */
-	virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
-					    NULL);
+	/* We send a credit update only when the space available seen
+	 * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
+	 */
+	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
+		virtio_transport_send_credit_update(vsk,
+						    VIRTIO_VSOCK_TYPE_STREAM,
+						    NULL);
+	}
 
 	return total;
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next v5 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
From: Stefano Garzarella @ 2019-07-30 15:43 UTC (permalink / raw)
  To: netdev
  Cc: virtualization, linux-kernel, Jason Wang, kvm, Michael S. Tsirkin,
	David S. Miller, Stefan Hajnoczi
In-Reply-To: <20190730154334.237789-1-sgarzare@redhat.com>

fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
the same spinlock also if we are in the TX path.

Move also buf_alloc under the same lock.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/linux/virtio_vsock.h            | 2 +-
 net/vmw_vsock/virtio_transport_common.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 49fc9d20bc43..4c7781f4b29b 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -35,7 +35,6 @@ struct virtio_vsock_sock {
 
 	/* Protected by tx_lock */
 	u32 tx_cnt;
-	u32 buf_alloc;
 	u32 peer_fwd_cnt;
 	u32 peer_buf_alloc;
 
@@ -43,6 +42,7 @@ struct virtio_vsock_sock {
 	u32 fwd_cnt;
 	u32 last_fwd_cnt;
 	u32 rx_bytes;
+	u32 buf_alloc;
 	struct list_head rx_queue;
 };
 
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a85559d4d974..34a2b42313b7 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -210,11 +210,11 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
 
 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
 {
-	spin_lock_bh(&vvs->tx_lock);
+	spin_lock_bh(&vvs->rx_lock);
 	vvs->last_fwd_cnt = vvs->fwd_cnt;
 	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
 	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
-	spin_unlock_bh(&vvs->tx_lock);
+	spin_unlock_bh(&vvs->rx_lock);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next v5 1/5] vsock/virtio: limit the memory used per-socket
From: Stefano Garzarella @ 2019-07-30 15:43 UTC (permalink / raw)
  To: netdev
  Cc: virtualization, linux-kernel, Jason Wang, kvm, Michael S. Tsirkin,
	David S. Miller, Stefan Hajnoczi
In-Reply-To: <20190730154334.237789-1-sgarzare@redhat.com>

Since virtio-vsock was introduced, the buffers filled by the host
and pushed to the guest using the vring, are directly queued in
a per-socket list. These buffers are preallocated by the guest
with a fixed size (4 KB).

The maximum amount of memory used by each socket should be
controlled by the credit mechanism.
The default credit available per-socket is 256 KB, but if we use
only 1 byte per packet, the guest can queue up to 262144 of 4 KB
buffers, using up to 1 GB of memory per-socket. In addition, the
guest will continue to fill the vring with new 4 KB free buffers
to avoid starvation of other sockets.

This patch mitigates this issue copying the payload of small
packets (< 128 bytes) into the buffer of last packet queued, in
order to avoid wasting memory.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/vhost/vsock.c                   |  2 +
 include/linux/virtio_vsock.h            |  1 +
 net/vmw_vsock/virtio_transport.c        |  1 +
 net/vmw_vsock/virtio_transport_common.c | 60 +++++++++++++++++++++----
 4 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6a50e1d0529c..6c8390a2af52 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -329,6 +329,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
 		return NULL;
 	}
 
+	pkt->buf_len = pkt->len;
+
 	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
 	if (nbytes != pkt->len) {
 		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index e223e2632edd..7d973903f52e 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -52,6 +52,7 @@ struct virtio_vsock_pkt {
 	/* socket refcnt not held, only use for cancellation */
 	struct vsock_sock *vsk;
 	void *buf;
+	u32 buf_len;
 	u32 len;
 	u32 off;
 	bool reply;
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 0815d1357861..082a30936690 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -307,6 +307,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
 			break;
 		}
 
+		pkt->buf_len = buf_len;
 		pkt->len = buf_len;
 
 		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 6f1a8aff65c5..095221f94786 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -26,6 +26,9 @@
 /* How long to wait for graceful shutdown of a connection */
 #define VSOCK_CLOSE_TIMEOUT (8 * HZ)
 
+/* Threshold for detecting small packets to copy */
+#define GOOD_COPY_LEN  128
+
 static const struct virtio_transport *virtio_transport_get_ops(void)
 {
 	const struct vsock_transport *t = vsock_core_get_transport();
@@ -64,6 +67,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
 		pkt->buf = kmalloc(len, GFP_KERNEL);
 		if (!pkt->buf)
 			goto out_pkt;
+
+		pkt->buf_len = len;
+
 		err = memcpy_from_msg(pkt->buf, info->msg, len);
 		if (err)
 			goto out;
@@ -841,24 +847,60 @@ virtio_transport_recv_connecting(struct sock *sk,
 	return err;
 }
 
+static void
+virtio_transport_recv_enqueue(struct vsock_sock *vsk,
+			      struct virtio_vsock_pkt *pkt)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+	bool free_pkt = false;
+
+	pkt->len = le32_to_cpu(pkt->hdr.len);
+	pkt->off = 0;
+
+	spin_lock_bh(&vvs->rx_lock);
+
+	virtio_transport_inc_rx_pkt(vvs, pkt);
+
+	/* Try to copy small packets into the buffer of last packet queued,
+	 * to avoid wasting memory queueing the entire buffer with a small
+	 * payload.
+	 */
+	if (pkt->len <= GOOD_COPY_LEN && !list_empty(&vvs->rx_queue)) {
+		struct virtio_vsock_pkt *last_pkt;
+
+		last_pkt = list_last_entry(&vvs->rx_queue,
+					   struct virtio_vsock_pkt, list);
+
+		/* If there is space in the last packet queued, we copy the
+		 * new packet in its buffer.
+		 */
+		if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
+			memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
+			       pkt->len);
+			last_pkt->len += pkt->len;
+			free_pkt = true;
+			goto out;
+		}
+	}
+
+	list_add_tail(&pkt->list, &vvs->rx_queue);
+
+out:
+	spin_unlock_bh(&vvs->rx_lock);
+	if (free_pkt)
+		virtio_transport_free_pkt(pkt);
+}
+
 static int
 virtio_transport_recv_connected(struct sock *sk,
 				struct virtio_vsock_pkt *pkt)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
-	struct virtio_vsock_sock *vvs = vsk->trans;
 	int err = 0;
 
 	switch (le16_to_cpu(pkt->hdr.op)) {
 	case VIRTIO_VSOCK_OP_RW:
-		pkt->len = le32_to_cpu(pkt->hdr.len);
-		pkt->off = 0;
-
-		spin_lock_bh(&vvs->rx_lock);
-		virtio_transport_inc_rx_pkt(vvs, pkt);
-		list_add_tail(&pkt->list, &vvs->rx_queue);
-		spin_unlock_bh(&vvs->rx_lock);
-
+		virtio_transport_recv_enqueue(vsk, pkt);
 		sk->sk_data_ready(sk);
 		return err;
 	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput
From: Stefano Garzarella @ 2019-07-30 15:43 UTC (permalink / raw)
  To: netdev
  Cc: virtualization, linux-kernel, Jason Wang, kvm, Michael S. Tsirkin,
	David S. Miller, Stefan Hajnoczi

This series tries to increase the throughput of virtio-vsock with slight
changes.
While I was testing the v2 of this series I discovered an huge use of memory,
so I added patch 1 to mitigate this issue. I put it in this series in order
to better track the performance trends.

v5:
- rebased all patches on net-next
- added Stefan's R-b and Michael's A-b

v4: https://patchwork.kernel.org/cover/11047717
v3: https://patchwork.kernel.org/cover/10970145
v2: https://patchwork.kernel.org/cover/10938743
v1: https://patchwork.kernel.org/cover/10885431

Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
support. As Michael suggested in the v1, I booted host and guest with 'nosmap'.

A brief description of patches:
- Patches 1:   limit the memory usage with an extra copy for small packets
- Patches 2+3: reduce the number of credit update messages sent to the
               transmitter
- Patches 4+5: allow the host to split packets on multiple buffers and use
               VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed

                    host -> guest [Gbps]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.032     0.030    0.048    0.051
64         0.061     0.059    0.108    0.117
128        0.122     0.112    0.227    0.234
256        0.244     0.241    0.418    0.415
512        0.459     0.466    0.847    0.865
1K         0.927     0.919    1.657    1.641
2K         1.884     1.813    3.262    3.269
4K         3.378     3.326    6.044    6.195
8K         5.637     5.676   10.141   11.287
16K        8.250     8.402   15.976   16.736
32K       13.327    13.204   19.013   20.515
64K       21.241    21.341   20.973   21.879
128K      21.851    22.354   21.816   23.203
256K      21.408    21.693   21.846   24.088
512K      21.600    21.899   21.921   24.106

                    guest -> host [Gbps]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.045     0.046    0.057    0.057
64         0.089     0.091    0.103    0.104
128        0.170     0.179    0.192    0.200
256        0.364     0.351    0.361    0.379
512        0.709     0.699    0.731    0.790
1K         1.399     1.407    1.395    1.427
2K         2.670     2.684    2.745    2.835
4K         5.171     5.199    5.305    5.451
8K         8.442     8.500   10.083    9.941
16K       12.305    12.259   13.519   15.385
32K       11.418    11.150   11.988   24.680
64K       10.778    10.659   11.589   35.273
128K      10.421    10.339   10.939   40.338
256K      10.300     9.719   10.508   36.562
512K       9.833     9.808   10.612   35.979

As Stefan suggested in the v1, I measured also the efficiency in this way:
    efficiency = Mbps / (%CPU_Host + %CPU_Guest)

The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
but it's provided for free from iperf3 and could be an indication.

        host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.35      0.45     0.79     1.02
64         0.56      0.80     1.41     1.54
128        1.11      1.52     3.03     3.12
256        2.20      2.16     5.44     5.58
512        4.17      4.18    10.96    11.46
1K         8.30      8.26    20.99    20.89
2K        16.82     16.31    39.76    39.73
4K        30.89     30.79    74.07    75.73
8K        53.74     54.49   124.24   148.91
16K       80.68     83.63   200.21   232.79
32K      132.27    132.52   260.81   357.07
64K      229.82    230.40   300.19   444.18
128K     332.60    329.78   331.51   492.28
256K     331.06    337.22   339.59   511.59
512K     335.58    328.50   331.56   504.56

        guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.43      0.43     0.53     0.56
64         0.85      0.86     1.04     1.10
128        1.63      1.71     2.07     2.13
256        3.48      3.35     4.02     4.22
512        6.80      6.67     7.97     8.63
1K        13.32     13.31    15.72    15.94
2K        25.79     25.92    30.84    30.98
4K        50.37     50.48    58.79    59.69
8K        95.90     96.15   107.04   110.33
16K      145.80    145.43   143.97   174.70
32K      147.06    144.74   146.02   282.48
64K      145.25    143.99   141.62   406.40
128K     149.34    146.96   147.49   489.34
256K     156.35    149.81   152.21   536.37
512K     151.65    150.74   151.52   519.93

[1] https://github.com/stefano-garzarella/iperf/

Stefano Garzarella (5):
  vsock/virtio: limit the memory used per-socket
  vsock/virtio: reduce credit update messages
  vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
  vhost/vsock: split packets to send using multiple buffers
  vsock/virtio: change the maximum packet size allowed

 drivers/vhost/vsock.c                   | 68 ++++++++++++-----
 include/linux/virtio_vsock.h            |  4 +-
 net/vmw_vsock/virtio_transport.c        |  1 +
 net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
 4 files changed, 134 insertions(+), 38 deletions(-)

-- 
2.20.1


^ permalink raw reply

* Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
From: Stefano Garzarella @ 2019-07-30 15:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Stefan Hajnoczi, David S. Miller, netdev,
	linux-kernel, virtualization, kvm
In-Reply-To: <fc568e3d-7af5-5895-89e8-32e35b0f9af4@redhat.com>

On Tue, Jul 30, 2019 at 06:03:24PM +0800, Jason Wang wrote:
> 
> On 2019/7/30 下午5:40, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 09:59:23AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > > > This series tries to increase the throughput of virtio-vsock with slight
> > > > changes.
> > > > While I was testing the v2 of this series I discovered an huge use of memory,
> > > > so I added patch 1 to mitigate this issue. I put it in this series in order
> > > > to better track the performance trends.
> > > Series:
> > > 
> > > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > > 
> > > Can this go into net-next?
> > > 
> > I think so.
> > Michael, Stefan thanks to ack the series!
> > 
> > Should I resend it with net-next tag?
> > 
> > Thanks,
> > Stefano
> 
> 
> I think so.

Okay, I'll resend it!

Thanks,
Stefano

^ permalink raw reply

* [PATCH v3] net/socket: fix GCC8+ Wpacked-not-aligned warnings
From: Qian Cai @ 2019-07-30 15:30 UTC (permalink / raw)
  To: davem
  Cc: vyasevich, nhorman, marcelo.leitner, David.Laight, linux-sctp,
	netdev, linux-kernel, Qian Cai

There are a lot of those warnings with GCC8+ 64-bit,

In file included from ./include/linux/sctp.h:42,
                 from net/core/skbuff.c:47:
./include/uapi/linux/sctp.h:395:1: warning: alignment 4 of 'struct
sctp_paddr_change' is less than 8 [-Wpacked-not-aligned]
 } __attribute__((packed, aligned(4)));
 ^
./include/uapi/linux/sctp.h:728:1: warning: alignment 4 of 'struct
sctp_setpeerprim' is less than 8 [-Wpacked-not-aligned]
 } __attribute__((packed, aligned(4)));
 ^
./include/uapi/linux/sctp.h:727:26: warning: 'sspp_addr' offset 4 in
'struct sctp_setpeerprim' isn't aligned to 8 [-Wpacked-not-aligned]
  struct sockaddr_storage sspp_addr;
                          ^~~~~~~~~
./include/uapi/linux/sctp.h:741:1: warning: alignment 4 of 'struct
sctp_prim' is less than 8 [-Wpacked-not-aligned]
 } __attribute__((packed, aligned(4)));
 ^
./include/uapi/linux/sctp.h:740:26: warning: 'ssp_addr' offset 4 in
'struct sctp_prim' isn't aligned to 8 [-Wpacked-not-aligned]
  struct sockaddr_storage ssp_addr;
                          ^~~~~~~~
./include/uapi/linux/sctp.h:792:1: warning: alignment 4 of 'struct
sctp_paddrparams' is less than 8 [-Wpacked-not-aligned]
 } __attribute__((packed, aligned(4)));
 ^
./include/uapi/linux/sctp.h:784:26: warning: 'spp_address' offset 4 in
'struct sctp_paddrparams' isn't aligned to 8 [-Wpacked-not-aligned]
  struct sockaddr_storage spp_address;
                          ^~~~~~~~~~~
./include/uapi/linux/sctp.h:905:1: warning: alignment 4 of 'struct
sctp_paddrinfo' is less than 8 [-Wpacked-not-aligned]
 } __attribute__((packed, aligned(4)));
 ^
./include/uapi/linux/sctp.h:899:26: warning: 'spinfo_address' offset 4
in 'struct sctp_paddrinfo' isn't aligned to 8 [-Wpacked-not-aligned]
  struct sockaddr_storage spinfo_address;
                          ^~~~~~~~~~~~~~

This is because the commit 20c9c825b12f ("[SCTP] Fix SCTP socket options
to work with 32-bit apps on 64-bit kernels.") added "packed, aligned(4)"
GCC attributes to some structures but one of the members, i.e, "struct
sockaddr_storage" in those structures has the attribute,
"aligned(__alignof__ (struct sockaddr *)" which is 8-byte on 64-bit
systems, so the commit overwrites the designed alignments for
"sockaddr_storage".

To fix this, "struct sockaddr_storage" needs to be aligned to 4-byte as
it is only used in those packed sctp structure which is part of UAPI,
and "struct __kernel_sockaddr_storage" is used in some other
places of UAPI that need not to change alignments in order to not
breaking userspace.

Use an implicit alignment for "struct __kernel_sockaddr_storage" so it
can keep the same alignments as a member in both packed and un-packed
structures without breaking UAPI.

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Qian Cai <cai@lca.pw>
---

v3: Add some comments and rearrange the public member first per David.
v2: Use an implicit alignment for "struct __kernel_sockaddr_storage".

 include/uapi/linux/socket.h | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/socket.h b/include/uapi/linux/socket.h
index 8eb96021709c..c3409c8ec0dd 100644
--- a/include/uapi/linux/socket.h
+++ b/include/uapi/linux/socket.h
@@ -6,17 +6,24 @@
  * Desired design of maximum size and alignment (see RFC2553)
  */
 #define _K_SS_MAXSIZE	128	/* Implementation specific max size */
-#define _K_SS_ALIGNSIZE	(__alignof__ (struct sockaddr *))
-				/* Implementation specific desired alignment */
 
 typedef unsigned short __kernel_sa_family_t;
 
+/*
+ * The definition uses anonymous union and struct in order to control the
+ * default alignment.
+ */
 struct __kernel_sockaddr_storage {
-	__kernel_sa_family_t	ss_family;		/* address family */
-	/* Following field(s) are implementation specific */
-	char		__data[_K_SS_MAXSIZE - sizeof(unsigned short)];
+	union {
+		struct {
+			__kernel_sa_family_t	ss_family; /* address family */
+			/* Following field(s) are implementation specific */
+			char __data[_K_SS_MAXSIZE - sizeof(unsigned short)];
 				/* space to achieve desired size, */
 				/* _SS_MAXSIZE value minus size of ss_family */
-} __attribute__ ((aligned(_K_SS_ALIGNSIZE)));	/* force desired alignment */
+		};
+		void *__align; /* implementation specific desired alignment */
+	};
+};
 
 #endif /* _UAPI_LINUX_SOCKET_H */
-- 
1.8.3.1


^ permalink raw reply related

* RE: [PATCH] enetc: Fix build error without PHYLIB
From: Claudiu Manoil @ 2019-07-30 15:28 UTC (permalink / raw)
  To: YueHaibing, davem@davemloft.net
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20190730142959.50892-1-yuehaibing@huawei.com>

>-----Original Message-----
>From: YueHaibing <yuehaibing@huawei.com>
>Sent: Tuesday, July 30, 2019 5:30 PM
>To: Claudiu Manoil <claudiu.manoil@nxp.com>; davem@davemloft.net
>Cc: linux-kernel@vger.kernel.org; netdev@vger.kernel.org; YueHaibing
><yuehaibing@huawei.com>
>Subject: [PATCH] enetc: Fix build error without PHYLIB
>
>If PHYLIB is not set, build enetc will fails:
>
>drivers/net/ethernet/freescale/enetc/enetc.o: In function `enetc_open':
>enetc.c: undefined reference to `phy_disconnect'
>enetc.c: undefined reference to `phy_start'
>drivers/net/ethernet/freescale/enetc/enetc.o: In function `enetc_close':
>enetc.c: undefined reference to `phy_stop'
>enetc.c: undefined reference to `phy_disconnect'
>drivers/net/ethernet/freescale/enetc/enetc_ethtool.o: undefined reference to
>`phy_ethtool_get_link_ksettings'
>drivers/net/ethernet/freescale/enetc/enetc_ethtool.o: undefined reference to
>`phy_ethtool_set_link_ksettings'
>drivers/net/ethernet/freescale/enetc/enetc_mdio.o: In function
>`enetc_mdio_probe':
>enetc_mdio.c: undefined reference to `mdiobus_alloc_size'
>enetc_mdio.c: undefined reference to `mdiobus_free'
>
>Reported-by: Hulk Robot <hulkci@huawei.com>
>Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
>Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com>

^ permalink raw reply

* Re: [PATCH net v3] net: neigh: fix multiple neigh timer scheduling
From: Lorenzo Bianconi @ 2019-07-30 15:19 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: davem, netdev, dsahern, marek
In-Reply-To: <alpine.LFD.2.21.1907211606200.3535@ja.home.ssi.bg>

[-- Attachment #1: Type: text/plain, Size: 5867 bytes --]

> 
> 	Hello,

Hi Julian,

sorry for the late reply, I was AFK

> 
> On Sun, 14 Jul 2019, Lorenzo Bianconi wrote:
> 
> > Neigh timer can be scheduled multiple times from userspace adding
> 
> 	If the garbage comes from ndm_state, why we should create
> a patch that just covers the problem?:
> 
> State: INCOMPLETE, STALE, FAILED, 0x8400 (0x8425)
> 
> 	User space is trying to create entry that is both
> STALE (no timer) and INCOMPLETE (with timer). So, in the
> 2nd NL message __neigh_event_send() detects timer with NUD_STALE
> bit. What if this 2nd message never comes? Such inconsistence
> between nud_state and the timer can trigger other bugs in
> other functions.

I guess this patch is not harmful since it does nothing if we are not in
IN_TIMER and it fixes an issue if for some reason we hit this code and we
have already scheduled the neigh timer (other parts of the state machine
do the same). Anyway I agree we could add some sanity checks to inputs from
userspace.

Regards,
Lorenzo

> 
> 	May be we just need to restrict ndm_state and to drop
> this patch, for example, by adding checks in __neigh_update():
> 
>         if (!(flags & NEIGH_UPDATE_F_ADMIN) &&
>             (old & (NUD_NOARP | NUD_PERMANENT)))
>                 goto out;
> +	/* State must be single bit or 0 */
> +	if (new & (new - 1))
> +		goto out;
>         if (neigh->dead) {
> 
> 	If needed, this check can be moved only for ndm_state
> in neigh_add().
> 
> > multiple neigh entries and forcing the neigh timer scheduling passing
> > NTF_USE in the netlink requests.
> > This will result in a refcount leak and in the following dump stack:
> 
> 	It is a single create with multiple bits in state with following
> __neigh_event_send(). And who knows, this bug may exist even in Linux 2.2 
> and below...
> 
> > [   32.465295] NEIGH: BUG, double timer add, state is 8
> > [   32.465308] CPU: 0 PID: 416 Comm: double_timer_ad Not tainted 5.2.0+ #65
> > [   32.465311] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
> > [   32.465313] Call Trace:
> > [   32.465318]  dump_stack+0x7c/0xc0
> > [   32.465323]  __neigh_event_send+0x20c/0x880
> > [   32.465326]  ? ___neigh_create+0x846/0xfb0
> > [   32.465329]  ? neigh_lookup+0x2a9/0x410
> > [   32.465332]  ? neightbl_fill_info.constprop.0+0x800/0x800
> > [   32.465334]  neigh_add+0x4f8/0x5e0
> > [   32.465337]  ? neigh_xmit+0x620/0x620
> > [   32.465341]  ? find_held_lock+0x85/0xa0
> > [   32.465345]  rtnetlink_rcv_msg+0x204/0x570
> > [   32.465348]  ? rtnl_dellink+0x450/0x450
> > [   32.465351]  ? mark_held_locks+0x90/0x90
> > [   32.465354]  ? match_held_lock+0x1b/0x230
> > [   32.465357]  netlink_rcv_skb+0xc4/0x1d0
> > [   32.465360]  ? rtnl_dellink+0x450/0x450
> > [   32.465363]  ? netlink_ack+0x420/0x420
> > [   32.465366]  ? netlink_deliver_tap+0x115/0x560
> > [   32.465369]  ? __alloc_skb+0xc9/0x2f0
> > [   32.465372]  netlink_unicast+0x270/0x330
> > [   32.465375]  ? netlink_attachskb+0x2f0/0x2f0
> > [   32.465378]  netlink_sendmsg+0x34f/0x5a0
> > [   32.465381]  ? netlink_unicast+0x330/0x330
> > [   32.465385]  ? move_addr_to_kernel.part.0+0x20/0x20
> > [   32.465388]  ? netlink_unicast+0x330/0x330
> > [   32.465391]  sock_sendmsg+0x91/0xa0
> > [   32.465394]  ___sys_sendmsg+0x407/0x480
> > [   32.465397]  ? copy_msghdr_from_user+0x200/0x200
> > [   32.465401]  ? _raw_spin_unlock_irqrestore+0x37/0x40
> > [   32.465404]  ? lockdep_hardirqs_on+0x17d/0x250
> > [   32.465407]  ? __wake_up_common_lock+0xcb/0x110
> > [   32.465410]  ? __wake_up_common+0x230/0x230
> > [   32.465413]  ? netlink_bind+0x3e1/0x490
> > [   32.465416]  ? netlink_setsockopt+0x540/0x540
> > [   32.465420]  ? __fget_light+0x9c/0xf0
> > [   32.465423]  ? sockfd_lookup_light+0x8c/0xb0
> > [   32.465426]  __sys_sendmsg+0xa5/0x110
> > [   32.465429]  ? __ia32_sys_shutdown+0x30/0x30
> > [   32.465432]  ? __fd_install+0xe1/0x2c0
> > [   32.465435]  ? lockdep_hardirqs_off+0xb5/0x100
> > [   32.465438]  ? mark_held_locks+0x24/0x90
> > [   32.465441]  ? do_syscall_64+0xf/0x270
> > [   32.465444]  do_syscall_64+0x63/0x270
> > [   32.465448]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER'
> > receiving a netlink request with NTF_USE flag set
> > 
> > Reported-by: Marek Majkowski <marek@cloudflare.com>
> > Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table")
> > Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> > ---
> > Changes since v2:
> > - remove check_timer flag and run neigh_del_timer directly
> > Changes since v1:
> > - fix compilation errors defining neigh_event_send_check_timer routine
> > ---
> >  net/core/neighbour.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > index 742cea4ce72e..0dfc97bc8760 100644
> > --- a/net/core/neighbour.c
> > +++ b/net/core/neighbour.c
> > @@ -1124,6 +1124,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
> >  
> >  			atomic_set(&neigh->probes,
> >  				   NEIGH_VAR(neigh->parms, UCAST_PROBES));
> > +			neigh_del_timer(neigh);
> >  			neigh->nud_state     = NUD_INCOMPLETE;
> >  			neigh->updated = now;
> >  			next = now + max(NEIGH_VAR(neigh->parms, RETRANS_TIME),
> > @@ -1140,6 +1141,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
> >  		}
> >  	} else if (neigh->nud_state & NUD_STALE) {
> >  		neigh_dbg(2, "neigh %p is delayed\n", neigh);
> > +		neigh_del_timer(neigh);
> >  		neigh->nud_state = NUD_DELAY;
> >  		neigh->updated = jiffies;
> >  		neigh_add_timer(neigh, jiffies +
> > -- 
> > 2.21.0
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: linux-next: Tree for Jul 30 (drivers/net/phy/mdio-octeon.c: i386)
From: Randy Dunlap @ 2019-07-30 15:00 UTC (permalink / raw)
  To: Stephen Rothwell, Linux Next Mailing List
  Cc: Linux Kernel Mailing List, netdev@vger.kernel.org, David Daney
In-Reply-To: <20190730151527.08f611b8@canb.auug.org.au>

On 7/29/19 10:15 PM, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20190729:
> 

on i386:

../drivers/net/phy/mdio-octeon.c: In function ‘octeon_mdiobus_probe’:
../drivers/net/phy/mdio-octeon.c:48:3: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   (u64)devm_ioremap(&pdev->dev, mdio_phys, regsize);
   ^
In file included from ../drivers/net/phy/mdio-octeon.c:14:0:
../drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of function ‘writeq’; did you mean ‘writel’? [-Werror=implicit-function-declaration]
 #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                    ^
../drivers/net/phy/mdio-octeon.c:56:2: note: in expansion of macro ‘oct_mdio_writeq’
  oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
  ^~~~~~~~~~~~~~~
../drivers/net/phy/mdio-cavium.h:111:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                                ^
../drivers/net/phy/mdio-octeon.c:56:2: note: in expansion of macro ‘oct_mdio_writeq’
  oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
  ^~~~~~~~~~~~~~~
../drivers/net/phy/mdio-cavium.h:111:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                                ^
../drivers/net/phy/mdio-octeon.c:77:2: note: in expansion of macro ‘oct_mdio_writeq’
  oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
  ^~~~~~~~~~~~~~~
../drivers/net/phy/mdio-octeon.c: In function ‘octeon_mdiobus_remove’:
../drivers/net/phy/mdio-cavium.h:111:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                                ^
../drivers/net/phy/mdio-octeon.c:91:2: note: in expansion of macro ‘oct_mdio_writeq’
  oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
  ^~~~~~~~~~~~~~~



-- 
~Randy

^ permalink raw reply

* RE: [PATCH v2] net/socket: fix GCC8+ Wpacked-not-aligned warnings
From: David Laight @ 2019-07-30 14:59 UTC (permalink / raw)
  To: 'Qian Cai', davem@davemloft.net
  Cc: vyasevich@gmail.com, nhorman@tuxdriver.com,
	marcelo.leitner@gmail.com, linux-sctp@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <1564497488-3030-1-git-send-email-cai@lca.pw>

From: Qian Cai
> Sent: 30 July 2019 15:38
...
> Use an implicit alignment for "struct __kernel_sockaddr_storage" so it
> can keep the same alignments as a member in both packed and un-packed
> structures without breaking UAPI.
> 
> Suggested-by: David Laight <David.Laight@ACULAB.COM>
...

Although I suggested it needs a bit of tidy up.

Add a comment maybe:

/* The definition uses anonymous union and struct in order to
 * control the default alignment. */

>  struct __kernel_sockaddr_storage {
> -	__kernel_sa_family_t	ss_family;		/* address family */
> -	/* Following field(s) are implementation specific */
> -	char		__data[_K_SS_MAXSIZE - sizeof(unsigned short)];
> +	union {
> +		void *__align; /* implementation specific desired alignment */

Move the 'void *' below the struct so the first member is the 'public one.

> +		struct {
> +			__kernel_sa_family_t	ss_family; /* address family */
> +			/* Following field(s) are implementation specific */
> +			char __data[_K_SS_MAXSIZE - sizeof(unsigned short)];
>  				/* space to achieve desired size, */
>  				/* _SS_MAXSIZE value minus size of ss_family */
> -} __attribute__ ((aligned(_K_SS_ALIGNSIZE)));	/* force desired alignment */
> +			};
> +		};
> +};
> 
>  #endif /* _UAPI_LINUX_SOCKET_H */
> --
> 1.8.3.1

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox