Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net v2 2/2] net: phy: Check harder for errors in get_phy_id()
From: Florian Fainelli @ 2020-06-19 18:47 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Jakub Kicinski, Rob Herring, Frank Rowand,
	Dajun Jin, Alexandre Belloni, open list,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE
In-Reply-To: <20200619184747.16606-1-f.fainelli@gmail.com>

Commit 02a6efcab675 ("net: phy: allow scanning busses with missing
phys") added a special condition to return -ENODEV in case -ENODEV or
-EIO was returned from the first read of the MII_PHYSID1 register.

In case the MDIO bus data line pull-up is not strong enough, the MDIO
bus controller will not flag this as a read error. This can happen when
a pluggable daughter card is not connected and weak internal pull-ups
are used (since that is the only option, otherwise the pins are
floating).

The second read of MII_PHYSID2 will be correctly flagged an error
though, but now we will return -EIO which will be treated as a hard
error, thus preventing MDIO bus scanning loops to continue succesfully.

Apply the same logic to both register reads, thus allowing the scanning
logic to proceed.

Fixes: 02a6efcab675 ("net: phy: allow scanning busses with missing phys")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/phy_device.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 04946de74fa0..85ba95b598b5 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -794,8 +794,10 @@ static int get_phy_id(struct mii_bus *bus, int addr, u32 *phy_id,
 
 	/* Grab the bits from PHYIR2, and put them in the lower half */
 	phy_reg = mdiobus_read(bus, addr, MII_PHYSID2);
-	if (phy_reg < 0)
-		return -EIO;
+	if (phy_reg < 0) {
+		/* returning -ENODEV doesn't stop bus scanning */
+		return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO;
+	}
 
 	*phy_id |= phy_reg;
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH net v2 0/2] net: phy: MDIO bus scanning fixes
From: Florian Fainelli @ 2020-06-19 18:47 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Jakub Kicinski, Rob Herring, Frank Rowand,
	Dajun Jin, Alexandre Belloni, open list,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE

Hi all,

This patch series fixes two problems with the current MDIO bus scanning
logic which was identified while moving from 4.9 to 5.4 on devices that
do rely on scanning the MDIO bus at runtime because they use pluggable
cards.

Changes in v2:

- added comment explaining the special value of -ENODEV
- added Andrew's Reviewed-by tag

Florian Fainelli (2):
  of: of_mdio: Correct loop scanning logic
  net: phy: Check harder for errors in get_phy_id()

 drivers/net/phy/phy_device.c | 6 ++++--
 drivers/of/of_mdio.c         | 9 +++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCH net 3/3] net: ethtool: sync netdev_features_strings order with enum netdev_features
From: Alexander Lobakin @ 2020-06-19 18:39 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Michal Kubecek, Florian Fainelli, Andrew Lunn, Jiri Pirko,
	Antoine Tenart, Steffen Klassert, Aya Levin, Tom Herbert,
	Alexander Lobakin, netdev, linux-kernel
In-Reply-To: <x6AQUs_HEHFh9N-5HYIEIDvv9krP6Fg6OgEuqUBC6jHmWwaeXSkyLVi05uelpCPAZXlXKlJqbJk8ox3xkIs33KVna41w5es0wJlc-cQhb8g=@pm.me>

The ordering of netdev_features_strings[] makes no sense when it comes
to user interaction, as list of features in `ethtool -k` input is sorted
according to the corresponding bit's position.
Instead, it *does* make sense when it comes to adding new netdev_features
or modifying existing ones. We have at least 2 occasions of forgetting to
add the strings for newly introduced features, and one of them existed
since 3.1x times till now.

Let's keep this stringtable sorted according to bit's position in enum
netdev_features, as this simplifies both reading and modification of the
source code and can help not to miss or forget anything.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/ethtool/common.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index c8e3fce6e48d..24f35d47832d 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -8,25 +8,25 @@
 const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
 	[NETIF_F_SG_BIT]			= "tx-scatter-gather",
 	[NETIF_F_IP_CSUM_BIT]			= "tx-checksum-ipv4",
+
+	/* __UNUSED_NETIF_F_1 - deprecated */
+
 	[NETIF_F_HW_CSUM_BIT]			= "tx-checksum-ip-generic",
 	[NETIF_F_IPV6_CSUM_BIT]			= "tx-checksum-ipv6",
 	[NETIF_F_HIGHDMA_BIT]			= "highdma",
 	[NETIF_F_FRAGLIST_BIT]			= "tx-scatter-gather-fraglist",
 	[NETIF_F_HW_VLAN_CTAG_TX_BIT]		= "tx-vlan-hw-insert",
-
 	[NETIF_F_HW_VLAN_CTAG_RX_BIT]		= "rx-vlan-hw-parse",
 	[NETIF_F_HW_VLAN_CTAG_FILTER_BIT]	= "rx-vlan-filter",
-	[NETIF_F_HW_VLAN_STAG_TX_BIT]		= "tx-vlan-stag-hw-insert",
-	[NETIF_F_HW_VLAN_STAG_RX_BIT]		= "rx-vlan-stag-hw-parse",
-	[NETIF_F_HW_VLAN_STAG_FILTER_BIT]	= "rx-vlan-stag-filter",
 	[NETIF_F_VLAN_CHALLENGED_BIT]		= "vlan-challenged",
 	[NETIF_F_GSO_BIT]			= "tx-generic-segmentation",
 	[NETIF_F_LLTX_BIT]			= "tx-lockless",
 	[NETIF_F_NETNS_LOCAL_BIT]		= "netns-local",
 	[NETIF_F_GRO_BIT]			= "rx-gro",
-	[NETIF_F_GRO_HW_BIT]			= "rx-gro-hw",
 	[NETIF_F_LRO_BIT]			= "rx-lro",
 
+	/* NETIF_F_GSO_SHIFT = NETIF_F_TSO_BIT */
+
 	[NETIF_F_TSO_BIT]			= "tx-tcp-segmentation",
 	[NETIF_F_GSO_ROBUST_BIT]		= "tx-gso-robust",
 	[NETIF_F_TSO_ECN_BIT]			= "tx-tcp-ecn-segmentation",
@@ -43,9 +43,14 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
 	[NETIF_F_GSO_TUNNEL_REMCSUM_BIT]	= "tx-tunnel-remcsum-segmentation",
 	[NETIF_F_GSO_SCTP_BIT]			= "tx-sctp-segmentation",
 	[NETIF_F_GSO_ESP_BIT]			= "tx-esp-segmentation",
+
+	/* NETIF_F_GSO_UDP_BIT - deprecated */
+
 	[NETIF_F_GSO_UDP_L4_BIT]		= "tx-udp-segmentation",
 	[NETIF_F_GSO_FRAGLIST_BIT]		= "tx-gso-list",
 
+	/* NETIF_F_GSO_LAST = NETIF_F_GSO_FRAGLIST_BIT */
+
 	[NETIF_F_FCOE_CRC_BIT]			= "tx-checksum-fcoe-crc",
 	[NETIF_F_SCTP_CRC_BIT]			= "tx-checksum-sctp",
 	[NETIF_F_FCOE_MTU_BIT]			= "fcoe-mtu",
@@ -56,16 +61,25 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
 	[NETIF_F_LOOPBACK_BIT]			= "loopback",
 	[NETIF_F_RXFCS_BIT]			= "rx-fcs",
 	[NETIF_F_RXALL_BIT]			= "rx-all",
+	[NETIF_F_HW_VLAN_STAG_TX_BIT]		= "tx-vlan-stag-hw-insert",
+	[NETIF_F_HW_VLAN_STAG_RX_BIT]		= "rx-vlan-stag-hw-parse",
+	[NETIF_F_HW_VLAN_STAG_FILTER_BIT]	= "rx-vlan-stag-filter",
 	[NETIF_F_HW_L2FW_DOFFLOAD_BIT]		= "l2-fwd-offload",
+
 	[NETIF_F_HW_TC_BIT]			= "hw-tc-offload",
 	[NETIF_F_HW_ESP_BIT]			= "esp-hw-offload",
 	[NETIF_F_HW_ESP_TX_CSUM_BIT]		= "esp-tx-csum-hw-offload",
 	[NETIF_F_RX_UDP_TUNNEL_PORT_BIT]	= "rx-udp_tunnel-port-offload",
-	[NETIF_F_HW_TLS_RECORD_BIT]		= "tls-hw-record",
 	[NETIF_F_HW_TLS_TX_BIT]			= "tls-hw-tx-offload",
 	[NETIF_F_HW_TLS_RX_BIT]			= "tls-hw-rx-offload",
+
+	[NETIF_F_GRO_HW_BIT]			= "rx-gro-hw",
+	[NETIF_F_HW_TLS_RECORD_BIT]		= "tls-hw-record",
 	[NETIF_F_GRO_FRAGLIST_BIT]		= "rx-gro-list",
+
 	[NETIF_F_HW_MACSEC_BIT]			= "macsec-hw-offload",
+
+	/* NETDEV_FEATURE_COUNT */
 };
 
 const char
-- 
2.27.0



^ permalink raw reply related

* Re: [PATCH net 2/2] net: phy: Check harder for errors in get_phy_id()
From: Florian Fainelli @ 2020-06-19 18:42 UTC (permalink / raw)
  To: Russell King - ARM Linux admin, Andrew Lunn
  Cc: netdev, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Rob Herring, Frank Rowand, Dajun Jin, Alexandre Belloni,
	open list, open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE
In-Reply-To: <20200619133040.GO1551@shell.armlinux.org.uk>



On 6/19/2020 6:30 AM, Russell King - ARM Linux admin wrote:
> On Fri, Jun 19, 2020 at 03:26:59PM +0200, Andrew Lunn wrote:
>> On Thu, Jun 18, 2020 at 09:47:59PM -0700, Florian Fainelli wrote:
>>> Commit 02a6efcab675 ("net: phy: allow scanning busses with missing
>>> phys") added a special condition to return -ENODEV in case -ENODEV or
>>> -EIO was returned from the first read of the MII_PHYSID1 register.
>>>
>>> In case the MDIO bus data line pull-up is not strong enough, the MDIO
>>> bus controller will not flag this as a read error. This can happen when
>>> a pluggable daughter card is not connected and weak internal pull-ups
>>> are used (since that is the only option, otherwise the pins are
>>> floating).
>>>
>>> The second read of MII_PHYSID2 will be correctly flagged an error
>>> though, but now we will return -EIO which will be treated as a hard
>>> error, thus preventing MDIO bus scanning loops to continue succesfully.
>>>
>>> Apply the same logic to both register reads, thus allowing the scanning
>>> logic to proceed.
>>
>> Hi Florian
>>
>> Maybe extend the kerneldoc for this function to document the return
>> values and there special meanings?
> 
> You mean like the patch I sent yesterday?
> 
>> BTW: Did you look at get_phy_c45_ids()? Is it using the correct return
>> value? Given the current work being done to extend scanning to C45,
>> maybe it needs reviewing for issues like this.
> 
> And the updates I sent for this yesterday? ;)

When Russell's patches land, they will address this correctly and
because I did not want to introduce any conflicts, this is not addressed
by this two patch series.
-- 
Florian

^ permalink raw reply

* Re: [PATCH bpf-next 1/2] bpf: switch most helper return values from 32-bit int to 64-bit long
From: Andrii Nakryiko @ 2020-06-19 18:41 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: John Fastabend, Andrii Nakryiko, bpf, Networking,
	Alexei Starovoitov, Kernel Team
In-Reply-To: <45321002-2676-0f5b-c729-5526e503ebd2@iogearbox.net>

On Fri, Jun 19, 2020 at 6:08 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 6/19/20 2:39 AM, John Fastabend wrote:
> > John Fastabend wrote:
> >> Andrii Nakryiko wrote:
> >>> On Thu, Jun 18, 2020 at 11:58 AM John Fastabend
> >>> <john.fastabend@gmail.com> wrote:
> >
> > [...]
> >
> >>> That would be great. Self-tests do work, but having more testing with
> >>> real-world application would certainly help as well.
> >>
> >> Thanks for all the follow up.
> >>
> >> I ran the change through some CI on my side and it passed so I can
> >> complain about a few shifts here and there or just update my code or
> >> just not change the return types on my side but I'm convinced its OK
> >> in most cases and helps in some so...
> >>
> >> Acked-by: John Fastabend <john.fastabend@gmail.com>
> >
> > I'll follow this up with a few more selftests to capture a couple of our
> > patterns. These changes are subtle and I worry a bit that additional
> > <<,s>> pattern could have the potential to break something.
> >
> > Another one we didn't discuss that I found in our code base is feeding
> > the output of a probe_* helper back into the size field (after some
> > alu ops) of subsequent probe_* call. Unfortunately, the tests I ran
> > today didn't cover that case.
> >
> > I'll put it on the list tomorrow and encode these in selftests. I'll
> > let the mainainers decide if they want to wait for those or not.
>
> Given potential fragility on verifier side, my preference would be that we
> have the known variations all covered in selftests before moving forward in
> order to make sure they don't break in any way. Back in [0] I've seen mostly
> similar cases in the way John mentioned in other projects, iirc, sysdig was
> another one. If both of you could hack up the remaining cases we need to
> cover and then submit a combined series, that would be great. I don't think
> we need to rush this optimization w/o necessary selftests.

There is no rush, but there is also no reason to delay it. I'd rather
land it early in the libbpf release cycle and let people try it in
their prod environments, for those concerned about such code patterns.

I don't have a list of all the patterns that we might need to test.
Going through all open-source BPF source code to identify possible
patterns and then coding them up in minimal selftests is a bit too
much for me, honestly. Additionally, some of those patterns will most
probably be broken in no-ALU32 and making them work with assembly and
other clever tricks is actually where the majority of time usually
goes. Also, simple selftests might not actually trigger pathological
codegen cases (because in a lot of cases register spill/pressure
triggers different codegen patterns). So I just don't believe we can
have a full piece of mind, regardless of how many selftests we add.
This test_varlen selftest is a simplification of a production code
we've had for a long while. We never bothered to contribute it as a
selftest before, which I'd say is our fault as users of BPF. Anyone
interested in ensuring regressions get detected for the way they write
BPF code, should distill them into selftests and contribute to our
test suite (like we did with PyPerf, Strobemeta, and how Jakub
Sitnicki did recently with his program).

So sure, maintainers might decide to not land this because of
potential regressions, but I tried to do my best to explain why there
shouldn't be really regressions (after all, int -> long reflects
*reality*, where everything is u64/s64 on return from BPF helper),
apart from actually testing for two patterns I knew about.

After all, even in case of regression, doing `int bla =
(int)bpf_helper_whatever(...);` is in theory equivalent to what we had
before, so it's an easy fix. Reality might require an extra compiler
barrier after that to force Clang to emit casting instructions sooner,
but that's another story.

>
> Thanks everyone,
> Daniel
>
>    [0] https://lore.kernel.org/bpf/20200421125822.14073-1-daniel@iogearbox.net/

^ permalink raw reply

* Re: [PATCH] [net/sched] Fix null pointer deref skb in tc_ctl_action
From: kernel test robot @ 2020-06-19 18:39 UTC (permalink / raw)
  To: Gaurav Singh, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Jakub Kicinski, open list:TC subsystem,
	open list
  Cc: kbuild-all, netdev
In-Reply-To: <20200618014328.28668-1-gaurav1086@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5674 bytes --]

Hi Gaurav,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.8-rc1 next-20200618]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Gaurav-Singh/Fix-null-pointer-deref-skb-in-tc_ctl_action/20200618-094734
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 1b5044021070efa3259f3e9548dc35d1eb6aa844
config: mips-randconfig-r004-20200619 (attached as .config)
compiler: mipsel-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

net/sched/act_api.c: In function 'tc_ctl_action':
>> net/sched/act_api.c:1479:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
1479 |  struct net *net = sock_net(skb->sk);
|  ^~~~~~

vim +1479 net/sched/act_api.c

90825b23a887f0 Jamal Hadi Salim  2017-07-30  1472  
c21ef3e343ae91 David Ahern       2017-04-16  1473  static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n,
c21ef3e343ae91 David Ahern       2017-04-16  1474  			 struct netlink_ext_ack *extack)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1475  {
63de5fbd31b331 Gaurav Singh      2020-06-17  1476  	if (!skb)
63de5fbd31b331 Gaurav Singh      2020-06-17  1477  		return 0;
63de5fbd31b331 Gaurav Singh      2020-06-17  1478  
3b1e0a655f8eba YOSHIFUJI Hideaki 2008-03-26 @1479  	struct net *net = sock_net(skb->sk);
90825b23a887f0 Jamal Hadi Salim  2017-07-30  1480  	struct nlattr *tca[TCA_ROOT_MAX + 1];
63de5fbd31b331 Gaurav Singh      2020-06-17  1481  	u32 portid = NETLINK_CB(skb).portid;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1482  	int ret = 0, ovr = 0;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1483  
0b0f43fe2e7291 Jamal Hadi Salim  2016-06-05  1484  	if ((n->nlmsg_type != RTM_GETACTION) &&
0b0f43fe2e7291 Jamal Hadi Salim  2016-06-05  1485  	    !netlink_capable(skb, CAP_NET_ADMIN))
dfc47ef8639fac Eric W. Biederman 2012-11-16  1486  		return -EPERM;
dfc47ef8639fac Eric W. Biederman 2012-11-16  1487  
8cb081746c031f Johannes Berg     2019-04-26  1488  	ret = nlmsg_parse_deprecated(n, sizeof(struct tcamsg), tca,
8cb081746c031f Johannes Berg     2019-04-26  1489  				     TCA_ROOT_MAX, NULL, extack);
7ba699c604ab81 Patrick McHardy   2008-01-22  1490  	if (ret < 0)
7ba699c604ab81 Patrick McHardy   2008-01-22  1491  		return ret;
7ba699c604ab81 Patrick McHardy   2008-01-22  1492  
7ba699c604ab81 Patrick McHardy   2008-01-22  1493  	if (tca[TCA_ACT_TAB] == NULL) {
84ae017a007764 Alexander Aring   2018-02-15  1494  		NL_SET_ERR_MSG(extack, "Netlink action attributes missing");
^1da177e4c3f41 Linus Torvalds    2005-04-16  1495  		return -EINVAL;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1496  	}
^1da177e4c3f41 Linus Torvalds    2005-04-16  1497  
cc7ec456f82da7 Eric Dumazet      2011-01-19  1498  	/* n->nlmsg_flags & NLM_F_CREATE */
^1da177e4c3f41 Linus Torvalds    2005-04-16  1499  	switch (n->nlmsg_type) {
^1da177e4c3f41 Linus Torvalds    2005-04-16  1500  	case RTM_NEWACTION:
^1da177e4c3f41 Linus Torvalds    2005-04-16  1501  		/* we are going to assume all other flags
25985edcedea63 Lucas De Marchi   2011-03-30  1502  		 * imply create only if it doesn't exist
^1da177e4c3f41 Linus Torvalds    2005-04-16  1503  		 * Note that CREATE | EXCL implies that
^1da177e4c3f41 Linus Torvalds    2005-04-16  1504  		 * but since we want avoid ambiguity (eg when flags
^1da177e4c3f41 Linus Torvalds    2005-04-16  1505  		 * is zero) then just set this
^1da177e4c3f41 Linus Torvalds    2005-04-16  1506  		 */
^1da177e4c3f41 Linus Torvalds    2005-04-16  1507  		if (n->nlmsg_flags & NLM_F_REPLACE)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1508  			ovr = 1;
aea0d727899140 Alexander Aring   2018-02-15  1509  		ret = tcf_action_add(net, tca[TCA_ACT_TAB], n, portid, ovr,
aea0d727899140 Alexander Aring   2018-02-15  1510  				     extack);
^1da177e4c3f41 Linus Torvalds    2005-04-16  1511  		break;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1512  	case RTM_DELACTION:
7316ae88c43d47 Tom Goff          2010-03-19  1513  		ret = tca_action_gd(net, tca[TCA_ACT_TAB], n,
84ae017a007764 Alexander Aring   2018-02-15  1514  				    portid, RTM_DELACTION, extack);
^1da177e4c3f41 Linus Torvalds    2005-04-16  1515  		break;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1516  	case RTM_GETACTION:
7316ae88c43d47 Tom Goff          2010-03-19  1517  		ret = tca_action_gd(net, tca[TCA_ACT_TAB], n,
84ae017a007764 Alexander Aring   2018-02-15  1518  				    portid, RTM_GETACTION, extack);
^1da177e4c3f41 Linus Torvalds    2005-04-16  1519  		break;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1520  	default:
^1da177e4c3f41 Linus Torvalds    2005-04-16  1521  		BUG();
^1da177e4c3f41 Linus Torvalds    2005-04-16  1522  	}
^1da177e4c3f41 Linus Torvalds    2005-04-16  1523  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1524  	return ret;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1525  }
^1da177e4c3f41 Linus Torvalds    2005-04-16  1526  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25077 bytes --]

^ permalink raw reply

* Re: [PATCH] linux++, this: rename "struct notifier_block *this"
From: Linus Torvalds @ 2020-06-19 18:37 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Linux Kernel Mailing List, Netdev, linux-arch, NetFilter
In-Reply-To: <20200618210645.GB2212102@localhost.localdomain>

On Thu, Jun 18, 2020 at 2:06 PM Alexey Dobriyan <adobriyan@gmail.com> wrote:
>
> Rename
>         struct notifier_block *this
> to
>         struct notifier_block *nb
>
> "nb" is arguably a better name for notifier block.

Maybe it's a better name. But it doesn't seem worth it.

Because C++ reserved words are entirely irrelevant.

We did this same dance almost three decades ago, and the fact is, C++
has other reserved words that make it all pointless.

There is no way I will accept the renaming of various "new" variables.
We did it, it was bad, we undid it, and we now have a _lot_ more uses
of 'new' and 'old', and no, we're not changing it for a braindead
language that isn't relevant to the kernel.

The fact is, C++ chose bad identifiers to make reserved words.

If you want to build the kernel with C++, you'd be a lot better off just doing

   /* C++ braindamage */
   #define this __this
   #define new __new

and deal with that instead.

Because no, the 'new' renaming will never happen, and while 'this'
isn't nearly as common or relevant a name, once you have the same
issue with 'new', what's the point of trying to deal with 'this'?

             Linus

^ permalink raw reply

* Re: [PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
From: Eugenio Perez Martin @ 2020-06-19 18:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Michael S. Tsirkin, linux-kernel, kvm list, virtualization,
	netdev, Jason Wang
In-Reply-To: <CAJaqyWdwgy0fmReOgLfL4dAv-E+5k_7z3d9M+vHqt0aO2SmOFg@mail.gmail.com>

On Fri, Jun 19, 2020 at 8:07 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Mon, Jun 15, 2020 at 2:28 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Thu, Jun 11, 2020 at 5:22 PM Konrad Rzeszutek Wilk
> > <konrad.wilk@oracle.com> wrote:
> > >
> > > On Thu, Jun 11, 2020 at 07:34:19AM -0400, Michael S. Tsirkin wrote:
> > > > As testing shows no performance change, switch to that now.
> > >
> > > What kind of testing? 100GiB? Low latency?
> > >
> >
> > Hi Konrad.
> >
> > I tested this version of the patch:
> > https://lkml.org/lkml/2019/10/13/42
> >
> > It was tested for throughput with DPDK's testpmd (as described in
> > http://doc.dpdk.org/guides/howto/virtio_user_as_exceptional_path.html)
> > and kernel pktgen. No latency tests were performed by me. Maybe it is
> > interesting to perform a latency test or just a different set of tests
> > over a recent version.
> >
> > Thanks!
>
> I have repeated the tests with v9, and results are a little bit different:
> * If I test opening it with testpmd, I see no change between versions
> * If I forward packets between two vhost-net interfaces in the guest
> using a linux bridge in the host:
>   - netperf UDP_STREAM shows a performance increase of 1.8, almost
> doubling performance. This gets lower as frame size increase.
>   - rests of the test goes noticeably worse: UDP_RR goes from ~6347
> transactions/sec to 5830
>   - TCP_STREAM goes from ~10.7 gbps to ~7Gbps
>   - TCP_RR from 6223.64 transactions/sec to 5739.44

And I forgot to add: It seems that avoiding IOV length math helps,
since performance increases in all tests from patch 02/11 ("vhost: use
batched get_vq_desc version") to 11/11 ("vhost: drop head based
APIs").


^ permalink raw reply

* Re: [PATCH 09/11] bpf: Add d_path helper
From: Andrii Nakryiko @ 2020-06-19 18:25 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Networking, bpf,
	Song Liu, Yonghong Song, Martin KaFai Lau, David Miller,
	John Fastabend, Wenbo Zhang, KP Singh, Andrii Nakryiko,
	Brendan Gregg, Florent Revest, Al Viro
In-Reply-To: <20200619133124.GJ2465907@krava>

On Fri, Jun 19, 2020 at 6:31 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Thu, Jun 18, 2020 at 09:35:10PM -0700, Andrii Nakryiko wrote:
> > On Tue, Jun 16, 2020 at 3:07 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > >
> > > Adding d_path helper function that returns full path
> > > for give 'struct path' object, which needs to be the
> > > kernel BTF 'path' object.
> > >
> > > The helper calls directly d_path function.
> > >
> > > Updating also bpf.h tools uapi header and adding
> > > 'path' to bpf_helpers_doc.py script.
> > >
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > >  include/linux/bpf.h            |  4 ++++
> > >  include/uapi/linux/bpf.h       | 14 ++++++++++++-
> > >  kernel/bpf/btf_ids.c           | 11 ++++++++++
> > >  kernel/trace/bpf_trace.c       | 38 ++++++++++++++++++++++++++++++++++
> > >  scripts/bpf_helpers_doc.py     |  2 ++
> > >  tools/include/uapi/linux/bpf.h | 14 ++++++++++++-
> > >  6 files changed, 81 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index a94e85c2ec50..d35265b6c574 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -1752,5 +1752,9 @@ extern int bpf_skb_output_btf_ids[];
> > >  extern int bpf_seq_printf_btf_ids[];
> > >  extern int bpf_seq_write_btf_ids[];
> > >  extern int bpf_xdp_output_btf_ids[];
> > > +extern int bpf_d_path_btf_ids[];
> > > +
> > > +extern int btf_whitelist_d_path[];
> > > +extern int btf_whitelist_d_path_cnt;
> >
> > So with suggestion from previous patch, this would be declared as:
> >
> > extern const struct btf_id_set btf_whitelist_d_path;
>
> yes
>
> SNIP
>
> > >  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> > >   * function eBPF program intends to call
> > > diff --git a/kernel/bpf/btf_ids.c b/kernel/bpf/btf_ids.c
> > > index d8d0df162f04..853c8fd59b06 100644
> > > --- a/kernel/bpf/btf_ids.c
> > > +++ b/kernel/bpf/btf_ids.c
> > > @@ -13,3 +13,14 @@ BTF_ID(struct, seq_file)
> > >
> > >  BTF_ID_LIST(bpf_xdp_output_btf_ids)
> > >  BTF_ID(struct, xdp_buff)
> > > +
> > > +BTF_ID_LIST(bpf_d_path_btf_ids)
> > > +BTF_ID(struct, path)
> > > +
> > > +BTF_WHITELIST_ENTRY(btf_whitelist_d_path)
> > > +BTF_ID(func, vfs_truncate)
> > > +BTF_ID(func, vfs_fallocate)
> > > +BTF_ID(func, dentry_open)
> > > +BTF_ID(func, vfs_getattr)
> > > +BTF_ID(func, filp_close)
> > > +BTF_WHITELIST_END(btf_whitelist_d_path)
> >
> > Oh, so that's why you added btf_ids.c. Do you think centralizing all
> > those BTF ID lists in one file is going to be more convenient? I lean
> > towards keeping them closer to where they are used, as it was with all
> > those helper BTF IDS. But I wonder what others think...
>
> either way works for me, but then BTF_ID_* macros needs to go
> to include/linux/btf_ids.h header right?
>

given it's internal API, I'd probably just put it in
include/linux/btf.h or include/linux/bpf.h, don't think we need extra
header just for these


> jirka
>
> >
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index c1866d76041f..0ff5d8434d40 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -1016,6 +1016,42 @@ static const struct bpf_func_proto bpf_send_signal_thread_proto = {
> > >         .arg1_type      = ARG_ANYTHING,
> > >  };
> > >
> >
> > [...]
> >
>

^ permalink raw reply

* [RFC v9 03/11] vhost/net: pass net specific struct pointer
From: Eugenio Pérez @ 2020-06-19 18:22 UTC (permalink / raw)
  To: mst
  Cc: kvm list, Stefano Garzarella, virtualization, linux-kernel,
	netdev, Stefan Hajnoczi, Jason Wang
In-Reply-To: <20200619182302.850-1-eperezma@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>

In preparation for further cleanup, pass net specific pointer
to ubuf callbacks so we can move net specific fields
out to net structures.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/net.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index e992decfec53..81900ce673b4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -94,7 +94,7 @@ struct vhost_net_ubuf_ref {
 	 */
 	atomic_t refcount;
 	wait_queue_head_t wait;
-	struct vhost_virtqueue *vq;
+	struct vhost_net_virtqueue *nvq;
 };
 
 #define VHOST_NET_BATCH 64
@@ -231,7 +231,7 @@ static void vhost_net_enable_zcopy(int vq)
 }
 
 static struct vhost_net_ubuf_ref *
-vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
+vhost_net_ubuf_alloc(struct vhost_net_virtqueue *nvq, bool zcopy)
 {
 	struct vhost_net_ubuf_ref *ubufs;
 	/* No zero copy backend? Nothing to count. */
@@ -242,7 +242,7 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
 		return ERR_PTR(-ENOMEM);
 	atomic_set(&ubufs->refcount, 1);
 	init_waitqueue_head(&ubufs->wait);
-	ubufs->vq = vq;
+	ubufs->nvq = nvq;
 	return ubufs;
 }
 
@@ -384,13 +384,13 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 {
 	struct vhost_net_ubuf_ref *ubufs = ubuf->ctx;
-	struct vhost_virtqueue *vq = ubufs->vq;
+	struct vhost_net_virtqueue *nvq = ubufs->nvq;
 	int cnt;
 
 	rcu_read_lock_bh();
 
 	/* set len to mark this desc buffers done DMA */
-	vq->heads[ubuf->desc].len = success ?
+	nvq->vq.heads[ubuf->desc].len = success ?
 		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
 	cnt = vhost_net_ubuf_put(ubufs);
 
@@ -402,7 +402,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 	 * less than 10% of times).
 	 */
 	if (cnt <= 1 || !(cnt % 16))
-		vhost_poll_queue(&vq->poll);
+		vhost_poll_queue(&nvq->vq.poll);
 
 	rcu_read_unlock_bh();
 }
@@ -1526,7 +1526,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 	/* start polling new socket */
 	oldsock = vhost_vq_get_backend(vq);
 	if (sock != oldsock) {
-		ubufs = vhost_net_ubuf_alloc(vq,
+		ubufs = vhost_net_ubuf_alloc(nvq,
 					     sock && vhost_sock_zcopy(sock));
 		if (IS_ERR(ubufs)) {
 			r = PTR_ERR(ubufs);
-- 
2.18.1


^ permalink raw reply related

* [RFC v9 05/11] vhost: format-independent API for used buffers
From: Eugenio Pérez @ 2020-06-19 18:22 UTC (permalink / raw)
  To: mst
  Cc: kvm list, Stefano Garzarella, virtualization, linux-kernel,
	netdev, Stefan Hajnoczi, Jason Wang
In-Reply-To: <20200619182302.850-1-eperezma@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>

Add a new API that doesn't assume used ring, heads, etc.
For now, we keep the old APIs around to make it easier
to convert drivers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/vhost.c | 73 +++++++++++++++++++++++++++++++++++++------
 drivers/vhost/vhost.h | 17 +++++++++-
 2 files changed, 79 insertions(+), 11 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index ba8da8785ede..e227e0667790 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2334,13 +2334,12 @@ static void unfetch_descs(struct vhost_virtqueue *vq)
  * number of output then some number of input descriptors, it's actually two
  * iovecs, but we pack them into one and note how many of each there were.
  *
- * This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found.  A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_virtqueue *vq,
-		      struct iovec iov[], unsigned int iov_size,
-		      unsigned int *out_num, unsigned int *in_num,
-		      struct vhost_log *log, unsigned int *log_num)
+ * This function returns a value > 0 if a descriptor was found, or 0 if none were found.
+ * A negative code is returned on error. */
+int vhost_get_avail_buf(struct vhost_virtqueue *vq, struct vhost_buf *buf,
+			struct iovec iov[], unsigned int iov_size,
+			unsigned int *out_num, unsigned int *in_num,
+			struct vhost_log *log, unsigned int *log_num)
 {
 	int ret = fetch_descs(vq);
 	int i;
@@ -2353,6 +2352,8 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 	*out_num = *in_num = 0;
 	if (unlikely(log))
 		*log_num = 0;
+	buf->in_len = buf->out_len = 0;
+	buf->descs = 0;
 
 	for (i = vq->first_desc; i < vq->ndescs; ++i) {
 		unsigned iov_count = *in_num + *out_num;
@@ -2382,6 +2383,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 			/* If this is an input descriptor,
 			 * increment that count. */
 			*in_num += ret;
+			buf->in_len += desc->len;
 			if (unlikely(log && ret)) {
 				log[*log_num].addr = desc->addr;
 				log[*log_num].len = desc->len;
@@ -2397,9 +2399,11 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 				goto err;
 			}
 			*out_num += ret;
+			buf->out_len += desc->len;
 		}
 
-		ret = desc->id;
+		buf->id = desc->id;
+		++buf->descs;
 
 		if (!(desc->flags & VRING_DESC_F_NEXT))
 			break;
@@ -2407,12 +2411,41 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 
 	vq->first_desc = i + 1;
 
-	return ret;
+	return 1;
 
 err:
 	unfetch_descs(vq);
 
-	return ret ? ret : vq->num;
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vhost_get_avail_buf);
+
+/* Reverse the effect of vhost_get_avail_buf. Useful for error handling. */
+void vhost_discard_avail_bufs(struct vhost_virtqueue *vq,
+			      struct vhost_buf *buf, unsigned count)
+{
+	vhost_discard_vq_desc(vq, count);
+}
+EXPORT_SYMBOL_GPL(vhost_discard_avail_bufs);
+
+/* This function returns the descriptor number found, or vq->num (which is
+ * never a valid descriptor number) if none was found.  A negative code is
+ * returned on error. */
+int vhost_get_vq_desc(struct vhost_virtqueue *vq,
+		      struct iovec iov[], unsigned int iov_size,
+		      unsigned int *out_num, unsigned int *in_num,
+		      struct vhost_log *log, unsigned int *log_num)
+{
+	struct vhost_buf buf;
+	int ret = vhost_get_avail_buf(vq, &buf,
+				      iov, iov_size, out_num, in_num,
+				      log, log_num);
+
+	if (likely(ret > 0))
+		return buf.id;
+	if (likely(!ret))
+		return vq->num;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
@@ -2506,6 +2539,26 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
 }
 EXPORT_SYMBOL_GPL(vhost_add_used);
 
+int vhost_put_used_buf(struct vhost_virtqueue *vq, struct vhost_buf *buf)
+{
+	return vhost_add_used(vq, buf->id, buf->in_len);
+}
+EXPORT_SYMBOL_GPL(vhost_put_used_buf);
+
+int vhost_put_used_n_bufs(struct vhost_virtqueue *vq,
+			  struct vhost_buf *bufs, unsigned count)
+{
+	unsigned i;
+
+	for (i = 0; i < count; ++i) {
+		vq->heads[i].id = cpu_to_vhost32(vq, bufs[i].id);
+		vq->heads[i].len = cpu_to_vhost32(vq, bufs[i].in_len);
+	}
+
+	return vhost_add_used_n(vq, vq->heads, count);
+}
+EXPORT_SYMBOL_GPL(vhost_put_used_n_bufs);
+
 static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
 	__u16 old, new;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index fed36af5c444..28eea0155efb 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -67,6 +67,13 @@ struct vhost_desc {
 	u16 id;
 };
 
+struct vhost_buf {
+	u32 out_len;
+	u32 in_len;
+	u16 descs;
+	u16 id;
+};
+
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
 	struct vhost_dev *dev;
@@ -195,7 +202,12 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
 		      unsigned int *out_num, unsigned int *in_num,
 		      struct vhost_log *log, unsigned int *log_num);
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
-
+int vhost_get_avail_buf(struct vhost_virtqueue *, struct vhost_buf *buf,
+			struct iovec iov[], unsigned int iov_count,
+			unsigned int *out_num, unsigned int *in_num,
+			struct vhost_log *log, unsigned int *log_num);
+void vhost_discard_avail_bufs(struct vhost_virtqueue *,
+			      struct vhost_buf *, unsigned count);
 int vhost_vq_init_access(struct vhost_virtqueue *);
 int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
 int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
@@ -204,6 +216,9 @@ void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
 			       unsigned int id, int len);
 void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
 			       struct vring_used_elem *heads, unsigned count);
+int vhost_put_used_buf(struct vhost_virtqueue *, struct vhost_buf *buf);
+int vhost_put_used_n_bufs(struct vhost_virtqueue *,
+			  struct vhost_buf *bufs, unsigned count);
 void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
 void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *);
-- 
2.18.1


^ permalink raw reply related

* [RFC v9 07/11] vhost/net: avoid iov length math
From: Eugenio Pérez @ 2020-06-19 18:22 UTC (permalink / raw)
  To: mst
  Cc: kvm list, Stefano Garzarella, virtualization, linux-kernel,
	netdev, Stefan Hajnoczi, Jason Wang
In-Reply-To: <20200619182302.850-1-eperezma@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>

Now that API exposes buffer length, we no longer need to
scan IOVs to figure it out.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/net.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 871214d8c64d..3356b249bf64 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -607,11 +607,9 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net)
 }
 
 static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter,
-			    size_t hdr_size, int out)
+			    size_t len, size_t hdr_size, int out)
 {
 	/* Skip header. TODO: support TSO. */
-	size_t len = iov_length(vq->iov, out);
-
 	iov_iter_init(iter, WRITE, vq->iov, out, len);
 	iov_iter_advance(iter, hdr_size);
 
@@ -640,7 +638,7 @@ static int get_tx_bufs(struct vhost_net *net,
 	}
 
 	/* Sanity check */
-	*len = init_iov_iter(vq, &msg->msg_iter, nvq->vhost_hlen, *out);
+	*len = init_iov_iter(vq, &msg->msg_iter, buf->out_len, nvq->vhost_hlen, *out);
 	if (*len == 0) {
 		vq_err(vq, "Unexpected header len for TX: %zd expected %zd\n",
 			*len, nvq->vhost_hlen);
@@ -1081,7 +1079,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 			nlogs += *log_num;
 			log += *log_num;
 		}
-		len = iov_length(vq->iov + seg, in);
+		len = bufs[bufcount].in_len;
 		datalen -= len;
 		++bufcount;
 		seg += in;
-- 
2.18.1


^ permalink raw reply related

* [RFC v9 10/11] vhost/vsock: switch to the buf API
From: Eugenio Pérez @ 2020-06-19 18:23 UTC (permalink / raw)
  To: mst
  Cc: kvm list, Stefano Garzarella, virtualization, linux-kernel,
	netdev, Stefan Hajnoczi, Jason Wang
In-Reply-To: <20200619182302.850-1-eperezma@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>

A straight-forward conversion.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/vsock.c | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index a483cec31d5c..61c6d3dd2ae3 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -103,7 +103,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		unsigned out, in;
 		size_t nbytes;
 		size_t iov_len, payload_len;
-		int head;
+		struct vhost_buf buf;
+		int ret;
 
 		spin_lock_bh(&vsock->send_pkt_list_lock);
 		if (list_empty(&vsock->send_pkt_list)) {
@@ -117,16 +118,17 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		list_del_init(&pkt->list);
 		spin_unlock_bh(&vsock->send_pkt_list_lock);
 
-		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
-					 &out, &in, NULL, NULL);
-		if (head < 0) {
+		ret = vhost_get_avail_buf(vq, &buf,
+					  vq->iov, ARRAY_SIZE(vq->iov),
+					  &out, &in, NULL, NULL);
+		if (ret < 0) {
 			spin_lock_bh(&vsock->send_pkt_list_lock);
 			list_add(&pkt->list, &vsock->send_pkt_list);
 			spin_unlock_bh(&vsock->send_pkt_list_lock);
 			break;
 		}
 
-		if (head == vq->num) {
+		if (!ret) {
 			spin_lock_bh(&vsock->send_pkt_list_lock);
 			list_add(&pkt->list, &vsock->send_pkt_list);
 			spin_unlock_bh(&vsock->send_pkt_list_lock);
@@ -186,7 +188,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		 */
 		virtio_transport_deliver_tap_pkt(pkt);
 
-		vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
+		buf.in_len = sizeof(pkt->hdr) + payload_len;
+		vhost_put_used_buf(vq, &buf);
 		added = true;
 
 		pkt->off += payload_len;
@@ -440,7 +443,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 	struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock,
 						 dev);
 	struct virtio_vsock_pkt *pkt;
-	int head, pkts = 0, total_len = 0;
+	int ret, pkts = 0, total_len = 0;
+	struct vhost_buf buf;
 	unsigned int out, in;
 	bool added = false;
 
@@ -461,12 +465,13 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			goto no_more_replies;
 		}
 
-		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
-					 &out, &in, NULL, NULL);
-		if (head < 0)
+		ret = vhost_get_avail_buf(vq, &buf,
+					  vq->iov, ARRAY_SIZE(vq->iov),
+					  &out, &in, NULL, NULL);
+		if (ret < 0)
 			break;
 
-		if (head == vq->num) {
+		if (!ret) {
 			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
 				vhost_disable_notify(&vsock->dev, vq);
 				continue;
@@ -494,7 +499,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			virtio_transport_free_pkt(pkt);
 
 		len += sizeof(pkt->hdr);
-		vhost_add_used(vq, head, len);
+		buf.in_len = len;
+		vhost_put_used_buf(vq, &buf);
 		total_len += len;
 		added = true;
 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
-- 
2.18.1


^ permalink raw reply related

* [RFC v9 09/11] vhost/scsi: switch to buf APIs
From: Eugenio Pérez @ 2020-06-19 18:23 UTC (permalink / raw)
  To: mst
  Cc: kvm list, Stefano Garzarella, virtualization, linux-kernel,
	netdev, Stefan Hajnoczi, Jason Wang
In-Reply-To: <20200619182302.850-1-eperezma@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>

Switch to buf APIs. Doing this exposes a spec violation in vhost scsi:
all used bufs are marked with length 0.
Fix that is left for another day.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/scsi.c | 73 ++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 29 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 6fb4d7ecfa19..f09ca5244ab9 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -71,8 +71,8 @@ struct vhost_scsi_inflight {
 };
 
 struct vhost_scsi_cmd {
-	/* Descriptor from vhost_get_vq_desc() for virt_queue segment */
-	int tvc_vq_desc;
+	/* Descriptor from vhost_get_avail_buf() for virt_queue segment */
+	struct vhost_buf tvc_vq_desc;
 	/* virtio-scsi initiator task attribute */
 	int tvc_task_attr;
 	/* virtio-scsi response incoming iovecs */
@@ -213,7 +213,7 @@ struct vhost_scsi {
  * Context for processing request and control queue operations.
  */
 struct vhost_scsi_ctx {
-	int head;
+	struct vhost_buf buf;
 	unsigned int out, in;
 	size_t req_size, rsp_size;
 	size_t out_size, in_size;
@@ -443,6 +443,20 @@ static int vhost_scsi_check_stop_free(struct se_cmd *se_cmd)
 	return target_put_sess_cmd(se_cmd);
 }
 
+/* Signal to guest that request finished with no input buffer. */
+/* TODO calling this when writing into buffer and most likely a bug */
+static void vhost_scsi_signal_noinput(struct vhost_dev *vdev,
+				      struct vhost_virtqueue *vq,
+				      struct vhost_buf *bufp)
+{
+	struct vhost_buf buf = *bufp;
+
+	buf.in_len = 0;
+	vhost_put_used_buf(vq, &buf);
+	vhost_signal(vdev, vq);
+}
+
+
 static void
 vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 {
@@ -450,7 +464,8 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 	struct virtio_scsi_event *event = &evt->event;
 	struct virtio_scsi_event __user *eventp;
 	unsigned out, in;
-	int head, ret;
+	struct vhost_buf buf;
+	int ret;
 
 	if (!vhost_vq_get_backend(vq)) {
 		vs->vs_events_missed = true;
@@ -459,14 +474,14 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 
 again:
 	vhost_disable_notify(&vs->dev, vq);
-	head = vhost_get_vq_desc(vq, vq->iov,
-			ARRAY_SIZE(vq->iov), &out, &in,
-			NULL, NULL);
-	if (head < 0) {
+	ret = vhost_get_avail_buf(vq, &buf,
+				  vq->iov, ARRAY_SIZE(vq->iov), &out, &in,
+				  NULL, NULL);
+	if (ret < 0) {
 		vs->vs_events_missed = true;
 		return;
 	}
-	if (head == vq->num) {
+	if (!ret) {
 		if (vhost_enable_notify(&vs->dev, vq))
 			goto again;
 		vs->vs_events_missed = true;
@@ -488,7 +503,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 	eventp = vq->iov[out].iov_base;
 	ret = __copy_to_user(eventp, event, sizeof(*event));
 	if (!ret)
-		vhost_add_used_and_signal(&vs->dev, vq, head, 0);
+		vhost_scsi_signal_noinput(&vs->dev, vq, &buf);
 	else
 		vq_err(vq, "Faulted on vhost_scsi_send_event\n");
 }
@@ -549,7 +564,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 		ret = copy_to_iter(&v_rsp, sizeof(v_rsp), &iov_iter);
 		if (likely(ret == sizeof(v_rsp))) {
 			struct vhost_scsi_virtqueue *q;
-			vhost_add_used(cmd->tvc_vq, cmd->tvc_vq_desc, 0);
+			vhost_put_used_buf(cmd->tvc_vq, &cmd->tvc_vq_desc);
 			q = container_of(cmd->tvc_vq, struct vhost_scsi_virtqueue, vq);
 			vq = q - vs->vqs;
 			__set_bit(vq, signal);
@@ -793,7 +808,7 @@ static void vhost_scsi_submission_work(struct work_struct *work)
 static void
 vhost_scsi_send_bad_target(struct vhost_scsi *vs,
 			   struct vhost_virtqueue *vq,
-			   int head, unsigned out)
+			   struct vhost_buf *buf, unsigned out)
 {
 	struct virtio_scsi_cmd_resp __user *resp;
 	struct virtio_scsi_cmd_resp rsp;
@@ -804,7 +819,7 @@ vhost_scsi_send_bad_target(struct vhost_scsi *vs,
 	resp = vq->iov[out].iov_base;
 	ret = __copy_to_user(resp, &rsp, sizeof(rsp));
 	if (!ret)
-		vhost_add_used_and_signal(&vs->dev, vq, head, 0);
+		vhost_scsi_signal_noinput(&vs->dev, vq, buf);
 	else
 		pr_err("Faulted on virtio_scsi_cmd_resp\n");
 }
@@ -813,21 +828,21 @@ static int
 vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
 		    struct vhost_scsi_ctx *vc)
 {
-	int ret = -ENXIO;
+	int r, ret = -ENXIO;
 
-	vc->head = vhost_get_vq_desc(vq, vq->iov,
-				     ARRAY_SIZE(vq->iov), &vc->out, &vc->in,
-				     NULL, NULL);
+	r = vhost_get_avail_buf(vq, &vc->buf,
+				vq->iov, ARRAY_SIZE(vq->iov), &vc->out, &vc->in,
+				NULL, NULL);
 
-	pr_debug("vhost_get_vq_desc: head: %d, out: %u in: %u\n",
-		 vc->head, vc->out, vc->in);
+	pr_debug("vhost_get_avail_buf: buf: %d, out: %u in: %u\n",
+		 vc->buf.id, vc->out, vc->in);
 
 	/* On error, stop handling until the next kick. */
-	if (unlikely(vc->head < 0))
+	if (unlikely(r < 0))
 		goto done;
 
 	/* Nothing new?  Wait for eventfd to tell us they refilled. */
-	if (vc->head == vq->num) {
+	if (!r) {
 		if (unlikely(vhost_enable_notify(&vs->dev, vq))) {
 			vhost_disable_notify(&vs->dev, vq);
 			ret = -EAGAIN;
@@ -1093,11 +1108,11 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 			}
 		}
 		/*
-		 * Save the descriptor from vhost_get_vq_desc() to be used to
+		 * Save the descriptor from vhost_get_avail_buf() to be used to
 		 * complete the virtio-scsi request in TCM callback context via
 		 * vhost_scsi_queue_data_in() and vhost_scsi_queue_status()
 		 */
-		cmd->tvc_vq_desc = vc.head;
+		cmd->tvc_vq_desc = vc.buf;
 		/*
 		 * Dispatch cmd descriptor for cmwq execution in process
 		 * context provided by vhost_scsi_workqueue.  This also ensures
@@ -1117,7 +1132,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 		if (ret == -ENXIO)
 			break;
 		else if (ret == -EIO)
-			vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out);
+			vhost_scsi_send_bad_target(vs, vq, &vc.buf, vc.out);
 	} while (likely(!vhost_exceeds_weight(vq, ++c, 0)));
 out:
 	mutex_unlock(&vq->mutex);
@@ -1139,9 +1154,9 @@ vhost_scsi_send_tmf_reject(struct vhost_scsi *vs,
 	iov_iter_init(&iov_iter, READ, &vq->iov[vc->out], vc->in, sizeof(rsp));
 
 	ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter);
-	if (likely(ret == sizeof(rsp)))
-		vhost_add_used_and_signal(&vs->dev, vq, vc->head, 0);
-	else
+	if (likely(ret == sizeof(rsp))) {
+		vhost_scsi_signal_noinput(&vs->dev, vq, &vc->buf);
+	} else
 		pr_err("Faulted on virtio_scsi_ctrl_tmf_resp\n");
 }
 
@@ -1162,7 +1177,7 @@ vhost_scsi_send_an_resp(struct vhost_scsi *vs,
 
 	ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter);
 	if (likely(ret == sizeof(rsp)))
-		vhost_add_used_and_signal(&vs->dev, vq, vc->head, 0);
+		vhost_scsi_signal_noinput(&vs->dev, vq, &vc->buf);
 	else
 		pr_err("Faulted on virtio_scsi_ctrl_an_resp\n");
 }
@@ -1269,7 +1284,7 @@ vhost_scsi_ctl_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 		if (ret == -ENXIO)
 			break;
 		else if (ret == -EIO)
-			vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out);
+			vhost_scsi_send_bad_target(vs, vq, &vc.buf, vc.out);
 	} while (likely(!vhost_exceeds_weight(vq, ++c, 0)));
 out:
 	mutex_unlock(&vq->mutex);
-- 
2.18.1


^ permalink raw reply related

* [RFC v9 11/11] vhost: drop head based APIs
From: Eugenio Pérez @ 2020-06-19 18:23 UTC (permalink / raw)
  To: mst
  Cc: kvm list, Stefano Garzarella, virtualization, linux-kernel,
	netdev, Stefan Hajnoczi, Jason Wang
In-Reply-To: <20200619182302.850-1-eperezma@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>

Everyone's using buf APIs, no need for head based ones anymore.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/vhost.c | 58 +++----------------------------------------
 drivers/vhost/vhost.h | 12 ---------
 2 files changed, 4 insertions(+), 66 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e227e0667790..736fa1a3cee5 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2423,39 +2423,11 @@ EXPORT_SYMBOL_GPL(vhost_get_avail_buf);
 /* Reverse the effect of vhost_get_avail_buf. Useful for error handling. */
 void vhost_discard_avail_bufs(struct vhost_virtqueue *vq,
 			      struct vhost_buf *buf, unsigned count)
-{
-	vhost_discard_vq_desc(vq, count);
-}
-EXPORT_SYMBOL_GPL(vhost_discard_avail_bufs);
-
-/* This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found.  A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_virtqueue *vq,
-		      struct iovec iov[], unsigned int iov_size,
-		      unsigned int *out_num, unsigned int *in_num,
-		      struct vhost_log *log, unsigned int *log_num)
-{
-	struct vhost_buf buf;
-	int ret = vhost_get_avail_buf(vq, &buf,
-				      iov, iov_size, out_num, in_num,
-				      log, log_num);
-
-	if (likely(ret > 0))
-		return buf.id;
-	if (likely(!ret))
-		return vq->num;
-	return ret;
-}
-EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
-
-/* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
-void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
 {
 	unfetch_descs(vq);
-	vq->last_avail_idx -= n;
+	vq->last_avail_idx -= count;
 }
-EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
+EXPORT_SYMBOL_GPL(vhost_discard_avail_bufs);
 
 static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 			    struct vring_used_elem *heads,
@@ -2491,7 +2463,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 
 /* After we've used one of their buffers, we tell them about it.  We'll then
  * want to notify the guest, using eventfd. */
-int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
+static int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 		     unsigned count)
 {
 	int start, n, r;
@@ -2524,11 +2496,10 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 	}
 	return r;
 }
-EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
 /* After we've used one of their buffers, we tell them about it.  We'll then
  * want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
+static int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
 {
 	struct vring_used_elem heads = {
 		cpu_to_vhost32(vq, head),
@@ -2537,7 +2508,6 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
 
 	return vhost_add_used_n(vq, &heads, 1);
 }
-EXPORT_SYMBOL_GPL(vhost_add_used);
 
 int vhost_put_used_buf(struct vhost_virtqueue *vq, struct vhost_buf *buf)
 {
@@ -2605,26 +2575,6 @@ void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(vhost_signal);
 
-/* And here's the combo meal deal.  Supersize me! */
-void vhost_add_used_and_signal(struct vhost_dev *dev,
-			       struct vhost_virtqueue *vq,
-			       unsigned int head, int len)
-{
-	vhost_add_used(vq, head, len);
-	vhost_signal(dev, vq);
-}
-EXPORT_SYMBOL_GPL(vhost_add_used_and_signal);
-
-/* multi-buffer version of vhost_add_used_and_signal */
-void vhost_add_used_and_signal_n(struct vhost_dev *dev,
-				 struct vhost_virtqueue *vq,
-				 struct vring_used_elem *heads, unsigned count)
-{
-	vhost_add_used_n(vq, heads, count);
-	vhost_signal(dev, vq);
-}
-EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
-
 /* return true if we're sure that avaiable ring is empty */
 bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 28eea0155efb..264a2a2fae97 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -197,11 +197,6 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 bool vhost_vq_access_ok(struct vhost_virtqueue *vq);
 bool vhost_log_access_ok(struct vhost_dev *);
 
-int vhost_get_vq_desc(struct vhost_virtqueue *,
-		      struct iovec iov[], unsigned int iov_count,
-		      unsigned int *out_num, unsigned int *in_num,
-		      struct vhost_log *log, unsigned int *log_num);
-void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 int vhost_get_avail_buf(struct vhost_virtqueue *, struct vhost_buf *buf,
 			struct iovec iov[], unsigned int iov_count,
 			unsigned int *out_num, unsigned int *in_num,
@@ -209,13 +204,6 @@ int vhost_get_avail_buf(struct vhost_virtqueue *, struct vhost_buf *buf,
 void vhost_discard_avail_bufs(struct vhost_virtqueue *,
 			      struct vhost_buf *, unsigned count);
 int vhost_vq_init_access(struct vhost_virtqueue *);
-int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
-int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
-		     unsigned count);
-void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
-			       unsigned int id, int len);
-void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
-			       struct vring_used_elem *heads, unsigned count);
 int vhost_put_used_buf(struct vhost_virtqueue *, struct vhost_buf *buf);
 int vhost_put_used_n_bufs(struct vhost_virtqueue *,
 			  struct vhost_buf *bufs, unsigned count);
-- 
2.18.1


^ permalink raw reply related

* [RFC v9 08/11] vhost/test: convert to the buf API
From: Eugenio Pérez @ 2020-06-19 18:22 UTC (permalink / raw)
  To: mst
  Cc: kvm list, Stefano Garzarella, virtualization, linux-kernel,
	netdev, Stefan Hajnoczi, Jason Wang
In-Reply-To: <20200619182302.850-1-eperezma@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/test.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 650e69261557..39e68f797a93 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -44,9 +44,10 @@ static void handle_vq(struct vhost_test *n)
 {
 	struct vhost_virtqueue *vq = &n->vqs[VHOST_TEST_VQ];
 	unsigned out, in;
-	int head;
+	int ret;
 	size_t len, total_len = 0;
 	void *private;
+	struct vhost_buf buf;
 
 	mutex_lock(&vq->mutex);
 	private = vhost_vq_get_backend(vq);
@@ -58,15 +59,15 @@ static void handle_vq(struct vhost_test *n)
 	vhost_disable_notify(&n->dev, vq);
 
 	for (;;) {
-		head = vhost_get_vq_desc(vq, vq->iov,
-					 ARRAY_SIZE(vq->iov),
-					 &out, &in,
-					 NULL, NULL);
+		ret = vhost_get_avail_buf(vq, &buf, vq->iov,
+					  ARRAY_SIZE(vq->iov),
+					  &out, &in,
+					  NULL, NULL);
 		/* On error, stop handling until the next kick. */
-		if (unlikely(head < 0))
+		if (unlikely(ret < 0))
 			break;
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
-		if (head == vq->num) {
+		if (!ret) {
 			if (unlikely(vhost_enable_notify(&n->dev, vq))) {
 				vhost_disable_notify(&n->dev, vq);
 				continue;
@@ -78,13 +79,14 @@ static void handle_vq(struct vhost_test *n)
 			       "out %d, int %d\n", out, in);
 			break;
 		}
-		len = iov_length(vq->iov, out);
+		len = buf.out_len;
 		/* Sanity check */
 		if (!len) {
 			vq_err(vq, "Unexpected 0 len for TX\n");
 			break;
 		}
-		vhost_add_used_and_signal(&n->dev, vq, head, 0);
+		vhost_put_used_buf(vq, &buf);
+		vhost_signal(&n->dev, vq);
 		total_len += len;
 		if (unlikely(vhost_exceeds_weight(vq, 0, total_len)))
 			break;
-- 
2.18.1


^ permalink raw reply related

* [RFC v9 06/11] vhost/net: convert to new API: heads->bufs
From: Eugenio Pérez @ 2020-06-19 18:22 UTC (permalink / raw)
  To: mst
  Cc: kvm list, Stefano Garzarella, virtualization, linux-kernel,
	netdev, Stefan Hajnoczi, Jason Wang
In-Reply-To: <20200619182302.850-1-eperezma@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>

Convert vhost net to use the new format-agnostic API.
In particular, don't poke at vq internals such as the
heads array.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/net.c | 154 +++++++++++++++++++++++---------------------
 1 file changed, 82 insertions(+), 72 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 81900ce673b4..871214d8c64d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -59,13 +59,13 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
  * status internally; used for zerocopy tx only.
  */
 /* Lower device DMA failed */
-#define VHOST_DMA_FAILED_LEN	((__force __virtio32)3)
+#define VHOST_DMA_FAILED_LEN	(3)
 /* Lower device DMA done */
-#define VHOST_DMA_DONE_LEN	((__force __virtio32)2)
+#define VHOST_DMA_DONE_LEN	(2)
 /* Lower device DMA in progress */
-#define VHOST_DMA_IN_PROGRESS	((__force __virtio32)1)
+#define VHOST_DMA_IN_PROGRESS	(1)
 /* Buffer unused */
-#define VHOST_DMA_CLEAR_LEN	((__force __virtio32)0)
+#define VHOST_DMA_CLEAR_LEN	(0)
 
 #define VHOST_DMA_IS_DONE(len) ((__force u32)(len) >= (__force u32)VHOST_DMA_DONE_LEN)
 
@@ -112,9 +112,12 @@ struct vhost_net_virtqueue {
 	/* last used idx for outstanding DMA zerocopy buffers */
 	int upend_idx;
 	/* For TX, first used idx for DMA done zerocopy buffers
-	 * For RX, number of batched heads
+	 * For RX, number of batched bufs
 	 */
 	int done_idx;
+	/* Outstanding user bufs. UIO_MAXIOV in length. */
+	/* TODO: we can make this smaller for sure. */
+	struct vhost_buf *bufs;
 	/* Number of XDP frames batched */
 	int batched_xdp;
 	/* an array of userspace buffers info */
@@ -271,6 +274,8 @@ static void vhost_net_clear_ubuf_info(struct vhost_net *n)
 	int i;
 
 	for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
+		kfree(n->vqs[i].bufs);
+		n->vqs[i].bufs = NULL;
 		kfree(n->vqs[i].ubuf_info);
 		n->vqs[i].ubuf_info = NULL;
 	}
@@ -282,6 +287,12 @@ static int vhost_net_set_ubuf_info(struct vhost_net *n)
 	int i;
 
 	for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
+		n->vqs[i].bufs = kmalloc_array(UIO_MAXIOV,
+					       sizeof(*n->vqs[i].bufs),
+					       GFP_KERNEL);
+		if (!n->vqs[i].bufs)
+			goto err;
+
 		zcopy = vhost_net_zcopy_mask & (0x1 << i);
 		if (!zcopy)
 			continue;
@@ -364,18 +375,18 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 	int j = 0;
 
 	for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-		if (vq->heads[i].len == VHOST_DMA_FAILED_LEN)
+		if (nvq->bufs[i].in_len == VHOST_DMA_FAILED_LEN)
 			vhost_net_tx_err(net);
-		if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
-			vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
+		if (VHOST_DMA_IS_DONE(nvq->bufs[i].in_len)) {
+			nvq->bufs[i].in_len = VHOST_DMA_CLEAR_LEN;
 			++j;
 		} else
 			break;
 	}
 	while (j) {
 		add = min(UIO_MAXIOV - nvq->done_idx, j);
-		vhost_add_used_and_signal_n(vq->dev, vq,
-					    &vq->heads[nvq->done_idx], add);
+		vhost_put_used_n_bufs(vq, &nvq->bufs[nvq->done_idx], add);
+		vhost_signal(vq->dev, vq);
 		nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
 		j -= add;
 	}
@@ -390,7 +401,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 	rcu_read_lock_bh();
 
 	/* set len to mark this desc buffers done DMA */
-	nvq->vq.heads[ubuf->desc].len = success ?
+	nvq->bufs[ubuf->desc].in_len = success ?
 		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
 	cnt = vhost_net_ubuf_put(ubufs);
 
@@ -452,7 +463,8 @@ static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq)
 	if (!nvq->done_idx)
 		return;
 
-	vhost_add_used_and_signal_n(dev, vq, vq->heads, nvq->done_idx);
+	vhost_put_used_n_bufs(vq, nvq->bufs, nvq->done_idx);
+	vhost_signal(dev, vq);
 	nvq->done_idx = 0;
 }
 
@@ -558,6 +570,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				    struct vhost_net_virtqueue *tnvq,
+				    struct vhost_buf *buf,
 				    unsigned int *out_num, unsigned int *in_num,
 				    struct msghdr *msghdr, bool *busyloop_intr)
 {
@@ -565,10 +578,10 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 	struct vhost_virtqueue *rvq = &rnvq->vq;
 	struct vhost_virtqueue *tvq = &tnvq->vq;
 
-	int r = vhost_get_vq_desc(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
-				  out_num, in_num, NULL, NULL);
+	int r = vhost_get_avail_buf(tvq, buf, tvq->iov, ARRAY_SIZE(tvq->iov),
+				    out_num, in_num, NULL, NULL);
 
-	if (r == tvq->num && tvq->busyloop_timeout) {
+	if (!r && tvq->busyloop_timeout) {
 		/* Flush batched packets first */
 		if (!vhost_sock_zcopy(vhost_vq_get_backend(tvq)))
 			vhost_tx_batch(net, tnvq,
@@ -577,8 +590,8 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 
 		vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, false);
 
-		r = vhost_get_vq_desc(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
-				      out_num, in_num, NULL, NULL);
+		r = vhost_get_avail_buf(tvq, buf, tvq->iov, ARRAY_SIZE(tvq->iov),
+					out_num, in_num, NULL, NULL);
 	}
 
 	return r;
@@ -607,6 +620,7 @@ static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter,
 
 static int get_tx_bufs(struct vhost_net *net,
 		       struct vhost_net_virtqueue *nvq,
+		       struct vhost_buf *buf,
 		       struct msghdr *msg,
 		       unsigned int *out, unsigned int *in,
 		       size_t *len, bool *busyloop_intr)
@@ -614,9 +628,9 @@ static int get_tx_bufs(struct vhost_net *net,
 	struct vhost_virtqueue *vq = &nvq->vq;
 	int ret;
 
-	ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, msg, busyloop_intr);
+	ret = vhost_net_tx_get_vq_desc(net, nvq, buf, out, in, msg, busyloop_intr);
 
-	if (ret < 0 || ret == vq->num)
+	if (ret <= 0)
 		return ret;
 
 	if (*in) {
@@ -762,7 +776,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;
 	unsigned out, in;
-	int head;
+	int ret;
 	struct msghdr msg = {
 		.msg_name = NULL,
 		.msg_namelen = 0,
@@ -774,6 +788,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 	int err;
 	int sent_pkts = 0;
 	bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
+	struct vhost_buf buf;
 
 	do {
 		bool busyloop_intr = false;
@@ -781,13 +796,13 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 		if (nvq->done_idx == VHOST_NET_BATCH)
 			vhost_tx_batch(net, nvq, sock, &msg);
 
-		head = get_tx_bufs(net, nvq, &msg, &out, &in, &len,
-				   &busyloop_intr);
+		ret = get_tx_bufs(net, nvq, &buf, &msg, &out, &in, &len,
+				  &busyloop_intr);
 		/* On error, stop handling until the next kick. */
-		if (unlikely(head < 0))
+		if (unlikely(ret < 0))
 			break;
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
-		if (head == vq->num) {
+		if (!ret) {
 			if (unlikely(busyloop_intr)) {
 				vhost_poll_queue(&vq->poll);
 			} else if (unlikely(vhost_enable_notify(&net->dev,
@@ -809,7 +824,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 				goto done;
 			} else if (unlikely(err != -ENOSPC)) {
 				vhost_tx_batch(net, nvq, sock, &msg);
-				vhost_discard_vq_desc(vq, 1);
+				vhost_discard_avail_bufs(vq, &buf, 1);
 				vhost_net_enable_vq(net, vq);
 				break;
 			}
@@ -830,7 +845,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
-			vhost_discard_vq_desc(vq, 1);
+			vhost_discard_avail_bufs(vq, &buf, 1);
 			vhost_net_enable_vq(net, vq);
 			break;
 		}
@@ -838,8 +853,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 			pr_debug("Truncated TX packet: len %d != %zd\n",
 				 err, len);
 done:
-		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
-		vq->heads[nvq->done_idx].len = 0;
+		nvq->bufs[nvq->done_idx] = buf;
 		++nvq->done_idx;
 	} while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
 
@@ -851,7 +865,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;
 	unsigned out, in;
-	int head;
+	int ret;
 	struct msghdr msg = {
 		.msg_name = NULL,
 		.msg_namelen = 0,
@@ -865,6 +879,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
 	bool zcopy_used;
 	int sent_pkts = 0;
+	struct vhost_buf buf;
 
 	do {
 		bool busyloop_intr;
@@ -873,13 +888,13 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 		vhost_zerocopy_signal_used(net, vq);
 
 		busyloop_intr = false;
-		head = get_tx_bufs(net, nvq, &msg, &out, &in, &len,
-				   &busyloop_intr);
+		ret = get_tx_bufs(net, nvq, &buf, &msg, &out, &in, &len,
+				  &busyloop_intr);
 		/* On error, stop handling until the next kick. */
-		if (unlikely(head < 0))
+		if (unlikely(ret < 0))
 			break;
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
-		if (head == vq->num) {
+		if (!ret) {
 			if (unlikely(busyloop_intr)) {
 				vhost_poll_queue(&vq->poll);
 			} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
@@ -898,8 +913,8 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			struct ubuf_info *ubuf;
 			ubuf = nvq->ubuf_info + nvq->upend_idx;
 
-			vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
-			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
+			nvq->bufs[nvq->upend_idx] = buf;
+			nvq->bufs[nvq->upend_idx].in_len = VHOST_DMA_IN_PROGRESS;
 			ubuf->callback = vhost_zerocopy_callback;
 			ubuf->ctx = nvq->ubufs;
 			ubuf->desc = nvq->upend_idx;
@@ -931,17 +946,19 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
 					% UIO_MAXIOV;
 			}
-			vhost_discard_vq_desc(vq, 1);
+			vhost_discard_avail_bufs(vq, &buf, 1);
 			vhost_net_enable_vq(net, vq);
 			break;
 		}
 		if (err != len)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
-		if (!zcopy_used)
-			vhost_add_used_and_signal(&net->dev, vq, head, 0);
-		else
+		if (!zcopy_used) {
+			vhost_put_used_buf(vq, &buf);
+			vhost_signal(&net->dev, vq);
+		} else {
 			vhost_zerocopy_signal_used(net, vq);
+		}
 		vhost_net_tx_packet(net);
 	} while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
 }
@@ -1005,7 +1022,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
 	int len = peek_head_len(rnvq, sk);
 
 	if (!len && rvq->busyloop_timeout) {
-		/* Flush batched heads first */
+		/* Flush batched bufs first */
 		vhost_net_signal_used(rnvq);
 		/* Both tx vq and rx socket were polled here */
 		vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, true);
@@ -1023,11 +1040,11 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
  * @iovcount	- returned count of io vectors we fill
  * @log		- vhost log
  * @log_num	- log offset
- * @quota       - headcount quota, 1 for big buffer
- *	returns number of buffer heads allocated, negative on error
+ * @quota       - bufcount quota, 1 for big buffer
+ *	returns number of buffers allocated, negative on error
  */
 static int get_rx_bufs(struct vhost_virtqueue *vq,
-		       struct vring_used_elem *heads,
+		       struct vhost_buf *bufs,
 		       int datalen,
 		       unsigned *iovcount,
 		       struct vhost_log *log,
@@ -1036,30 +1053,24 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 {
 	unsigned int out, in;
 	int seg = 0;
-	int headcount = 0;
-	unsigned d;
+	int bufcount = 0;
 	int r, nlogs = 0;
 	/* len is always initialized before use since we are always called with
 	 * datalen > 0.
 	 */
 	u32 uninitialized_var(len);
 
-	while (datalen > 0 && headcount < quota) {
+	while (datalen > 0 && bufcount < quota) {
 		if (unlikely(seg >= UIO_MAXIOV)) {
 			r = -ENOBUFS;
 			goto err;
 		}
-		r = vhost_get_vq_desc(vq, vq->iov + seg,
-				      ARRAY_SIZE(vq->iov) - seg, &out,
-				      &in, log, log_num);
-		if (unlikely(r < 0))
+		r = vhost_get_avail_buf(vq, bufs + bufcount, vq->iov + seg,
+					ARRAY_SIZE(vq->iov) - seg, &out,
+					&in, log, log_num);
+		if (unlikely(r <= 0))
 			goto err;
 
-		d = r;
-		if (d == vq->num) {
-			r = 0;
-			goto err;
-		}
 		if (unlikely(out || in <= 0)) {
 			vq_err(vq, "unexpected descriptor format for RX: "
 				"out %d, in %d\n", out, in);
@@ -1070,14 +1081,12 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 			nlogs += *log_num;
 			log += *log_num;
 		}
-		heads[headcount].id = cpu_to_vhost32(vq, d);
 		len = iov_length(vq->iov + seg, in);
-		heads[headcount].len = cpu_to_vhost32(vq, len);
 		datalen -= len;
-		++headcount;
+		++bufcount;
 		seg += in;
 	}
-	heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
+	bufs[bufcount - 1].in_len = len + datalen;
 	*iovcount = seg;
 	if (unlikely(log))
 		*log_num = nlogs;
@@ -1087,9 +1096,9 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 		r = UIO_MAXIOV + 1;
 		goto err;
 	}
-	return headcount;
+	return bufcount;
 err:
-	vhost_discard_vq_desc(vq, headcount);
+	vhost_discard_avail_bufs(vq, bufs, bufcount);
 	return r;
 }
 
@@ -1114,7 +1123,7 @@ static void handle_rx(struct vhost_net *net)
 	};
 	size_t total_len = 0;
 	int err, mergeable;
-	s16 headcount;
+	int bufcount;
 	size_t vhost_hlen, sock_hlen;
 	size_t vhost_len, sock_len;
 	bool busyloop_intr = false;
@@ -1148,14 +1157,14 @@ static void handle_rx(struct vhost_net *net)
 			break;
 		sock_len += sock_hlen;
 		vhost_len = sock_len + vhost_hlen;
-		headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
-					vhost_len, &in, vq_log, &log,
-					likely(mergeable) ? UIO_MAXIOV : 1);
+		bufcount = get_rx_bufs(vq, nvq->bufs + nvq->done_idx,
+				       vhost_len, &in, vq_log, &log,
+				       likely(mergeable) ? UIO_MAXIOV : 1);
 		/* On error, stop handling until the next kick. */
-		if (unlikely(headcount < 0))
+		if (unlikely(bufcount < 0))
 			goto out;
 		/* OK, now we need to know about added descriptors. */
-		if (!headcount) {
+		if (!bufcount) {
 			if (unlikely(busyloop_intr)) {
 				vhost_poll_queue(&vq->poll);
 			} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
@@ -1172,7 +1181,7 @@ static void handle_rx(struct vhost_net *net)
 		if (nvq->rx_ring)
 			msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
 		/* On overrun, truncate and discard */
-		if (unlikely(headcount > UIO_MAXIOV)) {
+		if (unlikely(bufcount > UIO_MAXIOV)) {
 			iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
 			err = sock->ops->recvmsg(sock, &msg,
 						 1, MSG_DONTWAIT | MSG_TRUNC);
@@ -1196,7 +1205,7 @@ static void handle_rx(struct vhost_net *net)
 		if (unlikely(err != sock_len)) {
 			pr_debug("Discarded rx packet: "
 				 " len %d, expected %zd\n", err, sock_len);
-			vhost_discard_vq_desc(vq, headcount);
+			vhost_discard_avail_bufs(vq, nvq->bufs + nvq->done_idx, bufcount);
 			continue;
 		}
 		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
@@ -1215,15 +1224,15 @@ static void handle_rx(struct vhost_net *net)
 		}
 		/* TODO: Should check and handle checksum. */
 
-		num_buffers = cpu_to_vhost16(vq, headcount);
+		num_buffers = cpu_to_vhost16(vq, bufcount);
 		if (likely(mergeable) &&
 		    copy_to_iter(&num_buffers, sizeof num_buffers,
 				 &fixup) != sizeof num_buffers) {
 			vq_err(vq, "Failed num_buffers write");
-			vhost_discard_vq_desc(vq, headcount);
+			vhost_discard_avail_bufs(vq, nvq->bufs + nvq->done_idx, bufcount);
 			goto out;
 		}
-		nvq->done_idx += headcount;
+		nvq->done_idx += bufcount;
 		if (nvq->done_idx > VHOST_NET_BATCH)
 			vhost_net_signal_used(nvq);
 		if (unlikely(vq_log))
@@ -1315,6 +1324,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	n->vqs[VHOST_NET_VQ_TX].vq.handle_kick = handle_tx_kick;
 	n->vqs[VHOST_NET_VQ_RX].vq.handle_kick = handle_rx_kick;
 	for (i = 0; i < VHOST_NET_VQ_MAX; i++) {
+		n->vqs[i].bufs = NULL;
 		n->vqs[i].ubufs = NULL;
 		n->vqs[i].ubuf_info = NULL;
 		n->vqs[i].upend_idx = 0;
-- 
2.18.1


^ permalink raw reply related

* [RFC v9 04/11] vhost: reorder functions
From: Eugenio Pérez @ 2020-06-19 18:22 UTC (permalink / raw)
  To: mst
  Cc: kvm list, Stefano Garzarella, virtualization, linux-kernel,
	netdev, Stefan Hajnoczi, Jason Wang
In-Reply-To: <20200619182302.850-1-eperezma@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>

Reorder functions in the file to not rely on forward
declarations, in preparation to making them static
down the road.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/vhost.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 13021d6986eb..ba8da8785ede 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2424,19 +2424,6 @@ void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
 }
 EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
 
-/* After we've used one of their buffers, we tell them about it.  We'll then
- * want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
-{
-	struct vring_used_elem heads = {
-		cpu_to_vhost32(vq, head),
-		cpu_to_vhost32(vq, len)
-	};
-
-	return vhost_add_used_n(vq, &heads, 1);
-}
-EXPORT_SYMBOL_GPL(vhost_add_used);
-
 static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 			    struct vring_used_elem *heads,
 			    unsigned count)
@@ -2506,6 +2493,19 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using eventfd. */
+int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
+{
+	struct vring_used_elem heads = {
+		cpu_to_vhost32(vq, head),
+		cpu_to_vhost32(vq, len)
+	};
+
+	return vhost_add_used_n(vq, &heads, 1);
+}
+EXPORT_SYMBOL_GPL(vhost_add_used);
+
 static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
 	__u16 old, new;
-- 
2.18.1


^ permalink raw reply related

* [RFC v9 02/11] vhost: use batched get_vq_desc version
From: Eugenio Pérez @ 2020-06-19 18:22 UTC (permalink / raw)
  To: mst
  Cc: kvm list, Stefano Garzarella, virtualization, linux-kernel,
	netdev, Stefan Hajnoczi, Jason Wang
In-Reply-To: <20200619182302.850-1-eperezma@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>

As testing shows no performance change, switch to that now.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Link: https://lore.kernel.org/r/20200401183118.8334-3-eperezma@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/test.c  |   2 +-
 drivers/vhost/vhost.c | 314 ++++++++----------------------------------
 drivers/vhost/vhost.h |   7 +-
 3 files changed, 61 insertions(+), 262 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index a09dedc79f68..650e69261557 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -119,7 +119,7 @@ static int vhost_test_open(struct inode *inode, struct file *f)
 	dev = &n->dev;
 	vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
 	n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-	vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
+	vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV + 64,
 		       VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, true, NULL);
 
 	f->private_data = n;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2d784681b0fa..13021d6986eb 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -304,6 +304,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 {
 	vq->num = 1;
 	vq->ndescs = 0;
+	vq->first_desc = 0;
 	vq->desc = NULL;
 	vq->avail = NULL;
 	vq->used = NULL;
@@ -372,6 +373,11 @@ static int vhost_worker(void *data)
 	return 0;
 }
 
+static int vhost_vq_num_batch_descs(struct vhost_virtqueue *vq)
+{
+	return vq->max_descs - UIO_MAXIOV;
+}
+
 static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
 {
 	kfree(vq->descs);
@@ -394,6 +400,9 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
 	for (i = 0; i < dev->nvqs; ++i) {
 		vq = dev->vqs[i];
 		vq->max_descs = dev->iov_limit;
+		if (dev->use_worker && vhost_vq_num_batch_descs(vq) < 0) {
+			return -EINVAL;
+		}
 		vq->descs = kmalloc_array(vq->max_descs,
 					  sizeof(*vq->descs),
 					  GFP_KERNEL);
@@ -1610,6 +1619,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 		vq->last_avail_idx = s.num;
 		/* Forget the cached index value. */
 		vq->avail_idx = vq->last_avail_idx;
+		vq->ndescs = vq->first_desc = 0;
 		break;
 	case VHOST_GET_VRING_BASE:
 		s.index = idx;
@@ -2077,253 +2087,6 @@ static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc)
 	return next;
 }
 
-static int get_indirect(struct vhost_virtqueue *vq,
-			struct iovec iov[], unsigned int iov_size,
-			unsigned int *out_num, unsigned int *in_num,
-			struct vhost_log *log, unsigned int *log_num,
-			struct vring_desc *indirect)
-{
-	struct vring_desc desc;
-	unsigned int i = 0, count, found = 0;
-	u32 len = vhost32_to_cpu(vq, indirect->len);
-	struct iov_iter from;
-	int ret, access;
-
-	/* Sanity check */
-	if (unlikely(len % sizeof desc)) {
-		vq_err(vq, "Invalid length in indirect descriptor: "
-		       "len 0x%llx not multiple of 0x%zx\n",
-		       (unsigned long long)len,
-		       sizeof desc);
-		return -EINVAL;
-	}
-
-	ret = translate_desc(vq, vhost64_to_cpu(vq, indirect->addr), len, vq->indirect,
-			     UIO_MAXIOV, VHOST_ACCESS_RO);
-	if (unlikely(ret < 0)) {
-		if (ret != -EAGAIN)
-			vq_err(vq, "Translation failure %d in indirect.\n", ret);
-		return ret;
-	}
-	iov_iter_init(&from, READ, vq->indirect, ret, len);
-
-	/* We will use the result as an address to read from, so most
-	 * architectures only need a compiler barrier here. */
-	read_barrier_depends();
-
-	count = len / sizeof desc;
-	/* Buffers are chained via a 16 bit next field, so
-	 * we can have at most 2^16 of these. */
-	if (unlikely(count > USHRT_MAX + 1)) {
-		vq_err(vq, "Indirect buffer length too big: %d\n",
-		       indirect->len);
-		return -E2BIG;
-	}
-
-	do {
-		unsigned iov_count = *in_num + *out_num;
-		if (unlikely(++found > count)) {
-			vq_err(vq, "Loop detected: last one at %u "
-			       "indirect size %u\n",
-			       i, count);
-			return -EINVAL;
-		}
-		if (unlikely(!copy_from_iter_full(&desc, sizeof(desc), &from))) {
-			vq_err(vq, "Failed indirect descriptor: idx %d, %zx\n",
-			       i, (size_t)vhost64_to_cpu(vq, indirect->addr) + i * sizeof desc);
-			return -EINVAL;
-		}
-		if (unlikely(desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT))) {
-			vq_err(vq, "Nested indirect descriptor: idx %d, %zx\n",
-			       i, (size_t)vhost64_to_cpu(vq, indirect->addr) + i * sizeof desc);
-			return -EINVAL;
-		}
-
-		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_WRITE))
-			access = VHOST_ACCESS_WO;
-		else
-			access = VHOST_ACCESS_RO;
-
-		ret = translate_desc(vq, vhost64_to_cpu(vq, desc.addr),
-				     vhost32_to_cpu(vq, desc.len), iov + iov_count,
-				     iov_size - iov_count, access);
-		if (unlikely(ret < 0)) {
-			if (ret != -EAGAIN)
-				vq_err(vq, "Translation failure %d indirect idx %d\n",
-					ret, i);
-			return ret;
-		}
-		/* If this is an input descriptor, increment that count. */
-		if (access == VHOST_ACCESS_WO) {
-			*in_num += ret;
-			if (unlikely(log && ret)) {
-				log[*log_num].addr = vhost64_to_cpu(vq, desc.addr);
-				log[*log_num].len = vhost32_to_cpu(vq, desc.len);
-				++*log_num;
-			}
-		} else {
-			/* If it's an output descriptor, they're all supposed
-			 * to come before any input descriptors. */
-			if (unlikely(*in_num)) {
-				vq_err(vq, "Indirect descriptor "
-				       "has out after in: idx %d\n", i);
-				return -EINVAL;
-			}
-			*out_num += ret;
-		}
-	} while ((i = next_desc(vq, &desc)) != -1);
-	return 0;
-}
-
-/* This looks in the virtqueue and for the first available buffer, and converts
- * it to an iovec for convenient access.  Since descriptors consist of some
- * number of output then some number of input descriptors, it's actually two
- * iovecs, but we pack them into one and note how many of each there were.
- *
- * This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found.  A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_virtqueue *vq,
-		      struct iovec iov[], unsigned int iov_size,
-		      unsigned int *out_num, unsigned int *in_num,
-		      struct vhost_log *log, unsigned int *log_num)
-{
-	struct vring_desc desc;
-	unsigned int i, head, found = 0;
-	u16 last_avail_idx;
-	__virtio16 avail_idx;
-	__virtio16 ring_head;
-	int ret, access;
-
-	/* Check it isn't doing very strange things with descriptor numbers. */
-	last_avail_idx = vq->last_avail_idx;
-
-	if (vq->avail_idx == vq->last_avail_idx) {
-		if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) {
-			vq_err(vq, "Failed to access avail idx at %p\n",
-				&vq->avail->idx);
-			return -EFAULT;
-		}
-		vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
-
-		if (unlikely((u16)(vq->avail_idx - last_avail_idx) > vq->num)) {
-			vq_err(vq, "Guest moved used index from %u to %u",
-				last_avail_idx, vq->avail_idx);
-			return -EFAULT;
-		}
-
-		/* If there's nothing new since last we looked, return
-		 * invalid.
-		 */
-		if (vq->avail_idx == last_avail_idx)
-			return vq->num;
-
-		/* Only get avail ring entries after they have been
-		 * exposed by guest.
-		 */
-		smp_rmb();
-	}
-
-	/* Grab the next descriptor number they're advertising, and increment
-	 * the index we've seen. */
-	if (unlikely(vhost_get_avail_head(vq, &ring_head, last_avail_idx))) {
-		vq_err(vq, "Failed to read head: idx %d address %p\n",
-		       last_avail_idx,
-		       &vq->avail->ring[last_avail_idx % vq->num]);
-		return -EFAULT;
-	}
-
-	head = vhost16_to_cpu(vq, ring_head);
-
-	/* If their number is silly, that's an error. */
-	if (unlikely(head >= vq->num)) {
-		vq_err(vq, "Guest says index %u > %u is available",
-		       head, vq->num);
-		return -EINVAL;
-	}
-
-	/* When we start there are none of either input nor output. */
-	*out_num = *in_num = 0;
-	if (unlikely(log))
-		*log_num = 0;
-
-	i = head;
-	do {
-		unsigned iov_count = *in_num + *out_num;
-		if (unlikely(i >= vq->num)) {
-			vq_err(vq, "Desc index is %u > %u, head = %u",
-			       i, vq->num, head);
-			return -EINVAL;
-		}
-		if (unlikely(++found > vq->num)) {
-			vq_err(vq, "Loop detected: last one at %u "
-			       "vq size %u head %u\n",
-			       i, vq->num, head);
-			return -EINVAL;
-		}
-		ret = vhost_get_desc(vq, &desc, i);
-		if (unlikely(ret)) {
-			vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
-			       i, vq->desc + i);
-			return -EFAULT;
-		}
-		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT)) {
-			ret = get_indirect(vq, iov, iov_size,
-					   out_num, in_num,
-					   log, log_num, &desc);
-			if (unlikely(ret < 0)) {
-				if (ret != -EAGAIN)
-					vq_err(vq, "Failure detected "
-						"in indirect descriptor at idx %d\n", i);
-				return ret;
-			}
-			continue;
-		}
-
-		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_WRITE))
-			access = VHOST_ACCESS_WO;
-		else
-			access = VHOST_ACCESS_RO;
-		ret = translate_desc(vq, vhost64_to_cpu(vq, desc.addr),
-				     vhost32_to_cpu(vq, desc.len), iov + iov_count,
-				     iov_size - iov_count, access);
-		if (unlikely(ret < 0)) {
-			if (ret != -EAGAIN)
-				vq_err(vq, "Translation failure %d descriptor idx %d\n",
-					ret, i);
-			return ret;
-		}
-		if (access == VHOST_ACCESS_WO) {
-			/* If this is an input descriptor,
-			 * increment that count. */
-			*in_num += ret;
-			if (unlikely(log && ret)) {
-				log[*log_num].addr = vhost64_to_cpu(vq, desc.addr);
-				log[*log_num].len = vhost32_to_cpu(vq, desc.len);
-				++*log_num;
-			}
-		} else {
-			/* If it's an output descriptor, they're all supposed
-			 * to come before any input descriptors. */
-			if (unlikely(*in_num)) {
-				vq_err(vq, "Descriptor has out after in: "
-				       "idx %d\n", i);
-				return -EINVAL;
-			}
-			*out_num += ret;
-		}
-	} while ((i = next_desc(vq, &desc)) != -1);
-
-	/* On success, increment avail index. */
-	vq->last_avail_idx++;
-
-	/* Assume notifications from guest are disabled at this point,
-	 * if they aren't we would need to update avail_event index. */
-	BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
-	return head;
-}
-EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
-
 static struct vhost_desc *peek_split_desc(struct vhost_virtqueue *vq)
 {
 	BUG_ON(!vq->ndescs);
@@ -2427,7 +2190,7 @@ static int fetch_indirect_descs(struct vhost_virtqueue *vq,
 
 /* This function returns a value > 0 if a descriptor was found, or 0 if none were found.
  * A negative code is returned on error. */
-static int fetch_descs(struct vhost_virtqueue *vq)
+static int fetch_buf(struct vhost_virtqueue *vq)
 {
 	unsigned int i, head, found = 0;
 	struct vhost_desc *last;
@@ -2440,7 +2203,7 @@ static int fetch_descs(struct vhost_virtqueue *vq)
 	/* Check it isn't doing very strange things with descriptor numbers. */
 	last_avail_idx = vq->last_avail_idx;
 
-	if (vq->avail_idx == vq->last_avail_idx) {
+	if (unlikely(vq->avail_idx == vq->last_avail_idx)) {
 		if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) {
 			vq_err(vq, "Failed to access avail idx at %p\n",
 				&vq->avail->idx);
@@ -2531,6 +2294,41 @@ static int fetch_descs(struct vhost_virtqueue *vq)
 	return 1;
 }
 
+/* This function returns a value > 0 if a descriptor was found, or 0 if none were found.
+ * A negative code is returned on error. */
+static int fetch_descs(struct vhost_virtqueue *vq)
+{
+	int ret;
+
+	if (unlikely(vq->first_desc >= vq->ndescs)) {
+		vq->first_desc = 0;
+		vq->ndescs = 0;
+	}
+
+	if (vq->ndescs)
+		return 1;
+
+	for (ret = 1;
+	     ret > 0 && vq->ndescs <= vhost_vq_num_batch_descs(vq);
+	     ret = fetch_buf(vq))
+		;
+
+	/* On success we expect some descs */
+	BUG_ON(ret > 0 && !vq->ndescs);
+	return vq->ndescs ? vq->ndescs : ret;
+}
+
+/* Reverse the effects of fetch_descs */
+static void unfetch_descs(struct vhost_virtqueue *vq)
+{
+	int i;
+
+	for (i = vq->first_desc; i < vq->ndescs; ++i)
+		if (!(vq->descs[i].flags & VRING_DESC_F_NEXT))
+			vq->last_avail_idx -= 1;
+	vq->ndescs = 0;
+}
+
 /* This looks in the virtqueue and for the first available buffer, and converts
  * it to an iovec for convenient access.  Since descriptors consist of some
  * number of output then some number of input descriptors, it's actually two
@@ -2539,7 +2337,7 @@ static int fetch_descs(struct vhost_virtqueue *vq)
  * This function returns the descriptor number found, or vq->num (which is
  * never a valid descriptor number) if none was found.  A negative code is
  * returned on error. */
-int vhost_get_vq_desc_batch(struct vhost_virtqueue *vq,
+int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 		      struct iovec iov[], unsigned int iov_size,
 		      unsigned int *out_num, unsigned int *in_num,
 		      struct vhost_log *log, unsigned int *log_num)
@@ -2548,7 +2346,7 @@ int vhost_get_vq_desc_batch(struct vhost_virtqueue *vq,
 	int i;
 
 	if (ret <= 0)
-		goto err_fetch;
+		goto err;
 
 	/* Now convert to IOV */
 	/* When we start there are none of either input nor output. */
@@ -2556,7 +2354,7 @@ int vhost_get_vq_desc_batch(struct vhost_virtqueue *vq,
 	if (unlikely(log))
 		*log_num = 0;
 
-	for (i = 0; i < vq->ndescs; ++i) {
+	for (i = vq->first_desc; i < vq->ndescs; ++i) {
 		unsigned iov_count = *in_num + *out_num;
 		struct vhost_desc *desc = &vq->descs[i];
 		int access;
@@ -2602,24 +2400,26 @@ int vhost_get_vq_desc_batch(struct vhost_virtqueue *vq,
 		}
 
 		ret = desc->id;
+
+		if (!(desc->flags & VRING_DESC_F_NEXT))
+			break;
 	}
 
-	vq->ndescs = 0;
+	vq->first_desc = i + 1;
 
 	return ret;
 
 err:
-	vhost_discard_vq_desc(vq, 1);
-err_fetch:
-	vq->ndescs = 0;
+	unfetch_descs(vq);
 
 	return ret ? ret : vq->num;
 }
-EXPORT_SYMBOL_GPL(vhost_get_vq_desc_batch);
+EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
 /* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
 void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
 {
+	unfetch_descs(vq);
 	vq->last_avail_idx -= n;
 }
 EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 87089d51490d..fed36af5c444 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -81,6 +81,7 @@ struct vhost_virtqueue {
 
 	struct vhost_desc *descs;
 	int ndescs;
+	int first_desc;
 	int max_descs;
 
 	struct file *kick;
@@ -189,10 +190,6 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 bool vhost_vq_access_ok(struct vhost_virtqueue *vq);
 bool vhost_log_access_ok(struct vhost_dev *);
 
-int vhost_get_vq_desc_batch(struct vhost_virtqueue *,
-		      struct iovec iov[], unsigned int iov_count,
-		      unsigned int *out_num, unsigned int *in_num,
-		      struct vhost_log *log, unsigned int *log_num);
 int vhost_get_vq_desc(struct vhost_virtqueue *,
 		      struct iovec iov[], unsigned int iov_count,
 		      unsigned int *out_num, unsigned int *in_num,
@@ -261,6 +258,8 @@ static inline void vhost_vq_set_backend(struct vhost_virtqueue *vq,
 					void *private_data)
 {
 	vq->private_data = private_data;
+	vq->ndescs = 0;
+	vq->first_desc = 0;
 }
 
 /**
-- 
2.18.1


^ permalink raw reply related

* [RFC v9 01/11] vhost: option to fetch descriptors through an independent struct
From: Eugenio Pérez @ 2020-06-19 18:22 UTC (permalink / raw)
  To: mst
  Cc: kvm list, Stefano Garzarella, virtualization, linux-kernel,
	netdev, Stefan Hajnoczi, Jason Wang
In-Reply-To: <20200619182302.850-1-eperezma@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>

The idea is to support multiple ring formats by converting
to a format-independent array of descriptors.

This costs extra cycles, but we gain in ability
to fetch a batch of descriptors in one go, which
is good for code cache locality.

When used, this causes a minor performance degradation,
it's been kept as simple as possible for ease of review.
A follow-up patch gets us back the performance by adding batching.

To simplify benchmarking, I kept the old code around so one can switch
back and forth between old and new code. This will go away in the final
submission.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Link: https://lore.kernel.org/r/20200401183118.8334-2-eperezma@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/vhost.c | 305 +++++++++++++++++++++++++++++++++++++++++-
 drivers/vhost/vhost.h |  16 +++
 2 files changed, 320 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 062595ee1f83..2d784681b0fa 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -303,6 +303,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 			   struct vhost_virtqueue *vq)
 {
 	vq->num = 1;
+	vq->ndescs = 0;
 	vq->desc = NULL;
 	vq->avail = NULL;
 	vq->used = NULL;
@@ -373,6 +374,9 @@ static int vhost_worker(void *data)
 
 static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
 {
+	kfree(vq->descs);
+	vq->descs = NULL;
+	vq->max_descs = 0;
 	kfree(vq->indirect);
 	vq->indirect = NULL;
 	kfree(vq->log);
@@ -389,6 +393,10 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
 
 	for (i = 0; i < dev->nvqs; ++i) {
 		vq = dev->vqs[i];
+		vq->max_descs = dev->iov_limit;
+		vq->descs = kmalloc_array(vq->max_descs,
+					  sizeof(*vq->descs),
+					  GFP_KERNEL);
 		vq->indirect = kmalloc_array(UIO_MAXIOV,
 					     sizeof(*vq->indirect),
 					     GFP_KERNEL);
@@ -396,7 +404,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
 					GFP_KERNEL);
 		vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads),
 					  GFP_KERNEL);
-		if (!vq->indirect || !vq->log || !vq->heads)
+		if (!vq->indirect || !vq->log || !vq->heads || !vq->descs)
 			goto err_nomem;
 	}
 	return 0;
@@ -488,6 +496,8 @@ void vhost_dev_init(struct vhost_dev *dev,
 
 	for (i = 0; i < dev->nvqs; ++i) {
 		vq = dev->vqs[i];
+		vq->descs = NULL;
+		vq->max_descs = 0;
 		vq->log = NULL;
 		vq->indirect = NULL;
 		vq->heads = NULL;
@@ -2314,6 +2324,299 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
+static struct vhost_desc *peek_split_desc(struct vhost_virtqueue *vq)
+{
+	BUG_ON(!vq->ndescs);
+	return &vq->descs[vq->ndescs - 1];
+}
+
+static void pop_split_desc(struct vhost_virtqueue *vq)
+{
+	BUG_ON(!vq->ndescs);
+	--vq->ndescs;
+}
+
+#define VHOST_DESC_FLAGS (VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE | \
+			  VRING_DESC_F_NEXT)
+static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc *desc, u16 id)
+{
+	struct vhost_desc *h;
+
+	if (unlikely(vq->ndescs >= vq->max_descs))
+		return -EINVAL;
+	h = &vq->descs[vq->ndescs++];
+	h->addr = vhost64_to_cpu(vq, desc->addr);
+	h->len = vhost32_to_cpu(vq, desc->len);
+	h->flags = vhost16_to_cpu(vq, desc->flags) & VHOST_DESC_FLAGS;
+	h->id = id;
+
+	return 0;
+}
+
+static int fetch_indirect_descs(struct vhost_virtqueue *vq,
+				struct vhost_desc *indirect,
+				u16 head)
+{
+	struct vring_desc desc;
+	unsigned int i = 0, count, found = 0;
+	u32 len = indirect->len;
+	struct iov_iter from;
+	int ret;
+
+	/* Sanity check */
+	if (unlikely(len % sizeof desc)) {
+		vq_err(vq, "Invalid length in indirect descriptor: "
+		       "len 0x%llx not multiple of 0x%zx\n",
+		       (unsigned long long)len,
+		       sizeof desc);
+		return -EINVAL;
+	}
+
+	ret = translate_desc(vq, indirect->addr, len, vq->indirect,
+			     UIO_MAXIOV, VHOST_ACCESS_RO);
+	if (unlikely(ret < 0)) {
+		if (ret != -EAGAIN)
+			vq_err(vq, "Translation failure %d in indirect.\n", ret);
+		return ret;
+	}
+	iov_iter_init(&from, READ, vq->indirect, ret, len);
+
+	/* We will use the result as an address to read from, so most
+	 * architectures only need a compiler barrier here. */
+	read_barrier_depends();
+
+	count = len / sizeof desc;
+	/* Buffers are chained via a 16 bit next field, so
+	 * we can have at most 2^16 of these. */
+	if (unlikely(count > USHRT_MAX + 1)) {
+		vq_err(vq, "Indirect buffer length too big: %d\n",
+		       indirect->len);
+		return -E2BIG;
+	}
+	if (unlikely(vq->ndescs + count > vq->max_descs)) {
+		vq_err(vq, "Too many indirect + direct descs: %d + %d\n",
+		       vq->ndescs, indirect->len);
+		return -E2BIG;
+	}
+
+	do {
+		if (unlikely(++found > count)) {
+			vq_err(vq, "Loop detected: last one at %u "
+			       "indirect size %u\n",
+			       i, count);
+			return -EINVAL;
+		}
+		if (unlikely(!copy_from_iter_full(&desc, sizeof(desc), &from))) {
+			vq_err(vq, "Failed indirect descriptor: idx %d, %zx\n",
+			       i, (size_t)indirect->addr + i * sizeof desc);
+			return -EINVAL;
+		}
+		if (unlikely(desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT))) {
+			vq_err(vq, "Nested indirect descriptor: idx %d, %zx\n",
+			       i, (size_t)indirect->addr + i * sizeof desc);
+			return -EINVAL;
+		}
+
+		/* Note: push_split_desc can't fail here:
+		 * we never fetch unless there's space. */
+		ret = push_split_desc(vq, &desc, head);
+		WARN_ON(ret);
+	} while ((i = next_desc(vq, &desc)) != -1);
+	return 0;
+}
+
+/* This function returns a value > 0 if a descriptor was found, or 0 if none were found.
+ * A negative code is returned on error. */
+static int fetch_descs(struct vhost_virtqueue *vq)
+{
+	unsigned int i, head, found = 0;
+	struct vhost_desc *last;
+	struct vring_desc desc;
+	__virtio16 avail_idx;
+	__virtio16 ring_head;
+	u16 last_avail_idx;
+	int ret;
+
+	/* Check it isn't doing very strange things with descriptor numbers. */
+	last_avail_idx = vq->last_avail_idx;
+
+	if (vq->avail_idx == vq->last_avail_idx) {
+		if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) {
+			vq_err(vq, "Failed to access avail idx at %p\n",
+				&vq->avail->idx);
+			return -EFAULT;
+		}
+		vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
+
+		if (unlikely((u16)(vq->avail_idx - last_avail_idx) > vq->num)) {
+			vq_err(vq, "Guest moved used index from %u to %u",
+				last_avail_idx, vq->avail_idx);
+			return -EFAULT;
+		}
+
+		/* If there's nothing new since last we looked, return
+		 * invalid.
+		 */
+		if (vq->avail_idx == last_avail_idx)
+			return 0;
+
+		/* Only get avail ring entries after they have been
+		 * exposed by guest.
+		 */
+		smp_rmb();
+	}
+
+	/* Grab the next descriptor number they're advertising */
+	if (unlikely(vhost_get_avail_head(vq, &ring_head, last_avail_idx))) {
+		vq_err(vq, "Failed to read head: idx %d address %p\n",
+		       last_avail_idx,
+		       &vq->avail->ring[last_avail_idx % vq->num]);
+		return -EFAULT;
+	}
+
+	head = vhost16_to_cpu(vq, ring_head);
+
+	/* If their number is silly, that's an error. */
+	if (unlikely(head >= vq->num)) {
+		vq_err(vq, "Guest says index %u > %u is available",
+		       head, vq->num);
+		return -EINVAL;
+	}
+
+	i = head;
+	do {
+		if (unlikely(i >= vq->num)) {
+			vq_err(vq, "Desc index is %u > %u, head = %u",
+			       i, vq->num, head);
+			return -EINVAL;
+		}
+		if (unlikely(++found > vq->num)) {
+			vq_err(vq, "Loop detected: last one at %u "
+			       "vq size %u head %u\n",
+			       i, vq->num, head);
+			return -EINVAL;
+		}
+		ret = vhost_get_desc(vq, &desc, i);
+		if (unlikely(ret)) {
+			vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
+			       i, vq->desc + i);
+			return -EFAULT;
+		}
+		ret = push_split_desc(vq, &desc, head);
+		if (unlikely(ret)) {
+			vq_err(vq, "Failed to save descriptor: idx %d\n", i);
+			return -EINVAL;
+		}
+	} while ((i = next_desc(vq, &desc)) != -1);
+
+	last = peek_split_desc(vq);
+	if (unlikely(last->flags & VRING_DESC_F_INDIRECT)) {
+		pop_split_desc(vq);
+		ret = fetch_indirect_descs(vq, last, head);
+		if (unlikely(ret < 0)) {
+			if (ret != -EAGAIN)
+				vq_err(vq, "Failure detected "
+				       "in indirect descriptor at idx %d\n", head);
+			return ret;
+		}
+	}
+
+	/* Assume notifications from guest are disabled at this point,
+	 * if they aren't we would need to update avail_event index. */
+	BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
+
+	/* On success, increment avail index. */
+	vq->last_avail_idx++;
+
+	return 1;
+}
+
+/* This looks in the virtqueue and for the first available buffer, and converts
+ * it to an iovec for convenient access.  Since descriptors consist of some
+ * number of output then some number of input descriptors, it's actually two
+ * iovecs, but we pack them into one and note how many of each there were.
+ *
+ * This function returns the descriptor number found, or vq->num (which is
+ * never a valid descriptor number) if none was found.  A negative code is
+ * returned on error. */
+int vhost_get_vq_desc_batch(struct vhost_virtqueue *vq,
+		      struct iovec iov[], unsigned int iov_size,
+		      unsigned int *out_num, unsigned int *in_num,
+		      struct vhost_log *log, unsigned int *log_num)
+{
+	int ret = fetch_descs(vq);
+	int i;
+
+	if (ret <= 0)
+		goto err_fetch;
+
+	/* Now convert to IOV */
+	/* When we start there are none of either input nor output. */
+	*out_num = *in_num = 0;
+	if (unlikely(log))
+		*log_num = 0;
+
+	for (i = 0; i < vq->ndescs; ++i) {
+		unsigned iov_count = *in_num + *out_num;
+		struct vhost_desc *desc = &vq->descs[i];
+		int access;
+
+		if (desc->flags & ~VHOST_DESC_FLAGS) {
+			vq_err(vq, "Unexpected flags: 0x%x at descriptor id 0x%x\n",
+			       desc->flags, desc->id);
+			ret = -EINVAL;
+			goto err;
+		}
+		if (desc->flags & VRING_DESC_F_WRITE)
+			access = VHOST_ACCESS_WO;
+		else
+			access = VHOST_ACCESS_RO;
+		ret = translate_desc(vq, desc->addr,
+				     desc->len, iov + iov_count,
+				     iov_size - iov_count, access);
+		if (unlikely(ret < 0)) {
+			if (ret != -EAGAIN)
+				vq_err(vq, "Translation failure %d descriptor idx %d\n",
+					ret, i);
+			goto err;
+		}
+		if (access == VHOST_ACCESS_WO) {
+			/* If this is an input descriptor,
+			 * increment that count. */
+			*in_num += ret;
+			if (unlikely(log && ret)) {
+				log[*log_num].addr = desc->addr;
+				log[*log_num].len = desc->len;
+				++*log_num;
+			}
+		} else {
+			/* If it's an output descriptor, they're all supposed
+			 * to come before any input descriptors. */
+			if (unlikely(*in_num)) {
+				vq_err(vq, "Descriptor has out after in: "
+				       "idx %d\n", i);
+				ret = -EINVAL;
+				goto err;
+			}
+			*out_num += ret;
+		}
+
+		ret = desc->id;
+	}
+
+	vq->ndescs = 0;
+
+	return ret;
+
+err:
+	vhost_discard_vq_desc(vq, 1);
+err_fetch:
+	vq->ndescs = 0;
+
+	return ret ? ret : vq->num;
+}
+EXPORT_SYMBOL_GPL(vhost_get_vq_desc_batch);
+
 /* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
 void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
 {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index c8e96a095d3b..87089d51490d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -60,6 +60,13 @@ enum vhost_uaddr_type {
 	VHOST_NUM_ADDRS = 3,
 };
 
+struct vhost_desc {
+	u64 addr;
+	u32 len;
+	u16 flags; /* VRING_DESC_F_WRITE, VRING_DESC_F_NEXT */
+	u16 id;
+};
+
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
 	struct vhost_dev *dev;
@@ -71,6 +78,11 @@ struct vhost_virtqueue {
 	vring_avail_t __user *avail;
 	vring_used_t __user *used;
 	const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS];
+
+	struct vhost_desc *descs;
+	int ndescs;
+	int max_descs;
+
 	struct file *kick;
 	struct eventfd_ctx *call_ctx;
 	struct eventfd_ctx *error_ctx;
@@ -177,6 +189,10 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 bool vhost_vq_access_ok(struct vhost_virtqueue *vq);
 bool vhost_log_access_ok(struct vhost_dev *);
 
+int vhost_get_vq_desc_batch(struct vhost_virtqueue *,
+		      struct iovec iov[], unsigned int iov_count,
+		      unsigned int *out_num, unsigned int *in_num,
+		      struct vhost_log *log, unsigned int *log_num);
 int vhost_get_vq_desc(struct vhost_virtqueue *,
 		      struct iovec iov[], unsigned int iov_count,
 		      unsigned int *out_num, unsigned int *in_num,
-- 
2.18.1


^ permalink raw reply related

* [RFC v9 00/11] vhost: ring format independence
From: Eugenio Pérez @ 2020-06-19 18:22 UTC (permalink / raw)
  To: mst
  Cc: kvm list, Stefano Garzarella, virtualization, linux-kernel,
	netdev, Stefan Hajnoczi, Jason Wang

This adds infrastructure required for supporting
multiple ring formats.

The idea is as follows: we convert descriptors to an
independent format first, and process that converting to
iov later.

Used ring is similar: we fetch into an independent struct first,
convert that to IOV later.

The point is that we have a tight loop that fetches
descriptors, which is good for cache utilization.
This will also allow all kind of batching tricks -
e.g. it seems possible to keep SMAP disabled while
we are fetching multiple descriptors.

For used descriptors, this allows keeping track of the buffer length
without need to rescan IOV.

This seems to perform better on UDP stream tests, but a little bit worse on
RR tests and TCP streams, based on a microbenchmark.
More testing would be very much appreciated.

changes from v8:
        - fixes fetch_descs returning "no descriptors available" when
          only few descriptors were available, stalling the communications.
        - minor syntax errors in intermediate commits.
        - skipping checking for sane max_descs if vhost device is not going to
          use worker like vDPA devices.

changes from v7:
        - squashed in fixes. no longer hangs but still known
          to cause data corruption for some people. under debug.

changes from v6:
        - fixes some bugs introduced in v6 and v5

changes from v5:
        - addressed comments by Jason: squashed API changes, fixed up discard

changes from v4:
        - added used descriptor format independence
        - addressed comments by jason
        - fixed a crash detected by the lkp robot.

changes from v3:
        - fixed error handling in case of indirect descriptors
        - add BUG_ON to detect buffer overflow in case of bugs
                in response to comment by Jason Wang
        - minor code tweaks

Changes from v2:
        - fixed indirect descriptor batching
                reported by Jason Wang

Changes from v1:
        - typo fixes

Michael S. Tsirkin (11):
  vhost: option to fetch descriptors through an independent struct
  vhost: use batched get_vq_desc version
  vhost/net: pass net specific struct pointer
  vhost: reorder functions
  vhost: format-independent API for used buffers
  vhost/net: convert to new API: heads->bufs
  vhost/net: avoid iov length math
  vhost/test: convert to the buf API
  vhost/scsi: switch to buf APIs
  vhost/vsock: switch to the buf API
  vhost: drop head based APIs

 drivers/vhost/net.c   | 174 ++++++++++----------
 drivers/vhost/scsi.c  |  73 +++++----
 drivers/vhost/test.c  |  22 +--
 drivers/vhost/vhost.c | 372 +++++++++++++++++++++++++++---------------
 drivers/vhost/vhost.h |  44 +++--
 drivers/vhost/vsock.c |  30 ++--
 6 files changed, 435 insertions(+), 280 deletions(-)

-- 
2.18.1

^ permalink raw reply

* Re: [PATCH 03/11] bpf: Add btf_ids object
From: Andrii Nakryiko @ 2020-06-19 18:15 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Networking, bpf,
	Song Liu, Yonghong Song, Martin KaFai Lau, David Miller,
	John Fastabend, Wenbo Zhang, KP Singh, Andrii Nakryiko,
	Brendan Gregg, Florent Revest, Al Viro
In-Reply-To: <20200619131306.GD2465907@krava>

On Fri, Jun 19, 2020 at 6:13 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Thu, Jun 18, 2020 at 05:56:38PM -0700, Andrii Nakryiko wrote:
> > On Tue, Jun 16, 2020 at 3:05 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > >
> > > Adding support to generate .BTF_ids section that would
> > > hold various BTF IDs list for verifier.
> > >
> > > Adding macros help to define lists of BTF IDs placed in
> > > .BTF_ids section. They are initially filled with zeros
> > > (during compilation) and resolved later during the
> > > linking phase by btfid tool.
> > >
> > > Following defines list of one BTF ID that is accessible
> > > within kernel code as bpf_skb_output_btf_ids array.
> > >
> > >   extern int bpf_skb_output_btf_ids[];
> > >
> > >   BTF_ID_LIST(bpf_skb_output_btf_ids)
> > >   BTF_ID(struct, sk_buff)
> > >
> > > Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > >  include/asm-generic/vmlinux.lds.h |  4 ++
> > >  kernel/bpf/Makefile               |  2 +-
> > >  kernel/bpf/btf_ids.c              |  3 ++
> > >  kernel/bpf/btf_ids.h              | 70 +++++++++++++++++++++++++++++++
> > >  4 files changed, 78 insertions(+), 1 deletion(-)
> > >  create mode 100644 kernel/bpf/btf_ids.c
> > >  create mode 100644 kernel/bpf/btf_ids.h
> > >
> >
> > [...]
> >
> > > +/*
> > > + * Following macros help to define lists of BTF IDs placed
> > > + * in .BTF_ids section. They are initially filled with zeros
> > > + * (during compilation) and resolved later during the
> > > + * linking phase by btfid tool.
> > > + *
> > > + * Any change in list layout must be reflected in btfid
> > > + * tool logic.
> > > + */
> > > +
> > > +#define SECTION ".BTF_ids"
> >
> > nit: SECTION is super generic and non-greppable. BTF_IDS_SECTION?
>
> ok
>
> >
> > > +
> > > +#define ____BTF_ID(symbol)                             \
> > > +asm(                                                   \
> > > +".pushsection " SECTION ",\"a\";               \n"     \
> >
> > section should be also read-only? Either immediately here, of btfid
> > tool should mark it? Unless I missed that it's already doing it :)
>
> hm, it's there next to the .BTF section within RO_DATA macro,
> so I thought that was enough.. I'll double check

ah, linker script magic, got it

>
> >
> > > +".local " #symbol " ;                          \n"     \
> > > +".type  " #symbol ", @object;                  \n"     \
> > > +".size  " #symbol ", 4;                        \n"     \
> > > +#symbol ":                                     \n"     \
> > > +".zero 4                                       \n"     \
> > > +".popsection;                                  \n");
> > > +
> > > +#define __BTF_ID(...) \
> > > +       ____BTF_ID(__VA_ARGS__)
> >
> > why varargs, if it's always a single argument? Or it's one of those
> > macro black magic things were it works only in this particular case,
> > but not others?
>
> yea, I kind of struggled in here, because any other would not
> expand the name concat together with the unique ID bit,
> __VA_ARGS__ did it nicely ;-) I'll revisit this

it's probably not varargs, but rather nested macro call. Macros are
weird and tricky...

>
> thanks,
> jirka
>

^ permalink raw reply

* [PATCH] fs/epoll: Enable non-blocking busypoll with epoll timeout of 0
From: Sridhar Samudrala @ 2020-06-19 18:13 UTC (permalink / raw)
  To: viro, linux-fsdevel, netdev

This patch triggers non-blocking busy poll when busy_poll is enabled and
epoll is called with a timeout of 0 and is associated with a napi_id.
This enables an app thread to go through napi poll routine once by calling
epoll with a 0 timeout.

poll/select with a 0 timeout behave in a similar manner.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 fs/eventpoll.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 12eebcdea9c8..5f55078d6381 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1847,6 +1847,19 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		eavail = ep_events_available(ep);
 		write_unlock_irq(&ep->lock);
 
+		/*
+		 * Trigger non-blocking busy poll if timeout is 0 and there are
+		 * no events available. Passing timed_out(1) to ep_busy_loop
+		 * will make sure that busy polling is triggered only once and
+		 * only if sysctl.net.core.busy_poll is set to non-zero value.
+		 */
+		if (!eavail) {
+			ep_busy_loop(ep, timed_out);
+			write_lock_irq(&ep->lock);
+			eavail = ep_events_available(ep);
+			write_unlock_irq(&ep->lock);
+		}
+
 		goto send_events;
 	}
 
-- 
2.25.4


^ permalink raw reply related

* Re: [PATCH 01/11] bpf: Add btfid tool to resolve BTF IDs in ELF object
From: Andrii Nakryiko @ 2020-06-19 18:12 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Networking, bpf,
	Song Liu, Yonghong Song, Martin KaFai Lau, David Miller,
	John Fastabend, Wenbo Zhang, KP Singh, Andrii Nakryiko,
	Brendan Gregg, Florent Revest, Al Viro
In-Reply-To: <20200619130354.GB2465907@krava>

On Fri, Jun 19, 2020 at 6:04 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Thu, Jun 18, 2020 at 05:38:03PM -0700, Andrii Nakryiko wrote:
> > On Tue, Jun 16, 2020 at 3:06 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > >
> > > The btfid tool scans Elf object for .BTF_ids section and
> > > resolves its symbols with BTF IDs.
> >
> > naming is hard and subjective, I know. But given this actively
> > modifies ELF file it probably should indicate this in the name. So
> > something like patch_btfids or resolve_btfids would be a bit more
> > accurate and for people not in the know will still trigger the
> > "warning, tool can modify something" flag, if there are any problems.
>
> resolve_btfids sounds good to me
>
> >
> > >
> > > It will be used to during linking time to resolve arrays
> > > of BTF IDs used in verifier, so these IDs do not need to
> > > be resolved in runtime.
> > >
> > > The expected layout of .BTF_ids section is described
> > > in btfid.c header. Related kernel changes are coming in
> > > following changes.
> > >
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > >  tools/bpf/btfid/Build    |  26 ++
> > >  tools/bpf/btfid/Makefile |  71 +++++
> > >  tools/bpf/btfid/btfid.c  | 627 +++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 724 insertions(+)
> > >  create mode 100644 tools/bpf/btfid/Build
> > >  create mode 100644 tools/bpf/btfid/Makefile
> > >  create mode 100644 tools/bpf/btfid/btfid.c
> > >
> >
> > [...]
> >
> > > diff --git a/tools/bpf/btfid/btfid.c b/tools/bpf/btfid/btfid.c
> > > new file mode 100644
> > > index 000000000000..7cdf39bfb150
> > > --- /dev/null
> > > +++ b/tools/bpf/btfid/btfid.c
> > > @@ -0,0 +1,627 @@
> > > +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> > > +#define  _GNU_SOURCE
> > > +
> > > +/*
> > > + * btfid scans Elf object for .BTF_ids section and resolves
> > > + * its symbols with BTF IDs.
> > > + *
> > > + * Each symbol points to 4 bytes data and is expected to have
> > > + * following name syntax:
> > > + *
> > > + * __BTF_ID__<type>__<symbol>[__<id>]
> >
> > This ___<id> thingy is just disambiguation between multiple places in
> > the code that could have BTF_ID macro, right? Or it has extra meaning?
>
> it's there so you could multiple BTF_ID instances with the same
> symbol name

yeah, reading subsequent patches clarified that for me :)

>
> >
> > > + *
> > > + * type is:
> > > + *
> > > + *   func   - lookup BTF_KIND_FUNC symbol with <symbol> name
> > > + *            and put its ID into its data
> > > + *
> > > + *             __BTF_ID__func__vfs_close__1:
> > > + *             .zero 4
> > > + *
> > > + *   struct - lookup BTF_KIND_STRUCT symbol with <symbol> name
> > > + *            and put its ID into its data
> > > + *
> > > + *             __BTF_ID__struct__sk_buff__1:
> > > + *             .zero 4
> > > + *
> > > + *   sort   - put symbol size into data area and sort following
> >
> > Oh, I finally got what "put symbol size" means :) It's quite unclear,
> > to be honest. Also, is this size in bytes or number of IDs? Clarifying
> > would be helpful (I'll probably get this from reading further down the
> > code, but still..)
>
> I 'put' ;-) the documentation mainly to kernel/bpf/btf_ids.h,
>
> so there are 2 types of lists, first one defines
> just IDs as they go:
>
>  BTF_ID_LIST(list1)
>  BTF_ID(type1, name1)
>  BTF_ID(type2, name2)
>
> and it's used for helpers btf_id array
>
> 2nd one provides count and is sorted:
>
>  BTF_WHITELIST_ENTRY(list2)
>  BTF_ID(type1, name1)
>  BTF_ID(type2, name2)
>  BTF_WHITELIST_END(list)
>
> and it's used for d_path whitelist so far


yeah, because BTF_ID_LIST and BTF_WHITELIST stuff was in two different
patches, I initially was confused by BTF_ID_LIST and thought it's
actually what BTF_WHITELIST is. Sorry for too many questions!


>
> SNIP
>
> > > +               if (sym.st_shndx != obj->efile.idlist_shndx)
> > > +                       continue;
> > > +
> > > +               name = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
> > > +                                 sym.st_name);
> > > +
> > > +               if (!is_btf_id(name))
> > > +                       continue;
> > > +
> > > +               /*
> > > +                * __BTF_ID__TYPE__vfs_truncate__0
> > > +                * prefix =  ^
> > > +                */
> > > +               prefix = name + sizeof(BTF_ID) - 1;
> > > +
> > > +               if (!strncmp(prefix, BTF_STRUCT, sizeof(BTF_STRUCT) - 1)) {
> > > +                       id = add_struct(obj, prefix);
> > > +               } else if (!strncmp(prefix, BTF_FUNC, sizeof(BTF_FUNC) - 1)) {
> > > +                       id = add_func(obj, prefix);
> > > +               } else if (!strncmp(prefix, BTF_SORT, sizeof(BTF_SORT) - 1)) {
> > > +                       id = add_sort(obj, prefix);
> > > +
> > > +                       /*
> > > +                        * SORT objects store list's count, which is encoded
> > > +                        * in symbol's size.
> > > +                        */
> > > +                       if (id)
> > > +                               id->cnt = sym.st_size / sizeof(int);
> >
> > doesn't sym.st_size also include extra 4 bytes for length prefix?
>
> no, count is excluded from the size
>
> SNIP
>
> > > +       btf = btf__parse_elf(obj->path, NULL);
> > > +       err = libbpf_get_error(btf);
> > > +       if (err) {
> > > +               pr_err("FAILED: load BTF from %s: %s",
> > > +                       obj->path, strerror(err));
> > > +               return -1;
> > > +       }
> > > +
> > > +       nr = btf__get_nr_types(btf);
> > > +
> > > +       /*
> > > +        * Iterate all the BTF types and search for collected symbol IDs.
> > > +        */
> > > +       for (type_id = 0; type_id < nr; type_id++) {
> >
> > common gotcha: type_id <= nr, you can also skip type_id == 0 (always VOID)
>
> ugh, yep.. thanks ;-)
>
> >
> > > +               const struct btf_type *type;
> > > +               struct rb_root *root = NULL;
> > > +               struct btf_id *id;
> > > +               const char *str;
> > > +               int *nr;
> > > +
> > > +               type = btf__type_by_id(btf, type_id);
> > > +               if (!type)
> > > +                       continue;
> >
> > This ought to be an error...
>
> ok, something like "BTF malformed error"
>
> >
> > > +
> > > +               /* We support func/struct types. */
> > > +               if (BTF_INFO_KIND(type->info) == BTF_KIND_FUNC && nr_funcs) {
> >
> > see libbpf's btf.h: btf_is_func(type)
>
> ok
>
> >
> > > +                       root = &obj->funcs;
> > > +                       nr = &nr_funcs;
> > > +               } else if (BTF_INFO_KIND(type->info) == BTF_KIND_STRUCT && nr_structs) {
> >
> > same as above: btf_is_struct
> >
> > But I think you also need to support unions?
> >
> > Also what about typedefs? A lot of types are typedefs to struct/func_proto/etc.
>
> I added only types which are needed at the moment, but maybe
> we can add the basic types now, so we don't need to bother later,
> when we forget how this all work ;-)

yeah, exactly. Once this works, no one will want to go and revisit it,
so I'd rather make it generic from the get go, especially that it's
really easy in this case, right?

>
> >
> > > +                       root = &obj->structs;
> > > +                       nr = &nr_structs;
> > > +               } else {
> > > +                       continue;
> > > +               }
> > > +
> > > +               str = btf__name_by_offset(btf, type->name_off);
> > > +               if (!str)
> > > +                       continue;
> >
> > error, shouldn't happen
>
> ok
>
> >
> > > +
> > > +               id = btf_id__find(root, str);
> > > +               if (id) {
> >
> > isn't it an error, if not found?
>
> no, at this point we are checking if this BTF type was collected
> as a symbol for struct/func in some list.. if not, we continue the
> iteration to next BTF type

ah, ok

>
> >
> > > +                       id->id = type_id;
> > > +                       (*nr)--;
> > > +               }
> > > +       }
> > > +
> > > +       return 0;
> > > +}
> > > +
> >
> > [...]
> >
> > > +
> > > +       /*
> > > +        * We do proper cleanup and file close
> > > +        * intentionally only on success.
> > > +        */
> > > +       if (elf_collect(&obj))
> > > +               return -1;
> > > +
> > > +       if (symbols_collect(&obj))
> > > +               return -1;
> > > +
> > > +       if (symbols_resolve(&obj))
> > > +               return -1;
> > > +
> > > +       if (symbols_patch(&obj))
> > > +               return -1;
> >
> > nit: should these elf_end/close properly on error?
>
> I wrote in the comment above that I intentionaly do not cleanup
> on error path.. I wanted to save some time, but actualy I think
> that would not be so expensive, I can add it

as in save CPU time in error case? Who cares about that? If saving
developer time, well... `goto cleanup` is common and simple pattern ;)

>
> thanks,
> jirka
>
> >
> >
> > > +
> > > +       elf_end(obj.efile.elf);
> > > +       close(obj.efile.fd);
> > > +       return 0;
> > > +}
> > > --
> > > 2.25.4
> > >
> >
>

^ permalink raw reply

* Re: [PATCH 3/3] net: phy: marvell: Add Marvell 88E1548P support
From: Florian Fainelli @ 2020-06-19 18:09 UTC (permalink / raw)
  To: Maxim Kochetkov, netdev
  Cc: Andrew Lunn, Heiner Kallweit, Russell King, David S. Miller,
	Jakub Kicinski
In-Reply-To: <20200619084904.95432-4-fido_max@inbox.ru>



On 6/19/2020 1:49 AM, Maxim Kochetkov wrote:
> Add support for this new phy ID.
> 
> Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox