* RE: [Intel-wired-lan] [next-queue PATCH v7 02/10] igb: Fix queue selection on MAC filters on i210
From: Brown, Aaron F @ 2018-04-17 0:52 UTC (permalink / raw)
To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, Sanchez-Palencia, Jesus
In-Reply-To: <20180410174959.18757-3-vinicius.gomes@intel.com>
> From: Brown, Aaron F
> Sent: Friday, April 13, 2018 7:18 PM
> To: 'Vinicius Costa Gomes' <vinicius.gomes@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
> palencia@intel.com>
> Subject: RE: [Intel-wired-lan] [next-queue PATCH v7 02/10] igb: Fix queue
> selection on MAC filters on i210
>
> > From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> > Behalf Of Vinicius Costa Gomes
> > Sent: Tuesday, April 10, 2018 10:50 AM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
> > palencia@intel.com>
> > Subject: [Intel-wired-lan] [next-queue PATCH v7 02/10] igb: Fix queue
> > selection on MAC filters on i210
> >
> > On the RAH registers there are semantic differences on the meaning of
> > the "queue" parameter for traffic steering depending on the controller
> > model: there is the 82575 meaning, which "queue" means a RX Hardware
> > Queue, and the i350 meaning, where it is a reception pool.
> >
> > The previous behaviour was having no effect for i210 based controllers
> > because the QSEL bit of the RAH register wasn't being set.
> >
> > This patch separates the condition in discrete cases, so the different
> > handling is clearer.
> >
> > Fixes: 83c21335c876 ("igb: improve MAC filter handling")
> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > ---
> > drivers/net/ethernet/intel/igb/igb_main.c | 13 +++++++++----
> > 1 file changed, 9 insertions(+), 4 deletions(-)
> >
>
> Tested by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
^ permalink raw reply
* RE: [Intel-wired-lan] [next-queue PATCH v7 03/10] igb: Enable the hardware traffic class feature bit for igb models
From: Brown, Aaron F @ 2018-04-17 0:52 UTC (permalink / raw)
To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, Sanchez-Palencia, Jesus
In-Reply-To: <20180410174959.18757-4-vinicius.gomes@intel.com>
> From: Brown, Aaron F
> Sent: Friday, April 13, 2018 7:18 PM
> To: 'Vinicius Costa Gomes' <vinicius.gomes@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
> palencia@intel.com>
> Subject: RE: [Intel-wired-lan] [next-queue PATCH v7 03/10] igb: Enable the
> hardware traffic class feature bit for igb models
>
> > From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> > Behalf Of Vinicius Costa Gomes
> > Sent: Tuesday, April 10, 2018 10:50 AM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
> > palencia@intel.com>
> > Subject: [Intel-wired-lan] [next-queue PATCH v7 03/10] igb: Enable the
> > hardware traffic class feature bit for igb models
> >
> > This will allow functionality depending on the hardware being traffic
> > class aware to work. In particular the tc-flower offloading checks
> > verifies that this bit is set.
> >
> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > ---
> > drivers/net/ethernet/intel/igb/igb_main.c | 3 +++
> > 1 file changed, 3 insertions(+)
>
> Tested by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
^ permalink raw reply
* RE: [next-queue PATCH v7 04/10] igb: Add support for MAC address filters specifying source addresses
From: Brown, Aaron F @ 2018-04-17 0:53 UTC (permalink / raw)
To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
Cc: Gomes, Vinicius, Kirsher, Jeffrey T, netdev@vger.kernel.org,
Sanchez-Palencia, Jesus
In-Reply-To: <20180410174959.18757-5-vinicius.gomes@intel.com>
> From: Brown, Aaron F
> Sent: Friday, April 13, 2018 7:19 PM
> To: 'Vinicius Costa Gomes' <vinicius.gomes@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: Gomes, Vinicius <vinicius.gomes@intel.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Sanchez-Palencia,
> Jesus <jesus.sanchez-palencia@intel.com>
> Subject: RE: [next-queue PATCH v7 04/10] igb: Add support for MAC address
> filters specifying source addresses
>
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Vinicius Costa Gomes
> > Sent: Tuesday, April 10, 2018 10:50 AM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: Gomes, Vinicius <vinicius.gomes@intel.com>; Kirsher, Jeffrey T
> > <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Sanchez-Palencia,
> > Jesus <jesus.sanchez-palencia@intel.com>
> > Subject: [next-queue PATCH v7 04/10] igb: Add support for MAC address
> > filters specifying source addresses
> >
> > Makes it possible to direct packets to queues based on their source
> > address. Documents the expected usage of the 'flags' parameter.
> >
> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > ---
> > .../net/ethernet/intel/igb/e1000_defines.h | 1 +
> > drivers/net/ethernet/intel/igb/igb.h | 1 +
> > drivers/net/ethernet/intel/igb/igb_main.c | 40 ++++++++++++++++---
> > 3 files changed, 37 insertions(+), 5 deletions(-)
>
> Tested by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
^ permalink raw reply
* RE: [Intel-wired-lan] [next-queue PATCH v7 05/10] igb: Add support for enabling queue steering in filters
From: Brown, Aaron F @ 2018-04-17 0:53 UTC (permalink / raw)
To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, Sanchez-Palencia, Jesus
In-Reply-To: <20180410174959.18757-6-vinicius.gomes@intel.com>
> From: Brown, Aaron F
> Sent: Friday, April 13, 2018 7:20 PM
> To: 'Vinicius Costa Gomes' <vinicius.gomes@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
> palencia@intel.com>
> Subject: RE: [Intel-wired-lan] [next-queue PATCH v7 05/10] igb: Add support
> for enabling queue steering in filters
>
> > From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> > Behalf Of Vinicius Costa Gomes
> > Sent: Tuesday, April 10, 2018 10:50 AM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
> > palencia@intel.com>
> > Subject: [Intel-wired-lan] [next-queue PATCH v7 05/10] igb: Add support
> for
> > enabling queue steering in filters
> >
> > On some igb models (82575 and i210) the MAC address filters can
> > control to which queue the packet will be assigned.
> >
> > This extends the 'state' with one more state to signify that queue
> > selection should be enabled for that filter.
> >
> > As 82575 parts are no longer easily obtained (and this was developed
> > against i210), only support for the i210 model is enabled.
> >
> > These functions are exported and will be used in the next patch.
> >
> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > ---
> > .../net/ethernet/intel/igb/e1000_defines.h | 1 +
> > drivers/net/ethernet/intel/igb/igb.h | 6 +++++
> > drivers/net/ethernet/intel/igb/igb_main.c | 26 +++++++++++++++++++
> > 3 files changed, 33 insertions(+)
>
> Tested by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
^ permalink raw reply
* RE: [Intel-wired-lan] [next-queue PATCH v7 06/10] igb: Allow filters to be added for the local MAC address
From: Brown, Aaron F @ 2018-04-17 0:53 UTC (permalink / raw)
To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, Sanchez-Palencia, Jesus
In-Reply-To: <20180410174959.18757-7-vinicius.gomes@intel.com>
> From: Brown, Aaron F
> Sent: Friday, April 13, 2018 7:20 PM
> To: 'Vinicius Costa Gomes' <vinicius.gomes@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
> palencia@intel.com>
> Subject: RE: [Intel-wired-lan] [next-queue PATCH v7 06/10] igb: Allow filters
> to be added for the local MAC address
>
> > From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> > Behalf Of Vinicius Costa Gomes
> > Sent: Tuesday, April 10, 2018 10:50 AM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
> > palencia@intel.com>
> > Subject: [Intel-wired-lan] [next-queue PATCH v7 06/10] igb: Allow filters to
> > be added for the local MAC address
> >
> > Users expect that when adding a steering filter for the local MAC
> > address, that all the traffic directed to that address will go to some
> > queue.
> >
> > Currently, it's not possible to configure entries in the "in use"
> > state, which is the normal state of the local MAC address entry (it is
> > the default), this patch allows to override the steering configuration
> > of "in use" entries, if the filter to be added match the address and
> > address type (source or destination) of an existing entry.
> >
> > There is a bit of a special handling for entries referring to the
> > local MAC address, when they are removed, only the steering
> > configuration is reset.
> >
> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > ---
> > drivers/net/ethernet/intel/igb/igb_main.c | 40 ++++++++++++++++++++-
> --
> > 1 file changed, 36 insertions(+), 4 deletions(-)
>
> Tested by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
^ permalink raw reply
* RE: [next-queue PATCH v7 07/10] igb: Enable nfc filters to specify MAC addresses
From: Brown, Aaron F @ 2018-04-17 0:54 UTC (permalink / raw)
To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
Cc: Gomes, Vinicius, Kirsher, Jeffrey T, netdev@vger.kernel.org,
Sanchez-Palencia, Jesus
In-Reply-To: <20180410174959.18757-8-vinicius.gomes@intel.com>
> From: Brown, Aaron F
> Sent: Friday, April 13, 2018 7:22 PM
> To: 'Vinicius Costa Gomes' <vinicius.gomes@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: Gomes, Vinicius <vinicius.gomes@intel.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Sanchez-Palencia,
> Jesus <jesus.sanchez-palencia@intel.com>
> Subject: RE: [next-queue PATCH v7 07/10] igb: Enable nfc filters to specify
> MAC addresses
>
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Vinicius Costa Gomes
> > Sent: Tuesday, April 10, 2018 10:50 AM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: Gomes, Vinicius <vinicius.gomes@intel.com>; Kirsher, Jeffrey T
> > <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Sanchez-Palencia,
> > Jesus <jesus.sanchez-palencia@intel.com>
> > Subject: [next-queue PATCH v7 07/10] igb: Enable nfc filters to specify MAC
> > addresses
> >
> > This allows igb_add_filter()/igb_erase_filter() to work on filters
> > that include MAC addresses (both source and destination).
> >
> > For now, this only exposes the functionality, the next commit glues
> > ethtool into this. Later in this series, these APIs are used to allow
> > offloading of cls_flower filters.
> >
> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > ---
> > drivers/net/ethernet/intel/igb/igb.h | 4 +++
> > drivers/net/ethernet/intel/igb/igb_ethtool.c | 28
> ++++++++++++++++++++
> > 2 files changed, 32 insertions(+)
>
> Tested by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
^ permalink raw reply
* RE: [next-queue PATCH v7 08/10] igb: Add MAC address support for ethtool nftuple filters
From: Brown, Aaron F @ 2018-04-17 0:54 UTC (permalink / raw)
To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
Cc: Gomes, Vinicius, Kirsher, Jeffrey T, netdev@vger.kernel.org,
Sanchez-Palencia, Jesus
In-Reply-To: <20180410174959.18757-9-vinicius.gomes@intel.com>
> From: Brown, Aaron F
> Sent: Friday, April 13, 2018 7:23 PM
> To: 'Vinicius Costa Gomes' <vinicius.gomes@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: Gomes, Vinicius <vinicius.gomes@intel.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Sanchez-Palencia,
> Jesus <jesus.sanchez-palencia@intel.com>
> Subject: RE: [next-queue PATCH v7 08/10] igb: Add MAC address support for
> ethtool nftuple filters
>
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Vinicius Costa Gomes
> > Sent: Tuesday, April 10, 2018 10:50 AM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: Gomes, Vinicius <vinicius.gomes@intel.com>; Kirsher, Jeffrey T
> > <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Sanchez-Palencia,
> > Jesus <jesus.sanchez-palencia@intel.com>
> > Subject: [next-queue PATCH v7 08/10] igb: Add MAC address support for
> > ethtool nftuple filters
> >
> > This adds the capability of configuring the queue steering of arriving
> > packets based on their source and destination MAC addresses.
> >
> > Source address steering (i.e. driving traffic to a specific queue),
> > for the i210, does not work, but filtering does (i.e. accepting
> > traffic based on the source address). So, trying to add a filter
> > specifying only a source address will be an error.
> >
> > In practical terms this adds support for the following use cases,
> > characterized by these examples:
> >
> > $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0
> > (this will direct packets with destination address "aa:aa:aa:aa:aa:aa"
> > to the RX queue 0)
> >
> > $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 \
> > proto 0x22f0 action 3
> > (this will direct packets with source address "44:44:44:44:44:44" and
> > ethertype 0x22f0 to the RX queue 3)
> >
> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > ---
> > drivers/net/ethernet/intel/igb/igb_ethtool.c | 43 ++++++++++++++++++-
> -
> > 1 file changed, 39 insertions(+), 4 deletions(-)
> >
>
> Tested by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
^ permalink raw reply
* RE: [Intel-wired-lan] [next-queue PATCH v7 09/10] igb: Add the skeletons for tc-flower offloading
From: Brown, Aaron F @ 2018-04-17 0:54 UTC (permalink / raw)
To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, Sanchez-Palencia, Jesus
In-Reply-To: <20180410174959.18757-10-vinicius.gomes@intel.com>
> From: Brown, Aaron F
> Sent: Friday, April 13, 2018 7:23 PM
> To: 'Vinicius Costa Gomes' <vinicius.gomes@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
> palencia@intel.com>
> Subject: RE: [Intel-wired-lan] [next-queue PATCH v7 09/10] igb: Add the
> skeletons for tc-flower offloading
>
> > From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> > Behalf Of Vinicius Costa Gomes
> > Sent: Tuesday, April 10, 2018 10:50 AM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
> > palencia@intel.com>
> > Subject: [Intel-wired-lan] [next-queue PATCH v7 09/10] igb: Add the
> > skeletons for tc-flower offloading
> >
> > This adds basic functions needed to implement offloading for filters
> > created by tc-flower.
> >
> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > ---
> > drivers/net/ethernet/intel/igb/igb_main.c | 66
> > +++++++++++++++++++++++
> > 1 file changed, 66 insertions(+)
> >
>
> Tested by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
^ permalink raw reply
* RE: [next-queue PATCH v7 10/10] igb: Add support for adding offloaded clsflower filters
From: Brown, Aaron F @ 2018-04-17 0:55 UTC (permalink / raw)
To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
Cc: Gomes, Vinicius, Kirsher, Jeffrey T, netdev@vger.kernel.org,
Sanchez-Palencia, Jesus
In-Reply-To: <20180410174959.18757-11-vinicius.gomes@intel.com>
> From: Brown, Aaron F
> Sent: Friday, April 13, 2018 7:25 PM
> To: 'Vinicius Costa Gomes' <vinicius.gomes@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: Gomes, Vinicius <vinicius.gomes@intel.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Sanchez-Palencia,
> Jesus <jesus.sanchez-palencia@intel.com>
> Subject: RE: [next-queue PATCH v7 10/10] igb: Add support for adding
> offloaded clsflower filters
>
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Vinicius Costa Gomes
> > Sent: Tuesday, April 10, 2018 10:50 AM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: Gomes, Vinicius <vinicius.gomes@intel.com>; Kirsher, Jeffrey T
> > <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Sanchez-Palencia,
> > Jesus <jesus.sanchez-palencia@intel.com>
> > Subject: [next-queue PATCH v7 10/10] igb: Add support for adding
> offloaded
> > clsflower filters
> >
> > This allows filters added by tc-flower and specifying MAC addresses,
> > Ethernet types, and the VLAN priority field, to be offloaded to the
> > controller.
> >
> > This reuses most of the infrastructure used by ethtool, but clsflower
> > filters are kept in a separated list, so they are invisible to
> > ethtool.
> >
> > To setup clsflower offloading:
> >
> > $ tc qdisc replace dev eth0 handle 100: parent root mqprio \
> > num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
> > queues 1@0 1@1 2@2 hw 0
> > (clsflower offloading depends on the netword driver to be configured
> > with multiple traffic classes, we use mqprio's 'num_tc' parameter to
> > set it to 3)
> >
> > $ tc qdisc add dev eth0 ingress
> >
> > Examples of filters:
> >
> > $ tc filter add dev eth0 parent ffff: flower \
> > dst_mac aa:aa:aa:aa:aa:aa \
> > hw_tc 2 skip_sw
> > (just a simple filter filtering for the destination MAC address and
> > steering that traffic to queue 2)
> >
> > $ tc filter add dev enp2s0 parent ffff: proto 0x22f0 flower \
> > src_mac cc:cc:cc:cc:cc:cc \
> > hw_tc 1 skip_sw
> > (as the i210 doesn't support steering traffic based on the source
> > address alone, we need to use another steering traffic, in this case
> > we are using the ethernet type (0x22f0) to steer traffic to queue 1)
> >
> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > ---
> > drivers/net/ethernet/intel/igb/igb.h | 2 +
> > drivers/net/ethernet/intel/igb/igb_main.c | 188
> > +++++++++++++++++++++-
> > 2 files changed, 188 insertions(+), 2 deletions(-)
>
> Tested by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
^ permalink raw reply
* RE: [next-queue PATCH] igb: Fix the transmission mode of queue 0 for Qav mode
From: Brown, Aaron F @ 2018-04-17 0:55 UTC (permalink / raw)
To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
Cc: Gomes, Vinicius, Kirsher, Jeffrey T, netdev@vger.kernel.org,
Sanchez-Palencia, Jesus, Guedes, Andre
In-Reply-To: <20180331000652.2855-1-vinicius.gomes@intel.com>
> From: Brown, Aaron F
> Sent: Friday, April 13, 2018 7:28 PM
> To: 'Vinicius Costa Gomes' <vinicius.gomes@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: Gomes, Vinicius <vinicius.gomes@intel.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Sanchez-Palencia,
> Jesus <jesus.sanchez-palencia@intel.com>; Guedes, Andre
> <andre.guedes@intel.com>
> Subject: RE: [next-queue PATCH] igb: Fix the transmission mode of queue 0
> for Qav mode
>
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Vinicius Costa Gomes
> > Sent: Friday, March 30, 2018 5:07 PM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: Gomes, Vinicius <vinicius.gomes@intel.com>; Kirsher, Jeffrey T
> > <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Sanchez-Palencia,
> > Jesus <jesus.sanchez-palencia@intel.com>; Guedes, Andre
> > <andre.guedes@intel.com>
> > Subject: [next-queue PATCH] igb: Fix the transmission mode of queue 0 for
> > Qav mode
> >
> > When Qav mode is enabled, queue 0 should be kept on Stream
> Reservation
> > mode. From the i210 datasheet, section 8.12.19:
> >
> > "Note: Queue0 QueueMode must be set to 1b when TransmitMode is set
> to
> > Qav." ("QueueMode 1b" represents the Stream Reservation mode)
> >
> > The solution is to give queue 0 the all the credits it might need, so
> > it has priority over queue 1.
> >
> > A situation where this can happen is when cbs is "installed" only on
> > queue 1, leaving queue 0 alone. For example:
> >
> > $ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \
> > map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
> >
> > $ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \
> > hicredit 30 sendslope -980000 idleslope 20000 offload 1
> >
> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > ---
> > drivers/net/ethernet/intel/igb/igb_main.c | 17 ++++++++++++++++-
> > 1 file changed, 16 insertions(+), 1 deletion(-)
> >
>
> Tested by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
^ permalink raw reply
* Re: [PATCH 08/10] net: ax88796: Make reset more robust on AX88796B
From: Andrew Lunn @ 2018-04-17 1:32 UTC (permalink / raw)
To: Michael Schmitz
Cc: netdev, Linux/m68k, Michael Karcher, John Paul Adrian Glaubitz,
Michael Karcher
In-Reply-To: <CAOmrzkKoPwNRf5Nq=GN8ktP-9B5dNyGggFsrJ-Tr3D-z=Jj2rQ@mail.gmail.com>
> > This should really be fixed in the PHY driver, not the MAC.
>
> OK - do you want this separate, or as part of this series? Might have
> a few side effects on more commonly used hardware, perhaps?
Hi Michael
What PHY driver is used? In the driver you can implement a .soft_reset
function which first does the dummy write, and then uses
genphy_soft_reset() to do the actual reset.
Andrew
^ permalink raw reply
* [PATCH v2 3/8] net: ax88796: Do not free IRQ in ax_remove() (already freed in ax_close()).
From: Michael Schmitz @ 2018-04-17 2:08 UTC (permalink / raw)
To: netdev
Cc: andrew, linux-m68k, Michael.Karcher, John Paul Adrian Glaubitz,
Michael Karcher
In-Reply-To: <1523916285-6057-1-git-send-email-schmitzmic@gmail.com>
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
This complements the fix in 82533ad9a1c that removed the free_irq
call in the error path of probe, to also not call free_irq when
remove is called to revert the effects of probe.
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
---
drivers/net/ethernet/8390/ax88796.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ethernet/8390/ax88796.c b/drivers/net/ethernet/8390/ax88796.c
index 83e59ae..ecf104c 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -793,7 +793,6 @@ static int ax_remove(struct platform_device *pdev)
struct resource *mem;
unregister_netdev(dev);
- free_irq(dev->irq, dev);
iounmap(ei_local->mem);
mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
--
1.7.0.4
^ permalink raw reply related
* [PATCH v2 0/8] New network driver for Amiga X-Surf 100 (m68k)
From: Michael Schmitz @ 2018-04-17 2:08 UTC (permalink / raw)
To: netdev; +Cc: andrew, linux-m68k, Michael.Karcher
In-Reply-To: <1523916285-6057-1-git-send-email-schmitzmic@gmail.com>
This patch series adds support for the Individual Computers X-Surf 100
network card for m68k Amiga, a network adapter based on the AX88796 chip set.
The driver was originally written for kernel version 3.19 by Michael Karcher
(see CC:), and adapted to 4.16 for submission to netdev by me. Questions
regarding motivation for some of the changes are probably best directed at
Michael Karcher.
The driver has been tested by Adrian <glaubitz@physik.fu-berlin.de> who will
send his Tested-by tag separately.
A few changes to the ax88796 driver were required:
- to read the MAC address, some setup of the ax99796 chip must be done,
- attach to the MII bus only on device open to allow module unloading,
- allow to supersede ax_block_input/ax_block_output by card-specific
optimized code,
- use an optional interrupt status callback to allow easier sharing of the
card interrupt,
- set IRQF_SHARED if platform IRQ resource is marked shareable,
Some additional cleanup:
- do not attempt to free IRQ in ax_remove (complements 82533ad9a1c),
- clear platform drvdata on probe fail and module remove.
Changes since v1:
Raised in review by Andrew Lunn:
- move MII code around to avoid need for forward declaration
- combine patches 2 and 7 to add cleanup in error path
The patch series, in order:
1/8 net: ax88796: Fix MAC address reading
2/8 net: ax88796: Attach MII bus only when open
3/8 net: ax88796: Do not free IRQ in ax_remove() (already freed in ax_close()).
4/8 net: ax88796: Add block_input/output hooks to ax_plat_data
5/8 net: ax88796: add interrupt status callback to platform data
6/8 net: ax88796: set IRQF_SHARED flag when IRQ resource is marked as shareable
7/8 net: ax88796: release platform device drvdata on probe error and module remove
8/8 net: New ax88796 platform driver for Amiga X-Surf 100 Zorro board (m68k)
drivers/net/ethernet/8390/Kconfig | 14 +-
drivers/net/ethernet/8390/Makefile | 1 +
drivers/net/ethernet/8390/ax88796.c | 228 +++++++++++--------
drivers/net/ethernet/8390/xsurf100.c | 411 ++++++++++++++++++++++++++++++++++
include/net/ax88796.h | 14 +-
5 files changed, 573 insertions(+), 95 deletions(-)
Cheers,
Michael
^ permalink raw reply
* [PATCH v2 2/8] net: ax88796: Attach MII bus only when open
From: Michael Schmitz @ 2018-04-17 2:08 UTC (permalink / raw)
To: netdev
Cc: andrew, linux-m68k, Michael.Karcher, Michael Karcher,
Michael Karcher, Michael Schmitz
In-Reply-To: <1523916285-6057-1-git-send-email-schmitzmic@gmail.com>
From: Michael Karcher <debian@mkarcher.dialup.fu-berlin.de>
Call ax_mii_init in ax_open(), and unregister/remove mdiobus resources
in ax_close().
This is needed to be able to unload the module, as the module is busy
while the MII bus is attached.
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
---
drivers/net/ethernet/8390/ax88796.c | 183 ++++++++++++++++++-----------------
1 files changed, 95 insertions(+), 88 deletions(-)
diff --git a/drivers/net/ethernet/8390/ax88796.c b/drivers/net/ethernet/8390/ax88796.c
index 2a256aa..83e59ae 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -389,6 +389,90 @@ static void ax_phy_switch(struct net_device *dev, int on)
ei_outb(reg_gpoc, ei_local->mem + EI_SHIFT(0x17));
}
+static void ax_bb_mdc(struct mdiobb_ctrl *ctrl, int level)
+{
+ struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
+
+ if (level)
+ ax->reg_memr |= AX_MEMR_MDC;
+ else
+ ax->reg_memr &= ~AX_MEMR_MDC;
+
+ ei_outb(ax->reg_memr, ax->addr_memr);
+}
+
+static void ax_bb_dir(struct mdiobb_ctrl *ctrl, int output)
+{
+ struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
+
+ if (output)
+ ax->reg_memr &= ~AX_MEMR_MDIR;
+ else
+ ax->reg_memr |= AX_MEMR_MDIR;
+
+ ei_outb(ax->reg_memr, ax->addr_memr);
+}
+
+static void ax_bb_set_data(struct mdiobb_ctrl *ctrl, int value)
+{
+ struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
+
+ if (value)
+ ax->reg_memr |= AX_MEMR_MDO;
+ else
+ ax->reg_memr &= ~AX_MEMR_MDO;
+
+ ei_outb(ax->reg_memr, ax->addr_memr);
+}
+
+static int ax_bb_get_data(struct mdiobb_ctrl *ctrl)
+{
+ struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
+ int reg_memr = ei_inb(ax->addr_memr);
+
+ return reg_memr & AX_MEMR_MDI ? 1 : 0;
+}
+
+static const struct mdiobb_ops bb_ops = {
+ .owner = THIS_MODULE,
+ .set_mdc = ax_bb_mdc,
+ .set_mdio_dir = ax_bb_dir,
+ .set_mdio_data = ax_bb_set_data,
+ .get_mdio_data = ax_bb_get_data,
+};
+
+static int ax_mii_init(struct net_device *dev)
+{
+ struct platform_device *pdev = to_platform_device(dev->dev.parent);
+ struct ei_device *ei_local = netdev_priv(dev);
+ struct ax_device *ax = to_ax_dev(dev);
+ int err;
+
+ ax->bb_ctrl.ops = &bb_ops;
+ ax->addr_memr = ei_local->mem + AX_MEMR;
+ ax->mii_bus = alloc_mdio_bitbang(&ax->bb_ctrl);
+ if (!ax->mii_bus) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ ax->mii_bus->name = "ax88796_mii_bus";
+ ax->mii_bus->parent = dev->dev.parent;
+ snprintf(ax->mii_bus->id, MII_BUS_ID_SIZE, "%s-%x",
+ pdev->name, pdev->id);
+
+ err = mdiobus_register(ax->mii_bus);
+ if (err)
+ goto out_free_mdio_bitbang;
+
+ return 0;
+
+ out_free_mdio_bitbang:
+ free_mdio_bitbang(ax->mii_bus);
+ out:
+ return err;
+}
+
static int ax_open(struct net_device *dev)
{
struct ax_device *ax = to_ax_dev(dev);
@@ -396,6 +480,10 @@ static int ax_open(struct net_device *dev)
netdev_dbg(dev, "open\n");
+ ret = ax_mii_init(dev);
+ if (ret)
+ goto failed_mii;
+
ret = request_irq(dev->irq, ax_ei_interrupt, ax->irqflags,
dev->name, dev);
if (ret)
@@ -423,6 +511,10 @@ static int ax_open(struct net_device *dev)
ax_phy_switch(dev, 0);
free_irq(dev->irq, dev);
failed_request_irq:
+ /* unregister mdiobus */
+ mdiobus_unregister(ax->mii_bus);
+ free_mdio_bitbang(ax->mii_bus);
+ failed_mii:
return ret;
}
@@ -442,6 +534,9 @@ static int ax_close(struct net_device *dev)
phy_disconnect(dev->phydev);
free_irq(dev->irq, dev);
+
+ mdiobus_unregister(ax->mii_bus);
+ free_mdio_bitbang(ax->mii_bus);
return 0;
}
@@ -541,92 +636,8 @@ static void ax_eeprom_register_write(struct eeprom_93cx6 *eeprom)
#endif
};
-static void ax_bb_mdc(struct mdiobb_ctrl *ctrl, int level)
-{
- struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
-
- if (level)
- ax->reg_memr |= AX_MEMR_MDC;
- else
- ax->reg_memr &= ~AX_MEMR_MDC;
-
- ei_outb(ax->reg_memr, ax->addr_memr);
-}
-
-static void ax_bb_dir(struct mdiobb_ctrl *ctrl, int output)
-{
- struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
-
- if (output)
- ax->reg_memr &= ~AX_MEMR_MDIR;
- else
- ax->reg_memr |= AX_MEMR_MDIR;
-
- ei_outb(ax->reg_memr, ax->addr_memr);
-}
-
-static void ax_bb_set_data(struct mdiobb_ctrl *ctrl, int value)
-{
- struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
-
- if (value)
- ax->reg_memr |= AX_MEMR_MDO;
- else
- ax->reg_memr &= ~AX_MEMR_MDO;
-
- ei_outb(ax->reg_memr, ax->addr_memr);
-}
-
-static int ax_bb_get_data(struct mdiobb_ctrl *ctrl)
-{
- struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
- int reg_memr = ei_inb(ax->addr_memr);
-
- return reg_memr & AX_MEMR_MDI ? 1 : 0;
-}
-
-static const struct mdiobb_ops bb_ops = {
- .owner = THIS_MODULE,
- .set_mdc = ax_bb_mdc,
- .set_mdio_dir = ax_bb_dir,
- .set_mdio_data = ax_bb_set_data,
- .get_mdio_data = ax_bb_get_data,
-};
-
/* setup code */
-static int ax_mii_init(struct net_device *dev)
-{
- struct platform_device *pdev = to_platform_device(dev->dev.parent);
- struct ei_device *ei_local = netdev_priv(dev);
- struct ax_device *ax = to_ax_dev(dev);
- int err;
-
- ax->bb_ctrl.ops = &bb_ops;
- ax->addr_memr = ei_local->mem + AX_MEMR;
- ax->mii_bus = alloc_mdio_bitbang(&ax->bb_ctrl);
- if (!ax->mii_bus) {
- err = -ENOMEM;
- goto out;
- }
-
- ax->mii_bus->name = "ax88796_mii_bus";
- ax->mii_bus->parent = dev->dev.parent;
- snprintf(ax->mii_bus->id, MII_BUS_ID_SIZE, "%s-%x",
- pdev->name, pdev->id);
-
- err = mdiobus_register(ax->mii_bus);
- if (err)
- goto out_free_mdio_bitbang;
-
- return 0;
-
- out_free_mdio_bitbang:
- free_mdio_bitbang(ax->mii_bus);
- out:
- return err;
-}
-
static void ax_initial_setup(struct net_device *dev, struct ei_device *ei_local)
{
void __iomem *ioaddr = ei_local->mem;
@@ -758,10 +769,6 @@ static int ax_init_dev(struct net_device *dev)
dev->netdev_ops = &ax_netdev_ops;
dev->ethtool_ops = &ax_ethtool_ops;
- ret = ax_mii_init(dev);
- if (ret)
- goto err_out;
-
ax_NS8390_init(dev, 0);
ret = register_netdev(dev);
--
1.7.0.4
^ permalink raw reply related
* [PATCH v2 4/8] net: ax88796: Add block_input/output hooks to ax_plat_data
From: Michael Schmitz @ 2018-04-17 2:08 UTC (permalink / raw)
To: netdev
Cc: andrew, linux-m68k, Michael.Karcher, Michael Schmitz,
Michael Karcher
In-Reply-To: <1523916285-6057-1-git-send-email-schmitzmic@gmail.com>
Add platform specific hooks for block transfer reads/writes of packet
buffer data, superseding the default provided ax_block_input/output.
Currently used for m68k Amiga XSurf100.
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
---
drivers/net/ethernet/8390/ax88796.c | 10 ++++++++--
include/net/ax88796.h | 9 ++++++++-
2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/8390/ax88796.c b/drivers/net/ethernet/8390/ax88796.c
index ecf104c..29cde38 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -760,8 +760,14 @@ static int ax_init_dev(struct net_device *dev)
#endif
ei_local->reset_8390 = &ax_reset_8390;
- ei_local->block_input = &ax_block_input;
- ei_local->block_output = &ax_block_output;
+ if (ax->plat->block_input)
+ ei_local->block_input = ax->plat->block_input;
+ else
+ ei_local->block_input = &ax_block_input;
+ if (ax->plat->block_output)
+ ei_local->block_output = ax->plat->block_output;
+ else
+ ei_local->block_output = &ax_block_output;
ei_local->get_8390_hdr = &ax_get_8390_hdr;
ei_local->priv = 0;
ei_local->msg_enable = ax_msg_enable;
diff --git a/include/net/ax88796.h b/include/net/ax88796.h
index b9a3bec..26cc459 100644
--- a/include/net/ax88796.h
+++ b/include/net/ax88796.h
@@ -8,10 +8,11 @@
* published by the Free Software Foundation.
*
*/
-
#ifndef __NET_AX88796_PLAT_H
#define __NET_AX88796_PLAT_H
+struct net_device;
+
#define AXFLG_HAS_EEPROM (1<<0)
#define AXFLG_MAC_FROMDEV (1<<1) /* device already has MAC */
#define AXFLG_HAS_93CX6 (1<<2) /* use eeprom_93cx6 driver */
@@ -26,6 +27,12 @@ struct ax_plat_data {
u32 *reg_offsets; /* register offsets */
u8 *mac_addr; /* MAC addr (only used when
AXFLG_MAC_FROMPLATFORM is used */
+
+ /* uses default ax88796 buffer if set to NULL */
+ void (*block_output)(struct net_device *dev, int count,
+ const unsigned char *buf, int star_page);
+ void (*block_input)(struct net_device *dev, int count,
+ struct sk_buff *skb, int ring_offset);
};
#endif /* __NET_AX88796_PLAT_H */
--
1.7.0.4
^ permalink raw reply related
* [PATCH v2 1/8] net: ax88796: Fix MAC address reading
From: Michael Schmitz @ 2018-04-17 2:08 UTC (permalink / raw)
To: netdev
Cc: andrew, linux-m68k, Michael.Karcher, Michael Karcher,
Michael Karcher, Michael Schmitz
In-Reply-To: <1523916285-6057-1-git-send-email-schmitzmic@gmail.com>
From: Michael Karcher <debian@mkarcher.dialup.fu-berlin.de>
To read the MAC address from the (virtual) SAprom, the remote DMA
unit needs to be set up like for every other process access to card-local
memory.
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
---
drivers/net/ethernet/8390/ax88796.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/8390/ax88796.c b/drivers/net/ethernet/8390/ax88796.c
index 2455547..2a256aa 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -671,10 +671,16 @@ static int ax_init_dev(struct net_device *dev)
if (ax->plat->flags & AXFLG_HAS_EEPROM) {
unsigned char SA_prom[32];
+ ei_outb(6, ioaddr + EN0_RCNTLO);
+ ei_outb(0, ioaddr + EN0_RCNTHI);
+ ei_outb(0, ioaddr + EN0_RSARLO);
+ ei_outb(0, ioaddr + EN0_RSARHI);
+ ei_outb(E8390_RREAD + E8390_START, ioaddr + NE_CMD);
for (i = 0; i < sizeof(SA_prom); i += 2) {
SA_prom[i] = ei_inb(ioaddr + NE_DATAPORT);
SA_prom[i + 1] = ei_inb(ioaddr + NE_DATAPORT);
}
+ ei_outb(ENISR_RDC, ioaddr + EN0_ISR); /* Ack intr. */
if (ax->plat->wordlength == 2)
for (i = 0; i < 16; i++)
--
1.7.0.4
^ permalink raw reply related
* [PATCH v2 5/8] net: ax88796: add interrupt status callback to platform data
From: Michael Schmitz @ 2018-04-17 2:08 UTC (permalink / raw)
To: netdev
Cc: andrew, linux-m68k, Michael.Karcher, Michael Schmitz,
Michael Karcher
In-Reply-To: <1523916285-6057-1-git-send-email-schmitzmic@gmail.com>
To be able to tell the ax88796 driver whether it is sensible to enter
the 8390 interrupt handler, an "is this interrupt caused by the 88796"
callback has been added to the ax_plat_data structure (with NULL being
compatible to the previous behaviour).
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
---
drivers/net/ethernet/8390/ax88796.c | 23 +++++++++++++++++++++--
include/net/ax88796.h | 5 +++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/8390/ax88796.c b/drivers/net/ethernet/8390/ax88796.c
index 29cde38..c799441 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -165,6 +165,21 @@ static void ax_reset_8390(struct net_device *dev)
ei_outb(ENISR_RESET, addr + EN0_ISR); /* Ack intr. */
}
+/* Wrapper for __ei_interrupt for platforms that have a platform-specific
+ * way to find out whether the interrupt request might be caused by
+ * the ax88796 chip.
+ */
+static irqreturn_t ax_ei_interrupt_filtered(int irq, void *dev_id)
+{
+ struct net_device *dev = dev_id;
+ struct ax_device *ax = to_ax_dev(dev);
+ struct platform_device *pdev = to_platform_device(dev->dev.parent);
+
+ if (!ax->plat->check_irq(pdev))
+ return IRQ_NONE;
+
+ return ax_ei_interrupt(irq, dev_id);
+}
static void ax_get_8390_hdr(struct net_device *dev, struct e8390_pkt_hdr *hdr,
int ring_page)
@@ -484,8 +499,12 @@ static int ax_open(struct net_device *dev)
if (ret)
goto failed_mii;
- ret = request_irq(dev->irq, ax_ei_interrupt, ax->irqflags,
- dev->name, dev);
+ if (ax->plat->check_irq)
+ ret = request_irq(dev->irq, ax_ei_interrupt_filtered,
+ ax->irqflags, dev->name, dev);
+ else
+ ret = request_irq(dev->irq, ax_ei_interrupt, ax->irqflags,
+ dev->name, dev);
if (ret)
goto failed_request_irq;
diff --git a/include/net/ax88796.h b/include/net/ax88796.h
index 26cc459..26412cd 100644
--- a/include/net/ax88796.h
+++ b/include/net/ax88796.h
@@ -12,6 +12,7 @@
#define __NET_AX88796_PLAT_H
struct net_device;
+struct platform_device;
#define AXFLG_HAS_EEPROM (1<<0)
#define AXFLG_MAC_FROMDEV (1<<1) /* device already has MAC */
@@ -33,6 +34,10 @@ struct ax_plat_data {
const unsigned char *buf, int star_page);
void (*block_input)(struct net_device *dev, int count,
struct sk_buff *skb, int ring_offset);
+ /* returns nonzero if a pending interrupt request might by caused by
+ * the ax88786. Handles all interrupts if set to NULL
+ */
+ int (*check_irq)(struct platform_device *pdev);
};
#endif /* __NET_AX88796_PLAT_H */
--
1.7.0.4
^ permalink raw reply related
* [PATCH v2 7/8] net: ax88796: release platform device drvdata on probe error and module remove
From: Michael Schmitz @ 2018-04-17 2:08 UTC (permalink / raw)
To: netdev; +Cc: andrew, linux-m68k, Michael.Karcher, Michael Schmitz
In-Reply-To: <1523916285-6057-1-git-send-email-schmitzmic@gmail.com>
The net device struct pointer is stored as platform device drvdata on
module probe - clear the drvdata entry on probe fail there, as well as
when unloading the module.
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
---
drivers/net/ethernet/8390/ax88796.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/8390/ax88796.c b/drivers/net/ethernet/8390/ax88796.c
index a72dfbc..eb72282 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -829,6 +829,7 @@ static int ax_remove(struct platform_device *pdev)
release_mem_region(mem->start, resource_size(mem));
}
+ platform_set_drvdata(pdev, NULL);
free_netdev(dev);
return 0;
@@ -962,6 +963,7 @@ static int ax_probe(struct platform_device *pdev)
release_mem_region(mem->start, mem_size);
exit_mem:
+ platform_set_drvdata(pdev, NULL);
free_netdev(dev);
return ret;
--
1.7.0.4
^ permalink raw reply related
* [PATCH v2 6/8] net: ax88796: set IRQF_SHARED flag when IRQ resource is marked as shareable
From: Michael Schmitz @ 2018-04-17 2:08 UTC (permalink / raw)
To: netdev
Cc: andrew, linux-m68k, Michael.Karcher, John Paul Adrian Glaubitz,
Michael Karcher, Michael Schmitz
In-Reply-To: <1523916285-6057-1-git-send-email-schmitzmic@gmail.com>
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
On the Amiga X-Surf100, the network card interrupt is shared with many
other interrupt sources, so requires the IRQF_SHARED flag to register.
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
---
drivers/net/ethernet/8390/ax88796.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/8390/ax88796.c b/drivers/net/ethernet/8390/ax88796.c
index c799441..a72dfbc 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -875,6 +875,9 @@ static int ax_probe(struct platform_device *pdev)
dev->irq = irq->start;
ax->irqflags = irq->flags & IRQF_TRIGGER_MASK;
+ if (irq->flags & IORESOURCE_IRQ_SHAREABLE)
+ ax->irqflags |= IRQF_SHARED;
+
mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!mem) {
dev_err(&pdev->dev, "no MEM specified\n");
--
1.7.0.4
^ permalink raw reply related
* [PATCH v2 8/8] net: New ax88796 platform driver for Amiga X-Surf 100 Zorro board (m68k)
From: Michael Schmitz @ 2018-04-17 2:08 UTC (permalink / raw)
To: netdev
Cc: andrew, linux-m68k, Michael.Karcher, Michael Schmitz,
Michael Karcher
In-Reply-To: <1523916285-6057-1-git-send-email-schmitzmic@gmail.com>
Add platform device driver to populate the ax88796 platform data from
information provided by the XSurf100 zorro device driver.
This driver will have to be loaded before loading the ax88796 module,
or compiled as built-in.
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
---
drivers/net/ethernet/8390/Kconfig | 14 +-
drivers/net/ethernet/8390/Makefile | 1 +
drivers/net/ethernet/8390/xsurf100.c | 411 ++++++++++++++++++++++++++++++++++
3 files changed, 425 insertions(+), 1 deletions(-)
create mode 100644 drivers/net/ethernet/8390/xsurf100.c
diff --git a/drivers/net/ethernet/8390/Kconfig b/drivers/net/ethernet/8390/Kconfig
index fdc6734..0cadd45 100644
--- a/drivers/net/ethernet/8390/Kconfig
+++ b/drivers/net/ethernet/8390/Kconfig
@@ -30,7 +30,7 @@ config PCMCIA_AXNET
config AX88796
tristate "ASIX AX88796 NE2000 clone support"
- depends on (ARM || MIPS || SUPERH)
+ depends on (ARM || MIPS || SUPERH || AMIGA)
select CRC32
select PHYLIB
select MDIO_BITBANG
@@ -45,6 +45,18 @@ config AX88796_93CX6
---help---
Select this if your platform comes with an external 93CX6 eeprom.
+config XSURF100
+ tristate "Amiga XSurf 100 AX88796/NE2000 clone support"
+ depends on ZORRO
+ depends on AX88796
+ ---help---
+ This driver is for the Individual Computers X-Surf 100 Ethernet
+ card (based on the Asix AX88796 chip). If you have such a card,
+ say Y. Otherwise, say N.
+
+ To compile this driver as a module, choose M here: the module
+ will be called xsurf100.
+
config HYDRA
tristate "Hydra support"
depends on ZORRO
diff --git a/drivers/net/ethernet/8390/Makefile b/drivers/net/ethernet/8390/Makefile
index f975c2f..3715f8d 100644
--- a/drivers/net/ethernet/8390/Makefile
+++ b/drivers/net/ethernet/8390/Makefile
@@ -16,4 +16,5 @@ obj-$(CONFIG_PCMCIA_PCNET) += pcnet_cs.o 8390.o
obj-$(CONFIG_STNIC) += stnic.o 8390.o
obj-$(CONFIG_ULTRA) += smc-ultra.o 8390.o
obj-$(CONFIG_WD80x3) += wd.o 8390.o
+obj-$(CONFIG_XSURF100) += xsurf100.o
obj-$(CONFIG_ZORRO8390) += zorro8390.o 8390.o
diff --git a/drivers/net/ethernet/8390/xsurf100.c b/drivers/net/ethernet/8390/xsurf100.c
new file mode 100644
index 0000000..3caece0
--- /dev/null
+++ b/drivers/net/ethernet/8390/xsurf100.c
@@ -0,0 +1,411 @@
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+#include <linux/zorro.h>
+#include <net/ax88796.h>
+#include <asm/amigaints.h>
+
+#define ZORRO_PROD_INDIVIDUAL_COMPUTERS_X_SURF100 \
+ ZORRO_ID(INDIVIDUAL_COMPUTERS, 0x64, 0)
+
+#define XS100_IRQSTATUS_BASE 0x40
+#define XS100_8390_BASE 0x800
+
+/* Longword-access area. Translated to 2 16-bit access cycles by the
+ * X-Surf 100 FPGA
+ */
+#define XS100_8390_DATA32_BASE 0x8000
+#define XS100_8390_DATA32_SIZE 0x2000
+/* Sub-Areas for fast data register access; addresses relative to area begin */
+#define XS100_8390_DATA_READ32_BASE 0x0880
+#define XS100_8390_DATA_WRITE32_BASE 0x0C80
+#define XS100_8390_DATA_AREA_SIZE 0x80
+
+#define __NS8390_init ax_NS8390_init
+
+/* force unsigned long back to 'void __iomem *' */
+#define ax_convert_addr(_a) ((void __force __iomem *)(_a))
+
+#define ei_inb(_a) z_readb(ax_convert_addr(_a))
+#define ei_outb(_v, _a) z_writeb(_v, ax_convert_addr(_a))
+
+#define ei_inw(_a) z_readw(ax_convert_addr(_a))
+#define ei_outw(_v, _a) z_writew(_v, ax_convert_addr(_a))
+
+#define ei_inb_p(_a) ei_inb(_a)
+#define ei_outb_p(_v, _a) ei_outb(_v, _a)
+
+/* define EI_SHIFT() to take into account our register offsets */
+#define EI_SHIFT(x) (ei_local->reg_offset[(x)])
+
+/* Ensure we have our RCR base value */
+#define AX88796_PLATFORM
+
+static unsigned char version[] =
+ "ax88796.c: Copyright 2005,2007 Simtec Electronics\n";
+
+#include "lib8390.c"
+
+/* from ne.c */
+#define NE_CMD EI_SHIFT(0x00)
+#define NE_RESET EI_SHIFT(0x1f)
+#define NE_DATAPORT EI_SHIFT(0x10)
+
+/* Hard reset the card. This used to pause for the same period that a
+ * 8390 reset command required, but that shouldn't be necessary.
+ */
+static void ax_reset_8390(struct net_device *dev)
+{
+ struct ei_device *ei_local = netdev_priv(dev);
+ unsigned long reset_start_time = jiffies;
+ void __iomem *addr = (void __iomem *)dev->base_addr;
+
+ netif_dbg(ei_local, hw, dev, "resetting the 8390 t=%ld...\n", jiffies);
+
+ ei_outb(ei_inb(addr + NE_RESET), addr + NE_RESET);
+
+ ei_local->txing = 0;
+ ei_local->dmaing = 0;
+
+ /* This check _should_not_ be necessary, omit eventually. */
+ while ((ei_inb(addr + EN0_ISR) & ENISR_RESET) == 0) {
+ if (time_after(jiffies, reset_start_time + 2 * HZ / 100)) {
+ netdev_warn(dev, "%s: did not complete.\n", __func__);
+ break;
+ }
+ }
+
+ ei_outb(ENISR_RESET, addr + EN0_ISR); /* Ack intr. */
+}
+
+struct xsurf100_ax_plat_data {
+ struct ax_plat_data ax;
+ void __iomem *base_regs;
+ void __iomem *data_area;
+};
+
+static int is_xsurf100_network_irq(struct platform_device *pdev)
+{
+ struct xsurf100_ax_plat_data *xs100 = dev_get_platdata(&pdev->dev);
+
+ return (readw(xs100->base_regs + XS100_IRQSTATUS_BASE) & 0xaaaa) != 0;
+}
+
+/* These functions guarantee that the iomem is accessed with 32 bit
+ * cycles only. z_memcpy_fromio / z_memcpy_toio don't
+ */
+static void z_memcpy_fromio32(void *dst, const void __iomem *src, size_t bytes)
+{
+ while (bytes > 32) {
+ asm __volatile__
+ ("movem.l (%0)+,%%d0-%%d7\n"
+ "movem.l %%d0-%%d7,(%1)\n"
+ "adda.l #32,%1" : "=a"(src), "=a"(dst)
+ : "0"(src), "1"(dst) : "d0", "d1", "d2", "d3", "d4",
+ "d5", "d6", "d7", "memory");
+ bytes -= 32;
+ }
+ while (bytes) {
+ *(uint32_t *)dst = z_readl(src);
+ src += 4;
+ dst += 4;
+ bytes -= 4;
+ }
+}
+
+static void z_memcpy_toio32(void __iomem *dst, const void *src, size_t bytes)
+{
+ while (bytes) {
+ z_writel(*(const uint32_t *)src, dst);
+ src += 4;
+ dst += 4;
+ bytes -= 4;
+ }
+}
+
+static void xs100_write(struct net_device *dev, const void *src,
+ unsigned int count)
+{
+ struct ei_device *ei_local = netdev_priv(dev);
+ struct platform_device *pdev = to_platform_device(dev->dev.parent);
+ struct xsurf100_ax_plat_data *xs100 = dev_get_platdata(&pdev->dev);
+
+ /* copy whole blocks */
+ while (count > XS100_8390_DATA_AREA_SIZE) {
+ z_memcpy_toio32(xs100->data_area +
+ XS100_8390_DATA_WRITE32_BASE, src,
+ XS100_8390_DATA_AREA_SIZE);
+ src += XS100_8390_DATA_AREA_SIZE;
+ count -= XS100_8390_DATA_AREA_SIZE;
+ }
+ /* copy whole dwords */
+ z_memcpy_toio32(xs100->data_area + XS100_8390_DATA_WRITE32_BASE,
+ src, count & ~3);
+ src += count & ~3;
+ if (count & 2) {
+ ei_outw(*(uint16_t *)src, ei_local->mem + NE_DATAPORT);
+ src += 2;
+ }
+ if (count & 1)
+ ei_outb(*(uint8_t *)src, ei_local->mem + NE_DATAPORT);
+}
+
+static void xs100_read(struct net_device *dev, void *dst, unsigned int count)
+{
+ struct ei_device *ei_local = netdev_priv(dev);
+ struct platform_device *pdev = to_platform_device(dev->dev.parent);
+ struct xsurf100_ax_plat_data *xs100 = dev_get_platdata(&pdev->dev);
+
+ /* copy whole blocks */
+ while (count > XS100_8390_DATA_AREA_SIZE) {
+ z_memcpy_fromio32(dst, xs100->data_area +
+ XS100_8390_DATA_READ32_BASE,
+ XS100_8390_DATA_AREA_SIZE);
+ dst += XS100_8390_DATA_AREA_SIZE;
+ count -= XS100_8390_DATA_AREA_SIZE;
+ }
+ /* copy whole dwords */
+ z_memcpy_fromio32(dst, xs100->data_area + XS100_8390_DATA_READ32_BASE,
+ count & ~3);
+ dst += count & ~3;
+ if (count & 2) {
+ *(uint16_t *)dst = ei_inw(ei_local->mem + NE_DATAPORT);
+ dst += 2;
+ }
+ if (count & 1)
+ *(uint8_t *)dst = ei_inb(ei_local->mem + NE_DATAPORT);
+}
+
+/* Block input and output, similar to the Crynwr packet driver. If
+ * you are porting to a new ethercard, look at the packet driver
+ * source for hints. The NEx000 doesn't share the on-board packet
+ * memory -- you have to put the packet out through the "remote DMA"
+ * dataport using ei_outb.
+ */
+static void xs100_block_input(struct net_device *dev, int count,
+ struct sk_buff *skb, int ring_offset)
+{
+ struct ei_device *ei_local = netdev_priv(dev);
+ void __iomem *nic_base = ei_local->mem;
+ char *buf = skb->data;
+
+ if (ei_local->dmaing) {
+ netdev_err(dev,
+ "DMAing conflict in %s "
+ "[DMAstat:%d][irqlock:%d].\n",
+ __func__,
+ ei_local->dmaing, ei_local->irqlock);
+ return;
+ }
+
+ ei_local->dmaing |= 0x01;
+
+ ei_outb(E8390_NODMA + E8390_PAGE0 + E8390_START, nic_base + NE_CMD);
+ ei_outb(count & 0xff, nic_base + EN0_RCNTLO);
+ ei_outb(count >> 8, nic_base + EN0_RCNTHI);
+ ei_outb(ring_offset & 0xff, nic_base + EN0_RSARLO);
+ ei_outb(ring_offset >> 8, nic_base + EN0_RSARHI);
+ ei_outb(E8390_RREAD + E8390_START, nic_base + NE_CMD);
+
+ xs100_read(dev, buf, count);
+
+ ei_local->dmaing &= ~1;
+}
+
+static void xs100_block_output(struct net_device *dev, int count,
+ const unsigned char *buf, const int start_page)
+{
+ struct ei_device *ei_local = netdev_priv(dev);
+ void __iomem *nic_base = ei_local->mem;
+ unsigned long dma_start;
+
+ /* Round the count up for word writes. Do we need to do this?
+ * What effect will an odd byte count have on the 8390? I
+ * should check someday.
+ */
+ if (ei_local->word16 && (count & 0x01))
+ count++;
+
+ /* This *shouldn't* happen. If it does, it's the last thing
+ * you'll see
+ */
+ if (ei_local->dmaing) {
+ netdev_err(dev, "DMAing conflict in %s "
+ "[DMAstat:%d][irqlock:%d]\n",
+ __func__,
+ ei_local->dmaing, ei_local->irqlock);
+ return;
+ }
+
+ ei_local->dmaing |= 0x01;
+ /* We should already be in page 0, but to be safe... */
+ ei_outb(E8390_PAGE0 + E8390_START + E8390_NODMA, nic_base + NE_CMD);
+
+ ei_outb(ENISR_RDC, nic_base + EN0_ISR);
+
+ /* Now the normal output. */
+ ei_outb(count & 0xff, nic_base + EN0_RCNTLO);
+ ei_outb(count >> 8, nic_base + EN0_RCNTHI);
+ ei_outb(0x00, nic_base + EN0_RSARLO);
+ ei_outb(start_page, nic_base + EN0_RSARHI);
+
+ ei_outb(E8390_RWRITE + E8390_START, nic_base + NE_CMD);
+
+ xs100_write(dev, buf, count);
+
+ dma_start = jiffies;
+
+ while ((ei_inb(nic_base + EN0_ISR) & ENISR_RDC) == 0) {
+ if (jiffies - dma_start > 2 * HZ / 100) { /* 20ms */
+ netdev_warn(dev, "timeout waiting for Tx RDC.\n");
+ ax_reset_8390(dev);
+ ax_NS8390_init(dev, 1);
+ break;
+ }
+ }
+
+ ei_outb(ENISR_RDC, nic_base + EN0_ISR); /* Ack intr. */
+ ei_local->dmaing &= ~0x01;
+}
+
+static int xsurf100_probe(struct zorro_dev *zdev,
+ const struct zorro_device_id *ent)
+{
+ struct platform_device *pdev;
+ struct xsurf100_ax_plat_data ax88796_data;
+ struct resource res[2] = {
+ DEFINE_RES_NAMED(IRQ_AMIGA_PORTS, 1, NULL,
+ IORESOURCE_IRQ | IORESOURCE_IRQ_SHAREABLE),
+ DEFINE_RES_MEM(zdev->resource.start + XS100_8390_BASE,
+ 4 * 0x20)
+ };
+ int reg;
+ /* This table is referenced in the device structure, so it must
+ * outlive the scope of xsurf100_probe.
+ */
+ static u32 reg_offsets[32];
+ int ret = 0;
+
+ /* X-Surf 100 control and 32 bit ring buffer data access areas.
+ * These resources are not used by the ax88796 driver, so must
+ * be requested here and passed via platform data.
+ */
+
+ if (!request_mem_region(zdev->resource.start, 0x100, zdev->name)) {
+ dev_err(&zdev->dev, "cannot reserve X-Surf 100 control registers\n");
+ return -ENXIO;
+ }
+
+ if (!request_mem_region(zdev->resource.start +
+ XS100_8390_DATA32_BASE,
+ XS100_8390_DATA32_SIZE,
+ "X-Surf 100 32-bit data access")) {
+ dev_err(&zdev->dev, "cannot reserve 32-bit area\n");
+ ret = -ENXIO;
+ goto exit_req;
+ }
+
+ for (reg = 0; reg < 0x20; reg++)
+ reg_offsets[reg] = 4 * reg;
+
+ memset(&ax88796_data, 0, sizeof(ax88796_data));
+ ax88796_data.ax.flags = AXFLG_HAS_EEPROM;
+ ax88796_data.ax.wordlength = 2;
+ ax88796_data.ax.dcr_val = 0x48;
+ ax88796_data.ax.rcr_val = 0x40;
+ ax88796_data.ax.reg_offsets = reg_offsets;
+ ax88796_data.ax.check_irq = is_xsurf100_network_irq;
+ ax88796_data.base_regs = ioremap(zdev->resource.start, 0x100);
+
+ /* error handling for ioremap regs */
+ if (!ax88796_data.base_regs) {
+ dev_err(&zdev->dev, "Cannot ioremap area %p (registers)\n",
+ (void *)zdev->resource.start);
+
+ ret = -ENXIO;
+ goto exit_req2;
+ }
+
+ ax88796_data.data_area = ioremap(zdev->resource.start +
+ XS100_8390_DATA32_BASE, XS100_8390_DATA32_SIZE);
+
+ /* error handling for ioremap data */
+ if (!ax88796_data.data_area) {
+ dev_err(&zdev->dev, "Cannot ioremap area %p (32-bit access)\n",
+ (void *)zdev->resource.start + XS100_8390_DATA32_BASE);
+
+ ret = -ENXIO;
+ goto exit_mem;
+ }
+
+ ax88796_data.ax.block_output = xs100_block_output;
+ ax88796_data.ax.block_input = xs100_block_input;
+
+ pdev = platform_device_register_resndata(&zdev->dev, "ax88796",
+ zdev->slotaddr, res, 2,
+ &ax88796_data,
+ sizeof(ax88796_data));
+
+ if (IS_ERR(pdev)) {
+ dev_err(&zdev->dev, "cannot register platform device\n");
+ ret = -ENXIO;
+ goto exit_mem2;
+ }
+
+ zorro_set_drvdata(zdev, pdev);
+
+ if (!ret)
+ return 0;
+
+ exit_mem2:
+ iounmap(ax88796_data.data_area);
+
+ exit_mem:
+ iounmap(ax88796_data.base_regs);
+
+ exit_req2:
+ release_mem_region(zdev->resource.start + XS100_8390_DATA32_BASE,
+ XS100_8390_DATA32_SIZE);
+
+ exit_req:
+ release_mem_region(zdev->resource.start, 0x100);
+
+ return ret;
+}
+
+static void xsurf100_remove(struct zorro_dev *zdev)
+{
+ struct platform_device *pdev;
+ struct xsurf100_ax_plat_data *xs100;
+
+ pdev = zorro_get_drvdata(zdev);
+ xs100 = dev_get_platdata(&pdev->dev);
+
+ platform_device_unregister(pdev);
+
+ iounmap(xs100->base_regs);
+ release_mem_region(zdev->resource.start, 0x100);
+ iounmap(xs100->data_area);
+ release_mem_region(zdev->resource.start + XS100_8390_DATA32_BASE,
+ XS100_8390_DATA32_SIZE);
+}
+
+static const struct zorro_device_id xsurf100_zorro_tbl[] = {
+ { ZORRO_PROD_INDIVIDUAL_COMPUTERS_X_SURF100, },
+ { 0 }
+};
+
+MODULE_DEVICE_TABLE(zorro, xsurf100_zorro_tbl);
+
+static struct zorro_driver xsurf100_driver = {
+ .name = "xsurf100",
+ .id_table = xsurf100_zorro_tbl,
+ .probe = xsurf100_probe,
+ .remove = xsurf100_remove,
+};
+
+module_driver(xsurf100_driver, zorro_register_driver, zorro_unregister_driver);
+
+MODULE_DESCRIPTION("X-Surf 100 driver");
+MODULE_AUTHOR("Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>");
+MODULE_LICENSE("GPL v2");
--
1.7.0.4
^ permalink raw reply related
* Re: SRIOV switchdev mode BoF minutes
From: Samudrala, Sridhar @ 2018-04-17 2:08 UTC (permalink / raw)
To: Andy Gospodarek, Or Gerlitz
Cc: David Miller, Anjali Singhai Jain, Michael Chan, Simon Horman,
Jakub Kicinski, John Fastabend, Saeed Mahameed, Jiri Pirko,
Rony Efraim, Linux Netdev List
In-Reply-To: <20180416123936.GH33938@C02RW35GFVH8.dhcp.broadcom.net>
On 4/16/2018 5:39 AM, Andy Gospodarek wrote:
> On Sun, Apr 15, 2018 at 09:01:16AM +0300, Or Gerlitz wrote:
>> On Sat, Apr 14, 2018 at 2:03 AM, Samudrala, Sridhar
>> <sridhar.samudrala@intel.com> wrote:
>>
>>> I meant between PFs on 2 compute nodes.
>> If the PF serves as uplink rep, it functions as a switch port -- applications
>> don't run on switch ports. One way to get apps to run on the host in switchdev
>> mode is probe one of the VFs there.
>>
>>
>>
So once a pci device is configured in 'switchdev' mode, only port representor netdevs are
seen on the host, no more PF netdev.
Are you going to expose another way to change sriov_num_vfs when the device is in
'switchdev' mode OR do we need to switch to 'legacy' mode to increase/decrease the number of
VFs?
Even in switchdev mode, i guess it will be possible for host apps to use the IP configured
on the uplink rep to talk externally.
In case of multiple uplinks, are you exposing one uplink-rep netdev per uplink?
^ permalink raw reply
* Re: [RFC v2] virtio: support packed ring
From: Jason Wang @ 2018-04-17 2:11 UTC (permalink / raw)
To: Tiwei Bie; +Cc: mst, wexu, virtualization, linux-kernel, netdev, jfreimann
In-Reply-To: <20180413071529.f4esh654dakodf4f@debian>
On 2018年04月13日 15:15, Tiwei Bie wrote:
> On Fri, Apr 13, 2018 at 12:30:24PM +0800, Jason Wang wrote:
>> On 2018年04月01日 22:12, Tiwei Bie wrote:
>>> Hello everyone,
>>>
>>> This RFC implements packed ring support for virtio driver.
>>>
>>> The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
>>> by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
>>> Minor changes are needed for the vhost code, e.g. to kick the guest.
>>>
>>> TODO:
>>> - Refinements and bug fixes;
>>> - Split into small patches;
>>> - Test indirect descriptor support;
>>> - Test/fix event suppression support;
>>> - Test devices other than net;
>>>
>>> RFC v1 -> RFC v2:
>>> - Add indirect descriptor support - compile test only;
>>> - Add event suppression supprt - compile test only;
>>> - Move vring_packed_init() out of uapi (Jason, MST);
>>> - Merge two loops into one in virtqueue_add_packed() (Jason);
>>> - Split vring_unmap_one() for packed ring and split ring (Jason);
>>> - Avoid using '%' operator (Jason);
>>> - Rename free_head -> next_avail_idx (Jason);
>>> - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
>>> - Some other refinements and bug fixes;
>>>
>>> Thanks!
>>>
>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>> ---
>>> drivers/virtio/virtio_ring.c | 1094 +++++++++++++++++++++++++++++-------
>>> include/linux/virtio_ring.h | 8 +-
>>> include/uapi/linux/virtio_config.h | 12 +-
>>> include/uapi/linux/virtio_ring.h | 61 ++
>>> 4 files changed, 980 insertions(+), 195 deletions(-)
> [...]
>>> +static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
>>> + unsigned int total_sg,
>>> + gfp_t gfp)
>>> +{
>>> + struct vring_packed_desc *desc;
>>> +
>>> + /*
>>> + * We require lowmem mappings for the descriptors because
>>> + * otherwise virt_to_phys will give us bogus addresses in the
>>> + * virtqueue.
>>> + */
>>> + gfp &= ~__GFP_HIGHMEM;
>>> +
>>> + desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
>> Can we simply check vq->packed here to avoid duplicating helpers?
> Then it would be something like this:
>
> static void *alloc_indirect(struct virtqueue *_vq, unsigned int total_sg,
> gfp_t gfp)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> void *data;
>
> /*
> * We require lowmem mappings for the descriptors because
> * otherwise virt_to_phys will give us bogus addresses in the
> * virtqueue.
> */
> gfp &= ~__GFP_HIGHMEM;
>
> if (vq->packed) {
> data = kmalloc(total_sg * sizeof(struct vring_packed_desc),
> gfp);
> if (!data)
> return NULL;
> } else {
> struct vring_desc *desc;
> unsigned int i;
>
> desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
> if (!desc)
> return NULL;
>
> for (i = 0; i < total_sg; i++)
> desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
>
> data = desc;
> }
>
> return data;
> }
>
> I would prefer to have two simpler helpers (and to the callers,
> it's already very clear about which one they should call), i.e.
> the current implementation:
>
> static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> unsigned int total_sg,
> gfp_t gfp)
> {
> struct vring_packed_desc *desc;
>
> /*
> * We require lowmem mappings for the descriptors because
> * otherwise virt_to_phys will give us bogus addresses in the
> * virtqueue.
> */
> gfp &= ~__GFP_HIGHMEM;
>
> desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
>
> return desc;
> }
>
> static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> unsigned int total_sg,
> gfp_t gfp)
> {
> struct vring_desc *desc;
> unsigned int i;
>
> /*
> * We require lowmem mappings for the descriptors because
> * otherwise virt_to_phys will give us bogus addresses in the
> * virtqueue.
> */
> gfp &= ~__GFP_HIGHMEM;
>
> desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
> if (!desc)
> return NULL;
>
> for (i = 0; i < total_sg; i++)
> desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> return desc;
> }
Yeah, I miss that split version needs a desc list.
>
>>> +
>>> + return desc;
>>> +}
> [...]
>>> +static inline int virtqueue_add_packed(struct virtqueue *_vq,
>>> + struct scatterlist *sgs[],
>>> + unsigned int total_sg,
>>> + unsigned int out_sgs,
>>> + unsigned int in_sgs,
>>> + void *data,
>>> + void *ctx,
>>> + gfp_t gfp)
>>> +{
>>> + struct vring_virtqueue *vq = to_vvq(_vq);
>>> + struct vring_packed_desc *desc;
>>> + struct scatterlist *sg;
>>> + unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
>>> + __virtio16 uninitialized_var(head_flags), flags;
>>> + int head, wrap_counter;
>>> + bool indirect;
>>> +
>>> + START_USE(vq);
>>> +
>>> + BUG_ON(data == NULL);
>>> + BUG_ON(ctx && vq->indirect);
>>> +
>>> + if (unlikely(vq->broken)) {
>>> + END_USE(vq);
>>> + return -EIO;
>>> + }
>>> +
>>> +#ifdef DEBUG
>>> + {
>>> + ktime_t now = ktime_get();
>>> +
>>> + /* No kick or get, with .1 second between? Warn. */
>>> + if (vq->last_add_time_valid)
>>> + WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
>>> + > 100);
>>> + vq->last_add_time = now;
>>> + vq->last_add_time_valid = true;
>>> + }
>>> +#endif
>>> +
>>> + BUG_ON(total_sg == 0);
>>> +
>>> + head = vq->next_avail_idx;
>>> + wrap_counter = vq->wrap_counter;
>>> +
>>> + /* If the host supports indirect descriptor tables, and we have multiple
>>> + * buffers, then go indirect. FIXME: tune this threshold */
>>> + if (vq->indirect && total_sg > 1 && vq->vq.num_free)
>> Let's introduce a helper like virtqueue_need_indirect() to avoid duplicating
>> codes and FIXME.
> Okay.
>
>>> + desc = alloc_indirect_packed(_vq, total_sg, gfp);
>>> + else {
>>> + desc = NULL;
>>> + WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
>>> + }
>>> +
>>> + if (desc) {
>>> + /* Use a single buffer which doesn't continue */
>>> + indirect = true;
>>> + /* Set up rest to use this indirect table. */
>>> + i = 0;
>>> + descs_used = 1;
>>> + } else {
>>> + indirect = false;
>>> + desc = vq->vring_packed.desc;
>>> + i = head;
>>> + descs_used = total_sg;
>>> + }
>>> +
>>> + if (vq->vq.num_free < descs_used) {
>>> + pr_debug("Can't add buf len %i - avail = %i\n",
>>> + descs_used, vq->vq.num_free);
>>> + /* FIXME: for historical reasons, we force a notify here if
>>> + * there are outgoing parts to the buffer. Presumably the
>>> + * host should service the ring ASAP. */
>>> + if (out_sgs)
>>> + vq->notify(&vq->vq);
>>> + if (indirect)
>>> + kfree(desc);
>>> + END_USE(vq);
>>> + return -ENOSPC;
>>> + }
>>> +
>>> + for (n = 0; n < out_sgs + in_sgs; n++) {
>>> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>>> + dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
>>> + DMA_TO_DEVICE : DMA_FROM_DEVICE);
>>> + if (vring_mapping_error(vq, addr))
>>> + goto unmap_release;
>>> +
>>> + flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
>>> + (n < out_sgs ? 0 : VRING_DESC_F_WRITE) |
>>> + VRING_DESC_F_AVAIL(vq->wrap_counter) |
>>> + VRING_DESC_F_USED(!vq->wrap_counter));
>>> + if (!indirect && i == head)
>>> + head_flags = flags;
>>> + else
>>> + desc[i].flags = flags;
>>> +
>>> + desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
>>> + desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
>>> + desc[i].id = cpu_to_virtio32(_vq->vdev, head);
>> Similar to V1, we only need this for the last descriptor.
> Okay, will just set it for the last desc.
>
>>> + prev = i;
>> It looks to me there's no need to track prev inside the loop here.
>>
>>> + i++;
>>> + if (!indirect && i >= vq->vring_packed.num) {
>>> + i = 0;
>>> + vq->wrap_counter ^= 1;
>>> + }
>>> + }
>>> + }
>>> + /* Last one doesn't continue. */
>>> + if (total_sg == 1)
>>> + head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>>> + else
>>> + desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>> The only case when prev != i - 1 is i == 0, we can add a if here.
> It's just a mirror of the existing implementation in split ring.
> It seems that split ring implementation needs this just because
> it's much harder for it to find the prev, which is not true for
> packed ring. So I'll take your suggestion. Thanks!
>
> [...]
>>> +static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
>>> +{
>>> + struct vring_virtqueue *vq = to_vvq(_vq);
>>> + u16 new, old, off_wrap;
>>> + bool needs_kick;
>>> +
>>> + START_USE(vq);
>>> + /* We need to expose the new flags value before checking notification
>>> + * suppressions. */
>>> + virtio_mb(vq->weak_barriers);
>>> +
>>> + old = vq->next_avail_idx - vq->num_added;
>>> + new = vq->next_avail_idx;
>>> + vq->num_added = 0;
>>> +
>>> +#ifdef DEBUG
>>> + if (vq->last_add_time_valid) {
>>> + WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
>>> + vq->last_add_time)) > 100);
>>> + }
>>> + vq->last_add_time_valid = false;
>>> +#endif
>>> +
>>> + off_wrap = virtio16_to_cpu(_vq->vdev, vq->vring_packed.device->off_wrap);
>>> +
>>> + if (vq->event) {
>> It looks to me we should examine RING_EVENT_FLAGS_DESC in desc_event_flags
>> instead of vq->event here. Spec does not forces to use evenf_off and
>> event_wrap if event index is enabled.
>>
>>> + // FIXME: fix this!
>>> + needs_kick = ((off_wrap >> 15) == vq->wrap_counter) &&
>>> + vring_need_event(off_wrap & ~(1<<15), new, old);
>> Why need a & here?
> Because wrap_counter (the most significant bit in off_wrap)
> isn't part of the index.
>
>>> + } else {
>> Need a smp_rmb() to make sure desc_event_flags was checked before flags.
> I don't get your point, if my understanding is correct,
> desc_event_flags is vq->vring_packed.device->flags. So
> what's the "flags"?
Sorry, I mean we need check device.flags before off_warp. So it needs an
smp_rmb() in the middle. It looks to me there's no guarantee that
VRING_EVENT_F_DESC is set if event index is supported.
>
>>> + needs_kick = (vq->vring_packed.device->flags !=
>>> + cpu_to_virtio16(_vq->vdev, VRING_EVENT_F_DISABLE));
>>> + }
>>> + END_USE(vq);
>>> + return needs_kick;
>>> +}
> [...]
>>> +static int detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
>>> + void **ctx)
>>> +{
>>> + struct vring_packed_desc *desc;
>>> + unsigned int i, j;
>>> +
>>> + /* Clear data ptr. */
>>> + vq->desc_state[head].data = NULL;
>>> +
>>> + i = head;
>>> +
>>> + for (j = 0; j < vq->desc_state[head].num; j++) {
>>> + desc = &vq->vring_packed.desc[i];
>>> + vring_unmap_one_packed(vq, desc);
>>> + desc->flags = 0x0;
>> Looks like this is unnecessary.
> It's safer to zero it. If we don't zero it, after we
> call virtqueue_detach_unused_buf_packed() which calls
> this function, the desc is still available to the
> device.
Well detach_unused_buf_packed() should be called after device is
stopped, otherwise even if you try to clear, there will still be a
window that device may use it.
>
>>> + i++;
>>> + if (i >= vq->vring_packed.num)
>>> + i = 0;
>>> + }
> [...]
>>> +static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
>>> +{
>>> + struct vring_virtqueue *vq = to_vvq(_vq);
>>> + u16 last_used_idx, wrap_counter, off_wrap;
>>> +
>>> + START_USE(vq);
>>> +
>>> + last_used_idx = vq->last_used_idx;
>>> + wrap_counter = vq->wrap_counter;
>>> +
>>> + if (last_used_idx > vq->next_avail_idx)
>>> + wrap_counter ^= 1;
>>> +
>>> + off_wrap = last_used_idx | (wrap_counter << 15);
>>> +
>>> + /* We optimistically turn back on interrupts, then check if there was
>>> + * more to do. */
>>> + /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
>>> + * either clear the flags bit or point the event index at the next
>>> + * entry. Always do both to keep code simple. */
>>> + if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
>>> + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC:
>>> + VRING_EVENT_F_ENABLE;
>>> + vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
>>> + vq->event_flags_shadow);
>>> + }
>> A smp_wmb() is missed here?
>>
>>> + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev, off_wrap);
>> And according to the spec, it looks to me write a VRING_EVENT_F_ENABLE is
>> sufficient here.
> I didn't think much when implementing the event suppression
> for packed ring previously. After I saw your comments, I found
> something new. Indeed, unlike the split ring, for the packed
> ring, spec doesn't say we must use VRING_EVENT_F_DESC when
> EVENT_IDX is negotiated. So do you think below thought is
> right or makes sense?
>
> - For virtqueue_enable_cb_prepare(), we just need to enable
> the ring by setting flags to VRING_EVENT_F_ENABLE in any
> case.
>
> - We will try to use VRING_EVENT_F_DESC (if EVENT_IDX is
> negotiated) only when we want to delay the interrupts
> virtqueue_enable_cb_delayed().
This looks good to me.
>
>>> + END_USE(vq);
>>> + return last_used_idx;
>>> +}
>>> +
> [...]
>>> @@ -1157,14 +1852,18 @@ void vring_transport_features(struct virtio_device *vdev)
>>> for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
>>> switch (i) {
>>> - case VIRTIO_RING_F_INDIRECT_DESC:
>>> +#if 0
>>> + case VIRTIO_RING_F_INDIRECT_DESC: // FIXME not tested yet.
>>> break;
>>> - case VIRTIO_RING_F_EVENT_IDX:
>>> + case VIRTIO_RING_F_EVENT_IDX: // FIXME probably not work.
>>> break;
>>> +#endif
>> It would be better if you can split EVENT_IDX and INDIRECT_DESC into
>> separate patches too.
> Sure. Will do it in the next version.
>
> Thanks for the review!
Thanks.
>> Thanks
>>
^ permalink raw reply
* Re: [PATCH net,stable] tun: fix vlan packet truncation
From: Jason Wang @ 2018-04-17 2:13 UTC (permalink / raw)
To: Bjørn Mork, netdev
In-Reply-To: <20180416220038.21743-1-bjorn@mork.no>
On 2018年04月17日 06:00, Bjørn Mork wrote:
> Bogus trimming in tun_net_xmit() causes truncated vlan packets.
>
> skb->len is correct whether or not skb_vlan_tag_present() is true. There
> is no more reason to adjust the skb length on xmit in this driver than
> any other driver. tun_put_user() adds 4 bytes to the total for tagged
> packets because it transmits the tag inline to userspace. This is
> similar to a nic transmitting the tag inline on the wire.
>
> Reproducing the bug by sending any tagged packet through back-to-back
> connected tap interfaces:
>
> socat TUN,tun-type=tap,iff-up,tun-name=in TUN,tun-type=tap,iff-up,tun-name=out &
> ip link add link in name in.20 type vlan id 20
> ip addr add 10.9.9.9/24 dev in.20
> ip link set in.20 up
> tshark -nxxi in -f arp -c1 2>/dev/null &
> tshark -nxxi out -f arp -c1 2>/dev/null &
> ping -c 1 10.9.9.5 >/dev/null 2>&1
>
> The output from the 'in' and 'out' interfaces are different when the
> bug is present:
>
> Capturing on 'in'
> 0000 ff ff ff ff ff ff 76 cf 76 37 d5 0a 81 00 00 14 ......v.v7......
> 0010 08 06 00 01 08 00 06 04 00 01 76 cf 76 37 d5 0a ..........v.v7..
> 0020 0a 09 09 09 00 00 00 00 00 00 0a 09 09 05 ..............
>
> Capturing on 'out'
> 0000 ff ff ff ff ff ff 76 cf 76 37 d5 0a 81 00 00 14 ......v.v7......
> 0010 08 06 00 01 08 00 06 04 00 01 76 cf 76 37 d5 0a ..........v.v7..
> 0020 0a 09 09 09 00 00 00 00 00 00 ..........
>
> Fixes: aff3d70a07ff ("tun: allow to attach ebpf socket filter")
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
> ---
> drivers/net/tun.c | 7 -------
> 1 file changed, 7 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 28583aa0c17d..01cf8e3d8edc 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1103,13 +1103,6 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>
> len = run_ebpf_filter(tun, skb, len);
>
> - /* Trim extra bytes since we may insert vlan proto & TCI
> - * in tun_put_user().
> - */
> - len -= skb_vlan_tag_present(skb) ? sizeof(struct veth) : 0;
> - if (len <= 0 || pskb_trim(skb, len))
> - goto drop;
> -
> if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC)))
> goto drop;
>
Good catch, but I still think we should do the truncation in
run_ebpf_filter to match the behavior socket ebpf filter.
Thanks
^ permalink raw reply
* Re: [RFC v2] virtio: support packed ring
From: Michael S. Tsirkin @ 2018-04-17 2:17 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <8dee7d62-ac0b-54ba-6bec-4bc4a6fb34e9@redhat.com>
On Tue, Apr 17, 2018 at 10:11:58AM +0800, Jason Wang wrote:
>
>
> On 2018年04月13日 15:15, Tiwei Bie wrote:
> > On Fri, Apr 13, 2018 at 12:30:24PM +0800, Jason Wang wrote:
> > > On 2018年04月01日 22:12, Tiwei Bie wrote:
> > > > Hello everyone,
> > > >
> > > > This RFC implements packed ring support for virtio driver.
> > > >
> > > > The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
> > > > by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
> > > > Minor changes are needed for the vhost code, e.g. to kick the guest.
> > > >
> > > > TODO:
> > > > - Refinements and bug fixes;
> > > > - Split into small patches;
> > > > - Test indirect descriptor support;
> > > > - Test/fix event suppression support;
> > > > - Test devices other than net;
> > > >
> > > > RFC v1 -> RFC v2:
> > > > - Add indirect descriptor support - compile test only;
> > > > - Add event suppression supprt - compile test only;
> > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > - Avoid using '%' operator (Jason);
> > > > - Rename free_head -> next_avail_idx (Jason);
> > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > - Some other refinements and bug fixes;
> > > >
> > > > Thanks!
> > > >
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > ---
> > > > drivers/virtio/virtio_ring.c | 1094 +++++++++++++++++++++++++++++-------
> > > > include/linux/virtio_ring.h | 8 +-
> > > > include/uapi/linux/virtio_config.h | 12 +-
> > > > include/uapi/linux/virtio_ring.h | 61 ++
> > > > 4 files changed, 980 insertions(+), 195 deletions(-)
> > [...]
> > > > +static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> > > > + unsigned int total_sg,
> > > > + gfp_t gfp)
> > > > +{
> > > > + struct vring_packed_desc *desc;
> > > > +
> > > > + /*
> > > > + * We require lowmem mappings for the descriptors because
> > > > + * otherwise virt_to_phys will give us bogus addresses in the
> > > > + * virtqueue.
> > > > + */
> > > > + gfp &= ~__GFP_HIGHMEM;
> > > > +
> > > > + desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
> > > Can we simply check vq->packed here to avoid duplicating helpers?
> > Then it would be something like this:
> >
> > static void *alloc_indirect(struct virtqueue *_vq, unsigned int total_sg,
> > gfp_t gfp)
> > {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> > void *data;
> >
> > /*
> > * We require lowmem mappings for the descriptors because
> > * otherwise virt_to_phys will give us bogus addresses in the
> > * virtqueue.
> > */
> > gfp &= ~__GFP_HIGHMEM;
> >
> > if (vq->packed) {
> > data = kmalloc(total_sg * sizeof(struct vring_packed_desc),
> > gfp);
> > if (!data)
> > return NULL;
> > } else {
> > struct vring_desc *desc;
> > unsigned int i;
> >
> > desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
> > if (!desc)
> > return NULL;
> >
> > for (i = 0; i < total_sg; i++)
> > desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> >
> > data = desc;
> > }
> >
> > return data;
> > }
> >
> > I would prefer to have two simpler helpers (and to the callers,
> > it's already very clear about which one they should call), i.e.
> > the current implementation:
> >
> > static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> > unsigned int total_sg,
> > gfp_t gfp)
> > {
> > struct vring_packed_desc *desc;
> >
> > /*
> > * We require lowmem mappings for the descriptors because
> > * otherwise virt_to_phys will give us bogus addresses in the
> > * virtqueue.
> > */
> > gfp &= ~__GFP_HIGHMEM;
> >
> > desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
> >
> > return desc;
> > }
> >
> > static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > unsigned int total_sg,
> > gfp_t gfp)
> > {
> > struct vring_desc *desc;
> > unsigned int i;
> >
> > /*
> > * We require lowmem mappings for the descriptors because
> > * otherwise virt_to_phys will give us bogus addresses in the
> > * virtqueue.
> > */
> > gfp &= ~__GFP_HIGHMEM;
> >
> > desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
> > if (!desc)
> > return NULL;
> >
> > for (i = 0; i < total_sg; i++)
> > desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> > return desc;
> > }
>
> Yeah, I miss that split version needs a desc list.
>
> >
> > > > +
> > > > + return desc;
> > > > +}
> > [...]
> > > > +static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > > + struct scatterlist *sgs[],
> > > > + unsigned int total_sg,
> > > > + unsigned int out_sgs,
> > > > + unsigned int in_sgs,
> > > > + void *data,
> > > > + void *ctx,
> > > > + gfp_t gfp)
> > > > +{
> > > > + struct vring_virtqueue *vq = to_vvq(_vq);
> > > > + struct vring_packed_desc *desc;
> > > > + struct scatterlist *sg;
> > > > + unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
> > > > + __virtio16 uninitialized_var(head_flags), flags;
> > > > + int head, wrap_counter;
> > > > + bool indirect;
> > > > +
> > > > + START_USE(vq);
> > > > +
> > > > + BUG_ON(data == NULL);
> > > > + BUG_ON(ctx && vq->indirect);
> > > > +
> > > > + if (unlikely(vq->broken)) {
> > > > + END_USE(vq);
> > > > + return -EIO;
> > > > + }
> > > > +
> > > > +#ifdef DEBUG
> > > > + {
> > > > + ktime_t now = ktime_get();
> > > > +
> > > > + /* No kick or get, with .1 second between? Warn. */
> > > > + if (vq->last_add_time_valid)
> > > > + WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> > > > + > 100);
> > > > + vq->last_add_time = now;
> > > > + vq->last_add_time_valid = true;
> > > > + }
> > > > +#endif
> > > > +
> > > > + BUG_ON(total_sg == 0);
> > > > +
> > > > + head = vq->next_avail_idx;
> > > > + wrap_counter = vq->wrap_counter;
> > > > +
> > > > + /* If the host supports indirect descriptor tables, and we have multiple
> > > > + * buffers, then go indirect. FIXME: tune this threshold */
> > > > + if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> > > Let's introduce a helper like virtqueue_need_indirect() to avoid duplicating
> > > codes and FIXME.
> > Okay.
> >
> > > > + desc = alloc_indirect_packed(_vq, total_sg, gfp);
> > > > + else {
> > > > + desc = NULL;
> > > > + WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
> > > > + }
> > > > +
> > > > + if (desc) {
> > > > + /* Use a single buffer which doesn't continue */
> > > > + indirect = true;
> > > > + /* Set up rest to use this indirect table. */
> > > > + i = 0;
> > > > + descs_used = 1;
> > > > + } else {
> > > > + indirect = false;
> > > > + desc = vq->vring_packed.desc;
> > > > + i = head;
> > > > + descs_used = total_sg;
> > > > + }
> > > > +
> > > > + if (vq->vq.num_free < descs_used) {
> > > > + pr_debug("Can't add buf len %i - avail = %i\n",
> > > > + descs_used, vq->vq.num_free);
> > > > + /* FIXME: for historical reasons, we force a notify here if
> > > > + * there are outgoing parts to the buffer. Presumably the
> > > > + * host should service the ring ASAP. */
> > > > + if (out_sgs)
> > > > + vq->notify(&vq->vq);
> > > > + if (indirect)
> > > > + kfree(desc);
> > > > + END_USE(vq);
> > > > + return -ENOSPC;
> > > > + }
> > > > +
> > > > + for (n = 0; n < out_sgs + in_sgs; n++) {
> > > > + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > + dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
> > > > + DMA_TO_DEVICE : DMA_FROM_DEVICE);
> > > > + if (vring_mapping_error(vq, addr))
> > > > + goto unmap_release;
> > > > +
> > > > + flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> > > > + (n < out_sgs ? 0 : VRING_DESC_F_WRITE) |
> > > > + VRING_DESC_F_AVAIL(vq->wrap_counter) |
> > > > + VRING_DESC_F_USED(!vq->wrap_counter));
> > > > + if (!indirect && i == head)
> > > > + head_flags = flags;
> > > > + else
> > > > + desc[i].flags = flags;
> > > > +
> > > > + desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > + desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > > > + desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> > > Similar to V1, we only need this for the last descriptor.
> > Okay, will just set it for the last desc.
> >
> > > > + prev = i;
> > > It looks to me there's no need to track prev inside the loop here.
> > >
> > > > + i++;
> > > > + if (!indirect && i >= vq->vring_packed.num) {
> > > > + i = 0;
> > > > + vq->wrap_counter ^= 1;
> > > > + }
> > > > + }
> > > > + }
> > > > + /* Last one doesn't continue. */
> > > > + if (total_sg == 1)
> > > > + head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > > + else
> > > > + desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > The only case when prev != i - 1 is i == 0, we can add a if here.
> > It's just a mirror of the existing implementation in split ring.
> > It seems that split ring implementation needs this just because
> > it's much harder for it to find the prev, which is not true for
> > packed ring. So I'll take your suggestion. Thanks!
> >
> > [...]
> > > > +static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > +{
> > > > + struct vring_virtqueue *vq = to_vvq(_vq);
> > > > + u16 new, old, off_wrap;
> > > > + bool needs_kick;
> > > > +
> > > > + START_USE(vq);
> > > > + /* We need to expose the new flags value before checking notification
> > > > + * suppressions. */
> > > > + virtio_mb(vq->weak_barriers);
> > > > +
> > > > + old = vq->next_avail_idx - vq->num_added;
> > > > + new = vq->next_avail_idx;
> > > > + vq->num_added = 0;
> > > > +
> > > > +#ifdef DEBUG
> > > > + if (vq->last_add_time_valid) {
> > > > + WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> > > > + vq->last_add_time)) > 100);
> > > > + }
> > > > + vq->last_add_time_valid = false;
> > > > +#endif
> > > > +
> > > > + off_wrap = virtio16_to_cpu(_vq->vdev, vq->vring_packed.device->off_wrap);
> > > > +
> > > > + if (vq->event) {
> > > It looks to me we should examine RING_EVENT_FLAGS_DESC in desc_event_flags
> > > instead of vq->event here. Spec does not forces to use evenf_off and
> > > event_wrap if event index is enabled.
> > >
> > > > + // FIXME: fix this!
> > > > + needs_kick = ((off_wrap >> 15) == vq->wrap_counter) &&
> > > > + vring_need_event(off_wrap & ~(1<<15), new, old);
> > > Why need a & here?
> > Because wrap_counter (the most significant bit in off_wrap)
> > isn't part of the index.
> >
> > > > + } else {
> > > Need a smp_rmb() to make sure desc_event_flags was checked before flags.
> > I don't get your point, if my understanding is correct,
> > desc_event_flags is vq->vring_packed.device->flags. So
> > what's the "flags"?
>
> Sorry, I mean we need check device.flags before off_warp. So it needs an
> smp_rmb() in the middle.
It's best to just read all flags atomically as u32.
> It looks to me there's no guarantee that
> VRING_EVENT_F_DESC is set if event index is supported.
>
> >
> > > > + needs_kick = (vq->vring_packed.device->flags !=
> > > > + cpu_to_virtio16(_vq->vdev, VRING_EVENT_F_DISABLE));
> > > > + }
> > > > + END_USE(vq);
> > > > + return needs_kick;
> > > > +}
> > [...]
> > > > +static int detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> > > > + void **ctx)
> > > > +{
> > > > + struct vring_packed_desc *desc;
> > > > + unsigned int i, j;
> > > > +
> > > > + /* Clear data ptr. */
> > > > + vq->desc_state[head].data = NULL;
> > > > +
> > > > + i = head;
> > > > +
> > > > + for (j = 0; j < vq->desc_state[head].num; j++) {
> > > > + desc = &vq->vring_packed.desc[i];
> > > > + vring_unmap_one_packed(vq, desc);
> > > > + desc->flags = 0x0;
> > > Looks like this is unnecessary.
> > It's safer to zero it. If we don't zero it, after we
> > call virtqueue_detach_unused_buf_packed() which calls
> > this function, the desc is still available to the
> > device.
>
> Well detach_unused_buf_packed() should be called after device is stopped,
> otherwise even if you try to clear, there will still be a window that device
> may use it.
>
> >
> > > > + i++;
> > > > + if (i >= vq->vring_packed.num)
> > > > + i = 0;
> > > > + }
> > [...]
> > > > +static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> > > > +{
> > > > + struct vring_virtqueue *vq = to_vvq(_vq);
> > > > + u16 last_used_idx, wrap_counter, off_wrap;
> > > > +
> > > > + START_USE(vq);
> > > > +
> > > > + last_used_idx = vq->last_used_idx;
> > > > + wrap_counter = vq->wrap_counter;
> > > > +
> > > > + if (last_used_idx > vq->next_avail_idx)
> > > > + wrap_counter ^= 1;
> > > > +
> > > > + off_wrap = last_used_idx | (wrap_counter << 15);
> > > > +
> > > > + /* We optimistically turn back on interrupts, then check if there was
> > > > + * more to do. */
> > > > + /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> > > > + * either clear the flags bit or point the event index at the next
> > > > + * entry. Always do both to keep code simple. */
> > > > + if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > > > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC:
> > > > + VRING_EVENT_F_ENABLE;
> > > > + vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > > > + vq->event_flags_shadow);
> > > > + }
> > > A smp_wmb() is missed here?
> > >
> > > > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev, off_wrap);
> > > And according to the spec, it looks to me write a VRING_EVENT_F_ENABLE is
> > > sufficient here.
> > I didn't think much when implementing the event suppression
> > for packed ring previously. After I saw your comments, I found
> > something new. Indeed, unlike the split ring, for the packed
> > ring, spec doesn't say we must use VRING_EVENT_F_DESC when
> > EVENT_IDX is negotiated. So do you think below thought is
> > right or makes sense?
> >
> > - For virtqueue_enable_cb_prepare(), we just need to enable
> > the ring by setting flags to VRING_EVENT_F_ENABLE in any
> > case.
> >
> > - We will try to use VRING_EVENT_F_DESC (if EVENT_IDX is
> > negotiated) only when we want to delay the interrupts
> > virtqueue_enable_cb_delayed().
>
> This looks good to me.
I suspect this will lead to extra interrupts if host is fast.
So I think for now we should always use VRING_EVENT_F_DESC
if EVENT_IDX is negotiated.
VRING_EVENT_F_DISABLE makes more sense to me.
> >
> > > > + END_USE(vq);
> > > > + return last_used_idx;
> > > > +}
> > > > +
> > [...]
> > > > @@ -1157,14 +1852,18 @@ void vring_transport_features(struct virtio_device *vdev)
> > > > for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
> > > > switch (i) {
> > > > - case VIRTIO_RING_F_INDIRECT_DESC:
> > > > +#if 0
> > > > + case VIRTIO_RING_F_INDIRECT_DESC: // FIXME not tested yet.
> > > > break;
> > > > - case VIRTIO_RING_F_EVENT_IDX:
> > > > + case VIRTIO_RING_F_EVENT_IDX: // FIXME probably not work.
> > > > break;
> > > > +#endif
> > > It would be better if you can split EVENT_IDX and INDIRECT_DESC into
> > > separate patches too.
> > Sure. Will do it in the next version.
> >
> > Thanks for the review!
>
> Thanks.
>
> > > Thanks
> > >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC v2] virtio: support packed ring
From: Jason Wang @ 2018-04-17 2:24 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Tiwei Bie, wexu, virtualization, linux-kernel, netdev, jfreimann
In-Reply-To: <20180417051343-mutt-send-email-mst@kernel.org>
On 2018年04月17日 10:17, Michael S. Tsirkin wrote:
> On Tue, Apr 17, 2018 at 10:11:58AM +0800, Jason Wang wrote:
>>
>> On 2018年04月13日 15:15, Tiwei Bie wrote:
>>> On Fri, Apr 13, 2018 at 12:30:24PM +0800, Jason Wang wrote:
>>>> On 2018年04月01日 22:12, Tiwei Bie wrote:
>>>>> Hello everyone,
>>>>>
>>>>> This RFC implements packed ring support for virtio driver.
>>>>>
>>>>> The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
>>>>> by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
>>>>> Minor changes are needed for the vhost code, e.g. to kick the guest.
>>>>>
>>>>> TODO:
>>>>> - Refinements and bug fixes;
>>>>> - Split into small patches;
>>>>> - Test indirect descriptor support;
>>>>> - Test/fix event suppression support;
>>>>> - Test devices other than net;
>>>>>
>>>>> RFC v1 -> RFC v2:
>>>>> - Add indirect descriptor support - compile test only;
>>>>> - Add event suppression supprt - compile test only;
>>>>> - Move vring_packed_init() out of uapi (Jason, MST);
>>>>> - Merge two loops into one in virtqueue_add_packed() (Jason);
>>>>> - Split vring_unmap_one() for packed ring and split ring (Jason);
>>>>> - Avoid using '%' operator (Jason);
>>>>> - Rename free_head -> next_avail_idx (Jason);
>>>>> - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
>>>>> - Some other refinements and bug fixes;
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>>>> ---
>>>>> drivers/virtio/virtio_ring.c | 1094 +++++++++++++++++++++++++++++-------
>>>>> include/linux/virtio_ring.h | 8 +-
>>>>> include/uapi/linux/virtio_config.h | 12 +-
>>>>> include/uapi/linux/virtio_ring.h | 61 ++
>>>>> 4 files changed, 980 insertions(+), 195 deletions(-)
>>> [...]
[...]
>>>> It looks to me we should examine RING_EVENT_FLAGS_DESC in desc_event_flags
>>>> instead of vq->event here. Spec does not forces to use evenf_off and
>>>> event_wrap if event index is enabled.
>>>>
>>>>> + // FIXME: fix this!
>>>>> + needs_kick = ((off_wrap >> 15) == vq->wrap_counter) &&
>>>>> + vring_need_event(off_wrap & ~(1<<15), new, old);
>>>> Why need a & here?
>>> Because wrap_counter (the most significant bit in off_wrap)
>>> isn't part of the index.
>>>
>>>>> + } else {
>>>> Need a smp_rmb() to make sure desc_event_flags was checked before flags.
>>> I don't get your point, if my understanding is correct,
>>> desc_event_flags is vq->vring_packed.device->flags. So
>>> what's the "flags"?
>> Sorry, I mean we need check device.flags before off_warp. So it needs an
>> smp_rmb() in the middle.
> It's best to just read all flags atomically as u32.
Yes it is.
>
>> It looks to me there's no guarantee that
>> VRING_EVENT_F_DESC is set if event index is supported.
>>
>>>>> + needs_kick = (vq->vring_packed.device->flags !=
>>>>> + cpu_to_virtio16(_vq->vdev, VRING_EVENT_F_DISABLE));
>>>>> + }
>>>>> + END_USE(vq);
>>>>> + return needs_kick;
>>>>> +}
>>> [...]
>>>>> +static int detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
>>>>> + void **ctx)
>>>>> +{
>>>>> + struct vring_packed_desc *desc;
>>>>> + unsigned int i, j;
>>>>> +
>>>>> + /* Clear data ptr. */
>>>>> + vq->desc_state[head].data = NULL;
>>>>> +
>>>>> + i = head;
>>>>> +
>>>>> + for (j = 0; j < vq->desc_state[head].num; j++) {
>>>>> + desc = &vq->vring_packed.desc[i];
>>>>> + vring_unmap_one_packed(vq, desc);
>>>>> + desc->flags = 0x0;
>>>> Looks like this is unnecessary.
>>> It's safer to zero it. If we don't zero it, after we
>>> call virtqueue_detach_unused_buf_packed() which calls
>>> this function, the desc is still available to the
>>> device.
>> Well detach_unused_buf_packed() should be called after device is stopped,
>> otherwise even if you try to clear, there will still be a window that device
>> may use it.
>>
>>>>> + i++;
>>>>> + if (i >= vq->vring_packed.num)
>>>>> + i = 0;
>>>>> + }
>>> [...]
>>>>> +static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
>>>>> +{
>>>>> + struct vring_virtqueue *vq = to_vvq(_vq);
>>>>> + u16 last_used_idx, wrap_counter, off_wrap;
>>>>> +
>>>>> + START_USE(vq);
>>>>> +
>>>>> + last_used_idx = vq->last_used_idx;
>>>>> + wrap_counter = vq->wrap_counter;
>>>>> +
>>>>> + if (last_used_idx > vq->next_avail_idx)
>>>>> + wrap_counter ^= 1;
>>>>> +
>>>>> + off_wrap = last_used_idx | (wrap_counter << 15);
>>>>> +
>>>>> + /* We optimistically turn back on interrupts, then check if there was
>>>>> + * more to do. */
>>>>> + /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
>>>>> + * either clear the flags bit or point the event index at the next
>>>>> + * entry. Always do both to keep code simple. */
>>>>> + if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
>>>>> + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC:
>>>>> + VRING_EVENT_F_ENABLE;
>>>>> + vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
>>>>> + vq->event_flags_shadow);
>>>>> + }
>>>> A smp_wmb() is missed here?
>>>>
>>>>> + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev, off_wrap);
>>>> And according to the spec, it looks to me write a VRING_EVENT_F_ENABLE is
>>>> sufficient here.
>>> I didn't think much when implementing the event suppression
>>> for packed ring previously. After I saw your comments, I found
>>> something new. Indeed, unlike the split ring, for the packed
>>> ring, spec doesn't say we must use VRING_EVENT_F_DESC when
>>> EVENT_IDX is negotiated. So do you think below thought is
>>> right or makes sense?
>>>
>>> - For virtqueue_enable_cb_prepare(), we just need to enable
>>> the ring by setting flags to VRING_EVENT_F_ENABLE in any
>>> case.
>>>
>>> - We will try to use VRING_EVENT_F_DESC (if EVENT_IDX is
>>> negotiated) only when we want to delay the interrupts
>>> virtqueue_enable_cb_delayed().
>> This looks good to me.
> I suspect this will lead to extra interrupts if host is fast.
> So I think for now we should always use VRING_EVENT_F_DESC
> if EVENT_IDX is negotiated.
Right, so if this is true, maybe we'd better force this in the spec?
Thanks
>
> VRING_EVENT_F_DISABLE makes more sense to me.
>
[...]
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox