* Re: [PATCH net-next 2/3] vsock: fix vsock_dequeue/enqueue_accept race
From: Stefan Hajnoczi @ 2017-08-17 14:05 UTC (permalink / raw)
To: Dexuan Cui
Cc: Michal Kubecek, joe@perches.com, olaf@aepfle.de,
Stephen Hemminger, jasowang@redhat.com, netdev@vger.kernel.org,
Haiyang Zhang, Dave Scott, linux-kernel@vger.kernel.org,
apw@canonical.com, Jorgen Hansen, Rolf Neugebauer, Marcelo Cerri,
devel@linuxdriverproject.org, Vitaly Kuznetsov,
davem@davemloft.net, George Zhang, Dan Carpenter
In-Reply-To: <KL1P15301MB00082249964D8EF1BAA48790BF8D0@KL1P15301MB0008.APCP153.PROD.OUTLOOK.COM>
[-- Attachment #1.1: Type: text/plain, Size: 474 bytes --]
On Tue, Aug 15, 2017 at 10:15:39PM +0000, Dexuan Cui wrote:
> With the current code, when vsock_dequeue_accept() is removing a sock
> from the list, nothing prevents vsock_enqueue_accept() from adding a new
> sock into the list concurrently. We should add a lock to protect the list.
The listener sock is locked, preventing concurrent modification. I have
checked both the virtio and vmci transports. Can you post an example
where the listener sock isn't locked?
Stefan
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
[-- Attachment #2: Type: text/plain, Size: 169 bytes --]
_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
^ permalink raw reply
* Re: [ovs-dev] [PATCH v3] openvswitch: enable NSH support
From: Eric Garver @ 2017-08-17 14:14 UTC (permalink / raw)
To: Yang, Yi; +Cc: dev, netdev, jbenc
In-Reply-To: <20170816234940.GB123125@cran64.bj.intel.com>
On Thu, Aug 17, 2017 at 07:49:41AM +0800, Yang, Yi wrote:
> On Wed, Aug 16, 2017 at 11:15:28PM +0800, Eric Garver wrote:
> > On Wed, Aug 16, 2017 at 01:35:30PM +0800, Yi Yang wrote:
> > > +
> > > +#define NSH_DST_PORT 4790 /* UDP Port for NSH on VXLAN. */
> > > +#define ETH_P_NSH 0x894F /* Ethertype for NSH. */
> >
> > ETH_P_NSH probably belongs in include/uapi/linux/if_ether.h with all the
> > other ETH_P_* defines.
> >
>
> Ok, I'll move it to include/uapi/linux/if_ether.h, but in userspace, we
> still need to keep it in nsh.h.
>
> > >
> > > +struct ovs_key_nsh {
> > > + __u8 flags;
> > > + __u8 mdtype;
> > > + __u8 np;
> > > + __u8 pad;
> > > + __be32 path_hdr;
> > > + __be32 context[NSH_MD1_CONTEXT_SIZE];
> > > +};
> > > +
> > > struct sw_flow_key {
> > > u8 tun_opts[IP_TUNNEL_OPTS_MAX];
> > > u8 tun_opts_len;
> > > @@ -144,6 +154,7 @@ struct sw_flow_key {
> > > };
> > > } ipv6;
> > > };
> > > + struct ovs_key_nsh nsh; /* network service header */
> >
> > Are you intentionally not reserving space in the flow key for
> > OVS_NSH_KEY_ATTR_MD2? I know it's not supported yet, but much of the
> > code is stubbed out for it - just making sure this wasn't an oversight.
> >
>
> For MD type 2, we'll reuse tun_metedata keys in struct flow_tnl which
> will be reworked and it will be shared by NSH and GENEVE, so we won't
> have new keys in "struct ovs_key_nsh" for MD type 2.
Be careful here. VXLAN also uses tun_metadata for GBP. VXLAN-GPE (+ NSH)
and VXLAN-GBP are mutually exclusive AFAICS, but you should verify it
all behaves as expected.
[..]
^ permalink raw reply
* RE: [PATCH net v2 2/2] net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
From: Tantilov, Emil S @ 2017-08-17 14:17 UTC (permalink / raw)
To: Ding Tianhong, davem@davemloft.net, Kirsher, Jeffrey T,
keescook@chromium.org, linux-kernel@vger.kernel.org,
sparclinux@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
alexander.duyck@gmail.com, netdev@vger.kernel.org,
linuxarm@huawei.com
In-Reply-To: <1502940316-13384-3-git-send-email-dingtianhong@huawei.com>
>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>owner@vger.kernel.org] On Behalf Of Ding Tianhong
>Sent: Wednesday, August 16, 2017 8:25 PM
>To: davem@davemloft.net; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>;
>keescook@chromium.org; linux-kernel@vger.kernel.org;
>sparclinux@vger.kernel.org; intel-wired-lan@lists.osuosl.org;
>alexander.duyck@gmail.com; netdev@vger.kernel.org; linuxarm@huawei.com
>Cc: Ding Tianhong <dingtianhong@huawei.com>
>Subject: [PATCH net v2 2/2] net: ixgbe: Use new
>PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
>
>The ixgbe driver use the compile check to determine if it can
>send TLPs to Root Port with the Relaxed Ordering Attribute set,
>this is too inconvenient, now the new flag
>PCI_DEV_FLAGS_NO_RELAXED_ORDERING
>has been added to the kernel and we could check the bit4 in the PCIe
>Device Control register to determine whether we should use the Relaxed
>Ordering Attributes or not, so use this new way in the ixgbe driver.
>
>Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>---
> drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c | 37 ++++++++++++---------
>----
> drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 32 +++++++++++----------
> 2 files changed, 35 insertions(+), 34 deletions(-)
>
>diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
>b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
>index 523f9d0..d1571e3 100644
>--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
>+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
>@@ -175,31 +175,30 @@ static s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw
>*hw)
> **/
> static s32 ixgbe_start_hw_82598(struct ixgbe_hw *hw)
> {
>-#ifndef CONFIG_SPARC
>- u32 regval;
>- u32 i;
>-#endif
>+ u32 regval, i;
> s32 ret_val;
>+ struct ixgbe_adapter *adapter = hw->back;
>
> ret_val = ixgbe_start_hw_generic(hw);
>
>-#ifndef CONFIG_SPARC
>- /* Disable relaxed ordering */
>- for (i = 0; ((i < hw->mac.max_tx_queues) &&
>- (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
>- regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
>- regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
>- IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
>- }
>+ if (!pcie_relaxed_ordering_enabled(adapter->pdev)) {
As Alex mentioned there is no need for this check in any form.
The HW defaults to Relaxed Ordering enabled unless it is disabled in
the PCIe Device Control Register. So the above logic is already done by HW.
All you have to do is strip the code disabling relaxed ordering.
Thanks,
Emil
^ permalink raw reply
* RE: [net-next 11/15] net/mlx5e: Properly indent within conditional statements
From: David Laight @ 2017-08-17 14:18 UTC (permalink / raw)
To: 'Saeed Mahameed', David S. Miller
Cc: netdev@vger.kernel.org, Leon Romanovsky, Or Gerlitz
In-Reply-To: <20170817133003.16900-12-saeedm@mellanox.com>
From: Saeed Mahameed
> Sent: 17 August 2017 14:30
> To: David S. Miller
> Cc: netdev@vger.kernel.org; Leon Romanovsky; Or Gerlitz; Saeed Mahameed
> Subject: [net-next 11/15] net/mlx5e: Properly indent within conditional statements
>
> From: Or Gerlitz <ogerlitz@mellanox.com>
>
> To fix these checkpatch complaints:
>
> WARNING: suspect code indent for conditional statements (8, 24)
> + if (eth_proto & (MLX5E_PROT_MASK(MLX5E_10GBASE_SR)
> [...]
> + return PORT_FIBRE;
>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
> .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> index a75ac4d11c5b..ed161312a773 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> @@ -988,23 +988,23 @@ static u8 get_connector_port(u32 eth_proto, u8 connector_type)
> return ptys2connector_type[connector_type];
>
> if (eth_proto & (MLX5E_PROT_MASK(MLX5E_10GBASE_SR)
> - | MLX5E_PROT_MASK(MLX5E_40GBASE_SR4)
> - | MLX5E_PROT_MASK(MLX5E_100GBASE_SR4)
> - | MLX5E_PROT_MASK(MLX5E_1000BASE_CX_SGMII))) {
> - return PORT_FIBRE;
> + | MLX5E_PROT_MASK(MLX5E_40GBASE_SR4)
> + | MLX5E_PROT_MASK(MLX5E_100GBASE_SR4)
> + | MLX5E_PROT_MASK(MLX5E_1000BASE_CX_SGMII))) {
> + return PORT_FIBRE;
Gah, that is why the rules are stupid.
If anything the continuation lines want indenting a few more bytes.
David
^ permalink raw reply
* Re: [PATCH net] datagram: When peeking datagrams with offset < 0 don't skip empty skbs
From: Willem de Bruijn @ 2017-08-17 14:37 UTC (permalink / raw)
To: David Laight
Cc: Paolo Abeni, Matthew Dawson, Network Development,
Macieira, Thiago
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DD0058EDB@AcuExch.aculab.com>
On Thu, Aug 17, 2017 at 5:15 AM, David Laight <David.Laight@aculab.com> wrote:
> From: Willem de Bruijn
>> Sent: 17 August 2017 00:27
>> Actually, it is safe even without the check. Overflow of the signed integer
>> is benign here.
>
> IIRC the C language states that 'signed integer overflow' is undefined.
> So 'MAXINT + 1' doesn't have to equal '-MAXINT - 1' (as one would
> expect on a 2's compliment system).
>
> While the linux kernel probably won't run on systems where this isn't true
> (eg where signed arithmetic saturates) gcc will assume it can't happen
> and optimise code with that assumption.
Ah, of course. Thanks. The last patch does not rely on such tricks, indeed.
On rereading, it is actually very similar to Matthew's original. The
main difference is handling the negative offset case inside
__skb_try_recv_from_queue instead of in each caller.
^ permalink raw reply
* Re: [PATCH net-next 3/3] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
From: Stefan Hajnoczi @ 2017-08-17 14:55 UTC (permalink / raw)
To: Dexuan Cui
Cc: davem@davemloft.net, netdev@vger.kernel.org,
devel@linuxdriverproject.org, KY Srinivasan, Haiyang Zhang,
Stephen Hemminger, George Zhang, Jorgen Hansen, Michal Kubecek,
Vitaly Kuznetsov, Cathy Avery, jasowang@redhat.com,
Rolf Neugebauer, Dave Scott, Marcelo Cerri, apw@canonical.com,
olaf@aepfle.de, joe@perches.com, "linux
In-Reply-To: <KL1P15301MB00086EF58514BFD08F66B6C4BF8D0@KL1P15301MB0008.APCP153.PROD.OUTLOOK.COM>
[-- Attachment #1: Type: text/plain, Size: 969 bytes --]
On Tue, Aug 15, 2017 at 10:18:41PM +0000, Dexuan Cui wrote:
> +static u32 hvs_get_local_cid(void)
> +{
> + return VMADDR_CID_ANY;
> +}
Interesting concept: the guest never knows its CID. This is nice from a
live migration perspective. Currently VMCI and virtio adjust listen
socket local CIDs after migration.
> +static bool hvs_stream_allow(u32 cid, u32 port)
> +{
> + static const u32 valid_cids[] = {
> + VMADDR_CID_ANY,
Is this for loopback?
> + VMADDR_CID_HOST,
> + };
> + int i;
> +
> + /* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0xFFFFFFFF) is
> + * reserved as ephemeral ports, which are used as the host's ports
> + * when the host initiates connections.
> + */
> + if (port > MAX_HOST_LISTEN_PORT)
> + return false;
Without this if statement the guest will attempt to connect. I guess
there will be no listen sockets above MAX_HOST_LISTEN_PORT, so the
connection attempt will fail.
...but hardcode this knowledge into the guest driver?
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply
* Re: [PATCH net] tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
From: Neal Cardwell @ 2017-08-17 14:56 UTC (permalink / raw)
To: David Miller; +Cc: Netdev, Neal Cardwell, Yuchung Cheng, Eric Dumazet
In-Reply-To: <20170816215336.30419-1-ncardwell@google.com>
On Wed, Aug 16, 2017 at 5:53 PM, Neal Cardwell <ncardwell@google.com> wrote:
> In some situations tcp_send_loss_probe() can realize that it's unable
> to send a loss probe (TLP), and falls back to calling tcp_rearm_rto()
> to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto()
> realizes that the RTO was eligible to fire immediately or at some
> point in the past (delta_us <= 0). Previously in such cases
> tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now +
> icsk_rto, which caused needless delays of hundreds of milliseconds
> (and non-linear behavior that made reproducible testing
> difficult). This commit changes the logic to schedule "overdue" RTOs
> ASAP, rather than at now + icsk_rto.
>
> Suggested-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
Forgot to mention:
Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
This is a -stable candidate.
thanks,
neal
^ permalink raw reply
* [PATCH net-next] dsa: remove unused net_device arg from handlers
From: Florian Westphal @ 2017-08-17 14:47 UTC (permalink / raw)
To: netdev; +Cc: f.fainelli, vivien.didelot, Florian Westphal
compile tested only, but saw no warnings/errors with
allmodconfig build.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/dsa.h | 6 ++----
net/dsa/dsa.c | 4 ++--
net/dsa/tag_brcm.c | 3 +--
net/dsa/tag_dsa.c | 3 +--
net/dsa/tag_edsa.c | 3 +--
net/dsa/tag_ksz.c | 3 +--
net/dsa/tag_lan9303.c | 2 +-
net/dsa/tag_mtk.c | 3 +--
net/dsa/tag_qca.c | 3 +--
net/dsa/tag_trailer.c | 3 +--
10 files changed, 12 insertions(+), 21 deletions(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 7f46b521313e..398ca8d70ccd 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -104,8 +104,7 @@ struct packet_type;
struct dsa_device_ops {
struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev);
struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev,
- struct packet_type *pt,
- struct net_device *orig_dev);
+ struct packet_type *pt);
int (*flow_dissect)(const struct sk_buff *skb, __be16 *proto,
int *offset);
};
@@ -134,8 +133,7 @@ struct dsa_switch_tree {
/* Copy of tag_ops->rcv for faster access in hot path */
struct sk_buff * (*rcv)(struct sk_buff *skb,
struct net_device *dev,
- struct packet_type *pt,
- struct net_device *orig_dev);
+ struct packet_type *pt);
/*
* The switch port to which the CPU is attached.
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 99e38af85fc5..03c58b0eb082 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -186,7 +186,7 @@ struct net_device *dsa_dev_to_net_device(struct device *dev)
EXPORT_SYMBOL_GPL(dsa_dev_to_net_device);
static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
- struct packet_type *pt, struct net_device *orig_dev)
+ struct packet_type *pt, struct net_device *unused)
{
struct dsa_switch_tree *dst = dev->dsa_ptr;
struct sk_buff *nskb = NULL;
@@ -202,7 +202,7 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
if (!skb)
return 0;
- nskb = dst->rcv(skb, dev, pt, orig_dev);
+ nskb = dst->rcv(skb, dev, pt);
if (!nskb) {
kfree_skb(skb);
return 0;
diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index c697d9815177..de74c3f77818 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -89,8 +89,7 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, struct net_device *dev
}
static struct sk_buff *brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev,
- struct packet_type *pt,
- struct net_device *orig_dev)
+ struct packet_type *pt)
{
struct dsa_switch_tree *dst = dev->dsa_ptr;
struct dsa_port *cpu_dp = dsa_get_cpu_port(dst);
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index 12867a4b458f..fbf9ca954773 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -65,8 +65,7 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, struct net_device *dev)
}
static struct sk_buff *dsa_rcv(struct sk_buff *skb, struct net_device *dev,
- struct packet_type *pt,
- struct net_device *orig_dev)
+ struct packet_type *pt)
{
struct dsa_switch_tree *dst = dev->dsa_ptr;
struct dsa_switch *ds;
diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
index 67a9d26f9075..76367ba1b2e2 100644
--- a/net/dsa/tag_edsa.c
+++ b/net/dsa/tag_edsa.c
@@ -78,8 +78,7 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, struct net_device *dev)
}
static struct sk_buff *edsa_rcv(struct sk_buff *skb, struct net_device *dev,
- struct packet_type *pt,
- struct net_device *orig_dev)
+ struct packet_type *pt)
{
struct dsa_switch_tree *dst = dev->dsa_ptr;
struct dsa_switch *ds;
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index de66ca8e6201..17f30675c15c 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -76,8 +76,7 @@ static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
}
static struct sk_buff *ksz_rcv(struct sk_buff *skb, struct net_device *dev,
- struct packet_type *pt,
- struct net_device *orig_dev)
+ struct packet_type *pt)
{
struct dsa_switch_tree *dst = dev->dsa_ptr;
struct dsa_port *cpu_dp = dsa_get_cpu_port(dst);
diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index e23e7635fa00..0b9826105e42 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -68,7 +68,7 @@ static struct sk_buff *lan9303_xmit(struct sk_buff *skb, struct net_device *dev)
}
static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev,
- struct packet_type *pt, struct net_device *orig_dev)
+ struct packet_type *pt)
{
u16 *lan9303_tag;
struct dsa_switch_tree *dst = dev->dsa_ptr;
diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
index 02163c045a96..ec8ee5f43255 100644
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -44,8 +44,7 @@ static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
}
static struct sk_buff *mtk_tag_rcv(struct sk_buff *skb, struct net_device *dev,
- struct packet_type *pt,
- struct net_device *orig_dev)
+ struct packet_type *pt)
{
struct dsa_switch_tree *dst = dev->dsa_ptr;
struct dsa_switch *ds;
diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c
index 1867a3d11f28..1d4c70711c0f 100644
--- a/net/dsa/tag_qca.c
+++ b/net/dsa/tag_qca.c
@@ -63,8 +63,7 @@ static struct sk_buff *qca_tag_xmit(struct sk_buff *skb, struct net_device *dev)
}
static struct sk_buff *qca_tag_rcv(struct sk_buff *skb, struct net_device *dev,
- struct packet_type *pt,
- struct net_device *orig_dev)
+ struct packet_type *pt)
{
struct dsa_switch_tree *dst = dev->dsa_ptr;
struct dsa_port *cpu_dp = dsa_get_cpu_port(dst);
diff --git a/net/dsa/tag_trailer.c b/net/dsa/tag_trailer.c
index b09e56214005..8707157dea32 100644
--- a/net/dsa/tag_trailer.c
+++ b/net/dsa/tag_trailer.c
@@ -56,8 +56,7 @@ static struct sk_buff *trailer_xmit(struct sk_buff *skb, struct net_device *dev)
}
static struct sk_buff *trailer_rcv(struct sk_buff *skb, struct net_device *dev,
- struct packet_type *pt,
- struct net_device *orig_dev)
+ struct packet_type *pt)
{
struct dsa_switch_tree *dst = dev->dsa_ptr;
struct dsa_port *cpu_dp = dsa_get_cpu_port(dst);
--
2.13.0
^ permalink raw reply related
* Re: [PATCH v4 05/12] Documentation: net: phy: Add phy-is-internal binding
From: Rob Herring @ 2017-08-17 15:10 UTC (permalink / raw)
To: David.Wu
Cc: Andrew Lunn, Florian Fainelli, davem, heiko, mark.rutland,
catalin.marinas, will.deacon, olof, linux, arnd, peppe.cavallaro,
alexandre.torgue, huangtao, hwg, netdev, linux-arm-kernel,
linux-rockchip, devicetree, linux-kernel
In-Reply-To: <1bff1793-fe46-f263-fbf5-ef65ffbc5dc9@rock-chips.com>
On Thu, Aug 10, 2017 at 06:57:40PM +0800, David.Wu wrote:
> Hi Andrew, Florian
>
> 在 2017/8/10 8:20, Andrew Lunn 写道:
> > Hi Florian, David.
> >
> > I'm happy with the property name. But i think the text needs more
> > description. We deal with Ethernet switches with integrated PHYs. Yet
> > for us, this property is unneeded.
> >
> > Seeing this property means some bit of software needs to ensure the
> > internal PHY should be used, when given the choice between an internal
> > and external PHY. So i would say something like:
> >
> > If set, indicates that the PHY is integrated into the same
> > physical package as the Ethernet MAC. If needed, muxers should be
> > configured to ensure the (internal) PHY is used. The absence of this
> > property indicates the muxers should be configured so that the
> > external PHY is used.
>
> Are we supposed to replace the words "internal" with "integrated" here?
> So we have three kinds of PHY, they are intenal, external and integrated
> PHYs.
And you can have both a XAUI/serdes phy and the ethernet phy. It should
be clear this is for the latter.
Rob
^ permalink raw reply
* Re: [PATCH v4 06/12] net: stmmac: dwmac-rk: Add internal phy support
From: Rob Herring @ 2017-08-17 15:10 UTC (permalink / raw)
To: David Wu
Cc: davem, heiko, andrew, f.fainelli, mark.rutland, catalin.marinas,
will.deacon, olof, linux, arnd, peppe.cavallaro, alexandre.torgue,
huangtao, hwg, netdev, linux-arm-kernel, linux-rockchip,
devicetree, linux-kernel
In-Reply-To: <1502280661-2308-1-git-send-email-david.wu@rock-chips.com>
On Wed, Aug 09, 2017 at 08:11:01PM +0800, David Wu wrote:
> To make internal phy work, need to configure the phy_clock,
> phy cru_reset and related registers.
>
> Signed-off-by: David Wu <david.wu@rock-chips.com>
> ---
> change in v4:
> - PHY is internal or not base on the phy-is-internal property via phy node.
>
> .../devicetree/bindings/net/rockchip-dwmac.txt | 4 +-
> drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 88 ++++++++++++++++++++++
> 2 files changed, 91 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> index 8f42755..4f51305 100644
> --- a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> @@ -25,7 +25,8 @@ Required properties:
> - clock-names: One name for each entry in the clocks property.
> - phy-mode: See ethernet.txt file in the same directory.
> - pinctrl-names: Names corresponding to the numbered pinctrl states.
> - - pinctrl-0: pin-control mode. can be <&rgmii_pins> or <&rmii_pins>.
> + - pinctrl-0: pin-control mode. can be <&rgmii_pins>, <&rmii_pins> or led pins
> + for internal phy mode.
> - clock_in_out: For RGMII, it must be "input", means main clock(125MHz)
> is not sourced from SoC's PLL, but input from PHY; For RMII, "input" means
> PHY provides the reference clock(50MHz), "output" means GMAC provides the
> @@ -40,6 +41,7 @@ Optional properties:
> - tx_delay: Delay value for TXD timing. Range value is 0~0x7F, 0x30 as default.
> - rx_delay: Delay value for RXD timing. Range value is 0~0x7F, 0x10 as default.
> - phy-supply: phandle to a regulator if the PHY needs one
> + - clocks: <&cru MAC_PHY>: Clock selector for internal macphy
I assume this is required if internal phy is used. 'clocks' is already
documented above, so this needs to be documented with it.
Rob
^ permalink raw reply
* Re: [PATCH v4 2/4] dt-bindings: can: can-transceiver: Document new binding
From: Rob Herring @ 2017-08-17 15:10 UTC (permalink / raw)
To: Franklin S Cooper Jr
Cc: linux-kernel, devicetree, netdev, linux-can, wg, mkl,
quentin.schulz, dev.kurt, andrew, sergei.shtylyov, socketcan
In-Reply-To: <20170810005916.27163-3-fcooper@ti.com>
On Wed, Aug 09, 2017 at 07:59:14PM -0500, Franklin S Cooper Jr wrote:
> Add documentation to describe usage of the new can-transceiver binding.
> This new binding is applicable for any CAN device therefore it exists as
> its own document.
>
> Signed-off-by: Franklin S Cooper Jr <fcooper@ti.com>
> ---
> Version 4 changes:
> Drop unit address.
> Switch from using fixed-transceiver to can-transceiver
>
> .../bindings/net/can/can-transceiver.txt | 24 ++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/net/can/can-transceiver.txt
>
> diff --git a/Documentation/devicetree/bindings/net/can/can-transceiver.txt b/Documentation/devicetree/bindings/net/can/can-transceiver.txt
> new file mode 100644
> index 0000000..2c31dc0
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/can/can-transceiver.txt
> @@ -0,0 +1,24 @@
> +Generic CAN transceiver Device Tree binding
> +------------------------------
> +
> +CAN transceiver typically limits the max speed in standard CAN and CAN FD
> +modes. Typically these limitations are static and the transceivers themselves
> +provide no way to detect this limitation at runtime. For this situation,
> +the "can-transceiver" node can be used.
> +
> +Required Properties:
> + max-bitrate: a positive non 0 value that determines the max
> + speed that CAN/CAN-FD can run. Any other value
> + will be ignored.
> +
> +Examples:
> +
> +Based on Texas Instrument's TCAN1042HGV CAN Transceiver
> +
> +m_can0 {
> + ....
> + can-transceiver@ {
Need to drop the @ too. With that,
Acked-by: Rob Herring <robh@kernel.org>
> + max-bitrate = <5000000>;
> + };
> + ...
> +};
> --
> 2.9.4.dirty
>
^ permalink raw reply
* Re: [PATCH v4 3/4] dt-bindings: can: m_can: Document new can transceiver binding
From: Rob Herring @ 2017-08-17 15:10 UTC (permalink / raw)
To: Franklin S Cooper Jr
Cc: linux-kernel, devicetree, netdev, linux-can, wg, mkl,
quentin.schulz, dev.kurt, andrew, sergei.shtylyov, socketcan
In-Reply-To: <20170810005916.27163-4-fcooper@ti.com>
On Wed, Aug 09, 2017 at 07:59:15PM -0500, Franklin S Cooper Jr wrote:
> Add information regarding can-transceiver binding. This is especially
> important for MCAN since the IP allows CAN FD mode to run significantly
> faster than what most transceivers are capable of.
>
> Signed-off-by: Franklin S Cooper Jr <fcooper@ti.com>
> ---
> Drop unit address.
> Switch from using fixed-transceiver to can-transceiver
> Indicate that can-transceiver is an optional subnode not a property.
>
> Documentation/devicetree/bindings/net/can/m_can.txt | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/net/can/m_can.txt b/Documentation/devicetree/bindings/net/can/m_can.txt
> index 9e33177..ee90aac 100644
> --- a/Documentation/devicetree/bindings/net/can/m_can.txt
> +++ b/Documentation/devicetree/bindings/net/can/m_can.txt
> @@ -43,6 +43,11 @@ Required properties:
> Please refer to 2.4.1 Message RAM Configuration in
> Bosch M_CAN user manual for details.
>
> +Optional Subnode:
> +- can-transceiver : Can-transceiver subnode describing maximum speed
> + that can be used for CAN/CAN-FD modes. See
> + Documentation/devicetree/bindings/net/can/can-transceiver.txt
> + for details.
> Example:
> SoC dtsi:
> m_can1: can@020e8000 {
> @@ -64,4 +69,8 @@ Board dts:
> pinctrl-names = "default";
> pinctrl-0 = <&pinctrl_m_can1>;
> status = "enabled";
> +
> + can-transceiver@ {
ditto
Acked-by: Rob Herring <robh@kernel.org>
> + max-bitrate = <5000000>;
> + };
> };
> --
> 2.9.4.dirty
>
^ permalink raw reply
* Re: [PATCH 2/3] ARM: sun8i: sunxi-h3-h5: add phy-is-integrated property to internal PHY
From: Rob Herring @ 2017-08-17 15:10 UTC (permalink / raw)
To: Florian Fainelli
Cc: Corentin Labbe, Chen-Yu Tsai, mark.rutland, Russell King,
Maxime Ripard, Giuseppe Cavallaro, alexandre.torgue, netdev,
devicetree, linux-kernel, linux-arm-kernel, andrew
In-Reply-To: <302496B2-46F1-456D-A9A0-8257B5582695@gmail.com>
On Fri, Aug 11, 2017 at 08:03:29AM -0700, Florian Fainelli wrote:
> On August 11, 2017 6:25:26 AM PDT, Corentin Labbe <clabbe.montjoie@gmail.com> wrote:
> >On Fri, Aug 11, 2017 at 04:22:11PM +0800, Chen-Yu Tsai wrote:
> >> On Fri, Aug 11, 2017 at 4:19 PM, Corentin Labbe
> >> <clabbe.montjoie@gmail.com> wrote:
> >> > On Fri, Aug 11, 2017 at 04:11:13PM +0800, Chen-Yu Tsai wrote:
> >> >> On Fri, Aug 11, 2017 at 4:05 PM, Corentin Labbe
> >> >> <clabbe.montjoie@gmail.com> wrote:
> >> >> > On Fri, Aug 11, 2017 at 10:42:51AM +0800, Chen-Yu Tsai wrote:
> >> >> >> Hi,
> >> >> >>
> >> >> >> On Thu, Aug 10, 2017 at 4:51 PM, Corentin Labbe
> >> >> >> <clabbe.montjoie@gmail.com> wrote:
> >> >> >> > This patch add the new phy-is-integrated property to the
> >internal PHY
> >> >> >> > node.
> >> >> >> >
> >> >> >> > Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> >> >> >> > ---
> >> >> >> > arch/arm/boot/dts/sunxi-h3-h5.dtsi | 1 +
> >> >> >> > 1 file changed, 1 insertion(+)
> >> >> >> >
> >> >> >> > diff --git a/arch/arm/boot/dts/sunxi-h3-h5.dtsi
> >b/arch/arm/boot/dts/sunxi-h3-h5.dtsi
> >> >> >> > index 4b599b5d26f6..54fc24e4c569 100644
> >> >> >> > --- a/arch/arm/boot/dts/sunxi-h3-h5.dtsi
> >> >> >> > +++ b/arch/arm/boot/dts/sunxi-h3-h5.dtsi
> >> >> >> > @@ -425,6 +425,7 @@
> >> >> >> > reg = <1>;
> >> >> >> > clocks = <&ccu
> >CLK_BUS_EPHY>;
> >> >> >> > resets = <&ccu
> >RST_BUS_EPHY>;
> >> >> >> > + phy-is-integrated;
> >> >> >>
> >> >> >> You also need to "delete" this property at the board level for
> >> >> >> any board that has the external PHY at address <1>. Otherwise
> >> >> >> they will stop working. This is due to the internal and
> >external
> >> >> >> PHYs having the same path and node name in the device tree, so
> >> >> >> they are effectively the same node.
> >> >> >>
> >> >> >> ChenYu
> >> >> >>
> >> >> >
> >> >> > They have not the same name, ext_rgmii_phy vs int_mii_phy.
> >> >>
> >> >> That is just the label. The label plays no part in device tree
> >merging. The path
> >> >>
> >> >> /soc/ethernet@1c30000/mdio/ethernet-phy@1
> >> >>
> >> >> is the same. You can look under
> >> >>
> >> >> /proc/device-tree/soc/ethernet@1c30000/mdio
> >> >>
> >> >> on the OrangePI Plus 2E or any other H3 board that uses an
> >> >> external PHY at address 1.
> >> >>
> >> >> ChenYu
> >> >
> >> > Since we get the phy node by phy-handle and not by path, I think
> >all should be good.
> >>
> >> You are not getting me. The fact that the two seemingly separate
> >> nodes are merged together means, whatever properties you put in
> >> the internal PHY node, also affect the external PHY node. Once
> >> compiled, they are the SAME node.
> >
> >Hello Rob, florian, mark
> >
> >Adding a delete property on all external ethernet-phy@1 is a bit
> >overkill, and I dont like the idea that nodes are merged.
>
> This is not exactly up to you that's just how DTC works.
>
> >What do you think about other possible solutions:
> >- Using integrated-phy@1 for the integrated PHY node name
>
> That might be okay although you are using now a seemingly non-standard unit name.
>
> >- Using a fake address like 31 (see patch below)
>
> You could also drop the address part in the unit name although we'd probably get a DTC warning for that.
>
> I suspect both of your solutions and what I mentioned above will be producing DTC warnings to some extent... Rob what do you think?
If you have 2 devices at the same address, then there is some mux in the
middle. Describe that and you problems should be solved. The internal
phy is always there, so it should be able to always be in the DT.
Rob
^ permalink raw reply
* Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
From: Jorgen S. Hansen @ 2017-08-17 15:16 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Dexuan Cui, davem@davemloft.net, netdev@vger.kernel.org,
gregkh@linuxfoundation.org, devel@linuxdriverproject.org,
KY Srinivasan, Haiyang Zhang, Stephen Hemminger, George Zhang,
Michal Kubecek, Asias He, Vitaly Kuznetsov, Cathy Avery,
jasowang@redhat.com, Rolf Neugebauer, Dave Scott, Marcelo Cerri,
apw@canonical.com
In-Reply-To: <20170817135559.GG5539@stefanha-x1.localdomain>
> On Aug 17, 2017, at 3:55 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Thu, Aug 17, 2017 at 08:00:29AM +0000, Dexuan Cui wrote:
>>
>> Without the patch, vmw_vsock_vmci_transport.ko can automatically load
>> when an application creates an AF_VSOCK socket.
>>
>> This is the expected good behavior on VMware hypervisor, but as we
>> are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we
>> should make sure vmw_vsock_vmci_transport.ko can't load on Hyper-V,
>> otherwise there is a -EBUSY conflict when both vmw_vsock_vmci_transport.ko
>> and hv_sock.ko try to call vsock_core_init() on Hyper-V.
>>
>> On the other hand, hv_sock.ko can only load on Hyper-V, because it
>> depends on hv_vmbus.ko, which detects Hyper-V in hv_acpi_init().
>>
>> KVM's vsock_virtio_transport doesn't have the issue because it doesn't
>> define MODULE_ALIAS_NETPROTO(PF_VSOCK).
>
> Thanks for sending this patch, vmci's MODULE_ALIAS_NETPROTO(PF_VSOCK) is
> a problem for vhost_vsock.ko (the virtio host driver) too. A host
> userspace program can create a AF_VSOCK socket before vhost_vsock is
> loaded. The vmci transport will be unconditionally loaded and that's
> not the right behavior.
>
> Putting aside nested virtualization, I want to load the transport (vmci,
> Hyper-V, vsock) for which there is paravirtualized hardware present
> inside the guest.
Good points. Completely agree that this is the desired behavior for a guest.
> It's a little tricker on the host side (doesn't matter for Hyper-V and
> probably also doesn't for VMware) because the host-side driver is a
> software device with no hardware backing it. In KVM we assume the
> vhost_vsock.ko kernel module will be loaded sufficiently early.
Since the vmci driver is currently tied to PF_VSOCK it hasn’t been a problem, but on the host side the VMCI driver has no hardware backing it either, so when we move to a more appropriate solution, this will be an issue for VMCI as well. I’ll check our shipped products, but they most likely assume that if an upstreamed vmci module is present, it will be loaded automatically.
>
> Things get trickier with nested virtualization because the VM might want
> to talk to its host but also to its nested VMs. The simple way of
> fixing this would be to allow two transports loaded simultaneously and
> route traffic destined to CID 2 to the host transport and all other
> traffic to the guest transport.
This is close to the routing the VMCI driver does in a nested environment, but that is with the assumption that there is only one type of transport. Having two different transports would require that we delay resolving the transport type until the socket endpoint has been bound to an address. Things get trickier if listening sockets use VMADDR_CID_ANY - if only one transport is present, this would allow the socket to accept connections from both guests and outer host, but with multiple transports that won’t work, since we can’t associate a socket with a transport until the socket is bound.
>
> Perhaps we should discuss these cases a bit more to figure out how to
> avoid conflicts over MODULE_ALIAS_NETPROTO(PF_VSOCK).
Agreed.
>
>>
>> The patch also adds a module parameter "skip_hypervisor_check" for
>> vmw_vsock_vmci_transport.ko.
>>
>> Signed-off-by: Dexuan Cui <decui@microsoft.com>
>> Cc: Alok Kataria <akataria@vmware.com>
>> Cc: Andy King <acking@vmware.com>
>> Cc: Adit Ranadive <aditr@vmware.com>
>> Cc: George Zhang <georgezhang@vmware.com>
>> Cc: Jorgen Hansen <jhansen@vmware.com>
>> Cc: K. Y. Srinivasan <kys@microsoft.com>
>> Cc: Haiyang Zhang <haiyangz@microsoft.com>
>> Cc: Stephen Hemminger <sthemmin@microsoft.com>
>> ---
>> net/vmw_vsock/Kconfig | 2 +-
>> net/vmw_vsock/vmci_transport.c | 11 +++++++++++
>> 2 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
>> index a24369d..3f52929 100644
>> --- a/net/vmw_vsock/Kconfig
>> +++ b/net/vmw_vsock/Kconfig
>> @@ -17,7 +17,7 @@ config VSOCKETS
>>
>> config VMWARE_VMCI_VSOCKETS
>> tristate "VMware VMCI transport for Virtual Sockets"
>> - depends on VSOCKETS && VMWARE_VMCI
>> + depends on VSOCKETS && VMWARE_VMCI && HYPERVISOR_GUEST
>> help
>> This module implements a VMCI transport for Virtual Sockets.
>>
>> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
>> index 10ae782..c068873 100644
>> --- a/net/vmw_vsock/vmci_transport.c
>> +++ b/net/vmw_vsock/vmci_transport.c
>> @@ -16,6 +16,7 @@
>> #include <linux/types.h>
>> #include <linux/bitops.h>
>> #include <linux/cred.h>
>> +#include <linux/hypervisor.h>
>> #include <linux/init.h>
>> #include <linux/io.h>
>> #include <linux/kernel.h>
>> @@ -73,6 +74,10 @@ struct vmci_transport_recv_pkt_info {
>> struct vmci_transport_packet pkt;
>> };
>>
>> +static bool skip_hypervisor_check;
>> +module_param(skip_hypervisor_check, bool, 0444);
>> +MODULE_PARM_DESC(hot_add, "If set, attempt to load on non-VMware platforms");
>> +
>> static LIST_HEAD(vmci_transport_cleanup_list);
>> static DEFINE_SPINLOCK(vmci_transport_cleanup_lock);
>> static DECLARE_WORK(vmci_transport_cleanup_work, vmci_transport_cleanup);
>> @@ -2085,6 +2090,12 @@ static int __init vmci_transport_init(void)
>> {
>> int err;
>>
>> + /* Check if we are running on VMware's hypervisor and bail out
>> + * if we are not.
>> + */
>> + if (!skip_hypervisor_check && x86_hyper != &x86_hyper_vmware)
>> + return -ENODEV;
>> +
I’ma bit concerned with this. On the host-side, we still want to be able to use the VMCI transport, so to allow that, the above should also allow loading the transport when x86_hyper == NULL. However, this may still cause a conflict with virtio host side support, so it looks like we need to find a better overall way to make the protocols co-exist.
>> /* Create the datagram handle that we will use to send and receive all
>> * VSocket control messages for this context.
>> */
>> --
>> 2.7.4
^ permalink raw reply
* [PATCH net-next 1/2] bpf: don't enable preemption twice in smap_do_verdict
From: Daniel Borkmann @ 2017-08-17 15:22 UTC (permalink / raw)
To: davem; +Cc: john.fastabend, ast, netdev, Daniel Borkmann
In-Reply-To: <cover.1502983087.git.daniel@iogearbox.net>
In smap_do_verdict(), the fall-through branch leads to call
preempt_enable() twice for the SK_REDIRECT, which creates an
imbalance. Only enable it for all remaining cases again.
Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
kernel/bpf/sockmap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index f7e5e6c..39de541 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -135,7 +135,8 @@ static void smap_do_verdict(struct smap_psock *psock, struct sk_buff *skb)
/* Fall through and free skb otherwise */
case SK_DROP:
default:
- preempt_enable();
+ if (rc != SK_REDIRECT)
+ preempt_enable();
kfree_skb(skb);
}
}
--
1.9.3
^ permalink raw reply related
* [PATCH net-next 0/2] Two BPF smap related followups
From: Daniel Borkmann @ 2017-08-17 15:22 UTC (permalink / raw)
To: davem; +Cc: john.fastabend, ast, netdev, Daniel Borkmann
Fixing preemption imbalance and consolidating prologue
generation. Thanks!
Daniel Borkmann (2):
bpf: don't enable preemption twice in smap_do_verdict
bpf: reuse tc bpf prologue for sk skb progs
kernel/bpf/sockmap.c | 3 ++-
net/core/filter.c | 47 ++++++++++-------------------------------------
2 files changed, 12 insertions(+), 38 deletions(-)
--
1.9.3
^ permalink raw reply
* [PATCH net-next 2/2] bpf: reuse tc bpf prologue for sk skb progs
From: Daniel Borkmann @ 2017-08-17 15:22 UTC (permalink / raw)
To: davem; +Cc: john.fastabend, ast, netdev, Daniel Borkmann
In-Reply-To: <cover.1502983087.git.daniel@iogearbox.net>
Given both program types are effecitvely doing the same in the
prologue, just reuse the one that we had for tc and only adapt
to the corresponding drop verdict value. That way, we don't need
to have the duplicate from 8a31db561566 ("bpf: add access to sock
fields and pkt data from sk_skb programs") to maintain.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
net/core/filter.c | 47 ++++++++++-------------------------------------
1 file changed, 10 insertions(+), 37 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index e9f8dce..9865a98 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3456,8 +3456,8 @@ static bool sock_filter_is_valid_access(int off, int size,
return true;
}
-static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
- const struct bpf_prog *prog)
+static int bpf_unclone_prologue(struct bpf_insn *insn_buf, bool direct_write,
+ const struct bpf_prog *prog, int drop_verdict)
{
struct bpf_insn *insn = insn_buf;
@@ -3484,7 +3484,7 @@ static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
* return TC_ACT_SHOT;
*/
*insn++ = BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2);
- *insn++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_0, TC_ACT_SHOT);
+ *insn++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_0, drop_verdict);
*insn++ = BPF_EXIT_INSN();
/* restore: */
@@ -3495,6 +3495,12 @@ static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
return insn - insn_buf;
}
+static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
+ const struct bpf_prog *prog)
+{
+ return bpf_unclone_prologue(insn_buf, direct_write, prog, TC_ACT_SHOT);
+}
+
static bool tc_cls_act_is_valid_access(int off, int size,
enum bpf_access_type type,
struct bpf_insn_access_aux *info)
@@ -3601,40 +3607,7 @@ static bool sock_ops_is_valid_access(int off, int size,
static int sk_skb_prologue(struct bpf_insn *insn_buf, bool direct_write,
const struct bpf_prog *prog)
{
- struct bpf_insn *insn = insn_buf;
-
- if (!direct_write)
- return 0;
-
- /* if (!skb->cloned)
- * goto start;
- *
- * (Fast-path, otherwise approximation that we might be
- * a clone, do the rest in helper.)
- */
- *insn++ = BPF_LDX_MEM(BPF_B, BPF_REG_6, BPF_REG_1, CLONED_OFFSET());
- *insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_6, CLONED_MASK);
- *insn++ = BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 7);
-
- /* ret = bpf_skb_pull_data(skb, 0); */
- *insn++ = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
- *insn++ = BPF_ALU64_REG(BPF_XOR, BPF_REG_2, BPF_REG_2);
- *insn++ = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
- BPF_FUNC_skb_pull_data);
- /* if (!ret)
- * goto restore;
- * return SK_DROP;
- */
- *insn++ = BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2);
- *insn++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_0, SK_DROP);
- *insn++ = BPF_EXIT_INSN();
-
- /* restore: */
- *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_6);
- /* start: */
- *insn++ = prog->insnsi[0];
-
- return insn - insn_buf;
+ return bpf_unclone_prologue(insn_buf, direct_write, prog, SK_DROP);
}
static bool sk_skb_is_valid_access(int off, int size,
--
1.9.3
^ permalink raw reply related
* [PATCH net 1/1] net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
From: Saeed Mahameed @ 2017-08-17 15:29 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Kamal Heib, Huy Nguyen, Saeed Mahameed
From: Huy Nguyen <huyn@mellanox.com>
enable_4k_uar module parameter was added in patch cited below to
address the backward compatibility issue in SRIOV when the VM has
system's PAGE_SIZE uar implementation and the Hypervisor has 4k uar
implementation.
The above compatibility issue does not exist in the non SRIOV case.
In this patch, we always enable 4k uar implementation if SRIOV
is not enabled on mlx4's supported cards.
Fixes: 76e39ccf9c36 ("net/mlx4_core: Fix backward compatibility on VFs")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 09b9bc17bce9..5fe5cdc51357 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -432,7 +432,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
/* Virtual PCI function needs to determine UAR page size from
* firmware. Only master PCI function can set the uar page size
*/
- if (enable_4k_uar)
+ if (enable_4k_uar || !dev->persist->num_vfs)
dev->uar_page_shift = DEFAULT_UAR_PAGE_SHIFT;
else
dev->uar_page_shift = PAGE_SHIFT;
@@ -2277,7 +2277,7 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
dev->caps.max_fmr_maps = (1 << (32 - ilog2(dev->caps.num_mpts))) - 1;
- if (enable_4k_uar) {
+ if (enable_4k_uar || !dev->persist->num_vfs) {
init_hca.log_uar_sz = ilog2(dev->caps.num_uars) +
PAGE_SHIFT - DEFAULT_UAR_PAGE_SHIFT;
init_hca.uar_page_sz = DEFAULT_UAR_PAGE_SHIFT - 12;
--
2.13.0
^ permalink raw reply related
* Re: [net-next 11/15] net/mlx5e: Properly indent within conditional statements
From: Saeed Mahameed @ 2017-08-17 15:35 UTC (permalink / raw)
To: David Laight
Cc: Saeed Mahameed, David S. Miller, netdev@vger.kernel.org,
Leon Romanovsky, Or Gerlitz
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DD0059721@AcuExch.aculab.com>
On Thu, Aug 17, 2017 at 5:18 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Saeed Mahameed
>> Sent: 17 August 2017 14:30
>> To: David S. Miller
>> Cc: netdev@vger.kernel.org; Leon Romanovsky; Or Gerlitz; Saeed Mahameed
>> Subject: [net-next 11/15] net/mlx5e: Properly indent within conditional statements
>>
>> From: Or Gerlitz <ogerlitz@mellanox.com>
>>
>> To fix these checkpatch complaints:
>>
>> WARNING: suspect code indent for conditional statements (8, 24)
>> + if (eth_proto & (MLX5E_PROT_MASK(MLX5E_10GBASE_SR)
>> [...]
>> + return PORT_FIBRE;
>>
>> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>> ---
>> .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 22 +++++++++++-----------
>> 1 file changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> index a75ac4d11c5b..ed161312a773 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> @@ -988,23 +988,23 @@ static u8 get_connector_port(u32 eth_proto, u8 connector_type)
>> return ptys2connector_type[connector_type];
>>
>> if (eth_proto & (MLX5E_PROT_MASK(MLX5E_10GBASE_SR)
>> - | MLX5E_PROT_MASK(MLX5E_40GBASE_SR4)
>> - | MLX5E_PROT_MASK(MLX5E_100GBASE_SR4)
>> - | MLX5E_PROT_MASK(MLX5E_1000BASE_CX_SGMII))) {
>> - return PORT_FIBRE;
>> + | MLX5E_PROT_MASK(MLX5E_40GBASE_SR4)
>> + | MLX5E_PROT_MASK(MLX5E_100GBASE_SR4)
>> + | MLX5E_PROT_MASK(MLX5E_1000BASE_CX_SGMII))) {
>> + return PORT_FIBRE;
>
> Gah, that is why the rules are stupid.
> If anything the continuation lines want indenting a few more bytes.
>
Totally agree,
But if Or wants to follow the rules, I won't stand in his way :).
> David
>
^ permalink raw reply
* Re: [PATCH net] datagram: When peeking datagrams with offset < 0 don't skip empty skbs
From: Matthew Dawson @ 2017-08-17 15:47 UTC (permalink / raw)
To: Willem de Bruijn; +Cc: Paolo Abeni, Network Development, Macieira, Thiago
In-Reply-To: <CAF=yD-JryS8g=8nB7yq9WVdCOjqSb7uNwncrRfWPmMQbdYrh3w@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 4304 bytes --]
On Wednesday, August 16, 2017 7:27:17 PM EDT Willem de Bruijn wrote:
> On Wed, Aug 16, 2017 at 4:20 PM, Paolo Abeni <pabeni@redhat.com> wrote:
> > On Wed, 2017-08-16 at 11:18 -0400, Willem de Bruijn wrote:
> >> > If I read the above correctly, you are arguining in favor of the
> >> > addittional flag version, right?
> >>
> >> I was. Though if we are going to thread the argument from the caller
> >> to __skb_try_recv_from_queue to avoid rereading sk->sk_peek_off,
> >
> >> on second thought it might be simpler to do it through off:
> > [...]
> >
> >> This, of course, requires restricting sk_peek_off to protect against
> >> overflow.>
> > Ok, even if I'm not 100% sure overall this will be simpler when adding
> > also the overflow check.
>
> Actually, it is safe even without the check. Overflow of the signed integer
> is benign here.
>
> >> If I'm not mistaken, the test in udp_recvmsg currently incorrectly sets
> >>
> >> peeking to false when peeking at offset zero:
> >> peeking = off = sk_peek_offset(sk, flags);
> >
> > I think you are right, does not look correct.
>
> By shifting the offset by two, we could even make both assignments
> become correct. Return 0 without peek, 1 on peek without SO_PEEK_OFF,
> 2+ otherwise, including overflow up to INT_MIN + 1.
>
> But the end result is more readable if we just separate those two
> assignments.
>
> @@ -1574,7 +1574,8 @@ int udp_recvmsg(struct sock *sk, struct msghdr
> *msg, size_t len, int noblock,
> return ip_recv_error(sk, msg, len, addr_len);
>
> try_again:
> - peeking = off = sk_peek_offset(sk, flags);
> + peeking = flags & MSG_PEEK;
> + off = sk_peek_offset(sk, flags);
> skb = __skb_recv_udp(sk, flags, noblock, &peeked, &off, &err);
> if (!skb)
> return err;
>
> @@ -362,7 +362,8 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr
> *msg, size_t len,
> return ipv6_recv_rxpmtu(sk, msg, len, addr_len);
>
> try_again:
> - peeking = off = sk_peek_offset(sk, flags);
> + peeking = flags & MSG_PEEK;
> + off = sk_peek_offset(sk, flags);
> skb = __skb_recv_udp(sk, flags, noblock, &peeked, &off, &err);
> if (!skb)
> return err;
>
> At which point there is also no longer a need for the variable shift
> at sk_peek_offset. Just pass the raw value down to
> __skb_try_recv_from_queue and disambiguate there:
>
> @@ -506,11 +506,8 @@ int sk_set_peek_off(struct sock *sk, int val);
>
> static inline int sk_peek_offset(struct sock *sk, int flags)
> {
> - if (unlikely(flags & MSG_PEEK)) {
> - s32 off = READ_ONCE(sk->sk_peek_off);
> - if (off >= 0)
> - return off;
> - }
> + if (unlikely(flags & MSG_PEEK))
> + return READ_ONCE(sk->sk_peek_off);
>
> return 0;
> }
>
> @@ -169,14 +169,20 @@ struct sk_buff *__skb_try_recv_from_queue(struct sock
> *sk, int *peeked, int *off, int *err, struct sk_buff **last)
> {
> + bool peek_at_off = false;
> struct sk_buff *skb;
> - int _off = *off;
> + int _off = 0;
> +
> + if (flags & MSG_PEEK && (*off) >= 0) {
> + peek_at_off = true;
> + _off = *off;
> + }
>
> *last = queue->prev;
> skb_queue_walk(queue, skb) {
> if (flags & MSG_PEEK) {
> - if (_off >= skb->len && (skb->len || _off ||
> - skb->peeked)) {
> + if (peek_at_off && _off >= skb->len &&
> + (skb->len || _off || skb->peeked)) {
^ I'm pretty sure we can remove this check
(that skb->len is not zero) in this if statement. If _off is zero, then skb-
>len must also be zero (since _off >= skb->len, if _off is 0, skb->len <= 0.
If skb->len can't be negative, then skb->len <= 0 => skb->len == 0). If _off
is not zero, then checking skb->len is redundant.
> _off -= skb->len;
> continue;
> }
Is this queued to go in already? Or can I help by updating my patch with what
was discussed here? I can do that today if wanted.
--
Matthew
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH net-next] bpf: no need to nullify ri->map in xdp_do_redirect
From: John Fastabend @ 2017-08-17 15:59 UTC (permalink / raw)
To: Daniel Borkmann, davem; +Cc: ast, netdev
In-Reply-To: <782d0b6850cb1a63230c0fd634f1e61d67a01e60.1502975082.git.daniel@iogearbox.net>
On 08/17/2017 06:07 AM, Daniel Borkmann wrote:
> We are guaranteed to have a NULL ri->map in this branch since
> we test for it earlier, so we don't need to reset it here.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
> net/core/filter.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index e9f8dce..ea3ca34 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2568,7 +2568,6 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
>
> fwd = dev_get_by_index_rcu(dev_net(dev), index);
> ri->ifindex = 0;
> - ri->map = NULL;
> if (unlikely(!fwd)) {
> bpf_warn_invalid_xdp_redirect(index);
> return -EINVAL;
>
Thanks this was on my todo list as well.
Acked-by: John Fastabend <john.fastabend@gmail.com>
^ permalink raw reply
* Re: [PATCH net-next 1/2] bpf: don't enable preemption twice in smap_do_verdict
From: John Fastabend @ 2017-08-17 16:03 UTC (permalink / raw)
To: Daniel Borkmann, davem; +Cc: ast, netdev
In-Reply-To: <d90eed71c2a723edd75f19fb1b921fec74d8a514.1502983087.git.daniel@iogearbox.net>
On 08/17/2017 08:22 AM, Daniel Borkmann wrote:
> In smap_do_verdict(), the fall-through branch leads to call
> preempt_enable() twice for the SK_REDIRECT, which creates an
> imbalance. Only enable it for all remaining cases again.
>
> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
> Reported-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
> kernel/bpf/sockmap.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
> index f7e5e6c..39de541 100644
> --- a/kernel/bpf/sockmap.c
> +++ b/kernel/bpf/sockmap.c
> @@ -135,7 +135,8 @@ static void smap_do_verdict(struct smap_psock *psock, struct sk_buff *skb)
> /* Fall through and free skb otherwise */
> case SK_DROP:
> default:
> - preempt_enable();
> + if (rc != SK_REDIRECT)
> + preempt_enable();
> kfree_skb(skb);
> }
> }
>
Yep looks good, nice catch Alexei. Thanks! I'll add a selftests entry
in test_maps to catch this fall through case. Looks like we don't hit
this case during selftests at the moment.
Acked-by: John Fastabend <john.fastabend@gmail.com>
^ permalink raw reply
* Re: [PATCH net-next 2/2] bpf: reuse tc bpf prologue for sk skb progs
From: John Fastabend @ 2017-08-17 16:05 UTC (permalink / raw)
To: Daniel Borkmann, davem; +Cc: ast, netdev
In-Reply-To: <efce6714acbd646c1e83876504ae7c5fb5e12625.1502983087.git.daniel@iogearbox.net>
On 08/17/2017 08:22 AM, Daniel Borkmann wrote:
> Given both program types are effecitvely doing the same in the
> prologue, just reuse the one that we had for tc and only adapt
> to the corresponding drop verdict value. That way, we don't need
> to have the duplicate from 8a31db561566 ("bpf: add access to sock
> fields and pkt data from sk_skb programs") to maintain.
>
> Reported-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
> net/core/filter.c | 47 ++++++++++-------------------------------------
> 1 file changed, 10 insertions(+), 37 deletions(-)
>
Nice clean-up, Thanks.
Acked-by: John Fastabend <john.fastabend@gmail.com>
^ permalink raw reply
* Re: [PATCH net-next 1/2] bpf: don't enable preemption twice in smap_do_verdict
From: Alexei Starovoitov @ 2017-08-17 16:10 UTC (permalink / raw)
To: John Fastabend, Daniel Borkmann, davem; +Cc: netdev
In-Reply-To: <5995BE4D.10503@gmail.com>
On 8/17/17 9:03 AM, John Fastabend wrote:
> On 08/17/2017 08:22 AM, Daniel Borkmann wrote:
>> In smap_do_verdict(), the fall-through branch leads to call
>> preempt_enable() twice for the SK_REDIRECT, which creates an
>> imbalance. Only enable it for all remaining cases again.
>>
>> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
>> Reported-by: Alexei Starovoitov <ast@kernel.org>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>> kernel/bpf/sockmap.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
>> index f7e5e6c..39de541 100644
>> --- a/kernel/bpf/sockmap.c
>> +++ b/kernel/bpf/sockmap.c
>> @@ -135,7 +135,8 @@ static void smap_do_verdict(struct smap_psock *psock, struct sk_buff *skb)
>> /* Fall through and free skb otherwise */
>> case SK_DROP:
>> default:
>> - preempt_enable();
>> + if (rc != SK_REDIRECT)
>> + preempt_enable();
>> kfree_skb(skb);
>> }
>> }
>>
>
> Yep looks good, nice catch Alexei. Thanks! I'll add a selftests entry
> in test_maps to catch this fall through case. Looks like we don't hit
> this case during selftests at the moment.
>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
>
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply
* Re: [PATCH net-next 2/2] bpf: reuse tc bpf prologue for sk skb progs
From: Alexei Starovoitov @ 2017-08-17 16:12 UTC (permalink / raw)
To: John Fastabend, Daniel Borkmann, davem; +Cc: netdev
In-Reply-To: <5995BEB1.8020509@gmail.com>
On 8/17/17 9:05 AM, John Fastabend wrote:
> On 08/17/2017 08:22 AM, Daniel Borkmann wrote:
>> Given both program types are effecitvely doing the same in the
>> prologue, just reuse the one that we had for tc and only adapt
>> to the corresponding drop verdict value. That way, we don't need
>> to have the duplicate from 8a31db561566 ("bpf: add access to sock
>> fields and pkt data from sk_skb programs") to maintain.
>>
>> Reported-by: Alexei Starovoitov <ast@kernel.org>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>> net/core/filter.c | 47 ++++++++++-------------------------------------
>> 1 file changed, 10 insertions(+), 37 deletions(-)
>>
>
> Nice clean-up, Thanks.
>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
>
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox