* Inter-VRF routing on a single machine
From: Darwin Dingel @ 2016-04-12 10:09 UTC (permalink / raw)
To: netdev
Hi All,
Have anyone tried the following setup on a single machine with 2 TCP
sockets on different VRF's and succeeded?
- client_socket on VRF1
- server_socket on VRF2
- ip rules and iproutes for inter-VRF set up
- client_socket sends TCP connect to server_socket. skb was sent using
VRF1 interface
- skb received in loopback interface
- TCP code got SYN but cannot route back to VRF1 to send ACK.
I was wondering if this is a known limitation of VRF as of the moment,
or could work with proper iprules/iproute.
Thanks!
Darwin
^ permalink raw reply
* [PATCH net-next v1 1/1] tipc: fix a race condition leading to subscriber refcnt bug
From: Parthasarathy Bhuvaragan @ 2016-04-12 11:05 UTC (permalink / raw)
To: netdev; +Cc: tipc-discussion
Until now, the requests sent to topology server are queued
to a workqueue by the generic server framework.
These messages are processed by worker threads and trigger the
registered callbacks.
To reduce latency on uniprocessor systems, explicit rescheduling
is performed using cond_resched() after MAX_RECV_MSG_COUNT(25)
messages.
This implementation on SMP systems leads to an subscriber refcnt
error as described below:
When a worker thread yields by calling cond_resched() in a SMP
system, a new worker is created on another CPU to process the
pending workitem. Sometimes the sleeping thread wakes up before
the new thread finishes execution.
This breaks the assumption on ordering and being single threaded.
The fault is more frequent when MAX_RECV_MSG_COUNT is lowered.
If the first thread was processing subscription create and the
second thread processing close(), the close request will free
the subscriber and the create request oops as follows:
[31.224137] WARNING: CPU: 2 PID: 266 at include/linux/kref.h:46 tipc_subscrb_rcv_cb+0x317/0x380 [tipc]
[31.228143] CPU: 2 PID: 266 Comm: kworker/u8:1 Not tainted 4.5.0+ #97
[31.228377] Workqueue: tipc_rcv tipc_recv_work [tipc]
[...]
[31.228377] Call Trace:
[31.228377] [<ffffffff812fbb6b>] dump_stack+0x4d/0x72
[31.228377] [<ffffffff8105a311>] __warn+0xd1/0xf0
[31.228377] [<ffffffff8105a3fd>] warn_slowpath_null+0x1d/0x20
[31.228377] [<ffffffffa0098067>] tipc_subscrb_rcv_cb+0x317/0x380 [tipc]
[31.228377] [<ffffffffa00a4984>] tipc_receive_from_sock+0xd4/0x130 [tipc]
[31.228377] [<ffffffffa00a439b>] tipc_recv_work+0x2b/0x50 [tipc]
[31.228377] [<ffffffff81071925>] process_one_work+0x145/0x3d0
[31.246554] ---[ end trace c3882c9baa05a4fd ]---
[31.248327] BUG: spinlock bad magic on CPU#2, kworker/u8:1/266
[31.249119] BUG: unable to handle kernel NULL pointer dereference at 0000000000000428
[31.249323] IP: [<ffffffff81099d0c>] spin_dump+0x5c/0xe0
[31.249323] PGD 0
[31.249323] Oops: 0000 [#1] SMP
In this commit, we
- rename tipc_conn_shutdown() to tipc_conn_release().
- move connection release callback execution from tipc_close_conn()
to a new function tipc_sock_release(), which is executed before
we free the connection.
Thus we release the subscriber during connection release procedure
rather than connection shutdown procedure.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
---
net/tipc/server.c | 19 +++++++++++++------
net/tipc/server.h | 4 ++--
net/tipc/subscr.c | 4 ++--
3 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/net/tipc/server.c b/net/tipc/server.c
index 2446bfbaa309..7a0af2dc0406 100644
--- a/net/tipc/server.c
+++ b/net/tipc/server.c
@@ -86,6 +86,7 @@ struct outqueue_entry {
static void tipc_recv_work(struct work_struct *work);
static void tipc_send_work(struct work_struct *work);
static void tipc_clean_outqueues(struct tipc_conn *con);
+static void tipc_sock_release(struct tipc_conn *con);
static void tipc_conn_kref_release(struct kref *kref)
{
@@ -102,6 +103,7 @@ static void tipc_conn_kref_release(struct kref *kref)
}
saddr->scope = -TIPC_NODE_SCOPE;
kernel_bind(sock, (struct sockaddr *)saddr, sizeof(*saddr));
+ tipc_sock_release(con);
sock_release(sock);
con->sock = NULL;
}
@@ -184,26 +186,31 @@ static void tipc_unregister_callbacks(struct tipc_conn *con)
write_unlock_bh(&sk->sk_callback_lock);
}
+static void tipc_sock_release(struct tipc_conn *con)
+{
+ struct tipc_server *s = con->server;
+
+ if (con->conid)
+ s->tipc_conn_release(con->conid, con->usr_data);
+
+ tipc_unregister_callbacks(con);
+}
+
static void tipc_close_conn(struct tipc_conn *con)
{
struct tipc_server *s = con->server;
if (test_and_clear_bit(CF_CONNECTED, &con->flags)) {
- if (con->conid)
- s->tipc_conn_shutdown(con->conid, con->usr_data);
spin_lock_bh(&s->idr_lock);
idr_remove(&s->conn_idr, con->conid);
s->idr_in_use--;
spin_unlock_bh(&s->idr_lock);
- tipc_unregister_callbacks(con);
-
/* We shouldn't flush pending works as we may be in the
* thread. In fact the races with pending rx/tx work structs
* are harmless for us here as we have already deleted this
- * connection from server connection list and set
- * sk->sk_user_data to 0 before releasing connection object.
+ * connection from server connection list.
*/
kernel_sock_shutdown(con->sock, SHUT_RDWR);
diff --git a/net/tipc/server.h b/net/tipc/server.h
index 9015faedb1b0..34f8055afa3b 100644
--- a/net/tipc/server.h
+++ b/net/tipc/server.h
@@ -53,7 +53,7 @@
* @send_wq: send workqueue
* @max_rcvbuf_size: maximum permitted receive message length
* @tipc_conn_new: callback will be called when new connection is incoming
- * @tipc_conn_shutdown: callback will be called when connection is shut down
+ * @tipc_conn_release: callback will be called before releasing the connection
* @tipc_conn_recvmsg: callback will be called when message arrives
* @saddr: TIPC server address
* @name: server name
@@ -70,7 +70,7 @@ struct tipc_server {
struct workqueue_struct *send_wq;
int max_rcvbuf_size;
void *(*tipc_conn_new)(int conid);
- void (*tipc_conn_shutdown)(int conid, void *usr_data);
+ void (*tipc_conn_release)(int conid, void *usr_data);
void (*tipc_conn_recvmsg)(struct net *net, int conid,
struct sockaddr_tipc *addr, void *usr_data,
void *buf, size_t len);
diff --git a/net/tipc/subscr.c b/net/tipc/subscr.c
index e6cb386fbf34..79de588c7bd6 100644
--- a/net/tipc/subscr.c
+++ b/net/tipc/subscr.c
@@ -302,7 +302,7 @@ static void tipc_subscrp_subscribe(struct net *net, struct tipc_subscr *s,
}
/* Handle one termination request for the subscriber */
-static void tipc_subscrb_shutdown_cb(int conid, void *usr_data)
+static void tipc_subscrb_release_cb(int conid, void *usr_data)
{
tipc_subscrb_delete((struct tipc_subscriber *)usr_data);
}
@@ -365,7 +365,7 @@ int tipc_topsrv_start(struct net *net)
topsrv->max_rcvbuf_size = sizeof(struct tipc_subscr);
topsrv->tipc_conn_recvmsg = tipc_subscrb_rcv_cb;
topsrv->tipc_conn_new = tipc_subscrb_connect_cb;
- topsrv->tipc_conn_shutdown = tipc_subscrb_shutdown_cb;
+ topsrv->tipc_conn_release = tipc_subscrb_release_cb;
strncpy(topsrv->name, name, strlen(name) + 1);
tn->topsrv = topsrv;
--
2.1.4
^ permalink raw reply related
* Re: [PATCH 0/3] crypto: af_alg - add TLS type encryption
From: Fridolin Pokorny @ 2016-04-12 11:13 UTC (permalink / raw)
To: Tadeusz Struk
Cc: Tom Herbert, Herbert Xu, linux-crypto, LKML, David S. Miller,
Linux Kernel Network Developers, davejwatson, nmav,
fridolin.pokorny
In-Reply-To: <CALx6S37m_ayZJ4nth0SPNr2Km2+uBZUCtK4iqPKHTARv2eB4aA@mail.gmail.com>
On 08.04.2016 04:58, Tom Herbert wrote:
> On Thu, Apr 7, 2016 at 11:52 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> On Wed, Apr 06, 2016 at 10:56:12AM -0700, Tadeusz Struk wrote:
>>>
>>> The intend is to enable HW acceleration of the TLS protocol.
>>> The way it will work is that the user space will send a packet of data
>>> via AF_ALG and HW will authenticate and encrypt it in one go.
>>
>> There have been suggestions to implement TLS data-path within
>> the kernel. So we should decide whether we pursue that or go
>> with your approach before we start adding algorithms.
>>
> Yes, please see Dave Watson's patches on this.
>
Hi Tadeusz,
we were experimenting with this. We have a prove of concept of a kernel
TLS type socket, so called AF_KTLS, which is based on Dave Watson's
RFC5288 patch. It handles both TLS and DTLS, unfortunately it is not
ready now to be proposed here. There are still issues which should be
solved (but mostly user space API design) [1]. If you are interested, we
could combine efforts.
Regards,
Fridolin Pokorny
[1] https://github.com/fridex/af_ktls
^ permalink raw reply
* [PATCH 4.6 fix] bgmac: reset & enable Ethernet core before using it
From: Rafał Miłecki @ 2016-04-12 11:30 UTC (permalink / raw)
To: David S. Miller, netdev
Cc: Hauke Mehrtens, Rafał Miłecki, Felix Fietkau
This fixes Ethernet on D-Link DIR-885L with BCM47094 SoC. Felix reported
similar fix was needed for his BCM4709 device (Buffalo WXR-1900DHP?).
I tested this for regressions on BCM4706, BCM4708A0 and BCM47081A0.
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
---
drivers/net/ethernet/broadcom/bgmac.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index 99b30a9..38db2e4 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -1572,6 +1572,11 @@ static int bgmac_probe(struct bcma_device *core)
dev_warn(&core->dev, "Using random MAC: %pM\n", mac);
}
+ /* This (reset &) enable is not preset in specs or reference driver but
+ * Broadcom does it in arch PCI code when enabling fake PCI device.
+ */
+ bcma_core_enable(core, 0);
+
/* Allocation and references */
net_dev = alloc_etherdev(sizeof(*bgmac));
if (!net_dev)
--
1.8.4.5
^ permalink raw reply related
* [PATCH v2] mwifiex: fix possible NULL dereference
From: Sudip Mukherjee @ 2016-04-12 11:46 UTC (permalink / raw)
To: Amitkumar Karwar, Nishant Sarmukadam, Kalle Valo
Cc: linux-kernel, linux-wireless, netdev, Sudip Mukherjee,
Christian Daudt
From: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
We have a check for card just after dereferencing it. So if it is NULL
we have already dereferenced it before its check. Lets dereference it
after checking card for NULL.
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index edf8b07..d4db9db 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -2884,10 +2884,11 @@ static void mwifiex_unregister_dev(struct mwifiex_adapter *adapter)
{
struct pcie_service_card *card = adapter->card;
const struct mwifiex_pcie_card_reg *reg;
- struct pci_dev *pdev = card->dev;
int i;
if (card) {
+ struct pci_dev *pdev = card->dev;
+
if (card->msix_enable) {
for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++)
synchronize_irq(card->msix_entries[i].vector);
--
1.9.1
^ permalink raw reply related
* Re: [PATCH v2] mwifiex: fix possible NULL dereference
From: Arend van Spriel @ 2016-04-12 11:50 UTC (permalink / raw)
To: Sudip Mukherjee, Amitkumar Karwar, Nishant Sarmukadam, Kalle Valo
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, Sudip Mukherjee, Christian Daudt
In-Reply-To: <1460461597-7309-1-git-send-email-sudipm.mukherjee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On 12-04-16 13:46, Sudip Mukherjee wrote:
> From: Sudip Mukherjee <sudip.mukherjee-4yDnlxn2s6sWdaTGBSpHTA@public.gmane.org>
>
> We have a check for card just after dereferencing it. So if it is NULL
> we have already dereferenced it before its check. Lets dereference it
> after checking card for NULL.
And you are changing the scope of the pdev variable.
Regards,
Arend
> Signed-off-by: Sudip Mukherjee <sudip.mukherjee-4yDnlxn2s6sWdaTGBSpHTA@public.gmane.org>
> ---
> drivers/net/wireless/marvell/mwifiex/pcie.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index edf8b07..d4db9db 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -2884,10 +2884,11 @@ static void mwifiex_unregister_dev(struct mwifiex_adapter *adapter)
> {
> struct pcie_service_card *card = adapter->card;
> const struct mwifiex_pcie_card_reg *reg;
> - struct pci_dev *pdev = card->dev;
> int i;
>
> if (card) {
> + struct pci_dev *pdev = card->dev;
> +
> if (card->msix_enable) {
> for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++)
> synchronize_irq(card->msix_entries[i].vector);
>
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH v2] mwifiex: fix possible NULL dereference
From: Sudip Mukherjee @ 2016-04-12 11:56 UTC (permalink / raw)
To: Arend van Spriel, Amitkumar Karwar, Nishant Sarmukadam,
Kalle Valo
Cc: linux-kernel, linux-wireless, netdev, Sudip Mukherjee,
Christian Daudt
In-Reply-To: <570CE113.20200@broadcom.com>
On Tuesday 12 April 2016 05:20 PM, Arend van Spriel wrote:
>
>
> On 12-04-16 13:46, Sudip Mukherjee wrote:
>> From: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
>>
>> We have a check for card just after dereferencing it. So if it is NULL
>> we have already dereferenced it before its check. Lets dereference it
>> after checking card for NULL.
>
> And you are changing the scope of the pdev variable.
yes, and since all usage of pdev is inside the "if" block so it should
not matter.
regards
sudip
>
> Regards,
> Arend
>
>> Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
>> ---
>> drivers/net/wireless/marvell/mwifiex/pcie.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
>> index edf8b07..d4db9db 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
>> @@ -2884,10 +2884,11 @@ static void mwifiex_unregister_dev(struct mwifiex_adapter *adapter)
>> {
>> struct pcie_service_card *card = adapter->card;
>> const struct mwifiex_pcie_card_reg *reg;
>> - struct pci_dev *pdev = card->dev;
>> int i;
>>
>> if (card) {
>> + struct pci_dev *pdev = card->dev;
>> +
>> if (card->msix_enable) {
>> for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++)
>> synchronize_irq(card->msix_entries[i].vector);
>>
^ permalink raw reply
* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: Yang Yingliang @ 2016-04-12 12:31 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, davem, Ding Tianhong
In-Reply-To: <570C6483.5020502@huawei.com>
On 2016/4/12 10:59, Yang Yingliang wrote:
>
>
> On 2016/4/11 20:13, Eric Dumazet wrote:
>> On Mon, 2016-04-11 at 19:57 +0800, Yang Yingliang wrote:
>>>
>>> On 2016/4/8 22:44, Eric Dumazet wrote:
>>>> On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
>>>>
>>>>> I expand tcp_adv_win_scale and tcp_rmem. It has no effect.
>>>>
>>>> Try :
>>>>
>>>> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>>>>
>>>> And restart your flows.
>>>>
>>> cat /proc/sys/net/ipv4/tcp_rmem
>>> 10240 2097152 10485760
>>
>> What about leaving the default values ?
> I tried, it did not work.
>
>>
>> $ cat /proc/sys/net/ipv4/tcp_rmem
>> 4096 87380 6291456
>>
>>>
>>> echo 102400 20971520 104857600 > /proc/sys/net/ipv4/tcp_rmem
>>> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>>>
>>> It seems has not effect.
>>>
>>
>> I have no idea what you did on the sender side to allow it to send more
>> than 1.5 MB then.
>
> We are doing performance test. The sender send 256KB per-block with 128
> threads to one socket. And the receiver uses 10Gb NIC to handle the
> data on ARM64. The data flow is driver->ip layer->tcp layer->iscsi.
>
> I added some debug messages and found handling backlog packets in
> __release_sock() cost about 11ms at most. This can cause backlog queue
> overflow. The sk_data_ready is re-assigned, it may cost time in our
> program. I will check it out.
>
I traced the cost cycles of handling backlog packets in
__release_sock().
16.97 ms to handling about 12MB backlog packets, of which 13.66ms to do
sk_data_ready.
The speed of handling packets in TCP is 5.65Gb/s which is smaller than
the NIC's bandwidth. So the packets will be dropped.
If the cost of sk_data_read cannot be reduced, do we have other choice
exclude dropping packets ?
^ permalink raw reply
* Re: GMII2RGMII Converter support in macb driver
From: Nicolas Ferre @ 2016-04-12 13:21 UTC (permalink / raw)
To: Appana Durga Kedareswara Rao, netdev@vger.kernel.org,
Michal Simek
Cc: Punnaiah Choudary Kalluri, Harini Katakam, Anirudha Sarangi,
Appana Durga Kedareswara Rao
In-Reply-To: <C246CAC1457055469EF09E3A7AC4E11A4A57592E@XAP-PVEXMBX01.xlnx.xilinx.com>
Le 12/04/2016 15:03, Appana Durga Kedareswara Rao a écrit :
> Hi All,
>
>
>
>
>
> There is a Xilinx custom IP for GMII to RGMII conversion
> data sheet here
> (http://www.xilinx.com/support/documentation/ip_documentation/gmii_to_rgmii/v4_0/pg160-gmii-to-rgmii.pdf
> )
>
>
>
>
>
> Unlike other Phy’s this IP won’t support auto negotiation
> and other features that usually normal Phy’s support.
>
> This IP has only one register (Control register) which needs to be
> programmed based on the external phy auto negotiation
>
> (Based on the external phy negotiated speed).
>
>
>
> I am able to make it work for GEM driver by doing the below changes in
> the driver (drivers/net/ethernet/cadence/macb.c).
>
>
>
> +#define XEMACPS_GMII2RGMII_FULLDPLX BMCR_FULLDPLX
>
> +#define XEMACPS_GMII2RGMII_SPEED1000 BMCR_SPEED1000
>
> +#define XEMACPS_GMII2RGMII_SPEED100 BMCR_SPEED100
>
> +#define
> XEMACPS_GMII2RGMII_REG_NUM 0x10
>
> +
>
> /*
>
> * Graceful stop timeouts in us. We should allow up to
>
> * 1 frame time (10 Mbits/s, full-duplex, ignoring collisions)
>
> @@ -311,8 +317,10 @@ static void macb_handle_link_change(struct
> net_device *dev)
>
> {
>
> struct macb *bp = netdev_priv(dev);
>
> struct phy_device *phydev = bp->phy_dev;
>
> + struct phy_device *gmii2rgmii_phydev = bp->gmii2rgmii_phy_dev;
>
> unsigned long flags;
>
> int status_change = 0;
>
> + u16 gmii2rgmii_reg = 0;
>
> spin_lock_irqsave(&bp->lock, flags);
>
> @@ -326,15 +334,27 @@ static void macb_handle_link_change(struct
> net_device *dev)
>
> if (macb_is_gem(bp))
>
> reg &=
> ~GEM_BIT(GBE);
>
> - if (phydev->duplex)
>
> + if (phydev->duplex) {
>
> reg |=
> MACB_BIT(FD);
>
> - if (phydev->speed == SPEED_100)
>
> +
> gmii2rgmii_reg |= XEMACPS_GMII2RGMII_FULLDPLX;
>
> + }
>
> + if (phydev->speed ==
> SPEED_100) {
>
> reg |=
> MACB_BIT(SPD);
>
> +
> gmii2rgmii_reg |= XEMACPS_GMII2RGMII_SPEED100;
>
> + }
>
> if (phydev->speed ==
> SPEED_1000 &&
>
> - bp->caps &
> MACB_CAPS_GIGABIT_MODE_AVAILABLE)
>
> + bp->caps &
> MACB_CAPS_GIGABIT_MODE_AVAILABLE) {
>
> reg |=
> GEM_BIT(GBE);
>
> +
> gmii2rgmii_reg |= XEMACPS_GMII2RGMII_SPEED1000;
>
> + }
>
> macb_or_gem_writel(bp,
> NCFGR, reg);
>
> + if (gmii2rgmii_phydev != NULL) {
>
> +
> macb_mdio_write(bp->mii_bus,
>
> + gmii2rgmii_phydev->addr,
>
> + XEMACPS_GMII2RGMII_REG_NUM,
>
> + gmii2rgmii_reg);
>
> + }
>
> bp->speed = phydev->speed;
>
> bp->duplex = phydev->duplex;
>
> @@ -382,6 +402,19 @@ static int macb_mii_probe(struct net_device *dev)
>
> int phy_irq;
>
> int ret;
>
> + if (bp->gmii2rgmii_phy_node) {
>
> + phydev = of_phy_attach(bp->dev,
>
> + bp->gmii2rgmii_phy_node,
>
> + 0,
> 0);
>
> + if (!phydev) {
>
> + dev_err(&bp->pdev->dev, "%s:
> no gmii to rgmii converter found\n",
>
> + dev->name);
>
> + return -1;
>
> + }
>
> + bp->gmii2rgmii_phy_dev = phydev;
>
> + } else
>
> + bp->gmii2rgmii_phy_dev = NULL;
>
> +
>
> phydev = phy_find_first(bp->mii_bus);
>
> if (!phydev) {
>
> netdev_err(dev, "no PHY found\n");
>
> @@ -402,6 +435,8 @@ static int macb_mii_probe(struct net_device *dev)
>
>
> bp->phy_interface);
>
> if (ret) {
>
> netdev_err(dev, "Could not attach to PHY\n");
>
> + if (bp->gmii2rgmii_phy_dev)
>
> +
> phy_disconnect(bp->gmii2rgmii_phy_dev);
>
> return ret;
>
> }
>
> @@ -3368,6 +3403,9 @@ static int macb_probe(struct platform_device *pdev)
>
> bp->phy_interface = err;
>
> }
>
> + bp->gmii2rgmii_phy_node =
> of_parse_phandle(bp->pdev->dev.of_node,
>
> +
> "gmii2rgmii-phy-handle", 0);
>
> +
>
> macb_reset_phy(pdev);
>
> /* IP specific init */
>
> @@ -3422,6 +3460,8 @@ static int macb_remove(struct platform_device *pdev)
>
> bp = netdev_priv(dev);
>
> if (bp->phy_dev)
>
> phy_disconnect(bp->phy_dev);
>
> + if (bp->gmii2rgmii_phy_dev)
>
> +
> phy_disconnect(bp->gmii2rgmii_phy_dev);
>
>
>
> But doing above changes making driver looks odd.
>
> could you please suggest any better option to add support for this IP in
> the macb driver?
Appana,
I certainly can't prototype the solution based on your datasheet and the
code sent... do a sensible proposal, then we can evaluate.
As the IP is separated from the Eth controller, make it a separate
driver (an emulated phy one for instance... even if I don't know if it
makes sense).
I don't know if others have already made such an adaptation layer
between GMII to RGMII but I'm pretty sure it can't be inserted into the
macb driver.
Bye,
--
Nicolas Ferre
^ permalink raw reply
* Re: [PATCH net-next v2 1/2] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: Thomas Graf @ 2016-04-12 13:21 UTC (permalink / raw)
To: roopa; +Cc: netdev, jhs, davem, Nikolay Aleksandrov
In-Reply-To: <570C7133.8070109@cumulusnetworks.com>
On 04/11/16 at 08:53pm, roopa wrote:
> Top level stats attributes can be netdev or global attributes: We can include string "LINK" in
> the names of all stats belonging to a netdev to make it easier to recognize the netdev stats (example):
> IFLA_STATS_LINK64, (netdev)
> IFLA_STATS_LINK_INET6, (netdev)
> IFLA_STATS_TCP, (non-netdev, global tcp stats)
This is fine as well. It means that we cant mix netdev and non-netdev
stats or stats for multiple netdevs in the same request which would
not be the case if you nest it and have a top level attribute which
is a list of requests. That may be borderline to overengineering
though so I'm fine this as well.
> We will need a field in netlink_callback to indicate global or netdev stats when the stats
> crosses skb boundaries. A single nlmsg cannot have both netdev and global stats.
I would treat each IFLA_STATS_ as its own nlmsg in the reply and
enforce an NLM_F_DUMP request for any multi request message.
^ permalink raw reply
* RE: GMII2RGMII Converter support in macb driver
From: Appana Durga Kedareswara Rao @ 2016-04-12 13:31 UTC (permalink / raw)
To: Nicolas Ferre, netdev@vger.kernel.org, Michal Simek
Cc: Punnaiah Choudary Kalluri, Harini Katakam, Anirudha Sarangi
In-Reply-To: <570CF663.4000908@atmel.com>
Hi Nicolas Ferre,
> -----Original Message-----
> From: Nicolas Ferre [mailto:nicolas.ferre@atmel.com]
> Sent: Tuesday, April 12, 2016 6:52 PM
> To: Appana Durga Kedareswara Rao <appanad@xilinx.com>;
> netdev@vger.kernel.org; Michal Simek <michals@xilinx.com>
> Cc: Punnaiah Choudary Kalluri <punnaia@xilinx.com>; Harini Katakam
> <harinik@xilinx.com>; Anirudha Sarangi <anirudh@xilinx.com>; Appana Durga
> Kedareswara Rao <appanad@xilinx.com>
> Subject: Re: GMII2RGMII Converter support in macb driver
>
> Le 12/04/2016 15:03, Appana Durga Kedareswara Rao a écrit :
> > Hi All,
> >
> >
> >
> >
> >
> > There is a Xilinx custom IP for GMII to RGMII
> > conversion data sheet here
> > (http://www.xilinx.com/support/documentation/ip_documentation/gmii_to_
> > rgmii/v4_0/pg160-gmii-to-rgmii.pdf
> > )
> >
> >
> >
> >
> >
> > Unlike other Phy's this IP won't support auto
> > negotiation and other features that usually normal Phy's support.
> >
> > This IP has only one register (Control register) which needs to be
> > programmed based on the external phy auto negotiation
> >
> > (Based on the external phy negotiated speed).
> >
> >
> >
> > I am able to make it work for GEM driver by doing the below changes in
> > the driver (drivers/net/ethernet/cadence/macb.c).
> >
> >
> >
> > +#define XEMACPS_GMII2RGMII_FULLDPLX BMCR_FULLDPLX
> >
> > +#define XEMACPS_GMII2RGMII_SPEED1000 BMCR_SPEED1000
> >
> > +#define XEMACPS_GMII2RGMII_SPEED100 BMCR_SPEED100
> >
> > +#define
> > XEMACPS_GMII2RGMII_REG_NUM 0x10
> >
> > +
> >
> > /*
> >
> > * Graceful stop timeouts in us. We should allow up to
> >
> > * 1 frame time (10 Mbits/s, full-duplex, ignoring collisions)
> >
> > @@ -311,8 +317,10 @@ static void macb_handle_link_change(struct
> > net_device *dev)
> >
> > {
> >
> > struct macb *bp = netdev_priv(dev);
> >
> > struct phy_device *phydev = bp->phy_dev;
> >
> > + struct phy_device *gmii2rgmii_phydev =
> > + bp->gmii2rgmii_phy_dev;
> >
> > unsigned long flags;
> >
> > int status_change = 0;
> >
> > + u16 gmii2rgmii_reg = 0;
> >
> > spin_lock_irqsave(&bp->lock, flags);
> >
> > @@ -326,15 +334,27 @@ static void macb_handle_link_change(struct
> > net_device *dev)
> >
> > if (macb_is_gem(bp))
> >
> > reg &=
> > ~GEM_BIT(GBE);
> >
> > - if (phydev->duplex)
> >
> > + if (phydev->duplex) {
> >
> > reg |=
> > MACB_BIT(FD);
> >
> > - if (phydev->speed == SPEED_100)
> >
> > +
> > gmii2rgmii_reg |= XEMACPS_GMII2RGMII_FULLDPLX;
> >
> > + }
> >
> > + if (phydev->speed ==
> > SPEED_100) {
> >
> > reg |=
> > MACB_BIT(SPD);
> >
> > +
> > gmii2rgmii_reg |= XEMACPS_GMII2RGMII_SPEED100;
> >
> > + }
> >
> > if (phydev->speed ==
> > SPEED_1000 &&
> >
> > - bp->caps &
> > MACB_CAPS_GIGABIT_MODE_AVAILABLE)
> >
> > + bp->caps &
> > MACB_CAPS_GIGABIT_MODE_AVAILABLE) {
> >
> > reg |=
> > GEM_BIT(GBE);
> >
> > +
> > gmii2rgmii_reg |= XEMACPS_GMII2RGMII_SPEED1000;
> >
> > + }
> >
> > macb_or_gem_writel(bp,
> > NCFGR, reg);
> >
> > + if (gmii2rgmii_phydev !=
> > + NULL) {
> >
> > +
> > macb_mdio_write(bp->mii_bus,
> >
> > +
> > + gmii2rgmii_phydev->addr,
> >
> > +
> > + XEMACPS_GMII2RGMII_REG_NUM,
> >
> > +
> > + gmii2rgmii_reg);
> >
> > + }
> >
> > bp->speed =
> > phydev->speed;
> >
> > bp->duplex =
> > phydev->duplex;
> >
> > @@ -382,6 +402,19 @@ static int macb_mii_probe(struct net_device *dev)
> >
> > int phy_irq;
> >
> > int ret;
> >
> > + if (bp->gmii2rgmii_phy_node) {
> >
> > + phydev = of_phy_attach(bp->dev,
> >
> > +
> > + bp->gmii2rgmii_phy_node,
> >
> > +
> > + 0,
> > 0);
> >
> > + if (!phydev) {
> >
> > + dev_err(&bp->pdev->dev, "%s:
> > no gmii to rgmii converter found\n",
> >
> > + dev->name);
> >
> > + return -1;
> >
> > + }
> >
> > + bp->gmii2rgmii_phy_dev = phydev;
> >
> > + } else
> >
> > + bp->gmii2rgmii_phy_dev = NULL;
> >
> > +
> >
> > phydev = phy_find_first(bp->mii_bus);
> >
> > if (!phydev) {
> >
> > netdev_err(dev, "no PHY found\n");
> >
> > @@ -402,6 +435,8 @@ static int macb_mii_probe(struct net_device *dev)
> >
> >
> > bp->phy_interface);
> >
> > if (ret) {
> >
> > netdev_err(dev, "Could not attach to
> > PHY\n");
> >
> > + if (bp->gmii2rgmii_phy_dev)
> >
> > +
> > phy_disconnect(bp->gmii2rgmii_phy_dev);
> >
> > return ret;
> >
> > }
> >
> > @@ -3368,6 +3403,9 @@ static int macb_probe(struct platform_device
> > *pdev)
> >
> > bp->phy_interface = err;
> >
> > }
> >
> > + bp->gmii2rgmii_phy_node =
> > of_parse_phandle(bp->pdev->dev.of_node,
> >
> > +
> > "gmii2rgmii-phy-handle", 0);
> >
> > +
> >
> > macb_reset_phy(pdev);
> >
> > /* IP specific init */
> >
> > @@ -3422,6 +3460,8 @@ static int macb_remove(struct platform_device
> > *pdev)
> >
> > bp = netdev_priv(dev);
> >
> > if (bp->phy_dev)
> >
> >
> > phy_disconnect(bp->phy_dev);
> >
> > + if (bp->gmii2rgmii_phy_dev)
> >
> > +
> > phy_disconnect(bp->gmii2rgmii_phy_dev);
> >
> >
> >
> > But doing above changes making driver looks odd.
> >
> > could you please suggest any better option to add support for this IP
> > in the macb driver?
>
> Appana,
>
> I certainly can't prototype the solution based on your datasheet and the code
> sent... do a sensible proposal, then we can evaluate.
Thanks for the quick response will come up with a sensible proposal soon...
Regards,
Kedar.
>
> As the IP is separated from the Eth controller, make it a separate driver (an
> emulated phy one for instance... even if I don't know if it makes sense).
>
> I don't know if others have already made such an adaptation layer between
> GMII to RGMII but I'm pretty sure it can't be inserted into the macb driver.
>
> Bye,
> --
> Nicolas Ferre
^ permalink raw reply
* Re: GMII2RGMII Converter support in macb driver
From: Phil Reid @ 2016-04-12 13:37 UTC (permalink / raw)
To: Nicolas Ferre, Appana Durga Kedareswara Rao,
netdev@vger.kernel.org, Michal Simek
Cc: Punnaiah Choudary Kalluri, Harini Katakam, Anirudha Sarangi,
Appana Durga Kedareswara Rao
In-Reply-To: <570CF663.4000908@atmel.com>
On 12/04/2016 9:21 PM, Nicolas Ferre wrote:
> Le 12/04/2016 15:03, Appana Durga Kedareswara Rao a écrit :
>> Hi All,
>>
>>
>>
>>
>>
>> There is a Xilinx custom IP for GMII to RGMII conversion
>> data sheet here
>> (http://www.xilinx.com/support/documentation/ip_documentation/gmii_to_rgmii/v4_0/pg160-gmii-to-rgmii.pdf
>> )
>>
>>
>>
>>
>>
>> Unlike other Phy’s this IP won’t support auto negotiation
>> and other features that usually normal Phy’s support.
>>
>> This IP has only one register (Control register) which needs to be
>> programmed based on the external phy auto negotiation
>>
>> (Based on the external phy negotiated speed).
>>
>>
>>
>> I am able to make it work for GEM driver by doing the below changes in
>> the driver (drivers/net/ethernet/cadence/macb.c).
>>
>>
>>
>> +#define XEMACPS_GMII2RGMII_FULLDPLX BMCR_FULLDPLX
>>
>> +#define XEMACPS_GMII2RGMII_SPEED1000 BMCR_SPEED1000
>>
>> +#define XEMACPS_GMII2RGMII_SPEED100 BMCR_SPEED100
>>
>> +#define
>> XEMACPS_GMII2RGMII_REG_NUM 0x10
>>
>> +
>>
>> /*
>>
>> * Graceful stop timeouts in us. We should allow up to
>>
>> * 1 frame time (10 Mbits/s, full-duplex, ignoring collisions)
>>
>> @@ -311,8 +317,10 @@ static void macb_handle_link_change(struct
>> net_device *dev)
>>
>> {
>>
>> struct macb *bp = netdev_priv(dev);
>>
>> struct phy_device *phydev = bp->phy_dev;
>>
>> + struct phy_device *gmii2rgmii_phydev = bp->gmii2rgmii_phy_dev;
>>
>> unsigned long flags;
>>
>> int status_change = 0;
>>
>> + u16 gmii2rgmii_reg = 0;
>>
>> spin_lock_irqsave(&bp->lock, flags);
>>
>> @@ -326,15 +334,27 @@ static void macb_handle_link_change(struct
>> net_device *dev)
>>
>> if (macb_is_gem(bp))
>>
>> reg &=
>> ~GEM_BIT(GBE);
>>
>> - if (phydev->duplex)
>>
>> + if (phydev->duplex) {
>>
>> reg |=
>> MACB_BIT(FD);
>>
>> - if (phydev->speed == SPEED_100)
>>
>> +
>> gmii2rgmii_reg |= XEMACPS_GMII2RGMII_FULLDPLX;
>>
>> + }
>>
>> + if (phydev->speed ==
>> SPEED_100) {
>>
>> reg |=
>> MACB_BIT(SPD);
>>
>> +
>> gmii2rgmii_reg |= XEMACPS_GMII2RGMII_SPEED100;
>>
>> + }
>>
>> if (phydev->speed ==
>> SPEED_1000 &&
>>
>> - bp->caps &
>> MACB_CAPS_GIGABIT_MODE_AVAILABLE)
>>
>> + bp->caps &
>> MACB_CAPS_GIGABIT_MODE_AVAILABLE) {
>>
>> reg |=
>> GEM_BIT(GBE);
>>
>> +
>> gmii2rgmii_reg |= XEMACPS_GMII2RGMII_SPEED1000;
>>
>> + }
>>
>> macb_or_gem_writel(bp,
>> NCFGR, reg);
>>
>> + if (gmii2rgmii_phydev != NULL) {
>>
>> +
>> macb_mdio_write(bp->mii_bus,
>>
>> + gmii2rgmii_phydev->addr,
>>
>> + XEMACPS_GMII2RGMII_REG_NUM,
>>
>> + gmii2rgmii_reg);
>>
>> + }
>>
>> bp->speed = phydev->speed;
>>
>> bp->duplex = phydev->duplex;
>>
>> @@ -382,6 +402,19 @@ static int macb_mii_probe(struct net_device *dev)
>>
>> int phy_irq;
>>
>> int ret;
>>
>> + if (bp->gmii2rgmii_phy_node) {
>>
>> + phydev = of_phy_attach(bp->dev,
>>
>> + bp->gmii2rgmii_phy_node,
>>
>> + 0,
>> 0);
>>
>> + if (!phydev) {
>>
>> + dev_err(&bp->pdev->dev, "%s:
>> no gmii to rgmii converter found\n",
>>
>> + dev->name);
>>
>> + return -1;
>>
>> + }
>>
>> + bp->gmii2rgmii_phy_dev = phydev;
>>
>> + } else
>>
>> + bp->gmii2rgmii_phy_dev = NULL;
>>
>> +
>>
>> phydev = phy_find_first(bp->mii_bus);
>>
>> if (!phydev) {
>>
>> netdev_err(dev, "no PHY found\n");
>>
>> @@ -402,6 +435,8 @@ static int macb_mii_probe(struct net_device *dev)
>>
>>
>> bp->phy_interface);
>>
>> if (ret) {
>>
>> netdev_err(dev, "Could not attach to PHY\n");
>>
>> + if (bp->gmii2rgmii_phy_dev)
>>
>> +
>> phy_disconnect(bp->gmii2rgmii_phy_dev);
>>
>> return ret;
>>
>> }
>>
>> @@ -3368,6 +3403,9 @@ static int macb_probe(struct platform_device *pdev)
>>
>> bp->phy_interface = err;
>>
>> }
>>
>> + bp->gmii2rgmii_phy_node =
>> of_parse_phandle(bp->pdev->dev.of_node,
>>
>> +
>> "gmii2rgmii-phy-handle", 0);
>>
>> +
>>
>> macb_reset_phy(pdev);
>>
>> /* IP specific init */
>>
>> @@ -3422,6 +3460,8 @@ static int macb_remove(struct platform_device *pdev)
>>
>> bp = netdev_priv(dev);
>>
>> if (bp->phy_dev)
>>
>> phy_disconnect(bp->phy_dev);
>>
>> + if (bp->gmii2rgmii_phy_dev)
>>
>> +
>> phy_disconnect(bp->gmii2rgmii_phy_dev);
>>
>>
>>
>> But doing above changes making driver looks odd.
>>
>> could you please suggest any better option to add support for this IP in
>> the macb driver?
>
> Appana,
>
> I certainly can't prototype the solution based on your datasheet and the
> code sent... do a sensible proposal, then we can evaluate.
>
> As the IP is separated from the Eth controller, make it a separate
> driver (an emulated phy one for instance... even if I don't know if it
> makes sense).
>
> I don't know if others have already made such an adaptation layer
> between GMII to RGMII but I'm pretty sure it can't be inserted into the
> macb driver.
>
> Bye,
>
This sounds very similar to the altera emac-splitter.
See stmmac driver for how they handled this.
--
Regards
Phil Reid
^ permalink raw reply
* Re: [PATCH RFT 2/2] macb: kill PHY reset code
From: Andrew Lunn @ 2016-04-12 13:40 UTC (permalink / raw)
To: Nicolas Ferre; +Cc: Sergei Shtylyov, netdev, linux-kernel
In-Reply-To: <570CBE42.50309@atmel.com>
On Tue, Apr 12, 2016 at 11:22:10AM +0200, Nicolas Ferre wrote:
> Le 11/04/2016 04:28, Andrew Lunn a écrit :
> > On Sat, Apr 09, 2016 at 01:25:03AM +0300, Sergei Shtylyov wrote:
> >> With the 'phylib' now being aware of the "reset-gpios" PHY node property,
> >> there should be no need to frob the PHY reset in this driver anymore...
> >>
> >> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> >>
> >> ---
> >> drivers/net/ethernet/cadence/macb.c | 17 -----------------
> >> drivers/net/ethernet/cadence/macb.h | 1 -
> >> 2 files changed, 18 deletions(-)
> >>
> >> Index: net-next/drivers/net/ethernet/cadence/macb.c
> >> ===================================================================
> >> --- net-next.orig/drivers/net/ethernet/cadence/macb.c
> >> +++ net-next/drivers/net/ethernet/cadence/macb.c
> >> @@ -2884,7 +2884,6 @@ static int macb_probe(struct platform_de
> >> = macb_clk_init;
> >> int (*init)(struct platform_device *) = macb_init;
> >> struct device_node *np = pdev->dev.of_node;
> >> - struct device_node *phy_node;
> >> const struct macb_config *macb_config = NULL;
> >> struct clk *pclk, *hclk = NULL, *tx_clk = NULL;
> >> unsigned int queue_mask, num_queues;
> >> @@ -2977,18 +2976,6 @@ static int macb_probe(struct platform_de
> >> else
> >> macb_get_hwaddr(bp);
> >>
> >> - /* Power up the PHY if there is a GPIO reset */
> >> - phy_node = of_get_next_available_child(np, NULL);
> >> - if (phy_node) {
> >> - int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
> >> -
> >> - if (gpio_is_valid(gpio)) {
> >> - bp->reset_gpio = gpio_to_desc(gpio);
> >> - gpiod_direction_output(bp->reset_gpio, 1);
> >
> > Hi Sergei
> >
> > The code you are deleting would of ignored the flags in the gpio
> I don't parse this.
>
> The code deleted does take the flag into account. And the DT property
> associated to it seems correct to me (I mean, with proper flag
> specification).
Hi Nicolas
of_get_named_gpio() does not do anything with the flags. So for
example,
gpios = <&gpio0 12 GPIO_ACTIVE_LOW>;
the GPIO_ACTIVE_LOW would be ignored. If you want the flags to be
respected, you need to use the gpiod API for all calls, in particular,
you need to use something which calls gpiod_get_index(), since that is
the only function to call gpiod_parse_flags() to translate
GPIO_ACTIVE_LOW into a flag within the gpio descriptor.
Andrew
^ permalink raw reply
* Re: [PATCH RFT 2/2] macb: kill PHY reset code
From: Sergei Shtylyov @ 2016-04-12 13:54 UTC (permalink / raw)
To: Nicolas Ferre, Andrew Lunn; +Cc: netdev, linux-kernel
In-Reply-To: <570CBE42.50309@atmel.com>
Hello.
On 4/12/2016 12:22 PM, Nicolas Ferre wrote:
>>> With the 'phylib' now being aware of the "reset-gpios" PHY node property,
>>> there should be no need to frob the PHY reset in this driver anymore...
>>>
>>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
>>>
>>> ---
>>> drivers/net/ethernet/cadence/macb.c | 17 -----------------
>>> drivers/net/ethernet/cadence/macb.h | 1 -
>>> 2 files changed, 18 deletions(-)
>>>
>>> Index: net-next/drivers/net/ethernet/cadence/macb.c
>>> ===================================================================
>>> --- net-next.orig/drivers/net/ethernet/cadence/macb.c
>>> +++ net-next/drivers/net/ethernet/cadence/macb.c
[...]
>>> @@ -2977,18 +2976,6 @@ static int macb_probe(struct platform_de
>>> else
>>> macb_get_hwaddr(bp);
>>>
>>> - /* Power up the PHY if there is a GPIO reset */
>>> - phy_node = of_get_next_available_child(np, NULL);
>>> - if (phy_node) {
>>> - int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
>>> -
>>> - if (gpio_is_valid(gpio)) {
>>> - bp->reset_gpio = gpio_to_desc(gpio);
>>> - gpiod_direction_output(bp->reset_gpio, 1);
>>
>> Hi Sergei
>>
>> The code you are deleting would of ignored the flags in the gpio
>
> I don't parse this.
> The code deleted does take the flag into account.
Not really -- you need to call of_get_named_gpio_flags() (with a valid
last argument) for that.
> And the DT property
> associated to it seems correct to me (I mean, with proper flag
> specification).
It apparently is not as it have GPIO_ACTIVE_HIGH and the driver assumes
active-low reset signal.
[...]
> Bye,
MBR, Sergei
^ permalink raw reply
* Re: [RFC PATCH 0/2] selinux: avoid nf hooks overhead when not needed
From: Casey Schaufler @ 2016-04-12 13:57 UTC (permalink / raw)
To: Paolo Abeni, Paul Moore
Cc: Florian Westphal, linux-security-module, David S. Miller,
James Morris, Andreas Gruenbacher, Stephen Smalley, netdev,
selinux, Casey Schaufler
In-Reply-To: <1460451162.5965.16.camel@redhat.com>
On 4/12/2016 1:52 AM, Paolo Abeni wrote:
> On Thu, 2016-04-07 at 14:55 -0400, Paul Moore wrote:
>> On Thursday, April 07, 2016 01:45:32 AM Florian Westphal wrote:
>>> Paul Moore <paul@paul-moore.com> wrote:
>>>> On Wed, Apr 6, 2016 at 6:14 PM, Florian Westphal <fw@strlen.de> wrote:
>>>>> netfilter hooks are per namespace -- so there is hook unregister when
>>>>> netns is destroyed.
>>>> Looking around, I see the global and per-namespace registration
>>>> functions (nf_register_hook and nf_register_net_hook, respectively),
>>>> but I'm looking to see if/how newly created namespace inherit
>>>> netfilter hooks from the init network namespace ... if you can create
>>>> a network namespace and dodge the SELinux hooks, that isn't a good
>>>> thing from a SELinux point of view, although it might be a plus
>>>> depending on where you view Paolo's original patches ;)
>>> Heh :-)
>>>
>>> If you use nf_register_net_hook, the hook is only registered in the
>>> namespace.
>>>
>>> If you use nf_register_hook, the hook is put on a global list and
>>> registed in all existing namespaces.
>>>
>>> New namespaces will have the hook added as well (see
>>> netfilter_net_init -> nf_register_hook_list in netfilter/core.c )
>>>
>>> Since nf_register_hook is used it should be impossible to get a netns
>>> that doesn't call these hooks.
>> Great, thanks.
>>
>>>>> Do you think it makes sense to rework the patch to delay registering
>>>>> of the netfiler hooks until the system is in a state where they're
>>>>> needed, without the 'unregister' aspect?
>>>> I would need to see the patch to say for certain, but in principle
>>>> that seems perfectly reasonable and I think would satisfy both the
>>>> netdev and SELinux camps - good suggestion. My main goal is to drop
>>>> the selinux_nf_ip_init() entirely so it can't be used as a ROP gadget.
>>>>
>>>> We might even be able to trim the secmark_active and peerlbl_active
>>>> checks in the SELinux netfilter hooks (an earlier attempt at
>>>> optimization; contrary to popular belief, I do care about SELinux
>>>> performance), although that would mean that enabling the network
>>>> access controls would be one way ... I guess you can disregard that
>>>> last bit, I'm thinking aloud again.
>>> One way is fine I think.
>> Yes, just disregard my second paragraph above.
>>
>>>>> Ideally this would even be per netns -- in perfect world we would
>>>>> be able to make it so that a new netns are created with an empty
>>>>> hook list.
>>>> In general SELinux doesn't care about namespaces, for reasons that are
>>>> sorta beyond the scope of this conversation, so I would like to stick
>>>> to a all or nothing approach to enabling the SELinux netfilter hooks
>>>> across namespaces. Perhaps we can revisit this at a later time, but
>>>> let's keep it simple right now.
>>> Okay, I'd prefer to stick to your recommendation anyway wrt. to selinux
>>> (Casey, I read your comment regarding smack. Noted, we don't want to
>>> break smack either...)
>>>
>>> I think that in this case the entire question is:
>>>
>>> In your experience, how likely is a config where selinux is enabled BUT the
>>> hooks are not needed (i.e., where we hit the
>>>
>>> if (!selinux_policycap_netpeer)
>>> return NF_ACCEPT;
>>>
>>> if (!secmark_active && !peerlbl_active)
>>> return NF_ACCEPT;
>>>
>>> tests inside the hooks)? If such setups are uncommon we should just
>>> drop this idea or at least put it on the back burner until the more
>>> expensive netfilter hooks (conntrack, cough) are out of the way.
>> A few years ago I would have said that it is relatively uncommon for admins to
>> enable the SELinux network access controls; it was typically just
>> government/intelligence agencies who had very strict access control
>> requirements and represented a small portion of SELinux users. However, over
>> the past few years I've been fielding more and more questions from admins/devs
>> in the virtualization space who are interested in some of these capabilities;
>> it isn't clear to me how many of these people are switching it on, but there
>> is definitely more interest than I have seen in the past and the interested is
>> centered around some rather common use cases.
>>
>> So, to summarize, I don't know ;)
>>
>> If you've got bigger sources of overhead, my opinion would be to go tackle
>> those first. Perhaps I can even find the time to work on the
>> SELinux/netfilter stuff while you are off slaying the bigger dragons, no
>> promises at the moment.
> Double checking if I got the above correctly.
>
> Will be ok if we post a v2 version of this series, removing the hooks
> de-registration bits, but preserving the selinux nf-hooks and
> socket_sock_rcv_skb() on-demand/delayed registration ?
Imagine that I have two security modules that control sockets.
The work I'm knee deep in will allow this. If adding hooks after
the init phase is allowed you have to face the possibility that
blob sizes (in this case sock->sk_security) may change. That
requires checking on every hook that uses blobs to determine
whether the blob has data for all the modules using it. I know
that that is a simple matter of arithmetic, but it's additional
overhead on every hook call. It also makes creating kmem caches
for security blobs much more difficult. Another performance
optimization that becomes unavailable.
We know of a number of ways we can improve networking performance
in the face of security modules. Many would make the code a whole
lot cleaner. Your proposed change is clever, but targets one case
at the expense of the general case. If there really is concern
about the performance of networking in the presence of security
modules I would suggest that we revisit some of the changes that
have already been proposed.
> Will that fit
> with the post-init read only memory usage that you are planning ?
>
> Regards,
>
> Paolo
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* Re: [RFC v5 0/5] Add virtio transport for AF_VSOCK
From: Stefan Hajnoczi @ 2016-04-12 13:59 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Stefan Hajnoczi, Ian Campbell, kvm, netdev, Matt Benjamin,
Christoffer Dall, Alex Bennée, marius vlad, areis,
Claudio Imbrenda, Greg Kurz, virtualization
In-Reply-To: <20160411154517-mutt-send-email-mst@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 3633 bytes --]
On Mon, Apr 11, 2016 at 03:54:08PM +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 11, 2016 at 11:45:48AM +0100, Stefan Hajnoczi wrote:
> > On Fri, Apr 08, 2016 at 04:35:05PM +0100, Ian Campbell wrote:
> > > On Fri, 2016-04-01 at 15:23 +0100, Stefan Hajnoczi wrote:
> > > > This series is based on Michael Tsirkin's vhost branch (v4.5-rc6).
> > > >
> > > > I'm about to process Claudio Imbrenda's locking fixes for virtio-vsock but
> > > > first I want to share the latest version of the code. Several people are
> > > > playing with vsock now so sharing the latest code should avoid duplicate work.
> > >
> > > Thanks for this, I've been using it in my project and it mostly seems
> > > fine.
> > >
> > > One wrinkle I came across, which I'm not sure if it is by design or a
> > > problem is that I can see this sequence coming from the guest (with
> > > other activity in between):
> > >
> > > 1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
> > > 2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
> > > 3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX
How did you trigger this sequence? I'd like to reproduce it.
> > > I orignally had my backend close things down at #2, however this meant
> > > that when #3 arrived it was for a non-existent socket (or, worse, an
> > > active one if the ports got reused). I checked v5 of the spec
> > > proposal[0] which says:
> > > If these bits are set and there are no more virtqueue buffers
> > > pending the socket is disconnected.
> > >
> > > but I'm not entirely sure if this behaviour contradicts this or not
> > > (the bits have both been set at #2, but not at the same time).
> > >
> > > BTW, how does one tell if there are no more virtqueue buffers pending
> > > or not while processing the op?
> >
> > #2 is odd. The shutdown bits are sticky so they cannot be cleared once
> > set. I would have expected just #1 and #3. The behavior you observe
> > look like a bug.
> >
> > The spec text does not convey the meaning of OP_SHUTDOWN well.
> > OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible
> > for this connection. "there are no more virtqueue buffers pending the
> > socket" really means that this isn't an immediate close from the
> > perspective of the application. If the application still has unread rx
> > buffers then the socket stays readable until the rx data has been fully
> > read.
>
> Yes but you also wrote:
> If these bits are set and there are no more virtqueue buffers
> pending the socket is disconnected.
>
> how does remote know that there are no buffers pending and so it's safe
> to reuse the same source/destination address now? Maybe destination
> should send RST at that point?
You are right, the source/destination address could be reused while the
remote still has the connection in their table. Connection
establishment would fail with a RST reply.
I can think of two solutions:
1. Implementations must remove connections from their table as soon as
SHUTDOWN_TX|SHUTDOWN_RX is received. This way the source/destination
address tuple can be reused immediately, i.e. new connections with
the same source/destination would be possible while an application is
still draining the receive buffers of an old connection.
2. Extend the connection lifecycle so that an A->B
SHUTDOWN_TX|SHUTDOWN_RX must be followed by a by a B->A RST to close
a connection. This way the source/destination address is only in use
once at a time.
Option #2 seems safer because there is no overlap in source/destination
address usage.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
^ permalink raw reply
* Re: AF_VSOCK status
From: Stefan Hajnoczi @ 2016-04-12 14:03 UTC (permalink / raw)
To: Antoine Martin; +Cc: netdev
In-Reply-To: <570B9021.5050709@nagafix.co.uk>
[-- Attachment #1: Type: text/plain, Size: 579 bytes --]
On Mon, Apr 11, 2016 at 06:53:05PM +0700, Antoine Martin wrote:
> > There are a few existing ways to achieve that without involving
> > virtio-vsock: vhost-user or ivshmem.
> Yes, I've looked at those and they seem a bit overkill for what we
> want to achieve. We don't want sharing with multiple guests, or
> interrupts.
> All we want is a chunk of host memory to be accessible from the guest..
ivshmem does that. It operates in two modes, it sounds like you want
the first and simpler mode ("ivshmem-plain").
Take a look at hw/misc/ivshmem.c in QEMU.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
^ permalink raw reply
* Re: Unhandled fault during system suspend in sky2_shutdown
From: Sudeep Holla @ 2016-04-12 14:06 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Sudeep Holla, linux-kernel, netdev, Mirko Lindner
In-Reply-To: <20160411112430.2a03e296@xeon-e3>
On 11/04/16 19:24, Stephen Hemminger wrote:
> On Mon, 11 Apr 2016 17:24:37 +0100
> Sudeep Holla <sudeep.holla@arm.com> wrote:
>
[...]
>>
>> diff --git i/drivers/net/ethernet/marvell/sky2.c
>> w/drivers/net/ethernet/marvell/sky2.c
>> index ec0a22119e09..0ff0434e32fc 100644
>> --- i/drivers/net/ethernet/marvell/sky2.c
>> +++ w/drivers/net/ethernet/marvell/sky2.c
>> @@ -5220,6 +5220,13 @@ static SIMPLE_DEV_PM_OPS(sky2_pm_ops,
>> sky2_suspend, sky2_resume);
>>
>> static void sky2_shutdown(struct pci_dev *pdev)
>> {
>> + struct sky2_hw *hw = pci_get_drvdata(pdev);
>> + int i;
>> +
>> + for (i = hw->ports - 1; i >= 0; --i) {
>> + sky2_detach(hw->dev[i]);
>> + unregister_netdev(hw->dev[i]);
>> + }
>> sky2_suspend(&pdev->dev);
>> pci_wake_from_d3(pdev, device_may_wakeup(&pdev->dev));
>> pci_set_power_state(pdev, PCI_D3hot);
>
> This is not the correct fix, the device is supposed to stay
> registered. The correct way to fix this would be to make get_stats
> ignore requests for device when suspended.
Yes I agree it's not correct fix. But I tried ignoring it in get_stat32
but the crash just moves the bug elsewhere. IMO patching all the places
to check the suspended state might be bit heavy ?
E.g. with something like below the crash moves to sky2_get_eeprom_len
function.
sky2_get_eeprom_len+0x10/0x30
dev_ethtool+0x29c/0x1d78
dev_ioctl+0x31c/0x5a8
sock_ioctl+0x2ac/0x310
do_vfs_ioctl+0xa4/0x750
SyS_ioctl+0x8c/0xa0
el0_svc_naked+0x24/0x2
Sorry if I am missing something fundamental, I am bit new to net drivers
Regards,
Sudeep
-->8
diff --git i/drivers/net/ethernet/marvell/sky2.c
w/drivers/net/ethernet/marvell/sky2.c
index ec0a22119e09..d4cfcd89e7e5 100644
--- i/drivers/net/ethernet/marvell/sky2.c
+++ w/drivers/net/ethernet/marvell/sky2.c
@@ -5175,6 +5175,7 @@ static int sky2_suspend(struct device *dev)
}
sky2_power_aux(hw);
+ hw->suspended = true;
rtnl_unlock();
return 0;
@@ -5198,6 +5199,7 @@ static int sky2_resume(struct device *dev)
}
rtnl_lock();
+ hw->suspended = false;
sky2_reset(hw);
sky2_all_up(hw);
rtnl_unlock();
diff --git i/drivers/net/ethernet/marvell/sky2.h
w/drivers/net/ethernet/marvell/sky2.h
index ec6dcd80152b..1386e5b635ff 100644
--- i/drivers/net/ethernet/marvell/sky2.h
+++ w/drivers/net/ethernet/marvell/sky2.h
@@ -2308,6 +2308,7 @@ struct sky2_hw {
wait_queue_head_t msi_wait;
char irq_name[0];
+ bool suspended;
};
static inline int sky2_is_copper(const struct sky2_hw *hw)
@@ -2378,6 +2379,9 @@ static inline u32 get_stats32(struct sky2_hw *hw,
unsigned port, unsigned reg)
{
u32 val;
+ if (hw->suspended)
+ return 0;
+
do {
val = gma_read32(hw, port, reg);
} while (gma_read32(hw, port, reg) != val);
^ permalink raw reply related
* Re: [PATCH] mwifiex: fix possible NULL dereference
From: Andy Shevchenko @ 2016-04-12 14:08 UTC (permalink / raw)
To: Sudip Mukherjee
Cc: Amitkumar Karwar, Nishant Sarmukadam, Kalle Valo,
linux-kernel@vger.kernel.org, open list:TI WILINK WIRELES...,
netdev, Sudip Mukherjee
In-Reply-To: <1460388459-21090-1-git-send-email-sudipm.mukherjee@gmail.com>
On Mon, Apr 11, 2016 at 6:27 PM, Sudip Mukherjee
<sudipm.mukherjee@gmail.com> wrote:
> From: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
>
> We have a check for card just after dereferencing it. So if it is NULL
> we have already dereferenced it before its check. Lets dereference it
> after checking card for NULL.
IIUC the code does nothing with dereference.
I would have told NAK if I would have been maintainer.
>
> Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
> ---
> drivers/net/wireless/marvell/mwifiex/pcie.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index edf8b07..84562d0 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -2884,10 +2884,11 @@ static void mwifiex_unregister_dev(struct mwifiex_adapter *adapter)
> {
> struct pcie_service_card *card = adapter->card;
Let's say it's 0.
> const struct mwifiex_pcie_card_reg *reg;
> - struct pci_dev *pdev = card->dev;
This would be equal to offset of dev member in pcie_service_card struct.
Nothing wrong here.
> + struct pci_dev *pdev;
> int i;
>
> if (card) {
> + pdev = card->dev;
> if (card->msix_enable) {
> for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++)
> synchronize_irq(card->msix_entries[i].vector);
> --
> 1.9.1
>
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* [PATCH net,stable] cdc_mbim: apply "NDP to end" quirk to all Huawei devices
From: Bjørn Mork @ 2016-04-12 14:11 UTC (permalink / raw)
To: netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA, Andreas Fett, Andrei Burd,
Bjørn Mork
We now have a positive report of another Huawei device needing
this quirk: The ME906s-158 (12d1:15c1). This is an m.2 form
factor modem with no obvious relationship to the E3372 (12d1:157d)
we already have a quirk entry for. This is reason enough to
believe the quirk might be necessary for any number of current
and future Huawei devices.
Applying the quirk to all Huawei devices, since it is crucial
to any device affected by the firmware bug, while the impact
on non-affected devices is negligible.
The quirk can if necessary be disabled per-device by writing
N to /sys/class/net/<iface>/cdc_ncm/ndp_to_end
Reported-by: Andreas Fett <andreas.fett-opNxpl+3fjRBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Bjørn Mork <bjorn-yOkvZcmFvRU@public.gmane.org>
---
I'm requesting this for stable, but it depends on commit
f8c0cfa5eca9 ("net: cdc_mbim: add "NDP to end" quirk for Huawei E3372")
so it is only applicable to v4.3 (where the dependency is
backported), v4.4 and v4.5
Bjørn
drivers/net/usb/cdc_mbim.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/usb/cdc_mbim.c b/drivers/net/usb/cdc_mbim.c
index bdd83d95ec0a..96a5028621c8 100644
--- a/drivers/net/usb/cdc_mbim.c
+++ b/drivers/net/usb/cdc_mbim.c
@@ -617,8 +617,13 @@ static const struct usb_device_id mbim_devs[] = {
{ USB_VENDOR_AND_INTERFACE_INFO(0x0bdb, USB_CLASS_COMM, USB_CDC_SUBCLASS_MBIM, USB_CDC_PROTO_NONE),
.driver_info = (unsigned long)&cdc_mbim_info,
},
- /* Huawei E3372 fails unless NDP comes after the IP packets */
- { USB_DEVICE_AND_INTERFACE_INFO(0x12d1, 0x157d, USB_CLASS_COMM, USB_CDC_SUBCLASS_MBIM, USB_CDC_PROTO_NONE),
+
+ /* Some Huawei devices, ME906s-158 (12d1:15c1) and E3372
+ * (12d1:157d), are known to fail unless the NDP is placed
+ * after the IP packets. Applying the quirk to all Huawei
+ * devices is broader than necessary, but harmless.
+ */
+ { USB_VENDOR_AND_INTERFACE_INFO(0x12d1, USB_CLASS_COMM, USB_CDC_SUBCLASS_MBIM, USB_CDC_PROTO_NONE),
.driver_info = (unsigned long)&cdc_mbim_info_ndp_to_end,
},
/* default entry */
--
2.1.4
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* Re: [PATCH RFT 2/2] macb: kill PHY reset code
From: Nicolas Ferre @ 2016-04-12 14:45 UTC (permalink / raw)
To: Andrew Lunn; +Cc: Sergei Shtylyov, netdev, linux-kernel, Gregory CLEMENT
In-Reply-To: <20160412134001.GB29895@lunn.ch>
Le 12/04/2016 15:40, Andrew Lunn a écrit :
> On Tue, Apr 12, 2016 at 11:22:10AM +0200, Nicolas Ferre wrote:
>> Le 11/04/2016 04:28, Andrew Lunn a écrit :
>>> On Sat, Apr 09, 2016 at 01:25:03AM +0300, Sergei Shtylyov wrote:
>>>> With the 'phylib' now being aware of the "reset-gpios" PHY node property,
>>>> there should be no need to frob the PHY reset in this driver anymore...
>>>>
>>>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
>>>>
>>>> ---
>>>> drivers/net/ethernet/cadence/macb.c | 17 -----------------
>>>> drivers/net/ethernet/cadence/macb.h | 1 -
>>>> 2 files changed, 18 deletions(-)
>>>>
>>>> Index: net-next/drivers/net/ethernet/cadence/macb.c
>>>> ===================================================================
>>>> --- net-next.orig/drivers/net/ethernet/cadence/macb.c
>>>> +++ net-next/drivers/net/ethernet/cadence/macb.c
>>>> @@ -2884,7 +2884,6 @@ static int macb_probe(struct platform_de
>>>> = macb_clk_init;
>>>> int (*init)(struct platform_device *) = macb_init;
>>>> struct device_node *np = pdev->dev.of_node;
>>>> - struct device_node *phy_node;
>>>> const struct macb_config *macb_config = NULL;
>>>> struct clk *pclk, *hclk = NULL, *tx_clk = NULL;
>>>> unsigned int queue_mask, num_queues;
>>>> @@ -2977,18 +2976,6 @@ static int macb_probe(struct platform_de
>>>> else
>>>> macb_get_hwaddr(bp);
>>>>
>>>> - /* Power up the PHY if there is a GPIO reset */
>>>> - phy_node = of_get_next_available_child(np, NULL);
>>>> - if (phy_node) {
>>>> - int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
>>>> -
>>>> - if (gpio_is_valid(gpio)) {
>>>> - bp->reset_gpio = gpio_to_desc(gpio);
>>>> - gpiod_direction_output(bp->reset_gpio, 1);
>>>
>>> Hi Sergei
>>>
>>> The code you are deleting would of ignored the flags in the gpio
>> I don't parse this.
>>
>> The code deleted does take the flag into account. And the DT property
>> associated to it seems correct to me (I mean, with proper flag
>> specification).
>
> Hi Nicolas
>
> of_get_named_gpio() does not do anything with the flags. So for
> example,
>
> gpios = <&gpio0 12 GPIO_ACTIVE_LOW>;
>
> the GPIO_ACTIVE_LOW would be ignored. If you want the flags to be
> respected, you need to use the gpiod API for all calls, in particular,
> you need to use something which calls gpiod_get_index(), since that is
> the only function to call gpiod_parse_flags() to translate
> GPIO_ACTIVE_LOW into a flag within the gpio descriptor.
Ok, I remember what confused me now: this code, used to be something around:
devm_gpiod_get_optional(&bp->pdev->dev, "phy-reset", GPIOD_OUT_HIGH);
before it has been changed to the chunk above... So, yes, the DT flag
was not handled anyway...
Sorry for the noise and thanks for the clarification.
Bye,
--
Nicolas Ferre
^ permalink raw reply
* VLAN aux info for AF_PACKET available only with ETH_P_ALL
From: Peter Palúch @ 2016-04-12 14:40 UTC (permalink / raw)
To: netdev
Greetings,
I am running vanilla Linux kernel v4.4.6.
When using AF_PACKET sockets with PACKET_AUXDATA socket option to access
the VLAN TCI information of received frames, I have noticed that the
VLAN information in struct tpacket_auxdata, namely,
- tp_vlan_tci
- tp_vlan_tpid
- TP_STATUS_VLAN_VALID and TP_STATUS_VLAN_TPID_VALID flags
is filled in only when the socket is bound to htons (ETH_P_ALL). If the
socket is bound to any specific protocol, the VLAN information fields in
struct tpacket_auxdata are set to 0 even if the datagram of the specific
protocol was received in an 802.1Q-tagged frame.
Is this behavior intentional? If not, I would be honored to try to
provide a patch but I am not well-versed in kernel internals so any
guidance would be most appreciated.
Thanks!
Best regards,
Peter
^ permalink raw reply
* Re: TCP reaching to maximum throughput after a long time
From: Eric Dumazet @ 2016-04-12 14:52 UTC (permalink / raw)
To: Machani, Yaniv, netdev
Cc: David S. Miller, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
Nandita Dukkipati, open list, Kama, Meirav
In-Reply-To: <AE1C82FB3D0EC64DB1F752C81CBD110139100057@DFRE01.ent.ti.com>
On Tue, 2016-04-12 at 12:17 +0000, Machani, Yaniv wrote:
> Hi,
> After updating from Kernel 3.14 to Kernel 4.4 we have seen a TCP performance degradation over Wi-Fi.
> In 3.14 kernel, TCP got to its max throughout after less than a second, while in the 4.4 it is taking ~20-30 seconds.
> UDP TX/RX and TCP RX performance is as expected.
> We are using a Beagle Bone Black and a WiLink8 device.
>
> Were there any related changes that might cause such behavior ?
> Kernel configuration and sysctl values were compared, but no significant differences have been found.
>
> See a log of the behavior below :
> -----------------------------------------------------------
> Client connecting to 10.2.46.5, TCP port 5001
> TCP window size: 320 KByte (WARNING: requested 256 KByte)
> ------------------------------------------------------------
> [ 3] local 10.2.46.6 port 49282 connected with 10.2.46.5 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0- 1.0 sec 5.75 MBytes 48.2 Mbits/sec
> [ 3] 1.0- 2.0 sec 6.50 MBytes 54.5 Mbits/sec
> [ 3] 2.0- 3.0 sec 6.50 MBytes 54.5 Mbits/sec
> [ 3] 3.0- 4.0 sec 6.50 MBytes 54.5 Mbits/sec
> [ 3] 4.0- 5.0 sec 6.75 MBytes 56.6 Mbits/sec
> [ 3] 5.0- 6.0 sec 3.38 MBytes 28.3 Mbits/sec
> [ 3] 6.0- 7.0 sec 6.38 MBytes 53.5 Mbits/sec
> [ 3] 7.0- 8.0 sec 6.88 MBytes 57.7 Mbits/sec
> [ 3] 8.0- 9.0 sec 7.12 MBytes 59.8 Mbits/sec
> [ 3] 9.0-10.0 sec 7.12 MBytes 59.8 Mbits/sec
> [ 3] 10.0-11.0 sec 7.12 MBytes 59.8 Mbits/sec
> [ 3] 11.0-12.0 sec 7.25 MBytes 60.8 Mbits/sec
> [ 3] 12.0-13.0 sec 7.12 MBytes 59.8 Mbits/sec
> [ 3] 13.0-14.0 sec 7.25 MBytes 60.8 Mbits/sec
> [ 3] 14.0-15.0 sec 7.62 MBytes 64.0 Mbits/sec
> [ 3] 15.0-16.0 sec 7.88 MBytes 66.1 Mbits/sec
> [ 3] 16.0-17.0 sec 8.12 MBytes 68.2 Mbits/sec
> [ 3] 17.0-18.0 sec 8.25 MBytes 69.2 Mbits/sec
> [ 3] 18.0-19.0 sec 8.50 MBytes 71.3 Mbits/sec
> [ 3] 19.0-20.0 sec 8.88 MBytes 74.4 Mbits/sec
> [ 3] 20.0-21.0 sec 8.75 MBytes 73.4 Mbits/sec
> [ 3] 21.0-22.0 sec 8.62 MBytes 72.4 Mbits/sec
> [ 3] 22.0-23.0 sec 8.75 MBytes 73.4 Mbits/sec
> [ 3] 23.0-24.0 sec 8.50 MBytes 71.3 Mbits/sec
> [ 3] 24.0-25.0 sec 8.62 MBytes 72.4 Mbits/sec
> [ 3] 25.0-26.0 sec 8.62 MBytes 72.4 Mbits/sec
> [ 3] 26.0-27.0 sec 8.62 MBytes 72.4 Mbits/sec
>
CC netdev, where this is better discussed.
This could be a lot of different factors, and caused by a sender
problem, a receiver problem, ...
TCP behavior depends on the drivers, so maybe a change there can explain
this.
You could capture the first 5000 frames of the flow and post the pcap ?
(-s 128 to capture only the headers)
tcpdump -p -s 128 -i eth0 -c 5000 host 10.2.46.5 -w flow.pcap
Also, while test is running, you could fetch
ss -temoi dst 10.2.46.5:5001
^ permalink raw reply
* Re: [PATCH RFT 2/2] macb: kill PHY reset code
From: Nicolas Ferre @ 2016-04-12 14:57 UTC (permalink / raw)
To: Sergei Shtylyov, Andrew Lunn; +Cc: netdev, linux-kernel, Gregory CLEMENT
In-Reply-To: <570CFE13.3040100@cogentembedded.com>
Le 12/04/2016 15:54, Sergei Shtylyov a écrit :
> Hello.
>
> On 4/12/2016 12:22 PM, Nicolas Ferre wrote:
>
>>>> With the 'phylib' now being aware of the "reset-gpios" PHY node property,
>>>> there should be no need to frob the PHY reset in this driver anymore...
>>>>
>>>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
>>>>
>>>> ---
>>>> drivers/net/ethernet/cadence/macb.c | 17 -----------------
>>>> drivers/net/ethernet/cadence/macb.h | 1 -
>>>> 2 files changed, 18 deletions(-)
>>>>
>>>> Index: net-next/drivers/net/ethernet/cadence/macb.c
>>>> ===================================================================
>>>> --- net-next.orig/drivers/net/ethernet/cadence/macb.c
>>>> +++ net-next/drivers/net/ethernet/cadence/macb.c
> [...]
>>>> @@ -2977,18 +2976,6 @@ static int macb_probe(struct platform_de
>>>> else
>>>> macb_get_hwaddr(bp);
>>>>
>>>> - /* Power up the PHY if there is a GPIO reset */
>>>> - phy_node = of_get_next_available_child(np, NULL);
>>>> - if (phy_node) {
>>>> - int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
>>>> -
>>>> - if (gpio_is_valid(gpio)) {
>>>> - bp->reset_gpio = gpio_to_desc(gpio);
>>>> - gpiod_direction_output(bp->reset_gpio, 1);
>>>
>>> Hi Sergei
>>>
>>> The code you are deleting would of ignored the flags in the gpio
>>
>> I don't parse this.
>
>> The code deleted does take the flag into account.
>
> Not really -- you need to call of_get_named_gpio_flags() (with a valid
> last argument) for that.
Yep,
>> And the DT property
>> associated to it seems correct to me (I mean, with proper flag
>> specification).
>
> It apparently is not as it have GPIO_ACTIVE_HIGH and the driver assumes
> active-low reset signal.
Yes, logic was inverted and... anyway, the flag never used for real...
Thanks Sergei.
No problem for me accepting a patch for the at91-vinco.dts.
Bye,
--
Nicolas Ferre
^ permalink raw reply
* Re: TCP reaching to maximum throughput after a long time
From: Ben Greear @ 2016-04-12 15:04 UTC (permalink / raw)
To: Machani, Yaniv, netdev
Cc: Eric Dumazet, David S. Miller, Eric Dumazet, Neal Cardwell,
Yuchung Cheng, Nandita Dukkipati, open list, Kama, Meirav
In-Reply-To: <1460472764.6473.589.camel@edumazet-glaptop3.roam.corp.google.com>
On 04/12/2016 07:52 AM, Eric Dumazet wrote:
> On Tue, 2016-04-12 at 12:17 +0000, Machani, Yaniv wrote:
>> Hi,
>> After updating from Kernel 3.14 to Kernel 4.4 we have seen a TCP performance degradation over Wi-Fi.
>> In 3.14 kernel, TCP got to its max throughout after less than a second, while in the 4.4 it is taking ~20-30 seconds.
>> UDP TX/RX and TCP RX performance is as expected.
>> We are using a Beagle Bone Black and a WiLink8 device.
>>
>> Were there any related changes that might cause such behavior ?
>> Kernel configuration and sysctl values were compared, but no significant differences have been found.
If you are using 'Cubic' TCP congestion control, then please try something different.
It was broken last I checked, at least when used with the ath10k driver.
https://marc.info/?l=linux-netdev&m=144405216005715&w=2
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox