* Re: [PATCH net-next RESEND 1/1] driver: ipvlan: Remove unnecessary ipvlan NULL check in ipvlan_count_rx
From: David Miller @ 2016-12-28 19:24 UTC
To: fgao; +Cc: maheshb, edumazet, netdev, gfree.wind
In-Reply-To: <1482914862-2793-1-git-send-email-fgao@ikuai8.com>
From: fgao@ikuai8.com
Date: Wed, 28 Dec 2016 16:47:42 +0800
> From: Gao Feng <fgao@ikuai8.com>
>
> Three functions invoke ipvlan_count_rx: ipvlan_process_multicast,
> ipvlan_rcv_frame, and ipvlan_nf_input. The first two already use
> ipvlan directly before calling ipvlan_count_rx, and ipvlan_nf_input
> gets ipvlan from ipvl_addr->master, which cannot be NULL either.
> So the ipvlan NULL check in ipvlan_count_rx is unnecessary.
>
> Signed-off-by: Gao Feng <fgao@ikuai8.com>
Applied.
* Re: [PATCH net-next] sctp: add pr_debug for tracking asocs not found
From: David Miller @ 2016-12-28 19:26 UTC
To: marcelo.leitner; +Cc: netdev, linux-sctp, vyasevich, nhorman, lucien.xin
In-Reply-To: <2e7482596bbf75efb56e313e33179f9e1c0e6996.1482924698.git.marcelo.leitner@gmail.com>
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Wed, 28 Dec 2016 09:51:56 -0200
> This pr_debug may help identify why the system is generating some
> Aborts. It's not something a sysadmin would be expected to use.
>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Applied.
* Re: [PATCH net] net/sched: cls_flower: Fix missing addr_type in classify
From: David Miller @ 2016-12-28 19:29 UTC
To: paulb; +Cc: netdev, jiri, hadarh, ogerlitz, roid
In-Reply-To: <1482929687-20159-1-git-send-email-paulb@mellanox.com>
From: Paul Blakey <paulb@mellanox.com>
Date: Wed, 28 Dec 2016 14:54:47 +0200
> Since we now use a non-zero mask on addr_type, we are matching on its
> value (IPV4/IPV6). So before this fix, matching on enc_src_ip/enc_dst_ip
> failed in the SW/classify path because addr_type was zero.
> This patch sets the proper value of addr_type for encapsulated packets.
>
> Fixes: 970bfcd09791 ('net/sched: cls_flower: Use mask for addr_type')
> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Applied and queued up for -stable, thanks.
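To make the failure mode concrete, here is a minimal userspace sketch of masked key matching. The struct flow_key type, its fields, and the address-type codes are hypothetical simplifications, not the real cls_flower key layout. With an all-ones mask on addr_type, a packet key whose addr_type was left at zero can never equal a filter that specifies IPv4 or IPv6, which is why enc_src_ip/enc_dst_ip matches failed in the software path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified flow key holding only the fields needed to show
 * the effect of a non-zero addr_type mask; this is NOT struct fl_flow_key. */
struct flow_key {
	uint16_t addr_type;	/* illustrative codes: 1 = IPv4, 2 = IPv6 */
	uint32_t enc_src_ip;	/* outer (tunnel) source address */
};

/* A packet matches a filter only if every masked field compares equal. */
static bool masked_match(const struct flow_key *pkt,
			 const struct flow_key *filter,
			 const struct flow_key *mask)
{
	return (pkt->addr_type & mask->addr_type) ==
	       (filter->addr_type & mask->addr_type) &&
	       (pkt->enc_src_ip & mask->enc_src_ip) ==
	       (filter->enc_src_ip & mask->enc_src_ip);
}
```

With an all-ones mask and a filter whose addr_type is 1 (IPv4 in this model), a packet key built with addr_type 0 misses, while the same key with addr_type 1 matches.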
* Re: [PATCH net 00/12] Mellanox 100G mlx5 fixes 28-12-2016
From: David Miller @ 2016-12-28 19:38 UTC
To: saeedm; +Cc: netdev
In-Reply-To: <1482929922-32626-1-git-send-email-saeedm@mellanox.com>
From: Saeed Mahameed <saeedm@mellanox.com>
Date: Wed, 28 Dec 2016 14:58:30 +0200
> Some fixes for mlx5 core and ethernet driver.
>
> for -stable:
> net/mlx5: Check FW limitations on log_max_qp before setting it
> net/mlx5: Cancel recovery work in remove flow
> net/mlx5: Avoid shadowing numa_node
> net/mlx5: Mask destination mac value in ethtool steering rules
> net/mlx5: Prevent setting multicast macs for VFs
> net/mlx5e: Don't sync netdev state when not registered
> net/mlx5e: Disable netdev after close
Series applied, and the patches from the list above have been queued up
for -stable.
Thanks!
* Re: [PATCH] net: avoid put_cmsg() possible copy longer data than input
From: David Miller @ 2016-12-28 19:48 UTC
To: cugyly; +Cc: netdev, Linyu.Yuan
In-Reply-To: <1482935663-3428-1-git-send-email-cugyly@163.com>
From: yuan linyu <cugyly@163.com>
Date: Wed, 28 Dec 2016 22:34:23 +0800
> From: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
>
> if CMSG_ALIGN(sizeof(struct cmsghdr)) > sizeof(struct cmsghdr),
> the original (cmlen - sizeof(struct cmsghdr)) may be greater than
> the input len.
You are doing a lot of unrelated cleanups in this change. This
makes it hard to review.
The important parts of the fix seem to be the added checks to make
sure that we don't access the CMSG_DATA() unless we have more than
CMSG_ALIGN(sizeof(struct cmsghdr)) bytes.
I think you can fix that with a few one-line tests rather than
restructuring all of the CMSG_*() macros.
Also:
> @@ -223,7 +223,7 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
> if (MSG_CMSG_COMPAT & msg->msg_flags)
> return put_cmsg_compat(msg, level, type, len, data);
>
> - if (cm==NULL || msg->msg_controllen < sizeof(*cm)) {
> + if (cm == NULL || msg->msg_controllen < sizeof(*cm)) {
> msg->msg_flags |= MSG_CTRUNC;
> return 0; /* XXX: return error? check spec. */
> }
This is a coding style fix unrelated to the purpose of this change.
Thanks.
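A minimal userspace model of the length arithmetic under discussion, assuming a hypothetical 12-byte header that CMSG_ALIGN()-style rounding pads to 16 bytes (the rounding is modeled by a hypothetical ALIGN8() macro with a fixed 8-byte boundary). It contrasts subtracting the unaligned header size with the kind of small check suggested above: do not touch the data area unless the buffer holds more than the aligned header. Neither function is the kernel's put_cmsg(); both are illustrative.

```c
#include <stddef.h>

/* Hypothetical stand-in for CMSG_ALIGN(): round up to a fixed 8-byte boundary.
 * The real macro aligns to the platform word size. */
#define ALIGN8(len) (((len) + 7u) & ~(size_t)7u)

/* Pre-fix style (model): the payload length is derived by subtracting the
 * UNaligned header size, so it can exceed 'len'. Assumes space >= hdr. */
static size_t copy_len_unaligned(size_t space, size_t hdr, size_t len)
{
	size_t cmlen = ALIGN8(hdr) + len;	/* like CMSG_LEN(len) */

	if (cmlen > space)
		cmlen = space;
	return cmlen - hdr;
}

/* Checked style (model): require room beyond the aligned header before
 * touching the data area, and subtract the aligned header size. */
static size_t copy_len_checked(size_t space, size_t hdr, size_t len)
{
	size_t hdr_aligned = ALIGN8(hdr);
	size_t cmlen = hdr_aligned + len;

	if (space <= hdr_aligned)
		return 0;	/* no room for data: caller would set MSG_CTRUNC */
	if (cmlen > space)
		cmlen = space;
	return cmlen - hdr_aligned;
}
```

With space = 100, hdr = 12 and len = 4, copy_len_unaligned() returns 8 (twice the payload) while copy_len_checked() returns 4; with space = 14 the checked version returns 0 instead of writing past the buffer.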
* Re: [PATCH] net: atm: Fix warnings in net/atm/lec.c when !CONFIG_PROC_FS
From: David Miller @ 2016-12-28 20:11 UTC
To: augustocaringi
Cc: netdev, felipe.balbi, keescook, mugunthanvnm, jarod, javier, fw,
linux-kernel
In-Reply-To: <1482941006-28052-1-git-send-email-augustocaringi@gmail.com>
From: Augusto Mecking Caringi <augustocaringi@gmail.com>
Date: Wed, 28 Dec 2016 16:02:05 +0000
> This patch fixes the following warnings when CONFIG_PROC_FS is not set:
>
> linux/net/atm/lec.c: In function ‘lane_module_cleanup’:
> linux/net/atm/lec.c:1062:27: error: ‘atm_proc_root’ undeclared (first
> use in this function)
> remove_proc_entry("lec", atm_proc_root);
> ^
> linux/net/atm/lec.c:1062:27: note: each undeclared identifier is
> reported only once for each function it appears in
>
> Signed-off-by: Augusto Mecking Caringi <augustocaringi@gmail.com>
Applied, thank you.
* Re: [PATCH] net: wan: slic_ds26522: fix spelling mistake: "configurated" -> "configured"
From: David Miller @ 2016-12-28 20:12 UTC
To: colin.king; +Cc: javier, qiang.zhao, netdev, linux-kernel
In-Reply-To: <20161228164423.19070-1-colin.king@canonical.com>
From: Colin King <colin.king@canonical.com>
Date: Wed, 28 Dec 2016 16:44:23 +0000
> From: Colin Ian King <colin.king@canonical.com>
>
> trivial fix to spelling mistake in pr_info message
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
Applied, thanks.
* Re: [PATCHv2 net-next 00/16] net: mvpp2: add basic support for PPv2.2
From: Thomas Petazzoni @ 2016-12-28 21:08 UTC
To: David Miller
Cc: netdev, devicetree, robh+dt, ijc+devicetree, pawel.moll,
mark.rutland, galak, jason, andrew, sebastian.hesselbarth,
gregory.clement, nadavh, hannah, yehuday, linux-arm-kernel,
stefanc, mw
In-Reply-To: <20161228.120644.1166014191192724301.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
Hello,
On Wed, 28 Dec 2016 12:06:44 -0500 (EST), David Miller wrote:
> > This series depends on the series named "net: mvpp2: misc improvements
> > and preparation patches".
>
> Please in the future only submit one patch series at a time.
>
> If I've told you that a large patch series is hard to review and that
> therefore one should keep each submitted series small and to a
> reasonable size, that is completely undermined when you submit
> multiple series to work around that request.
Sure. I'll wait for the first patch series to be merged (potentially
after several iterations) before resending the second patch series.
Thanks for the feedback!
Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* [PATCH] Bluetooth: fix spelling mistake: "advetising" -> "advertising"
From: Colin King @ 2016-12-28 21:17 UTC
To: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, David S . Miller,
linux-bluetooth, netdev
Cc: linux-kernel
From: Colin Ian King <colin.king@canonical.com>
trivial fix to spelling mistake in BT_ERR_RATELIMITED error message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
net/bluetooth/hci_event.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index e17aacb..0b4dba0 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -4749,7 +4749,7 @@ static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
case LE_ADV_SCAN_RSP:
break;
default:
- BT_ERR_RATELIMITED("Unknown advetising packet type: 0x%02x",
+ BT_ERR_RATELIMITED("Unknown advertising packet type: 0x%02x",
type);
return;
}
--
2.10.2
* Re: [RFC PATCH] i40e: enable PCIe relax ordering for SPARC
From: tndave @ 2016-12-28 21:55 UTC
To: maowenan, jeffrey.t.kirsher@intel.com,
intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, weiyongjun (A), Dingtianhong
In-Reply-To: <F95AC9340317A84688A5F0DF0246F3F201521164@szxeml504-mbs.china.huawei.com>
On 12/27/2016 04:40 PM, maowenan wrote:
>
>
>> -----Original Message-----
>> From: tndave [mailto:tushar.n.dave@oracle.com]
>> Sent: Wednesday, December 28, 2016 6:28 AM
>> To: maowenan; jeffrey.t.kirsher@intel.com; intel-wired-lan@lists.osuosl.org
>> Cc: netdev@vger.kernel.org; weiyongjun (A); Dingtianhong
>> Subject: Re: [RFC PATCH] i40e: enable PCIe relax ordering for SPARC
>>
>>
>>
>> On 12/26/2016 03:39 AM, maowenan wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: netdev-owner@vger.kernel.org
>>>> [mailto:netdev-owner@vger.kernel.org]
>>>> On Behalf Of Tushar Dave
>>>> Sent: Tuesday, December 06, 2016 1:07 AM
>>>> To: jeffrey.t.kirsher@intel.com; intel-wired-lan@lists.osuosl.org
>>>> Cc: netdev@vger.kernel.org
>>>> Subject: [RFC PATCH] i40e: enable PCIe relax ordering for SPARC
>>>>
>>>> Unlike the previous-generation NIC (e.g. ixgbe), i40e doesn't seem to have a
>>>> standard CSR where PCIe relaxed ordering can be set. Without PCIe
>>>> relaxed ordering enabled, i40e performance is significantly lower on SPARC.
>>>>
>>> [Mao Wenan] Hi Tushar, you mentioned that i40e doesn't seem to have a
>>> standard CSR to set PCIe relaxed ordering; this CSR is like the TX & RX
>>> DCA Control Register in 82599, right?
>> Yes.
>> The i40e datasheet mentions some CSRs that can be used to enable/disable
>> PCIe relaxed ordering in the device; however, I don't see the exact
>> definition of those registers in the datasheet.
>> (https://www.mail-archive.com/netdev@vger.kernel.org/msg117219.html).
>>
>>> Is DMA_ATTR_WEAK_ORDERING the same as the TX & RX control register in 82599?
>> No.
>> DMA_ATTR_WEAK_ORDERING applies to the PCIe root complex of the system.
>>
>> -Tushar
>
> I understand that the PCIe Root Complex is the Host Bridge in the CPU that
> connects the CPU and memory to the PCIe architecture. So this attribute,
> DMA_ATTR_WEAK_ORDERING, only applies on the CPU side (the SPARC in your
> system) and can't apply to i40e, is that right?
Yes.
> And it is not the same as the 82599 DCA control register's relaxed ordering bits.
It is not the same as the 82599 DCA control register's relaxed ordering bits.
-Tushar
> -Mao Wenan
>
>>>
>>> And relaxed ordering is left enabled for SPARC in 82599 by the code below:
>>> s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
>>> {
>>> u32 i;
>>>
>>> /* Clear the rate limiters */
>>> for (i = 0; i < hw->mac.max_tx_queues; i++) {
>>> IXGBE_WRITE_REG(hw, IXGBE_RTTDQSEL, i);
>>> IXGBE_WRITE_REG(hw, IXGBE_RTTBCNRC, 0);
>>> }
>>> IXGBE_WRITE_FLUSH(hw);
>>>
>>> #ifndef CONFIG_SPARC
>>> /* Disable relaxed ordering */
>>> for (i = 0; i < hw->mac.max_tx_queues; i++) {
>>> u32 regval;
>>>
>>> regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(i));
>>> regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
>>> IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(i), regval);
>>> }
>>>
>>> for (i = 0; i < hw->mac.max_rx_queues; i++) {
>>> u32 regval;
>>>
>>> regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
>>> regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
>>> IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
>>> IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
>>> }
>>> #endif
>>> return 0;
>>> }
>>>
>>>
>>>
>>>> This patch enables PCIe relaxed ordering on SPARC by setting the DMA
>>>> attribute DMA_ATTR_WEAK_ORDERING for every Tx and Rx DMA map/unmap.
>>>> This has shown a more than 10x performance increase (2.33 Gbits/sec
>>>> to 31.6 Gbits/sec in the iperf test below).
>>>>
>>>> e.g.
>>>> iperf TCP test with 10 threads on SPARC S7
>>>>
>>>> Test 1: Without this patch
>>>>
>>>> [root@brm-snt1-03 net]# iperf -s
>>>> ------------------------------------------------------------
>>>> Server listening on TCP port 5001
>>>> TCP window size: 85.3 KByte (default)
>>>> ------------------------------------------------------------
>>>> [ 4] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40926
>>>> [ 5] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40934
>>>> [ 6] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40930
>>>> [ 7] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40928
>>>> [ 8] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40922
>>>> [ 9] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40932
>>>> [ 10] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40920
>>>> [ 11] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40924
>>>> [ 14] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40982
>>>> [ 12] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40980
>>>> [ ID] Interval Transfer Bandwidth
>>>> [ 4] 0.0-20.0 sec 566 MBytes 237 Mbits/sec
>>>> [ 5] 0.0-20.0 sec 532 MBytes 223 Mbits/sec
>>>> [ 6] 0.0-20.0 sec 537 MBytes 225 Mbits/sec
>>>> [ 8] 0.0-20.0 sec 546 MBytes 229 Mbits/sec
>>>> [ 11] 0.0-20.0 sec 592 MBytes 248 Mbits/sec
>>>> [ 7] 0.0-20.0 sec 539 MBytes 226 Mbits/sec
>>>> [ 9] 0.0-20.0 sec 572 MBytes 240 Mbits/sec
>>>> [ 10] 0.0-20.0 sec 604 MBytes 253 Mbits/sec
>>>> [ 14] 0.0-20.0 sec 567 MBytes 238 Mbits/sec
>>>> [ 12] 0.0-20.0 sec 511 MBytes 214 Mbits/sec
>>>> [SUM] 0.0-20.0 sec 5.44 GBytes 2.33 Gbits/sec
>>>>
>>>> Test 2: with this patch:
>>>>
>>>> [root@brm-snt1-03 net]# iperf -s
>>>> ------------------------------------------------------------
>>>> Server listening on TCP port 5001
>>>> TCP window size: 85.3 KByte (default)
>>>> ------------------------------------------------------------
>>>> TCP: request_sock_TCP: Possible SYN flooding on port 5001. Sending cookies.  Check SNMP counters.
>>>> [ 4] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46876
>>>> [ 5] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46874
>>>> [ 6] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46872
>>>> [ 7] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46880
>>>> [ 8] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46878
>>>> [ 9] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46884
>>>> [ 10] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46886
>>>> [ 11] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46890
>>>> [ 12] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46888
>>>> [ 13] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46882
>>>> [ ID] Interval Transfer Bandwidth
>>>> [ 4] 0.0-20.0 sec 7.45 GBytes 3.19 Gbits/sec
>>>> [ 5] 0.0-20.0 sec 7.48 GBytes 3.21 Gbits/sec
>>>> [ 7] 0.0-20.0 sec 7.34 GBytes 3.15 Gbits/sec
>>>> [ 8] 0.0-20.0 sec 7.42 GBytes 3.18 Gbits/sec
>>>> [ 9] 0.0-20.0 sec 7.24 GBytes 3.11 Gbits/sec
>>>> [ 10] 0.0-20.0 sec 7.40 GBytes 3.17 Gbits/sec
>>>> [ 12] 0.0-20.0 sec 7.49 GBytes 3.21 Gbits/sec
>>>> [ 6] 0.0-20.0 sec 7.30 GBytes 3.13 Gbits/sec
>>>> [ 11] 0.0-20.0 sec 7.44 GBytes 3.19 Gbits/sec
>>>> [ 13] 0.0-20.0 sec 7.22 GBytes 3.10 Gbits/sec
>>>> [SUM] 0.0-20.0 sec 73.8 GBytes 31.6 Gbits/sec
>>>>
>>>> NOTE: In my testing, this patch does _not_ show any harm to i40e
>>>> performance numbers on x86.
>>>>
>>>> Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
>>>> ---
>>>> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 69 ++++++++++++++++++++---------
>>>> drivers/net/ethernet/intel/i40e/i40e_txrx.h |  1 +
>>>> 2 files changed, 49 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>>>> index 6287bf6..800dca7 100644
>>>> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>>>> @@ -551,15 +551,17 @@ static void i40e_unmap_and_free_tx_resource(struct i40e_ring *ring,
>>>> else
>>>> dev_kfree_skb_any(tx_buffer->skb);
>>>> if (dma_unmap_len(tx_buffer, len))
>>>> - dma_unmap_single(ring->dev,
>>>> - dma_unmap_addr(tx_buffer, dma),
>>>> - dma_unmap_len(tx_buffer, len),
>>>> - DMA_TO_DEVICE);
>>>> + dma_unmap_single_attrs(ring->dev,
>>>> + dma_unmap_addr(tx_buffer, dma),
>>>> + dma_unmap_len(tx_buffer, len),
>>>> + DMA_TO_DEVICE,
>>>> + ring->dma_attrs);
>>>> } else if (dma_unmap_len(tx_buffer, len)) {
>>>> - dma_unmap_page(ring->dev,
>>>> - dma_unmap_addr(tx_buffer, dma),
>>>> - dma_unmap_len(tx_buffer, len),
>>>> - DMA_TO_DEVICE);
>>>> + dma_unmap_single_attrs(ring->dev,
>>>> + dma_unmap_addr(tx_buffer, dma),
>>>> + dma_unmap_len(tx_buffer, len),
>>>> + DMA_TO_DEVICE,
>>>> + ring->dma_attrs);
>>>> }
>>>>
>>>> tx_buffer->next_to_watch = NULL;
>>>> @@ -662,6 +664,8 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>>>> struct i40e_tx_buffer *tx_buf;
>>>> struct i40e_tx_desc *tx_head;
>>>> struct i40e_tx_desc *tx_desc;
>>>> + dma_addr_t addr;
>>>> + size_t size;
>>>> unsigned int total_bytes = 0, total_packets = 0;
>>>> unsigned int budget = vsi->work_limit;
>>>>
>>>> @@ -696,10 +700,11 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>>>> napi_consume_skb(tx_buf->skb, napi_budget);
>>>>
>>>> /* unmap skb header data */
>>>> - dma_unmap_single(tx_ring->dev,
>>>> - dma_unmap_addr(tx_buf, dma),
>>>> - dma_unmap_len(tx_buf, len),
>>>> - DMA_TO_DEVICE);
>>>> + dma_unmap_single_attrs(tx_ring->dev,
>>>> + dma_unmap_addr(tx_buf, dma),
>>>> + dma_unmap_len(tx_buf, len),
>>>> + DMA_TO_DEVICE,
>>>> + tx_ring->dma_attrs);
>>>>
>>>> /* clear tx_buffer data */
>>>> tx_buf->skb = NULL;
>>>> @@ -717,12 +722,15 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>>>> tx_desc = I40E_TX_DESC(tx_ring, 0);
>>>> }
>>>>
>>>> + addr = dma_unmap_addr(tx_buf, dma);
>>>> + size = dma_unmap_len(tx_buf, len);
>>>> /* unmap any remaining paged data */
>>>> if (dma_unmap_len(tx_buf, len)) {
>>>> - dma_unmap_page(tx_ring->dev,
>>>> - dma_unmap_addr(tx_buf, dma),
>>>> - dma_unmap_len(tx_buf, len),
>>>> - DMA_TO_DEVICE);
>>>> + dma_unmap_single_attrs(tx_ring->dev,
>>>> + addr,
>>>> + size,
>>>> + DMA_TO_DEVICE,
>>>> + tx_ring->dma_attrs);
>>>> dma_unmap_len_set(tx_buf, len, 0);
>>>> }
>>>> }
>>>> @@ -1010,6 +1018,11 @@ int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring)
>>>> */
>>>> tx_ring->size += sizeof(u32);
>>>> tx_ring->size = ALIGN(tx_ring->size, 4096);
>>>> +#ifdef CONFIG_SPARC
>>>> + tx_ring->dma_attrs = DMA_ATTR_WEAK_ORDERING;
>>>> +#else
>>>> + tx_ring->dma_attrs = 0;
>>>> +#endif
>>>> tx_ring->desc = dma_alloc_coherent(dev, tx_ring->size,
>>>> &tx_ring->dma, GFP_KERNEL);
>>>> if (!tx_ring->desc) {
>>>> @@ -1053,7 +1066,11 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
>>>> if (!rx_bi->page)
>>>> continue;
>>>>
>>>> - dma_unmap_page(dev, rx_bi->dma, PAGE_SIZE, DMA_FROM_DEVICE);
>>>> + dma_unmap_single_attrs(dev,
>>>> + rx_bi->dma,
>>>> + PAGE_SIZE,
>>>> + DMA_FROM_DEVICE,
>>>> + rx_ring->dma_attrs);
>>>> __free_pages(rx_bi->page, 0);
>>>>
>>>> rx_bi->page = NULL;
>>>> @@ -1113,6 +1130,11 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
>>>> /* Round up to nearest 4K */
>>>> rx_ring->size = rx_ring->count * sizeof(union i40e_32byte_rx_desc);
>>>> rx_ring->size = ALIGN(rx_ring->size, 4096);
>>>> +#ifdef CONFIG_SPARC
>>>> + rx_ring->dma_attrs = DMA_ATTR_WEAK_ORDERING;
>>>> +#else
>>>> + rx_ring->dma_attrs = 0;
>>>> +#endif
>>>> rx_ring->desc = dma_alloc_coherent(dev, rx_ring->size,
>>>> &rx_ring->dma, GFP_KERNEL);
>>>>
>>>> @@ -1182,7 +1204,8 @@ static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
>>>> }
>>>>
>>>> /* map page for use */
>>>> - dma = dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
>>>> + dma = dma_map_single_attrs(rx_ring->dev, page_address(page), PAGE_SIZE,
>>>> + DMA_FROM_DEVICE, rx_ring->dma_attrs);
>>>>
>>>> /* if mapping failed free memory back to system since
>>>> * there isn't much point in holding memory we can't use
>>>> @@ -1695,8 +1718,11 @@ struct sk_buff *i40e_fetch_rx_buffer(struct i40e_ring *rx_ring,
>>>> rx_ring->rx_stats.page_reuse_count++;
>>>> } else {
>>>> /* we are not reusing the buffer so unmap it */
>>>> - dma_unmap_page(rx_ring->dev, rx_buffer->dma, PAGE_SIZE,
>>>> - DMA_FROM_DEVICE);
>>>> + dma_unmap_single_attrs(rx_ring->dev,
>>>> + rx_buffer->dma,
>>>> + PAGE_SIZE,
>>>> + DMA_FROM_DEVICE,
>>>> + rx_ring->dma_attrs);
>>>> }
>>>>
>>>> /* clear contents of buffer_info */
>>>> @@ -2737,7 +2763,8 @@ static inline void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
>>>> first->skb = skb;
>>>> first->tx_flags = tx_flags;
>>>>
>>>> - dma = dma_map_single(tx_ring->dev, skb->data, size, DMA_TO_DEVICE);
>>>> + dma = dma_map_single_attrs(tx_ring->dev, skb->data, size,
>>>> + DMA_TO_DEVICE, tx_ring->dma_attrs);
>>>>
>>>> tx_desc = I40E_TX_DESC(tx_ring, i);
>>>> tx_bi = first;
>>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
>>>> index 5088405..9a86212 100644
>>>> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
>>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
>>>> @@ -327,6 +327,7 @@ struct i40e_ring {
>>>>
>>>> unsigned int size; /* length of descriptor ring in bytes */
>>>> dma_addr_t dma; /* physical address of ring */
>>>> + unsigned long dma_attrs; /* DMA attributes */
>>>>
>>>> struct i40e_vsi *vsi; /* Backreference to associated VSI */
>>>> struct i40e_q_vector *q_vector; /* Backreference to associated vector */
>>>> --
>>>> 1.9.1
>>>
>>>
>
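The core of the RFC is a design choice: pick the DMA attributes once per ring at setup time and pass ring->dma_attrs to every map and unmap call, so the call sites always agree. A minimal userspace model of that idea follows. mock_ring, mock_map_single_attrs() and MOCK_DMA_ATTR_WEAK_ORDERING are hypothetical stand-ins (the flag value is illustrative), and a runtime is_sparc flag replaces the compile-time CONFIG_SPARC test only so that both branches can be exercised.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DMA_ATTR_WEAK_ORDERING; the value is illustrative.
 * In a kernel build the real flag comes from <linux/dma-mapping.h>. */
#define MOCK_DMA_ATTR_WEAK_ORDERING (1UL << 1)

/* Hypothetical, trimmed-down ring carrying only the field the RFC adds. */
struct mock_ring {
	unsigned long dma_attrs;
};

/* Choose the attributes once, at ring setup time. A runtime flag replaces the
 * RFC's #ifdef CONFIG_SPARC here only so both branches can be exercised. */
static void mock_ring_setup(struct mock_ring *ring, int is_sparc)
{
	ring->dma_attrs = is_sparc ? MOCK_DMA_ATTR_WEAK_ORDERING : 0;
}

/* Hypothetical mapping call: records the attrs it received so the model can
 * be checked; a driver would call dma_map_single_attrs() instead. */
static unsigned long last_map_attrs;

static uintptr_t mock_map_single_attrs(const void *cpu_addr, size_t size,
				       unsigned long attrs)
{
	(void)size;
	last_map_attrs = attrs;
	return (uintptr_t)cpu_addr;	/* identity "bus address" in this model */
}

/* Every map (and, in the driver, the matching unmap) passes ring->dma_attrs,
 * so the attributes cannot drift between call sites. */
static uintptr_t mock_ring_map(const struct mock_ring *ring, const void *buf,
			       size_t size)
{
	return mock_map_single_attrs(buf, size, ring->dma_attrs);
}
```

Keeping the choice in one field means a later change of policy (for example, a different architecture test) touches only the setup functions, not the dozen map/unmap sites in the hot path.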
* [PATCH] net: ethernet: ti: davinci_cpdma: fix access to uninitialized variable in cpdma_chan_set_descs()
From: Grygorii Strashko @ 2016-12-28 23:42 UTC
To: David S. Miller, netdev, Mugunthan V N
Cc: Sekhar Nori, linux-kernel, linux-omap, Ivan Khoronzhuk,
Grygorii Strashko
Currently, the code sequence below causes an "Unable to handle kernel NULL
pointer dereference.." exception and a system crash during CPSW CPDMA initialization:
cpsw_probe
|-cpdma_chan_create (TX channel)
|-cpdma_chan_split_pool
|-cpdma_chan_set_descs(for TX channels)
|-cpdma_chan_set_descs(for RX channels) [1]
- and -
static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
int rx, int desc_num,
int per_ch_desc)
{
struct cpdma_chan *chan, *most_chan = NULL;
...
for (i = min; i < max; i++) {
chan = ctlr->channels[i];
if (!chan)
continue;
...
if (most_dnum < chan->desc_num) {
most_dnum = chan->desc_num;
most_chan = chan;
}
}
/* use remains */
most_chan->desc_num += desc_cnt; [2]
}
So most_chan is never assigned when cpdma_chan_set_descs() is called
the second time [1], because there are no RX channels yet, and the system
will crash at [2].
Hence, fix the issue by checking most_chan for NULL before accessing it.
Fixes: 0fc6432cc78d ("net: ethernet: ti: davinci_cpdma: add weight function for channels")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
drivers/net/ethernet/ti/davinci_cpdma.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 36518fc..b349d572 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -708,7 +708,8 @@ static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
}
}
/* use remains */
- most_chan->desc_num += desc_cnt;
+ if (most_chan)
+ most_chan->desc_num += desc_cnt;
}
/**
--
2.10.1.dirty
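The guard added by this patch can be modeled in userspace. In this sketch, struct chan and set_descs_remainder() are hypothetical simplifications of cpdma_chan_set_descs() that keep only the "largest channel receives the leftover descriptors" step and omit the per-channel weight computation; NULL slots stand for channels that have not been created yet, as with the RX channels in the call sequence above.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down channel: only the descriptor count matters here. */
struct chan {
	int desc_num;
};

/*
 * Simplified model of the tail of cpdma_chan_set_descs(): find the channel
 * with the most descriptors and give it the leftover 'desc_cnt'.
 * 'channels' may hold NULL slots for channels that do not exist yet.
 */
static void set_descs_remainder(struct chan **channels, int n, int desc_cnt)
{
	struct chan *most_chan = NULL;
	int most_dnum = 0;
	int i;

	for (i = 0; i < n; i++) {
		struct chan *chan = channels[i];

		if (!chan)
			continue;
		if (most_dnum < chan->desc_num) {
			most_dnum = chan->desc_num;
			most_chan = chan;
		}
	}

	/* use remains, but only if some channel was found (the fix) */
	if (most_chan)
		most_chan->desc_num += desc_cnt;
}
```

Called with an array of only NULL slots, the function now returns without touching anything; with channels of 10 and 20 descriptors and 5 left over, the 20-descriptor channel ends up with 25.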
* [PATCH net] net: stmmac: Fix error path after register_netdev move
From: Florian Fainelli @ 2016-12-28 23:44 UTC
To: netdev
Cc: Florian Fainelli, pavel, Joao.Pinto, seraphin.bonnaffe,
alexandre.torgue, manabian, niklas.cassel, johan, boon.leong.ong,
weifeng.voon, lars.persson, linux-kernel, Giuseppe Cavallaro,
Alexandre Torgue
Commit 5701659004d6 ("net: stmmac: Fix race between stmmac_drv_probe and
stmmac_open") re-ordered the registration of the MDIO bus and the network
device, but failed to unwind the MDIO bus registration when registering
the network device fails.
Fixes: 5701659004d6 ("net: stmmac: Fix race between stmmac_drv_probe and stmmac_open")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 5910ea51f8f6..39eb7a65bb9f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3366,12 +3366,19 @@ int stmmac_dvr_probe(struct device *device,
}
ret = register_netdev(ndev);
- if (ret)
+ if (ret) {
netdev_err(priv->dev, "%s: ERROR %i registering the device\n",
__func__, ret);
+ goto error_netdev_register;
+ }
return ret;
+error_netdev_register:
+ if (priv->hw->pcs != STMMAC_PCS_RGMII &&
+ priv->hw->pcs != STMMAC_PCS_TBI &&
+ priv->hw->pcs != STMMAC_PCS_RTBI)
+ stmmac_mdio_unregister(ndev);
error_mdio_register:
netif_napi_del(&priv->napi);
error_hw_init:
--
2.9.3
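The fix follows the usual kernel goto-unwind pattern: each error label undoes, in reverse order, only the steps that succeeded before the failure. Below is a minimal userspace model of the tail of the probe path. struct probe_state, enum pcs_mode, the failure knobs and the negative return values are hypothetical, and the boolean flags stand in for stmmac_mdio_register()/stmmac_mdio_unregister() and netif_napi_add()/netif_napi_del(). As in the hunk, the MDIO bus is only registered, and therefore only unregistered, when the PCS is not RGMII, TBI or RTBI.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for STMMAC_PCS_RGMII/TBI/RTBI. */
enum pcs_mode { PCS_NONE, PCS_RGMII, PCS_TBI, PCS_RTBI };

/* Hypothetical probe state: flags model side effects, knobs inject failures. */
struct probe_state {
	enum pcs_mode pcs;
	bool napi_added;		/* netif_napi_add() done */
	bool mdio_registered;		/* stmmac_mdio_register() done */
	bool fail_mdio_register;	/* test knob */
	bool fail_netdev_register;	/* test knob */
};

/* Same condition as the hunk: no MDIO bus for RGMII, TBI or RTBI PCS. */
static bool uses_mdio(const struct probe_state *s)
{
	return s->pcs != PCS_RGMII && s->pcs != PCS_TBI && s->pcs != PCS_RTBI;
}

/* Model of the probe tail; error labels undo completed steps in reverse. */
static int model_probe(struct probe_state *s)
{
	int ret = 0;

	s->napi_added = true;

	if (uses_mdio(s)) {
		if (s->fail_mdio_register) {
			ret = -2;
			goto error_mdio_register;
		}
		s->mdio_registered = true;
	}

	if (s->fail_netdev_register) {	/* register_netdev() failed */
		ret = -1;
		goto error_netdev_register;
	}
	return 0;

error_netdev_register:
	if (uses_mdio(s))
		s->mdio_registered = false;	/* stmmac_mdio_unregister() */
error_mdio_register:
	s->napi_added = false;		/* netif_napi_del() */
	return ret;
}
```

A PCS_NONE configuration whose register_netdev() step fails comes back with -1 and both flags cleared; an RGMII configuration never sets the MDIO flag, so the unwind path leaves it alone as well.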
* RE: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and stmmac_dvr_probe
From: Kweh, Hock Leong @ 2016-12-29 0:26 UTC
To: Kishan Sandeep
Cc: David Miller, f.fainelli@gmail.com, Joao.Pinto@synopsys.com,
peppe.cavallaro@st.com, seraphin.bonnaffe@st.com,
alexandre.torgue@gmail.com, manabian@gmail.com,
niklas.cassel@axis.com, johan@kernel.org, pavel@ucw.cz,
lars.persson@axis.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <CAJ3s=NAzfUSCnxxZph5j6Vk2T=pw2F9U3FLP7dOKRp21mmQvcg@mail.gmail.com>
> -----Original Message-----
> From: Kishan Sandeep [mailto:sandeepkishan108@gmail.com]
> Sent: Wednesday, December 28, 2016 7:56 PM
> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> Cc: David Miller <davem@davemloft.net>; f.fainelli@gmail.com;
> Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> seraphin.bonnaffe@st.com; alexandre.torgue@gmail.com;
> manabian@gmail.com; niklas.cassel@axis.com; johan@kernel.org;
> pavel@ucw.cz; lars.persson@axis.com; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and
> stmmac_dvr_probe
>
> On Wed, Dec 28, 2016 at 7:10 AM, Kweh, Hock Leong
> <hock.leong.kweh@intel.com> wrote:
> >> -----Original Message-----
> >> From: David Miller [mailto:davem@davemloft.net]
> >> Sent: Wednesday, December 28, 2016 12:34 AM
> >> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> >> Cc: Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> >> seraphin.bonnaffe@st.com; f.fainelli@gmail.com;
> >> alexandre.torgue@gmail.com; manabian@gmail.com;
> >> niklas.cassel@axis.com; johan@kernel.org; pavel@ucw.cz; Ong, Boon
> >> Leong <boon.leong.ong@intel.com>; Voon, Weifeng
> >> <weifeng.voon@intel.com>; lars.persson@axis.com;
> >> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH v2] net: stmmac: bug fix to synchronize
> >> stmmac_open and stmmac_dvr_probe
> >>
> >> From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
> >> Date: Tue, 27 Dec 2016 22:42:36 +0800
> >>
> >> > From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
> >>
> >> You are not the author of this change, do not take credit for it.
> >>
> >> You have copied Florian's patch character by character, therefore he
> >> is the author.
> >>
> >> You also didn't CC: the netdev mailing list properly.
> >
> > Noted & Thanks.
> >
> > Hi Florian, could you submit this fix from your side so that you are the author.
> > I will help to test out.
> >
> > Thanks & Regards,
> > Wilson
> >
> I think you can use *--author* to set the author name. Try git commit -am
> "commit message" --author="Author_name <author_email>"
Oh... I was not aware of that. Thanks for letting me know.
:-)
Regards,
Wilson
* RE: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and stmmac_dvr_probe
From: Kweh, Hock Leong @ 2016-12-29 0:28 UTC
To: Florian Fainelli, David Miller
Cc: Joao.Pinto@synopsys.com, peppe.cavallaro@st.com,
seraphin.bonnaffe@st.com, alexandre.torgue@gmail.com,
manabian@gmail.com, niklas.cassel@axis.com, johan@kernel.org,
pavel@ucw.cz, Ong, Boon Leong, Voon, Weifeng,
lars.persson@axis.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <e88996a8-e010-c07f-7dee-d220a6dfc706@gmail.com>
> -----Original Message-----
> From: Florian Fainelli [mailto:f.fainelli@gmail.com]
> Sent: Thursday, December 29, 2016 2:43 AM
> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>; David Miller
> <davem@davemloft.net>
> Cc: Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> seraphin.bonnaffe@st.com; alexandre.torgue@gmail.com;
> manabian@gmail.com; niklas.cassel@axis.com; johan@kernel.org;
> pavel@ucw.cz; Ong, Boon Leong <boon.leong.ong@intel.com>; Voon, Weifeng
> <weifeng.voon@intel.com>; lars.persson@axis.com; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and
> stmmac_dvr_probe
>
> On 12/27/2016 09:49 PM, Kweh, Hock Leong wrote:
> >> -----Original Message-----
> >> From: David Miller [mailto:davem@davemloft.net]
> >> Sent: Wednesday, December 28, 2016 12:34 AM
> >> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> >> Cc: Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> >> seraphin.bonnaffe@st.com; f.fainelli@gmail.com;
> >> alexandre.torgue@gmail.com; manabian@gmail.com;
> >> niklas.cassel@axis.com; johan@kernel.org; pavel@ucw.cz; Ong, Boon
> >> Leong <boon.leong.ong@intel.com>; Voon, Weifeng
> >> <weifeng.voon@intel.com>; lars.persson@axis.com;
> >> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH v2] net: stmmac: bug fix to synchronize
> >> stmmac_open and stmmac_dvr_probe
> >>
> >> From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
> >> Date: Tue, 27 Dec 2016 22:42:36 +0800
> >>
> >>> From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
> >>
> >> You are not the author of this change, do not take credit for it.
> >>
> >> You have copied Florian's patch character by character, therefore he
> >> is the author.
> >>
> >> You also didn't CC: the netdev mailing list properly.
> >
> > Hi David & Florian,
> >
> > Just to clarify: I did not copy Florian's patch exactly.
> > I changed it to properly unregister the MDIO bus when
> > register_netdev() fails, as shown below:
> >
> > return 0;
> >
> > -error_mdio_register:
> > - unregister_netdev(ndev);
> > error_netdev_register:
> > + stmmac_mdio_unregister(ndev);
>
> Although this is required, we can't be doing it in all circumstances, we need to
> mimic what stmmac_drv_remove() does.
>
> Let me submit an incremental fix which takes care of mdio bus unregistration.
> --
> Florian
Noted & Thanks. Will test it out once you submitted.
Thanks & Regards,
Wilson
* Re: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and stmmac_dvr_probe
From: Florian Fainelli @ 2016-12-29 0:40 UTC
To: Kweh, Hock Leong, David Miller
Cc: Joao.Pinto@synopsys.com, peppe.cavallaro@st.com,
seraphin.bonnaffe@st.com, alexandre.torgue@gmail.com,
manabian@gmail.com, niklas.cassel@axis.com, johan@kernel.org,
pavel@ucw.cz, Ong, Boon Leong, Voon, Weifeng,
lars.persson@axis.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <F54AEECA5E2B9541821D670476DAE19C5A916EB3@PGSMSX102.gar.corp.intel.com>
On 12/28/2016 04:28 PM, Kweh, Hock Leong wrote:
>> Although this is required, we can't be doing it in all circumstances, we need to
>> mimic what stmmac_drv_remove() does.
>>
>> Let me submit an incremental fix which takes care of mdio bus unregistration.
>> --
>> Florian
>
> Noted, thanks. I will test it once you have submitted it.
It's done:
https://www.spinics.net/lists/netdev/msg411934.html
--
Florian
^ permalink raw reply
* Re: [PATCH] net: ethernet: ti: davinci_cpdma: fix access to uninitialized variable in cpdma_chan_set_descs()
From: Ivan Khoronzhuk @ 2016-12-29 1:49 UTC (permalink / raw)
To: Grygorii Strashko
Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
linux-omap
In-Reply-To: <20161228234213.22166-1-grygorii.strashko@ti.com>
On Wed, Dec 28, 2016 at 05:42:13PM -0600, Grygorii Strashko wrote:
Grygorii,
> Currently, the code sequence below causes an "Unable to handle kernel NULL pointer
> dereference.." exception and a system crash during CPSW CPDMA initialization:
>
> cpsw_probe
> |-cpdma_chan_create (TX channel)
> |-cpdma_chan_split_pool
> |-cpdma_chan_set_descs(for TX channels)
> |-cpdma_chan_set_descs(for RX channels) [1]
>
> - and -
> static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
> int rx, int desc_num,
> int per_ch_desc)
> {
> struct cpdma_chan *chan, *most_chan = NULL;
>
> ...
>
> for (i = min; i < max; i++) {
> chan = ctlr->channels[i];
> if (!chan)
> continue;
> ...
>
> if (most_dnum < chan->desc_num) {
> most_dnum = chan->desc_num;
> most_chan = chan;
> }
> }
> /* use remains */
> most_chan->desc_num += desc_cnt; [2]
> }
>
> So most_chan will never be set to a channel when cpdma_chan_set_descs() is
> called the second time [1], because there are no RX channels yet, and the
> system will crash at [2].
How did you hit this?
I remember fixing this before sending the patchset,
so maybe it came from some experiment.
I'd like to find the actual reason for what's happening.
Look below:
cpsw_probe
|-cpdma_chan_create (TX channel)
|-cpdma_chan_split_pool
|-cpdma_chan_set_descs(for TX channels)
|-cpdma_chan_set_descs(for RX channels) [1]
In the case you've described, cpdma_chan_set_descs() for the RX channels has
to be called with rx_desc_num = 0, because all descriptors are already
assigned to the TX channel. And if desc_num = 0, cpdma_chan_set_descs() just
exits, so there is no issue.
So could you please explain how you hit this, and under what circumstances?
>
> Hence, fix the issue by checking most_chan for NULL before accessing it.
>
> Fixes: 0fc6432cc78d ("net: ethernet: ti: davinci_cpdma: add weight function for channels")
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
> drivers/net/ethernet/ti/davinci_cpdma.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
> index 36518fc..b349d572 100644
> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> @@ -708,7 +708,8 @@ static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
> }
> }
> /* use remains */
> - most_chan->desc_num += desc_cnt;
> + if (most_chan)
> + most_chan->desc_num += desc_cnt;
> }
>
> /**
> --
> 2.10.1.dirty
>
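To make the failure mode concrete, here is a user-space sketch of the "give the remainder to the busiest channel" step quoted above. It assumes a simplified, hypothetical `struct chan` and helper `give_remainder()`; neither exists in davinci_cpdma.c. When every slot in the scanned range is empty, `most_chan` never leaves NULL, which is exactly the case the proposed `if (most_chan)` guard handles.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct cpdma_chan: only the
 * field the remainder logic touches. */
struct chan {
	int desc_num;
};

/*
 * Hand the leftover descriptors (desc_cnt) to the channel in [min, max)
 * that already holds the most descriptors, skipping empty slots the way
 * the driver skips NULL ctlr->channels[i]. Returns the channel that got
 * the remainder, or NULL if no channel qualified.
 */
static struct chan *give_remainder(struct chan **channels, int min, int max,
				   int desc_cnt)
{
	struct chan *most_chan = NULL;
	int most_dnum = 0; /* starts at 0 in this sketch */
	int i;

	for (i = min; i < max; i++) {
		struct chan *chan = channels[i];

		if (!chan)
			continue;
		if (most_dnum < chan->desc_num) {
			most_dnum = chan->desc_num;
			most_chan = chan;
		}
	}

	/* Without this check, an all-empty range dereferences NULL. */
	if (most_chan)
		most_chan->desc_num += desc_cnt;

	return most_chan;
}
```

Calling it on a range with no channels (e.g. the RX half before any RX channel exists) simply returns NULL instead of crashing.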
^ permalink raw reply
* Re: [PATCH] net: fix incorrect original ingress device index in PKTINFO
From: David Ahern @ 2016-12-29 4:42 UTC (permalink / raw)
To: David Miller, asuka.com
Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <20161227.140313.1837464529059496066.davem@davemloft.net>
On 12/27/16 12:03 PM, David Miller wrote:
> From: Wei Zhang <asuka.com@163.com>
> Date: Tue, 27 Dec 2016 17:52:24 +0800
>
>> When we send a packet for our own local address on a non-loopback
>> interface (e.g. eth0), due to the change introduced by
>> commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the
>> original ingress device index is set to the loopback interface.
>> However, the packet should be considered as if it had arrived via the
>> sending interface (eth0); otherwise it breaks the expectations of
>> userspace applications (e.g. the DHCPRELEASE message from the dhcp_release
>> binary is ignored by the dnsmasq daemon, since it comes from lo, which
>> is not the interface dnsmasq binds to).
>>
Add a Fixes line before the sign-off:
Fixes: 0b922b7a829c ("net: original ingress device index in PKTINFO")
>> Signed-off-by: Wei Zhang <asuka.com@163.com>
>
> When you are fixing a problem introduced by another change, always CC:
> the author of that change as I have done so here.
>
> David, please take a look at this, thanks.
>
>> ---
>> net/ipv4/ip_sockglue.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
>> index b8a2d63..76d78a7 100644
>> --- a/net/ipv4/ip_sockglue.c
>> +++ b/net/ipv4/ip_sockglue.c
>> @@ -1202,8 +1202,14 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
>> * which has interface index (iif) as the first member of the
>> * underlying inet{6}_skb_parm struct. This code then overlays
>> * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
>> - * element so the iif is picked up from the prior IPCB
>> + * element so the iif is picked up from the prior IPCB except
>> + * iif is loopback interface which the packet should be
>> + * considered as if it is being arrived via the sending
>> + * interface
That comment change could use an adjustment (adjusted to fit within 80 columns):
element so the iif is picked up from the prior IPCB. If iif
is the loopback interface, then return the sending interface
(e.g., process binds socket to eth0 for Tx which is redirected
to loopback in the rtable/dst).
>> */
>> + if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX)
>> + pktinfo->ipi_ifindex = inet_iif(skb);
>> +
>> pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
>> } else {
>> pktinfo->ipi_ifindex = 0;
The actual change looks ok to me.
Acked-by: David Ahern <dsa@cumulusnetworks.com>
^ permalink raw reply
* [PATCH v2] net: fix incorrect original ingress device index in PKTINFO
From: Wei Zhang @ 2016-12-29 8:45 UTC (permalink / raw)
To: davem, kuznet, jmorris, yoshfuji, kaber, dsa; +Cc: netdev, linux-kernel
When we send a packet for our own local address on a non-loopback
interface (e.g. eth0), due to the change introduced by
commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the
original ingress device index is set to the loopback interface.
However, the packet should be considered as if it had arrived via the
sending interface (eth0); otherwise it breaks the expectations of
userspace applications (e.g. the DHCPRELEASE message from the dhcp_release
binary is ignored by the dnsmasq daemon, since it comes from lo, which
is not the interface dnsmasq binds to).
Fixes: 0b922b7a829c ("net: original ingress device index in PKTINFO")
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Wei Zhang <asuka.com@163.com>
---
v2:
- add the missing Fixes line
- use the improved comment suggested by David Ahern
net/ipv4/ip_sockglue.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 57e1405..53ae0c6 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1225,8 +1225,14 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
* which has interface index (iif) as the first member of the
* underlying inet{6}_skb_parm struct. This code then overlays
* PKTINFO_SKB_CB and in_pktinfo also has iif as the first
- * element so the iif is picked up from the prior IPCB
+ * element so the iif is picked up from the prior IPCB. If iif
+ * is the loopback interface, then return the sending interface
+ * (e.g., process binds socket to eth0 for Tx which is
+ * redirected to loopback in the rtable/dst).
*/
+ if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX)
+ pktinfo->ipi_ifindex = inet_iif(skb);
+
pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
} else {
pktinfo->ipi_ifindex = 0;
--
1.8.3.1
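The fix changes what user space sees in the `ipi_ifindex` field of the `IP_PKTINFO` ancillary data. As an illustration only, here is a user-space sketch of how a receiver such as dnsmasq might read that field with `recvmsg()`; the helper name `recv_pktinfo_ifindex()` is hypothetical. It sends to 127.0.0.1, so it only shows how the value is read (here it reports lo's index); exercising the patched path would require sending to a local address configured on a non-loopback interface.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef IP_PKTINFO
#define IP_PKTINFO 8 /* Linux value; normally provided by <netinet/in.h> */
#endif

/*
 * Send one UDP datagram to ourselves over 127.0.0.1 and return the
 * ipi_ifindex reported in the IP_PKTINFO control message, or -1 on error.
 * ipi_ifindex is the first int of struct in_pktinfo, so it is copied out
 * directly instead of depending on _GNU_SOURCE for the struct definition.
 */
static int recv_pktinfo_ifindex(void)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	char payload[16];
	union { char buf[256]; struct cmsghdr align; } cbuf; /* aligned */
	struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
	struct msghdr msg;
	struct cmsghdr *c;
	int on = 1, ifindex = -1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the kernel pick a port */
	if (setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on)) < 0 ||
	    bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(fd, (struct sockaddr *)&addr, &alen) < 0 ||
	    sendto(fd, "ping", 4, 0, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cbuf.buf;
	msg.msg_controllen = sizeof(cbuf.buf);
	if (recvmsg(fd, &msg, 0) < 0)
		goto out;

	/* Walk the ancillary data looking for the IP_PKTINFO message. */
	for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
		if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
			memcpy(&ifindex, CMSG_DATA(c), sizeof(ifindex));
			break;
		}
	}
out:
	close(fd);
	return ifindex;
}
```

An application that filters on this index, as dnsmasq does with the interfaces it binds to, would see lo rather than eth0 for locally sent traffic without the fix.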
^ permalink raw reply related
* Re: [PATCH 05/12] Support for NIC-specific code
From: David VomLehn @ 2016-12-29 9:35 UTC (permalink / raw)
To: Rami Rosen
Cc: Netdev, Simon Edelhaus, Dmitrii Tarakanov, Alexander Loktionov,
Pavel Belous
In-Reply-To: <CAKoUArn+zfeU16SgJbPxh1f0ud+M8ERi2j6c34cqeO=+LGjuCw@mail.gmail.com>
Responses inline.
On 12/27/2016 09:21 PM, Rami Rosen wrote:
> Hi, David,
>
> Several nitpicks and comments, from a brief overview:
>
> The commented label //err_exit: should be removed
>> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
>> @@ -0,0 +1,993 @@
>> +//err_exit:
>> +//err_exit:
> Shouldn't aq_nic_rss_init() be static? Isn't it called only from
> aq_nic_cfg_init_defaults()?
> And since it always returns 0, shouldn't it be void as well? (And remove
> the return-code check when invoking it in
> aq_nic_cfg_init_defaults().)
Yes, thanks.
>> +int aq_nic_rss_init(struct aq_nic_s *self, unsigned int num_rss_queues)
>> +{
>> + struct aq_nic_cfg_s *cfg = &self->aq_nic_cfg;
>> + struct aq_receive_scale_parameters *rss_params = &cfg->aq_rss;
>> + int i = 0;
>> +
> ...
>> + return 0;
>> +}
>
> Shouldn't aq_nic_ndev_alloc() be static? Isn't it invoked only from
> aq_nic_alloc_cold()?
Yes.
>
>> +struct net_device *aq_nic_ndev_alloc(void)
>> +{
> ...
>> +}
>
>
>> +
>> +static unsigned int aq_nic_map_skb_lso(struct aq_nic_s *self,
>> + struct sk_buff *skb,
>> + struct aq_ring_buff_s *dx)
>> +{
>> + unsigned int ret = 0U;
>> +
>> + dx->flags = 0U;
>> + dx->len_pkt = skb->len;
>> + dx->len_l2 = ETH_HLEN;
>> + dx->len_l3 = ip_hdrlen(skb);
>> + dx->len_l4 = tcp_hdrlen(skb);
>> + dx->mss = skb_shinfo(skb)->gso_size;
>> + dx->is_txc = 1U;
>> + ret = 1U;
>> +
> Why not remove this "ret" variable and simply return 1? The method
> always returns 1:
>
>> + return ret;
>> +}
>> +
Yes, better.
>> +int aq_nic_xmit(struct aq_nic_s *self, struct sk_buff *skb)
>> +{
>> + struct aq_ring_s *ring = NULL;
>> + unsigned int frags = 0U;
>> + unsigned int vec = skb->queue_mapping % self->aq_nic_cfg.vecs;
>> + unsigned int tc = 0U;
>> + int err = 0;
>> + bool is_nic_in_bad_state;
>> + bool is_locked = false;
>> + bool is_busy = false;
>> + struct aq_ring_buff_s buffers[AQ_CFG_SKB_FRAGS_MAX];
>> +
>> + frags = skb_shinfo(skb)->nr_frags + 1;
>> +
>> + ring = self->aq_ring_tx[AQ_NIC_TCVEC2RING(self, tc, vec)];
>> +
>> + atomic_inc(&self->busy_count);
>> + is_busy = true;
>> +
>> + if (frags > AQ_CFG_SKB_FRAGS_MAX) {
>> + dev_kfree_skb_any(skb);
>> + goto err_exit;
>> + }
>> +
>> + is_nic_in_bad_state = AQ_OBJ_TST(self, AQ_NIC_FLAGS_IS_NOT_TX_READY) ||
>> + (aq_ring_avail_dx(ring) < AQ_CFG_SKB_FRAGS_MAX);
>> +
>> + if (is_nic_in_bad_state) {
>> + aq_nic_ndev_queue_stop(self, ring->idx);
>> + err = NETDEV_TX_BUSY;
>> + goto err_exit;
>> + }
>> +
> This kind of inner block is uncommon (outside of #ifdef sections, and
> even there it is rare). I suggest moving "unsigned int trys" to the
> variable definitions at the beginning of the method and removing the
> opening and closing braces of the following block:
>> + {
>> + unsigned int trys = AQ_CFG_LOCK_TRYS;
>> +
>> + frags = aq_nic_map_skb(self, skb, &buffers[0]);
>> +
>> + do {
>> + is_locked = spin_trylock(&ring->lock);
>> + } while (--trys && !is_locked);
>> + if (!(is_locked)) {
>> + err = NETDEV_TX_BUSY;
>> + goto err_exit;
>> + }
>> +
Yes, this is better.
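The suggested shape (retry budget declared at the top of the function, no inner block) can be sketched in user space as below. This swaps in a POSIX `pthread_mutex_trylock()` for the kernel's `spin_trylock()`, and `LOCK_TRYS` / `try_lock_bounded()` are hypothetical names standing in for `AQ_CFG_LOCK_TRYS` and the inline loop; it illustrates the bounded try-lock pattern, not the driver's code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical retry budget standing in for AQ_CFG_LOCK_TRYS. */
#define LOCK_TRYS 3U

/*
 * Try to take @lock at most LOCK_TRYS times without blocking.
 * Returns true with the lock held, or false if it stayed busy; a
 * transmit path would then report NETDEV_TX_BUSY.
 */
static bool try_lock_bounded(pthread_mutex_t *lock)
{
	unsigned int trys = LOCK_TRYS; /* declared up front, no inner block */
	bool is_locked = false;

	do {
		is_locked = (pthread_mutex_trylock(lock) == 0);
	} while (--trys && !is_locked);

	return is_locked;
}
```

The caller must release the lock after a successful attempt; a failed attempt leaves it untouched.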
> Usually you don't let the MTU be less than 68; see, for example:
> http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/i40e/i40e_main.c#L2246
> See also RFC 791:
> https://tools.ietf.org/html/rfc791
>
>
>> +int aq_nic_set_mtu(struct aq_nic_s *self, int new_mtu)
>> +{
>> + int err = 0;
>> +
>> + if (new_mtu > self->aq_hw_caps.mtu) {
>> + err = 0;
>> + goto err_exit;
>> + }
>> + self->aq_nic_cfg.mtu = new_mtu;
>> +
>> +err_exit:
>> + return err;
>> +}
Clearly a must--thanks!
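The reviewer's point amounts to a bounds check on both sides of the requested MTU. The following is a hypothetical user-space model: `struct nic_cfg`, `nic_set_mtu()` and `MIN_IPV4_MTU` are illustrative names, not the aquantia driver's. Unlike the quoted `aq_nic_set_mtu()`, which returns 0 even when the requested MTU exceeds the hardware limit, this sketch rejects out-of-range values with -EINVAL; that is a design choice of the sketch, not something asked for in the review.

```c
#include <errno.h>

/* RFC 791: every internet module must be able to forward a datagram
 * of 68 octets without further fragmentation. */
#define MIN_IPV4_MTU 68

/* Hypothetical stand-in for the driver's config and hardware caps. */
struct nic_cfg {
	int mtu;
	int hw_max_mtu;
};

/* Accept only MTUs in [MIN_IPV4_MTU, hw_max_mtu]; leave cfg untouched
 * and return -EINVAL otherwise. */
static int nic_set_mtu(struct nic_cfg *cfg, int new_mtu)
{
	if (new_mtu < MIN_IPV4_MTU || new_mtu > cfg->hw_max_mtu)
		return -EINVAL;
	cfg->mtu = new_mtu;
	return 0;
}
```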
>> +
>> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.h b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
>> new file mode 100644
>> index 0000000..89958e7
>> --- /dev/null
>> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
>> @@ -0,0 +1,111 @@
>> +/*
>> + * Aquantia Corporation Network Driver
>> + * Copyright (C) 2014-2016 Aquantia Corporation. All rights reserved
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + */
>> +
>> +/*
> Should be, of course, aq_nic.h:
>
>> + * File aq_nic.c: Declaration of common code for NIC.
>> + */
>> +
Good point. Better still, including the name of the file has little
value and makes the comment incorrect if it gets renamed. So, thanks!
> Regards,
> Rami Rosen
--
David VL
^ permalink raw reply
* RE: [PATCH net] net: stmmac: Fix error path after register_netdev move
From: Kweh, Hock Leong @ 2016-12-29 9:36 UTC (permalink / raw)
To: Florian Fainelli, David Miller
Cc: Joao.Pinto@synopsys.com, peppe.cavallaro@st.com,
seraphin.bonnaffe@st.com, alexandre.torgue@gmail.com,
manabian@gmail.com, niklas.cassel@axis.com, johan@kernel.org,
pavel@ucw.cz, Ong, Boon Leong, Voon, Weifeng,
lars.persson@axis.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Giuseppe Cavallaro,
Alexandre Torgue
In-Reply-To: <D6759987A7968C4889FDA6FA91D5CBC801CA3364@PGSMSX103.gar.corp.intel.com>
> -----Original Message-----
> From: Florian Fainelli [mailto:f.fainelli@gmail.com]
> Sent: Thursday, December 29, 2016 7:45 AM
> To: netdev@vger.kernel.org
> Cc: Florian Fainelli <f.fainelli@gmail.com>; pavel@ucw.cz;
> Joao.Pinto@synopsys.com; seraphin.bonnaffe@st.com;
> alexandre.torgue@gmail.com; manabian@gmail.com; niklas.cassel@axis.com;
> johan@kernel.org; Ong, Boon Leong <boon.leong.ong@intel.com>; Voon,
> Weifeng <weifeng.voon@intel.com>; lars.persson@axis.com; linux-
> kernel@vger.kernel.org; Giuseppe Cavallaro <peppe.cavallaro@st.com>;
> Alexandre Torgue <alexandre.torgue@st.com>
> Subject: [PATCH net] net: stmmac: Fix error path after register_netdev move
>
> Commit 5701659004d6 ("net: stmmac: Fix race between stmmac_drv_probe and
> stmmac_open") re-ordered the registration of the MDIO bus and the network
> device, but did not unwind the MDIO bus registration when registering
> the network device fails.
>
> Fixes: 5701659004d6 ("net: stmmac: Fix race between stmmac_drv_probe and
> stmmac_open")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
Acked-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 5910ea51f8f6..39eb7a65bb9f 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -3366,12 +3366,19 @@ int stmmac_dvr_probe(struct device *device,
> }
>
> ret = register_netdev(ndev);
> - if (ret)
> + if (ret) {
> netdev_err(priv->dev, "%s: ERROR %i registering the device\n",
> __func__, ret);
> + goto error_netdev_register;
> + }
>
> return ret;
>
> +error_netdev_register:
> + if (priv->hw->pcs != STMMAC_PCS_RGMII &&
> + priv->hw->pcs != STMMAC_PCS_TBI &&
> + priv->hw->pcs != STMMAC_PCS_RTBI)
> + stmmac_mdio_unregister(ndev);
> error_mdio_register:
> netif_napi_del(&priv->napi);
> error_hw_init:
> --
> 2.9.3
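The fix follows the usual kernel goto-unwind idiom: each error label undoes one setup step, in reverse order, and falls through to the labels below it. Here is a self-contained user-space model of that ordering. The `struct probe_state` flags and `probe()` are hypothetical; the driver calls in the comments are analogies only, and `has_mdio` stands in for the patch's PCS check that decides whether an MDIO bus was registered at all.

```c
#include <stdbool.h>

/*
 * Hypothetical model of a probe routine: each flag records whether a
 * resource is currently set up, so the error paths can be checked.
 */
struct probe_state {
	bool has_mdio;    /* MDIO bus is used (the PCS check in the patch) */
	bool fail_mdio;   /* inject a failure registering the MDIO bus */
	bool fail_netdev; /* inject a register_netdev()-style failure */
	bool napi_added;
	bool mdio_registered;
	bool netdev_registered;
};

static int probe(struct probe_state *s)
{
	int ret;

	s->napi_added = true;                   /* like netif_napi_add() */

	if (s->has_mdio) {
		if (s->fail_mdio) {
			ret = -2;
			goto error_mdio_register;
		}
		s->mdio_registered = true;      /* like stmmac_mdio_register() */
	}

	if (s->fail_netdev) {
		ret = -1;
		goto error_netdev_register;
	}
	s->netdev_registered = true;            /* like register_netdev() */
	return 0;

	/* Unwind in reverse order of setup; each label undoes one step. */
error_netdev_register:
	if (s->has_mdio)
		s->mdio_registered = false;     /* like stmmac_mdio_unregister() */
error_mdio_register:
	s->napi_added = false;                  /* like netif_napi_del() */
	return ret;
}
```

Before the fix, a register_netdev() failure jumped nowhere and left the MDIO bus registered; the new `error_netdev_register` label closes that gap.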
^ permalink raw reply
* [PATCH v4] net: dev_weight: TX/RX orthogonality
From: Matthias Tafelmeier @ 2016-12-29 9:58 UTC (permalink / raw)
To: netdev; +Cc: hagen, fw, edumazet, daniel
In-Reply-To: <20161228.141751.81302085672323860.davem@davemloft.net>
Adjusting one of TX/RX via sysctl often introduces undesirable side
effects on packet processing in the other half of the stack.
There are cases that demand asymmetric, orthogonal configurability.
This holds true especially for nodes where RPS (possibly with RFS on top)
is configured and which therefore use the 'old dev_weight'. This is quite a
common base configuration nowadays, even with NICs that offer better
processing support (e.g. aRFS).
A good example use case is nodes acting as NoSQL databases, with a
large number of tiny requests and fewer but larger packets as responses.
It is affordable to have a large budget and large RX dev_weights for the
requests. But as a side effect, processing that many packets on TX
in one run can overwhelm drivers.
This patch therefore introduces independent configurability via sysctl
for userland.
---
Documentation/sysctl/net.txt | 21 +++++++++++++++++++++
include/linux/netdevice.h | 4 ++++
net/core/dev.c | 6 +++++-
net/core/sysctl_net_core.c | 31 ++++++++++++++++++++++++++++++-
net/sched/sch_generic.c | 2 +-
5 files changed, 61 insertions(+), 3 deletions(-)
diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index f0480f7..53cef32 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -61,6 +61,27 @@ The maximum number of packets that kernel can handle on a NAPI interrupt,
it's a Per-CPU variable.
Default: 64
+dev_weight_rx_bias
+--------------
+
+RPS (e.g. RFS, aRFS) processing is competing with the registered NAPI poll function
+of the driver for the per softirq cycle netdev_budget. This parameter influences
+the proportion of the configured netdev_budget that is spent on RPS based packet
+processing during RX softirq cycles. It is further meant for making current
+dev_weight adaptable for asymmetric CPU needs on RX/TX side of the network stack.
+(see dev_weight_tx_bias) It is effective on a per CPU basis. Determination is based
+on dev_weight and is calculated multiplicative (dev_weight * dev_weight_rx_bias).
+Default: 1
+
+dev_weight_tx_bias
+--------------
+
+Scales the maximum number of packets that can be processed during a TX softirq cycle.
+Effective on a per CPU basis. Allows scaling of current dev_weight for asymmetric
+net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.
+Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).
+Default: 1
+
default_qdisc
--------------
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 994f742..ecd78b3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3795,6 +3795,10 @@ void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
extern int netdev_max_backlog;
extern int netdev_tstamp_prequeue;
extern int weight_p;
+extern int dev_weight_rx_bias;
+extern int dev_weight_tx_bias;
+extern int dev_rx_weight;
+extern int dev_tx_weight;
bool netdev_has_upper_dev(struct net_device *dev, struct net_device *upper_dev);
struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
diff --git a/net/core/dev.c b/net/core/dev.c
index 8db5a0b..f2fe98b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3428,6 +3428,10 @@ EXPORT_SYMBOL(netdev_max_backlog);
int netdev_tstamp_prequeue __read_mostly = 1;
int netdev_budget __read_mostly = 300;
int weight_p __read_mostly = 64; /* old backlog weight */
+int dev_weight_rx_bias __read_mostly = 1; /* bias for backlog weight */
+int dev_weight_tx_bias __read_mostly = 1; /* bias for output_queue quota */
+int dev_rx_weight __read_mostly = weight_p;
+int dev_tx_weight __read_mostly = weight_p;
/* Called with irq disabled */
static inline void ____napi_schedule(struct softnet_data *sd,
@@ -4833,7 +4837,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
net_rps_action_and_irq_enable(sd);
}
- napi->weight = weight_p;
+ napi->weight = dev_rx_weight;
while (again) {
struct sk_buff *skb;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 2a46e40..698ddd7 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -222,6 +222,21 @@ static int set_default_qdisc(struct ctl_table *table, int write,
}
#endif
+static int proc_do_dev_weight(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ int ret;
+
+ ret = proc_dointvec(table, write, buffer, lenp, ppos);
+ if (ret != 0)
+ return ret;
+
+ dev_rx_weight = weight_p * dev_weight_rx_bias;
+ dev_tx_weight = weight_p * dev_weight_tx_bias;
+
+ return ret;
+}
+
static int proc_do_rss_key(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
@@ -273,7 +288,21 @@ static struct ctl_table net_core_table[] = {
.data = &weight_p,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = proc_do_dev_weight,
+ },
+ {
+ .procname = "dev_weight_rx_bias",
+ .data = &dev_weight_rx_bias,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_do_dev_weight,
+ },
+ {
+ .procname = "dev_weight_tx_bias",
+ .data = &dev_weight_tx_bias,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_do_dev_weight,
},
{
.procname = "netdev_max_backlog",
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 6eb9c8e..b052b27 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -247,7 +247,7 @@ static inline int qdisc_restart(struct Qdisc *q, int *packets)
void __qdisc_run(struct Qdisc *q)
{
- int quota = weight_p;
+ int quota = dev_tx_weight;
int packets;
while (qdisc_restart(q, &packets)) {
--
2.7.4
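The derived-weight logic in the patch can be modelled in user space as below; the names mirror the patch, but this is a standalone sketch, not kernel code, and `recompute_dev_weights()` is a hypothetical stand-in for the tail of `proc_do_dev_weight()`. One detail worth noting: as quoted, `int dev_rx_weight __read_mostly = weight_p;` (and its TX counterpart) would be rejected by a C compiler, because objects with static storage duration must be initialized with constant expressions and `weight_p` is a variable. The sketch initializes them with the literal default of 64 instead.

```c
/* Hypothetical user-space model of the sysctl knobs in the patch. */
static int weight_p = 64;          /* net.core.dev_weight */
static int dev_weight_rx_bias = 1; /* net.core.dev_weight_rx_bias */
static int dev_weight_tx_bias = 1; /* net.core.dev_weight_tx_bias */

/* File-scope initializers must be constant expressions, so these start
 * from the literal default rather than from weight_p. */
static int dev_rx_weight = 64;
static int dev_tx_weight = 64;

/* Recompute both derived weights after any of the three knobs changes,
 * as the patch does once proc_dointvec() has succeeded. */
static void recompute_dev_weights(void)
{
	dev_rx_weight = weight_p * dev_weight_rx_bias;
	dev_tx_weight = weight_p * dev_weight_tx_bias;
}
```

For example, setting dev_weight_rx_bias to 4 raises the backlog weight to 256 while the qdisc quota stays at 64; halving dev_weight afterwards scales both sides proportionally.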
^ permalink raw reply related
* Re: [PATCH] vif queue counters from int to long
From: Wei Liu @ 2016-12-29 10:47 UTC (permalink / raw)
To: Mart van Santen; +Cc: netdev, wei.liu2
In-Reply-To: <585D3E23.10509@greenhost.nl>
On Fri, Dec 23, 2016 at 04:09:23PM +0100, Mart van Santen wrote:
>
> Hello,
>
> This patch fixes an issue where the per-queue counters have type
> unsigned int, while the counters of the vif itself are unsigned long.
> This can cause incorrect reporting of the vif interface's tx/rx values.
> It was reported in more detail on the xen-devel mailing list.
>
Hello,
Please also CC xen-devel@lists.xenproject.org for your future patch(es).
And please note that the most up to date maintainer information should
be used.
Wei.
>
>
> Signed-off-by: Mart van Santen <mart@greenhost.nl>
> --- a/drivers/net/xen-netback/common.h 2016-12-22 15:41:07.785535748 +0000
> +++ b/drivers/net/xen-netback/common.h 2016-12-23 13:08:18.123080064 +0000
> @@ -113,10 +113,10 @@ struct xenvif_stats {
> * A subset of struct net_device_stats that contains only the
> * fields that are updated in netback.c for each queue.
> */
> - unsigned int rx_bytes;
> - unsigned int rx_packets;
> - unsigned int tx_bytes;
> - unsigned int tx_packets;
> + unsigned long rx_bytes;
> + unsigned long rx_packets;
> + unsigned long tx_bytes;
> + unsigned long tx_packets;
>
> /* Additional stats used by xenvif */
> unsigned long rx_gso_checksum_fixup;
>
> --
> Mart van Santen
> Greenhost
> E: mart@greenhost.nl
> T: +31 20 4890444
> W: https://greenhost.nl
>
> A PGP signature can be attached to this e-mail,
> you need PGP software to verify it.
> My public key is available in keyserver(s)
> see: http://tinyurl.com/openpgp-manual
>
> PGP Fingerprint: CA85 EB11 2B70 042D AF66 B29A 6437 01A1 10A3 D3A5
>
>
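Why the width matters: a 32-bit byte counter wraps after 4 GiB, which a busy vif reaches quickly. The sketch below uses fixed-width types to make both widths explicit; `total_bytes32()` and `total_bytes64()` are hypothetical helpers. Note that `unsigned int` is 32 bits on Linux, while `unsigned long` is 64 bits only on 64-bit (LP64) builds and still 32 bits on 32-bit ones, so the change widens the counters on 64-bit kernels.

```c
#include <stdint.h>

/* Accumulate @packets packets of @len bytes into a 32-bit counter
 * (the width of unsigned int on Linux targets); wraps modulo 2^32. */
static uint32_t total_bytes32(uint32_t packets, uint32_t len)
{
	uint32_t total = 0;

	for (uint32_t i = 0; i < packets; i++)
		total += len;
	return total;
}

/* Same accumulation into a 64-bit counter (unsigned long on LP64). */
static uint64_t total_bytes64(uint32_t packets, uint32_t len)
{
	uint64_t total = 0;

	for (uint32_t i = 0; i < packets; i++)
		total += len;
	return total;
}
```

For 3,000,000 packets of 1500 bytes (4,500,000,000 bytes), the 32-bit counter reports 205,032,704, i.e. the true total minus 2^32.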
^ permalink raw reply
* [PATCH] scm: fix possible control message header alignment issue
From: yuan linyu @ 2016-12-29 12:10 UTC (permalink / raw)
To: netdev; +Cc: David S . Miller, yuan linyu
From: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
1. put_cmsg{_compat}() may copy data to user space even when the free buffer
space is smaller than the aligned control message header size.
2. scm_detach_fds{_compat}() may calculate the wrong fdmax if the control
message header has a larger aligned size.
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
---
net/compat.c | 10 ++++++++--
net/core/scm.c | 8 +++++---
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/net/compat.c b/net/compat.c
index 96c544b..fe1f41c 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -245,7 +245,9 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
if (copy_to_user(cm, &cmhdr, sizeof cmhdr))
return -EFAULT;
- if (copy_to_user(CMSG_COMPAT_DATA(cm), data, cmlen - sizeof(struct compat_cmsghdr)))
+ if (cmlen > CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) &&
+ copy_to_user(CMSG_COMPAT_DATA(cm), data,
+ cmlen - CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr))))
return -EFAULT;
cmlen = CMSG_COMPAT_SPACE(len);
if (kmsg->msg_controllen < cmlen)
@@ -258,12 +260,16 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
{
struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control;
- int fdmax = (kmsg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
+ int fdmax = 0;
int fdnum = scm->fp->count;
struct file **fp = scm->fp->fp;
int __user *cmfptr;
int err = 0, i;
+ if (kmsg->msg_controllen > CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)))
+ fdmax = (kmsg->msg_controllen -
+ CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr))) / sizeof(int);
+
if (fdnum < fdmax)
fdmax = fdnum;
diff --git a/net/core/scm.c b/net/core/scm.c
index d882043..5d8ef4f 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -238,7 +238,9 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
err = -EFAULT;
if (copy_to_user(cm, &cmhdr, sizeof cmhdr))
goto out;
- if (copy_to_user(CMSG_DATA(cm), data, cmlen - sizeof(struct cmsghdr)))
+ if (cmlen > CMSG_ALIGN(sizeof(struct cmsghdr)) &&
+ copy_to_user(CMSG_DATA(cm), data,
+ cmlen - CMSG_ALIGN(sizeof(struct cmsghdr))))
goto out;
cmlen = CMSG_SPACE(len);
if (msg->msg_controllen < cmlen)
@@ -267,8 +269,8 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
return;
}
- if (msg->msg_controllen > sizeof(struct cmsghdr))
- fdmax = ((msg->msg_controllen - sizeof(struct cmsghdr))
+ if (msg->msg_controllen > CMSG_ALIGN(sizeof(struct cmsghdr)))
+ fdmax = ((msg->msg_controllen - CMSG_ALIGN(sizeof(struct cmsghdr)))
/ sizeof(int));
if (fdnum < fdmax)
--
2.7.4
^ permalink raw reply related
* [PATCH v2] scm: fix possible control message header alignment issue
From: yuan linyu @ 2016-12-29 12:39 UTC (permalink / raw)
To: netdev; +Cc: David S . Miller, yuan linyu
From: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
1. put_cmsg{_compat}() may copy data to user space even when the free buffer
space is smaller than the aligned control message header size.
2. scm_detach_fds{_compat}() may calculate the wrong fdmax if the control
message header has a larger aligned size.
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
---
net/compat.c | 10 ++++++++--
net/core/scm.c | 8 +++++---
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/net/compat.c b/net/compat.c
index 96c544b..ffe7a04 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -245,7 +245,9 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
if (copy_to_user(cm, &cmhdr, sizeof cmhdr))
return -EFAULT;
- if (copy_to_user(CMSG_COMPAT_DATA(cm), data, cmlen - sizeof(struct compat_cmsghdr)))
+ if (cmlen > CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) &&
+ copy_to_user(CMSG_COMPAT_DATA(cm), data,
+ cmlen - CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr))))
return -EFAULT;
cmlen = CMSG_COMPAT_SPACE(len);
if (kmsg->msg_controllen < cmlen)
@@ -258,12 +260,16 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
{
struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control;
- int fdmax = (kmsg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
+ int fdmax = 0;
int fdnum = scm->fp->count;
struct file **fp = scm->fp->fp;
int __user *cmfptr;
int err = 0, i;
+ if (kmsg->msg_controllen > CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)))
+ fdmax = (kmsg->msg_controllen -
+ CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr))) / sizeof(int);
+
if (fdnum < fdmax)
fdmax = fdnum;
diff --git a/net/core/scm.c b/net/core/scm.c
index d882043..b2e60fd 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -238,7 +238,9 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
err = -EFAULT;
if (copy_to_user(cm, &cmhdr, sizeof cmhdr))
goto out;
- if (copy_to_user(CMSG_DATA(cm), data, cmlen - sizeof(struct cmsghdr)))
+ if (cmlen > CMSG_ALIGN(sizeof(struct cmsghdr)) &&
+ copy_to_user(CMSG_DATA(cm), data,
+ cmlen - CMSG_ALIGN(sizeof(struct cmsghdr))))
goto out;
cmlen = CMSG_SPACE(len);
if (msg->msg_controllen < cmlen)
@@ -267,8 +269,8 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
return;
}
- if (msg->msg_controllen > sizeof(struct cmsghdr))
- fdmax = ((msg->msg_controllen - sizeof(struct cmsghdr))
+ if (msg->msg_controllen > CMSG_ALIGN(sizeof(struct cmsghdr)))
+ fdmax = ((msg->msg_controllen - CMSG_ALIGN(sizeof(struct cmsghdr)))
/ sizeof(int));
if (fdnum < fdmax)
--
2.7.4
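The payload of a control message starts at CMSG_DATA(), i.e. after the header rounded up to the alignment unit, so the space left for file descriptors should be computed from the aligned header size, as the patch does. A minimal model of that arithmetic follows; `align_up()` and `fd_capacity()` are hypothetical helpers, and the 12-byte header with 8-byte alignment used in the example is an assumed pairing chosen to make the difference visible, not a claim about any particular ABI.

```c
#include <stddef.h>

/* Round @len up to a multiple of @align (a power of two), the same kind
 * of rounding CMSG_ALIGN() performs. */
static size_t align_up(size_t len, size_t align)
{
	return (len + align - 1) & ~(align - 1);
}

/* Number of int-sized fds that fit after a control-message header of
 * @hdr_size bytes, padded to @align, in a buffer of @controllen bytes.
 * Returns 0 when the buffer cannot even hold the padded header. */
static int fd_capacity(size_t controllen, size_t hdr_size, size_t align)
{
	size_t hdr = align_up(hdr_size, align);

	if (controllen <= hdr)
		return 0;
	return (int)((controllen - hdr) / sizeof(int));
}
```

With a 20-byte buffer, subtracting the raw 12-byte header would suggest room for two 4-byte fds, but once the header is padded to 16 bytes only one fits.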
^ permalink raw reply related