* Re: [PATCH 2/2] be2net: drop non-tso frames longer than mtu
From: Ivan Vecera @ 2013-10-15 14:30 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Sathya Perla, netdev
In-Reply-To: <1381844840.2045.37.camel@edumazet-glaptop.roam.corp.google.com>
On 10/15/2013 03:47 PM, Eric Dumazet wrote:
> On Tue, 2013-10-15 at 17:26 +0530, Sathya Perla wrote:
>> From: Vasundhara Volam <vasundhara.volam@emulex.com>
>>
>> Pktgen can generate non-TSO frames of arbitrary length disregarding
>> the MTU value of the physical interface. Drop such frames in the driver
>> instead of sending them to HW as it cannot handle such frames.
>>
>> Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
>> Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
>> ---
>> drivers/net/ethernet/emulex/benet/be_main.c | 9 +++++++--
>> 1 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
>> index 2c38cc4..76057b8 100644
>> --- a/drivers/net/ethernet/emulex/benet/be_main.c
>> +++ b/drivers/net/ethernet/emulex/benet/be_main.c
>> @@ -855,6 +855,13 @@ static struct sk_buff *be_xmit_workarounds(struct be_adapter *adapter,
>> unsigned int eth_hdr_len;
>> struct iphdr *ip;
>>
>> + /* Don't allow non-TSO packets longer than MTU */
>> + eth_hdr_len = (ntohs(skb->protocol) == ETH_P_8021Q) ?
>> + VLAN_ETH_HLEN : ETH_HLEN;
>> + if (!skb_is_gso(skb) &&
>> + (skb->len - eth_hdr_len) > adapter->netdev->mtu)
>> + goto tx_drop;
>> +
>
> When you say 'cannot handle them', is it some kind of nasty thing like
> hang / crash ?
AFAIK, the firmware in the card becomes unresponsive and a reboot is
needed to get the NIC working again.
Ivan
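
For illustration, the length check the patch adds can be modelled in
plain userspace C. This is only a sketch: the function name
frame_exceeds_mtu() and its parameters are hypothetical, the header
constants are redefined locally with their usual values, and the
explicit short-frame guard is an addition that the driver code does
not have.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the usual Ethernet constants (assumed values). */
#define ETH_HLEN	14	/* plain Ethernet header */
#define VLAN_ETH_HLEN	18	/* Ethernet header plus one 802.1Q tag */
#define ETH_P_8021Q	0x8100	/* 802.1Q ethertype, host byte order */

/*
 * Hypothetical helper: returns true when a non-GSO frame carries more
 * payload than the MTU allows. proto is the ethertype in host order.
 */
static bool frame_exceeds_mtu(size_t frame_len, uint16_t proto,
			      bool is_gso, unsigned int mtu)
{
	size_t hdr_len = (proto == ETH_P_8021Q) ? VLAN_ETH_HLEN : ETH_HLEN;

	/* GSO frames are segmented later, so their total length is exempt. */
	if (is_gso)
		return false;

	/* Guard added by this sketch: avoid unsigned underflow on runts. */
	if (frame_len <= hdr_len)
		return false;

	return frame_len - hdr_len > mtu;
}
```

With an MTU of 1500, an untagged 1514-byte frame passes and a
1515-byte one is flagged; a VLAN-tagged frame gets four more bytes of
headroom.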
* Re: [PATCH] net: sctp: fix a cacc_saw_newack missetting issue
From: Vlad Yasevich @ 2013-10-15 14:34 UTC (permalink / raw)
To: Chang, nhorman; +Cc: davem, linux-sctp, netdev, linux-kernel
In-Reply-To: <525D50BD.5030301@gmail.com>
On 10/15/2013 10:27 AM, Chang wrote:
>
> On 10/15/2013 04:11 PM, Vlad Yasevich wrote:
>> On 10/14/2013 09:33 AM, Chang Xiangzhong wrote:
>>> For for each TSN t being newly acked (Not only cumulatively,
>>> but also SELECTIVELY) cacc_saw_newack should be set to 1.
>>>
>>> Signed-off-by: Xiangzhong Chang <changxiangzhong@gmail.com>
>>> ---
>>> net/sctp/outqueue.c | 42 +++++++++++++++++++++---------------------
>>> 1 file changed, 21 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
>>> index 94df758..d86032b 100644
>>> --- a/net/sctp/outqueue.c
>>> +++ b/net/sctp/outqueue.c
>>> @@ -1398,6 +1398,27 @@ static void sctp_check_transmitted(struct
>>> sctp_outq *q,
>>> forward_progress = true;
>>> }
>>>
>>> + if (!tchunk->tsn_gap_acked) {
>>
>> You can remove this test since the block just above already performs
>> it. Just fold this code into the block above.
>>
>> -vlad
>>
> Sorry, I'm not sure if I fully understand you. There are code blocks
> which checking the tchunk->tsn_gap_acked. In addition, they check other
> states as well.
The flow is:

    if (sctp_acked(sack, tsn)) {
        ...
        if (transport) {
            ....
        }

        if (!tchunk->tsn_gap_acked) {
            ....
        }

        if (TSN_lte(tsn, sack_ctsn)) {
            ....
            /* SFR-CACC ...
        }

Since you are moving this up, you can simply re-use
the if (!tchunk->tsn_gap_acked) immediately above.
>>> + /*
>>> + * SFR-CACC algorithm:
>>> + * 2) If the SACK contains gap acks
>>> + * and the flag CHANGEOVER_ACTIVE is
>>> + * set the receiver of the SACK MUST
>>> + * take the following action:
>>> + *
>>> + * B) For each TSN t being acked that
>>> + * has not been acked in any SACK so
>>> + * far, set cacc_saw_newack to 1 for
>>> + * the destination that the TSN was
>>> + * sent to.
>>> + */
>>> + if (transport &&
>>> + sack->num_gap_ack_blocks &&
>>> + q->asoc->peer.primary_path->cacc.
>>> + changeover_active)
>>> + transport->cacc.cacc_saw_newack = 1;
^^^^
Don't need that many spaces...
-vlad
>>> + }
>>> +
>>> if (TSN_lte(tsn, sack_ctsn)) {
>>> /* RFC 2960 6.3.2 Retransmission Timer Rules
>>> *
>>> @@ -1411,27 +1432,6 @@ static void sctp_check_transmitted(struct
>>> sctp_outq *q,
>>> restart_timer = 1;
>>> forward_progress = true;
>>>
>>> - if (!tchunk->tsn_gap_acked) {
>>> - /*
>>> - * SFR-CACC algorithm:
>>> - * 2) If the SACK contains gap acks
>>> - * and the flag CHANGEOVER_ACTIVE is
>>> - * set the receiver of the SACK MUST
>>> - * take the following action:
>>> - *
>>> - * B) For each TSN t being acked that
>>> - * has not been acked in any SACK so
>>> - * far, set cacc_saw_newack to 1 for
>>> - * the destination that the TSN was
>>> - * sent to.
>>> - */
>>> - if (transport &&
>>> - sack->num_gap_ack_blocks &&
>>> - q->asoc->peer.primary_path->cacc.
>>> - changeover_active)
>>> - transport->cacc.cacc_saw_newack
>>> - = 1;
>>> - }
>>>
>>> list_add_tail(&tchunk->transmitted_list,
>>> &q->sacked);
>>>
>>
>
* Re: [PATCH] usb: serial: option: blacklist Olivetti Olicard200
From: Dan Williams @ 2013-10-15 14:47 UTC (permalink / raw)
To: Enrico Mioso
Cc: gregkh, davem, bjorn, christian.schmiedl, linux-usb, netdev,
linux-kernel, Antonella Pellizzari
In-Reply-To: <1381842408-10800-1-git-send-email-mrkiko.rs@gmail.com>
On Tue, 2013-10-15 at 15:06 +0200, Enrico Mioso wrote:
> Interface 6 of this device speaks QMI as per tests done by us.
> Credits go to Antonella for providing the hardware.
>
> Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
> Signed-off-by: Antonella Pellizzari <anto.pellizzari83@gmail.com>
Tested-by: Dan Williams <dcbw@redhat.com>
> diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
> index 80a7104..d7c10d6 100644
> --- a/drivers/usb/serial/option.c
> +++ b/drivers/usb/serial/option.c
> @@ -1257,7 +1257,9 @@ static const struct usb_device_id option_ids[] = {
>
> { USB_DEVICE(OLIVETTI_VENDOR_ID, OLIVETTI_PRODUCT_OLICARD100) },
> { USB_DEVICE(OLIVETTI_VENDOR_ID, OLIVETTI_PRODUCT_OLICARD145) },
> - { USB_DEVICE(OLIVETTI_VENDOR_ID, OLIVETTI_PRODUCT_OLICARD200) },
> + { USB_DEVICE(OLIVETTI_VENDOR_ID, OLIVETTI_PRODUCT_OLICARD200),
> + .driver_info = (kernel_ulong_t)&net_intf6_blacklist
> + },
> { USB_DEVICE(CELOT_VENDOR_ID, CELOT_PRODUCT_CT680M) }, /* CT-650 CDMA 450 1xEVDO modem */
> { USB_DEVICE_AND_INTERFACE_INFO(SAMSUNG_VENDOR_ID, SAMSUNG_PRODUCT_GT_B3730, USB_CLASS_CDC_DATA, 0x00, 0x00) }, /* Samsung GT-B3730 LTE USB modem.*/
> { USB_DEVICE(YUGA_VENDOR_ID, YUGA_PRODUCT_CEM600) },
* Re: DomU's network interface will hung when Dom0 running 32bit
From: Wei Liu @ 2013-10-15 14:49 UTC (permalink / raw)
To: jianhai luan; +Cc: Wei Liu, Ian Campbell, xen-devel, netdev, ANNIE LI
In-Reply-To: <525D513B.70406@oracle.com>
On Tue, Oct 15, 2013 at 10:29:15PM +0800, jianhai luan wrote:
>
> On 2013-10-15 20:58, Wei Liu wrote:
> >On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote:
> >[...]
> >>>>>Can you propose a patch?
> >>>>Because credit_timeout.expire always after jiffies, i judge the
> >>>>value over the range of time_after_eq() by time_before(now,
> >>>>vif->credit_timeout.expires). please check the patch.
> >>>I don't think this really fix the issue for you. You still have chance
> >>>that now wraps around and falls between expires and next_credit. In that
> >>>case it's stalled again.
> >>if time_before(now, vif->credit_timeout.expires) is true, time wrap
> >>and do operation. Otherwise time_before(now,
> >>vif->credit_timeout.expires) isn't true, now -
> >>vif->credit_timeout.expires should be letter than ULONG_MAX/2.
> >>Because next_credit large than vif->credit_timeout.expires
> >>(next_crdit = vif->credit_timeout.expires +
> >>msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and
> >>next_credit should be in range of time_after_eq(). So
> >>time_after_eq() do correctly judge.
> >>
> >Not sure I understand you. Consider "now" is placed like this:
> >
> > expires now next_credit
> > ----time increases this direction--->
> >
> >* time_after_eq(now, next_credit) -> false
> >* time_before(now, expires) -> false
>
> If now is placed in above environment, the result will be correct
> (Sending package will be not allowed until next_credit).
No, it is not necessarily correct. Keep in mind that "now" wraps around,
which is the issue you try to fix. You still have a window to stall your
frontend.
> * time_after_eq(now, next_credit) --> false will include two environment:
> expires now next_credit
> -----------time increases this direction ---->
>
> Or
> expires next_credit next_credit + MAX_LONG/2 now
> -----------time increases this direction ---->
>
>
> the first environment should be correct to control transmit. the
> second environment is our included environment.
>
> Jason
> >
> >Then it's stuck again. You're merely narrowing the window, not fixing
> >the real problem.
> >
> >Wei.
> >
> >>Jason
> >>>Wei.
* Re: [PATCH] net: qmi_wwan: Olivetti Olicard 200 support
From: Dan Williams @ 2013-10-15 14:49 UTC (permalink / raw)
To: Enrico Mioso
Cc: gregkh, davem, bjorn, christian.schmiedl, linux-usb, netdev,
linux-kernel, Antonella Pellizzari
In-Reply-To: <1381842408-10800-2-git-send-email-mrkiko.rs@gmail.com>
On Tue, 2013-10-15 at 15:06 +0200, Enrico Mioso wrote:
> This is a QMI device, manufactured by TCT Mobile Phones.
> A companion patch blacklisting this device's QMI interface in the option.c
> driver has been sent.
>
> Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
> Signed-off-by: Antonella Pellizzari <anto.pellizzari83@gmail.com>
Good find. For the record, mine has:
PX1522E16X 1 [Oct 15 2010 02:00:00]
ctl (1.4)
wds (1.8)
dms (1.3)
nas (1.2)
qos (1.2)
wms (1.1)
pds (1.4)
auth (1.0)
voice (1.0)
cat2 (1.1)
Tested-by: Dan Williams <dcbw@redhat.com>
> diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
> index 3d6aaf7..818ce90 100644
> --- a/drivers/net/usb/qmi_wwan.c
> +++ b/drivers/net/usb/qmi_wwan.c
> @@ -714,6 +714,7 @@ static const struct usb_device_id products[] = {
> {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */
> {QMI_FIXED_INTF(0x2357, 0x9000, 4)}, /* TP-LINK MA260 */
> {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */
> + {QMI_FIXED_INTF(0x0b3c, 0xc005, 6)}, /* Olivetti Olicard 200 */
> {QMI_FIXED_INTF(0x1e2d, 0x0060, 4)}, /* Cinterion PLxx */
>
> /* 4. Gobi 1000 devices */
* Re: DomU's network interface will hung when Dom0 running 32bit
From: Ian Campbell @ 2013-10-15 14:50 UTC (permalink / raw)
To: Wei Liu; +Cc: jianhai luan, xen-devel, netdev, ANNIE LI
In-Reply-To: <20131015144910.GS11739@zion.uk.xensource.com>
On Tue, 2013-10-15 at 15:49 +0100, Wei Liu wrote:
> On Tue, Oct 15, 2013 at 10:29:15PM +0800, jianhai luan wrote:
> >
> > On 2013-10-15 20:58, Wei Liu wrote:
> > >On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote:
> > >[...]
> > >>>>>Can you propose a patch?
> > >>>>Because credit_timeout.expire always after jiffies, i judge the
> > >>>>value over the range of time_after_eq() by time_before(now,
> > >>>>vif->credit_timeout.expires). please check the patch.
> > >>>I don't think this really fix the issue for you. You still have chance
> > >>>that now wraps around and falls between expires and next_credit. In that
> > >>>case it's stalled again.
> > >>if time_before(now, vif->credit_timeout.expires) is true, time wrap
> > >>and do operation. Otherwise time_before(now,
> > >>vif->credit_timeout.expires) isn't true, now -
> > >>vif->credit_timeout.expires should be letter than ULONG_MAX/2.
> > >>Because next_credit large than vif->credit_timeout.expires
> > >>(next_crdit = vif->credit_timeout.expires +
> > >>msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and
> > >>next_credit should be in range of time_after_eq(). So
> > >>time_after_eq() do correctly judge.
> > >>
> > >Not sure I understand you. Consider "now" is placed like this:
> > >
> > > expires now next_credit
> > > ----time increases this direction--->
> > >
> > >* time_after_eq(now, next_credit) -> false
> > >* time_before(now, expires) -> false
> >
> > If now is placed in above environment, the result will be correct
> > (Sending package will be not allowed until next_credit).
>
> No, it is not necessarily correct. Keep in mind that "now" wraps around,
> which is the issue you try to fix. You still have a window to stall your
> frontend.
Remember that time_after_eq is supposed to work even with wraparound
occurring, so long as the two times are less than MAX_LONG/2 apart.
>
> > * time_after_eq(now, next_credit) --> false will include two environment:
> > expires now next_credit
> > -----------time increases this direction ---->
> >
> > Or
> > expires next_credit next_credit + MAX_LONG/2 now
> > -----------time increases this direction ---->
> >
> >
> > the first environment should be correct to control transmit. the
> > second environment is our included environment.
> >
> > Jason
> > >
> > >Then it's stuck again. You're merely narrowing the window, not fixing
> > >the real problem.
> > >
> > >Wei.
> > >
> > >>Jason
> > >>>Wei.
* [PATCH net-next 0/4] net/mlx4: Mellanox driver update 15-10-2013
From: Amir Vadai @ 2013-10-15 14:55 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Eyal Perry, Amir Vadai
Hi Dave,
This patchset contains small code cleanup patches, and a patch that makes
mlx4_core use request_module() to load the relevant link layer module
(mlx4_en or mlx4_ib) according to the port type.
Thanks,
Amir
Amir Vadai (1):
net/mlx4: Unused local variable in mlx4_opreq_action
Eyal Perry (1):
net/mlx4_core: Load higher level modules according to ports type
Or Gerlitz (2):
net/mlx4: Clean the code to eliminate trivial build warnings
net/mlx4: Fix typo, move similar defs to same location
drivers/net/ethernet/mellanox/mlx4/cmd.c | 2 --
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/fw.c | 4 +---
drivers/net/ethernet/mellanox/mlx4/main.c | 29 ++++++++++++++++++++++++++
drivers/net/ethernet/mellanox/mlx4/mcg.c | 6 +++---
drivers/net/ethernet/mellanox/mlx4/srq.c | 1 +
include/linux/mlx4/cmd.h | 6 ++----
include/linux/mlx4/device.h | 2 +-
8 files changed, 38 insertions(+), 14 deletions(-)
--
1.8.3.4
* [PATCH net-next 1/4] net/mlx4: Clean the code to eliminate trivial build warnings
From: Amir Vadai @ 2013-10-15 14:55 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Eyal Perry, Amir Vadai, Or Gerlitz
In-Reply-To: <1381848924-18992-1-git-send-email-amirv@mellanox.com>
From: Or Gerlitz <ogerlitz@mellanox.com>
Remove code that triggers trivial build warnings.
drivers/net/ethernet/mellanox/mlx4/cmd.c: In function ‘mlx4_set_vf_vlan’:
drivers/net/ethernet/mellanox/mlx4/cmd.c:2256: warning: variable ‘vf_oper’ set but not used
drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_map_sw_to_hw_steering_mode’:
drivers/net/ethernet/mellanox/mlx4/mcg.c:648: warning: comparison of unsigned expression < 0 is always false
drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_map_sw_to_hw_steering_id’:
drivers/net/ethernet/mellanox/mlx4/mcg.c:685: warning: comparison of unsigned expression < 0 is always false
drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_hw_rule_sz’:
drivers/net/ethernet/mellanox/mlx4/mcg.c:712: warning: comparison of unsigned expression < 0 is always false
drivers/net/ethernet/mellanox/mlx4/fw.c: In function ‘mlx4_opreq_action’:
drivers/net/ethernet/mellanox/mlx4/fw.c:1732: warning: variable ‘type_m’ set but not used
drivers/net/ethernet/mellanox/mlx4/srq.c:302: warning: no previous prototype for ‘mlx4_srq_lookup’
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/cmd.c | 2 --
drivers/net/ethernet/mellanox/mlx4/mcg.c | 6 +++---
drivers/net/ethernet/mellanox/mlx4/srq.c | 1 +
3 files changed, 4 insertions(+), 5 deletions(-)
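
As a side note on the "comparison of unsigned expression < 0" warnings
above, a minimal userspace sketch of the pattern follows. The names
(RULE_NUM, rule_hw_sz, rule_size) and table values are hypothetical and
only mirror the shape of the mcg.c lookups. The index is declared as
unsigned int explicitly, because whether an enum is unsigned is up to
the compiler.

```c
/* Hypothetical table, standing in for a lookup array such as __rule_hw_sz[]. */
enum { RULE_NUM = 3 };

static const int rule_hw_sz[RULE_NUM] = { 16, 32, 64 };

/*
 * For an unsigned index a single upper-bound test is enough: a value
 * that started out negative converts to a very large unsigned number
 * and is rejected by the same comparison, so "id < 0" adds nothing.
 */
static int rule_size(unsigned int id)
{
	if (id >= RULE_NUM)
		return -1;	/* the driver returns -EINVAL here */

	return rule_hw_sz[id];
}
```

Passing (unsigned int)-1 takes the error path even though no explicit
"< 0" test exists.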
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index ea20182..735765c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2253,7 +2253,6 @@ EXPORT_SYMBOL_GPL(mlx4_set_vf_mac);
int mlx4_set_vf_vlan(struct mlx4_dev *dev, int port, int vf, u16 vlan, u8 qos)
{
struct mlx4_priv *priv = mlx4_priv(dev);
- struct mlx4_vport_oper_state *vf_oper;
struct mlx4_vport_state *vf_admin;
int slave;
@@ -2269,7 +2268,6 @@ int mlx4_set_vf_vlan(struct mlx4_dev *dev, int port, int vf, u16 vlan, u8 qos)
return -EINVAL;
vf_admin = &priv->mfunc.master.vf_admin[slave].vport[port];
- vf_oper = &priv->mfunc.master.vf_oper[slave].vport[port];
if ((0 == vlan) && (0 == qos))
vf_admin->default_vlan = MLX4_VGT;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index 55f6245..70f0213 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -645,7 +645,7 @@ static const u8 __promisc_mode[] = {
int mlx4_map_sw_to_hw_steering_mode(struct mlx4_dev *dev,
enum mlx4_net_trans_promisc_mode flow_type)
{
- if (flow_type >= MLX4_FS_MODE_NUM || flow_type < 0) {
+ if (flow_type >= MLX4_FS_MODE_NUM) {
mlx4_err(dev, "Invalid flow type. type = %d\n", flow_type);
return -EINVAL;
}
@@ -681,7 +681,7 @@ const u16 __sw_id_hw[] = {
int mlx4_map_sw_to_hw_steering_id(struct mlx4_dev *dev,
enum mlx4_net_trans_rule_id id)
{
- if (id >= MLX4_NET_TRANS_RULE_NUM || id < 0) {
+ if (id >= MLX4_NET_TRANS_RULE_NUM) {
mlx4_err(dev, "Invalid network rule id. id = %d\n", id);
return -EINVAL;
}
@@ -706,7 +706,7 @@ static const int __rule_hw_sz[] = {
int mlx4_hw_rule_sz(struct mlx4_dev *dev,
enum mlx4_net_trans_rule_id id)
{
- if (id >= MLX4_NET_TRANS_RULE_NUM || id < 0) {
+ if (id >= MLX4_NET_TRANS_RULE_NUM) {
mlx4_err(dev, "Invalid network rule id. id = %d\n", id);
return -EINVAL;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/srq.c b/drivers/net/ethernet/mellanox/mlx4/srq.c
index 79fd269..9e08e35 100644
--- a/drivers/net/ethernet/mellanox/mlx4/srq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/srq.c
@@ -34,6 +34,7 @@
#include <linux/init.h>
#include <linux/mlx4/cmd.h>
+#include <linux/mlx4/srq.h>
#include <linux/export.h>
#include <linux/gfp.h>
--
1.8.3.4
* [PATCH net-next 4/4] net/mlx4_core: Load higher level modules according to ports type
From: Amir Vadai @ 2013-10-15 14:55 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Eyal Perry, Amir Vadai
In-Reply-To: <1381848924-18992-1-git-send-email-amirv@mellanox.com>
From: Eyal Perry <eyalpe@mellanox.com>
The Mellanox ConnectX architecture is as follows: mlx4_core is the
lower-level PCI driver, which registers on the PCI ID, and the
protocol-specific drivers depend on it: mlx4_en for Ethernet and
mlx4_ib for InfiniBand.
A NIC can have multiple ports, and each port can change its type dynamically.
We use the request_module() call to load the relevant protocol driver
when needed: at load time or on a port type change event.
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
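
The port scan described above can be sketched in userspace C as
follows. This is an illustration only: the enum, the NEED_* flags and
modules_needed() are hypothetical names, and the sketch returns a
bitmask instead of calling into the module loader. Like the driver, it
treats ports as 1-based, so slot 0 of the array is unused.

```c
/* Hypothetical stand-ins for the MLX4_PORT_TYPE_* values. */
enum port_type { PORT_TYPE_NONE, PORT_TYPE_IB, PORT_TYPE_ETH };

#define NEED_IB	0x1u	/* would trigger a request for "mlx4_ib" */
#define NEED_EN	0x2u	/* would trigger a request for "mlx4_en" */

/*
 * Walk ports 1..num_ports and report which protocol drivers are
 * needed. port_type[] must have at least num_ports + 1 entries.
 */
static unsigned int modules_needed(const enum port_type *port_type,
				   int num_ports)
{
	unsigned int mask = 0;
	int port;

	for (port = 1; port <= num_ports; port++) {
		if (port_type[port] == PORT_TYPE_IB)
			mask |= NEED_IB;
		else if (port_type[port] == PORT_TYPE_ETH)
			mask |= NEED_EN;
	}

	return mask;
}
```

A two-port device with one IB and one Ethernet port yields both flags,
so both protocol drivers would be requested.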
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 60c9f4f..179d267 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -42,6 +42,7 @@
#include <linux/io-mapping.h>
#include <linux/delay.h>
#include <linux/netdevice.h>
+#include <linux/kmod.h>
#include <linux/mlx4/device.h>
#include <linux/mlx4/doorbell.h>
@@ -650,6 +651,27 @@ err_mem:
return err;
}
+static void mlx4_request_modules(struct mlx4_dev *dev)
+{
+ int port;
+ int has_ib_port = false;
+ int has_eth_port = false;
+#define EN_DRV_NAME "mlx4_en"
+#define IB_DRV_NAME "mlx4_ib"
+
+ for (port = 1; port <= dev->caps.num_ports; port++) {
+ if (dev->caps.port_type[port] == MLX4_PORT_TYPE_IB)
+ has_ib_port = true;
+ else if (dev->caps.port_type[port] == MLX4_PORT_TYPE_ETH)
+ has_eth_port = true;
+ }
+
+ if (has_ib_port)
+ request_module_nowait(IB_DRV_NAME);
+ if (has_eth_port)
+ request_module_nowait(EN_DRV_NAME);
+}
+
/*
* Change the port configuration of the device.
* Every user of this function must hold the port mutex.
@@ -681,6 +703,11 @@ int mlx4_change_port_types(struct mlx4_dev *dev,
}
mlx4_set_port_mask(dev);
err = mlx4_register_device(dev);
+ if (err) {
+ mlx4_err(dev, "Failed to register device\n");
+ goto out;
+ }
+ mlx4_request_modules(dev);
}
out:
@@ -2305,6 +2332,8 @@ slave_start:
if (err)
goto err_port;
+ mlx4_request_modules(dev);
+
mlx4_sense_init(dev);
mlx4_start_sense(dev);
--
1.8.3.4
* [PATCH net-next 3/4] net/mlx4: Unused local variable in mlx4_opreq_action
From: Amir Vadai @ 2013-10-15 14:55 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Eyal Perry, Amir Vadai
In-Reply-To: <1381848924-18992-1-git-send-email-amirv@mellanox.com>
Clean up warning added by commit fe6f700d "net/mlx4_core: Respond to
operation request by firmware".
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/fw.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index a377484..c151e7a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -1713,7 +1713,6 @@ void mlx4_opreq_action(struct work_struct *work)
u32 *outbox;
u32 modifier;
u16 token;
- u16 type_m;
u16 type;
int err;
u32 num_qps;
@@ -1746,7 +1745,6 @@ void mlx4_opreq_action(struct work_struct *work)
MLX4_GET(modifier, outbox, GET_OP_REQ_MODIFIER_OFFSET);
MLX4_GET(token, outbox, GET_OP_REQ_TOKEN_OFFSET);
MLX4_GET(type, outbox, GET_OP_REQ_TYPE_OFFSET);
- type_m = type >> 12;
type &= 0xfff;
switch (type) {
--
1.8.3.4
* [PATCH net-next 2/4] net/mlx4: Fix typo, move similar defs to same location
From: Amir Vadai @ 2013-10-15 14:55 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Eyal Perry, Amir Vadai, Or Gerlitz
In-Reply-To: <1381848924-18992-1-git-send-email-amirv@mellanox.com>
From: Or Gerlitz <ogerlitz@mellanox.com>
Small code cleanup:
1. change MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN to MLX4_DEV_CAP_FLAG2_REASSIGN_MAC_EN
2. put MLX4_SET_PORT_PRIO2TC and MLX4_SET_PORT_SCHEDULER in the same union with the
other MLX4_SET_PORT_yyy
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/fw.c | 2 +-
include/linux/mlx4/cmd.h | 6 ++----
include/linux/mlx4/device.h | 2 +-
4 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index fa37b7a..85d9166 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1733,7 +1733,7 @@ void mlx4_en_stop_port(struct net_device *dev, int detach)
/* Unregister Mac address for the port */
mlx4_en_put_qp(priv);
- if (!(mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN))
+ if (!(mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_REASSIGN_MAC_EN))
mdev->mac_removed[priv->port] = 1;
/* Free RX Rings */
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 0d63daa..a377484 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -652,7 +652,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
QUERY_DEV_CAP_RSVD_LKEY_OFFSET);
MLX4_GET(field, outbox, QUERY_DEV_CAP_FW_REASSIGN_MAC);
if (field & 1<<6)
- dev_cap->flags2 |= MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN;
+ dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_REASSIGN_MAC_EN;
MLX4_GET(dev_cap->max_icm_sz, outbox,
QUERY_DEV_CAP_MAX_ICM_SZ_OFFSET);
if (dev_cap->flags & MLX4_DEV_CAP_FLAG_COUNTERS)
diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h
index cd1fdf7..8df61bc 100644
--- a/include/linux/mlx4/cmd.h
+++ b/include/linux/mlx4/cmd.h
@@ -154,10 +154,6 @@ enum {
MLX4_CMD_QUERY_IF_STAT = 0X54,
MLX4_CMD_SET_IF_STAT = 0X55,
- /* set port opcode modifiers */
- MLX4_SET_PORT_PRIO2TC = 0x8,
- MLX4_SET_PORT_SCHEDULER = 0x9,
-
/* register/delete flow steering network rules */
MLX4_QP_FLOW_STEERING_ATTACH = 0x65,
MLX4_QP_FLOW_STEERING_DETACH = 0x66,
@@ -182,6 +178,8 @@ enum {
MLX4_SET_PORT_VLAN_TABLE = 0x3,
MLX4_SET_PORT_PRIO_MAP = 0x4,
MLX4_SET_PORT_GID_TABLE = 0x5,
+ MLX4_SET_PORT_PRIO2TC = 0x8,
+ MLX4_SET_PORT_SCHEDULER = 0x9,
};
enum {
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 24ce6bd..9ad0c18 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -155,7 +155,7 @@ enum {
MLX4_DEV_CAP_FLAG2_RSS_TOP = 1LL << 1,
MLX4_DEV_CAP_FLAG2_RSS_XOR = 1LL << 2,
MLX4_DEV_CAP_FLAG2_FS_EN = 1LL << 3,
- MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN = 1LL << 4,
+ MLX4_DEV_CAP_FLAG2_REASSIGN_MAC_EN = 1LL << 4,
MLX4_DEV_CAP_FLAG2_TS = 1LL << 5,
MLX4_DEV_CAP_FLAG2_VLAN_CONTROL = 1LL << 6,
MLX4_DEV_CAP_FLAG2_FSM = 1LL << 7,
--
1.8.3.4
* Re: [PATCH] net: sctp: fix a cacc_saw_newack missetting issue
From: Chang @ 2013-10-15 15:13 UTC (permalink / raw)
To: Vlad Yasevich, nhorman; +Cc: davem, linux-sctp, netdev, linux-kernel
In-Reply-To: <525D525E.8010508@gmail.com>
Thanks, I've got it and will submit a new patch later.
On 10/15/2013 04:34 PM, Vlad Yasevich wrote:
> On 10/15/2013 10:27 AM, Chang wrote:
>>
>> On 10/15/2013 04:11 PM, Vlad Yasevich wrote:
>>> On 10/14/2013 09:33 AM, Chang Xiangzhong wrote:
>>>> For for each TSN t being newly acked (Not only cumulatively,
>>>> but also SELECTIVELY) cacc_saw_newack should be set to 1.
>>>>
>>>> Signed-off-by: Xiangzhong Chang <changxiangzhong@gmail.com>
>>>> ---
>>>> net/sctp/outqueue.c | 42
>>>> +++++++++++++++++++++---------------------
>>>> 1 file changed, 21 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
>>>> index 94df758..d86032b 100644
>>>> --- a/net/sctp/outqueue.c
>>>> +++ b/net/sctp/outqueue.c
>>>> @@ -1398,6 +1398,27 @@ static void sctp_check_transmitted(struct
>>>> sctp_outq *q,
>>>> forward_progress = true;
>>>> }
>>>>
>>>> + if (!tchunk->tsn_gap_acked) {
>>>
>>> You can remove this test since the block just above already performs
>>> it. Just fold this code into the block above.
>>>
>>> -vlad
>>>
>> Sorry, I'm not sure if I fully understand you. There are code blocks
>> which checking the tchunk->tsn_gap_acked. In addition, they check other
>> states as well.
>
> The flow is:
>
> if (sctp_acked(sack, tsn)) {
> ...
> if (transport) {
> ....
> }
>
> if (!tchunk->tsn_gap_acked) {
> ....
> }
>
> if (TSN_lte(tsn, sack_ctsn)) {
> ....
> /* SFR-CACC ...
> }
>
> Since you are moving this up, you can simply re-use
> the if (!tchunk->tsn_gap_acked) immediately above.
>
>>>> + /*
>>>> + * SFR-CACC algorithm:
>>>> + * 2) If the SACK contains gap acks
>>>> + * and the flag CHANGEOVER_ACTIVE is
>>>> + * set the receiver of the SACK MUST
>>>> + * take the following action:
>>>> + *
>>>> + * B) For each TSN t being acked that
>>>> + * has not been acked in any SACK so
>>>> + * far, set cacc_saw_newack to 1 for
>>>> + * the destination that the TSN was
>>>> + * sent to.
>>>> + */
>>>> + if (transport &&
>>>> + sack->num_gap_ack_blocks &&
>>>> + q->asoc->peer.primary_path->cacc.
>>>> + changeover_active)
>>>> + transport->cacc.cacc_saw_newack = 1;
> ^^^^
>
> Don't need that many spaces...
>
> -vlad
>>>> + }
>>>> +
>>>> if (TSN_lte(tsn, sack_ctsn)) {
>>>> /* RFC 2960 6.3.2 Retransmission Timer Rules
>>>> *
>>>> @@ -1411,27 +1432,6 @@ static void sctp_check_transmitted(struct
>>>> sctp_outq *q,
>>>> restart_timer = 1;
>>>> forward_progress = true;
>>>>
>>>> - if (!tchunk->tsn_gap_acked) {
>>>> - /*
>>>> - * SFR-CACC algorithm:
>>>> - * 2) If the SACK contains gap acks
>>>> - * and the flag CHANGEOVER_ACTIVE is
>>>> - * set the receiver of the SACK MUST
>>>> - * take the following action:
>>>> - *
>>>> - * B) For each TSN t being acked that
>>>> - * has not been acked in any SACK so
>>>> - * far, set cacc_saw_newack to 1 for
>>>> - * the destination that the TSN was
>>>> - * sent to.
>>>> - */
>>>> - if (transport &&
>>>> - sack->num_gap_ack_blocks &&
>>>> - q->asoc->peer.primary_path->cacc.
>>>> - changeover_active)
>>>> - transport->cacc.cacc_saw_newack
>>>> - = 1;
>>>> - }
>>>>
>>>> list_add_tail(&tchunk->transmitted_list,
>>>> &q->sacked);
>>>>
>>>
>>
>
* Re: DomU's network interface will hung when Dom0 running 32bit
From: jianhai luan @ 2013-10-15 15:19 UTC (permalink / raw)
To: Ian Campbell, Wei Liu; +Cc: xen-devel, netdev, ANNIE LI
In-Reply-To: <1381848632.21901.42.camel@kazak.uk.xensource.com>
On 2013-10-15 22:50, Ian Campbell wrote:
> On Tue, 2013-10-15 at 15:49 +0100, Wei Liu wrote:
>> On Tue, Oct 15, 2013 at 10:29:15PM +0800, jianhai luan wrote:
>>> On 2013-10-15 20:58, Wei Liu wrote:
>>>> On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote:
>>>> [...]
>>>>>>>> Can you propose a patch?
>>>>>>> Because credit_timeout.expire always after jiffies, i judge the
>>>>>>> value over the range of time_after_eq() by time_before(now,
>>>>>>> vif->credit_timeout.expires). please check the patch.
>>>>>> I don't think this really fix the issue for you. You still have chance
>>>>>> that now wraps around and falls between expires and next_credit. In that
>>>>>> case it's stalled again.
>>>>> if time_before(now, vif->credit_timeout.expires) is true, time wrap
>>>>> and do operation. Otherwise time_before(now,
>>>>> vif->credit_timeout.expires) isn't true, now -
>>>>> vif->credit_timeout.expires should be letter than ULONG_MAX/2.
>>>>> Because next_credit large than vif->credit_timeout.expires
>>>>> (next_crdit = vif->credit_timeout.expires +
>>>>> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and
>>>>> next_credit should be in range of time_after_eq(). So
>>>>> time_after_eq() do correctly judge.
>>>>>
>>>> Not sure I understand you. Consider "now" is placed like this:
>>>>
>>>> expires now next_credit
>>>> ----time increases this direction--->
>>>>
>>>> * time_after_eq(now, next_credit) -> false
>>>> * time_before(now, expires) -> false
>>> If now is placed in above environment, the result will be correct
>>> (Sending package will be not allowed until next_credit).
>> No, it is not necessarily correct. Keep in mind that "now" wraps around,
>> which is the issue you try to fix. You still have a window to stall your
>> frontend.
> Remember that time_after_eq is supposed to work even with wraparound
> occurring, so long as the two times are less than MAX_LONG/2 apart.
Sorry for my confusing explanation. What I mean is:
* time_after_eq()/time_before_eq() handle jiffies wraparound, so
please think of jiffies as increasing linearly.
* time_after_eq()/time_before_eq() are only valid within the range
(0, MAX_LONG/2); the result will be wrong outside that range.
So please consider three cases:
- expires      now      next_credit
--------time increases this direction ---------->
- expires      next_credit      now      next_credit+MAX_LONG/2
--------time increases this direction ---------->
- expires      next_credit      next_credit+MAX_LONG/2      now
--------time increases this direction ---------->
In the first case, netfront has consumed all of credit_byte before
next_credit, so we should arm a timer to recalculate credit_byte,
and not transmit until next_credit.
In the second case, credit_byte should be recalculated, because
netfront did not consume all of credit_byte before next_credit, and
time_after_eq() judges correctly.
In the third case, credit_byte should also be recalculated right away,
because netfront did not consume all of credit_byte before next_credit.
But time_after_eq() judges wrongly (time_after_eq(now, next_credit) is
false), so remaining_byte is not increased.
My patch addresses the third case. Since now >
next_credit+MAX_LONG/2, time_before(now, expire) should be
true (time_before(now, expire) is false in the first case).
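To make the three placements of "now" concrete, here is a small user-space sketch. It assumes that after_eq() and before() below behave like the kernel's time_after_eq() and time_before() (without the typecheck() wrappers); the values of expires and next_credit are hypothetical and chosen only for illustration. In this sketch the comparison flips once the two values are more than LONG_MAX apart.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * User-space equivalents of time_after_eq() and time_before().
 * The unsigned subtraction wraps; the cast to long relies on the usual
 * two's-complement conversion.
 */
static bool after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

static bool before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical sample values, not taken from any driver. */
static const unsigned long expires = 1000;
static const unsigned long next_credit = 1100;

/* Case 1: expires < now < next_credit. */
static unsigned long case1_now(void)
{
	return expires + 50;
}

/* Case 2: now is shortly after next_credit. */
static unsigned long case2_now(void)
{
	return next_credit + 10;
}

/* Case 3: now is more than LONG_MAX ticks past next_credit. */
static unsigned long case3_now(void)
{
	return next_credit + (unsigned long)LONG_MAX + 10;
}
```

With these values, case 1 fails both checks (so transmission waits for the timer), case 2 passes after_eq(now, next_credit), and case 3 fails after_eq(now, next_credit) but passes before(now, expires).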
>
>>> * time_after_eq(now, next_credit) --> false will include two environment:
>>> expires now next_credit
>>> -----------time increases this direction ---->
>>>
>>> Or
>>> expires next_credit next_credit + MAX_LONG/2 now
>>> -----------time increases this direction ---->
>>>
>>>
>>> the first environment should be correct to control transmit. the
>>> second environment is our included environment.
>>>
>>> Jason
>>>> Then it's stuck again. You're merely narrowing the window, not fixing
>>>> the real problem.
>>>>
>>>> Wei.
>>>>
>>>>> Jason
>>>>>> Wei.
>
^ permalink raw reply
* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Alexander Gordeev @ 2013-10-15 15:30 UTC (permalink / raw)
To: Mark Lord
Cc: H. Peter Anvin, Benjamin Herrenschmidt, linux-kernel,
Bjorn Helgaas, Ralf Baechle, Michael Ellerman, Martin Schwidefsky,
Ingo Molnar, Tejun Heo, Dan Williams, Andy King, Jon Mason,
Matt Porter, linux-pci, linux-mips, linuxppc-dev, linux390,
linux-s390, x86, linux-ide, iss_storagedev, linux-nvme,
linux-rdma, netdev, e1000-dev
In-Reply-To: <52585FB3.7080508@start.ca>
On Fri, Oct 11, 2013 at 04:29:39PM -0400, Mark Lord wrote:
> > static int xx_alloc_msix_irqs(struct xx_dev *dev, int nvec)
> > {
> > nvec = roundup_pow_of_two(nvec); /* assume 0 < nvec <= 16 */
> >
> > xx_disable_all_irqs(dev);
> >
> > pci_lock_msi(dev->pdev);
> >
> > rc = pci_get_msix_limit(dev->pdev, nvec);
> > if (rc < 0)
> > goto err;
> >
> > nvec = min(nvec, rc); /* if limit is more than requested */
> > nvec = rounddown_pow_of_two(nvec); /* (a) */
> >
> > xx_prep_for_msix_vectors(dev, nvec);
> >
> > rc = pci_enable_msix(dev->pdev, dev->irqs, nvec); /* (b) */
> > if (rc < 0)
> > goto err;
> >
> > pci_unlock_msi(dev->pdev);
> >
> > dev->num_vectors = nvec; /* (b) */
> > return 0;
> >
> > err:
> > pci_unlock_msi(dev->pdev);
> >
> > kerr(dev->name, "pci_enable_msix() failed, err=%d", rc);
> > dev->num_vectors = 0;
> > return rc;
> > }
>
> That would still need a loop, to handle the natural race between
> the calls to pci_get_msix_limit() and pci_enable_msix() -- the driver and device
> can and should fall back to a smaller number of vectors when pci_enable_msix() fails.
Could you please explain why the value returned by pci_get_msix_limit()
might change before pci_enable_msix() returns, while both are protected by
pci_lock_msi()?
Anyway, although the loop-free code (IMHO) reads better, pci_lock_msi()
is not a part of the original proposal and the more I think about it
the less I like it.
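For reference, the fallback loop Mark describes can be modeled in user space roughly as follows. This is only a sketch: fake_enable_msix() is a hypothetical stand-in that mimics the pci_enable_msix() return convention (0 on success, a negative errno on failure, or a positive count of vectors that could have been allocated), and "available" is a made-up parameter simulating what the platform can provide.

```c
#include <errno.h>

/* Hypothetical stand-in for the pci_enable_msix() return convention. */
static int fake_enable_msix(int available, int nvec)
{
	if (nvec <= 0)
		return -EINVAL;
	if (available <= 0)
		return -ENOSPC;
	if (nvec > available)
		return available;	/* "you could get this many" */
	return 0;
}

/* Largest power of two that is <= n, for n >= 1. */
static int rounddown_pow2(int n)
{
	int p = 1;

	while (p * 2 <= n)
		p *= 2;
	return p;
}

/* Retry with fewer vectors until the request succeeds or fails hard. */
static int alloc_vectors(int available, int want)
{
	int nvec;

	if (want < 1)
		return -EINVAL;

	nvec = rounddown_pow2(want);
	while (nvec >= 1) {
		int rc = fake_enable_msix(available, nvec);

		if (rc == 0)
			return nvec;		/* got exactly nvec vectors */
		if (rc < 0)
			return rc;		/* hard failure */
		nvec = rounddown_pow2(rc);	/* fall back to a smaller count */
	}
	return -ENOSPC;
}
```

The loop terminates because a positive return value is always smaller than the count that was requested.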
--
Regards,
Alexander Gordeev
agordeev@redhat.com
^ permalink raw reply
* Re: [PATCH v3 net-next] openvswitch: fix vport-netdev unregister
From: Jesse Gross @ 2013-10-15 15:31 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: David S. Miller, Pravin B Shelar, Jiri Pirko, Cong Wang,
dev@openvswitch.org, netdev
In-Reply-To: <1381722652-3689-1-git-send-email-ast@plumgrid.com>
On Sun, Oct 13, 2013 at 8:50 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
> index c323567..ffa429a 100644
> --- a/net/openvswitch/dp_notify.c
> +++ b/net/openvswitch/dp_notify.c
> @@ -59,15 +59,9 @@ void ovs_dp_notify_wq(struct work_struct *work)
> struct hlist_node *n;
>
> hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
> - struct netdev_vport *netdev_vport;
> -
> if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
> continue;
> -
> - netdev_vport = netdev_vport_priv(vport);
> - if (netdev_vport->dev->reg_state == NETREG_UNREGISTERED ||
> - netdev_vport->dev->reg_state == NETREG_UNREGISTERING)
> - dp_detach_port_notify(vport);
> + dp_detach_port_notify(vport);
Doesn't this free *all* ports of type OVS_VPORT_TYPE_NETDEV when any
one of them is removed?
^ permalink raw reply
* Re: [PATCH RFC 5/5] net: macb: Adjust tx_clk when link speed changes
From: Sören Brinkmann @ 2013-10-15 15:34 UTC (permalink / raw)
To: Michal Simek; +Cc: Nicolas Ferre, netdev, David Miller, linux-kernel
In-Reply-To: <dd225b7c-ee2a-4747-9a50-55b77a114376@DB9EHSMHS006.ehs.local>
On Tue, Oct 15, 2013 at 09:58:09AM +0200, Michal Simek wrote:
> On 10/15/2013 09:54 AM, Nicolas Ferre wrote:
> > On 15/10/2013 01:59, Soren Brinkmann :
> >> Adjust the ethernet clock according to the negotiated link speed.
> >>
> >> Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
> >
> > I will need more time to study this one.
> >
> > Moreover, I will have to add the "tx_clk" to every user of this driver before switching to the addition of this clock.
>
> As I am reading this patch, Soren just protected this
> case that if this clk is not specified then it is not used.
That is how I sketched things in this patch. But as I said, I'm not
fully convinced this approach fits all or is the best. So, if anybody
has a better approach, let us know.
Sören
^ permalink raw reply
* Re: [PATCH net-next] 8390 ei_debug : Reenable the use of debugging in 8390 based chips
From: Matthew Whitehead @ 2013-10-15 15:35 UTC (permalink / raw)
To: netdev
In-Reply-To: <1381782794-11334-1-git-send-email-tedheadster@gmail.com>
Dave,
please decline this patch set and instead use the later one posted with subject
"[net-next REPOST] 8390 ei_debug : Reenable the use of debugging in 8390 based chips"
- Matthew
^ permalink raw reply
* Re: kernel policy routing table src ip not respected since 2.6.37 and commit 9fc3bbb4a752
From: Vincent Li @ 2013-10-15 16:02 UTC (permalink / raw)
To: Julian Anastasov; +Cc: netdev@vger.kernel.org, Joel Sing
In-Reply-To: <alpine.LFD.2.03.1310151144580.1562@ssi.bg>
Thanks for the clue. The ARP request is indeed from 10.1.1.2 in my test. I
made 10.1.1.9 reachable, and tcpdump on 10.1.1.9 does show the
packets sourced from 10.1.1.2:
08:48:41.576588 IP 10.1.1.2 > 10.1.1.9: ICMP echo request, id 6972, seq 1, length 64
08:48:41.576614 IP 10.1.1.9 > 10.1.1.2: ICMP echo reply, id 6972, seq 1, length 64
08:48:42.576909 IP 10.1.1.2 > 10.1.1.9: ICMP echo request, id 6972, seq 2, length 64
08:48:42.576932 IP 10.1.1.9 > 10.1.1.2: ICMP echo reply, id 6972, seq 2, length 64
It is strange, though, that when 10.1.1.9 is an unreachable address, the ping
utility reports the 'Destination Host Unreachable' error with source
10.1.1.1. Before 2.6.37, it reported 10.1.1.2.
The ping utility is the standard ping command from CentOS 6.4, and I am
running CentOS 6.4 on KVM. Here is the strace:
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(1025),
sin_addr=inet_addr("10.1.1.9")}, 16) = 0
getsockname(4, {sa_family=AF_INET, sin_port=htons(49991),
sin_addr=inet_addr("10.1.1.2")}, [16]) = 0
close(4) = 0
setsockopt(3, SOL_RAW, ICMP_FILTER,
~(ICMP_ECHOREPLY|ICMP_DEST_UNREACH|ICMP_SOURCE_QUENCH|ICMP_REDIRECT|ICMP_TIME_EXCEEDED|ICMP_PARAMETERPROB),
4) = 0
setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [324], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [4851439803083915264], [4]) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f0b4332e000
write(1, "PING 10.1.1.9 (10.1.1.9) 56(84) "..., 47PING 10.1.1.9
(10.1.1.9) 56(84) bytes of data.
) = 47
setsockopt(3, SOL_SOCKET, SO_TIMESTAMP, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDTIMEO,
"\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO,
"\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
getpid() = 15633
rt_sigaction(SIGINT, {0x7f0b43337a40, [], SA_RESTORER|SA_INTERRUPT,
0x7f0b42b7e920}, NULL, 8) = 0
rt_sigaction(SIGALRM, {0x7f0b43337a40, [], SA_RESTORER|SA_INTERRUPT,
0x7f0b42b7e920}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {0x7f0b43337a50, [], SA_RESTORER|SA_INTERRUPT,
0x7f0b42b7e920}, NULL, 8) = 0
gettimeofday({1381852599, 797511}, NULL) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=71, ws_col=158, ws_xpixel=0, ws_ypixel=0}) = 0
gettimeofday({1381852599, 797672}, NULL) = 0
gettimeofday({1381852599, 797708}, NULL) = 0
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("10.1.1.9")},
msg_iov(1)=[{"\10\0\373\n\21=\0\1\267e]R\0\0\0\0\f,\f\0\0\0\0\0\20\21\22\23\24\25\26\27"...,
64}], msg_controllen=0, msg_flags=0}, 0) = 64
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={10, 0}}, NULL) = 0
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("127.0.0.1")},
msg_iov(1)=[{"E\300\0D\237\210\0\0@\1\334n\177\0\0\1\177\0\0\1\3\3v9\0\0\0\0E\0\0("...,
192}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_SOCKET,
cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 68
setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER,
"\10\0\0\0\0\0\0\0\300\305SC\v\177\0\0", 16) = 0
recvmsg(3, 0x7fffccde57b0, MSG_DONTWAIT) = -1 EAGAIN (Resource
temporarily unavailable)
recvmsg(3, 0x7fffccde57b0, 0) = -1 EAGAIN (Resource
temporarily unavailable)
recvmsg(3, 0x7fffccde57b0, 0) = -1 EAGAIN (Resource
temporarily unavailable)
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("10.1.1.1")},
msg_iov(1)=[{"E\300\0pd{\0\0@\1\377M\n\1\1\1\n\1\1\2\3\1\374\376\0\0\0\0E\0\0T"...,
192}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_SOCKET,
cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 112
recvmsg(3, 0x7fffccde57b0, 0) = -1 EHOSTUNREACH (No route to host)
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("10.1.1.9")}, msg_iov(1)=[{"\10\0\373\n\21=\0\1",
8}], msg_controllen=80, {cmsg_len=32, cmsg_level=SOL_SOCKET,
cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=MSG_TRUNC|MSG_ERRQUEUE},
MSG_ERRQUEUE|MSG_DONTWAIT) = 8
setsockopt(3, SOL_RAW, ICMP_FILTER,
~(ICMP_ECHOREPLY|ICMP_SOURCE_QUENCH|ICMP_REDIRECT), 4) = 0
write(1, "From 10.1.1.1 icmp_seq=1 Destina"..., 54From 10.1.1.1
icmp_seq=1 Destination Host Unreachable
) = 54
gettimeofday({1381852602, 805123}, NULL) = 0
write(1, "\n", 1
) = 1
write(1, "--- 10.1.1.9 ping statistics ---"..., 33--- 10.1.1.9 ping
statistics ---
) = 33
write(1, "1 packets transmitted, 0 receive"..., 761 packets
transmitted, 0 received, +1 errors, 100% packet loss, time 3007ms
) = 76
write(1, "\n", 1
) = 1
exit_group(1) = ?
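The first few calls in the trace above (a UDP socket, connect() to the target on port 1025, then getsockname()) show how ping asks the kernel which source address it would pick. A minimal sketch of that same probe, in case someone wants to test source selection without ping; probe_source() is a hypothetical helper name, and the static buffer is only there to keep the example short.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Return the source address the kernel would select for dst, as a
 * dotted-quad string in a static buffer, or NULL on any failure.
 * connect() on a UDP socket sends no packets; it only performs the
 * route lookup and binds the local address.
 */
static const char *probe_source(const char *dst)
{
	static char buf[INET_ADDRSTRLEN];
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	const char *res = NULL;
	int fd;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(1025);	/* same port as in the trace */
	if (inet_pton(AF_INET, dst, &addr.sin_addr) != 1)
		return NULL;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return NULL;

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&addr, &alen) == 0)
		res = inet_ntop(AF_INET, &addr.sin_addr, buf, sizeof(buf));

	close(fd);
	return res;
}
```

In the trace, this probe returned 10.1.1.2 for destination 10.1.1.9, which matches the preferred source in table 245.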
On Tue, Oct 15, 2013 at 1:51 AM, Julian Anastasov <ja@ssi.bg> wrote:
>
> Hello,
>
> On Mon, 14 Oct 2013, Vincent Li wrote:
>
>> I had a simple bash script to test if the policy routing table src ip
>> is respected or not, git bisect found the commit 9fc3bbb4a752 to
>> change the policy routing table source ip behavior.
>>
>> commit 9fc3bbb4a752f108cf096d96640f3b548bbbce6c
>> Author: Joel Sing <jsing@google.com>
>> Date: Mon Jan 3 20:24:20 2011 +0000
>>
>> ipv4/route.c: respect prefsrc for local routes
>>
>> The preferred source address is currently ignored for local routes,
>> which results in all local connections having a src address that is the
>> same as the local dst address. Fix this by respecting the preferred source
>> address when it is provided for local routes.
>>
>> test script:
>>
>> #!/bin/bash
>> ip addr add 10.1.1.1/24 dev eth0
>> ip addr add 10.1.1.2/24 dev eth0
>> ip rule add priority 245 table 245
>> ip route add 10.1.1.0/24 dev eth0 proto kernel scope link src
>> 10.1.1.2 table 245 <===source ip 10.1.1.2 to be preferred
>>
>> ip addr show dev eth0
>> ip route list table main
>> ip route list table 245
>>
>>
>> tcpdump -nn -i eth0 host 10.1.1.9 and icmp &
>>
>> ping 10.1.1.9
>>
>>
>>
>> --before commit 9fc3bbb4a752
>>
>> the source is from ip 10.1.1.2 as expected
>>
>> --after commit 9fc3bbb4a752
>>
>> the source is from ip 10.1.1.1 which not expected since I have high
>> priority table 245 with source ip 10.1.1.2
>>
>> is this regression of commit 9fc3bbb4a752 ?
>
> Hm, it works here on 3.11.3. ARP request uses
> 10.1.1.2 and ICMP packet has such source. May be something with
> the ping tool you are using? Check 'strace ping -c 1 10.1.1.9', may
> be it binds to first device IP?
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
^ permalink raw reply
* Re: DomU's network interface will hung when Dom0 running 32bit
From: Wei Liu @ 2013-10-15 16:03 UTC (permalink / raw)
To: jianhai luan; +Cc: Ian Campbell, Wei Liu, xen-devel, netdev, ANNIE LI
In-Reply-To: <525D5D0E.5070107@oracle.com>
On Tue, Oct 15, 2013 at 11:19:42PM +0800, jianhai luan wrote:
[...]
> >>>>
> >>>>* time_after_eq(now, next_credit) -> false
> >>>>* time_before(now, expires) -> false
> >>>If now is placed in above environment, the result will be correct
> >>>(Sending package will be not allowed until next_credit).
> >>No, it is not necessarily correct. Keep in mind that "now" wraps around,
> >>which is the issue you try to fix. You still have a window to stall your
> >>frontend.
> >Remember that time_after_eq is supposed to work even with wraparound
> >occurring, so long as the two times are less than MAX_LONG/2 apart.
>
> Sorry for my confusing explanation. What I mean is:
> * time_after_eq()/time_before_eq() handle jiffies wraparound, so
> please think of jiffies as increasing linearly.
> * time_after_eq()/time_before_eq() are only valid within the range
> (0, MAX_LONG/2); the result will be wrong outside that range.
>
> So please consider three cases:
> - expires      now      next_credit
> --------time increases this direction ---------->
>
> - expires      next_credit      now      next_credit+MAX_LONG/2
> --------time increases this direction ---------->
>
> - expires      next_credit      next_credit+MAX_LONG/2      now
> --------time increases this direction ---------->
>
> In the first case, netfront has consumed all of credit_byte before
> next_credit, so we should arm a timer to recalculate credit_byte,
> and not transmit until next_credit.
>
> In the second case, credit_byte should be recalculated, because
> netfront did not consume all of credit_byte before next_credit, and
> time_after_eq() judges correctly.
>
> In the third case, credit_byte should also be recalculated right away,
> because netfront did not consume all of credit_byte before next_credit.
> But time_after_eq() judges wrongly (time_after_eq(now, next_credit) is
> false), so remaining_byte is not increased.
>
> My patch addresses the third case. Since now >
> next_credit+MAX_LONG/2, time_before(now, expire) should be
> true (time_before(now, expire) is false in the first case).
> >
Thanks for straightening this out for me. I'm just too dumb for this, please
be patient with me. :-)
Could you prove that time_before(now, expire) is always true in the third
case? That's where my main concern lies. Is it because msecs_to_jiffies
always returns MAX_JIFFY_OFFSET (which is ((LONG_MAX >> 1)-1) ) at most?
Wei.
^ permalink raw reply
* Re: [PATCH] net: qmi_wwan: Olivetti Olicard 200 support
From: Enrico Mioso @ 2013-10-15 16:07 UTC (permalink / raw)
To: Dan Williams
Cc: gregkh, davem, bjorn, christian.schmiedl, linux-usb, netdev,
linux-kernel, Antonella Pellizzari
In-Reply-To: <1381848597.25397.4.camel@dcbw.foobar.com>
:) I'm very happy you got it working.
The firmware of our device still seems quite fragile: several QMI calls can
bring it to a crashing state, especially when requesting a network scan from
the NAS service.
On Tue, 15 Oct 2013, Dan Williams wrote:
==Date: Tue, 15 Oct 2013 09:49:57 -0500
==From: Dan Williams <dcbw@redhat.com>
==To: Enrico Mioso <mrkiko.rs@gmail.com>
==Cc: gregkh@linuxfoundation.org, davem@davemloft.net, bjorn@mork.no,
== christian.schmiedl@gemalto.com, linux-usb@vger.kernel.org,
== netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
== Antonella Pellizzari <anto.pellizzari83@gmail.com>
==Subject: Re: [PATCH] net: qmi_wwan: Olivetti Olicard 200 support
==
==On Tue, 2013-10-15 at 15:06 +0200, Enrico Mioso wrote:
==> This is a QMI device, manufactured by TCT Mobile Phones.
==> A companion patch blacklisting this device's QMI interface in the option.c
==> driver has been sent.
==>
==> Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
==> Signed-off-by: Antonella Pellizzari <anto.pellizzari83@gmail.com>
==
==Good find. For the record, mine has:
==
==PX1522E16X 1 [Oct 15 2010 02:00:00]
==
== ctl (1.4)
== wds (1.8)
== dms (1.3)
== nas (1.2)
== qos (1.2)
== wms (1.1)
== pds (1.4)
== auth (1.0)
== voice (1.0)
== cat2 (1.1)
==
==Tested-by: Dan Williams <dcbw@redhat.com>
==
==> diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
==> index 3d6aaf7..818ce90 100644
==> --- a/drivers/net/usb/qmi_wwan.c
==> +++ b/drivers/net/usb/qmi_wwan.c
==> @@ -714,6 +714,7 @@ static const struct usb_device_id products[] = {
==> {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */
==> {QMI_FIXED_INTF(0x2357, 0x9000, 4)}, /* TP-LINK MA260 */
==> {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */
==> + {QMI_FIXED_INTF(0x0b3c, 0xc005, 6)}, /* Olivetti Olicard 200 */
==> {QMI_FIXED_INTF(0x1e2d, 0x0060, 4)}, /* Cinterion PLxx */
==>
==> /* 4. Gobi 1000 devices */
==
==
==
^ permalink raw reply
* getting lldp DCB_CMD_IEEE_GET after DCB_CMD_GCAP fails
From: Olaf Hering @ 2013-10-15 16:18 UTC (permalink / raw)
To: netdev
If this list is the wrong place, please point me in the right direction.
In the example code below the part which grabs DCB_CMD_IEEE_GET fails,
unless the "hack" part is executed. What I see inside the kernel is that
__netlink_dump_start gets to the err=-EBUSY case because nlk->cb is
still set. The nl_ack_handler is executed, an "empty" error is returned.
I have compared my code with open-lldp-0.9.46/test/nltest.c and did not
spot the difference.
Is the code below supposed to work anyway? Thanks for any help.
Olaf
/* cc -lnl lldptest.c -o lldptest */
#include <stdio.h>
#include <netlink/msg.h>
#include <linux/dcbnl.h>
static int nl_dump_valid(struct nl_msg *msg, void *p)
{
struct nlmsghdr *nlh = p;
nlh = nlmsg_hdr(msg);
printf("%s: %x\n", __func__, nlh->nlmsg_len);
return NL_OK;
}
static int nl_ack_handler(struct nl_msg *msg, void *arg)
{
arg = msg;
printf("%s: %p\n", __func__, arg);
return NL_STOP;
}
static int nl_error_handler(struct sockaddr_nl *sender, struct nlmsgerr *err, void *arg)
{
char *s = arg;
printf("%s: %p %s %x\n", __func__, sender, s, err->error);
return NL_STOP;
}
int main(int argc, char *argv[])
{
struct nl_handle *nl_handle;
struct nl_cb *nl_cb, *tmp_cb;
struct nl_msg *msg;
struct nlattr *nla;
struct dcbmsg dcb = {.dcb_family = AF_UNSPEC, };
char *ifname;
int hack;
int protocol = 0;
int ret = 1;
if (argc < 2) {
printf("Usage: %s <ifname> [hack]\n", argv[0]);
goto out;
}
ifname = argv[1];
hack = !!argv[2];
nl_cb = nl_cb_alloc(NL_CB_DEFAULT);
if (!nl_cb) {
perror("nl_cb_alloc");
goto out;
}
nl_handle = nl_handle_alloc_cb(nl_cb);
if (nl_connect(nl_handle, protocol) < 0) {
perror("nl_connect");
goto out;
}
tmp_cb = nl_cb_clone(nl_cb);
dcb.cmd = DCB_CMD_GCAP;
nl_cb_err(tmp_cb, NL_CB_CUSTOM, nl_error_handler, "DCB_CMD_GCAP");
nl_cb_set(tmp_cb, NL_CB_ACK, NL_CB_CUSTOM, nl_ack_handler, NULL);
nl_cb_set(tmp_cb, NL_CB_VALID, NL_CB_CUSTOM, nl_dump_valid, NULL);
msg = nlmsg_alloc_simple(RTM_GETDCB, NLM_F_REQUEST);
if (!msg) {
perror("nlmsg_alloc_simple");
goto out;
}
if (nlmsg_append(msg, &dcb, sizeof(dcb), NLMSG_ALIGNTO) < 0) {
perror("nlmsg_append");
goto out;
}
NLA_PUT_STRING(msg, DCB_ATTR_IFNAME, ifname);
nla = nla_nest_start(msg, DCB_ATTR_CAP);
NLA_PUT_FLAG(msg, DCB_CAP_ATTR_ALL);
nla_nest_end(msg, nla);
if (nl_send_auto_complete(nl_handle, msg) < 0) {
perror("nl_send_auto_complete");
goto out;
}
if ((nl_recvmsgs(nl_handle, tmp_cb)) < 0) {
perror("nl_recvmsgs");
goto out;
}
nl_cb_put(tmp_cb);
nlmsg_free(msg);
if (hack) {
nl_close(nl_handle);
nl_handle = nl_handle_alloc_cb(nl_cb);
if (nl_connect(nl_handle, protocol) < 0) {
perror("nl_connect");
goto out;
}
}
tmp_cb = nl_cb_clone(nl_cb);
dcb.cmd = DCB_CMD_IEEE_GET;
nl_cb_err(tmp_cb, NL_CB_CUSTOM, nl_error_handler, "DCB_CMD_IEEE_GET");
nl_cb_set(tmp_cb, NL_CB_ACK, NL_CB_CUSTOM, nl_ack_handler, NULL);
nl_cb_set(tmp_cb, NL_CB_VALID, NL_CB_CUSTOM, nl_dump_valid, NULL);
msg = nlmsg_alloc_simple(RTM_GETDCB, NLM_F_REQUEST);
if (!msg) {
perror("nlmsg_alloc_simple");
goto out;
}
if (nlmsg_append(msg, &dcb, sizeof(dcb), NLMSG_ALIGNTO) < 0) {
perror("nlmsg_append");
goto out;
}
NLA_PUT_STRING(msg, DCB_ATTR_IFNAME, ifname);
if (nl_send_auto_complete(nl_handle, msg) < 0) {
perror("nl_send_auto_complete");
goto out;
}
if ((nl_recvmsgs(nl_handle, tmp_cb)) < 0) {
perror("nl_recvmsgs");
goto out;
}
nl_cb_put(tmp_cb);
nlmsg_free(msg);
ret = 0;
nla_put_failure:
out:
return ret;
}
^ permalink raw reply
* Re: DomU's network interface will hung when Dom0 running 32bit
From: jianhai luan @ 2013-10-15 16:23 UTC (permalink / raw)
To: Wei Liu; +Cc: Ian Campbell, xen-devel, netdev, ANNIE LI
In-Reply-To: <20131015160336.GT11739@zion.uk.xensource.com>
On 2013-10-16 0:03, Wei Liu wrote:
> On Tue, Oct 15, 2013 at 11:19:42PM +0800, jianhai luan wrote:
> [...]
>>>>>> * time_after_eq(now, next_credit) -> false
>>>>>> * time_before(now, expires) -> false
>>>>> If now is placed in above environment, the result will be correct
>>>>> (Sending package will be not allowed until next_credit).
>>>> No, it is not necessarily correct. Keep in mind that "now" wraps around,
>>>> which is the issue you try to fix. You still have a window to stall your
>>>> frontend.
>>> Remember that time_after_eq is supposed to work even with wraparound
>>> occurring, so long as the two times are less than MAX_LONG/2 apart.
>> Sorry for my confusing explanation. What I mean is:
>> * time_after_eq()/time_before_eq() handle jiffies wraparound, so
>> please think of jiffies as increasing linearly.
>> * time_after_eq()/time_before_eq() are only valid within the range
>> (0, MAX_LONG/2); the result will be wrong outside that range.
>>
>> So please consider three cases:
>> - expires      now      next_credit
>> --------time increases this direction ---------->
>>
>> - expires      next_credit      now      next_credit+MAX_LONG/2
>> --------time increases this direction ---------->
>>
>> - expires      next_credit      next_credit+MAX_LONG/2      now
>> --------time increases this direction ---------->
>>
>> In the first case, netfront has consumed all of credit_byte before
>> next_credit, so we should arm a timer to recalculate credit_byte,
>> and not transmit until next_credit.
>>
>> In the second case, credit_byte should be recalculated, because
>> netfront did not consume all of credit_byte before next_credit, and
>> time_after_eq() judges correctly.
>>
>> In the third case, credit_byte should also be recalculated right away,
>> because netfront did not consume all of credit_byte before next_credit.
>> But time_after_eq() judges wrongly (time_after_eq(now, next_credit) is
>> false), so remaining_byte is not increased.
>>
>> My patch addresses the third case. Since now >
>> next_credit+MAX_LONG/2, time_before(now, expire) should be
>> true (time_before(now, expire) is false in the first case).
> Thanks for straightening this out for me. I'm just too dumb for this, please
> be patient with me. :-)
>
> Could you prove that time_before(now, expire) is always true in the third
> case? That's where my main concern lies. Is it because msecs_to_jiffies
> always returns MAX_JIFFY_OFFSET (which is ((LONG_MAX >> 1)-1) ) at most?
I was wrong about the third case. If now is larger than expires +
MAX_UNLONG, time_before(now, expires) will be false:
expires  next_credit  next_credit+MAX_UNLONG/2  expires+MAX_UNLONG  now  next_credit+MAX_UNLONG
--------------------- time increases this direction --------------------->
In the above case, time_before(now, expires) will return
false. But by then jiffies has advanced so far that next_credit will be
reached again soon (time_after_eq(now, next_credit) will become true).
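To see how large that remaining window is, here is a toy model with an 8-bit tick counter, so that a whole wrap period can be enumerated. It is only a sketch, under the assumption that credit is replenished when either time_after_eq(now, next_credit) or time_before(now, expires) is true; after_eq8(), before8() and count_blocked() are made-up names for this illustration.

```c
#include <stdint.h>

/* 8-bit analogues of time_after_eq() and time_before(). */
static int after_eq8(uint8_t a, uint8_t b)
{
	return (int8_t)(uint8_t)(a - b) >= 0;
}

static int before8(uint8_t a, uint8_t b)
{
	return (int8_t)(uint8_t)(a - b) < 0;
}

/*
 * Count the values of "now" within one full 256-tick period for which
 * neither check fires, i.e. for which no credit would be replenished.
 */
static int count_blocked(uint8_t expires, uint8_t next_credit)
{
	int blocked = 0;

	for (int now = 0; now < 256; now++) {
		if (!after_eq8((uint8_t)now, next_credit) &&
		    !before8((uint8_t)now, expires))
			blocked++;
	}
	return blocked;
}
```

In this model the blocked values are exactly the ticks from expires up to next_credit, taken modulo the counter period. A "now" that is a whole period later cannot be told apart from case one, but it leaves the window again after at most next_credit - expires ticks.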
>
> Wei.
^ permalink raw reply
* Re: [PATCH 02/18] net: use wrapper functions of net_ratelimit() to simplify code
From: Joe Perches @ 2013-10-15 16:24 UTC (permalink / raw)
To: Kefeng Wang
Cc: linux-kernel, Greg Kroah-Hartman, David S. Miller,
Pablo Neira Ayuso, Stephen Hemminger, Johannes Berg,
John W. Linville, Stanislaw Gruszka, Johannes Berg,
Francois Romieu, Ben Hutchings, Chas Williams, Marc Kleine-Budde,
Samuel Ortiz, Paul Mackerras, Oliver Neukum,
Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
Rusty Russell, Michael S. Tsirkin, netfilter
In-Reply-To: <1381837514-50660-3-git-send-email-wangkefeng.wang@huawei.com>
On Tue, 2013-10-15 at 19:44 +0800, Kefeng Wang wrote:
> Wrapper functions net_ratelimited_function() and net_XXX_ratelimited()
> are called to simplify code.
[]
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
[]
> @@ -465,10 +465,8 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
> if (likely(fdb)) {
> /* attempt to update an entry for a local interface */
> if (unlikely(fdb->is_local)) {
> - if (net_ratelimit())
> - br_warn(br, "received packet on %s with "
> - "own address as source address\n",
> - source->dev->name);
> + net_ratelimited_function(br_warn, br, "received packet on %s "
> + "with own address as source address\n", source->dev->name);
Hello Kefeng.
When these types of lines are changed, please coalesce the
fragmented format pieces into a single string.
It makes grep a bit easier and 80 columns limits don't
apply to formats.
I think using net_ratelimited_function is not particularly
clarifying here.
Maybe net_ratelimited_function should be removed instead
of its use sites expanded.
Perhaps adding macros like #define br_warn_ratelimited()
would be better.
This comment applies to the whole series.
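Something along these lines is what I have in mind. It is a user-space sketch only: net_ratelimit() and br_warn() here are simplified stand-ins (a token counter and an fprintf wrapper), not the kernel implementations, and br_warn_ratelimited() is a hypothetical macro that does not exist in the tree today. demo_warnings() is just a driver for the example.

```c
#include <stdio.h>

static int tokens;	/* stand-in for the rate limiter's state */
static int emitted;	/* number of messages actually printed */

/* Stand-in for net_ratelimit(): allow a message while tokens remain. */
static int net_ratelimit(void)
{
	if (tokens <= 0)
		return 0;
	tokens--;
	return 1;
}

/* Stand-in for br_warn(); here "br" is just a name string. */
#define br_warn(br, fmt, ...) \
	(emitted++, fprintf(stderr, "%s: " fmt, (br), ##__VA_ARGS__))

/* Hypothetical wrapper; note the format is one unbroken string. */
#define br_warn_ratelimited(br, fmt, ...)		\
do {							\
	if (net_ratelimit())				\
		br_warn(br, fmt, ##__VA_ARGS__);	\
} while (0)

/* Try to warn "attempts" times with a budget of "budget" messages. */
static int demo_warnings(int budget, int attempts)
{
	tokens = budget;
	emitted = 0;
	for (int i = 0; i < attempts; i++)
		br_warn_ratelimited("br0",
				    "received packet on %s with own address as source address\n",
				    "eth0");
	return emitted;
}
```

The call site then stays a single statement, and the whole message can be found with one grep.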
^ permalink raw reply
* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
From: Nicolas Dichtel @ 2013-10-15 16:44 UTC (permalink / raw)
To: Eric W. Biederman, Stephen Hemminger; +Cc: David Miller, yamato, netdev
In-Reply-To: <87li22vv1w.fsf@xmission.com>
Le 10/10/2013 02:17, Eric W. Biederman a écrit :
> Stephen Hemminger <stephen@networkplumber.org> writes:
>
>> On Tue, 8 Oct 2013 14:13:37 -0700
>> Stephen Hemminger <stephen@networkplumber.org> wrote:
>>
>>> On Tue, 08 Oct 2013 15:23:49 -0400 (EDT)
>>> David Miller <davem@davemloft.net> wrote:
>>>
>>>> From: Masatake YAMATO <yamato@redhat.com>
>>>> Date: Fri, 4 Oct 2013 11:34:21 +0900
>>>>
>>>>> ip link has the ability to show extra information about a network device if
>>>>> the kernel provides such information. With this patch, the veth driver can
>>>>> provide its peer ifindex information to the ip command via the netlink
>>>>> interface.
>>>>>
>>>>> Signed-off-by: Masatake YAMATO <yamato@redhat.com>
>>>>
>>>> Applied to net-next, thank you.
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>> Please revert this. It is incorrect.
>>> The info returned by any netlink message should be equal to the message
>>> for setting.
>>>
>>> I think the correct patch would be something like this (compile tested only).
>>>
>>> --- a/drivers/net/veth.c 2013-10-06 14:48:23.806461177 -0700
>>> +++ b/drivers/net/veth.c 2013-10-08 14:11:42.434074690 -0700
>>> @@ -434,6 +434,35 @@ static const struct nla_policy veth_poli
>>> [VETH_INFO_PEER] = { .len = sizeof(struct ifinfomsg) },
>>> };
>>>
>>> +static size_t veth_get_size(const struct net_device *dev)
>>> +{
>>> + return nla_total_size(sizeof(struct ifinfomsg)) + /* VETH_INFO_PEER */
>>> + 0;
>>> +}
>>> +
>>> +static int veth_fill_info(struct sk_buff *skb, const struct net_device *dev)
>>> +{
>>> + struct veth_priv *priv = netdev_priv(dev);
>>> + struct net_device *peer = rtnl_dereference(priv->peer);
>>> +
>>> + if (peer) {
>>> + struct ifinfomsg ifi = {
>>> + .ifi_family = AF_UNSPEC,
>>> + .ifi_type = peer->type,
>>> + .ifi_index = peer->ifindex,
>>> + .ifi_flags = dev_get_flags(peer),
>>> + };
>>> +
>>> + if (nla_put(skb, VETH_INFO_PEER, sizeof(ifi), &ifi))
>>> + goto nla_put_failure;
>>> + }
>>> +
>>> + return 0;
>>> +
>>> +nla_put_failure:
>>> + return -EMSGSIZE;
>>> +}
>>> +
>>> static struct rtnl_link_ops veth_link_ops = {
>>> .kind = DRV_NAME,
>>> .priv_size = sizeof(struct veth_priv),
>>> @@ -443,6 +472,8 @@ static struct rtnl_link_ops veth_link_op
>>> .dellink = veth_dellink,
>>> .policy = veth_policy,
>>> .maxtype = VETH_INFO_MAX,
>>> + .get_size = veth_get_size,
>>> + .fill_info = veth_fill_info,
>>> };
>>>
>>> /*
>>>
>>>
>>
>> This patch is ok as RFC starting point but the full implementation needs to
>> add on IFLA_NAME and other attributes such that the full peer can be reconstructed.
>>
>> Ideally, the output of 'ip link' command can be in a format that can be used
>> to recreate the same veth pair.
>>
>> One issue is that veth has the ability to make a peer in a different namespace
>> and the network namespace code does not appear to be invertible.
>> I.e it is not possible to construct IFLA_NET_NS_PID or IFLA_NET_NS_FD attributes
>> from an existing network device namespace.
>
> Right.
>
> IFLA_NET_NS_PID is not invertible as there may be no processes running
> in a pid namespace.
>
> IFLA_NET_NS_FD is in principle invertible. We just need to add a file
> descriptor to the callers fd table. I don't see IFLA_NET_NS_FD being
> invertible for broadcast messages, but for unicast it looks like a bit
> of a pain but there are no fundamental problems.
I'm not sure I understand why it is invertible only for unicast messages.
Or are you saying that it is invertible only for the netns where the caller
stands (and thus not for the veth peer)?
>
> I don't know if we care enough yet to write the code for the
> IFLA_NET_NS_FD attribute but it is doable.
I care ;-)
Has somebody already started to write a patch?
^ permalink raw reply
* Re: [PATCH v3 net-next] openvswitch: fix vport-netdev unregister
From: Alexei Starovoitov @ 2013-10-15 16:53 UTC (permalink / raw)
To: Jesse Gross
Cc: David S. Miller, Pravin B Shelar, Jiri Pirko, Cong Wang,
dev@openvswitch.org, netdev
In-Reply-To: <CAEP_g=-hkjjph8qO8MQL=BBA8MnaUdE2vTy3jN+m7Vh5skfa6g@mail.gmail.com>
On Tue, Oct 15, 2013 at 8:31 AM, Jesse Gross <jesse@nicira.com> wrote:
> On Sun, Oct 13, 2013 at 8:50 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
>> index c323567..ffa429a 100644
>> --- a/net/openvswitch/dp_notify.c
>> +++ b/net/openvswitch/dp_notify.c
>> @@ -59,15 +59,9 @@ void ovs_dp_notify_wq(struct work_struct *work)
>> struct hlist_node *n;
>>
>> hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
>> - struct netdev_vport *netdev_vport;
>> -
>> if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
>> continue;
>> -
>> - netdev_vport = netdev_vport_priv(vport);
>> - if (netdev_vport->dev->reg_state == NETREG_UNREGISTERED ||
>> - netdev_vport->dev->reg_state == NETREG_UNREGISTERING)
>> - dp_detach_port_notify(vport);
>> + dp_detach_port_notify(vport);
>
> Doesn't this free *all* ports of type OVS_VPORT_TYPE_NETDEV when any
> one of them is removed?
sorry. not sure what I was thinking on Sunday evening. will respin
^ permalink raw reply