* ss filter problem
From: Phil Sutter @ 2016-03-29 19:32 UTC (permalink / raw)
To: Vadim Kochan; +Cc: Stephen Hemminger, netdev
Hi,
I am trying to fix a bug in ss filter code, but feel quite lost right
now. The issue is this:
| ss -nl -f inet '( sport = :22 )'
prints not only listening sockets (as requested by -l flag), but
established ones as well (reproduce by opening ssh connection to
127.0.0.1 before calling above).
In contrast, the following both don't show the established sockets:
| ss -nl '( sport = :22 )'
| ss -nl -f inet
My investigation led me to see that current_filter.states is altered
after ssfilter_parse() returns, and using gdb with a watchpoint I was
able to identify parse_hostcond() to be the bad guy: In line 1560, it
calls filter_af_set() after checking for fam != AF_UNSPEC (which is the
case, since fam = preferred_family and the latter is changed to AF_INET
when parsing '-f inet' parameter).
This whole jumping back and forth confuses me quite effectively. Since
you did some fixes in the past already, are you possibly able to point
out where/how this tiny mess has to be fixed?
I guess in an ideal world we would translate '-l' to 'state listen', '-f
inet' to 'src inet:*' and pass everything ANDed together to
ssfilter_parse(). Or maybe that would make things even worse. ;)
Cheers, Phil
^ permalink raw reply
* Re: [PATCH net-next v2 2/2] net: dsa: mv88e6xxx: Clear the PDOWN bit on setup
From: Vivien Didelot @ 2016-03-29 18:49 UTC (permalink / raw)
To: Andrew Lunn; +Cc: Patrick Uiterwijk, linux, davem, netdev, dennis, pbrobinson
In-Reply-To: <20160329162825.GC4690@lunn.ch>
Hi Andrew, Patrick,
Andrew Lunn <andrew@lunn.ch> writes:
> On Tue, Mar 29, 2016 at 12:23:06PM -0400, Vivien Didelot wrote:
>> Hi Patrick,
>>
>> Two comments below.
>>
>> Patrick Uiterwijk <patrick@puiterwijk.org> writes:
>>
>> > +static int mv88e6xxx_power_on_serdes(struct dsa_switch *ds)
>>
>> Since this function assumes the SMI lock is already held, its name
>> should be prefixed with _ by convention (_mv88e6xxx_power_on_serdes).
>
> We decided to drop at, since nearly everything would end up with a _
> prefix. The assert_smi_lock() should find any missing locks, and
> lockdep/deadlocks will make it clear when the lock is taken twice.
OK, I didn't know that. This makes sense. There is no need to respin a
v3 only for my previous &= comment then.
Thanks,
-v
^ permalink raw reply
* RE: [PATCH] qed: initialize return rc to avoid returning garbage
From: Yuval Mintz @ 2016-03-29 18:28 UTC (permalink / raw)
To: Colin King, netdev; +Cc: linux-kernel, Dept-Eng Everest Linux L2, Ariel Elior
In-Reply-To: <1459270850-16177-1-git-send-email-colin.king@canonical.com>
> From: Colin Ian King <colin.king@canonical.com>
>
> in the case where qed_slowpath_irq_req is not called, rc is not assigned and so
> qed_int_igu_enable will return a garbage value.
> Fix this by initializing rc to 0.
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
Thanks Colin.
Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
^ permalink raw reply
* [RFC] Add netdev all_adj_list refcnt propagation to fix panic
From: Andrew Collins @ 2016-03-29 17:25 UTC (permalink / raw)
To: netdev; +Cc: acollins, davem, mschiffer
This is an RFC patch to fix a relatively easily reproducible kernel
panic related to the all_adj_list handling for netdevs in recent kernels.
This is more to generate discussion than anything else. I don't
particularly like this approach, I'm hoping someone has a better idea.
The following sequence of commands will reproduce the issue:
ip link add link eth0 name eth0.100 type vlan id 100
ip link add link eth0 name eth0.200 type vlan id 200
ip link add name testbr type bridge
ip link set eth0.100 master testbr
ip link set eth0.200 master testbr
ip link add link testbr mac0 type macvlan
ip link delete dev testbr
This creates an upper/lower tree of (excuse the poor ASCII art):
/---eth0.100-eth0
mac0-testbr-
\---eth0.200-eth0
When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from
the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
reference to eth0 is added, so this results in a panic.
This change adds reference count propagation so things are handled properly.
Matthias Schiffer reported a similar crash in batman-adv:
https://github.com/freifunk-gluon/gluon/issues/680
https://www.open-mesh.org/issues/247
which this patch also seems to resolve.
---
net/core/dev.c | 68 ++++++++++++++++++++++++++++++++--------------------------
1 file changed, 37 insertions(+), 31 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index b9bcbe7..4b4ef6b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5471,6 +5471,7 @@ static inline bool netdev_adjacent_is_neigh_list(struct net_device *dev,
static int __netdev_adjacent_dev_insert(struct net_device *dev,
struct net_device *adj_dev,
+ u16 ref_nr,
struct list_head *dev_list,
void *private, bool master)
{
@@ -5480,7 +5481,7 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
adj = __netdev_find_adj(adj_dev, dev_list);
if (adj) {
- adj->ref_nr++;
+ adj->ref_nr += ref_nr;
return 0;
}
@@ -5490,7 +5491,7 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
adj->dev = adj_dev;
adj->master = master;
- adj->ref_nr = 1;
+ adj->ref_nr = ref_nr;
adj->private = private;
dev_hold(adj_dev);
@@ -5529,6 +5530,7 @@ free_adj:
static void __netdev_adjacent_dev_remove(struct net_device *dev,
struct net_device *adj_dev,
+ u16 ref_nr,
struct list_head *dev_list)
{
struct netdev_adjacent *adj;
@@ -5541,10 +5543,10 @@ static void __netdev_adjacent_dev_remove(struct net_device *dev,
BUG();
}
- if (adj->ref_nr > 1) {
- pr_debug("%s to %s ref_nr-- = %d\n", dev->name, adj_dev->name,
- adj->ref_nr-1);
- adj->ref_nr--;
+ if (adj->ref_nr > ref_nr) {
+ pr_debug("%s to %s ref_nr-%d = %d\n", dev->name, adj_dev->name,
+ ref_nr, adj->ref_nr-ref_nr);
+ adj->ref_nr -= ref_nr;
return;
}
@@ -5563,21 +5565,22 @@ static void __netdev_adjacent_dev_remove(struct net_device *dev,
static int __netdev_adjacent_dev_link_lists(struct net_device *dev,
struct net_device *upper_dev,
+ u16 ref_nr,
struct list_head *up_list,
struct list_head *down_list,
void *private, bool master)
{
int ret;
- ret = __netdev_adjacent_dev_insert(dev, upper_dev, up_list, private,
- master);
+ ret = __netdev_adjacent_dev_insert(dev, upper_dev, ref_nr, up_list,
+ private, master);
if (ret)
return ret;
- ret = __netdev_adjacent_dev_insert(upper_dev, dev, down_list, private,
- false);
+ ret = __netdev_adjacent_dev_insert(upper_dev, dev, ref_nr, down_list,
+ private, false);
if (ret) {
- __netdev_adjacent_dev_remove(dev, upper_dev, up_list);
+ __netdev_adjacent_dev_remove(dev, upper_dev, ref_nr, up_list);
return ret;
}
@@ -5585,9 +5588,10 @@ static int __netdev_adjacent_dev_link_lists(struct net_device *dev,
}
static int __netdev_adjacent_dev_link(struct net_device *dev,
- struct net_device *upper_dev)
+ struct net_device *upper_dev,
+ u16 ref_nr)
{
- return __netdev_adjacent_dev_link_lists(dev, upper_dev,
+ return __netdev_adjacent_dev_link_lists(dev, upper_dev, ref_nr,
&dev->all_adj_list.upper,
&upper_dev->all_adj_list.lower,
NULL, false);
@@ -5595,17 +5599,19 @@ static int __netdev_adjacent_dev_link(struct net_device *dev,
static void __netdev_adjacent_dev_unlink_lists(struct net_device *dev,
struct net_device *upper_dev,
+ u16 ref_nr,
struct list_head *up_list,
struct list_head *down_list)
{
- __netdev_adjacent_dev_remove(dev, upper_dev, up_list);
- __netdev_adjacent_dev_remove(upper_dev, dev, down_list);
+ __netdev_adjacent_dev_remove(dev, upper_dev, ref_nr, up_list);
+ __netdev_adjacent_dev_remove(upper_dev, dev, ref_nr, down_list);
}
static void __netdev_adjacent_dev_unlink(struct net_device *dev,
- struct net_device *upper_dev)
+ struct net_device *upper_dev,
+ u16 ref_nr)
{
- __netdev_adjacent_dev_unlink_lists(dev, upper_dev,
+ __netdev_adjacent_dev_unlink_lists(dev, upper_dev, ref_nr,
&dev->all_adj_list.upper,
&upper_dev->all_adj_list.lower);
}
@@ -5614,17 +5620,17 @@ static int __netdev_adjacent_dev_link_neighbour(struct net_device *dev,
struct net_device *upper_dev,
void *private, bool master)
{
- int ret = __netdev_adjacent_dev_link(dev, upper_dev);
+ int ret = __netdev_adjacent_dev_link(dev, upper_dev, 1);
if (ret)
return ret;
- ret = __netdev_adjacent_dev_link_lists(dev, upper_dev,
+ ret = __netdev_adjacent_dev_link_lists(dev, upper_dev, 1,
&dev->adj_list.upper,
&upper_dev->adj_list.lower,
private, master);
if (ret) {
- __netdev_adjacent_dev_unlink(dev, upper_dev);
+ __netdev_adjacent_dev_unlink(dev, upper_dev, 1);
return ret;
}
@@ -5634,8 +5640,8 @@ static int __netdev_adjacent_dev_link_neighbour(struct net_device *dev,
static void __netdev_adjacent_dev_unlink_neighbour(struct net_device *dev,
struct net_device *upper_dev)
{
- __netdev_adjacent_dev_unlink(dev, upper_dev);
- __netdev_adjacent_dev_unlink_lists(dev, upper_dev,
+ __netdev_adjacent_dev_unlink(dev, upper_dev, 1);
+ __netdev_adjacent_dev_unlink_lists(dev, upper_dev, 1,
&dev->adj_list.upper,
&upper_dev->adj_list.lower);
}
@@ -5688,7 +5694,7 @@ static int __netdev_upper_dev_link(struct net_device *dev,
list_for_each_entry(j, &upper_dev->all_adj_list.upper, list) {
pr_debug("Interlinking %s with %s, non-neighbour\n",
i->dev->name, j->dev->name);
- ret = __netdev_adjacent_dev_link(i->dev, j->dev);
+ ret = __netdev_adjacent_dev_link(i->dev, j->dev, i->ref_nr);
if (ret)
goto rollback_mesh;
}
@@ -5698,7 +5704,7 @@ static int __netdev_upper_dev_link(struct net_device *dev,
list_for_each_entry(i, &upper_dev->all_adj_list.upper, list) {
pr_debug("linking %s's upper device %s with %s\n",
upper_dev->name, i->dev->name, dev->name);
- ret = __netdev_adjacent_dev_link(dev, i->dev);
+ ret = __netdev_adjacent_dev_link(dev, i->dev, i->ref_nr);
if (ret)
goto rollback_upper_mesh;
}
@@ -5707,7 +5713,7 @@ static int __netdev_upper_dev_link(struct net_device *dev,
list_for_each_entry(i, &dev->all_adj_list.lower, list) {
pr_debug("linking %s's lower device %s with %s\n", dev->name,
i->dev->name, upper_dev->name);
- ret = __netdev_adjacent_dev_link(i->dev, upper_dev);
+ ret = __netdev_adjacent_dev_link(i->dev, upper_dev, i->ref_nr);
if (ret)
goto rollback_lower_mesh;
}
@@ -5725,7 +5731,7 @@ rollback_lower_mesh:
list_for_each_entry(i, &dev->all_adj_list.lower, list) {
if (i == to_i)
break;
- __netdev_adjacent_dev_unlink(i->dev, upper_dev);
+ __netdev_adjacent_dev_unlink(i->dev, upper_dev, i->ref_nr);
}
i = NULL;
@@ -5735,7 +5741,7 @@ rollback_upper_mesh:
list_for_each_entry(i, &upper_dev->all_adj_list.upper, list) {
if (i == to_i)
break;
- __netdev_adjacent_dev_unlink(dev, i->dev);
+ __netdev_adjacent_dev_unlink(dev, i->dev, i->ref_nr);
}
i = j = NULL;
@@ -5747,7 +5753,7 @@ rollback_mesh:
list_for_each_entry(j, &upper_dev->all_adj_list.upper, list) {
if (i == to_i && j == to_j)
break;
- __netdev_adjacent_dev_unlink(i->dev, j->dev);
+ __netdev_adjacent_dev_unlink(i->dev, j->dev, i->ref_nr);
}
if (i == to_i)
break;
@@ -5827,16 +5833,16 @@ void netdev_upper_dev_unlink(struct net_device *dev,
*/
list_for_each_entry(i, &dev->all_adj_list.lower, list)
list_for_each_entry(j, &upper_dev->all_adj_list.upper, list)
- __netdev_adjacent_dev_unlink(i->dev, j->dev);
+ __netdev_adjacent_dev_unlink(i->dev, j->dev, i->ref_nr);
/* remove also the devices itself from lower/upper device
* list
*/
list_for_each_entry(i, &dev->all_adj_list.lower, list)
- __netdev_adjacent_dev_unlink(i->dev, upper_dev);
+ __netdev_adjacent_dev_unlink(i->dev, upper_dev, i->ref_nr);
list_for_each_entry(i, &upper_dev->all_adj_list.upper, list)
- __netdev_adjacent_dev_unlink(dev, i->dev);
+ __netdev_adjacent_dev_unlink(dev, i->dev, i->ref_nr);
call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, dev,
&changeupper_info.info);
--
2.5.5
^ permalink raw reply related
* Re: [PATCH 06/16] wcn36xx: Fetch private sta data from sta entry instead of from vif
From: kbuild test robot @ 2016-03-29 17:01 UTC (permalink / raw)
To: Bjorn Andersson
Cc: kbuild-all, Eugene Krasnikov, Kalle Valo, wcn36xx, linux-wireless,
netdev, linux-kernel, Pontus Fuchs
In-Reply-To: <1459231593-360-7-git-send-email-bjorn.andersson@linaro.org>
[-- Attachment #1: Type: text/plain, Size: 2097 bytes --]
Hi Pontus,
[auto build test ERROR on wireless-drivers/master]
[also build test ERROR on v4.6-rc1 next-20160329]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
url: https://github.com/0day-ci/linux/commits/Bjorn-Andersson/Misc-wcn36xx-fixes/20160329-141847
base: https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git master
config: sparc64-allyesconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64
Note: the linux-review/Bjorn-Andersson/Misc-wcn36xx-fixes/20160329-141847 HEAD 8303daac889854237207e7caefaea94fee0b87f2 builds fine.
It only hurts bisectibility.
All error/warnings (new ones prefixed by >>):
drivers/net/wireless/ath/wcn36xx/main.c: In function 'wcn36xx_set_key':
>> drivers/net/wireless/ath/wcn36xx/main.c:389:9: error: implicit declaration of function 'wcn36xx_sta_to_priv' [-Werror=implicit-function-declaration]
struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
^
>> drivers/net/wireless/ath/wcn36xx/main.c:389:33: warning: initialization makes pointer from integer without a cast
struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
^
cc1: some warnings being treated as errors
vim +/wcn36xx_sta_to_priv +389 drivers/net/wireless/ath/wcn36xx/main.c
383 struct ieee80211_vif *vif,
384 struct ieee80211_sta *sta,
385 struct ieee80211_key_conf *key_conf)
386 {
387 struct wcn36xx *wcn = hw->priv;
388 struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
> 389 struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
390 int ret = 0;
391 u8 key[WLAN_MAX_KEY_LEN];
392
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 45816 bytes --]
^ permalink raw reply
* [PATCH] qed: initialize return rc to avoid returning garbage
From: Colin King @ 2016-03-29 17:00 UTC (permalink / raw)
To: Yuval Mintz, Ariel Elior, everest-linux-l2, netdev; +Cc: linux-kernel
From: Colin Ian King <colin.king@canonical.com>
in the case where qed_slowpath_irq_req is not called, rc is not
assigned and so qed_int_igu_enable will return a garbage value.
Fix this by initializing rc to 0.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/net/ethernet/qlogic/qed/qed_int.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c b/drivers/net/ethernet/qlogic/qed/qed_int.c
index ffd0acc..2017b01 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_int.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_int.c
@@ -2750,7 +2750,7 @@ void qed_int_igu_enable_int(struct qed_hwfn *p_hwfn,
int qed_int_igu_enable(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt,
enum qed_int_mode int_mode)
{
- int rc;
+ int rc = 0;
/* Configure AEU signal change to produce attentions */
qed_wr(p_hwfn, p_ptt, IGU_REG_ATTENTION_ENABLE, 0);
--
2.7.4
^ permalink raw reply related
* Re: [PATCH] rtl8xxxu: fix uninitialized return value in ret
From: Jes Sorensen @ 2016-03-29 16:47 UTC (permalink / raw)
To: Colin King; +Cc: Kalle Valo, linux-wireless, netdev, linux-kernel
In-Reply-To: <1459269876-15559-1-git-send-email-colin.king@canonical.com>
Colin King <colin.king@canonical.com> writes:
> From: Colin Ian King <colin.king@canonical.com>
>
> several functions are not initializing a return status in ret
> resulting in garbage to be returned instead of 0 for success.
> Currently, the calls to these functions are not checking the
> return, however, it seems prudent to return the correct status
> in case they are to be checked at a later date.
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
> drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
Thanks for the patch! I'm surprised the compiler didn't warn about this.
I'll add it to my queue for rtl8xxxu.
Cheers,
Jes
>
> diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
> index abdff45..9262aad 100644
> --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
> +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
> @@ -5231,7 +5231,7 @@ static void rtl8xxxu_set_ampdu_min_space(struct rtl8xxxu_priv *priv, u8 density)
> static int rtl8xxxu_active_to_emu(struct rtl8xxxu_priv *priv)
> {
> u8 val8;
> - int count, ret;
> + int count, ret = 0;
>
> /* Start of rtl8723AU_card_enable_flow */
> /* Act to Cardemu sequence*/
> @@ -5281,7 +5281,7 @@ static int rtl8723bu_active_to_emu(struct rtl8xxxu_priv *priv)
> u8 val8;
> u16 val16;
> u32 val32;
> - int count, ret;
> + int count, ret = 0;
>
> /* Turn off RF */
> rtl8xxxu_write8(priv, REG_RF_CTRL, 0);
> @@ -5338,7 +5338,7 @@ static int rtl8xxxu_active_to_lps(struct rtl8xxxu_priv *priv)
> {
> u8 val8;
> u8 val32;
> - int count, ret;
> + int count, ret = 0;
>
> rtl8xxxu_write8(priv, REG_TXPAUSE, 0xff);
^ permalink raw reply
* [PATCH] rtl8xxxu: fix uninitialized return value in ret
From: Colin King @ 2016-03-29 16:44 UTC (permalink / raw)
To: Jes Sorensen, Kalle Valo, linux-wireless, netdev; +Cc: linux-kernel
From: Colin Ian King <colin.king@canonical.com>
several functions are not initializing a return status in ret
resulting in garbage to be returned instead of 0 for success.
Currently, the calls to these functions are not checking the
return, however, it seems prudent to return the correct status
in case they are to be checked at a later date.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
index abdff45..9262aad 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
@@ -5231,7 +5231,7 @@ static void rtl8xxxu_set_ampdu_min_space(struct rtl8xxxu_priv *priv, u8 density)
static int rtl8xxxu_active_to_emu(struct rtl8xxxu_priv *priv)
{
u8 val8;
- int count, ret;
+ int count, ret = 0;
/* Start of rtl8723AU_card_enable_flow */
/* Act to Cardemu sequence*/
@@ -5281,7 +5281,7 @@ static int rtl8723bu_active_to_emu(struct rtl8xxxu_priv *priv)
u8 val8;
u16 val16;
u32 val32;
- int count, ret;
+ int count, ret = 0;
/* Turn off RF */
rtl8xxxu_write8(priv, REG_RF_CTRL, 0);
@@ -5338,7 +5338,7 @@ static int rtl8xxxu_active_to_lps(struct rtl8xxxu_priv *priv)
{
u8 val8;
u8 val32;
- int count, ret;
+ int count, ret = 0;
rtl8xxxu_write8(priv, REG_TXPAUSE, 0xff);
--
2.7.4
^ permalink raw reply related
* Re: [PATCH net-next v2 2/2] net: dsa: mv88e6xxx: Clear the PDOWN bit on setup
From: Andrew Lunn @ 2016-03-29 16:28 UTC (permalink / raw)
To: Vivien Didelot
Cc: Patrick Uiterwijk, linux, davem, netdev, dennis, pbrobinson
In-Reply-To: <87fuv9fe39.fsf@ketchup.mtl.sfl>
On Tue, Mar 29, 2016 at 12:23:06PM -0400, Vivien Didelot wrote:
> Hi Patrick,
>
> Two comments below.
>
> Patrick Uiterwijk <patrick@puiterwijk.org> writes:
>
> > +static int mv88e6xxx_power_on_serdes(struct dsa_switch *ds)
>
> Since this function assumes the SMI lock is already held, its name
> should be prefixed with _ by convention (_mv88e6xxx_power_on_serdes).
We decided to drop at, since nearly everything would end up with a _
prefix. The assert_smi_lock() should find any missing locks, and
lockdep/deadlocks will make it clear when the lock is taken twice.
Andrew
^ permalink raw reply
* Re: [PATCH net-next v2 1/2] net: dsa: mv88e6xxx: Introduce _mv88e6xxx_phy_page_{read,write}
From: Vivien Didelot @ 2016-03-29 16:26 UTC (permalink / raw)
To: Patrick Uiterwijk, linux, davem, andrew
Cc: netdev, dennis, pbrobinson, Patrick Uiterwijk
In-Reply-To: <1459249908-4556-1-git-send-email-patrick@puiterwijk.org>
Patrick Uiterwijk <patrick@puiterwijk.org> writes:
> Add versions of the phy_page_read and _write functions to
> be used in a context where the SMI mutex is held.
>
> Signed-off-by: Patrick Uiterwijk <patrick@puiterwijk.org>
Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Thanks,
Vivien
^ permalink raw reply
* Re: [PATCH net-next v2 2/2] net: dsa: mv88e6xxx: Clear the PDOWN bit on setup
From: Vivien Didelot @ 2016-03-29 16:23 UTC (permalink / raw)
To: Patrick Uiterwijk, linux, davem, andrew
Cc: netdev, dennis, pbrobinson, Patrick Uiterwijk
In-Reply-To: <1459249908-4556-2-git-send-email-patrick@puiterwijk.org>
Hi Patrick,
Two comments below.
Patrick Uiterwijk <patrick@puiterwijk.org> writes:
> +static int mv88e6xxx_power_on_serdes(struct dsa_switch *ds)
Since this function assumes the SMI lock is already held, its name
should be prefixed with _ by convention (_mv88e6xxx_power_on_serdes).
> +{
> + int ret;
> +
> + ret = _mv88e6xxx_phy_page_read(ds, REG_FIBER_SERDES, PAGE_FIBER_SERDES,
> + MII_BMCR);
> + if (ret < 0)
> + return ret;
> +
> + if (ret & BMCR_PDOWN) {
> + ret = ret & ~BMCR_PDOWN;
ret &= ~BMCR_PDOWN;
> + ret = _mv88e6xxx_phy_page_write(ds, REG_FIBER_SERDES,
> + PAGE_FIBER_SERDES, MII_BMCR,
> + ret);
> + }
> +
> + return ret;
> +}
Thanks,
Vivien
^ permalink raw reply
* Re: [PATCH] bus: mvebu-mbus: use %pad to print phys_addr_t
From: Gregory CLEMENT @ 2016-03-29 16:04 UTC (permalink / raw)
To: Arnd Bergmann
Cc: David S . Miller, Marcin Wojtas, Evan Wang, netdev,
Thomas Petazzoni, Nicolas Schichan, linux-kernel
In-Reply-To: <1458036233-2770406-1-git-send-email-arnd@arndb.de>
Hi Arnd,
On mar., mars 15 2016, Arnd Bergmann <arnd@arndb.de> wrote:
> A recent change to the mbus driver added a warning printk that
> prints a phys_addr_t using the %x format string, which fails in
> case we build with 64-bit phys_addr_t:
>
> drivers/bus/mvebu-mbus.c: In function 'mvebu_mbus_get_dram_win_info':
> drivers/bus/mvebu-mbus.c:975:9: error: format '%x' expects argument of type 'unsigned int', but argument 2 has type 'phys_addr_t {aka long long unsigned int}' [-Werror=format=]
>
> This uses the special %pa format string instead, so we always
> print the correct type.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Fixes: f2900acea801 ("bus: mvebu-mbus: provide api for obtaining IO
> and DRAM window information")
What is the status of this patch?
Do you plan to send a second version with the title fixed as suggested
by Joe Perches?
Also do you expect that I collect this patch in the mvebu subsystem?
Thanks,
Gregory
> ---
> The warning was introduced in today's linux-next through the net-next
> tree, please apply this fixup on top.
>
> drivers/bus/mvebu-mbus.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
> index c2e52864bb03..ce54a0160faa 100644
> --- a/drivers/bus/mvebu-mbus.c
> +++ b/drivers/bus/mvebu-mbus.c
> @@ -972,7 +972,7 @@ int mvebu_mbus_get_dram_win_info(phys_addr_t phyaddr, u8 *target, u8 *attr)
> }
> }
>
> - pr_err("invalid dram address 0x%x\n", phyaddr);
> + pr_err("invalid dram address %pa\n", &phyaddr);
> return -EINVAL;
> }
> EXPORT_SYMBOL_GPL(mvebu_mbus_get_dram_win_info);
> --
> 2.7.0
>
--
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
^ permalink raw reply
* [PATCH net] ipv6: udp: fix UDP_MIB_IGNOREDMULTI updates
From: Eric Dumazet @ 2016-03-29 15:43 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Rick Jones, Willem de Bruijn
From: Eric Dumazet <edumazet@google.com>
IPv6 counters updates use a different macro than IPv4.
Fixes: 36cbb2452cbaf ("udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: Willem de Bruijn <willemb@google.com>
---
net/ipv6/udp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index fd25e447a5fa..8125931106be 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -843,8 +843,8 @@ start_lookup:
flush_stack(stack, count, skb, count - 1);
} else {
if (!inner_flushed)
- UDP_INC_STATS_BH(net, UDP_MIB_IGNOREDMULTI,
- proto == IPPROTO_UDPLITE);
+ UDP6_INC_STATS_BH(net, UDP_MIB_IGNOREDMULTI,
+ proto == IPPROTO_UDPLITE);
consume_skb(skb);
}
return 0;
^ permalink raw reply related
* RE: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
From: Rosen, Rami @ 2016-03-29 15:06 UTC (permalink / raw)
To: Netanel Belgazal, zorik@annapurnalabs.com,
saeed@annapurnalabs.com, alex@annapurnalabs.com, msw@amazon.com,
aliguori@amazon.com, davem@davemloft.net
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
antoine.tenart@free-electrons.com, Rosen, Rami
In-Reply-To: <1458039006-19956-1-git-send-email-netanel@annapurnalabs.com>
Hi, Netanel,
+into 5 levels and assignes interrupt delay value to each level.
Should be: assigns
+The ENA device AQ and AENQ are allocated on probe and freed ontermination.
Should be: on termination.
+ /* commit previously loaded firmare */
Should be: firmware
+static int ena_com_hash_key_destroy(struct ena_com_dev *ena_dev)
+{
+ struct ena_rss *rss = &ena_dev->rss;
+
+ if (rss->hash_key)
+ dma_free_coherent(ena_dev->dmadev,
+ sizeof(*rss->hash_key),
+ rss->hash_key,
+ rss->hash_key_dma_addr);
+ rss->hash_key = NULL;
+ return 0;
+}
This method always returns 0.
+static int ena_com_hash_ctrl_init(struct ena_com_dev *ena_dev)
+{
+ struct ena_rss *rss = &ena_dev->rss;
+
+ rss->hash_ctrl = dma_alloc_coherent(ena_dev->dmadev,
+ sizeof(*rss->hash_ctrl),
+ &rss->hash_ctrl_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+
+ return 0;
+}
+
This method also always returns 0.
+static int ena_com_hash_ctrl_destroy(struct ena_com_dev *ena_dev)
+{
+ struct ena_rss *rss = &ena_dev->rss;
+
+ if (rss->hash_ctrl)
+ dma_free_coherent(ena_dev->dmadev,
+ sizeof(*rss->hash_ctrl),
+ rss->hash_ctrl,
+ rss->hash_ctrl_dma_addr);
+ rss->hash_ctrl = NULL;
+
+ return 0;
+}
+
This method also always returns 0.
+static int ena_com_indirect_table_destroy(struct ena_com_dev *ena_dev)
+{
+ struct ena_rss *rss = &ena_dev->rss;
+ size_t tbl_size = (1 << rss->tbl_log_size) *
+ sizeof(struct ena_admin_rss_ind_table_entry);
+
+ if (rss->rss_ind_tbl)
+ dma_free_coherent(ena_dev->dmadev,
+ tbl_size,
+ rss->rss_ind_tbl,
+ rss->rss_ind_tbl_dma_addr);
+ rss->rss_ind_tbl = NULL;
+
+ if (rss->host_rss_ind_tbl)
+ devm_kfree(ena_dev->dmadev, rss->host_rss_ind_tbl);
+ rss->host_rss_ind_tbl = NULL;
+
+ return 0;
+}
This method also always returns 0.
+int ena_com_rss_destroy(struct ena_com_dev *ena_dev)
+{
+ ena_com_indirect_table_destroy(ena_dev);
+ ena_com_hash_key_destroy(ena_dev);
+ ena_com_hash_ctrl_destroy(ena_dev);
+
+ memset(&ena_dev->rss, 0x0, sizeof(ena_dev->rss));
+
+ return 0;
+}
This method also always returns 0.
Regards,
Rami Rosen
Intel Corporation
^ permalink raw reply
* [net-next] bond: set mac address only if necessary
From: Zhang Shengju @ 2016-03-29 15:14 UTC (permalink / raw)
To: j.vosburgh, vfalico, gospo; +Cc: netdev
Bond device gets it's mac address from the first slave device, it's not
necessary to set slave device's mac address to bond if equal.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
---
drivers/net/bonding/bond_main.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 97fad05..6569219 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1473,8 +1473,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
*/
ether_addr_copy(new_slave->perm_hwaddr, slave_dev->dev_addr);
- if (!bond->params.fail_over_mac ||
- BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
+ if ((!bond->params.fail_over_mac ||
+ BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) &&
+ !ether_addr_equal_64bits(bond_dev->dev_addr, slave_dev->dev_addr)) {
/* Set slave to master's mac address. The application already
* set the master's mac address to that of the first slave
*/
--
1.8.3.1
^ permalink raw reply related
* Re: bpf: net/core/filter.c:2115 suspicious rcu_dereference_protected() usage!
From: Daniel Borkmann @ 2016-03-29 14:39 UTC (permalink / raw)
To: Michal Kubecek, Sasha Levin
Cc: Jiri Slaby, David S. Miller, ast, netdev@vger.kernel.org, LKML
In-Reply-To: <56FA8935.8030109@iogearbox.net>
On 03/29/2016 03:55 PM, Daniel Borkmann wrote:
> [ dropping my old email address ]
>
> On 03/29/2016 02:58 PM, Michal Kubecek wrote:
>> On Mon, Feb 22, 2016 at 10:31:33AM -0500, Sasha Levin wrote:
>>>
>>> I've hit the following warning while fuzzing with trinity inside a kvmtool guest
>>> running the latest -next kernel:
>>>
>>> [ 1343.104588] ===============================
>>> [ 1343.104591] [ INFO: suspicious RCU usage. ]
>>> [ 1343.104619] 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978 Not tainted
>>> [ 1343.104624] -------------------------------
>>> [ 1343.104635] net/core/filter.c:2115 suspicious rcu_dereference_protected() usage!
>>> [ 1343.104641]
>>> [ 1343.104641] other info that might help us debug this:
>>> [ 1343.104641]
>>> [ 1343.104650]
>>> [ 1343.104650] rcu_scheduler_active = 1, debug_locks = 0
>>> [ 1343.104660] 1 lock held by syz-executor/17916:
>>> [ 1343.104784] #0: (rtnl_mutex){+.+.+.}, at: rtnl_lock (net/core/rtnetlink.c:71)
>>> [ 1343.104789]
>>> [ 1343.104789] stack backtrace:
>>> [ 1343.104820] CPU: 1 PID: 17916 Comm: trinity-c8 Not tainted 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978
>>> [ 1343.104868] 1ffff10036968f44 ffff8801b4b47aa8 ffffffffa23d9a9d ffffffff00000001
>>> [ 1343.104891] fffffbfff5c2a630 0000000041b58ab3 ffffffffadb3a8f2 ffffffffa23d9905
>>> [ 1343.104914] 0000000000000000 ffff8801b5419b40 fffffbfff7596522 0000000000000001
>>> [ 1343.104919] Call Trace:
>>> [ 1343.104985] dump_stack (lib/dump_stack.c:53)
>>> [ 1343.105060] lockdep_rcu_suspicious (kernel/locking/lockdep.c:4282)
>>> [ 1343.105093] sk_detach_filter (net/core/filter.c:2114 (discriminator 5))
>>> [ 1343.105193] tun_detach_filter (drivers/net/tun.c:1808 (discriminator 7))
>>> [ 1343.105238] __tun_chr_ioctl (drivers/net/tun.c:2133)
>>> [ 1343.105370] tun_chr_ioctl (drivers/net/tun.c:2161)
>>> [ 1343.105407] do_vfs_ioctl (fs/ioctl.c:44 fs/ioctl.c:674)
>>> [ 1343.105506] SyS_ioctl (fs/ioctl.c:689 fs/ioctl.c:680)
>>> [ 1343.105542] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:200)
>>
>> Looks like sk_detach_filter() wants the socket to be owned which neither
>> tun_detach_filter() does not do, unlike sock_setsockopt(). Could you
>> check if the patch below helps?
>>
>> I'm also not really sure if it is safe to ignore return value of
>> sk_detach_filter() and just sets tun->filter_attached to false - but
>> it's not really clear what should be done if one of the calls fails
>> after some succeeded.
>
> Wrt return value, afaik SOCK_FILTER_LOCKED cannot be set for tun devs, so we
> should be okay.
>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index afdf950617c3..7417d7c20bab 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1818,11 +1818,13 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
>> static void tun_detach_filter(struct tun_struct *tun, int n)
>> {
>> int i;
>> - struct tun_file *tfile;
>>
>> for (i = 0; i < n; i++) {
>> - tfile = rtnl_dereference(tun->tfiles[i]);
>> - sk_detach_filter(tfile->socket.sk);
>> + struct sock *sk = rtnl_dereference(tun->tfiles[i])->socket.sk;
>> +
>> + lock_sock(sk);
>> + sk_detach_filter(sk);
>> + release_sock(sk);
>> }
>>
>> tun->filter_attached = false;
>>
>
> In tun case, the control path for tun_attach_filter() and tun_detach_filter()
> is under RTNL lock (held in __tun_chr_ioctl()).
>
> So in the BPF core the rcu_dereference_protected(<sk_filter>, sock_owned_by_user(sk))
> looks like a false positive in this specific use case to me, that we should probably
> just silence.
>
> Running the filter via sk_filter() in tun device happens under rcu_read_lock(),
> so the dereference and assignment pair seems okay to me.
>
> Was wondering whether we should convert this to unattached BPF filter, but this
> would break with existing expectations from sk_filter() (e.g. security modules).
If we want to silence it, could be something like the below (only compile-tested):
drivers/net/tun.c | 8 +++++---
include/linux/filter.h | 4 ++++
net/core/filter.c | 33 +++++++++++++++++++++------------
3 files changed, 30 insertions(+), 15 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afdf950..510e90a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -622,7 +622,8 @@ static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filte
/* Re-attach the filter to persist device */
if (!skip_filter && (tun->filter_attached == true)) {
- err = sk_attach_filter(&tun->fprog, tfile->socket.sk);
+ err = __sk_attach_filter(&tun->fprog, tfile->socket.sk,
+ lockdep_rtnl_is_held());
if (!err)
goto out;
}
@@ -1822,7 +1823,7 @@ static void tun_detach_filter(struct tun_struct *tun, int n)
for (i = 0; i < n; i++) {
tfile = rtnl_dereference(tun->tfiles[i]);
- sk_detach_filter(tfile->socket.sk);
+ __sk_detach_filter(tfile->socket.sk, lockdep_rtnl_is_held());
}
tun->filter_attached = false;
@@ -1835,7 +1836,8 @@ static int tun_attach_filter(struct tun_struct *tun)
for (i = 0; i < tun->numqueues; i++) {
tfile = rtnl_dereference(tun->tfiles[i]);
- ret = sk_attach_filter(&tun->fprog, tfile->socket.sk);
+ ret = __sk_attach_filter(&tun->fprog, tfile->socket.sk,
+ lockdep_rtnl_is_held());
if (ret) {
tun_detach_filter(tun, i);
return ret;
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 43aa1f8..a51a536 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -465,10 +465,14 @@ int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog,
void bpf_prog_destroy(struct bpf_prog *fp);
int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
+int __sk_attach_filter(struct sock_fprog *fprog, struct sock *sk,
+ bool locked);
int sk_attach_bpf(u32 ufd, struct sock *sk);
int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk);
int sk_reuseport_attach_bpf(u32 ufd, struct sock *sk);
int sk_detach_filter(struct sock *sk);
+int __sk_detach_filter(struct sock *sk, bool locked);
+
int sk_get_filter(struct sock *sk, struct sock_filter __user *filter,
unsigned int len);
diff --git a/net/core/filter.c b/net/core/filter.c
index 2429918..02f2f6c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1149,7 +1149,8 @@ void bpf_prog_destroy(struct bpf_prog *fp)
}
EXPORT_SYMBOL_GPL(bpf_prog_destroy);
-static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
+static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk,
+ bool locked)
{
struct sk_filter *fp, *old_fp;
@@ -1165,10 +1166,8 @@ static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
return -ENOMEM;
}
- old_fp = rcu_dereference_protected(sk->sk_filter,
- sock_owned_by_user(sk));
+ old_fp = rcu_dereference_protected(sk->sk_filter, locked);
rcu_assign_pointer(sk->sk_filter, fp);
-
if (old_fp)
sk_filter_uncharge(sk, old_fp);
@@ -1247,7 +1246,8 @@ struct bpf_prog *__get_filter(struct sock_fprog *fprog, struct sock *sk)
* occurs or there is insufficient memory for the filter a negative
* errno code is returned. On success the return is zero.
*/
-int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
+int __sk_attach_filter(struct sock_fprog *fprog, struct sock *sk,
+ bool locked)
{
struct bpf_prog *prog = __get_filter(fprog, sk);
int err;
@@ -1255,7 +1255,7 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
if (IS_ERR(prog))
return PTR_ERR(prog);
- err = __sk_attach_prog(prog, sk);
+ err = __sk_attach_prog(prog, sk, locked);
if (err < 0) {
__bpf_prog_release(prog);
return err;
@@ -1263,7 +1263,12 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
return 0;
}
-EXPORT_SYMBOL_GPL(sk_attach_filter);
+EXPORT_SYMBOL_GPL(__sk_attach_filter);
+
+int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
+{
+ return __sk_attach_filter(fprog, sk, sock_owned_by_user(sk));
+}
int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk)
{
@@ -1309,7 +1314,7 @@ int sk_attach_bpf(u32 ufd, struct sock *sk)
if (IS_ERR(prog))
return PTR_ERR(prog);
- err = __sk_attach_prog(prog, sk);
+ err = __sk_attach_prog(prog, sk, sock_owned_by_user(sk));
if (err < 0) {
bpf_prog_put(prog);
return err;
@@ -2445,7 +2450,7 @@ static int __init register_sk_filter_ops(void)
}
late_initcall(register_sk_filter_ops);
-int sk_detach_filter(struct sock *sk)
+int __sk_detach_filter(struct sock *sk, bool locked)
{
int ret = -ENOENT;
struct sk_filter *filter;
@@ -2453,8 +2458,7 @@ int sk_detach_filter(struct sock *sk)
if (sock_flag(sk, SOCK_FILTER_LOCKED))
return -EPERM;
- filter = rcu_dereference_protected(sk->sk_filter,
- sock_owned_by_user(sk));
+ filter = rcu_dereference_protected(sk->sk_filter, locked);
if (filter) {
RCU_INIT_POINTER(sk->sk_filter, NULL);
sk_filter_uncharge(sk, filter);
@@ -2463,7 +2467,12 @@ int sk_detach_filter(struct sock *sk)
return ret;
}
-EXPORT_SYMBOL_GPL(sk_detach_filter);
+EXPORT_SYMBOL_GPL(__sk_detach_filter);
+
+int sk_detach_filter(struct sock *sk)
+{
+ return __sk_detach_filter(sk, sock_owned_by_user(sk));
+}
int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
unsigned int len)
--
1.9.3
^ permalink raw reply related
* Re: [net-next RFC 0/4] SO_BINDTOPREFIX
From: Eric Dumazet @ 2016-03-29 14:31 UTC (permalink / raw)
To: Gilberto Bertin; +Cc: netdev, tom, markzzzsmith
In-Reply-To: <1458699966-3752-1-git-send-email-gilberto.bertin@gmail.com>
On Wed, 2016-03-23 at 02:26 +0000, Gilberto Bertin wrote:
> Since the net-next window just opened, I'm resubmitting my RFC for the
> SO_BINDTOSUBNET patch, following Mark Smith's suggestion to rename the
> whole thing to a more clear SO_BINDTOPREFIX.
Please do not add such monolithic option.
BPF is absolutely the way to go here, as it allows for whatever user
specified tweaks, like a list of destination subnetwork, or/and a list
of source network, or the date/time of the day, or port knocking without
netfilter, or ... you name it.
Simply add an option to load a BPF filter on a socket, used to vary the
various compute_score() functions.
No hard coded knowledge in the kernel, but a generic interface.
^ permalink raw reply
* Re: bpf: net/core/filter.c:2115 suspicious rcu_dereference_protected() usage!
From: Daniel Borkmann @ 2016-03-29 13:55 UTC (permalink / raw)
To: Michal Kubecek, Sasha Levin
Cc: Jiri Slaby, David S. Miller, ast, netdev@vger.kernel.org, LKML
In-Reply-To: <20160329125823.GB15048@unicorn.suse.cz>
[ dropping my old email address ]
On 03/29/2016 02:58 PM, Michal Kubecek wrote:
> On Mon, Feb 22, 2016 at 10:31:33AM -0500, Sasha Levin wrote:
>>
>> I've hit the following warning while fuzzing with trinity inside a kvmtool guest
>> running the latest -next kernel:
>>
>> [ 1343.104588] ===============================
>> [ 1343.104591] [ INFO: suspicious RCU usage. ]
>> [ 1343.104619] 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978 Not tainted
>> [ 1343.104624] -------------------------------
>> [ 1343.104635] net/core/filter.c:2115 suspicious rcu_dereference_protected() usage!
>> [ 1343.104641]
>> [ 1343.104641] other info that might help us debug this:
>> [ 1343.104641]
>> [ 1343.104650]
>> [ 1343.104650] rcu_scheduler_active = 1, debug_locks = 0
>> [ 1343.104660] 1 lock held by syz-executor/17916:
>> [ 1343.104784] #0: (rtnl_mutex){+.+.+.}, at: rtnl_lock (net/core/rtnetlink.c:71)
>> [ 1343.104789]
>> [ 1343.104789] stack backtrace:
>> [ 1343.104820] CPU: 1 PID: 17916 Comm: trinity-c8 Not tainted 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978
>> [ 1343.104868] 1ffff10036968f44 ffff8801b4b47aa8 ffffffffa23d9a9d ffffffff00000001
>> [ 1343.104891] fffffbfff5c2a630 0000000041b58ab3 ffffffffadb3a8f2 ffffffffa23d9905
>> [ 1343.104914] 0000000000000000 ffff8801b5419b40 fffffbfff7596522 0000000000000001
>> [ 1343.104919] Call Trace:
>> [ 1343.104985] dump_stack (lib/dump_stack.c:53)
>> [ 1343.105060] lockdep_rcu_suspicious (kernel/locking/lockdep.c:4282)
>> [ 1343.105093] sk_detach_filter (net/core/filter.c:2114 (discriminator 5))
>> [ 1343.105193] tun_detach_filter (drivers/net/tun.c:1808 (discriminator 7))
>> [ 1343.105238] __tun_chr_ioctl (drivers/net/tun.c:2133)
>> [ 1343.105370] tun_chr_ioctl (drivers/net/tun.c:2161)
>> [ 1343.105407] do_vfs_ioctl (fs/ioctl.c:44 fs/ioctl.c:674)
>> [ 1343.105506] SyS_ioctl (fs/ioctl.c:689 fs/ioctl.c:680)
>> [ 1343.105542] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:200)
>
> Looks like sk_detach_filter() wants the socket to be owned which neither
> tun_detach_filter() does not do, unlike sock_setsockopt(). Could you
> check if the patch below helps?
>
> I'm also not really sure if it is safe to ignore return value of
> sk_detach_filter() and just sets tun->filter_attached to false - but
> it's not really clear what should be done if one of the calls fails
> after some succeeded.
Wrt return value, afaik SOCK_FILTER_LOCKED cannot be set for tun devs, so we
should be okay.
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index afdf950617c3..7417d7c20bab 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1818,11 +1818,13 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
> static void tun_detach_filter(struct tun_struct *tun, int n)
> {
> int i;
> - struct tun_file *tfile;
>
> for (i = 0; i < n; i++) {
> - tfile = rtnl_dereference(tun->tfiles[i]);
> - sk_detach_filter(tfile->socket.sk);
> + struct sock *sk = rtnl_dereference(tun->tfiles[i])->socket.sk;
> +
> + lock_sock(sk);
> + sk_detach_filter(sk);
> + release_sock(sk);
> }
>
> tun->filter_attached = false;
>
In tun case, the control path for tun_attach_filter() and tun_detach_filter()
is under RTNL lock (held in __tun_chr_ioctl()).
So in the BPF core the rcu_dereference_protected(<sk_filter>, sock_owned_by_user(sk))
looks like a false positive in this specific use case to me, that we should probably
just silence.
Running the filter via sk_filter() in tun device happens under rcu_read_lock(),
so the dereference and assignment pair seems okay to me.
Was wondering whether we should convert this to unattached BPF filter, but this
would break with existing expectations from sk_filter() (e.g. security modules).
^ permalink raw reply
* [PATCH] sctp: use list_* in sctp_list_dequeue
From: Marcelo Ricardo Leitner @ 2016-03-29 13:51 UTC (permalink / raw)
To: netdev; +Cc: Neil Horman, Vlad Yasevich, linux-sctp
Use list_* helpers in sctp_list_dequeue, more readable.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
include/net/sctp/sctp.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 65521cfdcadeee35d61f280165a387cc2164ab6d..03fb33efcae21d54192204629ff4ced2e36e7d4d 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -386,11 +386,9 @@ static inline struct list_head *sctp_list_dequeue(struct list_head *list)
{
struct list_head *result = NULL;
- if (list->next != list) {
+ if (!list_empty(list)) {
result = list->next;
- list->next = result->next;
- list->next->prev = list;
- INIT_LIST_HEAD(result);
+ list_del_init(result);
}
return result;
}
--
2.5.0
^ permalink raw reply related
* [PATCH 2/2] sctp: delay calls to sk_data_ready() as much as possible
From: Marcelo Ricardo Leitner @ 2016-03-29 13:49 UTC (permalink / raw)
To: netdev; +Cc: Neil Horman, Vlad Yasevich, linux-sctp
In-Reply-To: <cover.1459259208.git.marcelo.leitner@gmail.com>
Currently processing of multiple chunks in a single SCTP packet leads to
multiple calls to sk_data_ready, causing multiple wake up signals which
are costy and doesn't make it wake up any faster.
With this patch it will note that the wake up is pending and will do it
before leaving the state machine interpreter, latest place possible to
do it realiably and cleanly.
Note that sk_data_ready events are not dependent on asocs, unlike waking
up writers.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
include/net/sctp/structs.h | 3 ++-
net/sctp/sm_sideeffect.c | 5 +++++
net/sctp/ulpqueue.c | 4 ++--
3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 1a6a626904bba4223b7921bbb4be41c2550271a7..21cb11107e378b4da1e7efde22fab4349496e35a 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -217,7 +217,8 @@ struct sctp_sock {
v4mapped:1,
frag_interleave:1,
recvrcvinfo:1,
- recvnxtinfo:1;
+ recvnxtinfo:1,
+ pending_data_ready:1;
atomic_t pd_mode;
/* Receive to here while partial delivery is in effect. */
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 7fe56d0acabf66cfd8fe29dfdb45f7620b470ac7..e7042f9ce63b0cfca50cae252f51b60b68cb5731 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1742,6 +1742,11 @@ out:
error = sctp_outq_uncork(&asoc->outqueue, gfp);
} else if (local_cork)
error = sctp_outq_uncork(&asoc->outqueue, gfp);
+
+ if (sctp_sk(ep->base.sk)->pending_data_ready) {
+ ep->base.sk->sk_data_ready(ep->base.sk);
+ sctp_sk(ep->base.sk)->pending_data_ready = 0;
+ }
return error;
nomem:
error = -ENOMEM;
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index ce469d648ffbe166f9ae1c5650f481256f31a7f8..72e5b3e41cddf9d79371de8ab01484e4601b97b6 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -264,7 +264,7 @@ int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
sctp_ulpq_clear_pd(ulpq);
if (queue == &sk->sk_receive_queue)
- sk->sk_data_ready(sk);
+ sctp_sk(sk)->pending_data_ready = 1;
return 1;
out_free:
@@ -1140,5 +1140,5 @@ void sctp_ulpq_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
/* If there is data waiting, send it up the socket now. */
if (sctp_ulpq_clear_pd(ulpq) || ev)
- sk->sk_data_ready(sk);
+ sctp_sk(sk)->pending_data_ready = 1;
}
--
2.5.0
^ permalink raw reply related
* [PATCH 1/2] sctp: compress bit-wide flags to a bitfield on sctp_sock
From: Marcelo Ricardo Leitner @ 2016-03-29 13:49 UTC (permalink / raw)
To: netdev; +Cc: Neil Horman, Vlad Yasevich, linux-sctp
In-Reply-To: <cover.1459259208.git.marcelo.leitner@gmail.com>
It wastes space and gets worse as we add new flags, so convert bit-wide
flags to a bitfield.
Currently it already saves 4 bytes in sctp_sock.
Note that do_auto_asconf cannot be merged, as explained in the comment
before it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
include/net/sctp/structs.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 6df1ce7a411c548bda4163840a90578b6e1b4cfe..1a6a626904bba4223b7921bbb4be41c2550271a7 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -210,14 +210,14 @@ struct sctp_sock {
int user_frag;
__u32 autoclose;
- __u8 nodelay;
- __u8 disable_fragments;
- __u8 v4mapped;
- __u8 frag_interleave;
__u32 adaptation_ind;
__u32 pd_point;
- __u8 recvrcvinfo;
- __u8 recvnxtinfo;
+ __u16 nodelay:1,
+ disable_fragments:1,
+ v4mapped:1,
+ frag_interleave:1,
+ recvrcvinfo:1,
+ recvnxtinfo:1;
atomic_t pd_mode;
/* Receive to here while partial delivery is in effect. */
--
2.5.0
^ permalink raw reply related
* [PATCH 0/2] sctp: delay calls to sk_data_ready() as much as possible
From: Marcelo Ricardo Leitner @ 2016-03-29 13:49 UTC (permalink / raw)
To: netdev; +Cc: Neil Horman, Vlad Yasevich, linux-sctp
1st patch is a preparation for the 2nd. The idea is to not call
->sk_data_ready() for every data chunk processed while processing
packets but only once before releasing the socket.
Marcelo Ricardo Leitner (2):
sctp: compress bit-wide flags to a bitfield on sctp_sock
sctp: delay calls to sk_data_ready() as much as possible
include/net/sctp/structs.h | 13 +++++++------
net/sctp/sm_sideeffect.c | 5 +++++
net/sctp/ulpqueue.c | 4 ++--
3 files changed, 14 insertions(+), 8 deletions(-)
--
2.5.0
^ permalink raw reply
* [PATCH] sctp: avoid refreshing heartbeat timer too often
From: Marcelo Ricardo Leitner @ 2016-03-29 13:41 UTC (permalink / raw)
To: netdev; +Cc: Neil Horman, Vlad Yasevich, linux-sctp
Currently on high rate SCTP streams the heartbeat timer refresh can
consume quite a lot of resources as timer updates are costly and it
contains a random factor, which a) is also costly and b) invalidates
mod_timer() optimization for not editing a timer to the same value.
It may even cause the timer to be slightly advanced, for no good reason.
There are 3 places that call sctp_transport_reset_timers(), including
one that calls it for every single chunk added to a packet on
sctp_outq_flush(), even if they are bundled. As there is the possibility
of working on several transports, it's not trivial to just post-poned
this call as it would require to at least remember which transports were
used.
This change then moves the random factor to outside
sctp_transport_timeout(), allowing sctp_transport_reset_timers() to
check if a timer update is really necessary. For that, the random factor
is considered 0. If timer expires is still after it, the update is not
necessary as it's a possible random value and there is no
security/functional loss in doing so.
On loopback with MTU of 65535 and data chunks with 1636, so that we
have a considerable amount of chunks without stressing system calls,
netperf -t SCTP_STREAM -l 30, perf looked like this before:
Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
Overhead Command Shared Object Symbol
+ 6,15% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
- 5,43% netperf [kernel.vmlinux] [k] _raw_write_unlock_irqrestore
- _raw_write_unlock_irqrestore
- 96,54% _raw_spin_unlock_irqrestore
- 36,14% mod_timer
+ 97,24% sctp_transport_reset_timers
+ 2,76% sctp_do_sm
+ 33,65% __wake_up_sync_key
+ 28,77% sctp_ulpq_tail_event
+ 1,40% del_timer
- 1,84% mod_timer
+ 99,03% sctp_transport_reset_timers
+ 0,97% sctp_do_sm
+ 1,50% sctp_ulpq_tail_event
And after this patch:
Samples: 120K of event 'cpu-clock', Event count (approx.): 30002250000
Overhead Command Shared Object Symbol
+ 7,34% netperf [kernel.vmlinux] [k] memcpy_erms
+ 6,67% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
- 5,34% netperf [kernel.vmlinux] [k] _raw_write_unlock_irqrestore
- _raw_write_unlock_irqrestore
- 98,00% _raw_spin_unlock_irqrestore
+ 54,24% __wake_up_sync_key
+ 41,54% sctp_ulpq_tail_event
- 2,61% mod_timer
+ 79,88% sctp_transport_update_pmtu
+ 20,12% sctp_do_sm
+ 1,61% del_timer
+ 1,83% sctp_ulpq_tail_event
+ 2,34% netperf [kernel.vmlinux] [k] pvclock_clocksource_read
+ 2,31% netperf [sctp] [k] sctp_sendmsg
+ 1,93% netperf [kernel.vmlinux] [k] __slab_free
+ 1,92% netperf [sctp] [k] sctp_packet_transmit
+ 1,85% netperf [kernel.vmlinux] [k] kfree
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
net/sctp/transport.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 9b6b48c7524e4b441a151b80f0babec81f539d49..c9d6c61f5a511f96f38a0f2c275e418e158e0632 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -185,6 +185,8 @@ static void sctp_transport_destroy(struct sctp_transport *transport)
*/
void sctp_transport_reset_timers(struct sctp_transport *transport)
{
+ unsigned long expires;
+
/* RFC 2960 6.3.2 Retransmission Timer Rules
*
* R1) Every time a DATA chunk is sent to any address(including a
@@ -199,8 +201,10 @@ void sctp_transport_reset_timers(struct sctp_transport *transport)
sctp_transport_hold(transport);
/* When a data chunk is sent, reset the heartbeat interval. */
- if (!mod_timer(&transport->hb_timer,
- sctp_transport_timeout(transport)))
+ expires = sctp_transport_timeout(transport);
+ if (time_before(transport->hb_timer.expires, expires) &&
+ !mod_timer(&transport->hb_timer,
+ expires + prandom_u32_max(transport->rto)))
sctp_transport_hold(transport);
}
@@ -595,7 +599,7 @@ void sctp_transport_burst_reset(struct sctp_transport *t)
unsigned long sctp_transport_timeout(struct sctp_transport *trans)
{
/* RTO + timer slack +/- 50% of RTO */
- unsigned long timeout = (trans->rto >> 1) + prandom_u32_max(trans->rto);
+ unsigned long timeout = trans->rto >> 1;
if (trans->state != SCTP_UNCONFIRMED &&
trans->state != SCTP_PF)
--
2.5.0
^ permalink raw reply related
* [PATCH] sctp: flush if we can't fit another DATA chunk
From: Marcelo Ricardo Leitner @ 2016-03-29 13:41 UTC (permalink / raw)
To: netdev; +Cc: Neil Horman, Vlad Yasevich, linux-sctp
There is no point in delaying the packet if we can't fit a single byte
of data on it anymore. So lets just reduce the threshold by the amount
that a data chunk with 4 bytes (rounding) would use.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
net/sctp/output.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 97745351d58c2fb32b9f9b57d61831d7724d83b2..c518569123ce42a8f21f80754756306c39875013 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -705,7 +705,8 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet,
/* Check whether this chunk and all the rest of pending data will fit
* or delay in hopes of bundling a full sized packet.
*/
- if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead)
+ if (chunk->skb->len + q->out_qlen >
+ maxsize - packet->overhead - sizeof(sctp_data_chunk_t) - 4)
/* Enough data queued to fill a packet */
return SCTP_XMIT_OK;
--
2.5.0
^ permalink raw reply related
* [PATCH] sctp: really allow using GFP_KERNEL on sctp_packet_transmit
From: Marcelo Ricardo Leitner @ 2016-03-29 13:41 UTC (permalink / raw)
To: netdev; +Cc: Neil Horman, Vlad Yasevich, linux-sctp
Somehow my patch for commit cea8768f333e ("sctp: allow
sctp_transmit_packet and others to use gfp") missed two important
chunks, which are now added.
Fixes: cea8768f333e ("sctp: allow sctp_transmit_packet and others to use gfp")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
net/sctp/output.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 736c004abfbc2787a3c50fa85168ebdf3b112787..97745351d58c2fb32b9f9b57d61831d7724d83b2 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -401,7 +401,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
sk = chunk->skb->sk;
/* Allocate the new skb. */
- nskb = alloc_skb(packet->size + MAX_HEADER, GFP_ATOMIC);
+ nskb = alloc_skb(packet->size + MAX_HEADER, gfp);
if (!nskb)
goto nomem;
@@ -523,8 +523,8 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
*/
if (auth)
sctp_auth_calculate_hmac(asoc, nskb,
- (struct sctp_auth_chunk *)auth,
- GFP_ATOMIC);
+ (struct sctp_auth_chunk *)auth,
+ gfp);
/* 2) Calculate the Adler-32 checksum of the whole packet,
* including the SCTP common header and all the
--
2.5.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox