Netdev List
 help / color / mirror / Atom feed
* merge window warning
From: David Miller @ 2013-09-04 17:10 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA


The merge window is now open.

This means if a feature patch isn't in patchwork already, and
completely ready to apply, you should not submit it at this time.

I will just automatically reject any inappropriate patches sent from
now until the net-next tree opens again after the merge window closes,
in order to assist with my ability to process the current backlog
efficiently.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [patch v2] sfc: check for allocation failure
From: David Miller @ 2013-09-04 17:08 UTC (permalink / raw)
  To: dan.carpenter; +Cc: linux-net-drivers, bhutchings, netdev, kernel-janitors
In-Reply-To: <20130904150727.GA32327@elgon.mountain>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Wed, 4 Sep 2013 18:07:27 +0300

> It upsets static analyzers when we don't check for allocation failure.
> 
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
> v2: rebased on latest linux-next

Applied, thanks Dan.


^ permalink raw reply

* Re: [PATCH] net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls
From: Gregory CLEMENT @ 2013-09-04 16:45 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: David S. Miller, netdev, Ezequiel Garcia, Lior Amsalem,
	linux-arm-kernel, Jason Cooper, Andrew Lunn
In-Reply-To: <1378304812-21390-1-git-send-email-thomas.petazzoni@free-electrons.com>

On 04/09/2013 16:26, Thomas Petazzoni wrote:
> This commit implements the ->ndo_do_ioctl() operation so that the
> PHY-related ioctl() calls can work from userspace, which allows
> applications like mii-tool or mii-diag to do their job.
> 

I tested it successfully with mii-diag for the Armada 370 on Mirabox
and for the Armada XP on Open Blocks AX3-4

Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>


> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 90ab292..389a854 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -2451,6 +2451,21 @@ static int mvneta_stop(struct net_device *dev)
>  	return 0;
>  }
>  
> +static int mvneta_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
> +{
> +	struct mvneta_port *pp = netdev_priv(dev);
> +	int ret;
> +
> +	if (!pp->phy_dev)
> +		return -ENOTSUPP;
> +
> +	ret = phy_mii_ioctl(pp->phy_dev, ifr, cmd);
> +	if (!ret)
> +		mvneta_adjust_link(dev);
> +
> +	return ret;
> +}
> +
>  /* Ethtool methods */
>  
>  /* Get settings (phy address, speed) for ethtools */
> @@ -2569,6 +2584,7 @@ static const struct net_device_ops mvneta_netdev_ops = {
>  	.ndo_change_mtu      = mvneta_change_mtu,
>  	.ndo_tx_timeout      = mvneta_tx_timeout,
>  	.ndo_get_stats64     = mvneta_get_stats64,
> +	.ndo_do_ioctl        = mvneta_ioctl,
>  };
>  
>  const struct ethtool_ops mvneta_eth_tool_ops = {
> 


-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply

* Re: [net-next 0/9][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2013-09-04 16:41 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1378300136-26003-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed,  4 Sep 2013 06:08:47 -0700

> This series contains updates to igb only.
> 
> Todd provides a fix for igb to not look for a PBA in the iNVM on
> devices that are flashless.
> 
> Akeem provides igb patches to add a new PHY id for i354, as well as
> a couple of patches to implement the new PHY id.  He also provides
> several patches to correctly report the appropriate media type as
> well as correctly report advertised/supported link for i354 devices.
> Lastly Akeem implements a 1 second delay mechanism for i210 devices
> to avoid erroneous link issue with the link partner.

Pulled, thanks a lot Jeff.

^ permalink raw reply

* Re: [PATCH] net: mvneta: properly disable HW PHY polling and ensure adjust_link() works
From: Willy Tarreau @ 2013-09-04 16:32 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Lior Amsalem, Jochen De Smet, Simon Guinot, Ryan Press, netdev,
	vdonnefort, Ethan Tuttle, stable, Ezequiel Garcia,
	Chény Yves-Gael, Gregory Clement, Peter Sanford,
	David S. Miller, linux-arm-kernel
In-Reply-To: <1378304478-21237-1-git-send-email-thomas.petazzoni@free-electrons.com>

Hi Thomas!

On Wed, Sep 04, 2013 at 04:21:18PM +0200, Thomas Petazzoni wrote:
> This commit fixes a long-standing bug that has been reported by many
> users: on some Armada 370 platforms, only the network interface that
> has been used in U-Boot to tftp the kernel works properly in
> Linux. The other network interfaces can see a 'link up', but are
> unable to transmit data. The reports were generally made on the Armada
> 370-based Mirabox, but have also been given on the Armada 370-RD
> board.
(...)
> This patch has been tested on Armada 370 Mirabox, and now both network
> interfaces are usable after boot.

Just as a complementary check, I can also confirm that the OpenBlocks
AX3 continues to work fine after this change.

Best regards!
Willy

^ permalink raw reply

* RE: [PATCH net-next 4/5] driver/net: enic: Exposing symbols for Cisco's low latency driver
From: Christian Benvenuti (benve) @ 2013-09-04 16:32 UTC (permalink / raw)
  To: Ben Hutchings, Govindarajulu Varadarajan
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Sujith Sankar (ssujith),
	Nishank Trivedi (nistrive), Upinder Malhi (umalhi)
In-Reply-To: <1378306376.3133.7.camel@bwh-desktop.uk.level5networks.com>

Hi Ben,

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Ben Hutchings
> Sent: Wednesday, September 04, 2013 7:53 AM
> To: Govindarajulu Varadarajan
> Cc: davem@davemloft.net; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Christian Benvenuti (benve); Sujith Sankar
> (ssujith); Nishank Trivedi (nistrive); Upinder Malhi (umalhi)
> Subject: Re: [PATCH net-next 4/5] driver/net: enic: Exposing symbols for
> Cisco's low latency driver
> 
> On Wed, 2013-09-04 at 11:17 +0530, Govindarajulu Varadarajan wrote:
> > This patch exposes symbols for usnic low latency driver that can be
> > used to register and unregister vNics as well to traverse the resources on
> vNics.
> >
> > Signed-off-by: Upinder Malhi <umalhi@cisco.com>
> > Signed-off-by: Nishank Trivedi <nistrive@cisco.com>
> > Signed-off-by: Christian Benvenuti <benve@cisco.com>
> 
> Will usnic, or any other user of these symbols, be submitted for inclusion in-
> tree as well?  It is generally expected that exported functions do have an in-
> tree user.

The usnic driver has been posted yesterday on the rdma list.
(http://www.spinics.net/lists/linux-rdma/msg16999.html)

/Chris

> Also, header files under drivers/ generally won't be included in distribution
> -devel packages, so to support an out-of-tree module the function
> prototypes would need to be included in a header under include/ (or else
> you have to repeat them and hope the types never change).
> 
> Ben.
> 
> > ---
> >  drivers/net/ethernet/cisco/enic/vnic_dev.c | 10 ++++++++++
> > drivers/net/ethernet/cisco/enic/vnic_dev.h |  1 +
> >  2 files changed, 11 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.c
> > b/drivers/net/ethernet/cisco/enic/vnic_dev.c
> > index 97455c5..69dd925 100644
> > --- a/drivers/net/ethernet/cisco/enic/vnic_dev.c
> > +++ b/drivers/net/ethernet/cisco/enic/vnic_dev.c
> > @@ -175,6 +175,7 @@ unsigned int vnic_dev_get_res_count(struct
> > vnic_dev *vdev,  {
> >  	return vdev->res[type].count;
> >  }
> > +EXPORT_SYMBOL(vnic_dev_get_res_count);
> >
> >  void __iomem *vnic_dev_get_res(struct vnic_dev *vdev, enum
> vnic_res_type type,
> >  	unsigned int index)
> > @@ -193,6 +194,7 @@ void __iomem *vnic_dev_get_res(struct vnic_dev
> *vdev, enum vnic_res_type type,
> >  		return (char __iomem *)vdev->res[type].vaddr;
> >  	}
> >  }
> > +EXPORT_SYMBOL(vnic_dev_get_res);
> >
> >  static unsigned int vnic_dev_desc_ring_size(struct vnic_dev_ring *ring,
> >  	unsigned int desc_count, unsigned int desc_size) @@ -942,6 +944,7
> @@
> > void vnic_dev_unregister(struct vnic_dev *vdev)
> >  		kfree(vdev);
> >  	}
> >  }
> > +EXPORT_SYMBOL(vnic_dev_unregister);
> >
> >  struct vnic_dev *vnic_dev_register(struct vnic_dev *vdev,
> >  	void *priv, struct pci_dev *pdev, struct vnic_dev_bar *bar, @@
> > -969,6 +972,13 @@ err_out:
> >  	vnic_dev_unregister(vdev);
> >  	return NULL;
> >  }
> > +EXPORT_SYMBOL(vnic_dev_register);
> > +
> > +struct pci_dev *vnic_dev_get_pdev(struct vnic_dev *vdev) {
> > +	return vdev->pdev;
> > +}
> > +EXPORT_SYMBOL(vnic_dev_get_pdev);
> >
> >  int vnic_dev_init_prov2(struct vnic_dev *vdev, u8 *buf, u32 len)  {
> > diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.h
> > b/drivers/net/ethernet/cisco/enic/vnic_dev.h
> > index f3d9b79..e670029 100644
> > --- a/drivers/net/ethernet/cisco/enic/vnic_dev.h
> > +++ b/drivers/net/ethernet/cisco/enic/vnic_dev.h
> > @@ -127,6 +127,7 @@ int vnic_dev_set_ig_vlan_rewrite_mode(struct
> > vnic_dev *vdev,  struct vnic_dev *vnic_dev_register(struct vnic_dev *vdev,
> >  	void *priv, struct pci_dev *pdev, struct vnic_dev_bar *bar,
> >  	unsigned int num_bars);
> > +struct pci_dev *vnic_dev_get_pdev(struct vnic_dev *vdev);
> >  int vnic_dev_init_prov2(struct vnic_dev *vdev, u8 *buf, u32 len);
> > int vnic_dev_enable2(struct vnic_dev *vdev, int active);  int
> > vnic_dev_enable2_done(struct vnic_dev *vdev, int *status);
> 
> --
> Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer;
> that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next v2 5/6] bonding: restructure and add rcu for bond_for_each_slave_next()
From: Veaceslav Falico @ 2013-09-04 16:29 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Ding Tianhong, Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <52274B84.2010509@gmail.com>

On Wed, Sep 04, 2013 at 11:02:28PM +0800, Ding Tianhong wrote:
>于 2013/9/4 18:35, Veaceslav Falico 写道:
>>On Wed, Sep 04, 2013 at 05:44:15PM +0800, Ding Tianhong wrote:
>>...snip...
>>>+/* Check whether the slave is the only one in bond */
>>>+#define bond_is_only_slave(bond, pos) \
>>>+ (((pos)->list.prev == &(bond)->slave_list) && \
>>>+ ((pos)->list.next == &(bond)->slave_list))
>>
>>Could be done without pos at all -
>>
>>!list_empty(&(bond)->slave_list) && \
>>&(bond)->slave_list.next == &(bond)->slave_list.prev
>>
>>If we have only one slave and pos is NOT our slave then... well.. we have
>>big troubles.
>>
>yes, more simple more beautiful, thanks.
>
>but if the pos is not our slave, it is the mistake, not bug. :)
>
>>>+
>>>/**
>>>* bond_for_each_slave_from - iterate the slaves list from a 
>>>starting point
>>>* @bond: the bond holding this list.
>>>* @pos: current slave.
>>>- * @cnt: counter for max number of moves
>>>* @start: starting point.
>>>*
>>>* Caller must hold bond->lock
>>>*/
>>>-#define bond_for_each_slave_from(bond, pos, cnt, start) \
>>>- for (cnt = 0, pos = start; pos && cnt < (bond)->slave_cnt; \
>>>- cnt++, pos = bond_next_slave(bond, pos))
>>>+#define bond_for_each_slave_from(bond, pos, start) \
>>>+ for (pos = start; pos && (bond_is_only_slave(bond, start) ? \
>>>+ &pos->list != &bond->slave_list : \
>>>+ &pos->list != &start->list); bond_is_only_slave(bond, start) ? \
>>>+ (pos = list_entry(pos->list.next, typeof(*pos), list)) : \
>>>+ (pos = bond_next_slave(bond, pos)))
>>
>>Did you check that?
>>
>>pos = slave1 (bond has more than one slave);
>>pos && &pos->list != &slave1->list - false.
>>
>>We won't ever enter this loop if we have >1 slaves.
>>
>>I don't understand this at all.
>>
>
>ok, the logic is : if slaves == 1, run once for the slave.

your code, actually, says differently:

(bond_is_only_slave(bond, start) ? &pos->list != &bond->slave_list :

which means that if bond_is_only_slave(bond, start) == true then we check
&pos->list != &bond->slave_list, so we run till the end of the list.

Ternary operator works like that

expression ? what_to_do_if_true : what_to_do_if_false

http://en.wikipedia.org/wiki/?:#C

>if slaves > 1, run loops until reach the end list.

Here we test for &pos->list != &start->list, which is false cause pos ==
start.

>I could not get a better way to simplify the function, I think it is 
>a suitable scheme.

Nope. bond_for_each_slave_from() is supposed to loop through slaves from a
starting slave and till that starting slave, not included.

Your loop... I don't understand what it does. But clearly not what it was
doing (and supposed to do).

>
>by the way, I test the function and works well. :)
>
>>>+
>>>+/**
>>>+ * bond_for_each_slave_from_rcu - iterate the slaves list from a 
>>>starting point
>>>+ * @bond: the bond holding this list.
>>>+ * @pos: current slave.
>>>+ * @start: starting point.
>>>+ *
>>>+ * Caller must hold rcu_read_lock
>>>+ */
>>>+#define bond_for_each_slave_from_rcu(bond, pos, start) \
>>>+ for (pos = start; pos && (bond_is_only_slave(bond, start) ? \
>>>+ &pos->list != &bond->slave_list : \
>>>+ &pos->list != &start->list); bond_is_only_slave(bond, start) ? \
>>>+ (pos = list_entry_rcu(pos->list.next, typeof(*pos), list)) : \
>>>+ (pos = bond_next_slave_rcu(bond, pos)))
>>
>>Ditto as bond_for_each_slave_from() and, also, see my comment about RCU
>>from patch 1.
>>
>>>
>>>/**
>>>* bond_for_each_slave - iterate over all slaves
>>>-- 
>>>1.8.2.1
>>>
>>>
>>>
>>-- 
>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>

^ permalink raw reply

* Re: [PATCH 0/4] netfilter updates for net-next
From: David Miller @ 2013-09-04 16:28 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1378299625-4638-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed,  4 Sep 2013 15:00:21 +0200

> The following batch contains:
> 
> * Three fixes for the new synproxy target available in your
>   net-next tree, from Jesper D. Brouer and Patrick McHardy.
> 
> * One fix for TCPMSS to correctly handling the fragmentation
>   case, from Phil Oester. I'll pass this one to -stable.

Pulled, thanks Pablo.

^ permalink raw reply

* Re: [PATCH net-next v2 1/6] bonding: simplify and use RCU protection for 3ad xmit path
From: David Miller @ 2013-09-04 16:25 UTC (permalink / raw)
  To: vfalico; +Cc: dingtianhong, fubar, andy, nikolay, netdev
In-Reply-To: <20130904101823.GO1992@redhat.com>

From: Veaceslav Falico <vfalico@redhat.com>
Date: Wed, 4 Sep 2013 12:18:24 +0200

> On Wed, Sep 04, 2013 at 05:43:45PM +0800, Ding Tianhong wrote:
> ...snip...
>>+/**
>>+ * IMPORTANT: bond_first/last_slave_rcu can return NULL in case of an
>>empty list
>>+ * Caller must hold rcu_read_lock
>>+ */
>>+#define bond_first_slave_rcu(bond) \
>>+	list_first_or_null_rcu(&(bond)->slave_list, struct slave, list)
>>+#define bond_last_slave_rcu(bond) \
>>+	(list_empty(&(bond)->slave_list) ? NULL : \
>>+ bond_to_slave_rcu((bond)->slave_list.prev))
> 
> Here, bond_last_slave_rcu() is racy. The list can be non-empty when
> list_empty() is verified, however afterwards it might become empty,
> when
> you call bond_to_slave_rcu(), and thus you'll get
> bond_to_slave(bond->slave_list) in the result, which is not a slave.
> 
> Take a look at list_first_or_null_rcu() for a reference. The main idea
> is
> that it first gets the ->next pointer, with RCU protection, and then
> verifies if it's the list head or not, and if not - it gets the
> container
> already. This way the ->next pointer won't get away.
> 
> These kind of bugs are really rare, but are *EXTREMELY* hard to debug.

I agree with this analysis.

Ding, "rcu_read_lock()" doesn't "lock" anything.  It's just a memory
barrier.

All the list can still change on you asynchronously to your accesses.

That's why list_first_or_null_rcu() is so carefully arranged.
Therefore, you must make similar accomodations.

^ permalink raw reply

* Re: [PATCH 5/7] ixgbe: use pcie_capability_read_word() to simplify code
From: Bjorn Helgaas @ 2013-09-04 16:20 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Benjamin Herrenschmidt, Gavin Shan, James E.J. Bottomley,
	David S. Miller, linux-kernel, linux-pci, Hanjun Guo, e1000-devel,
	netdev, Jeff Kirsher, Jacob Keller
In-Reply-To: <1378193715-25328-5-git-send-email-wangyijing@huawei.com>

[+cc Jacob, Jeff]

On Tue, Sep 03, 2013 at 03:35:13PM +0800, Yijing Wang wrote:
> use pcie_capability_read_word() to simplify code.
> 
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: e1000-devel@lists.sourceforge.net
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    6 ++----
>  1 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index bad8f14..bfa0b06 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -152,7 +152,6 @@ MODULE_VERSION(DRV_VERSION);
>  static int ixgbe_read_pci_cfg_word_parent(struct ixgbe_adapter *adapter,
>  					  u32 reg, u16 *value)
>  {
> -	int pos = 0;
>  	struct pci_dev *parent_dev;
>  	struct pci_bus *parent_bus;
>  
> @@ -164,11 +163,10 @@ static int ixgbe_read_pci_cfg_word_parent(struct ixgbe_adapter *adapter,
>  	if (!parent_dev)
>  		return -1;
>  
> -	pos = pci_find_capability(parent_dev, PCI_CAP_ID_EXP);
> -	if (!pos)
> +	if (!pci_is_pcie(parent_dev))
>  		return -1;
>  
> -	pci_read_config_word(parent_dev, pos + reg, value);
> +	pcie_capability_read_word(parent_dev, reg, value);
>  	return 0;
>  }
>  

Here's the caller of ixgbe_read_pci_cfg_word_parent():

    /* Get the negotiated link width and speed from PCI config space of the
     * parent, as this device is behind a switch
     */
    err = ixgbe_read_pci_cfg_word_parent(adapter, 18, &link_status);

This should be using PCI_EXP_LNKSTA instead of "18".

But it would be even better if we could drop ixgbe_get_parent_bus_info()
completely.  It seems redundant after merging Jacob's new
pcie_get_minimum_link() stuff [1].

ixgbe_disable_pcie_master() looks like it should be using
pcie_capability_read_word() with PCI_EXP_DEVSTA instead of using
IXGBE_PCI_DEVICE_STATUS.  If fact, it looks like it could use the
new pci_wait_for_pending_transaction() interface [2].

It looks like all the #defines in the "PCI Bus Info" block
(IXGBE_PCI_DEVICE_STATUS, IXGBE_PCI_DEVICE_STATUS_TRANSACTION_PENDING,
IXGBE_PCI_LINK_STATUS, etc.) [3] are really for PCIe-generic things.  If
so, the IXGBE-specific ones should be dropped in favor of the generic
ones.

[1] http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c?id=e027d1aec4bb49030646d2c186a721f94372d7f2
[2] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/pci.c?id=3775a209d38aa3a0c7ed89a7d0f529e0230f280e
[3] http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/tree/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h#n1833

^ permalink raw reply

* Re: [PATCH] net: mvneta: properly disable HW PHY polling and ensure adjust_link() works
From: Gregory CLEMENT @ 2013-09-04 16:08 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: David S. Miller, netdev, Lior Amsalem, Jochen De Smet,
	Simon Guinot, Ryan Press, vdonnefort, Ethan Tuttle, stable,
	Ezequiel Garcia, Chény Yves-Gael, Peter Sanford,
	Willy Tarreau, linux-arm-kernel
In-Reply-To: <1378304478-21237-1-git-send-email-thomas.petazzoni@free-electrons.com>

On 04/09/2013 16:21, Thomas Petazzoni wrote:
> This commit fixes a long-standing bug that has been reported by many
> users: on some Armada 370 platforms, only the network interface that
> has been used in U-Boot to tftp the kernel works properly in
> Linux. The other network interfaces can see a 'link up', but are
> unable to transmit data. The reports were generally made on the Armada
> 370-based Mirabox, but have also been given on the Armada 370-RD
> board.
> 
> The network MAC in the Armada 370/XP (supported by the mvneta driver
> in Linux) has a functionality that allows it to continuously poll the
> PHY and directly update the MAC configuration accordingly (speed,
> duplex, etc.). The very first versions of the driver submitted for
> review were using this hardware mechanism, but due to this, the driver
> was not integrated with the kernel phylib. Following reviews, the
> driver was changed to use the phylib, and therefore a software based
> polling. In software based polling, Linux regularly talks to the PHY
> over the MDIO bus, and sees if the link status has changed. If it's
> the case then the adjust_link() callback of the driver is called to
> update the MAC configuration accordingly.
> 
> However, it turns out that the adjust_link() callback was not
> configuring the hardware in a completely correct way: while it was
> setting the speed and duplex bits correctly, it wasn't telling the
> hardware to actually take into account those bits rather than what the
> hardware-based PHY polling mechanism has concluded. So, in fact the
> adjust_link() callback was basically a no-op.
> 
> However, the network happened to be working because on the network
> interfaces used by U-Boot for tftp on Armada 370 platforms because the
> hardware PHY polling was enabled by the bootloader, and left enabled
> by Linux. However, the second network interface not used for tftp (or
> both network interfaces if the kernel is loaded from USB, NAND or SD
> card) didn't had the hardware PHY polling enabled.
> 
> This patch fixes this situation by:
> 
>  (1) Making sure that the hardware PHY polling is disabled by clearing
>      the MVNETA_PHY_POLLING_ENABLE bit in the MVNETA_UNIT_CONTROL
>      register in the driver ->probe() function.
> 
>  (2) Making sure that the duplex and speed selections made by the
>      adjust_link() callback are taken into account by clearing the
>      MVNETA_GMAC_AN_SPEED_EN and MVNETA_GMAC_AN_DUPLEX_EN bits in the
>      MVNETA_GMAC_AUTONEG_CONFIG register.
> 
> This patch has been tested on Armada 370 Mirabox, and now both network
> interfaces are usable after boot.
> 

Well done Thomas!

I have successfully tested it on Armada 370 Mirabox:

Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>

Thanks

> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> Cc: Willy Tarreau <w@1wt.eu>
> Cc: Jochen De Smet <jochen.armkernel@leahnim.org>
> Cc: Peter Sanford <psanford@nearbuy.io>
> Cc: Ethan Tuttle <ethan@ethantuttle.com>
> Cc: Chény Yves-Gael <yves@cheny.fr>
> Cc: Ryan Press <ryan@presslab.us>
> Cc: Simon Guinot <simon.guinot@sequanux.org>
> Cc: vdonnefort@lacie.com
> Cc: stable@vger.kernel.org
> ---
> David, this patch is a fix for a problem that has been here since 3.8
> (when the mvneta driver was introduced), so I've Cc'ed stable@ and if
> possible I'd like to patch to be included for 3.12.
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index b017818..90ab292 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -138,7 +138,9 @@
>  #define      MVNETA_GMAC_FORCE_LINK_PASS         BIT(1)
>  #define      MVNETA_GMAC_CONFIG_MII_SPEED        BIT(5)
>  #define      MVNETA_GMAC_CONFIG_GMII_SPEED       BIT(6)
> +#define      MVNETA_GMAC_AN_SPEED_EN             BIT(7)
>  #define      MVNETA_GMAC_CONFIG_FULL_DUPLEX      BIT(12)
> +#define      MVNETA_GMAC_AN_DUPLEX_EN            BIT(13)
>  #define MVNETA_MIB_COUNTERS_BASE                 0x3080
>  #define      MVNETA_MIB_LATE_COLLISION           0x7c
>  #define MVNETA_DA_FILT_SPEC_MCAST                0x3400
> @@ -915,6 +917,13 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
>  	/* Assign port SDMA configuration */
>  	mvreg_write(pp, MVNETA_SDMA_CONFIG, val);
>  
> +	/* Disable PHY polling in hardware, since we're using the
> +	 * kernel phylib to do this.
> +	 */
> +	val = mvreg_read(pp, MVNETA_UNIT_CONTROL);
> +	val &= ~MVNETA_PHY_POLLING_ENABLE;
> +	mvreg_write(pp, MVNETA_UNIT_CONTROL, val);
> +
>  	mvneta_set_ucast_table(pp, -1);
>  	mvneta_set_special_mcast_table(pp, -1);
>  	mvneta_set_other_mcast_table(pp, -1);
> @@ -2307,7 +2316,9 @@ static void mvneta_adjust_link(struct net_device *ndev)
>  			val = mvreg_read(pp, MVNETA_GMAC_AUTONEG_CONFIG);
>  			val &= ~(MVNETA_GMAC_CONFIG_MII_SPEED |
>  				 MVNETA_GMAC_CONFIG_GMII_SPEED |
> -				 MVNETA_GMAC_CONFIG_FULL_DUPLEX);
> +				 MVNETA_GMAC_CONFIG_FULL_DUPLEX |
> +				 MVNETA_GMAC_AN_SPEED_EN |
> +				 MVNETA_GMAC_AN_DUPLEX_EN);
>  
>  			if (phydev->duplex)
>  				val |= MVNETA_GMAC_CONFIG_FULL_DUPLEX;
> 


-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH] net: mvneta: properly disable HW PHY polling and ensure adjust_link() works
From: yves @ 2013-09-04 16:00 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Thomas Petazzoni, Lior Amsalem, Jochen De Smet, Jason Cooper,
	Ryan Press, netdev, stable, Willy Tarreau, Ethan Tuttle,
	Ezequiel Garcia, Gregory Clement, Peter Sanford, David S. Miller,
	linux-arm-kernel, Simon Guinot
In-Reply-To: <20130904152018.GA21032@grp-vdonnefort>

Le 2013-09-04 17:20, Vincent Donnefort a écrit :
> On Wed, Sep 04, 2013 at 10:50:51AM -0400, Jason Cooper wrote:
>> On Wed, Sep 04, 2013 at 04:21:18PM +0200, Thomas Petazzoni wrote:
>> > This commit fixes a long-standing bug that has been reported by many
>> > users: on some Armada 370 platforms, only the network interface that
>> > has been used in U-Boot to tftp the kernel works properly in
>> > Linux. The other network interfaces can see a 'link up', but are
>> > unable to transmit data. The reports were generally made on the Armada
>> > 370-based Mirabox, but have also been given on the Armada 370-RD
>> > board.
>> >
>> > The network MAC in the Armada 370/XP (supported by the mvneta driver
>> > in Linux) has a functionality that allows it to continuously poll the
>> > PHY and directly update the MAC configuration accordingly (speed,
>> > duplex, etc.). The very first versions of the driver submitted for
>> > review were using this hardware mechanism, but due to this, the driver
>> > was not integrated with the kernel phylib. Following reviews, the
>> > driver was changed to use the phylib, and therefore a software based
>> > polling. In software based polling, Linux regularly talks to the PHY
>> > over the MDIO bus, and sees if the link status has changed. If it's
>> > the case then the adjust_link() callback of the driver is called to
>> > update the MAC configuration accordingly.
>> >
>> > However, it turns out that the adjust_link() callback was not
>> > configuring the hardware in a completely correct way: while it was
>> > setting the speed and duplex bits correctly, it wasn't telling the
>> > hardware to actually take into account those bits rather than what the
>> > hardware-based PHY polling mechanism has concluded. So, in fact the
>> > adjust_link() callback was basically a no-op.
>> >
>> > However, the network happened to be working because on the network
>> > interfaces used by U-Boot for tftp on Armada 370 platforms because the
>> > hardware PHY polling was enabled by the bootloader, and left enabled
>> > by Linux. However, the second network interface not used for tftp (or
>> > both network interfaces if the kernel is loaded from USB, NAND or SD
>> > card) didn't had the hardware PHY polling enabled.
>> >
>> > This patch fixes this situation by:
>> >
>> >  (1) Making sure that the hardware PHY polling is disabled by clearing
>> >      the MVNETA_PHY_POLLING_ENABLE bit in the MVNETA_UNIT_CONTROL
>> >      register in the driver ->probe() function.
>> >
>> >  (2) Making sure that the duplex and speed selections made by the
>> >      adjust_link() callback are taken into account by clearing the
>> >      MVNETA_GMAC_AN_SPEED_EN and MVNETA_GMAC_AN_DUPLEX_EN bits in the
>> >      MVNETA_GMAC_AUTONEG_CONFIG register.
>> >
>> > This patch has been tested on Armada 370 Mirabox, and now both network
>> > interfaces are usable after boot.
>> >
>> > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
>> > Cc: Willy Tarreau <w@1wt.eu>
>> > Cc: Jochen De Smet <jochen.armkernel@leahnim.org>
>> > Cc: Peter Sanford <psanford@nearbuy.io>
>> > Cc: Ethan Tuttle <ethan@ethantuttle.com>
>> > Cc: Chény Yves-Gael <yves@cheny.fr>
>> > Cc: Ryan Press <ryan@presslab.us>
>> > Cc: Simon Guinot <simon.guinot@sequanux.org>
>> > Cc: vdonnefort@lacie.com
>> > Cc: stable@vger.kernel.org
>> > ---
>> > David, this patch is a fix for a problem that has been here since 3.8
>> > (when the mvneta driver was introduced), so I've Cc'ed stable@ and if
>> > possible I'd like to patch to be included for 3.12.
>> 
>> David,
>> 
>> Offending patch is:
>> 
>>   c5aff18 net: mvneta: driver for Marvell Armada 370/XP network unit
>> 
>> Applies and builds cleanly against v3.8.13, v3.9.11, v3.10.10, and 
>> v3.11
>> 
>> Acked-by: Jason Cooper <jason@lakedaemon.net>
>> 
>> thx,
>> 
>> Jason.
> 
> Works with the armada370-rd board.
> 
> Tested-by: Vincent Donnefort <vdonnefort@gmail.com>
> 
> Thank you!
> 
> Vincent.
> 


Works with the mirabox armada370 devkit.

Tested-by: Yves-Gael Cheny <yves@cheny.fr>

Many thx,

Yves-Gaël .



>> 
>> > ---
>> >  drivers/net/ethernet/marvell/mvneta.c | 13 ++++++++++++-
>> >  1 file changed, 12 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
>> > index b017818..90ab292 100644
>> > --- a/drivers/net/ethernet/marvell/mvneta.c
>> > +++ b/drivers/net/ethernet/marvell/mvneta.c
>> > @@ -138,7 +138,9 @@
>> >  #define      MVNETA_GMAC_FORCE_LINK_PASS         BIT(1)
>> >  #define      MVNETA_GMAC_CONFIG_MII_SPEED        BIT(5)
>> >  #define      MVNETA_GMAC_CONFIG_GMII_SPEED       BIT(6)
>> > +#define      MVNETA_GMAC_AN_SPEED_EN             BIT(7)
>> >  #define      MVNETA_GMAC_CONFIG_FULL_DUPLEX      BIT(12)
>> > +#define      MVNETA_GMAC_AN_DUPLEX_EN            BIT(13)
>> >  #define MVNETA_MIB_COUNTERS_BASE                 0x3080
>> >  #define      MVNETA_MIB_LATE_COLLISION           0x7c
>> >  #define MVNETA_DA_FILT_SPEC_MCAST                0x3400
>> > @@ -915,6 +917,13 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
>> >  	/* Assign port SDMA configuration */
>> >  	mvreg_write(pp, MVNETA_SDMA_CONFIG, val);
>> >
>> > +	/* Disable PHY polling in hardware, since we're using the
>> > +	 * kernel phylib to do this.
>> > +	 */
>> > +	val = mvreg_read(pp, MVNETA_UNIT_CONTROL);
>> > +	val &= ~MVNETA_PHY_POLLING_ENABLE;
>> > +	mvreg_write(pp, MVNETA_UNIT_CONTROL, val);
>> > +
>> >  	mvneta_set_ucast_table(pp, -1);
>> >  	mvneta_set_special_mcast_table(pp, -1);
>> >  	mvneta_set_other_mcast_table(pp, -1);
>> > @@ -2307,7 +2316,9 @@ static void mvneta_adjust_link(struct net_device *ndev)
>> >  			val = mvreg_read(pp, MVNETA_GMAC_AUTONEG_CONFIG);
>> >  			val &= ~(MVNETA_GMAC_CONFIG_MII_SPEED |
>> >  				 MVNETA_GMAC_CONFIG_GMII_SPEED |
>> > -				 MVNETA_GMAC_CONFIG_FULL_DUPLEX);
>> > +				 MVNETA_GMAC_CONFIG_FULL_DUPLEX |
>> > +				 MVNETA_GMAC_AN_SPEED_EN |
>> > +				 MVNETA_GMAC_AN_DUPLEX_EN);
>> >
>> >  			if (phydev->duplex)
>> >  				val |= MVNETA_GMAC_CONFIG_FULL_DUPLEX;
>> > --
>> > 1.8.1.2
>> >
>> >
>> > _______________________________________________
>> > linux-arm-kernel mailing list
>> > linux-arm-kernel@lists.infradead.org
>> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH net-next 2/2] tuntap: orphan frags before trying to set tx timestamp
From: Richard Cochran @ 2013-09-04 15:57 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, mst
In-Reply-To: <1378269226-5969-2-git-send-email-jasowang@redhat.com>

On Wed, Sep 04, 2013 at 12:33:46PM +0800, Jason Wang wrote:
> sock_tx_timestamp() will clear all zerocopy flags of skb which may lead the
> frags never to be orphaned. This will break guest to guest traffic when zerocopy
> is enabled. Fix this by orphaning the frags before trying to set tx time stamp.
> 
> The issue were introduced by commit eda297729171fe16bf34fe5b0419dfb69060f623
> (tun: Support software transmit time stamping).
> 
> Cc: Richard Cochran <richardcochran@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Richard Cochran <richardcochran@gmail.com>

^ permalink raw reply

* Re: [PATCH net-next 1/2] tuntap: purge socket error queue on detach
From: Richard Cochran @ 2013-09-04 15:56 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, mst
In-Reply-To: <1378269226-5969-1-git-send-email-jasowang@redhat.com>

On Wed, Sep 04, 2013 at 12:33:45PM +0800, Jason Wang wrote:
> Commit eda297729171fe16bf34fe5b0419dfb69060f623
> (tun: Support software transmit time stamping) will queue skbs into error queue
> when tx stamping is enabled. But it forgets to purge the error queue during
> detach. This patch fixes this.
> 
> Cc: Richard Cochran <richardcochran@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Richard Cochran <richardcochran@gmail.com>

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] random: add prandom_u32_range and prandom_u32_max helpers
From: Joe Perches @ 2013-09-04 15:54 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, linux-kernel, Theodore Ts'o
In-Reply-To: <1378298247-29364-2-git-send-email-dborkman@redhat.com>

On Wed, 2013-09-04 at 14:37 +0200, Daniel Borkmann wrote:
> We have implemented the same function over and over, so introduce
> generic helpers that unify these implementations in order to migrate
> such code to use them. Make the API similarly to randomize_range()
> for consistency. prandom_u32_range() generates numbers in [start, end]
> interval and prandom_u32_max() generates numbers in [0, end] interval.

I think these helpers can in many cases cause
poorer compiler generated object code.

> +/**
> + * prandom_u32_range - return a random number in interval [start, end]
> + * @start: lower interval endpoint
> + * @end: higher interval endpoint
> + *
> + * Returns a number that is in the given interval:
> + *
> + *     [...... <range> .....]
> + *   start                  end
> + *
> + * Callers need to make sure that start <= end. Note that the result
> + * depends on PRNG being well distributed in [0, ~0U] space. Here we
> + * use maximally equidistributed combined Tausworthe generator.
> + */
> +static inline u32 prandom_u32_range(u32 start, u32 end)
> +{
> +	return (u32)(((u64) prandom_u32() * (end + 1 - start)) >> 32) + start;
> +}

This is effectively:

	return (prandom_u32() % (end - start)) + start;

and if start and end are constant, gcc can optimize the
division by constant to a 32 bit multiply/shift/add.

I think if you add __builtin_constant_p tests for start
and end and expand the code a little you can still get
the optimizations done.

^ permalink raw reply

* Re: [PATCH] xen-netback: count number required slots for an skb more carefully
From: Wei Liu @ 2013-09-04 15:44 UTC (permalink / raw)
  To: David Vrabel
  Cc: Wei Liu, xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Ian Campbell, netdev, msw, annie.li
In-Reply-To: <52273D59.2020205@citrix.com>

On Wed, Sep 04, 2013 at 03:02:01PM +0100, David Vrabel wrote:
[...]
> >>
> >> I think I prefer fixing the counting for backporting to stable kernels.
> > 
> > The original patch has coding style change. Sans that contextual change
> > it's not a very long patch.
> 
> The size of the patch isn't the main concern for backport-ability.  It's
> the frontend visible changes and thus any (unexpected) impacts on
> frontends -- this is especially important as only a small fraction of
> frontends in use will be tested with these changes.
> 
> >>  Xi's approach of packing the ring differently is a change in frontend
> >> visible behaviour and seems more risky. e.g., possible performance
> >> impact so I would like to see some performance analysis of that approach.
> >>
> > 
> > With Xi's approach it is more efficient for backend to process. As we
> > now use one less grant copy operation which means we copy the same
> > amount of data with less grant ops.
> 
> It think it uses more grant ops because the copies of the linear
> portion are in chunks that do not cross source page boundaries.
> 
> i.e., in netbk_gop_skb():
> 
> 	data = skb->data;
> 	while (data < skb_tail_pointer(skb)) {
> 		unsigned int offset = offset_in_page(data);
> 		unsigned int len = PAGE_SIZE - offset;
>                 [...]
> 
> It wasn't clear from the patch that this had been considered and that
> any extra space needed in the grant op array was made available.
> 

If I'm not mistaken the grant op array is already enormous. See the
comment in struct xen_netbk for grant_copy_op. The case that a buffer
straddles two slots was taken into consideration long ago -- that's
why you don't see any comment or code change WRT that...

> > From frontend's PoV I think the impact is minimal. Frontend is involved
> > in assembling the packets. It only takes what's in the ring and chain
> > them together. The operation involves copying so far is the
> > __pskb_pull_tail which happens a) in rare case when there's more frags
> > than frontend's MAX_SKB_FRAGS, b) when pull_to > skb_headlen which
> > happens. With Xi's change the rare case a) will even be rarer than
> > before as we use less slots. b) happens the same as it happens before
> > Xi's change, because the pull is guarded by "if (pull_to >
> > skb_headlen(skb))" and Xi's change doesn't affect skb_headlen.
> > 
> > So overall I don't see obvious downside.
> 
> The obvious downside is it doesn't exist (in a form that can be applied
> now), it hasn't been tested and I think there may well be a subtle bug
> that would need a careful review or testing to confirm/deny.
> 

It does exist and apply cleanly on top of net tree. I haven't posted
it yet because we haven't reached concensus which path to take. :-)

The only reason that last version didn't get upstreamed is that the
commit message wasn't clear enough. From the technical PoV it's quite
sound and I believe Amazon has been using it for a long time -- the
older reference dates back to Aug 2012 IIRC. It's just never properly
upstreamed.

> You are free to work on this as a future improvements but I really don't
> see why this critical bug fix needs to be delayed any further.
> 

True. I don't mean to hold off critical fix. Just want to make sure that
every option is presented and considered.

Wei.

> David

^ permalink raw reply

* Re: [PATCH v3 0/3] Send audit/procinfo/cgroup data in socket-level control message
From: Jan Kaluža @ 2013-09-04 15:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Richard Guy Briggs, Eric W. Biederman,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, LKML, netdev-u79uwXL29TY76Z2rM5mHXA,
	eparis-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	lizefan-hv44wF8Li93QT0dZR+AlfA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <1378308621.7360.110.camel@edumazet-glaptop>

On 09/04/2013 05:30 PM, Eric Dumazet wrote:
> On Wed, 2013-09-04 at 11:20 -0400, Richard Guy Briggs wrote:
>> On Wed, Sep 04, 2013 at 10:58:30AM -0400, Richard Guy Briggs wrote:
>>> On Wed, Sep 04, 2013 at 12:42:26AM -0700, Eric W. Biederman wrote:
>>>> Jan Kaluza <jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>>>> this patchset against net-next (applies also to linux-next) adds 3 new types
>>>>> of "Socket"-level control message (SCM_AUDIT, SCM_PROCINFO and SCM_CGROUP).
>>>
>>>> By my count you have overflowed cb in struct sk_buff and are stomping on
>>>> _skb_refdest.
>>>
>>> For patch1/3 I count 56/48, then for patch3 I get 48/48.  Jan, you might
>>> do the conversion to a pointer in patch1/3 to avoid bisect breakage.
>>
>> Wait, that __aligned(8) is for cb[48], not for the contents.
>>
>> For patch1/3 I count 28/48 on 32-bit, 36/48 on 64-bit (or would that be
>> 56 by default on 64-bit arches without aligned specified?), then for
>> patch3 I get 24/48 on 32 and 40/48 on 64 (or again 48/48 by default?).
>
> Do not count, just add this :
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 86de99a..5b61320 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1329,6 +1329,7 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
>   {
>   	int i;
>
> +	BUILD_BUG_ON(sizeof(struct unix_skb_parms) > sizeof(skb->cb));
>   	scm->fp = UNIXCB(skb).fp;
>   	UNIXCB(skb).fp = NULL;
>

That's already in af_unix.c:

BUILD_BUG_ON(sizeof(struct unix_skb_parms) > FIELD_SIZEOF(struct 
sk_buff, cb));

This test is passing for me even with only patch1/3 applied when 
building 64bit kernel.

Jan Kaluza

^ permalink raw reply

* Re: [PATCH v3 0/3] Send audit/procinfo/cgroup data in socket-level control message
From: Eric Dumazet @ 2013-09-04 15:30 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Jan Kaluza, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
	eparis-H+wXaHxf7aLQT0dZR+AlfA, Eric W. Biederman,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <20130904152022.GD28517-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>

On Wed, 2013-09-04 at 11:20 -0400, Richard Guy Briggs wrote:
> On Wed, Sep 04, 2013 at 10:58:30AM -0400, Richard Guy Briggs wrote:
> > On Wed, Sep 04, 2013 at 12:42:26AM -0700, Eric W. Biederman wrote:
> > > Jan Kaluza <jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> > > > this patchset against net-next (applies also to linux-next) adds 3 new types
> > > > of "Socket"-level control message (SCM_AUDIT, SCM_PROCINFO and SCM_CGROUP).
> > 
> > > By my count you have overflowed cb in struct sk_buff and are stomping on
> > > _skb_refdest.
> > 
> > For patch1/3 I count 56/48, then for patch3 I get 48/48.  Jan, you might
> > do the conversion to a pointer in patch1/3 to avoid bisect breakage.
> 
> Wait, that __aligned(8) is for cb[48], not for the contents.
> 
> For patch1/3 I count 28/48 on 32-bit, 36/48 on 64-bit (or would that be
> 56 by default on 64-bit arches without aligned specified?), then for
> patch3 I get 24/48 on 32 and 40/48 on 64 (or again 48/48 by default?).

Do not count, just add this :

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 86de99a..5b61320 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1329,6 +1329,7 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 {
 	int i;
 
+	BUILD_BUG_ON(sizeof(struct unix_skb_parms) > sizeof(skb->cb));
 	scm->fp = UNIXCB(skb).fp;
 	UNIXCB(skb).fp = NULL;

^ permalink raw reply related

* Re: [PATCH v3 0/3] Send audit/procinfo/cgroup data in socket-level control message
From: Richard Guy Briggs @ 2013-09-04 15:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jan Kaluza, davem-fT/PcQaiUtIeIZ0/mPfg9Q, LKML,
	netdev-u79uwXL29TY76Z2rM5mHXA, eparis-H+wXaHxf7aLQT0dZR+AlfA,
	tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <20130904145830.GC28517-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>

On Wed, Sep 04, 2013 at 10:58:30AM -0400, Richard Guy Briggs wrote:
> On Wed, Sep 04, 2013 at 12:42:26AM -0700, Eric W. Biederman wrote:
> > Jan Kaluza <jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> > > this patchset against net-next (applies also to linux-next) adds 3 new types
> > > of "Socket"-level control message (SCM_AUDIT, SCM_PROCINFO and SCM_CGROUP).
> 
> > By my count you have overflowed cb in struct sk_buff and are stomping on
> > _skb_refdest.
> 
> For patch1/3 I count 56/48, then for patch3 I get 48/48.  Jan, you might
> do the conversion to a pointer in patch1/3 to avoid bisect breakage.

Wait, that __aligned(8) is for cb[48], not for the contents.

For patch1/3 I count 28/48 on 32-bit, 36/48 on 64-bit (or would that be
56 by default on 64-bit arches without aligned specified?), then for
patch3 I get 24/48 on 32 and 40/48 on 64 (or again 48/48 by default?).

> > Eric
> 
> - RGB

- RGB

--
Richard Guy Briggs <rbriggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545

^ permalink raw reply

* Re: [PATCH] net: mvneta: properly disable HW PHY polling and ensure adjust_link() works
From: Vincent Donnefort @ 2013-09-04 15:20 UTC (permalink / raw)
  To: Jason Cooper
  Cc: Thomas Petazzoni, Lior Amsalem, Jochen De Smet, Simon Guinot,
	Ryan Press, netdev, stable, Willy Tarreau, Ethan Tuttle,
	Ezequiel Garcia, Chény Yves-Gael, Gregory Clement,
	Peter Sanford, David S. Miller, linux-arm-kernel
In-Reply-To: <20130904145051.GO19598@titan.lakedaemon.net>

On Wed, Sep 04, 2013 at 10:50:51AM -0400, Jason Cooper wrote:
> On Wed, Sep 04, 2013 at 04:21:18PM +0200, Thomas Petazzoni wrote:
> > This commit fixes a long-standing bug that has been reported by many
> > users: on some Armada 370 platforms, only the network interface that
> > has been used in U-Boot to tftp the kernel works properly in
> > Linux. The other network interfaces can see a 'link up', but are
> > unable to transmit data. The reports were generally made on the Armada
> > 370-based Mirabox, but have also been given on the Armada 370-RD
> > board.
> > 
> > The network MAC in the Armada 370/XP (supported by the mvneta driver
> > in Linux) has a functionality that allows it to continuously poll the
> > PHY and directly update the MAC configuration accordingly (speed,
> > duplex, etc.). The very first versions of the driver submitted for
> > review were using this hardware mechanism, but due to this, the driver
> > was not integrated with the kernel phylib. Following reviews, the
> > driver was changed to use the phylib, and therefore a software based
> > polling. In software based polling, Linux regularly talks to the PHY
> > over the MDIO bus, and sees if the link status has changed. If it's
> > the case then the adjust_link() callback of the driver is called to
> > update the MAC configuration accordingly.
> > 
> > However, it turns out that the adjust_link() callback was not
> > configuring the hardware in a completely correct way: while it was
> > setting the speed and duplex bits correctly, it wasn't telling the
> > hardware to actually take into account those bits rather than what the
> > hardware-based PHY polling mechanism has concluded. So, in fact the
> > adjust_link() callback was basically a no-op.
> > 
> > However, the network happened to be working because on the network
> > interfaces used by U-Boot for tftp on Armada 370 platforms because the
> > hardware PHY polling was enabled by the bootloader, and left enabled
> > by Linux. However, the second network interface not used for tftp (or
> > both network interfaces if the kernel is loaded from USB, NAND or SD
> > card) didn't had the hardware PHY polling enabled.
> > 
> > This patch fixes this situation by:
> > 
> >  (1) Making sure that the hardware PHY polling is disabled by clearing
> >      the MVNETA_PHY_POLLING_ENABLE bit in the MVNETA_UNIT_CONTROL
> >      register in the driver ->probe() function.
> > 
> >  (2) Making sure that the duplex and speed selections made by the
> >      adjust_link() callback are taken into account by clearing the
> >      MVNETA_GMAC_AN_SPEED_EN and MVNETA_GMAC_AN_DUPLEX_EN bits in the
> >      MVNETA_GMAC_AUTONEG_CONFIG register.
> > 
> > This patch has been tested on Armada 370 Mirabox, and now both network
> > interfaces are usable after boot.
> > 
> > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> > Cc: Willy Tarreau <w@1wt.eu>
> > Cc: Jochen De Smet <jochen.armkernel@leahnim.org>
> > Cc: Peter Sanford <psanford@nearbuy.io>
> > Cc: Ethan Tuttle <ethan@ethantuttle.com>
> > Cc: Chény Yves-Gael <yves@cheny.fr>
> > Cc: Ryan Press <ryan@presslab.us>
> > Cc: Simon Guinot <simon.guinot@sequanux.org>
> > Cc: vdonnefort@lacie.com
> > Cc: stable@vger.kernel.org
> > ---
> > David, this patch is a fix for a problem that has been here since 3.8
> > (when the mvneta driver was introduced), so I've Cc'ed stable@ and if
> > possible I'd like to patch to be included for 3.12.
> 
> David,
> 
> Offending patch is:
> 
>   c5aff18 net: mvneta: driver for Marvell Armada 370/XP network unit
> 
> Applies and builds cleanly against v3.8.13, v3.9.11, v3.10.10, and v3.11
> 
> Acked-by: Jason Cooper <jason@lakedaemon.net>
> 
> thx,
> 
> Jason.

Works with the armada370-rd board.

Tested-by: Vincent Donnefort <vdonnefort@gmail.com>

Thank you!

Vincent.

> 
> > ---
> >  drivers/net/ethernet/marvell/mvneta.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > index b017818..90ab292 100644
> > --- a/drivers/net/ethernet/marvell/mvneta.c
> > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > @@ -138,7 +138,9 @@
> >  #define      MVNETA_GMAC_FORCE_LINK_PASS         BIT(1)
> >  #define      MVNETA_GMAC_CONFIG_MII_SPEED        BIT(5)
> >  #define      MVNETA_GMAC_CONFIG_GMII_SPEED       BIT(6)
> > +#define      MVNETA_GMAC_AN_SPEED_EN             BIT(7)
> >  #define      MVNETA_GMAC_CONFIG_FULL_DUPLEX      BIT(12)
> > +#define      MVNETA_GMAC_AN_DUPLEX_EN            BIT(13)
> >  #define MVNETA_MIB_COUNTERS_BASE                 0x3080
> >  #define      MVNETA_MIB_LATE_COLLISION           0x7c
> >  #define MVNETA_DA_FILT_SPEC_MCAST                0x3400
> > @@ -915,6 +917,13 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
> >  	/* Assign port SDMA configuration */
> >  	mvreg_write(pp, MVNETA_SDMA_CONFIG, val);
> >  
> > +	/* Disable PHY polling in hardware, since we're using the
> > +	 * kernel phylib to do this.
> > +	 */
> > +	val = mvreg_read(pp, MVNETA_UNIT_CONTROL);
> > +	val &= ~MVNETA_PHY_POLLING_ENABLE;
> > +	mvreg_write(pp, MVNETA_UNIT_CONTROL, val);
> > +
> >  	mvneta_set_ucast_table(pp, -1);
> >  	mvneta_set_special_mcast_table(pp, -1);
> >  	mvneta_set_other_mcast_table(pp, -1);
> > @@ -2307,7 +2316,9 @@ static void mvneta_adjust_link(struct net_device *ndev)
> >  			val = mvreg_read(pp, MVNETA_GMAC_AUTONEG_CONFIG);
> >  			val &= ~(MVNETA_GMAC_CONFIG_MII_SPEED |
> >  				 MVNETA_GMAC_CONFIG_GMII_SPEED |
> > -				 MVNETA_GMAC_CONFIG_FULL_DUPLEX);
> > +				 MVNETA_GMAC_CONFIG_FULL_DUPLEX |
> > +				 MVNETA_GMAC_AN_SPEED_EN |
> > +				 MVNETA_GMAC_AN_DUPLEX_EN);
> >  
> >  			if (phydev->duplex)
> >  				val |= MVNETA_GMAC_CONFIG_FULL_DUPLEX;
> > -- 
> > 1.8.1.2
> > 
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
Vincent


LaCie will welcome you at IBC Amsterdam (13-17 Sept) on booth 7.G17 (Hall7). Come and visit us.

^ permalink raw reply

* Re: [PATCH net-next v2 5/6] bonding: restructure and add rcu for bond_for_each_slave_next()
From: Ding Tianhong @ 2013-09-04 15:02 UTC (permalink / raw)
  To: Veaceslav Falico
  Cc: Ding Tianhong, Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <20130904103540.GP1992@redhat.com>

于 2013/9/4 18:35, Veaceslav Falico 写道:
> On Wed, Sep 04, 2013 at 05:44:15PM +0800, Ding Tianhong wrote:
> ...snip...
>> +/* Check whether the slave is the only one in bond */
>> +#define bond_is_only_slave(bond, pos) \
>> + (((pos)->list.prev == &(bond)->slave_list) && \
>> + ((pos)->list.next == &(bond)->slave_list))
>
> Could be done without pos at all -
>
> !list_empty(&(bond)->slave_list) && \
> &(bond)->slave_list.next == &(bond)->slave_list.prev
>
> If we have only one slave and pos is NOT our slave then... well.. we have
> big troubles.
>
yes, more simple more beautiful, thanks.

but if the pos is not our slave, it is the mistake, not bug. :)

>> +
>> /**
>> * bond_for_each_slave_from - iterate the slaves list from a starting 
>> point
>> * @bond: the bond holding this list.
>> * @pos: current slave.
>> - * @cnt: counter for max number of moves
>> * @start: starting point.
>> *
>> * Caller must hold bond->lock
>> */
>> -#define bond_for_each_slave_from(bond, pos, cnt, start) \
>> - for (cnt = 0, pos = start; pos && cnt < (bond)->slave_cnt; \
>> - cnt++, pos = bond_next_slave(bond, pos))
>> +#define bond_for_each_slave_from(bond, pos, start) \
>> + for (pos = start; pos && (bond_is_only_slave(bond, start) ? \
>> + &pos->list != &bond->slave_list : \
>> + &pos->list != &start->list); bond_is_only_slave(bond, start) ? \
>> + (pos = list_entry(pos->list.next, typeof(*pos), list)) : \
>> + (pos = bond_next_slave(bond, pos)))
>
> Did you check that?
>
> pos = slave1 (bond has more than one slave);
> pos && &pos->list != &slave1->list - false.
>
> We won't ever enter this loop if we have >1 slaves.
>
> I don't understand this at all.
>

ok, the logic is : if slaves == 1, run once for the slave.
if slaves > 1, run loops until reach the end list.
I could not get a better way to simplify the function, I think it is a 
suitable scheme.

by the way, I test the function and works well. :)

>> +
>> +/**
>> + * bond_for_each_slave_from_rcu - iterate the slaves list from a 
>> starting point
>> + * @bond: the bond holding this list.
>> + * @pos: current slave.
>> + * @start: starting point.
>> + *
>> + * Caller must hold rcu_read_lock
>> + */
>> +#define bond_for_each_slave_from_rcu(bond, pos, start) \
>> + for (pos = start; pos && (bond_is_only_slave(bond, start) ? \
>> + &pos->list != &bond->slave_list : \
>> + &pos->list != &start->list); bond_is_only_slave(bond, start) ? \
>> + (pos = list_entry_rcu(pos->list.next, typeof(*pos), list)) : \
>> + (pos = bond_next_slave_rcu(bond, pos)))
>
> Ditto as bond_for_each_slave_from() and, also, see my comment about RCU
> from patch 1.
>
>>
>> /**
>> * bond_for_each_slave - iterate over all slaves
>> -- 
>> 1.8.2.1
>>
>>
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* [patch v2] sfc: check for allocation failure
From: Dan Carpenter @ 2013-09-04 15:07 UTC (permalink / raw)
  To: Solarflare linux maintainers; +Cc: Ben Hutchings, netdev, kernel-janitors
In-Reply-To: <20130903.224939.866623192858217581.davem@davemloft.net>

It upsets static analyzers when we don't check for allocation failure.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
v2: rebased on latest linux-next

diff --git a/drivers/net/ethernet/sfc/falcon.c b/drivers/net/ethernet/sfc/falcon.c
index 38d179c..75799f8 100644
--- a/drivers/net/ethernet/sfc/falcon.c
+++ b/drivers/net/ethernet/sfc/falcon.c
@@ -894,6 +894,8 @@ static int falcon_mtd_probe(struct efx_nic *efx)
 
 	/* Allocate space for maximum number of partitions */
 	parts = kcalloc(2, sizeof(*parts), GFP_KERNEL);
+	if (!parts)
+		return -ENOMEM;
 	n_parts = 0;
 
 	spi = &nic_data->spi_flash;

^ permalink raw reply related

* Re: [PATCH v3 0/3] Send audit/procinfo/cgroup data in socket-level control message
From: Jan Kaluža @ 2013-09-04 15:04 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Eric W. Biederman, davem-fT/PcQaiUtIeIZ0/mPfg9Q, LKML,
	netdev-u79uwXL29TY76Z2rM5mHXA, eparis-H+wXaHxf7aLQT0dZR+AlfA,
	tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <20130904145830.GC28517-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>

On 09/04/2013 04:58 PM, Richard Guy Briggs wrote:
> On Wed, Sep 04, 2013 at 12:42:26AM -0700, Eric W. Biederman wrote:
>> Jan Kaluza <jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>> Hi,
>>>
>>> this patchset against net-next (applies also to linux-next) adds 3 new types
>>> of "Socket"-level control message (SCM_AUDIT, SCM_PROCINFO and SCM_CGROUP).
>>>
>>> Server-like processes in many cases need credentials and other
>>> metadata of the peer, to decide if the calling process is allowed to
>>> request a specific action, or the server just wants to log away this
>>> type of information for auditing tasks.
>>>
>>> The current practice to retrieve such process metadata is to look that
>>> information up in procfs with the $PID received over SCM_CREDENTIALS.
>>> This is sufficient for long-running tasks, but introduces a race which
>>> cannot be worked around for short-living processes; the calling
>>> process and all the information in /proc/$PID/ is gone before the
>>> receiver of the socket message can look it up.
>>
>>> Changes introduced in this patchset can also increase performance
>>> of such server-like processes, because current way of opening and
>>> parsing /proc/$PID/* files is much more expensive than receiving these
>>> metadata using SCM.
>>
>> Can I just say ick, blech, barf, gag.
>
> /me hands ebiederman an air sickness bag.
>
>> You don't require this information to be passed.  You are asking people
>> to suport a lot of new code for the forseeable future.  The only advantage
>> appears to be for short lived racy processes that don't even bother to
>> make certain their message was acknowleged before exiting.
>>
>> You sent this during the merge window which is the time for code
>> integration and testing not new code.
>
> This is an RFC.  How is this important?
>
>> By my count you have overflowed cb in struct sk_buff and are stomping on
>> _skb_refdest.
>
> For patch1/3 I count 56/48, then for patch3 I get 48/48.  Jan, you might
> do the conversion to a pointer in patch1/3 to avoid bisect breakage.

Yes, this is valid point. I will do the conversion in patch1. Thanks all 
for reviewing and pointing that out.

Jan Kaluza

>> If you are going to go crazy and pass things is there a reason you do
>> not add a patch to pass the bsd SCM_CREDS?  That information seems more
>> relevant in a security context and for making security decisions than
>> about half the information you are passing.
>>
>> Eric
>
> - RGB
>
> --
> Richard Guy Briggs <rbriggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Senior Software Engineer
> Kernel Security
> AMER ENG Base Operating Systems
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635
> Internal: (81) 32635
> Alt: +1.613.693.0684x3545
>

^ permalink raw reply

* Re: [PATCH net-next v2 1/6] bonding: simplify and use RCU protection for 3ad xmit path
From: Ding Tianhong @ 2013-09-04 14:53 UTC (permalink / raw)
  To: Veaceslav Falico
  Cc: Ding Tianhong, Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <20130904101823.GO1992@redhat.com>

于 2013/9/4 18:18, Veaceslav Falico 写道:
> On Wed, Sep 04, 2013 at 05:43:45PM +0800, Ding Tianhong wrote:
> ...snip...
>> +/**
>> + * IMPORTANT: bond_first/last_slave_rcu can return NULL in case of 
>> an empty list
>> + * Caller must hold rcu_read_lock
>> + */
>> +#define bond_first_slave_rcu(bond) \
>> + list_first_or_null_rcu(&(bond)->slave_list, struct slave, list)
>> +#define bond_last_slave_rcu(bond) \
>> + (list_empty(&(bond)->slave_list) ? NULL : \
>> + bond_to_slave_rcu((bond)->slave_list.prev))
>
> Here, bond_last_slave_rcu() is racy. The list can be non-empty when
> list_empty() is verified, however afterwards it might become empty, when
> you call bond_to_slave_rcu(), and thus you'll get
> bond_to_slave(bond->slave_list) in the result, which is not a slave.
>
> Take a look at list_first_or_null_rcu() for a reference. The main idea is
> that it first gets the ->next pointer, with RCU protection, and then
> verifies if it's the list head or not, and if not - it gets the container
> already. This way the ->next pointer won't get away.
>
> These kind of bugs are really rare, but are *EXTREMELY* hard to debug.
Thanks for your response and opinions, but I think your miss something,
the slave_list will not changed in the rcu_read_lock, so ,the bugs will not
happen.

>
>> +
>> #define bond_is_first_slave(bond, pos) ((pos)->list.prev == 
>> &(bond)->slave_list)
>> #define bond_is_last_slave(bond, pos) ((pos)->list.next == 
>> &(bond)->slave_list)
>>
>> @@ -93,6 +106,15 @@
>> (bond_is_first_slave(bond, pos) ? bond_last_slave(bond) : \
>> bond_to_slave((pos)->list.prev))
>>
>> +/* Since bond_first/last_slave_rcu can return NULL, these can return 
>> NULL too */
>> +#define bond_next_slave_rcu(bond, pos) \
>> + (bond_is_last_slave(bond, pos) ? bond_first_slave_rcu(bond) : \
>> + bond_to_slave_rcu((pos)->list.next))
>> +
>> +#define bond_prev_slave_rcu(bond, pos) \
>> + (bond_is_first_slave(bond, pos) ? bond_last_slave_rcu(bond) : \
>> + bond_to_slave_rcu((pos)->list.prev))
>> +
>
> These two are also racy. bond_is_last/first_slave() is not rcu-ified, and
> thus you can't rely on it without proper locking. Same ideas apply as per
> bond_first_slave_rcu().
> -- 
refer to the above answer.

> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH v3 0/3] Send audit/procinfo/cgroup data in socket-level control message
From: Richard Guy Briggs @ 2013-09-04 14:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jan Kaluza, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
	eparis-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, tj-DgEjT+Ai2ygdnm+yROfE0A,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <878uzdf2xp.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

On Wed, Sep 04, 2013 at 12:42:26AM -0700, Eric W. Biederman wrote:
> Jan Kaluza <jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> > Hi,
> >
> > this patchset against net-next (applies also to linux-next) adds 3 new types
> > of "Socket"-level control message (SCM_AUDIT, SCM_PROCINFO and SCM_CGROUP).
> >
> > Server-like processes in many cases need credentials and other
> > metadata of the peer, to decide if the calling process is allowed to
> > request a specific action, or the server just wants to log away this
> > type of information for auditing tasks.
> >
> > The current practice to retrieve such process metadata is to look that
> > information up in procfs with the $PID received over SCM_CREDENTIALS.
> > This is sufficient for long-running tasks, but introduces a race which
> > cannot be worked around for short-living processes; the calling
> > process and all the information in /proc/$PID/ is gone before the
> > receiver of the socket message can look it up.
> 
> > Changes introduced in this patchset can also increase performance
> > of such server-like processes, because current way of opening and
> > parsing /proc/$PID/* files is much more expensive than receiving these
> > metadata using SCM.
> 
> Can I just say ick, blech, barf, gag.

/me hands ebiederman an air sickness bag.

> You don't require this information to be passed.  You are asking people
> to suport a lot of new code for the forseeable future.  The only advantage
> appears to be for short lived racy processes that don't even bother to
> make certain their message was acknowleged before exiting.
> 
> You sent this during the merge window which is the time for code
> integration and testing not new code.

This is an RFC.  How is this important?

> By my count you have overflowed cb in struct sk_buff and are stomping on
> _skb_refdest.

For patch1/3 I count 56/48, then for patch3 I get 48/48.  Jan, you might
do the conversion to a pointer in patch1/3 to avoid bisect breakage.

> If you are going to go crazy and pass things is there a reason you do
> not add a patch to pass the bsd SCM_CREDS?  That information seems more
> relevant in a security context and for making security decisions than
> about half the information you are passing.
> 
> Eric

- RGB

--
Richard Guy Briggs <rbriggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox