* Re: [PATCH] net/fec: carrier off initially to avoid root mount failure
From: Greg Ungerer @ 2010-10-11 12:38 UTC (permalink / raw)
To: Oskar Schirmer; +Cc: David Miller, netdev, dan, bigeasy, hjk, gerg, bhutchings
In-Reply-To: <20101011075420.GA17320@www.tglx.de>
On 11/10/10 17:54, Oskar Schirmer wrote:
> On Sun, Oct 10, 2010 at 21:19:56 -0700, David Miller wrote:
>> From: Oskar Schirmer<oskar@linutronix.de>
>> Date: Thu, 7 Oct 2010 14:30:30 +0200
>>
>>> with hardware slow in negotiation, the system did freeze
>>> while trying to mount root on nfs at boot time.
>>>
>>> the link state has not been initialised so network stack
>>> tried to start transmission right away. this caused instant
>>> retries, as the driver solely stated business upon link down,
>>> rendering the system unusable.
>>>
>>> notify carrier off initially to prevent transmission until
>>> phylib will report link up.
>>>
>>> Signed-off-by: Oskar Schirmer<oskar@linutronix.de>
>>
>> I did some more investigation into this situation, and for now I'm
>> going to apply your patch. It is correct, and it also matches what
>> the only other seemingly correct driver I could find using phylib does
>> (gianfar) :-) Actually, although I didn't check, bi-modal drivers
>> (those that only use phylib for some phy types) like tg3 probably do
>> the right thing here too.
>>
>> Longer term I think the right thing to do might be:
>>
>> 1) Create some notion of "network device has managed carrier"
>>
>> This could simply be a flag bit in the netdev or netdev_ops,
>> or some other kind of attribute.
>>
>> 2) Managed carrier devices start with netif_carrier_off(), otherwise
>> the device starts with netif_carrier_on().
>
> This last conditional (managed vs otherwise) would be implicit
> with a null PHY driver as Ben Hutchings proposes it to Greg Ungerers
> "allow FEC driver to not have attached PHY", 2010/10/07,
> with the null PHY simply switching to netif_carrier_on right after
> machine start.
>
> Otherwise my patch would need another #ifdef to live in
> peace with Gregs patch.
You can ignore my patch for now. I am reworking the it to
use fixed phy. It will look quite different when it is done.
Regards
Greg
------------------------------------------------------------------------
Greg Ungerer -- Principal Engineer EMAIL: gerg@snapgear.com
SnapGear Group, McAfee PHONE: +61 7 3435 2888
8 Gardner Close, FAX: +61 7 3891 3630
Milton, QLD, 4064, Australia WEB: http://www.SnapGear.com
^ permalink raw reply
* [PATCH net-next] ixgbe: fix stats handling
From: Eric Dumazet @ 2010-10-11 12:17 UTC (permalink / raw)
To: David Miller, Peter Waskiewicz, Emil Tantilov, Jeff Kirsher; +Cc: netdev
Hi
I am sending this patch for Intel people review/test and acknowledge.
Thanks !
[PATCH net-next] ixgbe: fix stats handling
Current ixgbe stats have following problems :
- Not 64 bit safe (on 32bit arches)
- Not safe in ixgbe_clean_rx_irq() :
All cpus dirty a common location (netdev->stats.rx_bytes &
netdev->stats.rx_packets) without proper synchronization.
This slow down a bit multiqueue operations, and possibly miss some
updates.
Fixes :
Implement ndo_get_stats64() method to provide accurate 64bit rx|tx
bytes/packets counters, using 64bit safe infrastructure.
ixgbe_get_ethtool_stats() also use this infrastructure to provide 64bit
safe counters.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
CC: Emil Tantilov <emil.s.tantilov@intel.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ixgbe/ixgbe.h | 3 +-
drivers/net/ixgbe/ixgbe_ethtool.c | 29 +++++++++++---------
drivers/net/ixgbe/ixgbe_main.c | 40 +++++++++++++++++++++++++---
3 files changed, 56 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
index a8c47b0..944d9e2 100644
--- a/drivers/net/ixgbe/ixgbe.h
+++ b/drivers/net/ixgbe/ixgbe.h
@@ -180,8 +180,9 @@ struct ixgbe_ring {
*/
struct ixgbe_queue_stats stats;
- unsigned long reinit_state;
+ struct u64_stats_sync syncp;
int numa_node;
+ unsigned long reinit_state;
u64 rsc_count; /* stat for coalesced packets */
u64 rsc_flush; /* stats for flushed packets */
u32 restart_queue; /* track tx queue restarts */
diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c
index d4ac943..3c7f15d 100644
--- a/drivers/net/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ixgbe/ixgbe_ethtool.c
@@ -999,12 +999,11 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
struct ethtool_stats *stats, u64 *data)
{
struct ixgbe_adapter *adapter = netdev_priv(netdev);
- u64 *queue_stat;
- int stat_count = sizeof(struct ixgbe_queue_stats) / sizeof(u64);
struct rtnl_link_stats64 temp;
const struct rtnl_link_stats64 *net_stats;
- int j, k;
- int i;
+ unsigned int start;
+ struct ixgbe_ring *ring;
+ int i, j;
char *p = NULL;
ixgbe_update_stats(adapter);
@@ -1025,16 +1024,22 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
sizeof(u64)) ? *(u64 *)p : *(u32 *)p;
}
for (j = 0; j < adapter->num_tx_queues; j++) {
- queue_stat = (u64 *)&adapter->tx_ring[j]->stats;
- for (k = 0; k < stat_count; k++)
- data[i + k] = queue_stat[k];
- i += k;
+ ring = adapter->tx_ring[j];
+ do {
+ start = u64_stats_fetch_begin_bh(&ring->syncp);
+ data[i] = ring->stats.packets;
+ data[i+1] = ring->stats.bytes;
+ } while (u64_stats_fetch_retry_bh(&ring->syncp, start));
+ i += 2;
}
for (j = 0; j < adapter->num_rx_queues; j++) {
- queue_stat = (u64 *)&adapter->rx_ring[j]->stats;
- for (k = 0; k < stat_count; k++)
- data[i + k] = queue_stat[k];
- i += k;
+ ring = adapter->rx_ring[j];
+ do {
+ start = u64_stats_fetch_begin_bh(&ring->syncp);
+ data[i] = ring->stats.packets;
+ data[i+1] = ring->stats.bytes;
+ } while (u64_stats_fetch_retry_bh(&ring->syncp, start));
+ i += 2;
}
if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
for (j = 0; j < MAX_TX_PACKET_BUFFERS; j++) {
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 95dbf60..1efbcde 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -824,8 +824,10 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
tx_ring->total_bytes += total_bytes;
tx_ring->total_packets += total_packets;
+ u64_stats_update_begin(&tx_ring->syncp);
tx_ring->stats.packets += total_packets;
tx_ring->stats.bytes += total_bytes;
+ u64_stats_update_end(&tx_ring->syncp);
return count < tx_ring->work_limit;
}
@@ -1172,7 +1174,6 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
int *work_done, int work_to_do)
{
struct ixgbe_adapter *adapter = q_vector->adapter;
- struct net_device *netdev = adapter->netdev;
struct pci_dev *pdev = adapter->pdev;
union ixgbe_adv_rx_desc *rx_desc, *next_rxd;
struct ixgbe_rx_buffer *rx_buffer_info, *next_buffer;
@@ -1298,8 +1299,10 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
rx_ring->rsc_count++;
rx_ring->rsc_flush++;
}
+ u64_stats_update_begin(&rx_ring->syncp);
rx_ring->stats.packets++;
rx_ring->stats.bytes += skb->len;
+ u64_stats_update_end(&rx_ring->syncp);
} else {
if (rx_ring->flags & IXGBE_RING_RX_PS_ENABLED) {
rx_buffer_info->skb = next_buffer->skb;
@@ -1375,8 +1378,6 @@ next_desc:
rx_ring->total_packets += total_rx_packets;
rx_ring->total_bytes += total_rx_bytes;
- netdev->stats.rx_bytes += total_rx_bytes;
- netdev->stats.rx_packets += total_rx_packets;
return cleaned;
}
@@ -6559,6 +6560,38 @@ static void ixgbe_netpoll(struct net_device *netdev)
}
#endif
+static struct rtnl_link_stats64 *ixgbe_get_stats64(struct net_device *netdev,
+ struct rtnl_link_stats64 *stats)
+{
+ struct ixgbe_adapter *adapter = netdev_priv(netdev);
+ int i;
+
+ /* accurate rx/tx bytes/packets stats */
+ dev_txq_stats_fold(netdev, stats);
+ for (i = 0; i < adapter->num_rx_queues; i++) {
+ struct ixgbe_ring *ring = adapter->rx_ring[i];
+ u64 bytes, packets;
+ unsigned int start;
+
+ do {
+ start = u64_stats_fetch_begin_bh(&ring->syncp);
+ packets = ring->stats.packets;
+ bytes = ring->stats.bytes;
+ } while (u64_stats_fetch_retry_bh(&ring->syncp, start));
+ stats->rx_packets += packets;
+ stats->rx_bytes += bytes;
+ }
+
+ /* following stats updated by ixgbe_watchdog_task() */
+ stats->multicast = netdev->stats.multicast;
+ stats->rx_errors = netdev->stats.rx_errors;
+ stats->rx_length_errors = netdev->stats.rx_length_errors;
+ stats->rx_crc_errors = netdev->stats.rx_crc_errors;
+ stats->rx_missed_errors = netdev->stats.rx_missed_errors;
+ return stats;
+}
+
+
static const struct net_device_ops ixgbe_netdev_ops = {
.ndo_open = ixgbe_open,
.ndo_stop = ixgbe_close,
@@ -6578,6 +6611,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
.ndo_set_vf_vlan = ixgbe_ndo_set_vf_vlan,
.ndo_set_vf_tx_rate = ixgbe_ndo_set_vf_bw,
.ndo_get_vf_config = ixgbe_ndo_get_vf_config,
+ .ndo_get_stats64 = ixgbe_get_stats64,
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = ixgbe_netpoll,
#endif
^ permalink raw reply related
* Re: powerpc, fs_enet: scanning PHY after Linux is up
From: Holger brunck @ 2010-10-11 11:49 UTC (permalink / raw)
To: Grant Likely
Cc: linuxppc-dev-mnsaURCQ41sdnm+yROfE0A,
devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, hs-ynQEQJNshbs,
Detlev Zundel, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20101008170615.GC3863-MrY2KI0G/OVr83L8+7iqerDks+cytr/Z@public.gmane.org>
Hi Grant,
On 10/08/2010 07:06 PM, Grant Likely wrote:
> On Fri, Oct 08, 2010 at 10:50:50AM +0200, Holger brunck wrote:
>>> On Wed, Oct 6, 2010 at 3:53 AM, Heiko Schocher <hs-ynQEQJNshbs@public.gmane.org> wrote:
>>
>>>> Wouldn;t it be good, just if we need a PHY (on calling fs_enet_open)
>>>> to look if there is one?
>>>>
>>>> Something like that (not tested):
>>>>
>>>> in drivers/net/fs_enet/fs_enet-main.c in fs_init_phy()
>>>> called from fs_enet_open():
>>>>
>>>> Do first:
>>>> phydev = of_phy_find_device(fep->fpi->phy_node);
>>>>
>>>> Look if there is a driver (phy_dev->drv == NULL ?)
>>>>
>>>> If not, call new function
>>>> of_mdiobus_register_phy(mii_bus, fep->fpi->phy_node)
>>>> see below patch for it.
>>>>
>>>> If this succeeds, all is OK, and we can use this phy,
>>>> else ethernet not work.
>>>
>>> I don't like this approach because it muddies the concept of which
>>> device is actually responsible for managing the phys on the bus. Is
>>> it managed by the mdio bus device or the Ethernet device? It also has
>>> a potential race condition. Whereas triggering a late driver bind
>>> will be safe.
>>>
>>> Alternately, I'd also be okay with a common method to trigger a
>>> reprobe of a particular phy from userspace, but I fear that would be a
>>> significantly more complex solution.
>>>
>>>>
>>>> !!just no idea, how to get mii_bus pointer ...
>>>
>>> You'd have to get the parent of the phy node, and then loop over all
>>> the registered mdio busses looking for a bus that uses that node.
>>>
>>
>> you say that you don't like the approach to probe the phy again in fs_enet_open,
>> but currently I don't understand what would be the alternate trigger point to
>> rescan the mdio bus?
>
> Same trigger point, but different operation. At fs_enet_open time,
> instead of registering the phy_device, the phy layer could sanity
> check the already registered phy_device, and refuse to connect to it
> if the phy isn't responding. If it is responding, then it could
> re-attempt binding a phy_driver to it (although I just realized that
> this has other problems, such as correct module loading. See below)
>
ok.
>> I made a first patch to enhance the phy_device structure and rescan the mdio bus
>> at time of fs_enet_open (because I didn't see a better trigger point). The
>> advantage is that we got the mii_bus pointer and the phy addr stored in the
>> already created phy device structure and is therefore easy to use. See the patch
>> below for this modifications. Whats currently missing in the patch is to set the
>> phy_id if the phy was scanned later after phy_device creation. For the mgcoge
>> board it seems to solve our problem, but maybe I miss something important.
>>
>> Best regards
>> Holger Brunck
>>
>> diff --git a/drivers/net/fs_enet/fs_enet-main.c b/drivers/net/fs_enet/fs_enet-main.c
>> index ec2f503..6bc117f 100644
>> --- a/drivers/net/fs_enet/fs_enet-main.c
>> +++ b/drivers/net/fs_enet/fs_enet-main.c
>> @@ -775,7 +774,8 @@ static int fs_enet_open(struct net_device *dev)
>> {
>> struct fs_enet_private *fep = netdev_priv(dev);
>> int r;
>> - int err;
>> + int err = 0;
>> + u32 phy_id = 0;
>>
>> /* to initialize the fep->cur_rx,... */
>> /* not doing this, will cause a crash in fs_enet_rx_napi */
>> @@ -795,13 +795,23 @@ static int fs_enet_open(struct net_device *dev)
>> return -EINVAL;
>> }
>>
>> - err = fs_init_phy(dev);
>> - if (err) {
>> + if (fep->phydev == NULL)
>> + err = fs_init_phy(dev);
>> +
>> + if (!err && (fep->phydev->available == false))
>> + r = get_phy_id(fep->phydev->bus, fep->phydev->addr, &phy_id);
>> +
>> + if (err || (phy_id == 0xffffffff)) {
>> free_irq(fep->interrupt, dev);
>> if (fep->fpi->use_napi)
>> napi_disable(&fep->napi);
>> - return err;
>> + if (err)
>> + return err;
>> + else
>> + return -EINVAL;
>> }
>> + else
>> + fep->phydev->available = true;
>> phy_start(fep->phydev);
>>
>> netif_start_queue(dev);
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index adbc0fd..1f443cb 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -173,6 +173,10 @@ struct phy_device* phy_device_create(struct mii_bus *bus,
>> int addr, int phy_id)
>> dev->dev.bus = &mdio_bus_type;
>> dev->irq = bus->irq != NULL ? bus->irq[addr] : PHY_POLL;
>> dev_set_name(&dev->dev, PHY_ID_FMT, bus->id, addr);
>> + if (phy_id == 0xffffffff)
>> + dev->available = false;
>> + else
>> + dev->available = true;
>
> This flag shouldn't be necessary. Just check whether or not
> phy_device->phy_id is sane at phy_attach_direct() time. If it is
> mostly f's, then don't attach.
>
Yes, indeed it is unneeded. Thanks for pointing out.
>>
>> dev->state = PHY_DOWN;
>>
>> @@ -232,13 +236,11 @@ struct phy_device * get_phy_device(struct mii_bus *bus,
>> int addr)
>> int r;
>>
>> r = get_phy_id(bus, addr, &phy_id);
>> - if (r)
>> - return ERR_PTR(r);
>>
>> /* If the phy_id is mostly Fs, there is no device there */
>> - if ((phy_id & 0x1fffffff) == 0x1fffffff)
>> - return NULL;
>> -
>> + if (((phy_id & 0x1fffffff) == 0x1fffffff) || r)
>> + phy_id = 0xffffffff;
>> + /* create phy even if the phy is currently not available */
>> dev = phy_device_create(bus, addr, phy_id);
>
> Cannot do it this way because many phylib users probe the bus for phys
> instead of the explicit creation used with the device tree. There
> needs to be a method to explicitly skip this test when creating a phy;
> possibly by having the device tree code call phy_device_create()
> directly.
>
Ah ok, every phy_device_create() call from of_mdiobus_register should skip this
test, because if a phy is described in the dts it is present (sooner or later)
and if phy_device_create is called from somewhere else I do this test as usual.
I adapted my patch accordingly.
> Hmmm.... I see another problem. Deferred probing of the phy will
> potentially cause problems with module loading. If the binding is
> deferred to phy connect time; then the phy driver may not have time to
> get loaded before the phy layer decides there is no driver and binds
> it to the generic one. Blech.
>
> Okay, so it seems like a method of explicitly triggering a phy_device
> rebind from userspace is necessary. This could be done with a
> per-phy_device sysfs file I suppose. Just an empty file that when
> read triggers a re-read of the phy id registers, and retries binding a
> driver, including the request_module call in phy_device_create().
>
Okay I suspected that there is not an easy solution for our problem. Another
solution comes in my mind. If we defer the call to fs_enet_probe at startup. So
enhance the dts entry with something like an hotplug indication and then trigger
via an sysfs entry the call to fs_enet_probe if the phy is up... Other hotplug
devices should have similar problems...
However for our mgcoge board we can live with the fact that we can't load/unload
the driver dynamically. So I think we will go with this modified "out of tree"
patch for our board. Thanks for your support.
Best regards
Holger Brunck
^ permalink raw reply
* Re: [PATCH 1/3] NET: pch, fix use after free
From: Eric Dumazet @ 2010-10-11 9:38 UTC (permalink / raw)
To: Jiri Slaby; +Cc: davem, netdev, linux-kernel, jirislaby, Masayuki Ohtake
In-Reply-To: <1286789218-13976-1-git-send-email-jslaby@suse.cz>
Le lundi 11 octobre 2010 à 11:26 +0200, Jiri Slaby a écrit :
> Stanse found that pch_gbe_xmit_frame uses skb after it is freed. Fix
> that.
>
> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
> Cc: Masayuki Ohtake <masa-korg@dsn.okisemi.com>
> ---
> drivers/net/pch_gbe/pch_gbe_main.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
Applicable to net-next-2.6 only, this driver is not yet in Linus tree
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
^ permalink raw reply
* [PATCH 3/3] NET: wimax, fix use after free
From: Jiri Slaby @ 2010-10-11 9:26 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-kernel, jirislaby, Inaky Perez-Gonzalez,
linux-wimax
In-Reply-To: <1286789218-13976-1-git-send-email-jslaby@suse.cz>
Stanse found that i2400m_rx frees skb, but still uses skb->len even
though it has skb_len defined. So use skb_len properly in the code.
And also define it unsinged int rather than size_t to solve
compilation warnings.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Cc: linux-wimax@intel.com
---
drivers/net/wimax/i2400m/rx.c | 26 +++++++++++++-------------
1 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/net/wimax/i2400m/rx.c b/drivers/net/wimax/i2400m/rx.c
index c4876d0..844133b 100644
--- a/drivers/net/wimax/i2400m/rx.c
+++ b/drivers/net/wimax/i2400m/rx.c
@@ -1244,16 +1244,16 @@ int i2400m_rx(struct i2400m *i2400m, struct sk_buff *skb)
int i, result;
struct device *dev = i2400m_dev(i2400m);
const struct i2400m_msg_hdr *msg_hdr;
- size_t pl_itr, pl_size, skb_len;
+ size_t pl_itr, pl_size;
unsigned long flags;
- unsigned num_pls, single_last;
+ unsigned num_pls, single_last, skb_len;
skb_len = skb->len;
- d_fnstart(4, dev, "(i2400m %p skb %p [size %zu])\n",
+ d_fnstart(4, dev, "(i2400m %p skb %p [size %u])\n",
i2400m, skb, skb_len);
result = -EIO;
msg_hdr = (void *) skb->data;
- result = i2400m_rx_msg_hdr_check(i2400m, msg_hdr, skb->len);
+ result = i2400m_rx_msg_hdr_check(i2400m, msg_hdr, skb_len);
if (result < 0)
goto error_msg_hdr_check;
result = -EIO;
@@ -1261,10 +1261,10 @@ int i2400m_rx(struct i2400m *i2400m, struct sk_buff *skb)
pl_itr = sizeof(*msg_hdr) + /* Check payload descriptor(s) */
num_pls * sizeof(msg_hdr->pld[0]);
pl_itr = ALIGN(pl_itr, I2400M_PL_ALIGN);
- if (pl_itr > skb->len) { /* got all the payload descriptors? */
+ if (pl_itr > skb_len) { /* got all the payload descriptors? */
dev_err(dev, "RX: HW BUG? message too short (%u bytes) for "
"%u payload descriptors (%zu each, total %zu)\n",
- skb->len, num_pls, sizeof(msg_hdr->pld[0]), pl_itr);
+ skb_len, num_pls, sizeof(msg_hdr->pld[0]), pl_itr);
goto error_pl_descr_short;
}
/* Walk each payload payload--check we really got it */
@@ -1272,7 +1272,7 @@ int i2400m_rx(struct i2400m *i2400m, struct sk_buff *skb)
/* work around old gcc warnings */
pl_size = i2400m_pld_size(&msg_hdr->pld[i]);
result = i2400m_rx_pl_descr_check(i2400m, &msg_hdr->pld[i],
- pl_itr, skb->len);
+ pl_itr, skb_len);
if (result < 0)
goto error_pl_descr_check;
single_last = num_pls == 1 || i == num_pls - 1;
@@ -1290,16 +1290,16 @@ int i2400m_rx(struct i2400m *i2400m, struct sk_buff *skb)
if (i < i2400m->rx_pl_min)
i2400m->rx_pl_min = i;
i2400m->rx_num++;
- i2400m->rx_size_acc += skb->len;
- if (skb->len < i2400m->rx_size_min)
- i2400m->rx_size_min = skb->len;
- if (skb->len > i2400m->rx_size_max)
- i2400m->rx_size_max = skb->len;
+ i2400m->rx_size_acc += skb_len;
+ if (skb_len < i2400m->rx_size_min)
+ i2400m->rx_size_min = skb_len;
+ if (skb_len > i2400m->rx_size_max)
+ i2400m->rx_size_max = skb_len;
spin_unlock_irqrestore(&i2400m->rx_lock, flags);
error_pl_descr_check:
error_pl_descr_short:
error_msg_hdr_check:
- d_fnend(4, dev, "(i2400m %p skb %p [size %zu]) = %d\n",
+ d_fnend(4, dev, "(i2400m %p skb %p [size %u]) = %d\n",
i2400m, skb, skb_len, result);
return result;
}
--
1.7.3.1
^ permalink raw reply related
* [PATCH 2/3] ATM: iphase, remove sleep-inside-atomic
From: Jiri Slaby @ 2010-10-11 9:26 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-kernel, jirislaby, Chas Williams
In-Reply-To: <1286789218-13976-1-git-send-email-jslaby@suse.cz>
Stanse found that ia_init_one locks a spinlock and inside of that it
calls ia_start which calls:
* request_irq
* tx_init which does kmalloc(GFP_KERNEL)
Both of them can thus sleep and result in a deadlock. I don't see a
reason to have a per-device spinlock there which is used only there
and inited right before the lock location. So remove it completely.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
---
drivers/atm/iphase.c | 6 ------
drivers/atm/iphase.h | 2 +-
2 files changed, 1 insertions(+), 7 deletions(-)
diff --git a/drivers/atm/iphase.c b/drivers/atm/iphase.c
index 8b358d7..9309d47 100644
--- a/drivers/atm/iphase.c
+++ b/drivers/atm/iphase.c
@@ -3156,7 +3156,6 @@ static int __devinit ia_init_one(struct pci_dev *pdev,
{
struct atm_dev *dev;
IADEV *iadev;
- unsigned long flags;
int ret;
iadev = kzalloc(sizeof(*iadev), GFP_KERNEL);
@@ -3188,19 +3187,14 @@ static int __devinit ia_init_one(struct pci_dev *pdev,
ia_dev[iadev_count] = iadev;
_ia_dev[iadev_count] = dev;
iadev_count++;
- spin_lock_init(&iadev->misc_lock);
- /* First fixes first. I don't want to think about this now. */
- spin_lock_irqsave(&iadev->misc_lock, flags);
if (ia_init(dev) || ia_start(dev)) {
IF_INIT(printk("IA register failed!\n");)
iadev_count--;
ia_dev[iadev_count] = NULL;
_ia_dev[iadev_count] = NULL;
- spin_unlock_irqrestore(&iadev->misc_lock, flags);
ret = -EINVAL;
goto err_out_deregister_dev;
}
- spin_unlock_irqrestore(&iadev->misc_lock, flags);
IF_EVENT(printk("iadev_count = %d\n", iadev_count);)
iadev->next_board = ia_boards;
diff --git a/drivers/atm/iphase.h b/drivers/atm/iphase.h
index b2cd20f..077735e 100644
--- a/drivers/atm/iphase.h
+++ b/drivers/atm/iphase.h
@@ -1022,7 +1022,7 @@ typedef struct iadev_t {
struct dle_q rx_dle_q;
struct free_desc_q *rx_free_desc_qhead;
struct sk_buff_head rx_dma_q;
- spinlock_t rx_lock, misc_lock;
+ spinlock_t rx_lock;
struct atm_vcc **rx_open; /* list of all open VCs */
u16 num_rx_desc, rx_buf_sz, rxing;
u32 rx_pkt_ram, rx_tmp_cnt;
--
1.7.3.1
^ permalink raw reply related
* [PATCH 1/3] NET: pch, fix use after free
From: Jiri Slaby @ 2010-10-11 9:26 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-kernel, jirislaby, Masayuki Ohtake
Stanse found that pch_gbe_xmit_frame uses skb after it is freed. Fix
that.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Masayuki Ohtake <masa-korg@dsn.okisemi.com>
---
drivers/net/pch_gbe/pch_gbe_main.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/pch_gbe/pch_gbe_main.c b/drivers/net/pch_gbe/pch_gbe_main.c
index 53c56cf..e44644f 100644
--- a/drivers/net/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/pch_gbe/pch_gbe_main.c
@@ -1847,9 +1847,9 @@ static int pch_gbe_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
unsigned long flags;
if (unlikely(skb->len > (adapter->hw.mac.max_frame_size - 4))) {
- dev_kfree_skb_any(skb);
pr_err("Transfer length Error: skb len: %d > max: %d\n",
skb->len, adapter->hw.mac.max_frame_size);
+ dev_kfree_skb_any(skb);
adapter->stats.tx_length_errors++;
return NETDEV_TX_OK;
}
--
1.7.3.1
^ permalink raw reply related
* Re: [PATCH 1/1] ATM: mpc, fix use after free
From: Eric Dumazet @ 2010-10-11 8:59 UTC (permalink / raw)
To: Jiri Slaby; +Cc: davem, netdev, linux-atm-general, linux-kernel, jirislaby
In-Reply-To: <1286786794-10410-1-git-send-email-jslaby@suse.cz>
Le lundi 11 octobre 2010 à 10:46 +0200, Jiri Slaby a écrit :
> Stanse found that mpc_push frees skb and then it dereferences it. It
> is a typo, new_skb should be dereferenced there.
>
> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> net/atm/mpc.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/atm/mpc.c b/net/atm/mpc.c
> index 622b471..74bcc66 100644
> --- a/net/atm/mpc.c
> +++ b/net/atm/mpc.c
> @@ -778,7 +778,7 @@ static void mpc_push(struct atm_vcc *vcc, struct sk_buff *skb)
> eg->packets_rcvd++;
> mpc->eg_ops->put(eg);
>
> - memset(ATM_SKB(skb), 0, sizeof(struct atm_skb_data));
> + memset(ATM_SKB(new_skb), 0, sizeof(struct atm_skb_data));
> netif_rx(new_skb);
> }
>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
^ permalink raw reply
* RE: [PATCH net-next 2/5] bnx2x: save cycles in setting gso_size
From: Vladislav Zolotarov @ 2010-10-11 8:53 UTC (permalink / raw)
To: David Miller, Dmitry Kravkov; +Cc: netdev@vger.kernel.org, Eilon Greenstein
In-Reply-To: <20101010.205254.193716921.davem@davemloft.net>
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Monday, October 11, 2010 5:53 AM
> To: Dmitry Kravkov
> Cc: netdev@vger.kernel.org; Vladislav Zolotarov; Eilon Greenstein
> Subject: Re: [PATCH net-next 2/5] bnx2x: save cycles in setting
> gso_size
>
> From: "Dmitry Kravkov" <dmitry@broadcom.com>
> Date: Mon, 11 Oct 2010 00:27:55 +0200
>
> > diff --git a/drivers/net/bnx2x/bnx2x_cmn.c
> b/drivers/net/bnx2x/bnx2x_cmn.c
> > index cb2a3d6..ddf90e1 100644
> > --- a/drivers/net/bnx2x/bnx2x_cmn.c
> > +++ b/drivers/net/bnx2x/bnx2x_cmn.c
> > @@ -277,8 +277,7 @@ static int bnx2x_fill_frag_skb(struct bnx2x *bp,
> struct bnx2x_fastpath *fp,
> >
> > /* This is needed in order to enable forwarding support */
> > if (frag_size)
> > - skb_shinfo(skb)->gso_size = min((u32)SGE_PAGE_SIZE,
> > - max(frag_size, (u32)len_on_bd));
> > + skb_shinfo(skb)->gso_size = 1;
> >
>
> I wonder why you need to set this here.
>
> When you pass this packet into the stack, dev_gro_receive() is going
> to set ->gro_size() unconditionally if any GRO processing is going to
> occur at all.
>
> Why do you need to set it at all?
Dave, it's a gSo_size, not a gRo_size and we are handling an LRO skb
here. It's quite confusing, I agree... ;) This is currently the way
to mark an LRO skb in order to properly handle the LRO skbs that
might still be forwarded to the stack short time after the LRO has
been disabled due to enabling the IP forwarding (or if there is a
bug in a driver).
See the skb_warn_if_lro() below:
static inline bool skb_warn_if_lro(const struct sk_buff *skb)
{
/* LRO sets gso_size but not gso_type, whereas if GSO is really
* wanted then gso_type will be set. */
struct skb_shared_info *shinfo = skb_shinfo(skb);
if (skb_is_nonlinear(skb) && shinfo->gso_size != 0 &&
unlikely(shinfo->gso_type == 0)) {
__skb_warn_lro_forwarding(skb);
return true;
}
return false;
}
So, the convention is to set the gSo_size to a none-zero value leaving
the gSo_type zero for an LRO skbs. It's quite hacky but this is what
we have for quite a while and it quite worked so far. ;) To make it
cleaner we might have done the following:
1) added a new member to the SKB_GSO_XXX family enum since we are riding
on the gso_type anyway. In that case there should have been the following
line in the LRO flow:
if (frag_size)
skb_shinfo(skb)->gso_type = SKB_LSO_TYPE;
and skb_warn_if_lro() should have been reworked appropriately.
2) Change the name of the gso_type field to something like skb_type
and this would make the LRO flow crystal clear then but would require
changes in quite a few places.
So, the question is do we want to start all this? ;)
Regarding the current patch itself: we wanted to save the extra
(not needed at all) calculations in the LRO bnx2x flow considering that
current semantics requires the gso_size to be none zero for the LRO skb.
>
> And if we do need it, doesn't every driver that builds fragmented SKBs?
I guess the answer is "yes".
> And if it's correct, why is setting a don't care value like "1" ok and
> will not cause problems to whoever cares about this value?
I disagree with your definition of "don't care" - it's quite "care"... ;)
I guess I've answered this question above but the bottom line would
be: because it's the current semantics to mark the LRO skb and it's
done exactly for reason that there is somebody out there who cares... ;)
Thanks,
vlad
>
> I really want all of these questions answered, and at least lightly
> explained in the commit message before I apply this patch.
>
> Thanks.
^ permalink raw reply
* [PATCH 1/1] ATM: mpc, fix use after free
From: Jiri Slaby @ 2010-10-11 8:46 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-atm-general, linux-kernel, jirislaby, Eric Dumazet
In-Reply-To: <1286785081.2737.2.camel@edumazet-laptop>
Stanse found that mpc_push frees skb and then it dereferences it. It
is a typo, new_skb should be dereferenced there.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
net/atm/mpc.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/atm/mpc.c b/net/atm/mpc.c
index 622b471..74bcc66 100644
--- a/net/atm/mpc.c
+++ b/net/atm/mpc.c
@@ -778,7 +778,7 @@ static void mpc_push(struct atm_vcc *vcc, struct sk_buff *skb)
eg->packets_rcvd++;
mpc->eg_ops->put(eg);
- memset(ATM_SKB(skb), 0, sizeof(struct atm_skb_data));
+ memset(ATM_SKB(new_skb), 0, sizeof(struct atm_skb_data));
netif_rx(new_skb);
}
--
1.7.3.1
^ permalink raw reply related
* Re: [PATCH 4/4] Phonet: mark the pipe controller as EXPERIMENTAL
From: Rémi Denis-Courmont @ 2010-10-11 8:24 UTC (permalink / raw)
To: Kumar SANGHVI; +Cc: netdev
In-Reply-To: <20101011070928.GC31124@bnru01.bnr.st.com>
On Mon, 11 Oct 2010 12:39:29 +0530, Kumar SANGHVI
<kumar.sanghvi@stericsson.com> wrote:
> I have already started implementing 'connect()' socket call for Pipe
> controller logic, as per advise of Rémi Denis-Courmontt. I will upload
> this patch for review once I have validated GPRS/3G path (control/data)
> to be working fine.
>
> Hopefully, with all the reviews and suggestions shared on the mailing
> list, existing abuse can be removed and new abuses can be avoided.
>
> The only intention of submitting is to make Linux work smoothly with
> newer devices. If there is some improper code getting introduced
> unknowingly, then I will surely work on the review comments to fix it.
I am sure everybody wants to get things right eventually. That's not much
of a concern to me.
However, I am concerned that Linux 2.6.37 ships with an unfinished
implementation, or worse, an unfinished user-space interface.
--
Rémi Denis-Courmont
http://www.remlab.net
http://fi.linkedin.com/in/remidenis
^ permalink raw reply
* Re: [PATCH 1/1] ATM: solos-pci, remove use after free
From: Eric Dumazet @ 2010-10-11 8:19 UTC (permalink / raw)
To: Jiri Slaby
Cc: davem, netdev, linux-atm-general, linux-kernel, jirislaby,
Chas Williams
In-Reply-To: <1286783444-7719-1-git-send-email-jslaby@suse.cz>
Le lundi 11 octobre 2010 à 09:50 +0200, Jiri Slaby a écrit :
> Stanse found we do in console_show:
> kfree_skb(skb);
> return skb->len;
> which is not good. Fix that by remembering the len and use it in the
> function instead.
>
> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
> Cc: Chas Williams <chas@cmf.nrl.navy.mil>
> ---
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
^ permalink raw reply
* Re: [HELP] ATM: mpc, use-after-free
From: Eric Dumazet @ 2010-10-11 8:18 UTC (permalink / raw)
To: Jiri Slaby; +Cc: David S. Miller, ML netdev, linux-atm-general, LKML, chas
In-Reply-To: <4CB2C33C.8080109@gmail.com>
Le lundi 11 octobre 2010 à 09:56 +0200, Jiri Slaby a écrit :
> Hi,
>
> Stanse found this use-after-free:
>
> static void mpc_push(struct atm_vcc *vcc, struct sk_buff *skb)
> {
> ...
> new_skb = skb_realloc_headroom(skb, eg->ctrl_info.DH_length);
>
> dev_kfree_skb_any(skb);
>
> FREE ^^^^^^^^^^^^^^^^^^^^^^^
>
> if (new_skb == NULL) {
> mpc->eg_ops->put(eg);
> return;
> }
> skb_push(new_skb, eg->ctrl_info.DH_length);
> skb_copy_to_linear_data(new_skb, eg->ctrl_info.DLL_header,
> eg->ctrl_info.DH_length);
> ...
> memset(ATM_SKB(skb), 0, sizeof(struct atm_skb_data));
>
> USE ^^^^^^^^^^^^
>
> netif_rx(new_skb);
>
> I guess it should be ATM_SKB(new_skb), right?
Yes
- memset(ATM_SKB(skb), 0, sizeof(struct atm_skb_data));
+ memset(ATM_SKB(new_skb), 0, sizeof(struct atm_skb_data));
>
> The two problems are:
> 1) obvious use-after-free
> 2) ?data leak, since we don't erase the right memory?
>
> thanks,
Indeed, please submit a formal patch ?
Thanks
^ permalink raw reply
* [HELP] ATM: mpc, use-after-free
From: Jiri Slaby @ 2010-10-11 7:56 UTC (permalink / raw)
To: David S. Miller; +Cc: ML netdev, linux-atm-general, LKML, chas
Hi,
Stanse found this use-after-free:
static void mpc_push(struct atm_vcc *vcc, struct sk_buff *skb)
{
...
new_skb = skb_realloc_headroom(skb, eg->ctrl_info.DH_length);
dev_kfree_skb_any(skb);
FREE ^^^^^^^^^^^^^^^^^^^^^^^
if (new_skb == NULL) {
mpc->eg_ops->put(eg);
return;
}
skb_push(new_skb, eg->ctrl_info.DH_length);
skb_copy_to_linear_data(new_skb, eg->ctrl_info.DLL_header,
eg->ctrl_info.DH_length);
...
memset(ATM_SKB(skb), 0, sizeof(struct atm_skb_data));
USE ^^^^^^^^^^^^
netif_rx(new_skb);
I guess it should be ATM_SKB(new_skb), right?
The two problems are:
1) obvious use-after-free
2) ?data leak, since we don't erase the right memory?
thanks,
--
js
^ permalink raw reply
* Re: [PATCH] net/fec: carrier off initially to avoid root mount failure
From: Oskar Schirmer @ 2010-10-11 7:54 UTC (permalink / raw)
To: David Miller; +Cc: oskar, netdev, dan, bigeasy, hjk, gerg, bhutchings
In-Reply-To: <20101010.211956.112597497.davem@davemloft.net>
On Sun, Oct 10, 2010 at 21:19:56 -0700, David Miller wrote:
> From: Oskar Schirmer <oskar@linutronix.de>
> Date: Thu, 7 Oct 2010 14:30:30 +0200
>
> > with hardware slow in negotiation, the system did freeze
> > while trying to mount root on nfs at boot time.
> >
> > the link state has not been initialised so network stack
> > tried to start transmission right away. this caused instant
> > retries, as the driver solely stated business upon link down,
> > rendering the system unusable.
> >
> > notify carrier off initially to prevent transmission until
> > phylib will report link up.
> >
> > Signed-off-by: Oskar Schirmer <oskar@linutronix.de>
>
> I did some more investigation into this situation, and for now I'm
> going to apply your patch. It is correct, and it also matches what
> the only other seemingly correct driver I could find using phylib does
> (gianfar) :-) Actually, although I didn't check, bi-modal drivers
> (those that only use phylib for some phy types) like tg3 probably do
> the right thing here too.
>
> Longer term I think the right thing to do might be:
>
> 1) Create some notion of "network device has managed carrier"
>
> This could simply be a flag bit in the netdev or netdev_ops,
> or some other kind of attribute.
>
> 2) Managed carrier devices start with netif_carrier_off(), otherwise
> the device starts with netif_carrier_on().
This last conditional (managed vs otherwise) would be implicit
with a null PHY driver as Ben Hutchings proposes it to Greg Ungerers
"allow FEC driver to not have attached PHY", 2010/10/07,
with the null PHY simply switching to netif_carrier_on right after
machine start.
Otherwise my patch would need another #ifdef to live in
peace with Gregs patch.
^ permalink raw reply
* [PATCH 1/1] ATM: solos-pci, remove use after free
From: Jiri Slaby @ 2010-10-11 7:50 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-atm-general, linux-kernel, jirislaby, Chas Williams
Stanse found we do in console_show:
kfree_skb(skb);
return skb->len;
which is not good. Fix that by remembering the len and use it in the
function instead.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
---
drivers/atm/solos-pci.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index f916ddf..f46138a 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -444,6 +444,7 @@ static ssize_t console_show(struct device *dev, struct device_attribute *attr,
struct atm_dev *atmdev = container_of(dev, struct atm_dev, class_dev);
struct solos_card *card = atmdev->dev_data;
struct sk_buff *skb;
+ unsigned int len;
spin_lock(&card->cli_queue_lock);
skb = skb_dequeue(&card->cli_queue[SOLOS_CHAN(atmdev)]);
@@ -451,11 +452,12 @@ static ssize_t console_show(struct device *dev, struct device_attribute *attr,
if(skb == NULL)
return sprintf(buf, "No data.\n");
- memcpy(buf, skb->data, skb->len);
- dev_dbg(&card->dev->dev, "len: %d\n", skb->len);
+ len = skb->len;
+ memcpy(buf, skb->data, len);
+ dev_dbg(&card->dev->dev, "len: %d\n", len);
kfree_skb(skb);
- return skb->len;
+ return len;
}
static int send_command(struct solos_card *card, int dev, const char *buf, size_t size)
--
1.7.3.1
^ permalink raw reply related
* Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Krishna Kumar2 @ 2010-10-11 7:21 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: anthony, arnd, avi, davem, kvm, netdev, rusty
In-Reply-To: <cover.1286372004.git.mst@redhat.com>
"Michael S. Tsirkin" <mst@redhat.com> wrote on 10/06/2010 07:04:31 PM:
> On Fri, Sep 17, 2010 at 03:33:07PM +0530, Krishna Kumar wrote:
> > For 1 TCP netperf, I ran 7 iterations and summed it. Explanation
> > for degradation for 1 stream case:
>
> I thought about possible RX/TX contention reasons, and I realized that
> we get/put the mm counter all the time. So I write the following: I
> haven't seen any performance gain from this in a single queue case, but
> maybe this will help multiqueue?
Sorry for the delay, I was sick last couple of days. The results
with your patch are (%'s over original code):
Code BW% CPU% RemoteCPU
MQ (#txq=16) 31.4% 38.42% 6.41%
MQ+MST (#txq=16) 28.3% 18.9% -10.77%
The patch helps CPU utilization but didn't help single stream
drop.
Thanks,
- KK
^ permalink raw reply
* Re: [PATCH 4/4] Phonet: mark the pipe controller as EXPERIMENTAL
From: Kumar SANGHVI @ 2010-10-11 7:09 UTC (permalink / raw)
To: Rémi Denis-Courmont; +Cc: netdev@vger.kernel.org, Rémi Denis-Courmont
In-Reply-To: <1286546523-3340-4-git-send-email-remi@remlab.net>
On Fri, Oct 08, 2010 at 16:02:03 +0200, Rémi Denis-Courmont wrote:
> From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
>
> There are a bunch of issues that need to be fixed, including:
> - GFP_KERNEL allocations from atomic context
> (and GFP_ATOMIC in process context),
> - abuse of the setsockopt() call convention,
> - unprotected/unlocked static variables...
>
> IMHO, we will need to alter the userspace ABI when we fix it. So mark
> the configuration option as EXPERIMENTAL for the time being (or should
> it be BROKEN instead?).
>
> Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Thank you for this patch.
I have already started implementing 'connect()' socket call for Pipe
controller logic, as per advise of Rémi Denis-Courmontt. I will upload
this patch for review once I have validated GPRS/3G path (control/data)
to be working fine.
Hopefully, with all the reviews and suggestions shared on the mailing
list, existing abuse can be removed and new abuses can be avoided.
The only intention of submitting is to make Linux work smoothly with
newer devices. If there is some improper code getting introduced
unknowingly, then I will surely work on the review comments to fix it.
Thank you all!
Best regards,
Kumar.
^ permalink raw reply
* RE: [PATCH v12 10/17] Add a hook to intercept external buffers from NIC driver.
From: Xin, Xiaohui @ 2010-10-11 7:06 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
davem@davemloft.net, herbert@gondor.apana.org.au,
jdike@linux.intel.com
In-Reply-To: <1285856533.2615.573.camel@edumazet-laptop>
>-----Original Message-----
>From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>Sent: Thursday, September 30, 2010 10:22 PM
>To: Xin, Xiaohui
>Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>mst@redhat.com; mingo@elte.hu; davem@davemloft.net; herbert@gondor.apana.org.au;
>jdike@linux.intel.com
>Subject: Re: [PATCH v12 10/17] Add a hook to intercept external buffers from NIC driver.
>
>Le jeudi 30 septembre 2010 à 22:04 +0800, xiaohui.xin@intel.com a
>écrit :
>> From: Xin Xiaohui <xiaohui.xin@intel.com>
>>
>> The hook is called in netif_receive_skb().
>> Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
>> Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
>> Reviewed-by: Jeff Dike <jdike@linux.intel.com>
>> ---
>> net/core/dev.c | 35 +++++++++++++++++++++++++++++++++++
>> 1 files changed, 35 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index c11e32c..83172b8 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2517,6 +2517,37 @@ err:
>> EXPORT_SYMBOL(netdev_mp_port_prep);
>> #endif
>>
>> +#if defined(CONFIG_MEDIATE_PASSTHRU) ||
>defined(CONFIG_MEDIATE_PASSTHRU_MODULE)
>> +/* Add a hook to intercept mediate passthru(zero-copy) packets,
>> + * and insert it to the socket queue owned by mp_port specially.
>> + */
>> +static inline struct sk_buff *handle_mpassthru(struct sk_buff *skb,
>> + struct packet_type **pt_prev,
>> + int *ret,
>> + struct net_device *orig_dev)
>> +{
>> + struct mp_port *mp_port = NULL;
>> + struct sock *sk = NULL;
>> +
>> + if (!dev_is_mpassthru(skb->dev))
>> + return skb;
>> + mp_port = skb->dev->mp_port;
>> +
>> + if (*pt_prev) {
>> + *ret = deliver_skb(skb, *pt_prev, orig_dev);
>> + *pt_prev = NULL;
>> + }
>> +
>> + sk = mp_port->sock->sk;
>> + skb_queue_tail(&sk->sk_receive_queue, skb);
>> + sk->sk_state_change(sk);
>> +
>> + return NULL;
>> +}
>> +#else
>> +#define handle_mpassthru(skb, pt_prev, ret, orig_dev) (skb)
>> +#endif
>> +
>> /**
>> * netif_receive_skb - process receive buffer from network
>> * @skb: buffer to process
>> @@ -2598,6 +2629,10 @@ int netif_receive_skb(struct sk_buff *skb)
>> ncls:
>> #endif
>>
>> + /* To intercept mediate passthru(zero-copy) packets here */
>> + skb = handle_mpassthru(skb, &pt_prev, &ret, orig_dev);
>> + if (!skb)
>> + goto out;
>> skb = handle_bridge(skb, &pt_prev, &ret, orig_dev);
>> if (!skb)
>> goto out;
>
>This does not apply to current net-next-2.6
>
>We now have dev->rx_handler (currently for bridge or macvlan)
>
>
Ok. Thanks, will rebase to fix that.
Thanks
Xiaohui
>commit ab95bfe01f9872459c8678572ccadbf646badad0
>Author: Jiri Pirko <jpirko@redhat.com>
>Date: Tue Jun 1 21:52:08 2010 +0000
>
> net: replace hooks in __netif_receive_skb V5
>
> What this patch does is it removes two receive frame hooks (for bridge and for
> macvlan) from __netif_receive_skb. These are replaced them with a single
> hook for both. It only supports one hook per device because it makes no
> sense to do bridging and macvlan on the same device.
>
> Then a network driver (of virtual netdev like macvlan or bridge) can register
> an rx_handler for needed net device.
>
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
^ permalink raw reply
* RE: [PATCH v12 06/17] Use callback to deal with skb_release_data() specially.
From: Xin, Xiaohui @ 2010-10-11 7:06 UTC (permalink / raw)
To: David Miller
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
herbert@gondor.apana.org.au, jdike@linux.intel.com
In-Reply-To: <20101001.001439.85411154.davem@davemloft.net>
>-----Original Message-----
>From: David Miller [mailto:davem@davemloft.net]
>Sent: Friday, October 01, 2010 3:15 PM
>To: Xin, Xiaohui
>Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>mst@redhat.com; mingo@elte.hu; herbert@gondor.apana.org.au; jdike@linux.intel.com
>Subject: Re: [PATCH v12 06/17] Use callback to deal with skb_release_data() specially.
>
>From: xiaohui.xin@intel.com
>Date: Thu, 30 Sep 2010 22:04:23 +0800
>
>> @@ -197,10 +197,11 @@ struct skb_shared_info {
>> union skb_shared_tx tx_flags;
>> struct sk_buff *frag_list;
>> struct skb_shared_hwtstamps hwtstamps;
>> - skb_frag_t frags[MAX_SKB_FRAGS];
>> /* Intermediate layers must ensure that destructor_arg
>> * remains valid until skb destructor */
>> void * destructor_arg;
>> +
>> + skb_frag_t frags[MAX_SKB_FRAGS];
>> };
>>
>> /* The structure is for a skb which pages may point to
>
>Why are you moving frags[] to the end like this?
That's to avoid the new cache miss caused by using destructor_arg in data path
like skb_release_data().
That's based on the comment from Eric Dumazet on v7 patches.
Thanks
Xiaohui
^ permalink raw reply
* Re: [PATCH 3/4] Phonet: cleanup pipe enable socket option
From: Kumar SANGHVI @ 2010-10-11 6:50 UTC (permalink / raw)
To: Rémi Denis-Courmont; +Cc: netdev@vger.kernel.org, Rémi Denis-Courmont
In-Reply-To: <1286546523-3340-3-git-send-email-remi@remlab.net>
Hi,
On Fri, Oct 08, 2010 at 16:02:02 +0200, Rémi Denis-Courmont wrote:
> From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
>
> The current code works like this:
>
> int garbage, status;
> socklen_t len = sizeof(status);
>
> /* enable pipe */
> setsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE, &garbage, sizeof(garbage));
> /* disable pipe */
> setsockopt(fd, SOL_PNPIPE, PNPIPE_DISABLE, &garbage, sizeof(garbage));
> /* get status */
> getsockopt(fd, SOL_PNPIPE, PNPIPE_INQ, &status, &len);
>
> ...which does not follow the usual socket option pattern. This patch
> merges all three "options" into a single gettable&settable option,
> before Linux 2.6.37 gets out:
>
> int status;
> socklen_t len = sizeof(status);
>
> /* enable pipe */
> status = 1;
> setsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE, &status, sizeof(status));
> /* disable pipe */
> status = 0;
> setsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE, &status, sizeof(status));
> /* get status */
> getsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE, &status, &len);
>
> This also fixes the error code from EFAULT to ENOTCONN.
>
> Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
> Cc: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Thank you!
BR,
Kumar.
^ permalink raw reply
* Re: [PATCH 2/4] Phonet: advise against enabling the pipe controller
From: Kumar SANGHVI @ 2010-10-11 6:49 UTC (permalink / raw)
To: Rémi Denis-Courmont; +Cc: netdev@vger.kernel.org, Rémi Denis-Courmont
In-Reply-To: <1286546523-3340-2-git-send-email-remi@remlab.net>
Hi,
On Fri, Oct 08, 2010 at 16:02:01 +0200, Rémi Denis-Courmont wrote:
> From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
>
> As it currently is, the new code path is not compatible with existing
> Nokia modems. This would break existing userspace for Nokia modem, such
> as the existing oFono ISI driver.
>
> Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Yes, the pipe controller logic is not intended for existing Nokia modems
that already implement pipe controller inside them.
Thank you for putting correct warning message.
^ permalink raw reply
* [PATCH net-2.6] tg3: restore rx_dropped accounting
From: Eric Dumazet @ 2010-10-11 5:55 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Matt Carlson, Michael Chan
commit 511d22247be7 (tg3: 64 bit stats on all arches), overlooked the
rx_dropped accounting.
We use a full "struct rtnl_link_stats64" to hold rx_dropped value, but
forgot to report it in tg3_get_stats64().
Use an "unsigned long" instead to shrink "struct tg3" by 176 bytes, and
report this value to stats readers.
Increment rx_dropped counter for oversized frames.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Matt Carlson <mcarlson@broadcom.com>
---
drivers/net/tg3.c | 6 ++++--
drivers/net/tg3.h | 2 +-
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index bc3af78..1ec4b9e 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4666,7 +4666,7 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
desc_idx, *post_ptr);
drop_it_no_recycle:
/* Other statistics kept track of by card. */
- tp->net_stats.rx_dropped++;
+ tp->rx_dropped++;
goto next_pkt;
}
@@ -4726,7 +4726,7 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
if (len > (tp->dev->mtu + ETH_HLEN) &&
skb->protocol != htons(ETH_P_8021Q)) {
dev_kfree_skb(skb);
- goto next_pkt;
+ goto drop_it_no_recycle;
}
if (desc->type_flags & RXD_FLAG_VLAN &&
@@ -9240,6 +9240,8 @@ static struct rtnl_link_stats64 *tg3_get_stats64(struct net_device *dev,
stats->rx_missed_errors = old_stats->rx_missed_errors +
get_stat64(&hw_stats->rx_discards);
+ stats->rx_dropped = tp->rx_dropped;
+
return stats;
}
diff --git a/drivers/net/tg3.h b/drivers/net/tg3.h
index 4937bd1..be7ff13 100644
--- a/drivers/net/tg3.h
+++ b/drivers/net/tg3.h
@@ -2759,7 +2759,7 @@ struct tg3 {
/* begin "everything else" cacheline(s) section */
- struct rtnl_link_stats64 net_stats;
+ unsigned long rx_dropped;
struct rtnl_link_stats64 net_stats_prev;
struct tg3_ethtool_stats estats;
struct tg3_ethtool_stats estats_prev;
^ permalink raw reply related
* Re: [PATCH] net/fec: carrier off initially to avoid root mount failure
From: David Miller @ 2010-10-11 4:19 UTC (permalink / raw)
To: oskar; +Cc: netdev, dan, bigeasy, hjk
In-Reply-To: <1286454630-7396-1-git-send-email-oskar@linutronix.de>
From: Oskar Schirmer <oskar@linutronix.de>
Date: Thu, 7 Oct 2010 14:30:30 +0200
> with hardware slow in negotiation, the system did freeze
> while trying to mount root on nfs at boot time.
>
> the link state has not been initialised so network stack
> tried to start transmission right away. this caused instant
> retries, as the driver solely stated business upon link down,
> rendering the system unusable.
>
> notify carrier off initially to prevent transmission until
> phylib will report link up.
>
> Signed-off-by: Oskar Schirmer <oskar@linutronix.de>
I did some more investigation into this situation, and for now I'm
going to apply your patch. It is correct, and it also matches what
the only other seemingly correct driver I could find using phylib does
(gianfar) :-) Actually, although I didn't check, bi-modal drivers
(those that only use phylib for some phy types) like tg3 probably do
the right thing here too.
Longer term I think the right thing to do might be:
1) Create some notion of "network device has managed carrier"
This could simply be a flag bit in the netdev or netdev_ops,
or some other kind of attribute.
2) Managed carrier devices start with netif_carrier_off(), otherwise
the device starts with netif_carrier_on().
Then we gut all of the probe time netif_carrir_*() calls in all
of the drivers.
And hopefully it's less error prone than it is right now.
^ permalink raw reply
* Re: [PATCH] atl1c: Add support for Atheros AR8152 and AR8152
From: David Miller @ 2010-10-11 4:03 UTC (permalink / raw)
To: ben; +Cc: lrodriguez, netdev
In-Reply-To: <1286759930.2955.285.camel@localhost>
From: Ben Hutchings <ben@decadent.org.uk>
Date: Mon, 11 Oct 2010 02:18:50 +0100
> Your commit 496c185c9495629ef1c65387cb2594578393cfe0 "atl1c: Add support
> for Atheros AR8152 and AR8152" included the following changes:
...
>> + if (hw->nic_type == athr_l1c || hw->nic_type == athr_l2c_b) {
...
>> + if ((hw->nic_type == athr_l1c || hw->nic_type == athr_l2c)) {
...
> Shouldn't the first if-statement use the same condition as the second
> i.e. matching the previously-defined hardware types athr_l1c and
> athr_l2c?
Yeah that definitely looks like a bug to me.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox