* Re: [RFC] IFLA_PORT_* iproute2 cmd line
From: Chris Wright @ 2010-05-26 15:56 UTC (permalink / raw)
To: Stefan Berger
Cc: Arnd Bergmann, Chris Wright, Dirk Herrendoerfer, netdev,
Scott Feldman, Stephen Hemminger, Vivek Kashyap
In-Reply-To: <OFCF88A167.122DD206-ON8525772F.00470999-8525772F.0047F4A5@us.ibm.com>
* Stefan Berger (stefanb@us.ibm.com) wrote:
> When I have libvirt talk to my dummy server, I pass the getpid() of
> libvirt into the request message, just to fulfill the requirement of it
> being != 0 so the kernel driver doesn't process the msg. I also don't
> want to have to determine the pid of the target process. Now I believe
> the problem with that *may* be that multiple daemons may want to process
> the message,
What other daemon do you expect to be listening besides lldpad?
thanks,
-chris
^ permalink raw reply
* Re: [PATCH] net/core: use net_device dev_id to indicate port number
From: Eli Cohen @ 2010-05-26 16:09 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Eli Cohen, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
rdreier-FYB4Gu1CFyUAvxtiuMwx3w, yevgenyp-VPRAkNaXOzVS1MOuV/RT9w
In-Reply-To: <20100526083318.5b7b7704@nehalam>
On Wed, May 26, 2010 at 08:33:18AM -0700, Stephen Hemminger wrote:
>
> SET_NETDEV_DEV macro exists because at the time 2.5 kernel was being developed
> it was important to be able to maintain source compatibility between 2.4 and
> 2.6 (nee 2.5) drivers. Since 2.4 did not have sysfs, the macro was a mechanism
> to allow the same code to run on both kernel versions.
>
> Your situation is different, just use dev_id and update documentation if
> you need to.
>
OK, great. Things get much much simpler :-)
I'll send another patch for mlx4_en only.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [RFC] IFLA_PORT_* iproute2 cmd line
From: Arnd Bergmann @ 2010-05-26 16:15 UTC (permalink / raw)
To: Stefan Berger
Cc: Chris Wright, Dirk Herrendoerfer, netdev, Scott Feldman,
Stephen Hemminger, Vivek Kashyap
In-Reply-To: <OFB3363EDA.945755DA-ON8525772F.0057CBD1-8525772F.00589D2B@us.ibm.com>
On Wednesday 26 May 2010, Stefan Berger wrote:
> I can start my dummy netlink 'server' twice and have two recipients of the
> libvirt
> message and thus two servers can (maliciously) respond. Well, netlink
> doesn't
> seem to be as 'directed' as unixio. We could run the netlink type of
> messages
libvirt can still check if the sender pid of the response is what it
thinks it should be.
Arnd
^ permalink raw reply
* [REGRESSION,BISECTED] MIPv6 support broken by f4f914b58019f0
From: Arnaud Ebalard @ 2010-05-26 17:01 UTC (permalink / raw)
To: David Miller
Cc: YOSHIFUJI Hideaki / 吉藤英明, Jiri Olsa,
Scott Otto, netdev
Hi,
I just updated my laptop's kernel to 2.6.34 (previously running .33 and
configured to act as an IPsec/IKE-protected MIPv6 Mobile Node using
racoon and umip): after rebooting on the new kernel, the transport mode
SA protecting MIPv6 signaling traffic are missing.
I bisected the issue down to f4f914b58019f0e50d521bbbadfaee260d766f95
(net: ipv6 bind to device issue) which was added after 2.6.34-rc5:
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c2438e8..05ebd78 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -815,7 +815,7 @@ struct dst_entry * ip6_route_output(struct net *net, struct sock *sk,
{
int flags = 0;
- if (rt6_need_strict(&fl->fl6_dst))
+ if (fl->oif || rt6_need_strict(&fl->fl6_dst))
flags |= RT6_LOOKUP_F_IFACE;
if (!ipv6_addr_any(&fl->fl6_src))
Reverting the patch on a 2.6.34 gives me a working kernel.
With MIPv6, the Home Address is bound to a tunnel interface but the
routing/XFRM code will not always send packet via this virtual device
(in fact, I would say never when IPsec is used for protecting signaling
and data traffic):
- Signaling traffic will be sent using a Care-of Address from another
interface (with the addition of a Home Address Option in a
Destination Option Header)
- Data traffic (when protected by tunnel mode IPsec) will also be sent
via another interface.
I *suspect* that previous commit somehow changes the lose coupling
between the address and the device to enforce a strict routing via
associated interface.
I will try and take a look at the code tomorrow to understand what
really happens but if someone has ideas, I am interested.
Cheers,
a+
ps: I use the same working setup for all kernels since 2.6.28
^ permalink raw reply related
* Re: [PATCH] tcp: Socket option to set congestion window
From: Andi Kleen @ 2010-05-26 17:33 UTC (permalink / raw)
To: Tom Herbert; +Cc: David Miller, shemminger, netdev, ycheng
In-Reply-To: <AANLkTinHGMtfw4Oydfgx0w7QLb-HyYSKdI-4smD-BEkq@mail.gmail.com>
Tom Herbert <therbert@google.com> writes:
>>
> Thanks to NAT, the concept of a network path or even host specific
> path is a weakened concept. On the Internet this may be a path
> characteristic per client, which unfortunately has no visibility in
> the kernel other than per connection state. When a single IP address
> may have thousands of hosts behind it, caching TCP parameters for that
> IP address is implicitly doing a huge aggregation-- probably dicey...
Yes all of Saudi-Arabia used to be (is?) one IP address...
Caching anything per IP is bogus.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply
* Re: [PATCH] tcp: Socket option to set congestion window
From: Denys Fedorysychenko @ 2010-05-26 17:41 UTC (permalink / raw)
To: Andi Kleen; +Cc: Tom Herbert, David Miller, shemminger, netdev, ycheng
In-Reply-To: <87fx1e1sat.fsf@basil.nowhere.org>
On Wednesday 26 May 2010 20:33:46 Andi Kleen wrote:
> Tom Herbert <therbert@google.com> writes:
> > Thanks to NAT, the concept of a network path or even host specific
> > path is a weakened concept. On the Internet this may be a path
> > characteristic per client, which unfortunately has no visibility in
> > the kernel other than per connection state. When a single IP address
> > may have thousands of hosts behind it, caching TCP parameters for that
> > IP address is implicitly doing a huge aggregation-- probably dicey...
>
> Yes all of Saudi-Arabia used to be (is?) one IP address...
>
> Caching anything per IP is bogus.
>
> -Andi
>
In Lebanon i have around 30k users behind few IP addresses(around 6) (for
web).
Because backbone here $1200/Mbit, and satellites mostly(rtt 400+ ms)... so TCP
accelerators and caching proxy a must. Tproxy doesn't work well yet to use
full set of ip's.
And no local google/youtube servers, so maybe i'm affected by something? :-)
^ permalink raw reply
* [PATCH 2/2] net: ll_temac: fix checksum offload logic
From: John Linn @ 2010-05-26 17:29 UTC (permalink / raw)
To: netdev, linuxppc-dev, grant.likely, jwboyer
Cc: john.williams, michal.simek, John Linn, Brian Hill
In-Reply-To: <1274894959-27473-1-git-send-email-john.linn@xilinx.com>
The current checksum offload code does not work and this corrects
that functionality. It also updates the interrupt coallescing
initialization so than there are fewer interrupts and performance
is increased.
Signed-off-by: Brian Hill <brian.hill@xilinx.com>
Signed-off-by: John Linn <john.linn@xilinx.com>
---
drivers/net/ll_temac.h | 5 +++
drivers/net/ll_temac_main.c | 82 ++++++++++++++++++++++++++++++------------
2 files changed, 63 insertions(+), 24 deletions(-)
diff --git a/drivers/net/ll_temac.h b/drivers/net/ll_temac.h
index c033584..2e0c9cc 100644
--- a/drivers/net/ll_temac.h
+++ b/drivers/net/ll_temac.h
@@ -295,6 +295,10 @@ This option defaults to enabled (set) */
#define MULTICAST_CAM_TABLE_NUM 4
+/* TEMAC Synthesis features */
+#define TEMAC_FEATURE_RX_CSUM (1 << 0)
+#define TEMAC_FEATURE_TX_CSUM (1 << 1)
+
/* TX/RX CURDESC_PTR points to first descriptor */
/* TX/RX TAILDESC_PTR points to last descriptor in linked list */
@@ -353,6 +357,7 @@ struct temac_local {
struct mutex indirect_mutex;
u32 options; /* Current options word */
int last_link;
+ unsigned temac_features;
/* Buffer descriptors */
struct cdmac_bd *tx_bd_v;
diff --git a/drivers/net/ll_temac_main.c b/drivers/net/ll_temac_main.c
index 0615737..52dcc84 100644
--- a/drivers/net/ll_temac_main.c
+++ b/drivers/net/ll_temac_main.c
@@ -245,7 +245,7 @@ static int temac_dma_bd_init(struct net_device *ndev)
CHNL_CTRL_IRQ_COAL_EN);
/* 0x10220483 */
/* 0x00100483 */
- lp->dma_out(lp, RX_CHNL_CTRL, 0xff010000 |
+ lp->dma_out(lp, RX_CHNL_CTRL, 0xff070000 |
CHNL_CTRL_IRQ_EN |
CHNL_CTRL_IRQ_DLY_EN |
CHNL_CTRL_IRQ_COAL_EN |
@@ -574,6 +574,10 @@ static void temac_start_xmit_done(struct net_device *ndev)
if (cur_p->app4)
dev_kfree_skb_irq((struct sk_buff *)cur_p->app4);
cur_p->app0 = 0;
+ cur_p->app1 = 0;
+ cur_p->app2 = 0;
+ cur_p->app3 = 0;
+ cur_p->app4 = 0;
ndev->stats.tx_packets++;
ndev->stats.tx_bytes += cur_p->len;
@@ -589,6 +593,29 @@ static void temac_start_xmit_done(struct net_device *ndev)
netif_wake_queue(ndev);
}
+static inline int temac_check_tx_bd_space(struct temac_local *lp, int num_frag)
+{
+ struct cdmac_bd *cur_p;
+ int tail;
+
+ tail = lp->tx_bd_tail;
+ cur_p = &lp->tx_bd_v[tail];
+
+ do {
+ if (cur_p->app0)
+ return NETDEV_TX_BUSY;
+
+ tail++;
+ if (tail >= TX_BD_NUM)
+ tail = 0;
+
+ cur_p = &lp->tx_bd_v[tail];
+ num_frag--;
+ } while (num_frag >= 0);
+
+ return 0;
+}
+
static int temac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
{
struct temac_local *lp = netdev_priv(ndev);
@@ -603,7 +630,7 @@ static int temac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
start_p = lp->tx_bd_p + sizeof(*lp->tx_bd_v) * lp->tx_bd_tail;
cur_p = &lp->tx_bd_v[lp->tx_bd_tail];
- if (cur_p->app0 & STS_CTRL_APP0_CMPLT) {
+ if (temac_check_tx_bd_space(lp, num_frag)) {
if (!netif_queue_stopped(ndev)) {
netif_stop_queue(ndev);
return NETDEV_TX_BUSY;
@@ -613,29 +640,14 @@ static int temac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
cur_p->app0 = 0;
if (skb->ip_summed == CHECKSUM_PARTIAL) {
- const struct iphdr *ip = ip_hdr(skb);
- int length = 0, start = 0, insert = 0;
-
- switch (ip->protocol) {
- case IPPROTO_TCP:
- start = sizeof(struct iphdr) + ETH_HLEN;
- insert = sizeof(struct iphdr) + ETH_HLEN + 16;
- length = ip->tot_len - sizeof(struct iphdr);
- break;
- case IPPROTO_UDP:
- start = sizeof(struct iphdr) + ETH_HLEN;
- insert = sizeof(struct iphdr) + ETH_HLEN + 6;
- length = ip->tot_len - sizeof(struct iphdr);
- break;
- default:
- break;
- }
- cur_p->app1 = ((start << 16) | insert);
- cur_p->app2 = csum_tcpudp_magic(ip->saddr, ip->daddr,
- length, ip->protocol, 0);
- skb->data[insert] = 0;
- skb->data[insert + 1] = 0;
+ unsigned int csum_start_off = skb_transport_offset(skb);
+ unsigned int csum_index_off = csum_start_off + skb->csum_offset;
+
+ cur_p->app0 |= 1; /* TX Checksum Enabled */
+ cur_p->app1 = (csum_start_off << 16) | csum_index_off;
+ cur_p->app2 = 0; /* initial checksum seed */
}
+
cur_p->app0 |= STS_CTRL_APP0_SOP;
cur_p->len = skb_headlen(skb);
cur_p->phys = dma_map_single(ndev->dev.parent, skb->data, skb->len,
@@ -699,6 +711,15 @@ static void ll_temac_recv(struct net_device *ndev)
skb->protocol = eth_type_trans(skb, ndev);
skb->ip_summed = CHECKSUM_NONE;
+ /* if we're doing rx csum offload, set it up */
+ if (((lp->temac_features & TEMAC_FEATURE_RX_CSUM) != 0) &&
+ (skb->protocol == __constant_htons(ETH_P_IP)) &&
+ (skb->len > 64)) {
+
+ skb->csum = cur_p->app3 & 0xFFFF;
+ skb->ip_summed = CHECKSUM_COMPLETE;
+ }
+
netif_rx(skb);
ndev->stats.rx_packets++;
@@ -883,6 +904,7 @@ temac_of_probe(struct of_device *op, const struct of_device_id *match)
struct temac_local *lp;
struct net_device *ndev;
const void *addr;
+ __be32 *p;
int size, rc = 0;
/* Init network device structure */
@@ -926,6 +948,18 @@ temac_of_probe(struct of_device *op, const struct of_device_id *match)
goto nodev;
}
+ /* Setup checksum offload, but default to off if not specified */
+ lp->temac_features = 0;
+ p = (__be32 *)of_get_property(op->dev.of_node, "xlnx,txcsum", NULL);
+ if (p && be32_to_cpu(*p)) {
+ lp->temac_features |= TEMAC_FEATURE_TX_CSUM;
+ /* Can checksum TCP/UDP over IPv4. */
+ ndev->features |= NETIF_F_IP_CSUM;
+ }
+ p = (__be32 *)of_get_property(op->dev.of_node, "xlnx,rxcsum", NULL);
+ if (p && be32_to_cpu(*p))
+ lp->temac_features |= TEMAC_FEATURE_RX_CSUM;
+
/* Find the DMA node, map the DMA registers, and decode the DMA IRQs */
np = of_parse_phandle(op->dev.of_node, "llink-connected", 0);
if (!np) {
--
1.6.2.1
This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.
^ permalink raw reply related
* [PATCH 1/2] net: ll_temac: fix interrupt bug when interrupt 0 is used
From: John Linn @ 2010-05-26 17:29 UTC (permalink / raw)
To: netdev, linuxppc-dev, grant.likely, jwboyer
Cc: john.williams, michal.simek, John Linn, Brian Hill
The code is not checking the interrupt for DMA correctly so that an
interrupt number of 0 will cause a false error.
Signed-off-by: Brian Hill <brian.hill@xilinx.com>
Signed-off-by: John Linn <john.linn@xilinx.com>
---
drivers/net/ll_temac_main.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ll_temac_main.c b/drivers/net/ll_temac_main.c
index fa7620e..0615737 100644
--- a/drivers/net/ll_temac_main.c
+++ b/drivers/net/ll_temac_main.c
@@ -950,7 +950,7 @@ temac_of_probe(struct of_device *op, const struct of_device_id *match)
lp->rx_irq = irq_of_parse_and_map(np, 0);
lp->tx_irq = irq_of_parse_and_map(np, 1);
- if (!lp->rx_irq || !lp->tx_irq) {
+ if ((lp->rx_irq == NO_IRQ) || (lp->tx_irq == NO_IRQ)) {
dev_err(&op->dev, "could not determine irqs\n");
rc = -ENOMEM;
goto nodev;
--
1.6.2.1
This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.
^ permalink raw reply related
* Re: [Linux-ATM-General] RX/close vcc race with solos/atmtcp/usbatm/he
From: Stanislaw Gruszka @ 2010-05-26 18:51 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-atm-general, netdev
In-Reply-To: <1274872584.20576.13579.camel@macbook.infradead.org>
On Wed, May 26, 2010 at 12:16:24PM +0100, David Woodhouse wrote:
> The problem is that the 'find_vcc' functions in these drivers are
> returning a vcc with the ATM_VF_READY bit cleared, because it's already
> in the process of being destroyed. If we fix that simple oversight,
> there's still a race condition because the socket can still be closed
> (and completely freed, afaict) between our call to find_vcc() and our
> call to vcc->push() in the RX tasklet.
>
> Here's a patch for solos-pci which should fix it. We prevent the race by
> making the dev->ops->close() function wait for the RX tasklet to
> complete, so it can't still be using the vcc in question.
I do not think this is problem with usbatm. We use custom vcc_list with
custom usbatm_vcc_data entries. We remove entry on atmdev_ops->close()
within tasklet_disable()/tasklet_enable() section. After that
rx tasklet will not find usbatm_vcc_data for corresponding vpi, vpi.
Stanislaw
^ permalink raw reply
* Re: [PATCH 1/17] net/rds: Add missing mutex_unlock
From: Andy Grover @ 2010-05-26 17:55 UTC (permalink / raw)
To: Julia Lawall
Cc: David S. Miller, rds-devel, netdev, linux-kernel, kernel-janitors,
Zach Brown
In-Reply-To: <Pine.LNX.4.64.1005261754030.23743@ask.diku.dk>
Reviewed-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Andy Grover <andy.grover@oracle.com>
-- Andy
On 05/26/2010 08:54 AM, Julia Lawall wrote:
> From: Julia Lawall<julia@diku.dk>
>
> Add a mutex_unlock missing on the error path. In each case, whenever the
> label out is reached from elsewhere in the function, mutex is not locked.
>
> The semantic match that finds this problem is as follows:
> (http://coccinelle.lip6.fr/)
>
> //<smpl>
> @@
> expression E1;
> @@
>
> * mutex_lock(E1);
> <+... when != E1
> if (...) {
> ... when != E1
> * return ...;
> }
> ...+>
> * mutex_unlock(E1);
> //</smpl>
>
> Signed-off-by: Julia Lawall<julia@diku.dk>
>
> ---
> net/rds/ib_cm.c | 1 +
> net/rds/iw_cm.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
> index 10ed0d5..f688327 100644
> --- a/net/rds/ib_cm.c
> +++ b/net/rds/ib_cm.c
> @@ -475,6 +475,7 @@ int rds_ib_cm_handle_connect(struct rdma_cm_id *cm_id,
> err = rds_ib_setup_qp(conn);
> if (err) {
> rds_ib_conn_error(conn, "rds_ib_setup_qp failed (%d)\n", err);
> + mutex_unlock(&conn->c_cm_lock);
> goto out;
> }
>
> diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
> index a9d951b..b5dd6ac 100644
> --- a/net/rds/iw_cm.c
> +++ b/net/rds/iw_cm.c
> @@ -452,6 +452,7 @@ int rds_iw_cm_handle_connect(struct rdma_cm_id *cm_id,
> err = rds_iw_setup_qp(conn);
> if (err) {
> rds_iw_conn_error(conn, "rds_iw_setup_qp failed (%d)\n", err);
> + mutex_unlock(&conn->c_cm_lock);
> goto out;
> }
>
^ permalink raw reply
* Re: [net-next PATCH] be2net: Adding PCI SRIOV support
From: Venugopal Busireddy @ 2010-05-26 17:55 UTC (permalink / raw)
To: netdev
In-Reply-To: <6f264a7a-e5da-494a-a24d-1578ca422807@VA3EHSMHS016.ehs.local>
David,
>> From: Sarveshwar Bandi <sarveshwarb@serverengines.com>
>> Date: Wed, 31 Mar 2010 18:26:12 +0530
>> - Patch adds support to enable PCI SRIOV in the driver and changes to
handle \
>> initialization of PCI virtual functions.
>> - Function handler to change mac addresses for VF from its
corresponding PF.
>>
>> Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
>Applied, thanks.
I checked the 2.6.34 kernel, and could not see this patch applied in
it. In which kernel would this patch be available? Also, if I were to
apply this patch, which kernel version would this patch require?
Thanks,
Venu
^ permalink raw reply
* Re: [PATCH 8/17] net/caif: Add missing spin_unlock
From: Sjur Brændeland @ 2010-05-26 18:13 UTC (permalink / raw)
To: Julia Lawall; +Cc: David S. Miller, netdev, linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005261756150.23743@ask.diku.dk>
Julia Lawall <julia@diku.dk> wrote:
> From: Julia Lawall <julia@diku.dk>
>
> Add a spin_unlock missing on the error path. The spin lock is used in a
> balanced way elsewhere in the file.
This bug is already fixed an part of net-2.6 in
"[PATCH net-2.6 5/6] caif: Bugfix - missing spin_unlock".
Regards
Sjur
^ permalink raw reply
* Re: [patch] caif: unlock on error path in cfserl_receive()
From: Sjur Brændeland @ 2010-05-26 18:18 UTC (permalink / raw)
To: Dan Carpenter; +Cc: Sjur Braendeland, David S. Miller, netdev, kernel-janitors
In-Reply-To: <20100526151648.GA15439@bicker>
Hi Dan.
Dan Carpenter <error27@gmail.com> wrote:
> There was an spin_unlock missing on the error path. The spin_lock was
> tucked in with the declarations so it was hard to spot. I added a new
> line.
>
> Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Braendeland
Well spotted! Thanks for reviewing and fixing this.
Regards
Sjur
^ permalink raw reply
* RE: NULL Pointer Deference: NFS & Telnet
From: Arce, Abraham @ 2010-05-26 20:19 UTC (permalink / raw)
To: Eric Dumazet, David Miller
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org, linux-omap@vger.kernel.org,
tony@atomide.com, Shilimkar, Santosh, Ha, Tristram
In-Reply-To: <1274851741.25136.16.camel@edumazet-laptop>
Thanks Eric, David,
[..]
> > > > - if (skb_shinfo(skb)->nr_frags) {
> > > > + if (skb_shinfo(skb)->nr_frags && skb_has_frags(skb)) {
> > > > int i;
> > > > for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
> > > > put_page(skb_shinfo(skb)->frags[i].page);
> > >
> > > skb_shinfo(skb)->nr_frags counts the number of entries contained
> > > in the skb_shinfo(skb)->frags[] array.
> > >
> > > This has nothing to do with the frag list pointer,
> > > skb_shinfo(skb)->frag_list, which is what skb_has_frags()
> > > tests.
> > >
> > > You've got some kind of memory corruption going on and it
> > > appears to have nothing to do with the code paths you're
> > > playing with here.
> >
> > Do you have any recommendation on debugging technique/tool for this memory
> corruption issue?
[..]
> It seems quite strange. You have a skb->nr_frags > 0 value, but a
> frags[i].page = 0 value
>
> You might add following function :
>
> shinfo_check(struct sk_buff *skb)
> {
> struct skb_shared_info *shinfo = skb_shinfo(skb);
> int i;
>
> WARN_ON(shinfo->nr_frags >= MAX_SKB_FRAGS);
> for (i = 0; i < shinfo->nr_frags; i++)
> WARN_ON(!shinfo->frags[i].page);
> }
>
> And call it from various points, to check who corrupts your skb.
By increasing the allocation length of our rx skbuff the corruption issue is fixed... I have increased it by 2... Were we writing outside our boundaries of skb data?
Please let me know about this approach...
diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
index b4fb07a..6da81e1 100644
--- a/drivers/net/ks8851.c
+++ b/drivers/net/ks8851.c
@@ -504,7 +504,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
ks->rc_rxqcr | RXQCR_SDA | RXQCR_ADRFE);
if (rxlen > 0) {
- skb = netdev_alloc_skb(ks->netdev, rxlen + 2 + 8);
+ skb = netdev_alloc_skb(ks->netdev, rxlen + 4 + 8);
if (!skb) {
Best Regards
Abraham
^ permalink raw reply related
* RE: NULL Pointer Deference: NFS & Telnet
From: Eric Dumazet @ 2010-05-26 20:48 UTC (permalink / raw)
To: Arce, Abraham
Cc: David Miller, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-omap@vger.kernel.org, tony@atomide.com, Shilimkar, Santosh,
Ha, Tristram
In-Reply-To: <27F9C60D11D683428E133F85D2BB4A53043E3EE6A3@dlee03.ent.ti.com>
Le mercredi 26 mai 2010 à 15:19 -0500, Arce, Abraham a écrit :
> By increasing the allocation length of our rx skbuff the corruption issue is fixed... I have increased it by 2... Were we writing outside our boundaries of skb data?
>
> Please let me know about this approach...
>
> diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
> index b4fb07a..6da81e1 100644
> --- a/drivers/net/ks8851.c
> +++ b/drivers/net/ks8851.c
> @@ -504,7 +504,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
> ks->rc_rxqcr | RXQCR_SDA | RXQCR_ADRFE);
>
> if (rxlen > 0) {
> - skb = netdev_alloc_skb(ks->netdev, rxlen + 2 + 8);
> + skb = netdev_alloc_skb(ks->netdev, rxlen + 4 + 8);
> if (!skb) {
>
> Best Regards
> Abraham
>
Yes that makes sense, nr_frag is right after the packet (padded to L1
cache size)
But please do the correct allocation ?
Also, we dont need FCS ?
diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
index b4fb07a..05bd312 100644
--- a/drivers/net/ks8851.c
+++ b/drivers/net/ks8851.c
@@ -503,8 +503,9 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
ks8851_wrreg16(ks, KS_RXQCR,
ks->rc_rxqcr | RXQCR_SDA | RXQCR_ADRFE);
- if (rxlen > 0) {
- skb = netdev_alloc_skb(ks->netdev, rxlen + 2 + 8);
+ if (rxlen > 4) {
+ rxlen -= 4;
+ skb = netdev_alloc_skb(ks->netdev, 2 + 8 + ALIGN(rxlen, 4));
if (!skb) {
/* todo - dump frame and move on */
}
@@ -513,7 +514,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
* for the status header and 4 bytes of garbage */
skb_reserve(skb, 2 + 4 + 4);
- rxpkt = skb_put(skb, rxlen - 4) - 8;
+ rxpkt = skb_put(skb, rxlen) - 8;
/* align the packet length to 4 bytes, and add 4 bytes
* as we're getting the rx status header as well */
@@ -526,7 +527,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
netif_rx(skb);
ks->netdev->stats.rx_packets++;
- ks->netdev->stats.rx_bytes += rxlen - 4;
+ ks->netdev->stats.rx_bytes += rxlen;
}
ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr);
^ permalink raw reply related
* Re: [PATCH] tcp: Socket option to set congestion window
From: David Miller @ 2010-05-26 21:08 UTC (permalink / raw)
To: andi; +Cc: therbert, shemminger, netdev, ycheng
In-Reply-To: <87fx1e1sat.fsf@basil.nowhere.org>
From: Andi Kleen <andi@firstfloor.org>
Date: Wed, 26 May 2010 19:33:46 +0200
> Tom Herbert <therbert@google.com> writes:
>>>
>> Thanks to NAT, the concept of a network path or even host specific
>> path is a weakened concept. On the Internet this may be a path
>> characteristic per client, which unfortunately has no visibility in
>> the kernel other than per connection state. When a single IP address
>> may have thousands of hosts behind it, caching TCP parameters for that
>> IP address is implicitly doing a huge aggregation-- probably dicey...
>
> Yes all of Saudi-Arabia used to be (is?) one IP address...
>
> Caching anything per IP is bogus.
And letting the applications choose the CWND is better?!?!
Every single proposal being mentioned in this thread has huge,
obvious, downsides.
Just because there are some cases of people NAT'ing many machines
behind one IP address doesn't mean we kill performance for the rest of
the world (the majority of internet usage btw) by not caching TCP path
characteristics per IP address.
And just because applications open up many sockets to get better TCP
latency and work around per-connection CWND limits DOES NOT mean we
let the application increase the initial CWND so it can abuse this
EVEN MORE and cause EVEN BIGGER problems.
If people have real, sane, ideas about how to attack this problem I am
all ears. But everything proposed here so far is complete and utter
crap.
^ permalink raw reply
* Re: [PATCH] tcp: Socket option to set congestion window
From: Andi Kleen @ 2010-05-26 21:27 UTC (permalink / raw)
To: David Miller; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526.140818.245406045.davem@davemloft.net>
> > Yes all of Saudi-Arabia used to be (is?) one IP address...
> >
> > Caching anything per IP is bogus.
>
> And letting the applications choose the CWND is better?!?!
No I actually agree with you on that. Just saying that
anything that relies on per IP caching is bad too.
As I understand the idea was that the application knows
what flows belong to a single peer and wants to have
a single cwnd for all of those. Perhaps there would
be a way to generalize that to tell it to the kernel.
e.g. have a "peer id" that is known by applications
and the kernel could manage cwnds shared between connections
associated with the same peer id?
Just an idea, I admit I haven't thought very deeply
about this. Feel free to poke holes into it.
-Andi
^ permalink raw reply
* Re: [PATCH] tcp: Socket option to set congestion window
From: David Miller @ 2010-05-26 22:10 UTC (permalink / raw)
To: andi; +Cc: therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526212745.GC24615@basil.fritz.box>
From: Andi Kleen <andi@firstfloor.org>
Date: Wed, 26 May 2010 23:27:45 +0200
> As I understand the idea was that the application knows
> what flows belong to a single peer and wants to have
> a single cwnd for all of those. Perhaps there would
> be a way to generalize that to tell it to the kernel.
>
> e.g. have a "peer id" that is known by applications
> and the kernel could manage cwnds shared between connections
> associated with the same peer id?
>
> Just an idea, I admit I haven't thought very deeply
> about this. Feel free to poke holes into it.
Yes, a CWND "domain" that can include multiple sockets is
something that might gain some traction.
The "domain" could just simply be the tuple {process,peer-IP}
^ permalink raw reply
* Re: ipv6: Add GSO support on forwarding path
From: Herbert Xu @ 2010-05-26 22:16 UTC (permalink / raw)
To: Ralf Baechle; +Cc: David S. Miller, netdev
In-Reply-To: <20100526152019.GA11985@linux-mips.org>
On Wed, May 26, 2010 at 04:20:19PM +0100, Ralf Baechle wrote:
>
> I tested this on top of a 2.6.34 release kernel and asked the user
> experiencing the problem to re-test and the problem still persists.
OK, it looks like my patch doesn't work because the transport
header isn't set on the forwarding path. Here's an updated patch:
ipv6: Add GSO support on forwarding path
Currently we disallow GSO packets on the IPv6 forward path.
This patch fixes this.
Note that I discovered that our existing GSO MTU checks (e.g.,
IPv4 forwarding) are buggy in that they skip the check altogether,
hen they really should be checking gso_size instead.
I have also been lazy here in that I haven't bothered to segment
the GSO packet by hand before generating an ICMP message. Someone
should add that to be 100% correct.
Reported-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7cdfb4d..6ec5238 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2117,6 +2117,13 @@ static inline int skb_is_gso(const struct sk_buff *skb)
return skb_shinfo(skb)->gso_size;
}
+static inline int skb_gso_len(const struct sk_buff *skb)
+{
+ return skb_is_gso(skb) ?
+ skb_shinfo(skb)->gso_size +
+ (skb->csum_start - skb_headroom(skb)) : skb->len;
+}
+
static inline int skb_is_gso_v6(const struct sk_buff *skb)
{
return skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index cd963f6..8904767 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -507,7 +507,7 @@ int ip6_forward(struct sk_buff *skb)
if (mtu < IPV6_MIN_MTU)
mtu = IPV6_MIN_MTU;
- if (skb->len > mtu) {
+ if (skb_gso_len(skb) > mtu) {
/* Again, force OUTPUT device used as source address */
skb->dev = dst->dev;
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply related
* Re: [PATCH] tcp: Socket option to set congestion window
From: Rick Jones @ 2010-05-26 22:29 UTC (permalink / raw)
To: andi; +Cc: David Miller, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526.151014.70204145.davem@davemloft.net>
David Miller wrote:
> From: Andi Kleen <andi@firstfloor.org>
> Date: Wed, 26 May 2010 23:27:45 +0200
>
>>As I understand the idea was that the application knows
>>what flows belong to a single peer and wants to have
>>a single cwnd for all of those. Perhaps there would
>>be a way to generalize that to tell it to the kernel.
>>
>>e.g. have a "peer id" that is known by applications
>>and the kernel could manage cwnds shared between connections
>>associated with the same peer id?
Then all the app does is say "I'am in peer id foo" right? Is that really that
much different from making the setsockopt() call for a different cwnd value?
Particularly if say the limit were not a global sysctl, but based on the
existing per-route value (perhaps expanded to have a min, max and default?)
>>Just an idea, I admit I haven't thought very deeply
>>about this. Feel free to poke holes into it.
>
> Yes, a CWND "domain" that can include multiple sockets is
> something that might gain some traction.
>
> The "domain" could just simply be the tuple {process,peer-IP}
Name or PID?
rick jones
^ permalink raw reply
* Re: [PATCH] tcp: Socket option to set congestion window
From: Hagen Paul Pfeifer @ 2010-05-26 23:15 UTC (permalink / raw)
To: David Miller; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526.151014.70204145.davem@davemloft.net>
* David Miller | 2010-05-26 15:10:14 [-0700]:
>From: Andi Kleen <andi@firstfloor.org>
>Date: Wed, 26 May 2010 23:27:45 +0200
>
>> As I understand the idea was that the application knows
>> what flows belong to a single peer and wants to have
>> a single cwnd for all of those. Perhaps there would
>> be a way to generalize that to tell it to the kernel.
>>
>> e.g. have a "peer id" that is known by applications
>> and the kernel could manage cwnds shared between connections
>> associated with the same peer id?
>>
>> Just an idea, I admit I haven't thought very deeply
>> about this. Feel free to poke holes into it.
>
>Yes, a CWND "domain" that can include multiple sockets is
>something that might gain some traction.
>
>The "domain" could just simply be the tuple {process,peer-IP}
This discussion - as once a month - is about fairness. But if we define a
domain as a tuple of {process,peer-IP} the fairness is applied only for the
last link before "peer-IP".
But fairness applies to *all* links in between! For example: consider a
dumpbell scenario:
+------+ +------+
| | | |
| H1 | | H3 |
| | | |
+------+ +------+
10MB \ +------+ +------+ / 10MB
\ | | 1MB/s | | /
> | R1 |------------| R2 |<
/ | | | | \
10MB / +------+ +------+ \ 10MB
+------+ +------+
| | | |
| H2 | | H4 |
| | | |
+------+ +------+
How can a domain defined as {process,peer-IP} fair to the 1MB bottleneck link?
It is not fair! And it is also not fair to open n simultaneous streams and so
on. This problem is discussed in several RFC's.
.02
Best regards, Hagen
--
Hagen Paul Pfeifer <hagen@jauu.net> || http://jauu.net/
Telephone: +49 174 5455209 || Key Id: 0x98350C22
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22
^ permalink raw reply
* Re: [REGRESSION,BISECTED] MIPv6 support broken by f4f914b58019f0
From: Brian Haley @ 2010-05-27 0:48 UTC (permalink / raw)
To: Arnaud Ebalard
Cc: David Miller,
YOSHIFUJI Hideaki / 吉藤英明, Jiri Olsa,
Scott Otto, netdev
In-Reply-To: <87zkzmppfg.fsf@small.ssi.corp>
On 05/26/2010 01:01 PM, Arnaud Ebalard wrote:
> Hi,
>
> I just updated my laptop's kernel to 2.6.34 (previously running .33 and
> configured to act as an IPsec/IKE-protected MIPv6 Mobile Node using
> racoon and umip): after rebooting on the new kernel, the transport mode
> SA protecting MIPv6 signaling traffic are missing.
>
> I bisected the issue down to f4f914b58019f0e50d521bbbadfaee260d766f95
> (net: ipv6 bind to device issue) which was added after 2.6.34-rc5:
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index c2438e8..05ebd78 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -815,7 +815,7 @@ struct dst_entry * ip6_route_output(struct net *net, struct sock *sk,
> {
> int flags = 0;
>
> - if (rt6_need_strict(&fl->fl6_dst))
> + if (fl->oif || rt6_need_strict(&fl->fl6_dst))
> flags |= RT6_LOOKUP_F_IFACE;
Can you see if fl->oif is at least a sane value here? Maybe there's some
partially un-initialized flowi getting passed-in, a quick source code check
didn't find anything obvious.
The other thought is that it's the tunnel code calling it, as it's going
to set 'oif' (actually it caches a whole flowi) from the tunnel parms ifindex/link
value. It could have been setting it forever, but ip6_route_output() just
never enforced it until now.
My $.02.
-Brian
^ permalink raw reply
* [PATCH] fs_enet: Adjust BDs after tx error
From: Mark Ware @ 2010-05-27 1:12 UTC (permalink / raw)
To: Linuxppc-dev Development, netdev; +Cc: Vitaly Bordug
This patch fixes an occasional transmit lockup in the mac-fcc which
occurs after a tx error. The test scenario had the local port set
to autoneg and the other end fixed at 100FD, resulting in a large
number of late collisions.
According to the MPC8280RM 30.10.1.3 (also 8272RM 29.10.1.3), after
a tx error occurs, TBPTR may sometimes point beyond BDs still marked
as ready. This patch walks back through the BDs and points TBPTR to
the earliest one marked as ready.
Tested on a custom board with a MPC8280.
Signed-off-by: Mark Ware <mware@elphinstone.net>
---
drivers/net/fs_enet/mac-fcc.c | 49 ++++++++++++++++++++++++++++++++++++-----
1 files changed, 43 insertions(+), 6 deletions(-)
diff --git a/drivers/net/fs_enet/mac-fcc.c b/drivers/net/fs_enet/mac-fcc.c
index 22e5a84..6023f15 100644
--- a/drivers/net/fs_enet/mac-fcc.c
+++ b/drivers/net/fs_enet/mac-fcc.c
@@ -503,17 +503,54 @@ static int get_regs_len(struct net_device *dev)
}
/* Some transmit errors cause the transmitter to shut
- * down. We now issue a restart transmit. Since the
- * errors close the BD and update the pointers, the restart
- * _should_ pick up without having to reset any of our
- * pointers either. Also, To workaround 8260 device erratum
- * CPM37, we must disable and then re-enable the transmitter
- * following a Late Collision, Underrun, or Retry Limit error.
+ * down. We now issue a restart transmit.
+ * Also, to workaround 8260 device erratum CPM37, we must
+ * disable and then re-enable the transmitterfollowing a
+ * Late Collision, Underrun, or Retry Limit error.
+ * In addition, tbptr may point beyond BDs beyond still marked
+ * as ready due to internal pipelining, so we need to look back
+ * through the BDs and adjust tbptr to point to the last BD
+ * marked as ready. This may result in some buffers being
+ * retransmitted.
*/
static void tx_restart(struct net_device *dev)
{
struct fs_enet_private *fep = netdev_priv(dev);
fcc_t __iomem *fccp = fep->fcc.fccp;
+ const struct fs_platform_info *fpi = fep->fpi;
+ fcc_enet_t __iomem *ep = fep->fcc.ep;
+ cbd_t __iomem *curr_tbptr;
+ cbd_t __iomem *recheck_bd;
+ cbd_t __iomem *prev_bd;
+ cbd_t __iomem *last_tx_bd;
+
+ last_tx_bd = fep->tx_bd_base + (fpi->tx_ring * sizeof(cbd_t));
+
+ /* get the current bd held in TBPTR and scan back from this point */
+ recheck_bd = curr_tbptr = (cbd_t __iomem *)
+ ((R32(ep, fen_genfcc.fcc_tbptr) - fep->ring_mem_addr) +
+ fep->ring_base);
+
+ prev_bd = (recheck_bd == fep->tx_bd_base) ? last_tx_bd : recheck_bd - 1;
+
+ /* Move through the bds in reverse, look for the earliest buffer
+ * that is not ready. Adjust TBPTR to the following buffer */
+ while ((CBDR_SC(prev_bd) & BD_ENET_TX_READY) != 0) {
+ /* Go back one buffer */
+ recheck_bd = prev_bd;
+
+ /* update the previous buffer */
+ prev_bd = (prev_bd == fep->tx_bd_base) ? last_tx_bd : prev_bd - 1;
+
+ /* We should never see all bds marked as ready, check anyway */
+ if (recheck_bd == curr_tbptr)
+ break;
+ }
+ /* Now update the TBPTR and dirty flag to the current buffer */
+ W32(ep, fen_genfcc.fcc_tbptr,
+ (uint) (((void *)recheck_bd - fep->ring_base) +
+ fep->ring_mem_addr));
+ fep->dirty_tx = recheck_bd;
C32(fccp, fcc_gfmr, FCC_GFMR_ENT);
udelay(10);
--
1.5.6.5
^ permalink raw reply related
* issue about virtio-net driver to suppoprt vhost mergeable buffer with zero-copy to support PS mode
From: Xin, Xiaohui @ 2010-05-27 1:21 UTC (permalink / raw)
To: mst@redhat.com
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, Herbert Xu
Michael,
I'm now looking into the vhost mergeable buffer, and I tried to use it to support PS mode with zero-copy. And I found an issue there that I have to modify the guest virito-net driver.
When guest virtio-net driver submits mergeable buffers, it submits multiple pages outside. In zero-copy case, vhost cannot know which page is used to put header, and which page is used to put payload. Then vhost can only reserves 12 bytes for each page. That means, the page_offset of the payload DMAed into the guest buffer is always 12 bytes. But guest virtio-net driver always use offset 0 to put the data (See receive_mergeable()). That's where the zero-copy use mergeable buffer must modify.
Have I missed something here? And how do you think about it?
Thanks
Xiaohui
^ permalink raw reply
* UDP Fragmentation and DF bit..
From: Vincent, Pradeep @ 2010-05-27 1:45 UTC (permalink / raw)
To: netdev@vger.kernel.org
After running into issues with UDP in multi-MTU network environment, I
started digging through the code and found somewhat inconsistent behavior
(if I read the code right)
OMan 7 ip¹ declares that ³The system-wide default is controlled by the
ip_no_pmtu_disc sysctl for SOCK_STREAM sockets, and disabled on all
others.² which led me to think ODF¹ bit will not be set for UDP packets.
But..
The code in ip_output.c seems to do the following,
1. If packet size <= current PMTU, then set the DF=1 and send the packet
out.
2. If packet size > current PMTU, then set DF=0 and send the packet out
after fragmentation.
In a network environment where MTU-big and MTU-small co-exist (and have
router¹s fragmentation turned off in favor of PMTU discovery), UDP packets
that are > MTU-small and < MTU-big find the PMTU effectively but UDP packets
that are > MTU-big get dropped. This looks like inconsistent behavior to me
and doesn¹t seem to match the advertised behavior.
Is there a reason why PMTU support for UDP is somewhat inconsistent ?
Is there a reason ODF¹ bit cannot be set on fragmented packets on UDP
transmission ? I couldn¹t find anything in RFC for IP protocol that
prohibited DF bit on fragmented packets. Did I miss
something here ?
Would it be reasonable if PMTU discovery is performed (DF bit set +
appropriate icmp logic) even for locally fragmented packets ? I think this
will be a great help for UDP users that don¹t have PMTU handling logic in
the application (most udp applications belong to this category in my
experience). Thoughts ?
Thanks,
- Pradeep Vincent
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox