* [PATCH] virtio_net: indicate oom when addbuf returns failure
From: Rusty Russell @ 2010-06-04 0:58 UTC (permalink / raw)
To: stable; +Cc: Bruce Rogers, Michael S. Tsirkin, Herbert Xu, netdev
This patch is a subset of a patch that is already upstream, but this portion
is useful in earlier releases.
Please consider for the 2.6.32 and 2.6.33 stable trees.
If the add_buf operation fails, indicate failure to the caller.
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -318,6 +318,7 @@ static bool try_fill_recv_maxbufs(struct
 			skb_unlink(skb, &vi->recv);
 			trim_pages(vi, skb);
 			kfree_skb(skb);
+			oom = true;
 			break;
 		}
 		vi->num++;
@@ -368,6 +369,7 @@ static bool try_fill_recv(struct virtnet
 		if (err < 0) {
 			skb_unlink(skb, &vi->recv);
 			kfree_skb(skb);
+			oom = true;
 			break;
 		}
 		vi->num++;
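
For readers less familiar with the driver, here is a minimal user-space sketch of the idea in this patch: a fill loop that remembers a failed add and reports it to its caller. Everything named toy_* is hypothetical, and the fixed-capacity ring and the max_bufs limit are assumptions made for illustration. None of this is the actual virtio_net code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the receive queue: a ring with a fixed
 * number of slots. */
struct toy_ring {
	size_t capacity;	/* total slots in the ring */
	size_t used;		/* slots currently holding a buffer */
};

/* Mimics an add_buf-style call: 0 on success, negative when no slot
 * is available. */
static int toy_add_buf(struct toy_ring *ring)
{
	assert(ring != NULL);
	if (ring->used >= ring->capacity)
		return -1;
	ring->used++;
	return 0;
}

/* Try to post up to max_bufs buffers.  Returns false if posting
 * stopped early because toy_add_buf() failed, so the caller can tell
 * a short fill from a complete one. */
static bool toy_try_fill_recv(struct toy_ring *ring, size_t max_bufs,
			      size_t *posted)
{
	bool oom = false;

	*posted = 0;
	while (*posted < max_bufs) {
		if (toy_add_buf(ring) < 0) {
			oom = true;	/* analogous to the line added above */
			break;
		}
		(*posted)++;
	}
	return !oom;
}
```

Without the oom assignment, the sketch would return true even after a failed add, and the caller would have no reason to retry.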
^ permalink raw reply
* [PATCH] KVM: add schedule check to napi_enable call
From: Bruce Rogers @ 2010-06-04 0:48 UTC (permalink / raw)
To: netdev; +Cc: rusty
Please consider this patch for the 2.6.32, 2.6.33, and 2.6.34 stable trees
as well as current development trees. (I've only tested on 2.6.32, however.)
virtio_net: Add schedule check to napi_enable call
Under harsh testing conditions, including low memory, the guest would
stop receiving packets. With this patch applied we no longer see any
problems in the driver while performing these tests for extended periods
of time.
Make sure napi is scheduled subsequent to each napi_enable.
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Olaf Kirch <okir@suse.de>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -388,6 +388,20 @@ static void skb_recv_done(struct virtque
 	}
 }
 
+static void virtnet_napi_enable(struct virtnet_info *vi)
+{
+	napi_enable(&vi->napi);
+
+	/* If all buffers were filled by other side before we napi_enabled, we
+	 * won't get another interrupt, so process any outstanding packets
+	 * now. virtnet_poll wants re-enable the queue, so we disable here.
+	 * We synchronize against interrupts via NAPI_STATE_SCHED */
+	if (napi_schedule_prep(&vi->napi)) {
+		vi->rvq->vq_ops->disable_cb(vi->rvq);
+		__napi_schedule(&vi->napi);
+	}
+}
+
 static void refill_work(struct work_struct *work)
 {
 	struct virtnet_info *vi;
@@ -397,7 +411,7 @@ static void refill_work(struct work_stru
 	napi_disable(&vi->napi);
 	try_fill_recv(vi, GFP_KERNEL);
 	still_empty = (vi->num == 0);
-	napi_enable(&vi->napi);
+	virtnet_napi_enable(vi);
 
 	/* In theory, this can happen: if we don't get any buffers in
 	 * we will *never* try to fill again. */
@@ -589,16 +603,7 @@ static int virtnet_open(struct net_devic
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 
-	napi_enable(&vi->napi);
-
-	/* If all buffers were filled by other side before we napi_enabled, we
-	 * won't get another interrupt, so process any outstanding packets
-	 * now. virtnet_poll wants re-enable the queue, so we disable here.
-	 * We synchronize against interrupts via NAPI_STATE_SCHED */
-	if (napi_schedule_prep(&vi->napi)) {
-		vi->rvq->vq_ops->disable_cb(vi->rvq);
-		__napi_schedule(&vi->napi);
-	}
+	virtnet_napi_enable(vi);
 
 	return 0;
 }
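
The comment's remark about synchronizing via NAPI_STATE_SCHED can be pictured with a small user-space sketch. It assumes a single "scheduled" bit that is claimed with an atomic test-and-set, so that whichever path (interrupt or enable) gets there first queues the poll and the other one backs off. All toy_* names are hypothetical and this is not the kernel's NAPI implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a NAPI context. */
struct toy_napi {
	atomic_bool sched;	/* stands in for NAPI_STATE_SCHED */
	int polls_queued;	/* how many polls have been queued */
	bool cb_enabled;	/* may the device still interrupt us? */
};

static void toy_napi_init(struct toy_napi *n)
{
	atomic_init(&n->sched, false);
	n->polls_queued = 0;
	n->cb_enabled = true;
}

/* True only for the one caller that flips the bit from clear to set. */
static bool toy_schedule_prep(struct toy_napi *n)
{
	return !atomic_exchange(&n->sched, true);
}

/* Shared by the "interrupt" path and the "just enabled" path: the
 * winner disables further callbacks and queues exactly one poll. */
static void toy_kick(struct toy_napi *n)
{
	if (toy_schedule_prep(n)) {
		n->cb_enabled = false;
		n->polls_queued++;
	}
}

/* End of a poll: allow callbacks again, then release the bit. */
static void toy_poll_complete(struct toy_napi *n)
{
	assert(atomic_load(&n->sched));
	n->cb_enabled = true;
	atomic_store(&n->sched, false);
}
```

In this model, calling toy_kick() right after enabling is harmless if an interrupt already claimed the bit, and it queues the one missing poll if no interrupt is going to arrive.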
^ permalink raw reply
* Re: 200 millisecond timeouts in TCP
From: Hagen Paul Pfeifer @ 2010-06-04 0:45 UTC (permalink / raw)
To: Ivan Novick; +Cc: netdev
In-Reply-To: <AANLkTimP3IJzLUew7avKjDq7M2imLi7QHGBwyg_2zrPo@mail.gmail.com>
* Ivan Novick | 2010-06-03 17:11:07 [-0700]:
>resending tcpdump output as attachments.
Thank you, but where did you see a spurious retransmission? I looked over the
trace and didn't find any. (it is late here, so most likely I missed it ;-)
HGN
^ permalink raw reply
* Re: 200 millisecond timeouts in TCP
From: Ivan Novick @ 2010-06-04 0:11 UTC (permalink / raw)
To: Hagen Paul Pfeifer; +Cc: netdev
In-Reply-To: <20100603231002.GG6914@nuttenaction>
[-- Attachment #1: Type: text/plain, Size: 666 bytes --]
On Thu, Jun 3, 2010 at 4:10 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> * Ivan Novick | 2010-06-03 15:37:24 [-0700]:
>
>>Using tcpdump and systemtap I am seeing that sometimes retransmission
>>of data is sent after waiting 200 milliseconds. However sometimes
>>retransmissions happen quicker.
>
> Quicker than 200ms? Conservatively, the minimum TCP RTO should be 1s (RFC 2988);
> Linux differs from this default and defines the minimum RTO as 200ms:
>
> #define TCP_RTO_MIN ((unsigned)(HZ/5))
>
> Can you post the tcpdump traces where the relevant retransmission is recognizable?
resending tcpdump output as attachments.
Cheers,
Ivan Novick
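
As a quick check of the arithmetic behind the quoted macro, the sketch below recomputes HZ/5 with HZ passed as a parameter and converts the result to milliseconds. The helper names are made up for this example, and the HZ values used in the note after the code (100, 250, 1000) are only sample inputs, not a statement about how these machines were configured.

```c
#include <assert.h>

/* Minimum RTO in jiffies, mirroring the HZ/5 in the quoted macro. */
static unsigned int rto_min_jiffies(unsigned int hz)
{
	return hz / 5;
}

/* Convert a jiffies count to milliseconds for a given tick rate. */
static unsigned int jiffies_to_ms(unsigned int jiffies, unsigned int hz)
{
	assert(hz > 0);
	return (unsigned int)((unsigned long long)jiffies * 1000ULL / hz);
}
```

For hz = 100, 250 and 1000 this gives 20, 50 and 200 jiffies respectively, which is 200 ms in every case.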
[-- Attachment #2: dell1.tcpdump --]
[-- Type: application/octet-stream, Size: 6295 bytes --]
000005 IP dell-s1-1.46799 > dell-s2-1.47500: . 168771588:168787516(15928) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000052 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168749868 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000086 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168754212 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000026 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168757108 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000040 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168762900 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000013 IP dell-s1-1.46799 > dell-s2-1.47500: . 168787516:168803444(15928) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000046 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168768692 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000059 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168775932 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000006 IP dell-s1-1.46799 > dell-s2-1.47500: . 168803444:168817924(14480) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000049 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168781724 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000444 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168786068 win 16200 <nop,nop,timestamp 2077240629 1375977402>
000025 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168804892}>
000020 IP dell-s1-1.46799 > dell-s2-1.47500: . 168817924:168819372(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168806340}>
000006 IP dell-s1-1.46799 > dell-s2-1.47500: . 168819372:168820820(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168809236}>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: . 168820820:168822268(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000002 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168812132}>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: . 168822268:168823716(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000002 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168815028}>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: . 168823716:168825164(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000005 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168817924}>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: . 168825164:168826612(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000412 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168819372}>
000007 IP dell-s1-1.46799 > dell-s2-1.47500: . 168787516:168788964(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168822268}>
000004 IP dell-s1-1.46799 > dell-s2-1.47500: . 168788964:168790412(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168825164}>
000003 IP dell-s1-1.46799 > dell-s2-1.47500: . 168790412:168791860(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168826612}>
000004 IP dell-s1-1.46799 > dell-s2-1.47500: . 168791860:168793308(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000383 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168788964 win 16188 <nop,nop,timestamp 2077240629 1375977403,nop,nop,sack 1 {168801996:168826612}>
000006 IP dell-s1-1.46799 > dell-s2-1.47500: . 168793308:168794756(1448) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168790412 win 16177 <nop,nop,timestamp 2077240629 1375977403,nop,nop,sack 1 {168801996:168826612}>
000004 IP dell-s1-1.46799 > dell-s2-1.47500: . 168794756:168796204(1448) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168793308 win 16154 <nop,nop,timestamp 2077240629 1375977403,nop,nop,sack 1 {168801996:168826612}>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: . 168796204:168797652(1448) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240629>
000132 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168794756 win 16143 <nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1 {168801996:168826612}>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: . 168797652:168799100(1448) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240630>
000018 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168797652 win 16120 <nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1 {168801996:168826612}>
000006 IP dell-s1-1.46799 > dell-s2-1.47500: . 168799100:168800548(1448) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240630>
000198 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168799100 win 16109 <nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1 {168801996:168826612}>
000006 IP dell-s1-1.46799 > dell-s2-1.47500: . 168800548:168801996(1448) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240630>
000024 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168800548 win 16098 <nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1 {168801996:168826612}>
000007 IP dell-s1-1.46799 > dell-s2-1.47500: . 168826612:168828060(1448) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240630>
203979 IP dell-s1-1.46799 > dell-s2-1.47500: . 168800548:168801996(1448) ack 1 win 46 <nop,nop,timestamp 1375977608 2077240630>
000110 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168828060 win 16200 <nop,nop,timestamp 2077240834 1375977404,nop,nop,sack 1 {168800548:168801996}>
[-- Attachment #3: dell2.tcpdump --]
[-- Type: application/octet-stream, Size: 8679 bytes --]
000032 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168746972 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000022 IP dell-s1-1.46799 > dell-s2-1.47500: . 168746972:168748420(1448) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240627>
000023 IP dell-s1-1.46799 > dell-s2-1.47500: . 168748420:168749868(1448) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240627>
000013 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168749868 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000037 IP dell-s1-1.46799 > dell-s2-1.47500: . 168749868:168751316(1448) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240627>
000027 IP dell-s1-1.46799 > dell-s2-1.47500: . 168751316:168754212(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240627>
000021 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168754212 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: . 168754212:168755660(1448) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240627>
000002 IP dell-s1-1.46799 > dell-s2-1.47500: . 168755660:168757108(1448) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000004 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168757108 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000019 IP dell-s1-1.46799 > dell-s2-1.47500: . 168757108:168760004(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000026 IP dell-s1-1.46799 > dell-s2-1.47500: . 168760004:168762900(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000011 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168762900 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000014 IP dell-s1-1.46799 > dell-s2-1.47500: . 168762900:168765796(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000026 IP dell-s1-1.46799 > dell-s2-1.47500: . 168765796:168768692(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000011 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168768692 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000016 IP dell-s1-1.46799 > dell-s2-1.47500: . 168768692:168773036(4344) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000026 IP dell-s1-1.46799 > dell-s2-1.47500: . 168773036:168775932(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000017 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168775932 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000008 IP dell-s1-1.46799 > dell-s2-1.47500: . 168775932:168778828(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000025 IP dell-s1-1.46799 > dell-s2-1.47500: . 168778828:168781724(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000012 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168781724 win 16200 <nop,nop,timestamp 2077240628 1375977402>
000014 IP dell-s1-1.46799 > dell-s2-1.47500: . 168781724:168784620(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000024 IP dell-s1-1.46799 > dell-s2-1.47500: . 168784620:168786068(1448) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000025 IP dell-s1-1.46799 > dell-s2-1.47500: . 168786068:168787516(1448) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000091 IP dell-s1-1.46799 > dell-s2-1.47500: . 168801996:168804892(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000009 IP dell-s1-1.46799 > dell-s2-1.47500: . 168804892:168806340(1448) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000026 IP dell-s1-1.46799 > dell-s2-1.47500: . 168806340:168809236(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000027 IP dell-s1-1.46799 > dell-s2-1.47500: . 168809236:168812132(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000024 IP dell-s1-1.46799 > dell-s2-1.47500: . 168812132:168815028(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000029 IP dell-s1-1.46799 > dell-s2-1.47500: . 168815028:168817924(2896) ack 1 win 46 <nop,nop,timestamp 1375977402 2077240628>
000077 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168786068 win 16200 <nop,nop,timestamp 2077240629 1375977402>
000006 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168804892}>
000002 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168806340}>
000001 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168809236}>
000002 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168812132}>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168815028}>
000001 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168817924}>
000203 IP dell-s1-1.46799 > dell-s2-1.47500: . 168817924:168819372(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000028 IP dell-s1-1.46799 > dell-s2-1.47500: . 168819372:168822268(2896) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000025 IP dell-s1-1.46799 > dell-s2-1.47500: . 168822268:168825164(2896) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000024 IP dell-s1-1.46799 > dell-s2-1.47500: . 168825164:168826612(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000312 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168819372}>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168822268}>
000002 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168825164}>
000002 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200 <nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1 {168801996:168826612}>
000074 IP dell-s1-1.46799 > dell-s2-1.47500: . 168787516:168788964(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000013 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168788964 win 16188 <nop,nop,timestamp 2077240629 1375977403,nop,nop,sack 1 {168801996:168826612}>
000013 IP dell-s1-1.46799 > dell-s2-1.47500: . 168788964:168790412(1448) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000013 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168790412 win 16177 <nop,nop,timestamp 2077240629 1375977403,nop,nop,sack 1 {168801996:168826612}>
000014 IP dell-s1-1.46799 > dell-s2-1.47500: . 168790412:168793308(2896) ack 1 win 46 <nop,nop,timestamp 1375977403 2077240629>
000015 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168793308 win 16154 <nop,nop,timestamp 2077240629 1375977403,nop,nop,sack 1 {168801996:168826612}>
000333 IP dell-s1-1.46799 > dell-s2-1.47500: . 168793308:168794756(1448) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240629>
000016 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168794756 win 16143 <nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1 {168801996:168826612}>
000010 IP dell-s1-1.46799 > dell-s2-1.47500: . 168794756:168797652(2896) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240629>
000013 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168797652 win 16120 <nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1 {168801996:168826612}>
000111 IP dell-s1-1.46799 > dell-s2-1.47500: . 168797652:168799100(1448) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240630>
000021 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168799100 win 16109 <nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1 {168801996:168826612}>
000004 IP dell-s1-1.46799 > dell-s2-1.47500: . 168799100:168800548(1448) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240630>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168800548 win 16098 <nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1 {168801996:168826612}>
000258 IP dell-s1-1.46799 > dell-s2-1.47500: . 168800548:168801996(1448) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240630>
000036 IP dell-s1-1.46799 > dell-s2-1.47500: . 168826612:168828060(1448) ack 1 win 46 <nop,nop,timestamp 1375977404 2077240630>
000030 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168826612 win 16077 <nop,nop,timestamp 2077240630 1375977404>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168828060 win 16066 <nop,nop,timestamp 2077240630 1375977404>
203890 IP dell-s1-1.46799 > dell-s2-1.47500: . 168800548:168801996(1448) ack 1 win 46 <nop,nop,timestamp 1375977608 2077240630>
000042 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168828060 win 16200 <nop,nop,timestamp 2077240834 1375977404,nop,nop,sack 1 {168800548:168801996}>
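
One way to read the last retransmission in both attachments: segment 168800548:168801996 is first sent with TSval 1375977404 and sent again with TSval 1375977608. The sketch below takes the difference of the two values in a wrap-safe way and scales it to milliseconds. The 1 ms per timestamp tick used in the note after the code is an assumption, not something the trace states, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Difference between two 32-bit TCP timestamp values; unsigned
 * arithmetic keeps the result correct across a wrap of the counter. */
static uint32_t tsval_delta(uint32_t earlier, uint32_t later)
{
	return later - earlier;
}

/* Scale a tick count to milliseconds for an assumed tick length. */
static uint64_t ticks_to_ms(uint32_t ticks, uint32_t ms_per_tick)
{
	assert(ms_per_tick > 0);
	return (uint64_t)ticks * ms_per_tick;
}
```

With 1 ms per tick the two TSvals are 204 ms apart. That roughly agrees with the 203979 and 203890 in the first column of the retransmitted lines, assuming that column is a microsecond delta from the previous packet.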
^ permalink raw reply
* Re: 200 millisecond timeouts in TCP
From: Ivan Novick @ 2010-06-04 0:05 UTC (permalink / raw)
To: Hagen Paul Pfeifer; +Cc: netdev
In-Reply-To: <20100603231002.GG6914@nuttenaction>
On Thu, Jun 3, 2010 at 4:10 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> * Ivan Novick | 2010-06-03 15:37:24 [-0700]:
>
>>Using tcpdump and systemtap I am seeing that sometimes retransmission
>>of data is sent after waiting 200 milliseconds. However sometimes
>>retransmissions happen quicker.
>
> Quicker than 200ms? Conservatively, the minimum TCP RTO should be 1s (RFC 2988);
> Linux differs from this default and defines the minimum RTO as 200ms:
>
> #define TCP_RTO_MIN ((unsigned)(HZ/5))
>
> Can you post the tcpdump traces where the relevant retransmission is recognizable?
Here is the tcpdump, below.
Cheers,
Ivan Novick
tcpdump on machine dell-s1-1:
000005 IP dell-s1-1.46799 > dell-s2-1.47500: .
168771588:168787516(15928) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000052 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168749868 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000086 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168754212 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000026 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168757108 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000040 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168762900 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000013 IP dell-s1-1.46799 > dell-s2-1.47500: .
168787516:168803444(15928) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000046 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168768692 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000059 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168775932 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000006 IP dell-s1-1.46799 > dell-s2-1.47500: .
168803444:168817924(14480) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000049 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168781724 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000444 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168786068 win 16200
<nop,nop,timestamp 2077240629 1375977402>
000025 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168804892}>
000020 IP dell-s1-1.46799 > dell-s2-1.47500: .
168817924:168819372(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168806340}>
000006 IP dell-s1-1.46799 > dell-s2-1.47500: .
168819372:168820820(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168809236}>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: .
168820820:168822268(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000002 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168812132}>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: .
168822268:168823716(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000002 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168815028}>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: .
168823716:168825164(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000005 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168817924}>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: .
168825164:168826612(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000412 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168819372}>
000007 IP dell-s1-1.46799 > dell-s2-1.47500: .
168787516:168788964(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168822268}>
000004 IP dell-s1-1.46799 > dell-s2-1.47500: .
168788964:168790412(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168825164}>
000003 IP dell-s1-1.46799 > dell-s2-1.47500: .
168790412:168791860(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168826612}>
000004 IP dell-s1-1.46799 > dell-s2-1.47500: .
168791860:168793308(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000383 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168788964 win 16188
<nop,nop,timestamp 2077240629 1375977403,nop,nop,sack 1
{168801996:168826612}>
000006 IP dell-s1-1.46799 > dell-s2-1.47500: .
168793308:168794756(1448) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168790412 win 16177
<nop,nop,timestamp 2077240629 1375977403,nop,nop,sack 1
{168801996:168826612}>
000004 IP dell-s1-1.46799 > dell-s2-1.47500: .
168794756:168796204(1448) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240629>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168793308 win 16154
<nop,nop,timestamp 2077240629 1375977403,nop,nop,sack 1
{168801996:168826612}>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: .
168796204:168797652(1448) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240629>
000132 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168794756 win 16143
<nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1
{168801996:168826612}>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: .
168797652:168799100(1448) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240630>
000018 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168797652 win 16120
<nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1
{168801996:168826612}>
000006 IP dell-s1-1.46799 > dell-s2-1.47500: .
168799100:168800548(1448) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240630>
000198 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168799100 win 16109
<nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1
{168801996:168826612}>
000006 IP dell-s1-1.46799 > dell-s2-1.47500: .
168800548:168801996(1448) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240630>
000024 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168800548 win 16098
<nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1
{168801996:168826612}>
000007 IP dell-s1-1.46799 > dell-s2-1.47500: .
168826612:168828060(1448) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240630>
203979 IP dell-s1-1.46799 > dell-s2-1.47500: .
168800548:168801996(1448) ack 1 win 46 <nop,nop,timestamp 1375977608
2077240630>
000110 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168828060 win 16200
<nop,nop,timestamp 2077240834 1375977404,nop,nop,sack 1
{168800548:168801996}>
tcpdump on machine dell-s2-1:
000032 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168746972 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000022 IP dell-s1-1.46799 > dell-s2-1.47500: .
168746972:168748420(1448) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240627>
000023 IP dell-s1-1.46799 > dell-s2-1.47500: .
168748420:168749868(1448) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240627>
000013 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168749868 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000037 IP dell-s1-1.46799 > dell-s2-1.47500: .
168749868:168751316(1448) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240627>
000027 IP dell-s1-1.46799 > dell-s2-1.47500: .
168751316:168754212(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240627>
000021 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168754212 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000005 IP dell-s1-1.46799 > dell-s2-1.47500: .
168754212:168755660(1448) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240627>
000002 IP dell-s1-1.46799 > dell-s2-1.47500: .
168755660:168757108(1448) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000004 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168757108 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000019 IP dell-s1-1.46799 > dell-s2-1.47500: .
168757108:168760004(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000026 IP dell-s1-1.46799 > dell-s2-1.47500: .
168760004:168762900(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000011 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168762900 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000014 IP dell-s1-1.46799 > dell-s2-1.47500: .
168762900:168765796(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000026 IP dell-s1-1.46799 > dell-s2-1.47500: .
168765796:168768692(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000011 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168768692 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000016 IP dell-s1-1.46799 > dell-s2-1.47500: .
168768692:168773036(4344) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000026 IP dell-s1-1.46799 > dell-s2-1.47500: .
168773036:168775932(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000017 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168775932 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000008 IP dell-s1-1.46799 > dell-s2-1.47500: .
168775932:168778828(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000025 IP dell-s1-1.46799 > dell-s2-1.47500: .
168778828:168781724(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000012 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168781724 win 16200
<nop,nop,timestamp 2077240628 1375977402>
000014 IP dell-s1-1.46799 > dell-s2-1.47500: .
168781724:168784620(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000024 IP dell-s1-1.46799 > dell-s2-1.47500: .
168784620:168786068(1448) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000025 IP dell-s1-1.46799 > dell-s2-1.47500: .
168786068:168787516(1448) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000091 IP dell-s1-1.46799 > dell-s2-1.47500: .
168801996:168804892(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000009 IP dell-s1-1.46799 > dell-s2-1.47500: .
168804892:168806340(1448) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000026 IP dell-s1-1.46799 > dell-s2-1.47500: .
168806340:168809236(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000027 IP dell-s1-1.46799 > dell-s2-1.47500: .
168809236:168812132(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000024 IP dell-s1-1.46799 > dell-s2-1.47500: .
168812132:168815028(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000029 IP dell-s1-1.46799 > dell-s2-1.47500: .
168815028:168817924(2896) ack 1 win 46 <nop,nop,timestamp 1375977402
2077240628>
000077 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168786068 win 16200
<nop,nop,timestamp 2077240629 1375977402>
000006 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168804892}>
000002 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168806340}>
000001 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168809236}>
000002 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168812132}>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168815028}>
000001 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168817924}>
000203 IP dell-s1-1.46799 > dell-s2-1.47500: .
168817924:168819372(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000028 IP dell-s1-1.46799 > dell-s2-1.47500: .
168819372:168822268(2896) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000025 IP dell-s1-1.46799 > dell-s2-1.47500: .
168822268:168825164(2896) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000024 IP dell-s1-1.46799 > dell-s2-1.47500: .
168825164:168826612(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000312 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168819372}>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168822268}>
000002 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168825164}>
000002 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168787516 win 16200
<nop,nop,timestamp 2077240629 1375977402,nop,nop,sack 1
{168801996:168826612}>
000074 IP dell-s1-1.46799 > dell-s2-1.47500: .
168787516:168788964(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000013 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168788964 win 16188
<nop,nop,timestamp 2077240629 1375977403,nop,nop,sack 1
{168801996:168826612}>
000013 IP dell-s1-1.46799 > dell-s2-1.47500: .
168788964:168790412(1448) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000013 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168790412 win 16177
<nop,nop,timestamp 2077240629 1375977403,nop,nop,sack 1
{168801996:168826612}>
000014 IP dell-s1-1.46799 > dell-s2-1.47500: .
168790412:168793308(2896) ack 1 win 46 <nop,nop,timestamp 1375977403
2077240629>
000015 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168793308 win 16154
<nop,nop,timestamp 2077240629 1375977403,nop,nop,sack 1
{168801996:168826612}>
000333 IP dell-s1-1.46799 > dell-s2-1.47500: .
168793308:168794756(1448) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240629>
000016 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168794756 win 16143
<nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1
{168801996:168826612}>
000010 IP dell-s1-1.46799 > dell-s2-1.47500: .
168794756:168797652(2896) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240629>
000013 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168797652 win 16120
<nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1
{168801996:168826612}>
000111 IP dell-s1-1.46799 > dell-s2-1.47500: .
168797652:168799100(1448) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240630>
000021 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168799100 win 16109
<nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1
{168801996:168826612}>
000004 IP dell-s1-1.46799 > dell-s2-1.47500: .
168799100:168800548(1448) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240630>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168800548 win 16098
<nop,nop,timestamp 2077240630 1375977404,nop,nop,sack 1
{168801996:168826612}>
000258 IP dell-s1-1.46799 > dell-s2-1.47500: .
168800548:168801996(1448) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240630>
000036 IP dell-s1-1.46799 > dell-s2-1.47500: .
168826612:168828060(1448) ack 1 win 46 <nop,nop,timestamp 1375977404
2077240630>
000030 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168826612 win 16077
<nop,nop,timestamp 2077240630 1375977404>
000003 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168828060 win 16066
<nop,nop,timestamp 2077240630 1375977404>
203890 IP dell-s1-1.46799 > dell-s2-1.47500: .
168800548:168801996(1448) ack 1 win 46 <nop,nop,timestamp 1375977608
2077240630>
000042 IP dell-s2-1.47500 > dell-s1-1.46799: . ack 168828060 win 16200
<nop,nop,timestamp 2077240834 1375977404,nop,nop,sack 1
{168800548:168801996}>
^ permalink raw reply
* Re: 200 millisecond timeouts in TCP
From: Hagen Paul Pfeifer @ 2010-06-03 23:10 UTC (permalink / raw)
To: Ivan Novick; +Cc: netdev
In-Reply-To: <AANLkTils4KHXwX79cLG6UvVOaDdddMqx-1tyUL3VtLQC@mail.gmail.com>
* Ivan Novick | 2010-06-03 15:37:24 [-0700]:
>Using tcpdump and systemtap I am seeing that sometimes retransmission
>of data is sent after waiting 200 milliseconds. However sometimes
>retransmissions happen quicker.
Quicker than 200 ms? Conservatively, the minimum TCP RTO should be 1 s (RFC 2988);
Linux differs from this default and defines the minimum RTO as 200 ms:
#define TCP_RTO_MIN ((unsigned)(HZ/5))
Can you post the tcpdump traces where the relevant retransmission is recognizable?
>Also do you know if the timeout numbers for TCP are configurable parameters?
Some values are documented in Documentation/networking/ip-sysctl.txt. You can
find the relevant timer implementation in ipv4/tcp_input.c and the definition
of TCP_RTO_MIN in include/net/tcp.h.
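A minimal sketch of the arithmetic behind that define, assuming tick rates
such as HZ=100, 250 or 1000. The helper name rto_min_ms is hypothetical and
not a kernel function; it only shows that HZ/5 jiffies comes out as 200 ms
whatever the tick rate is.

```c
/* Hypothetical helper, not part of the kernel: convert the
 * TCP_RTO_MIN expression (HZ/5 jiffies) to milliseconds for a
 * given tick rate. */
static unsigned int rto_min_ms(unsigned int hz)
{
	unsigned int jiffies = hz / 5;	/* same expression as TCP_RTO_MIN */

	return jiffies * 1000 / hz;	/* one jiffy lasts 1000/hz ms */
}
```

For example, rto_min_ms(250) works out as 50 jiffies of 4 ms each.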
Hagen Paul Pfeifer
--
Hagen Paul Pfeifer <hagen@jauu.net> || http://jauu.net/
Telephone: +49 174 5455209 || Key Id: 0x98350C22
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22
^ permalink raw reply
* Re: 200 millisecond timeouts in TCP
From: Mitchell Erblich @ 2010-06-03 22:51 UTC (permalink / raw)
To: Ivan Novick; +Cc: netdev
In-Reply-To: <AANLkTils4KHXwX79cLG6UvVOaDdddMqx-1tyUL3VtLQC@mail.gmail.com>
On Jun 3, 2010, at 3:37 PM, Ivan Novick wrote:
> Hello,
>
> Using tcpdump and systemtap I am seeing that sometimes retransmission
> of data is sent after waiting 200 milliseconds. However sometimes
> retransmissions happen quicker.
>
> Is there a specific event that causes these 200 millisecond delays to kick
> in? Are those events identifiable in netstat -s output?
>
> Also do you know if the timeout numbers for TCP are configurable parameters?
>
> I am testing on RHEL5 with this kernel: 2.6.18-164.15.1.el5.
>
> Cheers,
> Ivan Novick
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Ivan Novick,
There is a public Linux TCP paper that I think covers this.
Mitchell Erblich
^ permalink raw reply
* Re: e1000e driver, Intel 82567LF-2, link negotiation (and wol) problems
From: John Ronciak @ 2010-06-03 22:39 UTC (permalink / raw)
To: David Härdeman; +Cc: netdev, jesse.brandeburg
In-Reply-To: <20100603222024.GA22121@hardeman.nu>
On Thu, Jun 3, 2010 at 3:20 PM, David Härdeman <david@hardeman.nu> wrote:
> I have an Intel DG45FC motherboard with an integrated gigabit NIC (lspci
> says it's a "Intel Corporation 82567LF-2 Gigabit Network Connection").
>
> When using the in-kernel e1000e driver (tried up to kernel version
> 2.6.34), the speed is negotiated to 100mbit (most of the time) even
> though the NIC is connected to a gigabit switch using quality cables
> (I've tried a few different to be sure). There seems to be no real
> pattern to when the link is negotiated to 100mbit or 1000mbit.
>
> I've tried Intel's version of the driver (e1000e from sourceforge,
> version 1.1.19) and it seems to behave in the same way.
>
> The output from mii-tool is quite confusing:
> scott:~# mii-tool -v eth0
> SIOCGMIIREG on eth0 failed: Input/output error
> SIOCGMIIREG on eth0 failed: Input/output error
> eth0: negotiated 100baseTx-FD flow-control, link ok
> product info: vendor 00:50:43, model 11 rev 0
> basic mode: autonegotiation enabled
> basic status: autonegotiation complete, link ok
> capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD
> 10baseT-HD
> advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
> flow-control
> link partner: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD
> 10baseT-HD flow-control
>
> (capabilities and link partner agree on 1000mbit, but only 100mbit is
> advertised according to mii-tool)
>
> ethtool disagrees with mii-tool:
> scott:~# ethtool eth0
> Settings for eth0:
> Supported ports: [ TP ]
> Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
> Supports auto-negotiation: Yes
> Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
> Advertised pause frame use: No
> Advertised auto-negotiation: Yes
> Speed: 100Mb/s
> Duplex: Full
> Port: Twisted Pair
> PHYAD: 2
> Transceiver: internal
> Auto-negotiation: on
> MDI-X: on
> Supports Wake-on: pumbag
> Wake-on: g
> Current message level: 0x00000001 (1)
> Link detected: yes
>
> Manually setting the speed with ethtool doesn't work. Not sure how to
> proceed...any suggestions?
>
> (And while I'm at it, the Intel e1000e driver from sourceforge seems to
> have a wol init bug, ethtool reports "Wake-on: g" but I can wake a
> suspended machine using a simple ping. Calling "ethtool -s eth0 wol g"
> before suspending gets the expected behaviour - i.e. only wake on a
> magic wol packet. Don't want to register on sourceforge just to report
> that to the bug tracker though).
>
> Not subscribed to netdev, please CC me on any answers.
>
> --
> David Härdeman
What is your link partner? Do different ones all do the same thing?
Is the link partner configured for auto-neg? If you force speed and
duplex to 1000/full on both sides, is it linked at 1000/full? You
have to force both sides of the connection for it to work correctly.
Same for auto-neg, both sides need to be set to do that. Maybe the
link partner is only advertising 100Mb and so that is what it is
linking to?
--
Cheers,
John
^ permalink raw reply
* 200 millisecond timeouts in TCP
From: Ivan Novick @ 2010-06-03 22:37 UTC (permalink / raw)
To: netdev
Hello,
Using tcpdump and systemtap I am seeing that sometimes retransmission
of data is sent after waiting 200 milliseconds. However sometimes
retransmissions happen quicker.
Is there a specific event that causes these 200 millisecond delays to kick
in? Are those events identifiable in netstat -s output?
Also do you know if the timeout numbers for TCP are configurable parameters?
I am testing on RHEL5 with this kernel: 2.6.18-164.15.1.el5.
Cheers,
Ivan Novick
^ permalink raw reply
* e1000e driver, Intel 82567LF-2, link negotiation (and wol) problems
From: David Härdeman @ 2010-06-03 22:20 UTC (permalink / raw)
To: netdev; +Cc: jesse.brandeburg
I have an Intel DG45FC motherboard with an integrated gigabit NIC (lspci
says it's a "Intel Corporation 82567LF-2 Gigabit Network Connection").
When using the in-kernel e1000e driver (tried up to kernel version
2.6.34), the speed is negotiated to 100mbit (most of the time) even
though the NIC is connected to a gigabit switch using quality cables
(I've tried a few different to be sure). There seems to be no real
pattern to when the link is negotiated to 100mbit or 1000mbit.
I've tried Intel's version of the driver (e1000e from sourceforge,
version 1.1.19) and it seems to behave in the same way.
The output from mii-tool is quite confusing:
scott:~# mii-tool -v eth0
SIOCGMIIREG on eth0 failed: Input/output error
SIOCGMIIREG on eth0 failed: Input/output error
eth0: negotiated 100baseTx-FD flow-control, link ok
product info: vendor 00:50:43, model 11 rev 0
basic mode: autonegotiation enabled
basic status: autonegotiation complete, link ok
capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD
10baseT-HD
advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
flow-control
link partner: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD
10baseT-HD flow-control
(capabilities and link partner agree on 1000mbit, but only 100mbit is
advertised according to mii-tool)
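A minimal sketch of why the advertisement matters, assuming the usual model
in which auto-negotiation settles on the best mode present in both the local
advertisement and the link partner's. The mode bits and the function name
below are hypothetical and are not the MII register layout or any driver's
constants.

```c
/* Hypothetical mode bits for illustration only. A higher bit means a
 * higher priority in this sketch. */
enum link_mode {
	MODE_10_HALF   = 1 << 0,
	MODE_10_FULL   = 1 << 1,
	MODE_100_HALF  = 1 << 2,
	MODE_100_FULL  = 1 << 3,
	MODE_1000_FULL = 1 << 4,
};

/* Return the highest-priority mode that both sides advertise, or 0 if
 * they have no mode in common. */
static unsigned int resolve_link_mode(unsigned int local_adv,
				      unsigned int partner_adv)
{
	unsigned int common = local_adv & partner_adv;
	unsigned int mode;

	for (mode = MODE_1000_FULL; mode != 0; mode >>= 1)
		if (common & mode)
			return mode;
	return 0;
}
```

With a local advertisement that lacks MODE_1000_FULL, the result is
MODE_100_FULL even when the partner offers gigabit, which would be consistent
with the "negotiated 100baseTx-FD" line above.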
ethtool disagrees with mii-tool:
scott:~# ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 2
Transceiver: internal
Auto-negotiation: on
MDI-X: on
Supports Wake-on: pumbag
Wake-on: g
Current message level: 0x00000001 (1)
Link detected: yes
Manually setting the speed with ethtool doesn't work. Not sure how to
proceed...any suggestions?
(And while I'm at it, the Intel e1000e driver from sourceforge seems to
have a wol init bug, ethtool reports "Wake-on: g" but I can wake a
suspended machine using a simple ping. Calling "ethtool -s eth0 wol g"
before suspending gets the expected behaviour - i.e. only wake on a
magic wol packet. Don't want to register on sourceforge just to report
that to the bug tracker though).
Not subscribed to netdev, please CC me on any answers.
--
David Härdeman
^ permalink raw reply
* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
From: Andrew Morton @ 2010-06-03 22:11 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S. Miller, netdev
In-Reply-To: <1275602282.2533.72.camel@edumazet-laptop>
On Thu, 03 Jun 2010 23:58:02 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday 03 June 2010 at 14:39 -0700, Andrew Morton wrote:
>
> > Well. The presence of this warning does serve to remind us how sucky
> > e1000[e] is :(
> >
> > I'm not particularly fussed either way - I'm just wondering if you guys
> > think this thing meets the noise-to-benefit test...
> >
>
> Well, in this particular case, I think it's a genuine bug in the ipv6
> code, not in the e1000[e] driver :)
>
> It allocates, "a priori", dev->mtu sized skbs that the caller fills with
> maybe one hundred bytes.
>
> With MTU=9000, this means order-2 allocations. In an ideal world, it
> would be fine, but in practice, we know only fools can trust high order
> allocations.
>
> Since code is prepared to chain skbs, just make sure we cap allocations
> to smaller units (up to 0xe80 bytes on a 64bit kernel)
>
> So in this particular case, the bugzilla report can point to a real
> problem in our stack.
>
> Had failed allocations been silent, we probably would never have
> noticed.
>
> Hmm...
>
heh, in that case I guess we should leave it there.
But I do tend to ignore such reports, or fob them off with some
boilerplate. It was pure luck that the one I chose as an example
happened to be interesting.
^ permalink raw reply
* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
From: Andrew Morton @ 2010-06-03 22:01 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S. Miller, netdev
In-Reply-To: <1275601036.2533.63.camel@edumazet-laptop>
On Thu, 03 Jun 2010 23:37:16 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday 03 June 2010 at 23:13 +0200, Eric Dumazet wrote:
>
> > MTU=9000 on a system with 4K pages... Oh well...
> >
> > maybe net/ipv6/mcast.c should cap dev->mtu to PAGE_SIZE-128 or
> > something, so that order-0 allocations are done.
> >
> >
>
> Something like this patch (completely untested) :
>
> [PATCH] ipv6: avoid high order allocations
>
> With mtu=9000, mld_newpack() uses order-2 GFP_ATOMIC allocations, which
> are very unreliable on machines where PAGE_SIZE=4K.
>
> Limit allocated skbs to be at most one page. (order-0 allocations)
>
Maybe - I wouldn't know how desirable that is from the
impact-on-efficiency POV. But I think most failures I've seen are for
regular old tcpipv4. Often with e1000, which does larger-than-needed
allocations for (iirc) weird alignment requirements.
> ---
> net/ipv6/mcast.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> index 59f1881..3484794 100644
> --- a/net/ipv6/mcast.c
> +++ b/net/ipv6/mcast.c
> @@ -1356,7 +1356,10 @@ static struct sk_buff *mld_newpack(struct net_device *dev, int size)
> IPV6_TLV_PADN, 0 };
>
> /* we assume size > sizeof(ra) here */
> - skb = sock_alloc_send_skb(sk, size + LL_ALLOCATED_SPACE(dev), 1, &err);
> + size += LL_ALLOCATED_SPACE(dev);
> + /* limit our allocations to order-0 page */
> + size = min(size, SKB_MAX_ORDER(0, 0));
> + skb = sock_alloc_send_skb(sk, size, 1, &err);
>
> if (!skb)
> return NULL;
An alternative which retains any performance benefit from the order-2
allocation would be:
p = alloc_pages(__GFP_NOWARN|..., 2);
if (!p)
p = alloc_pages(..., 0);
if you see what I mean.
This would also fix any retry/timeout-related stalls which people might
experience when the order-2 allocation fails, so it might make things
better in general.
^ permalink raw reply
* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
From: Eric Dumazet @ 2010-06-03 21:58 UTC (permalink / raw)
To: Andrew Morton; +Cc: David S. Miller, netdev
In-Reply-To: <20100603143915.f1b0ba2b.akpm@linux-foundation.org>
On Thursday 03 June 2010 at 14:39 -0700, Andrew Morton wrote:
> Well. The presence of this warning does serve to remind us how sucky
> e1000[e] is :(
>
> I'm not particularly fussed either way - I'm just wondering if you guys
> think this thing meets the noise-to-benefit test...
>
Well, in this particular case, I think it's a genuine bug in the ipv6
code, not in the e1000[e] driver :)
It allocates, "a priori", dev->mtu sized skbs that the caller fills with
maybe one hundred bytes.
With MTU=9000, this means order-2 allocations. In an ideal world, it
would be fine, but in practice, we know only fools can trust high order
allocations.
Since code is prepared to chain skbs, just make sure we cap allocations
to smaller units (up to 0xe80 bytes on a 64bit kernel)
So in this particular case, the bugzilla report can point to a real
problem in our stack.
Had failed allocations been silent, we probably would never have
noticed.
Hmm...
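A small sketch of the page-order arithmetic above, assuming 4K pages. The
helper names are hypothetical; order_for_size() only mimics what a
power-of-two page allocator needs for a given size, and cap_size() stands in
for the cap to 0xe80 bytes mentioned above.

```c
#define SKETCH_PAGE_SIZE 4096u	/* assumed 4K pages, as in this thread */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
static unsigned int order_for_size(unsigned int size)
{
	unsigned int order = 0;
	unsigned int span = SKETCH_PAGE_SIZE;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Cap a requested size to a limit, returning the smaller of the two. */
static unsigned int cap_size(unsigned int size, unsigned int limit)
{
	return size < limit ? size : limit;
}
```

A 9000-byte request needs four contiguous pages (order 2), while the same
request capped to 0xe80 bytes fits in a single page (order 0).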
^ permalink raw reply
* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
From: Eric Dumazet @ 2010-06-03 21:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: David S. Miller, netdev
In-Reply-To: <1275599603.2533.58.camel@edumazet-laptop>
On Thursday 03 June 2010 at 23:13 +0200, Eric Dumazet wrote:
> MTU=9000 on a system with 4K pages... Oh well...
>
> maybe net/ipv6/mcast.c should cap dev->mtu to PAGE_SIZE-128 or
> something, so that order-0 allocations are done.
>
>
Something like this patch (completely untested) :
[PATCH] ipv6: avoid high order allocations
With mtu=9000, mld_newpack() uses order-2 GFP_ATOMIC allocations, which
are very unreliable on machines where PAGE_SIZE=4K.
Limit allocated skbs to be at most one page. (order-0 allocations)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/ipv6/mcast.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 59f1881..3484794 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1356,7 +1356,10 @@ static struct sk_buff *mld_newpack(struct net_device *dev, int size)
IPV6_TLV_PADN, 0 };
/* we assume size > sizeof(ra) here */
- skb = sock_alloc_send_skb(sk, size + LL_ALLOCATED_SPACE(dev), 1, &err);
+ size += LL_ALLOCATED_SPACE(dev);
+ /* limit our allocations to order-0 page */
+ size = min(size, SKB_MAX_ORDER(0, 0));
+ skb = sock_alloc_send_skb(sk, size, 1, &err);
if (!skb)
return NULL;
^ permalink raw reply related
* Re: [Patch 1/2]r8169: remove rtl_rw_cpluscmd
From: Francois Romieu @ 2010-06-03 21:42 UTC (permalink / raw)
To: Junchang Wang; +Cc: davem, netdev
In-Reply-To: <20100603112428.GC24909@host-a-55.ustcsz.edu.cn>
Junchang Wang <junchangwang@gmail.com> :
> Some clean up work. Please correct me if any of this is wrong.
>
> Writing "cmd" back without modification is redundant.
> Secondly, because rtl_rw_cpluscmd is just an encapsulation of
> RTL_R16, remove rtl_rw_cpluscmd.
I would see some value in the patch if you tested the change on
revisions X, Y and Z.
Isolated cosmetic changes within a sequence of changes can be fine too.
Otherwise I consider such cleanups a (useless and) conceivably
harmful distraction.
--
Ueimor
^ permalink raw reply
* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
From: Andrew Morton @ 2010-06-03 21:39 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S. Miller, netdev
In-Reply-To: <1275599603.2533.58.camel@edumazet-laptop>
On Thu, 03 Jun 2010 23:13:23 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday 03 June 2010 at 13:02 -0700, Andrew Morton wrote:
> > On Mon, 31 May 2010 15:55:12 GMT
> > bugzilla-daemon@bugzilla.kernel.org wrote:
> >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=16083
> > >
> > > Summary: swapper: Page allocation failure
> > > Product: Memory Management
> > > Version: 2.5
> > > Kernel Version: 2.6.34
> > > Platform: All
> > > OS/Version: Linux
> > > Tree: Mainline
> > > Status: NEW
> > > Severity: normal
> > > Priority: P1
> > > Component: Other
> > > AssignedTo: akpm@linux-foundation.org
> > > ReportedBy: sgunderson@bigfoot.com
> > > Regression: No
> > >
> > >
> > > Hi,
> > >
> > > Since upgrading from a Q9450 to 2xE5520 (and upgrading from 2.6.34-rc-something
> > > to 2.6.34), I've started seeing these:
> > >
> > > [605882.372418] swapper: page allocation failure. order:2, mode:0x4020
> > > [605882.378981] Pid: 0, comm: swapper Not tainted 2.6.34 #1
> > > [605882.384617] Call Trace:
> > > [605882.387499] <IRQ> [<ffffffff81096d5a>] __alloc_pages_nodemask+0x5b0/0x629
> > > [605882.395068] [<ffffffff81096de5>] __get_free_pages+0x12/0x4f
> > > [605882.401103] [<ffffffff810bdeb4>] __kmalloc_track_caller+0x4c/0x156
> > > [605882.407817] [<ffffffff81245986>] ? sock_alloc_send_pskb+0xdd/0x32d
> > > [605882.414556] [<ffffffff8124a515>] __alloc_skb+0x66/0x15b
> >
> > I wonder if we should switch __alloc_skb() over to __GFP_NOWARN.
> > People keep on reporting events such as the above, and nobody's
> > getting any value from this.
> >
>
> Then we could set __GFP_NOWARN for all allocations in the kernel; why
> is networking so special?
Because this failure is known and is expected to occur sometimes and we
know that networking knows how to recover from it.
This removes most of the value from the warning. The warning's there
to tell us about potentially buggy code, and to tell us why an
immediately-following oops happened. Not applicable with alloc_skb()!
I mean, it's just not telling us anything very useful and it's alarming
users and is consuming effort.
> > Downsides:
> >
> > - the change would tend to deprive MM developers of prompt "hey you
> > broke it again" notifications.
> >
> > - if a system is getting enough allocation failures to impact
> > throughput, the operators won't *know* that it's happening, and so
> > they won't make the changes necessary to reduce the frequency of
> > memory allocation failures.
> >
>
> We should have SNMP counter increments
I was thinking maybe a rate-limited printk every minute or so "12 skb
allocation failures since ...". Dunno.
One of the problems with the current warning is that it looks like an oops.
In fact reporters regularly _call_ it "an oops". Something less alarming
and more specific would be more helpful here.
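A minimal sketch of such a rate-limited summary, assuming the caller passes
in the current time in seconds. The struct and function names are
hypothetical and not an existing kernel interface; the function counts
failures and only hands back a total once the reporting interval has elapsed.

```c
/* Hypothetical state for a rate-limited failure summary. */
struct fail_report {
	unsigned long count;		/* failures since the last report */
	unsigned long last_report;	/* time of the last report, in seconds */
	unsigned long interval;		/* minimum seconds between reports */
};

/* Record one failure at time 'now' (seconds). Returns the number of
 * failures to report if the interval has elapsed, otherwise 0. */
static unsigned long fail_report_record(struct fail_report *r,
					unsigned long now)
{
	unsigned long n;

	r->count++;
	if (now - r->last_report < r->interval)
		return 0;

	n = r->count;
	r->count = 0;
	r->last_report = now;
	return n;
}
```

A caller would print one line such as "N skb allocation failures since ..."
whenever the return value is non-zero, instead of a full stack trace per
failure.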
> > If these are likely to be a problem, perhaps networking could provide
> > some other form of "hey, you keep on running out of memory"
> > notification, if it doesn't already do so.
> >
> > Thoughts?
> >
>
> order-2 ATOMIC allocations ?
>
> skb = mld_newpack(dev, dev->mtu);
>
> Let's face it : It can not work in the long term.
>
> MTU=9000 on a system with 4K pages... Oh well...
>
> maybe net/ipv6/mcast.c should cap dev->mtu to PAGE_SIZE-128 or
> something, so that order-0 allocations are done.
Well. The presence of this warning does serve to remind us how sucky
e1000[e] is :(
I'm not particularly fussed either way - I'm just wondering if you guys
think this thing meets the noise-to-benefit test...
^ permalink raw reply
* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
From: Eric Dumazet @ 2010-06-03 21:13 UTC (permalink / raw)
To: Andrew Morton; +Cc: David S. Miller, netdev
In-Reply-To: <20100603130235.c372b38f.akpm@linux-foundation.org>
On Thursday 03 June 2010 at 13:02 -0700, Andrew Morton wrote:
> On Mon, 31 May 2010 15:55:12 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=16083
> >
> > Summary: swapper: Page allocation failure
> > Product: Memory Management
> > Version: 2.5
> > Kernel Version: 2.6.34
> > Platform: All
> > OS/Version: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: Other
> > AssignedTo: akpm@linux-foundation.org
> > ReportedBy: sgunderson@bigfoot.com
> > Regression: No
> >
> >
> > Hi,
> >
> > Since upgrading from a Q9450 to 2xE5520 (and upgrading from 2.6.34-rc-something
> > to 2.6.34), I've started seeing these:
> >
> > [605882.372418] swapper: page allocation failure. order:2, mode:0x4020
> > [605882.378981] Pid: 0, comm: swapper Not tainted 2.6.34 #1
> > [605882.384617] Call Trace:
> > [605882.387499] <IRQ> [<ffffffff81096d5a>] __alloc_pages_nodemask+0x5b0/0x629
> > [605882.395068] [<ffffffff81096de5>] __get_free_pages+0x12/0x4f
> > [605882.401103] [<ffffffff810bdeb4>] __kmalloc_track_caller+0x4c/0x156
> > [605882.407817] [<ffffffff81245986>] ? sock_alloc_send_pskb+0xdd/0x32d
> > [605882.414556] [<ffffffff8124a515>] __alloc_skb+0x66/0x15b
>
> I wonder if we should switch __alloc_skb() over to __GFP_NOWARN.
> People keep on reporting events such as the above, and nobody's
> getting any value from this.
>
Then we could set __GFP_NOWARN for all allocations in the kernel; why
is networking so special?
> Downsides:
>
> - the change would tend to deprive MM developers of prompt "hey you
> broke it again" notifications.
>
> - if a system is getting enough allocation failures to impact
> throughput, the operators won't *know* that it's happening, and so
> they won't make the changes necessary to reduce the frequency of
> memory allocation failures.
>
We should have SNMP counter increments
> If these are likely to be a problem, perhaps networking could provide
> some other form of "hey, you keep on running out of memory"
> notification, if it doesn't already do so.
>
> Thoughts?
>
order-2 ATOMIC allocations ?
skb = mld_newpack(dev, dev->mtu);
Let's face it : It can not work in the long term.
MTU=9000 on a system with 4K pages... Oh well...
maybe net/ipv6/mcast.c should cap dev->mtu to PAGE_SIZE-128 or
something, so that order-0 allocations are done.
^ permalink raw reply
* Re: [PATCH V2 1/2] Export firmware assigned labels of network devices to sysfs
From: Narendra K @ 2010-06-03 21:07 UTC (permalink / raw)
To: greg; +Cc: netdev, linux-hotplug, linux-pci, matt_domsch
In-Reply-To: <EDA0A4495861324DA2618B4C45DCB3EE6128E1@blrx3m08.blr.amer.dell.com>
> -----Original Message-----
> From: Greg KH [mailto:greg@kroah.com]
> Sent: Friday, May 28, 2010 9:11 PM
> To: K, Narendra
> Cc: netdev@vger.kernel.org; linux-hotplug@vger.kernel.org;
> linux-pci@vger.kernel.org; Domsch, Matt; Hargrave, Jordan; Rose,
> Charles; Nijhawan, Vijay
> Subject: Re: [PATCH 1/2] Export firmware assigned labels of network
> devices to sysfs
>
Thanks for the comments.
> On Fri, May 28, 2010 at 06:55:21AM -0500, K, Narendra wrote:
> > Please refer to the PCI-SIG Draft ECN
> > "PCIe Device Labeling under Operating Systems Draft ECN" at this link
> -
> > http://www.pcisig.com/specifications/pciexpress/review_zone/.
> >
> > It would be great to know your views on this ECN. Please let us know
> > if you have any suggestions or changes.
>
> Note that only members of the PCI-SIG can do this, which pretty much
> rules out any "normal" Linux kernel developer :(
>
> Care to post a public version of this for us to review?
> > --- /dev/null
> > +++ b/drivers/pci/pci-label.c
> > @@ -0,0 +1,242 @@
> > +/*
> > + * File: drivers/pci/pci-label.c
>
> This line is not needed, we know the file name :)
>
> > + * Purpose: Export the firmware label associated with a pci
> network interface
> > + * device to sysfs
> > + * Copyright (C) 2010 Dell Inc.
> > + * by Narendra K <Narendra_K@dell.com>, Jordan Hargrave
> <Jordan_Hargrave@dell.com>
> > + *
> > + * This code checks if the pci network device has a related ACPI
> _DSM. If
> > + * available, the code calls the _DSM to retrieve the index and
> string and
> > + * exports them to sysfs. If the ACPI _DSM is not available, it falls
> back on
> > + * SMBIOS. SMBIOS defines type 41 for onboard pci devices. This code
> retrieves
> > + * strings associated with the type 41 and exports it to sysfs.
> > + *
> > + * Please see http://linux.dell.com/wiki/index.php/Oss/libnetdevname
> for more
> > + * information.
> > + */
> > +
> > +#include <linux/pci-label.h>
>
> Why is this file in include/linux/ ? Who needs it there? Can't it just
> be in in the drivers/pci/ directory? Actually all you need is 2
> functions in there, so it could go into the internal pci.h file in that
> directory without a problem, right?
>
This file has been removed and its functions moved to the internal pci.h.
> > +
> > +static ssize_t
> > +smbiosname_string_exists(struct device *dev, char *buf)
> > +{
> > + struct pci_dev *pdev = to_pci_dev(dev);
> > + const struct dmi_device *dmi;
> > + struct dmi_devslot *dslot;
> > + int bus;
> > + int devfn;
> > +
> > + bus = pdev->bus->number;
> > + devfn = pdev->devfn;
> > +
> > + dmi = NULL;
> > + while ((dmi = dmi_find_device(DMI_DEV_TYPE_DEVSLOT, NULL,
> dmi)) != NULL) {
> > + dslot = dmi->device_data;
> > + if (dslot && dslot->bus == bus && dslot->devfn ==
> devfn) {
> > + if (buf)
> > + return scnprintf(buf, PAGE_SIZE,
> "%s\n", dmi->name);
> > + return strlen(dmi->name);
> > + }
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static ssize_t
> > +smbiosname_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> > +{
> > + return smbiosname_string_exists(dev, buf);
> > +}
> > +
> > +struct smbios_attribute smbios_attr_label = {
> > + .attr = {.name = __stringify(label), .mode = 0444, .owner =
> THIS_MODULE},
>
> Can't you just put "label" as the name?
>
This is fixed.
> > + .show = smbiosname_show,
> > + .test = smbiosname_string_exists,
> > +};
> > +
> > +static int
> > +pci_create_smbiosname_file(struct pci_dev *pdev)
> > +{
> > + if (smbios_attr_label.test &&
> smbios_attr_label.test(&pdev->dev, NULL)) {
> > + sysfs_create_file(&pdev->dev.kobj,
> &smbios_attr_label.attr);
> > + return 0;
> > + }
> > + return -1;
> > +}
> > +
> > +static int
> > +pci_remove_smbiosname_file(struct pci_dev *pdev)
> > +{
> > + if (smbios_attr_label.test &&
> smbios_attr_label.test(&pdev->dev, NULL)) {
> > + sysfs_remove_file(&pdev->dev.kobj,
> &smbios_attr_label.attr);
> > + return 0;
> > + }
> > + return -1;
> > +}
> > +
> > +static const char dell_dsm_uuid[] = {
>
> Um, a dell specific uuid in a generic file? What happens when we need
> to support another manufacturer?
>
My understanding of the uuid was incorrect. I have renamed it to the more generic
device_label_dsm_uuid, and ACPI_DSM_FUNCTION to DEVICE_LABEL_DSM.
> > + 0xD0, 0x37, 0xC9, 0xE5, 0x53, 0x35, 0x7A, 0x4D,
> > + 0x91, 0x17, 0xEA, 0x4D, 0x19, 0xC3, 0x43, 0x4D
> > +};
> > +
> > +
> > +static int
> > +dsm_get_label(acpi_handle handle, int func,
> > + struct acpi_buffer *output,
> > + char *buf, char *attribute)
> > +{
> > + struct acpi_object_list input;
> > + union acpi_object params[4];
> > + union acpi_object *obj;
> > + int len = 0;
> > +
> > + int err;
> > +
> > + input.count = 4;
> > + input.pointer = params;
> > + params[0].type = ACPI_TYPE_BUFFER;
> > + params[0].buffer.length = sizeof(dell_dsm_uuid);
> > + params[0].buffer.pointer = (char *)dell_dsm_uuid;
> > + params[1].type = ACPI_TYPE_INTEGER;
> > + params[1].integer.value = 0x02;
> > + params[2].type = ACPI_TYPE_INTEGER;
> > + params[2].integer.value = func;
> > + params[3].type = ACPI_TYPE_PACKAGE;
> > + params[3].package.count = 0;
> > + params[3].package.elements = NULL;
> > +
> > + err = acpi_evaluate_object(handle, "_DSM", &input, output);
> > + if (err) {
> > + printk(KERN_INFO "failed to evaluate _DSM\n");
> > + return -1;
> > + }
> > +
> > + obj = (union acpi_object *)output->pointer;
> > +
> > + switch (obj->type) {
> > + case ACPI_TYPE_PACKAGE:
> > + if (obj->package.count == 2) {
> > + len = obj->package.elements[0].integer.value;
> > + if (buf) {
> > + if (!strncmp(attribute, "index",
> strlen(attribute)))
> > + scnprintf(buf, PAGE_SIZE,
> "%lu\n",
> > +
> obj->package.elements[0].integer.value);
> > + else
> > + scnprintf(buf, PAGE_SIZE,
> "%s\n",
> > +
> obj->package.elements[1].string.pointer);
> > + kfree(output->pointer);
> > + return strlen(buf);
> > + }
> > + }
> > + kfree(output->pointer);
> > + return len;
> > + break;
> > + default:
> > + return -1;
> > + }
> > +}
> > +
> > +static ssize_t
> > +acpi_index_string_exist(struct device *dev, char *buf, char
> *attribute)
> > +{
> > + struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > + struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL};
> > + acpi_handle handle;
> > + int length;
> > + int is_addin_card = 0;
> > +
> > + if ((pdev->class >> 16) != PCI_BASE_CLASS_NETWORK)
> > + return -1;
> > +
> > + handle = DEVICE_ACPI_HANDLE(dev);
> > +
> > + if (!handle) {
> > + /*
> > + * The device is an add-in network controller and does
> have
> > + * a valid handle. Try until we get the handle for the
> parent
> > + * bridge
> > + */
> > + struct pci_bus *pbus;
> > + for (pbus = pdev->bus; pbus; pbus = pbus->parent) {
> > + handle =
> DEVICE_ACPI_HANDLE(&(pbus->self->dev));
> > + if (handle)
> > + break;
> > +
> > + }
> > + }
> > +
> > + if ((length = dsm_get_label(handle, DELL_DSM_NETWORK,
> > + &output, buf, attribute)) < 0)
> > + return -1;
> > +
> > + return length;
> > +}
> > +
> > +static ssize_t
> > +acpilabel_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> > +{
> > + return acpi_index_string_exist(dev, buf, "label");
> > +}
> > +
> > +static ssize_t
> > +acpiindex_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> > +{
> > + return acpi_index_string_exist(dev, buf, "index");
> > +}
> > +
> > +struct acpi_attribute acpi_attr_label = {
> > + .attr = {.name = __stringify(label), .mode = 0444, .owner =
> THIS_MODULE},
> > + .show = acpilabel_show,
> > + .test = acpi_index_string_exist,
> > +};
> > +
> > +struct acpi_attribute acpi_attr_index = {
> > + .attr = {.name = __stringify(index), .mode = 0444, .owner =
> THIS_MODULE},
> > + .show = acpiindex_show,
> > + .test = acpi_index_string_exist,
> > +};
> > +
> > +static int
> > +pci_create_acpi_index_label_files(struct pci_dev *pdev)
> > +{
> > + if (acpi_attr_label.test && acpi_attr_label.test(&pdev->dev,
> NULL) > 0) {
> > + sysfs_create_file(&pdev->dev.kobj,
> &acpi_attr_label.attr);
> > + sysfs_create_file(&pdev->dev.kobj,
> &acpi_attr_index.attr);
> > + return 0;
> > + }
> > + return -1;
> > +}
> > +
> > +static int
> > +pci_remove_acpi_index_label_files(struct pci_dev *pdev)
> > +{
> > + if (acpi_attr_label.test && acpi_attr_label.test(&pdev->dev,
> NULL) > 0) {
> > + sysfs_remove_file(&pdev->dev.kobj,
> &acpi_attr_label.attr);
> > + sysfs_remove_file(&pdev->dev.kobj,
> &acpi_attr_index.attr);
> > + return 0;
> > + }
> > + return -1;
> > +}
> > +
> > +int pci_create_acpi_attr_files(struct pci_dev *pdev)
> > +{
> > + if (!pci_create_acpi_index_label_files(pdev))
> > + return 0;
> > + if (!pci_create_smbiosname_file(pdev))
> > + return 0;
> > + return -ENODEV;
> > +}
> > +EXPORT_SYMBOL(pci_create_acpi_attr_files);
>
> EXPORT_SYMBOL_GPL?
>
> Wait, why does this need to be exported at all? What module is ever
> going to call this function?
>
> > +int pci_remove_acpi_attr_files(struct pci_dev *pdev)
> > +{
> > + if (!pci_remove_acpi_index_label_files(pdev))
> > + return 0;
> > + if (!pci_remove_smbiosname_file(pdev))
> > + return 0;
> > + return -ENODEV;
> > +
> > +}
> > +EXPORT_SYMBOL(pci_remove_acpi_attr_files);
>
> Same here, what module will call this?
>
These functions need not be exported as they are not called by any module.
> > +++ b/include/linux/pci-label.h
>
> As discussed above, this whole file does not need to exist.
>
> > +extern int pci_create_acpi_attr_files(struct pci_dev *pdev);
> > +extern int pci_remove_acpi_attr_files(struct pci_dev *pdev);
>
> Just put these two functions in the drivers/pci/pci.h file.
>
Fixed.
In addition to these changes, I have made a few other changes:
1. Removed the check for network devices; _DSM is now evaluated for any pci
device that has _DSM defined, in adherence to the spec.
2. Renamed the functions pci_{create,remove}_acpi_attr_files to
pci_{create,remove}_firmware_label_files.
3. Added conditional compilation checks for CONFIG_ACPI and CONFIG_DMI.
Note: While testing the patch with CONFIG_ACPI set to no, the compilation
failed with the message below.
CC drivers/pci/pci-label.o
In file included from drivers/pci/pci-label.c:24:
include/linux/pci-acpi.h:39: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’
before ‘acpi_find_root_bridge_handle’
I had to make the change below to proceed with the compilation. It would be
great to know whether I am missing something in the way conditional
compilation is implemented, or whether this is an actual issue.
---
include/linux/pci-acpi.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index c8b6473..bc40827 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -36,8 +36,8 @@ static inline acpi_handle acpi_pci_get_bridge_handle(struct pci_bus *pbus)
pbus->number);
}
#else
-static inline acpi_handle acpi_find_root_bridge_handle(struct pci_dev *pdev)
-{ return NULL; }
+static inline void acpi_find_root_bridge_handle(struct pci_dev *pdev)
+{ }
#endif
#endif /* _PCI_ACPI_H_ */
Please find below the patch with the above suggestions and changes:
From: Narendra K <narendra_k@dell.com>
Subject: [PATCH V2 1/2] Export firmware assigned labels of pci devices to sysfs
This patch exports the firmware assigned labels of pci devices to
sysfs which could be used by user space.
Signed-off-by: Jordan Hargrave <jordan_hargrave@dell.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
---
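For readers unfamiliar with the SMBIOS fallback, here is a minimal userspace
sketch of how a type 41 (Onboard Devices Extended Information) record maps to
the segment/bus/devfn values that dmi_save_devslot() stores. The struct and
function names (onboard_dev, decode_type41) are hypothetical and not part of
the patch; the offsets follow the same layout the patch reads via d = dm + 5.
The 16-bit segment is assembled byte by byte here instead of through a u16
cast, which is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one SMBIOS type 41 record. */
struct onboard_dev {
	int name_index;	/* string number of the reference designation */
	int enabled;	/* bit 7 of the device type byte */
	int type;	/* low 7 bits of the device type byte */
	int instance;	/* device type instance */
	int segment;	/* PCI segment group number */
	int bus;	/* PCI bus number */
	int device;	/* devfn bits 7:3 */
	int function;	/* devfn bits 2:0 */
};

/*
 * rec points at the start of the structure (the type byte) and len is the
 * number of bytes available. Returns 0 on success, -1 if the record is
 * not type 41 or is too short to hold all fields.
 */
static int decode_type41(const uint8_t *rec, size_t len, struct onboard_dev *out)
{
	assert(rec && out);
	if (len < 11 || rec[0] != 41 || rec[1] < 11)
		return -1;
	out->name_index = rec[4];
	out->enabled = (rec[5] & 0x80) != 0;
	out->type = rec[5] & 0x7f;
	out->instance = rec[6];
	out->segment = rec[7] | (rec[8] << 8);	/* little-endian word */
	out->bus = rec[9];
	out->device = rec[10] >> 3;
	out->function = rec[10] & 0x07;
	return 0;
}
```

A caller would compare bus and (device << 3 | function) against pdev->bus->number
and pdev->devfn, as smbiosname_string_exists() does in the patch.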
drivers/firmware/dmi_scan.c | 24 ++++
drivers/pci/Makefile | 2 +-
drivers/pci/pci-label.c | 273 +++++++++++++++++++++++++++++++++++++++++++
drivers/pci/pci-sysfs.c | 5 +
drivers/pci/pci.h | 2 +
include/linux/dmi.h | 9 ++
6 files changed, 314 insertions(+), 1 deletions(-)
create mode 100644 drivers/pci/pci-label.c
diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
index d464672..7d8439b 100644
--- a/drivers/firmware/dmi_scan.c
+++ b/drivers/firmware/dmi_scan.c
@@ -277,6 +277,28 @@ static void __init dmi_save_ipmi_device(const struct dmi_header *dm)
list_add_tail(&dev->list, &dmi_devices);
}
+static void __init dmi_save_devslot(int id, int seg, int bus, int devfn, const char *name)
+{
+ struct dmi_devslot *slot;
+
+ slot = dmi_alloc(sizeof(*slot) + strlen(name) + 1);
+ if (!slot) {
+ printk(KERN_ERR "dmi_save_devslot: out of memory.\n");
+ return;
+ }
+ slot->id = id;
+ slot->seg = seg;
+ slot->bus = bus;
+ slot->devfn = devfn;
+
+ strcpy((char *)&slot[1], name);
+ slot->dev.type = DMI_DEV_TYPE_DEVSLOT;
+ slot->dev.name = (char *)&slot[1];
+ slot->dev.device_data = slot;
+
+ list_add(&slot->dev.list, &dmi_devices);
+}
+
static void __init dmi_save_extended_devices(const struct dmi_header *dm)
{
const u8 *d = (u8*) dm + 5;
@@ -285,6 +307,7 @@ static void __init dmi_save_extended_devices(const struct dmi_header *dm)
if ((*d & 0x80) == 0)
return;
+ dmi_save_devslot(-1, *(u16 *)(d+2), *(d+4), *(d+5), dmi_string_nosave(dm, *(d-1)));
dmi_save_one_device(*d & 0x7f, dmi_string_nosave(dm, *(d - 1)));
}
@@ -333,6 +356,7 @@ static void __init dmi_decode(const struct dmi_header *dm, void *dummy)
break;
case 41: /* Onboard Devices Extended Information */
dmi_save_extended_devices(dm);
+ break;
}
}
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 0b51857..69c503a 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -4,7 +4,7 @@
obj-y += access.o bus.o probe.o remove.o pci.o \
pci-driver.o search.o pci-sysfs.o rom.o setup-res.o \
- irq.o vpd.o
+ irq.o vpd.o pci-label.o
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_SYSFS) += slot.o
diff --git a/drivers/pci/pci-label.c b/drivers/pci/pci-label.c
new file mode 100644
index 0000000..b35d48c
--- /dev/null
+++ b/drivers/pci/pci-label.c
@@ -0,0 +1,273 @@
+/*
+ * Purpose: Export the firmware label associated with a pci network interface
+ * device to sysfs
+ * Copyright (C) 2010 Dell Inc.
+ * by Narendra K <Narendra_K@dell.com>, Jordan Hargrave <Jordan_Hargrave@dell.com>
+ *
+ * This code checks if the pci network device has a related ACPI _DSM. If
+ * available, the code calls the _DSM to retrieve the index and string and
+ * exports them to sysfs. If the ACPI _DSM is not available, it falls back on
+ * SMBIOS. SMBIOS defines type 41 for onboard pci devices. This code retrieves
+ * strings associated with the type 41 and exports it to sysfs.
+ *
+ * Please see http://linux.dell.com/wiki/index.php/Oss/libnetdevname for more
+ * information.
+ */
+
+#include <linux/dmi.h>
+#include <linux/sysfs.h>
+#include <linux/pci.h>
+#include <linux/pci_ids.h>
+#include <linux/module.h>
+#include <linux/acpi.h>
+#include <linux/pci-acpi.h>
+#include <acpi/acpi_drivers.h>
+#include <acpi/acpi_bus.h>
+#include "pci.h"
+
+#define DEVICE_LABEL_DSM 0x07
+
+#if defined CONFIG_DMI
+
+struct smbios_attribute {
+ struct attribute attr;
+ ssize_t (*show) (struct device *dev, char *buf);
+ ssize_t (*test) (struct device *dev, char *buf);
+};
+
+static ssize_t
+smbiosname_string_exists(struct device *dev, char *buf)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+ const struct dmi_device *dmi;
+ struct dmi_devslot *dslot;
+ int bus;
+ int devfn;
+
+ bus = pdev->bus->number;
+ devfn = pdev->devfn;
+
+ dmi = NULL;
+ while ((dmi = dmi_find_device(DMI_DEV_TYPE_DEVSLOT, NULL, dmi)) != NULL) {
+ dslot = dmi->device_data;
+ if (dslot && dslot->bus == bus && dslot->devfn == devfn) {
+ if (buf)
+ return scnprintf(buf, PAGE_SIZE, "%s\n", dmi->name);
+ return strlen(dmi->name);
+ }
+ }
+
+ return 0;
+}
+
+static ssize_t
+smbiosname_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ return smbiosname_string_exists(dev, buf);
+}
+
+struct smbios_attribute smbios_attr_label = {
+ .attr = {.name = "label", .mode = 0444, .owner = THIS_MODULE},
+ .show = smbiosname_show,
+ .test = smbiosname_string_exists,
+};
+
+static int
+pci_create_smbiosname_file(struct pci_dev *pdev)
+{
+ if (smbios_attr_label.test && smbios_attr_label.test(&pdev->dev, NULL)) {
+ sysfs_create_file(&pdev->dev.kobj, &smbios_attr_label.attr);
+ return 0;
+ }
+ return -1;
+}
+
+static int
+pci_remove_smbiosname_file(struct pci_dev *pdev)
+{
+ if (smbios_attr_label.test && smbios_attr_label.test(&pdev->dev, NULL)) {
+ sysfs_remove_file(&pdev->dev.kobj, &smbios_attr_label.attr);
+ return 0;
+ }
+ return -1;
+}
+#else
+static inline int
+pci_create_smbiosname_file(struct pci_dev *pdev)
+{
+ return -1;
+}
+
+static inline int
+pci_remove_smbiosname_file(struct pci_dev *pdev)
+{
+ return -1;
+}
+#endif
+
+#if defined CONFIG_ACPI
+
+static const char device_label_dsm_uuid[] = {
+ 0xD0, 0x37, 0xC9, 0xE5, 0x53, 0x35, 0x7A, 0x4D,
+ 0x91, 0x17, 0xEA, 0x4D, 0x19, 0xC3, 0x43, 0x4D
+};
+
+struct acpi_attribute {
+ struct attribute attr;
+ ssize_t (*show) (struct device *dev, char *buf);
+ ssize_t (*test) (struct device *dev, char *buf);
+};
+
+static int
+dsm_get_label(acpi_handle handle, int func,
+ struct acpi_buffer *output,
+ char *buf, char *attribute)
+{
+ struct acpi_object_list input;
+ union acpi_object params[4];
+ union acpi_object *obj;
+ int len = 0;
+
+ int err;
+
+ input.count = 4;
+ input.pointer = params;
+ params[0].type = ACPI_TYPE_BUFFER;
+ params[0].buffer.length = sizeof(device_label_dsm_uuid);
+ params[0].buffer.pointer = (char *)device_label_dsm_uuid;
+ params[1].type = ACPI_TYPE_INTEGER;
+ params[1].integer.value = 0x02;
+ params[2].type = ACPI_TYPE_INTEGER;
+ params[2].integer.value = func;
+ params[3].type = ACPI_TYPE_PACKAGE;
+ params[3].package.count = 0;
+ params[3].package.elements = NULL;
+
+ err = acpi_evaluate_object(handle, "_DSM", &input, output);
+ if (err)
+ return -1;
+
+
+ obj = (union acpi_object *)output->pointer;
+
+ switch (obj->type) {
+ case ACPI_TYPE_PACKAGE:
+ if (obj->package.count != 2)
+ break;
+ len = obj->package.elements[0].integer.value;
+ if (buf) {
+ if (!strncmp(attribute, "index", strlen(attribute)))
+ scnprintf(buf, PAGE_SIZE, "%lu\n",
+ obj->package.elements[0].integer.value);
+ else
+ scnprintf(buf, PAGE_SIZE, "%s\n",
+ obj->package.elements[1].string.pointer);
+ kfree(output->pointer);
+ return strlen(buf);
+ }
+ kfree(output->pointer);
+ return len;
+ break;
+ default:
+ kfree(output->pointer);
+ return -1;
+ }
+}
+
+static ssize_t
+acpi_index_string_exist(struct device *dev, char *buf, char *attribute)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL};
+ acpi_handle handle;
+ int length;
+
+ handle = DEVICE_ACPI_HANDLE(dev);
+
+ if (!handle)
+ return -1;
+
+ if ((length = dsm_get_label(handle, DEVICE_LABEL_DSM,
+ &output, buf, attribute)) < 0)
+ return -1;
+
+ return length;
+}
+
+static ssize_t
+acpilabel_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ return acpi_index_string_exist(dev, buf, "label");
+}
+
+static ssize_t
+acpiindex_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ return acpi_index_string_exist(dev, buf, "index");
+}
+
+struct acpi_attribute acpi_attr_label = {
+ .attr = {.name = "label", .mode = 0444, .owner = THIS_MODULE},
+ .show = acpilabel_show,
+ .test = acpi_index_string_exist,
+};
+
+struct acpi_attribute acpi_attr_index = {
+ .attr = {.name = "index", .mode = 0444, .owner = THIS_MODULE},
+ .show = acpiindex_show,
+ .test = acpi_index_string_exist,
+};
+
+static int
+pci_create_acpi_index_label_files(struct pci_dev *pdev)
+{
+ if (acpi_attr_label.test && acpi_attr_label.test(&pdev->dev, NULL) > 0) {
+ sysfs_create_file(&pdev->dev.kobj, &acpi_attr_label.attr);
+ sysfs_create_file(&pdev->dev.kobj, &acpi_attr_index.attr);
+ return 0;
+ }
+ return -1;
+}
+
+static int
+pci_remove_acpi_index_label_files(struct pci_dev *pdev)
+{
+ if (acpi_attr_label.test && acpi_attr_label.test(&pdev->dev, NULL) > 0) {
+ sysfs_remove_file(&pdev->dev.kobj, &acpi_attr_label.attr);
+ sysfs_remove_file(&pdev->dev.kobj, &acpi_attr_index.attr);
+ return 0;
+ }
+ return -1;
+}
+#else
+static inline int
+pci_create_acpi_index_label_files(struct pci_dev *pdev)
+{
+ return -1;
+}
+
+static inline int
+pci_remove_acpi_index_label_files(struct pci_dev *pdev)
+{
+ return -1;
+}
+#endif
+
+int pci_create_firmware_label_files(struct pci_dev *pdev)
+{
+ if (!pci_create_acpi_index_label_files(pdev))
+ return 0;
+ if (!pci_create_smbiosname_file(pdev))
+ return 0;
+ return -ENODEV;
+}
+
+int pci_remove_firmware_label_files(struct pci_dev *pdev)
+{
+ if (!pci_remove_acpi_index_label_files(pdev))
+ return 0;
+ if (!pci_remove_smbiosname_file(pdev))
+ return 0;
+ return -ENODEV;
+}
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index fad9398..4ed517f 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1073,6 +1073,8 @@ int __must_check pci_create_sysfs_dev_files (struct pci_dev *pdev)
if (retval)
goto err_vga_file;
+ pci_create_firmware_label_files(pdev);
+
return 0;
err_vga_file:
@@ -1140,6 +1142,9 @@ void pci_remove_sysfs_dev_files(struct pci_dev *pdev)
sysfs_remove_bin_file(&pdev->dev.kobj, pdev->rom_attr);
kfree(pdev->rom_attr);
}
+
+ pci_remove_firmware_label_files(pdev);
+
}
static int __init pci_sysfs_init(void)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4eb10f4..f223283 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -11,6 +11,8 @@
extern int pci_uevent(struct device *dev, struct kobj_uevent_env *env);
extern int pci_create_sysfs_dev_files(struct pci_dev *pdev);
extern void pci_remove_sysfs_dev_files(struct pci_dev *pdev);
+extern int pci_create_firmware_label_files(struct pci_dev *pdev);
+extern int pci_remove_firmware_label_files(struct pci_dev *pdev);
extern void pci_cleanup_rom(struct pci_dev *dev);
#ifdef HAVE_PCI_MMAP
extern int pci_mmap_fits(struct pci_dev *pdev, int resno,
diff --git a/include/linux/dmi.h b/include/linux/dmi.h
index a8a3e1a..cc57c3a 100644
--- a/include/linux/dmi.h
+++ b/include/linux/dmi.h
@@ -20,6 +20,7 @@ enum dmi_device_type {
DMI_DEV_TYPE_SAS,
DMI_DEV_TYPE_IPMI = -1,
DMI_DEV_TYPE_OEM_STRING = -2,
+ DMI_DEV_TYPE_DEVSLOT = -3,
};
struct dmi_header {
@@ -37,6 +38,14 @@ struct dmi_device {
#ifdef CONFIG_DMI
+struct dmi_devslot {
+ struct dmi_device dev;
+ int id;
+ int seg;
+ int bus;
+ int devfn;
+};
+
extern int dmi_check_system(const struct dmi_system_id *list);
const struct dmi_system_id *dmi_first_match(const struct dmi_system_id *list);
extern const char * dmi_get_system_info(int field);
--
1.6.5.2
With regards,
Narendra K
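As a side note on the UUID discussed earlier in this mail: the byte array is
stored in the order ACPI uses for UUID buffers, where the first three fields
are little-endian. The following userspace sketch (the helper name
acpi_uuid_to_text is hypothetical) prints the array from the patch in the
usual textual form, which makes it easier to compare against a firmware
specification.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Bytes exactly as they appear in device_label_dsm_uuid in the patch. */
static const unsigned char dsm_uuid[16] = {
	0xD0, 0x37, 0xC9, 0xE5, 0x53, 0x35, 0x7A, 0x4D,
	0x91, 0x17, 0xEA, 0x4D, 0x19, 0xC3, 0x43, 0x4D
};

/*
 * Format a 16-byte ACPI UUID buffer as canonical text. The first three
 * fields are byte-swapped; the last eight bytes are emitted in order.
 * out must have room for 36 characters plus the terminating NUL.
 */
static void acpi_uuid_to_text(const unsigned char *b, char *out, size_t outlen)
{
	assert(b && out && outlen >= 37);
	snprintf(out, outlen,
		 "%02X%02X%02X%02X-%02X%02X-%02X%02X-"
		 "%02X%02X-%02X%02X%02X%02X%02X%02X",
		 b[3], b[2], b[1], b[0], b[5], b[4], b[7], b[6],
		 b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```

For the array above this yields E5C937D0-3553-4D7A-9117-EA4D19C3434D.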
^ permalink raw reply related
* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
From: Andrew Morton @ 2010-06-03 20:02 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
In-Reply-To: <bug-16083-10286@https.bugzilla.kernel.org/>
On Mon, 31 May 2010 15:55:12 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=16083
>
> Summary: swapper: Page allocation failure
> Product: Memory Management
> Version: 2.5
> Kernel Version: 2.6.34
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> AssignedTo: akpm@linux-foundation.org
> ReportedBy: sgunderson@bigfoot.com
> Regression: No
>
>
> Hi,
>
> Since upgrading from a Q9450 to 2xE5520 (and upgrading from 2.6.34-rc-something
> to 2.6.34), I've started seeing these:
>
> [605882.372418] swapper: page allocation failure. order:2, mode:0x4020
> [605882.378981] Pid: 0, comm: swapper Not tainted 2.6.34 #1
> [605882.384617] Call Trace:
> [605882.387499] <IRQ> [<ffffffff81096d5a>] __alloc_pages_nodemask+0x5b0/0x629
> [605882.395068] [<ffffffff81096de5>] __get_free_pages+0x12/0x4f
> [605882.401103] [<ffffffff810bdeb4>] __kmalloc_track_caller+0x4c/0x156
> [605882.407817] [<ffffffff81245986>] ? sock_alloc_send_pskb+0xdd/0x32d
> [605882.414556] [<ffffffff8124a515>] __alloc_skb+0x66/0x15b
I wonder if we should switch __alloc_skb() over to __GFP_NOWARN.
People keep on reporting events such as the above, and nobody's
getting any value from this.
Downsides:
- the change would tend to deprive MM developers of prompt "hey you
broke it again" notifications.
- if a system is getting enough allocation failures to impact
throughput, the operators won't *know* that it's happening, and so
they won't make the changes necessary to reduce the frequency of
memory allocation failures.
If these are likely to be a problem, perhaps networking could provide
some other form of "hey, you keep on running out of memory"
notification, if it doesn't already do so.
Thoughts?
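To make the "some other form of notification" idea concrete, here is a small
userspace model, not kernel code. It assumes a hypothetical wrapper
(alloc_quiet) that stays silent on individual failures but keeps a counter and
emits one summary line every report_every failures. All names are made up for
illustration, and the simulate_failure argument exists only so the behaviour
can be exercised without actually exhausting memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct alloc_stats {
	unsigned long failures;		/* total failed allocations */
	unsigned long reported;		/* failure count at the last report */
	unsigned long report_every;	/* one summary line per this many failures */
};

/*
 * Allocate without a per-failure warning. Failures are counted, and a
 * single summary line is printed each time report_every new failures
 * have accumulated since the previous report.
 */
static void *alloc_quiet(struct alloc_stats *st, size_t size, int simulate_failure)
{
	void *p;

	assert(st && st->report_every > 0);
	p = simulate_failure ? NULL : malloc(size);
	if (!p) {
		st->failures++;
		if (st->failures - st->reported >= st->report_every) {
			fprintf(stderr, "allocation failures so far: %lu\n",
				st->failures);
			st->reported = st->failures;
		}
	}
	return p;
}
```

The counter could equally be exported through an existing statistics
interface instead of being printed; the sketch only shows the bookkeeping.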
^ permalink raw reply
* [PATCH] [ath5k][leds] Ability to disable leds support. If leds support enabled do not force mac802.11 leds layer selection.
From: Dmytro Milinevskyy @ 2010-06-03 19:39 UTC (permalink / raw)
Cc: Jiri Slaby, Nick Kossifidis, Luis R. Rodriguez, Bob Copeland,
John W. Linville, GeunSik Lim, Greg Kroah-Hartman, Lukas Turek,
Mark Hindley, Johannes Berg, Jiri Kosina, Kalle Valo, Keng-Yu Lin,
Luca Verdesca, Shahar Or, linux-wireless, netdev, linux-kernel
Hi!
Here is a patch that makes it possible to disable ath5k LED support at build time.
When LED support is enabled, selection of the mac80211 LED layer (MAC80211_LEDS) is no longer forced.
The dependency on LEDS_CLASS is kept.
Suggestions given by Pavel Roskin and Bob Copeland have been applied.
Regards,
--Dima
---
drivers/net/wireless/ath/ath5k/Kconfig | 12 +++++++++---
drivers/net/wireless/ath/ath5k/Makefile | 2 +-
drivers/net/wireless/ath/ath5k/ath5k.h | 22 ++++++++++++++++++++++
drivers/net/wireless/ath/ath5k/base.h | 13 +++++++++----
drivers/net/wireless/ath/ath5k/gpio.c | 2 ++
5 files changed, 43 insertions(+), 8 deletions(-)
diff --git a/drivers/net/wireless/ath/ath5k/Kconfig b/drivers/net/wireless/ath/ath5k/Kconfig
index eb83b7b..29f4572 100644
--- a/drivers/net/wireless/ath/ath5k/Kconfig
+++ b/drivers/net/wireless/ath/ath5k/Kconfig
@@ -1,9 +1,6 @@
config ATH5K
tristate "Atheros 5xxx wireless cards support"
depends on PCI && MAC80211
- select MAC80211_LEDS
- select LEDS_CLASS
- select NEW_LEDS
---help---
This module adds support for wireless adapters based on
Atheros 5xxx chipset.
@@ -18,6 +15,15 @@ config ATH5K
If you choose to build a module, it'll be called ath5k. Say M if
unsure.
+
+config ATH5K_LEDS
+ bool "Atheros 5xxx wireless cards LEDs support"
+ depends on ATH5K
+ select NEW_LEDS
+ select LEDS_CLASS
+ ---help---
+ Atheros 5xxx LED support.
+
config ATH5K_DEBUG
bool "Atheros 5xxx debugging"
depends on ATH5K
diff --git a/drivers/net/wireless/ath/ath5k/Makefile b/drivers/net/wireless/ath/ath5k/Makefile
index cc09595..6d552dd 100644
--- a/drivers/net/wireless/ath/ath5k/Makefile
+++ b/drivers/net/wireless/ath/ath5k/Makefile
@@ -10,8 +10,8 @@ ath5k-y += phy.o
ath5k-y += reset.o
ath5k-y += attach.o
ath5k-y += base.o
-ath5k-y += led.o
ath5k-y += rfkill.o
ath5k-y += ani.o
ath5k-$(CONFIG_ATH5K_DEBUG) += debug.o
+ath5k-$(CONFIG_ATH5K_LEDS) += led.o
obj-$(CONFIG_ATH5K) += ath5k.o
diff --git a/drivers/net/wireless/ath/ath5k/ath5k.h b/drivers/net/wireless/ath/ath5k/ath5k.h
index 2785946..bb7e09a 100644
--- a/drivers/net/wireless/ath/ath5k/ath5k.h
+++ b/drivers/net/wireless/ath/ath5k/ath5k.h
@@ -1148,11 +1148,27 @@ struct ath5k_hw {
int ath5k_hw_attach(struct ath5k_softc *sc);
void ath5k_hw_detach(struct ath5k_hw *ah);
+#ifdef CONFIG_ATH5K_LEDS
/* LED functions */
int ath5k_init_leds(struct ath5k_softc *sc);
void ath5k_led_enable(struct ath5k_softc *sc);
void ath5k_led_off(struct ath5k_softc *sc);
void ath5k_unregister_leds(struct ath5k_softc *sc);
+#else
+static inline int ath5k_init_leds(struct ath5k_softc *sc)
+{
+ return 0;
+}
+static inline void ath5k_led_enable(struct ath5k_softc *sc)
+{
+}
+static inline void ath5k_led_off(struct ath5k_softc *sc)
+{
+}
+static inline void ath5k_unregister_leds(struct ath5k_softc *sc)
+{
+}
+#endif
/* Reset Functions */
int ath5k_hw_nic_wakeup(struct ath5k_hw *ah, int flags, bool initial);
@@ -1233,7 +1249,13 @@ int ath5k_hw_set_slot_time(struct ath5k_hw *ah, unsigned int slot_time);
int ath5k_hw_init_desc_functions(struct ath5k_hw *ah);
/* GPIO Functions */
+#ifdef CONFIG_ATH5K_LEDS
void ath5k_hw_set_ledstate(struct ath5k_hw *ah, unsigned int state);
+#else
+static inline void ath5k_hw_set_ledstate(struct ath5k_hw *ah, unsigned int state)
+{
+}
+#endif
int ath5k_hw_set_gpio_input(struct ath5k_hw *ah, u32 gpio);
int ath5k_hw_set_gpio_output(struct ath5k_hw *ah, u32 gpio);
u32 ath5k_hw_get_gpio(struct ath5k_hw *ah, u32 gpio);
diff --git a/drivers/net/wireless/ath/ath5k/base.h b/drivers/net/wireless/ath/ath5k/base.h
index 56221bc..97b26c1 100644
--- a/drivers/net/wireless/ath/ath5k/base.h
+++ b/drivers/net/wireless/ath/ath5k/base.h
@@ -86,6 +86,7 @@ struct ath5k_txq {
#define ATH5K_LED_MAX_NAME_LEN 31
+#ifdef CONFIG_ATH5K_LEDS
/*
* State for LED triggers
*/
@@ -95,6 +96,7 @@ struct ath5k_led
struct ath5k_softc *sc; /* driver state */
struct led_classdev led_dev; /* led classdev */
};
+#endif
/* Rfkill */
struct ath5k_rfkill {
@@ -186,9 +188,6 @@ struct ath5k_softc {
u8 bssidmask[ETH_ALEN];
- unsigned int led_pin, /* GPIO pin for driving LED */
- led_on; /* pin setting for LED on */
-
struct tasklet_struct restq; /* reset tasklet */
unsigned int rxbufsize; /* rx size based on mtu */
@@ -196,7 +195,6 @@ struct ath5k_softc {
spinlock_t rxbuflock;
u32 *rxlink; /* link ptr in last RX desc */
struct tasklet_struct rxtq; /* rx intr tasklet */
- struct ath5k_led rx_led; /* rx led */
struct list_head txbuf; /* transmit buffer */
spinlock_t txbuflock;
@@ -204,7 +202,14 @@ struct ath5k_softc {
struct ath5k_txq txqs[AR5K_NUM_TX_QUEUES]; /* tx queues */
struct ath5k_txq *txq; /* main tx queue */
struct tasklet_struct txtq; /* tx intr tasklet */
+
+
+#ifdef CONFIG_ATH5K_LEDS
+ unsigned int led_pin, /* GPIO pin for driving LED */
+ led_on; /* pin setting for LED on */
+ struct ath5k_led rx_led; /* rx led */
struct ath5k_led tx_led; /* tx led */
+#endif
struct ath5k_rfkill rf_kill;
diff --git a/drivers/net/wireless/ath/ath5k/gpio.c b/drivers/net/wireless/ath/ath5k/gpio.c
index 64a27e7..9e757b3 100644
--- a/drivers/net/wireless/ath/ath5k/gpio.c
+++ b/drivers/net/wireless/ath/ath5k/gpio.c
@@ -25,6 +25,7 @@
#include "debug.h"
#include "base.h"
+#ifdef CONFIG_ATH5K_LEDS
/*
* Set led state
*/
@@ -76,6 +77,7 @@ void ath5k_hw_set_ledstate(struct ath5k_hw *ah, unsigned int state)
else
AR5K_REG_ENABLE_BITS(ah, AR5K_PCICFG, led_5210);
}
+#endif
/*
* Set GPIO inputs
--
1.7.1
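The header changes in the patch above follow the usual "empty inline stub"
pattern for optional features. Here is a minimal userspace sketch of that
pattern, assuming a hypothetical DEMO_LEDS macro in place of the Kconfig
symbol and made-up demo_* names. The point is that callers such as
demo_attach() need no #ifdef of their own.

```c
#include <assert.h>

/* Hypothetical stand-in for the Kconfig symbol; define it to enable LEDs. */
/* #define DEMO_LEDS 1 */

struct demo_softc {
	int led_state;	/* 0 = untouched, 1 = LEDs initialised */
};

#ifdef DEMO_LEDS
/* Real implementation, built only when the feature is enabled. */
static inline int demo_init_leds(struct demo_softc *sc)
{
	assert(sc);
	sc->led_state = 1;
	return 0;
}
#else
/* Stub: reports success and does nothing, so callers stay unconditional. */
static inline int demo_init_leds(struct demo_softc *sc)
{
	(void)sc;
	return 0;
}
#endif

/* Caller is identical in both configurations. */
static int demo_attach(struct demo_softc *sc)
{
	assert(sc);
	sc->led_state = 0;
	return demo_init_leds(sc);
}
```

With DEMO_LEDS left undefined, demo_attach() succeeds and led_state stays 0;
the compiler can drop the stub call entirely.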
^ permalink raw reply related
* [PATCH v2] net: deliver skbs on inactive slaves to exact matches
From: John Fastabend @ 2010-06-03 19:30 UTC (permalink / raw)
To: fubar, davem; +Cc: john.r.fastabend, nhorman, bonding-devel, netdev
Currently, the accelerated receive path for VLANs will
drop packets if the real device is an inactive slave and
is not one of the special pkts tested for in
skb_bond_should_drop(). This behavior is different than
the non-accelerated path and for pkts over a bonded vlan.
For example,
vlanx -> bond0 -> ethx
will be dropped in the vlan path and not delivered to any
packet handlers at all. However,
bond0 -> vlanx -> ethx
and
bond0 -> ethx
will be delivered to handlers that match the exact dev,
because the VLAN path checks the real_dev which is not a
slave and netif_receive_skb() doesn't drop frames but only
delivers them to exact matches.
This patch adds a sk_buff flag which is used for tagging
skbs that would previously have been dropped and allows the
skb to continue to netif_receive_skb(). Here we add
logic to check for the deliver_no_wcard flag and if it
is set only deliver to handlers that match exactly. This
makes both paths above consistent and gives pkt handlers
a way to identify skbs that come from inactive slaves.
Without this patch in some configurations skbs will be
delivered to handlers with exact matches and in others
be dropped outright in the vlan path.
I have tested the following 4 configurations in failover modes
and load balancing modes.
# bond0 -> ethx
# vlanx -> bond0 -> ethx
# bond0 -> vlanx -> ethx
# bond0 -> ethx
|
vlanx -> --
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
include/linux/skbuff.h | 5 ++++-
net/8021q/vlan_core.c | 4 ++--
net/core/dev.c | 17 ++++++++++++++---
3 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bf243fc..f89e7fd 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -380,7 +380,10 @@ struct sk_buff {
kmemcheck_bitfield_begin(flags2);
__u16 queue_mapping:16;
#ifdef CONFIG_IPV6_NDISC_NODETYPE
- __u8 ndisc_nodetype:2;
+ __u8 ndisc_nodetype:2,
+ deliver_no_wcard:1;
+#else
+ __u8 deliver_no_wcard:1;
#endif
kmemcheck_bitfield_end(flags2);
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index bd537fc..50f58f5 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -12,7 +12,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
return NET_RX_DROP;
if (skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
- goto drop;
+ skb->deliver_no_wcard = 1;
skb->skb_iif = skb->dev->ifindex;
__vlan_hwaccel_put_tag(skb, vlan_tci);
@@ -84,7 +84,7 @@ vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
struct sk_buff *p;
if (skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
- goto drop;
+ skb->deliver_no_wcard = 1;
skb->skb_iif = skb->dev->ifindex;
__vlan_hwaccel_put_tag(skb, vlan_tci);
diff --git a/net/core/dev.c b/net/core/dev.c
index ccba781..9c356fd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2821,13 +2821,24 @@ static int __netif_receive_skb(struct sk_buff *skb)
if (!skb->skb_iif)
skb->skb_iif = skb->dev->ifindex;
+ /*
+ * bonding note: skbs received on inactive slaves should only
+ * be delivered to pkt handlers that are exact matches. Also
+ * the deliver_no_wcard flag will be set. If packet handlers
+ * are sensitive to duplicate packets these skbs will need to
+ * be dropped at the handler. The vlan accel path may have
+ * already set the deliver_no_wcard flag.
+ */
null_or_orig = NULL;
orig_dev = skb->dev;
master = ACCESS_ONCE(orig_dev->master);
- if (master) {
- if (skb_bond_should_drop(skb, master))
+ if (skb->deliver_no_wcard)
+ null_or_orig = orig_dev;
+ else if (master) {
+ if (skb_bond_should_drop(skb, master)) {
+ skb->deliver_no_wcard = 1;
null_or_orig = orig_dev; /* deliver only exact match */
- else
+ } else
skb->dev = master;
}
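The delivery rule this patch introduces can be modelled in userspace as
follows. This is a simplified sketch: the demo_* types are hypothetical, and
the master/slave device swap done by the real __netif_receive_skb() is left
out. A handler with a NULL dev stands for a wildcard handler; when
deliver_no_wcard is set, only handlers bound to the receiving device match.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dev {
	const char *name;
};

struct demo_handler {
	const struct demo_dev *dev;	/* NULL means wildcard: any device */
	int hits;			/* number of skbs delivered so far */
};

struct demo_skb {
	const struct demo_dev *dev;	/* device the skb arrived on */
	unsigned int deliver_no_wcard : 1;
};

/* Deliver skb to matching handlers; returns how many received it. */
static int demo_deliver(const struct demo_skb *skb, struct demo_handler *h,
			size_t n)
{
	const struct demo_dev *null_or_orig;
	int delivered = 0;
	size_t i;

	assert(skb && h);
	/*
	 * Mirrors null_or_orig in the patch: NULL lets wildcard handlers
	 * match, the receiving device restricts delivery to exact matches.
	 */
	null_or_orig = skb->deliver_no_wcard ? skb->dev : NULL;
	for (i = 0; i < n; i++) {
		if (h[i].dev == null_or_orig || h[i].dev == skb->dev) {
			h[i].hits++;
			delivered++;
		}
	}
	return delivered;
}
```

With one wildcard handler and one handler bound to the receiving device, a
normal skb reaches both, while a flagged skb reaches only the bound handler.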
^ permalink raw reply related
* Re: pull request: wireless-2.6 2010-06-03
From: David Miller @ 2010-06-03 19:30 UTC (permalink / raw)
To: linville-2XuSBdqkA4R54TAoqtyWWQ
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100603173843.GB14597-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
From: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
Date: Thu, 3 Jun 2010 13:38:43 -0400
> This is a handful of fixes intended for 2.6.35. The ath5k patch fixes a
> problem reported by multiple people as "Commit 6b5d11 breaks association
> with WPA enabled APs". The ones from Johannes fix a locking problem
> and a problem caused by reading the wrong data location.
>
> Please let me know if there are problems!
Pulled, thanks John!
^ permalink raw reply
* Re: [Patch 2/2]r8169: remove redundant RTL_W32
From: David Miller @ 2010-06-03 19:29 UTC (permalink / raw)
To: junchangwang; +Cc: romieu, netdev
In-Reply-To: <20100603112721.GD24909@host-a-55.ustcsz.edu.cn>
From: Junchang Wang <junchangwang@gmail.com>
Date: Thu, 3 Jun 2010 19:27:23 +0800
> Writing "cmd" into "CounterAddrLow" twice is redundant.
>
> Signed-off-by: Junchang Wang <junchangwang@gmail.com>
I definitely think this is being done on purpose.
^ permalink raw reply
* [PATCH v3] netfilter: Xtables: idletimer target implementation
From: luciano.coelho @ 2010-06-03 19:14 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, jengelh, kaber, Luciano Coelho, Timo Teras
From: Luciano Coelho <luciano.coelho@nokia.com>
This patch implements an idletimer Xtables target that can be used to
identify when interfaces have been idle for a certain period of time.
Timers are identified by labels and are created when a rule is set with a new
label. The rules also take a timeout value (in seconds) as an option. If
more than one rule uses the same timer label, the timer will be restarted
whenever any of the rules get a hit.
One entry for each timer is created in sysfs. This attribute contains the
time remaining for the timer to expire. The attributes are located under
the xt_idletimer class:
/sys/class/xt_idletimer/timers/<label>
When the timer expires, the target module sends a sysfs notification to
userspace, which can then decide what to do (e.g. disconnect to save power).
Cc: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
---
v2: Fixed according to Jan's comments
v3: Changed to a device class in the virtual bus in sysfs
Removed unnecessary attribute group
Fixed missing deallocation in some error cases
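For illustration, a userspace consumer of the notification could look roughly
like the sketch below. It assumes the attribute holds a decimal number and
uses the generic sysfs convention of polling for POLLPRI | POLLERR after an
initial read. The helper names (read_attr, wait_for_notify) are hypothetical;
the path passed in would be /sys/class/xt_idletimer/timers/<label>.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Read a decimal value from the start of an open attribute; -1 on error. */
static long read_attr(int fd)
{
	char buf[32];
	ssize_t n;

	assert(fd >= 0);
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, sizeof(buf) - 1);
	if (n <= 0)
		return -1;
	buf[n] = '\0';
	return strtol(buf, NULL, 10);
}

/*
 * Wait up to timeout_ms for a sysfs notification on path.
 * Returns 1 if notified, 0 on timeout, -1 on error.
 */
static int wait_for_notify(const char *path, int timeout_ms)
{
	struct pollfd pfd;
	int ret;

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;
	pfd.events = POLLPRI | POLLERR;
	pfd.revents = 0;
	read_attr(pfd.fd);	/* sysfs expects a read before poll() */
	ret = poll(&pfd, 1, timeout_ms);
	close(pfd.fd);
	if (ret < 0)
		return -1;
	return ret > 0;
}
```

A daemon would call wait_for_notify() in a loop and, on a return value of 1,
take whatever action it wants, such as bringing the interface down.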
include/linux/netfilter/xt_IDLETIMER.h | 40 ++++
net/netfilter/Kconfig | 12 +
net/netfilter/Makefile | 1 +
net/netfilter/xt_IDLETIMER.c | 356 ++++++++++++++++++++++++++++++++
4 files changed, 409 insertions(+), 0 deletions(-)
create mode 100644 include/linux/netfilter/xt_IDLETIMER.h
create mode 100644 net/netfilter/xt_IDLETIMER.c
diff --git a/include/linux/netfilter/xt_IDLETIMER.h b/include/linux/netfilter/xt_IDLETIMER.h
new file mode 100644
index 0000000..6e62224
--- /dev/null
+++ b/include/linux/netfilter/xt_IDLETIMER.h
@@ -0,0 +1,40 @@
+/*
+ * linux/include/linux/netfilter/xt_IDLETIMER.h
+ *
+ * Header file for Xtables timer target module.
+ *
+ * Copyright (C) 2004, 2010 Nokia Corporation
+ * Written by Timo Teras <ext-timo.teras@nokia.com>
+ *
+ * Converted to x_tables and forward-ported to 2.6.34
+ * by Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * Contact: Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#ifndef _XT_IDLETIMER_H
+#define _XT_IDLETIMER_H
+
+#define MAX_LABEL_SIZE 32
+
+struct idletimer_tg_info {
+ __u32 timeout;
+
+ char label[MAX_LABEL_SIZE];
+};
+
+#endif
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 8593a77..413ed24 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -424,6 +424,18 @@ config NETFILTER_XT_TARGET_HL
since you can easily create immortal packets that loop
forever on the network.
+config NETFILTER_XT_TARGET_IDLETIMER
+ tristate "IDLETIMER target support"
+ depends on NETFILTER_ADVANCED
+ help
+
+ This option adds the `IDLETIMER' target. Each matching packet
+ resets the timer associated with the label specified when the rule is
+ added. When the timer expires, it triggers a sysfs notification.
+ The remaining time for expiration can be read via sysfs.
+
+ To compile it as a module, choose M here. If unsure, say N.
+
config NETFILTER_XT_TARGET_LED
tristate '"LED" target support'
depends on LEDS_CLASS && LEDS_TRIGGERS
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 14e3a8f..e28420a 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPMSS) += xt_TCPMSS.o
obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
obj-$(CONFIG_NETFILTER_XT_TARGET_TEE) += xt_TEE.o
obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
# matches
obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c
new file mode 100644
index 0000000..65c195e
--- /dev/null
+++ b/net/netfilter/xt_IDLETIMER.c
@@ -0,0 +1,356 @@
+/*
+ * linux/net/netfilter/xt_IDLETIMER.c
+ *
+ * Netfilter module to trigger a timer when packet matches.
+ * After timer expires a kevent will be sent.
+ *
+ * Copyright (C) 2004, 2010 Nokia Corporation
+ * Written by Timo Teras <ext-timo.teras@nokia.com>
+ *
+ * Converted to x_tables and reworked for upstream inclusion
+ * by Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * Contact: Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/timer.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_IDLETIMER.h>
+#include <linux/kobject.h>
+#include <linux/workqueue.h>
+#include <linux/sysfs.h>
+
+struct idletimer_tg {
+ struct list_head entry;
+ struct timer_list timer;
+ struct work_struct work;
+
+ struct kobject *kobj;
+ struct idletimer_tg_attr *attr;
+
+ unsigned int refcnt;
+};
+
+struct idletimer_tg_attr {
+ struct attribute attr;
+ ssize_t (*show)(struct kobject *kobj,
+ struct attribute *attr, char *buf);
+};
+
+static LIST_HEAD(idletimer_tg_list);
+static DEFINE_SPINLOCK(list_lock);
+
+static struct kobject *idletimer_tg_kobj;
+
+static
+struct idletimer_tg *__idletimer_tg_find_by_label(const char *label)
+{
+ struct idletimer_tg *entry;
+
+ BUG_ON(!label);
+
+ list_for_each_entry(entry, &idletimer_tg_list, entry) {
+ if (!strcmp(label, entry->attr->attr.name))
+ return entry;
+ }
+
+ return NULL;
+}
+
+static ssize_t idletimer_tg_show(struct kobject *kobj, struct attribute *attr,
+ char *buf)
+{
+ struct idletimer_tg *timer;
+ unsigned long expires = 0;
+
+ spin_lock_bh(&list_lock);
+ timer = __idletimer_tg_find_by_label(attr->name);
+ if (timer)
+ expires = timer->timer.expires;
+ spin_unlock_bh(&list_lock);
+
+ if (time_after(expires, jiffies))
+ return sprintf(buf, "%u\n",
+ jiffies_to_msecs(expires - jiffies) / 1000);
+
+ return sprintf(buf, "0\n");
+}
+
+static void idletimer_tg_delete(const struct idletimer_tg_info *info)
+{
+ struct idletimer_tg *timer;
+
+ spin_lock_bh(&list_lock);
+ timer = __idletimer_tg_find_by_label(info->label);
+ if (!timer) {
+ spin_unlock_bh(&list_lock);
+ return;
+ }
+
+ if (--timer->refcnt == 0) {
+ pr_debug("deleting timer %s\n", info->label);
+
+ list_del(&timer->entry);
+ del_timer_sync(&timer->timer);
+ spin_unlock_bh(&list_lock);
+
+ sysfs_remove_file(idletimer_tg_kobj, &timer->attr->attr);
+ kfree(timer->attr->attr.name);
+ kfree(timer->attr);
+ kfree(timer);
+ } else {
+ spin_unlock_bh(&list_lock);
+ pr_debug("decreased refcnt of timer %s to %u\n",
+ info->label, timer->refcnt);
+ }
+}
+
+static void idletimer_tg_work(struct work_struct *work)
+{
+ struct idletimer_tg *timer = container_of(work, struct idletimer_tg,
+ work);
+
+ sysfs_notify(idletimer_tg_kobj, NULL,
+ timer->attr->attr.name);
+}
+
+static void idletimer_tg_expired(unsigned long data)
+{
+ struct idletimer_tg *timer = (struct idletimer_tg *) data;
+
+ pr_debug("timer %s expired\n",
+ timer->attr->attr.name);
+
+ schedule_work(&timer->work);
+}
+
+static
+struct idletimer_tg *idletimer_tg_create(const struct idletimer_tg_info *info)
+{
+ struct idletimer_tg *timer;
+ struct idletimer_tg_attr *attr;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+ if (!attr) {
+ pr_debug("couldn't alloc attribute\n");
+ return NULL;
+ }
+
+ attr->attr.name = kstrdup(info->label, GFP_KERNEL);
+ if (!attr->attr.name) {
+ pr_debug("couldn't alloc attribute name\n");
+ goto out_free_attr;
+ }
+ attr->attr.mode = S_IRUGO;
+ attr->show = idletimer_tg_show;
+
+ if (sysfs_create_file(idletimer_tg_kobj, &attr->attr)) {
+ pr_debug("couldn't add attr to sysfs\n");
+ goto out_free_name;
+ }
+
+ timer = kmalloc(sizeof(struct idletimer_tg), GFP_KERNEL);
+ if (!timer) {
+ pr_debug("couldn't alloc timer\n");
+ goto out_free_file;
+ }
+
+ spin_lock_bh(&list_lock);
+ list_add(&timer->entry, &idletimer_tg_list);
+
+ init_timer(&timer->timer);
+ setup_timer(&timer->timer, idletimer_tg_expired, (unsigned long) timer);
+ mod_timer(&timer->timer,
+ msecs_to_jiffies(info->timeout * 1000) + jiffies);
+
+ timer->attr = attr;
+ timer->refcnt = 0;
+
+ INIT_WORK(&timer->work, idletimer_tg_work);
+ spin_unlock_bh(&list_lock);
+
+ return timer;
+
+out_free_file:
+ sysfs_remove_file(idletimer_tg_kobj, &attr->attr);
+out_free_name:
+ kfree(attr->attr.name);
+out_free_attr:
+ kfree(attr);
+ return NULL;
+}
+
+static void idletimer_tg_cleanup(void)
+{
+ struct idletimer_tg *timer, *tmp;
+
+ spin_lock(&list_lock);
+ list_for_each_entry_safe(timer, tmp, &idletimer_tg_list, entry) {
+ pr_debug("deleting timer %s\n", timer->attr->attr.name);
+
+ list_del(&timer->entry);
+ del_timer_sync(&timer->timer);
+ kfree(timer->attr->attr.name);
+ kfree(timer->attr);
+ kfree(timer);
+ }
+ spin_unlock(&list_lock);
+}
+
+/*
+ * The actual xt_tables plugin.
+ */
+static unsigned int idletimer_tg_target(struct sk_buff *skb,
+ const struct xt_action_param *par)
+{
+ const struct idletimer_tg_info *info = par->targinfo;
+ struct idletimer_tg *timer;
+
+ pr_debug("resetting timer %s, timeout period %u\n",
+ info->label, info->timeout);
+
+ spin_lock(&list_lock);
+ timer = __idletimer_tg_find_by_label(info->label);
+
+ BUG_ON(!timer);
+
+ mod_timer(&timer->timer,
+ msecs_to_jiffies(info->timeout * 1000) + jiffies);
+ spin_unlock(&list_lock);
+
+ return XT_CONTINUE;
+}
+
+static int idletimer_tg_checkentry(const struct xt_tgchk_param *par)
+{
+ const struct idletimer_tg_info *info = par->targinfo;
+ struct idletimer_tg *timer;
+
+ pr_debug("checkentry targinfo %s\n", info->label);
+
+ if (info->timeout == 0) {
+ pr_debug("timeout value is zero\n");
+ return -EINVAL;
+ }
+
+ if (!info->label || strlen(info->label) == 0) {
+ pr_debug("label is missing\n");
+ return -EINVAL;
+ }
+
+ spin_lock(&list_lock);
+ timer = __idletimer_tg_find_by_label(info->label);
+ if (!timer) {
+ spin_unlock(&list_lock);
+ timer = idletimer_tg_create(info);
+ if (!timer) {
+ pr_debug("failed to create timer\n");
+ return -ENOMEM;
+ }
+ spin_lock(&list_lock);
+ }
+
+ timer->refcnt++;
+ mod_timer(&timer->timer,
+ msecs_to_jiffies(info->timeout * 1000) + jiffies);
+ spin_unlock(&list_lock);
+
+ return 0;
+}
+
+static void idletimer_tg_destroy(const struct xt_tgdtor_param *par)
+{
+ const struct idletimer_tg_info *info = par->targinfo;
+
+ pr_debug("destroy targinfo %s\n", info->label);
+
+ idletimer_tg_delete(info);
+}
+
+static struct xt_target idletimer_tg __read_mostly = {
+ .name = "IDLETIMER",
+ .family = NFPROTO_UNSPEC,
+ .target = idletimer_tg_target,
+ .targetsize = sizeof(struct idletimer_tg_info),
+ .checkentry = idletimer_tg_checkentry,
+ .destroy = idletimer_tg_destroy,
+ .me = THIS_MODULE,
+};
+
+static struct class *idletimer_tg_class;
+
+static struct device *idletimer_tg_device;
+
+static int __init idletimer_tg_init(void)
+{
+ int err;
+
+ idletimer_tg_class = class_create(THIS_MODULE, "xt_idletimer");
+ err = PTR_ERR(idletimer_tg_class);
+ if (IS_ERR(idletimer_tg_class)) {
+ pr_debug("couldn't register device class\n");
+ goto out;
+ }
+
+ idletimer_tg_device = device_create(idletimer_tg_class, NULL,
+ MKDEV(0, 0), NULL, "timers");
+ err = PTR_ERR(idletimer_tg_device);
+ if (IS_ERR(idletimer_tg_device)) {
+ pr_debug("couldn't register system device\n");
+ goto out_class;
+ }
+
+ idletimer_tg_kobj = &idletimer_tg_device->kobj;
+
+ err = xt_register_target(&idletimer_tg);
+ if (err < 0) {
+ pr_debug("couldn't register xt target\n");
+ goto out_dev;
+ }
+
+ return 0;
+out_dev:
+ device_destroy(idletimer_tg_class, MKDEV(0, 0));
+out_class:
+ class_destroy(idletimer_tg_class);
+out:
+ return err;
+}
+
+static void __exit idletimer_tg_exit(void)
+{
+ xt_unregister_target(&idletimer_tg);
+
+ device_destroy(idletimer_tg_class, MKDEV(0, 0));
+ class_destroy(idletimer_tg_class);
+
+ idletimer_tg_cleanup();
+}
+
+module_init(idletimer_tg_init);
+module_exit(idletimer_tg_exit);
+
+MODULE_AUTHOR("Timo Teras <ext-timo.teras@nokia.com>");
+MODULE_AUTHOR("Luciano Coelho <luciano.coelho@nokia.com>");
+MODULE_DESCRIPTION("Xtables: idle time monitor");
+MODULE_LICENSE("GPL v2");
--
1.6.3.3
^ permalink raw reply related
* Re: [PATCH 1/2] net: Enable 64-bit net device statistics on 32-bit architectures
From: Ben Hutchings @ 2010-06-03 19:11 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, Arnd Bergmann, netdev, linux-net-drivers
In-Reply-To: <20100603114738.41256434@nehalam>
On Thu, 2010-06-03 at 11:47 -0700, Stephen Hemminger wrote:
> On Thu, 03 Jun 2010 18:39:04 +0100
> Ben Hutchings <bhutchings@solarflare.com> wrote:
>
> > #if BITS_PER_LONG == 64
> > +#define NET_DEVICE_STATS_DEFINE(name) u64 name
> > +#define RTNL_LINK_STATS64_READ(stats, name) \
> > + ACCESS_ONCE((stats)->name)
> > +#define RTNL_LINK_STATS64_READ_OFFSET(stats, offset) \
> > + ACCESS_ONCE(*(const u64 *)((const u8 *)(stats) + (offset)))
> > +#define RTNL_LINK_STATS64_READ32(stats, name) \
> > + ((u32)ACCESS_ONCE((stats)->name))
> > +#else
> > +#if defined(__LITTLE_ENDIAN)
> > +#define NET_DEVICE_STATS_DEFINE(name) u32 name, pad_ ## name
> > +#define RTNL_LINK_STATS64_READ_OFFSET(stats, offset) \
> > + (ACCESS_ONCE(*(const u32 *)((const u8 *)(stats) + (offset))) | \
> > + (u64)(*(const u32 *)((const u8 *)(stats) + (offset) + 4)) << 32)
> > +#define RTNL_LINK_STATS64_READ32(stats, name) \
> > + (((const volatile u32 *)&(stats)->name)[0])
> > +#else
> > +#define NET_DEVICE_STATS_DEFINE(name) u32 pad_ ## name, name
> > +#define RTNL_LINK_STATS64_READ_OFFSET(stats, offset) \
> > + ((u64)(*(const u32 *)((const u8 *)(stats) + (offset))) << 32 | \
> > + ACCESS_ONCE(*(const u32 *)((const u8 *)(stats) + (offset) + 4)))
> > +#define RTNL_LINK_STATS64_READ32(stats, name) \
> > + (((const volatile u32 *)&(stats)->name)[1])
> > +#endif
> > +#define RTNL_LINK_STATS64_READ(stats, name) \
> > + RTNL_LINK_STATS64_READ_OFFSET( \
> > + stats, offsetof(struct rtnl_link_stats64, name))
> > +#endif
>
> Macros... with multiple casts. Gack
RTNL_LINK_STATS64_READ_OFFSET() is only really needed in net-sysfs.c,
and that ugliness could maybe be left there. So maybe the accessors
could be defined as something like:
#if BITS_PER_LONG == 64
static inline u64 rtnl_link_stats64_read(const u64 *field)
{
return ACCESS_ONCE(*field);
}
static inline u32 rtnl_link_stats64_read32(const u64 *field)
{
return ACCESS_ONCE(*field);
}
#else
static inline u64 rtnl_link_stats64_read(const u64 *field)
{
#if defined(__LITTLE_ENDIAN)
const u32 *low = (const u32 *)field, *high = low + 1;
#else
const u32 *high = (const u32 *)field, *low = high + 1;
#endif
return ACCESS_ONCE(*low) | (u64)*high << 32;
}
static inline u32 rtnl_link_stats64_read32(const u64 *field)
{
#if defined(__LITTLE_ENDIAN)
const u32 *low = (const u32 *)field;
#else
const u32 *low = (const u32 *)field + 1;
#endif
return ACCESS_ONCE(*low);
}
#endif
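
To make the split read easier to try outside the kernel, here is a rough
userspace sketch of the 32-bit branch above. It is only an illustration under
some assumptions: the names stats64_read and stats64_read32 are invented for
this example, endianness is detected with the GCC/Clang __BYTE_ORDER__
macros, and memcpy stands in for ACCESS_ONCE so the code stays within
userspace aliasing rules. Note that the two halves are loaded separately, so
the pair is not read atomically.

```c
#include <stdint.h>
#include <string.h>

/* Load 32 bits from an arbitrary byte address without aliasing problems. */
static uint32_t load_u32(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Return only the low 32 bits of a 64-bit counter. */
static uint32_t stats64_read32(const uint64_t *field)
{
	const unsigned char *bytes = (const unsigned char *)field;

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return load_u32(bytes);		/* low half comes first */
#else
	return load_u32(bytes + 4);	/* low half comes second */
#endif
}

/* Rebuild the full 64-bit value from two separate 32-bit loads. */
static uint64_t stats64_read(const uint64_t *field)
{
	const unsigned char *bytes = (const unsigned char *)field;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	uint32_t low = load_u32(bytes), high = load_u32(bytes + 4);
#else
	uint32_t high = load_u32(bytes), low = load_u32(bytes + 4);
#endif

	return low | (uint64_t)high << 32;
}
```

With a counter holding 0x1122334455667788, stats64_read() gives back the same
value and stats64_read32() gives 0x55667788 on either byte order.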
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply