* Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
From: Jeff Garzik @ 2007-09-13 18:56 UTC
To: Steve Wise; +Cc: Roland Dreier, general, linux-kernel, netdev
In-Reply-To: <46E97BB0.9030106@opengridcomputing.com>
Steve Wise wrote:
> I was about to post v2 of my patch to avoid port space collisions with
> the native stack. Can we get that into 2.6.24? It is high priority IMO.
> I've tried to solicit review on it, but I think folks are reluctant... ;-)
Well, if it involves /sharing/ port space with the native stack, i.e.
where port 1234 is IB but 1235 is Linux, pretty much all the networking
devs have NAK'd that approach AFAICS.
Jeff
* [v2 PATCH for 2.6.24] SCTP: Implement the Supported Extensions Parameter
From: Vlad Yasevich @ 2007-09-13 18:45 UTC
To: davem; +Cc: netdev, lksctp-developers, Vlad Yasevich
[... i can't seem to spell to save my life lately...]
The SCTP Supported Extensions parameter is specified in Section 4.2.7
of the ADD-IP draft (soon to be RFC). The parameter is
encoded as:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Parameter Type = 0x8008    |       Parameter Length        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CHUNK TYPE 1  | CHUNK TYPE 2  | CHUNK TYPE 3  | CHUNK TYPE 4  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CHUNK TYPE N  |      PAD      |      PAD      |      PAD      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
It contains a list of chunks that a particular SCTP extension
uses. Current extensions supported are Partial Reliability
(FWD-TSN) and ADD-IP (ASCONF and ASCONF-ACK).
When implementing new extensions (AUTH, PKT-DROP, etc.), new
chunks need to be added to this parameter. Parameter processing
would be modified to negotiate support for these new features.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
include/linux/sctp.h | 9 ++++
include/net/sctp/structs.h | 1 +
net/sctp/sm_make_chunk.c | 91 +++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 99 insertions(+), 2 deletions(-)
diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index d70df61..f4d717b 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -180,6 +180,9 @@ typedef enum {
SCTP_PARAM_SUPPORTED_ADDRESS_TYPES = __constant_htons(12),
SCTP_PARAM_ECN_CAPABLE = __constant_htons(0x8000),
+ /* Add-IP: Supported Extensions, Section 4.2 */
+ SCTP_PARAM_SUPPORTED_EXT = __constant_htons(0x8008),
+
/* PR-SCTP Sec 3.1 */
SCTP_PARAM_FWD_TSN_SUPPORT = __constant_htons(0xc000),
@@ -296,6 +299,12 @@ typedef struct sctp_adaptation_ind_param {
__be32 adaptation_ind;
} __attribute__((packed)) sctp_adaptation_ind_param_t;
+/* ADDIP Section 4.2.7 Supported Extensions Parameter */
+typedef struct sctp_supported_ext_param {
+ struct sctp_paramhdr param_hdr;
+ __u8 chunks[0];
+} __attribute__((packed)) sctp_supported_ext_param_t;
+
/* RFC 2960. Section 3.3.3 Initiation Acknowledgement (INIT ACK) (2):
* The INIT ACK chunk is used to acknowledge the initiation of an SCTP
* association.
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index c0d5848..46d215b 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -435,6 +435,7 @@ union sctp_params {
struct sctp_ipv6addr_param *v6;
union sctp_addr_param *addr;
struct sctp_adaptation_ind_param *aind;
+ struct sctp_supported_ext_param *ext;
};
/* RFC 2960. Section 3.3.5 Heartbeat.
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 79856c9..3d8f85f 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -179,6 +179,9 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc,
sctp_supported_addrs_param_t sat;
__be16 types[2];
sctp_adaptation_ind_param_t aiparam;
+ sctp_supported_ext_param_t ext_param;
+ int num_ext = 0;
+ __u8 extensions[3];
/* RFC 2960 3.3.2 Initiation (INIT) (1)
*
@@ -202,11 +205,31 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc,
chunksize = sizeof(init) + addrs_len + SCTP_SAT_LEN(num_types);
chunksize += sizeof(ecap_param);
- if (sctp_prsctp_enable)
+ if (sctp_prsctp_enable) {
chunksize += sizeof(prsctp_param);
+ extensions[num_ext] = SCTP_CID_FWD_TSN;
+ num_ext += 1;
+ }
+ /* ADDIP: Section 4.2.7:
+ * An implementation supporting this extension [ADDIP] MUST list
+ * the ASCONF,the ASCONF-ACK, and the AUTH chunks in its INIT and
+ * INIT-ACK parameters.
+ * XXX: We don't support AUTH just yet, so don't list it. AUTH
+ * support should add it.
+ */
+ if (sctp_addip_enable) {
+ extensions[num_ext] = SCTP_CID_ASCONF;
+ extensions[num_ext+1] = SCTP_CID_ASCONF_ACK;
+ num_ext += 2;
+ }
+
chunksize += sizeof(aiparam);
chunksize += vparam_len;
+ /* If we have any extensions to report, account for that */
+ if (num_ext)
+ chunksize += sizeof(sctp_supported_ext_param_t) + num_ext;
+
/* RFC 2960 3.3.2 Initiation (INIT) (1)
*
* Note 3: An INIT chunk MUST NOT contain more than one Host
@@ -241,12 +264,27 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc,
sctp_addto_chunk(retval, num_types * sizeof(__u16), &types);
sctp_addto_chunk(retval, sizeof(ecap_param), &ecap_param);
+
+ /* Add the supported extensions parameter. Be nice and add this
+ * first before adding the parameters for the extensions themselves
+ */
+ if (num_ext) {
+ ext_param.param_hdr.type = SCTP_PARAM_SUPPORTED_EXT;
+ ext_param.param_hdr.length =
+ htons(sizeof(sctp_supported_ext_param_t) + num_ext);
+ sctp_addto_chunk(retval, sizeof(sctp_supported_ext_param_t),
+ &ext_param);
+ sctp_addto_chunk(retval, num_ext, extensions);
+ }
+
if (sctp_prsctp_enable)
sctp_addto_chunk(retval, sizeof(prsctp_param), &prsctp_param);
+
aiparam.param_hdr.type = SCTP_PARAM_ADAPTATION_LAYER_IND;
aiparam.param_hdr.length = htons(sizeof(aiparam));
aiparam.adaptation_ind = htonl(sp->adaptation_ind);
sctp_addto_chunk(retval, sizeof(aiparam), &aiparam);
+
nodata:
kfree(addrs.v);
return retval;
@@ -264,6 +302,9 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
int cookie_len;
size_t chunksize;
sctp_adaptation_ind_param_t aiparam;
+ sctp_supported_ext_param_t ext_param;
+ int num_ext = 0;
+ __u8 extensions[3];
retval = NULL;
@@ -294,9 +335,19 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
chunksize += sizeof(ecap_param);
/* Tell peer that we'll do PR-SCTP only if peer advertised. */
- if (asoc->peer.prsctp_capable)
+ if (asoc->peer.prsctp_capable) {
chunksize += sizeof(prsctp_param);
+ extensions[num_ext] = SCTP_CID_FWD_TSN;
+ num_ext += 1;
+ }
+ if (sctp_addip_enable) {
+ extensions[num_ext] = SCTP_CID_ASCONF;
+ extensions[num_ext+1] = SCTP_CID_ASCONF_ACK;
+ num_ext += 2;
+ }
+
+ chunksize += sizeof(ext_param) + num_ext;
chunksize += sizeof(aiparam);
/* Now allocate and fill out the chunk. */
@@ -314,6 +365,14 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
sctp_addto_chunk(retval, cookie_len, cookie);
if (asoc->peer.ecn_capable)
sctp_addto_chunk(retval, sizeof(ecap_param), &ecap_param);
+ if (num_ext) {
+ ext_param.param_hdr.type = SCTP_PARAM_SUPPORTED_EXT;
+ ext_param.param_hdr.length =
+ htons(sizeof(sctp_supported_ext_param_t) + num_ext);
+ sctp_addto_chunk(retval, sizeof(sctp_supported_ext_param_t),
+ &ext_param);
+ sctp_addto_chunk(retval, num_ext, extensions);
+ }
if (asoc->peer.prsctp_capable)
sctp_addto_chunk(retval, sizeof(prsctp_param), &prsctp_param);
@@ -1663,6 +1722,28 @@ static int sctp_process_hn_param(const struct sctp_association *asoc,
return 0;
}
+static void sctp_process_ext_param(struct sctp_association *asoc,
+ union sctp_params param)
+{
+ __u16 num_ext = ntohs(param.p->length) - sizeof(sctp_paramhdr_t);
+ int i;
+
+ for (i = 0; i < num_ext; i++) {
+ switch (param.ext->chunks[i]) {
+ case SCTP_CID_FWD_TSN:
+ if (sctp_prsctp_enable &&
+ !asoc->peer.prsctp_capable)
+ asoc->peer.prsctp_capable = 1;
+ break;
+ case SCTP_CID_ASCONF:
+ case SCTP_CID_ASCONF_ACK:
+ /* don't need to do anything for ASCONF */
+ default:
+ break;
+ }
+ }
+}
+
/* RFC 3.2.1 & the Implementers Guide 2.2.
*
* The Parameter Types are encoded such that the
@@ -1779,11 +1860,13 @@ static int sctp_verify_param(const struct sctp_association *asoc,
case SCTP_PARAM_UNRECOGNIZED_PARAMETERS:
case SCTP_PARAM_ECN_CAPABLE:
case SCTP_PARAM_ADAPTATION_LAYER_IND:
+ case SCTP_PARAM_SUPPORTED_EXT:
break;
case SCTP_PARAM_HOST_NAME_ADDRESS:
/* Tell the peer, we won't support this param. */
return sctp_process_hn_param(asoc, param, chunk, err_chunk);
+
case SCTP_PARAM_FWD_TSN_SUPPORT:
if (sctp_prsctp_enable)
break;
@@ -2128,6 +2211,10 @@ static int sctp_process_param(struct sctp_association *asoc,
asoc->peer.adaptation_ind = param.aind->adaptation_ind;
break;
+ case SCTP_PARAM_SUPPORTED_EXT:
+ sctp_process_ext_param(asoc, param);
+ break;
+
case SCTP_PARAM_FWD_TSN_SUPPORT:
if (sctp_prsctp_enable) {
asoc->peer.prsctp_capable = 1;
--
1.5.2.4
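As a reading aid for the encoding described above, here is a minimal user-space sketch of building and parsing the Supported Extensions parameter. It is not the kernel code from the patch: the function names `ext_param_encode` and `ext_param_decode` and the `CID_*` constants are hypothetical, and the chunk type values are assumed from PR-SCTP (FWD-TSN) and the ADD-IP draft (ASCONF, ASCONF-ACK). The decoder checks that the advertised length is at least the header size before subtracting, which is the kind of bounds check a receive path would want somewhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PARAM_SUPPORTED_EXT 0x8008u /* parameter type from the diagram */
#define PARAM_HDR_LEN       4u      /* 16-bit type + 16-bit length */

/* Assumed chunk type values (hypothetical names for this sketch). */
enum { CID_ASCONF_ACK = 0x80, CID_FWD_TSN = 0xC0, CID_ASCONF = 0xC1 };

/* Encode the parameter into buf in network byte order.
 * Returns the number of bytes written including padding,
 * or 0 if the list is too long or buf is too small. */
size_t ext_param_encode(uint8_t *buf, size_t buflen,
                        const uint8_t *chunks, size_t n)
{
    size_t len = PARAM_HDR_LEN + n;            /* Parameter Length excludes padding */
    size_t padded = (len + 3u) & ~(size_t)3u;  /* round up to a 4-byte boundary */

    if (len > 0xFFFFu || padded > buflen)
        return 0;
    buf[0] = (uint8_t)(PARAM_SUPPORTED_EXT >> 8);
    buf[1] = (uint8_t)(PARAM_SUPPORTED_EXT & 0xFFu);
    buf[2] = (uint8_t)(len >> 8);
    buf[3] = (uint8_t)(len & 0xFFu);
    if (n)
        memcpy(buf + PARAM_HDR_LEN, chunks, n);
    memset(buf + len, 0, padded - len);        /* PAD bytes are zero */
    return padded;
}

/* Decode a parameter. On success, *chunks points at the chunk type
 * list inside buf and the number of listed chunk types is returned.
 * Returns -1 if the parameter is malformed or of another type. */
int ext_param_decode(const uint8_t *buf, size_t buflen,
                     const uint8_t **chunks)
{
    size_t len;

    if (buflen < PARAM_HDR_LEN)
        return -1;
    if ((((unsigned int)buf[0] << 8) | buf[1]) != PARAM_SUPPORTED_EXT)
        return -1;
    len = ((size_t)buf[2] << 8) | buf[3];
    if (len < PARAM_HDR_LEN || len > buflen)   /* reject before subtracting */
        return -1;
    *chunks = buf + PARAM_HDR_LEN;
    return (int)(len - PARAM_HDR_LEN);
}
```

With FWD-TSN, ASCONF and ASCONF-ACK listed, the Parameter Length is 7 and one PAD byte brings the encoded size to 8.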
* [PATCH for 2.6.24] SCTP: Implete the Supported Extensions Parameter
From: Vlad Yasevich @ 2007-09-13 18:39 UTC
To: davem; +Cc: netdev, lksctp-developers, Vlad Yasevich
The SCTP Supported Extensions parameter is specified in Section 4.2.7
of the ADD-IP draft (soon to be RFC). The parameter is
encoded as:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Parameter Type = 0x8008    |       Parameter Length        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CHUNK TYPE 1  | CHUNK TYPE 2  | CHUNK TYPE 3  | CHUNK TYPE 4  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CHUNK TYPE N  |      PAD      |      PAD      |      PAD      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
It contains the list of chunks that a particular SCTP extension
uses. Current extensions supported are Partial Reliability
(FWD-TSN) and ADD-IP (ASCONF and ASCONF-ACK).
When implementing new extensions (AUTH, PKT-DROP, etc.), new
chunks need to be added to this parameter. Parameter processing
would be modified to negotiate support for these features.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
include/linux/sctp.h | 9 ++++
include/net/sctp/structs.h | 1 +
net/sctp/sm_make_chunk.c | 91 +++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 99 insertions(+), 2 deletions(-)
diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index d70df61..f4d717b 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -180,6 +180,9 @@ typedef enum {
SCTP_PARAM_SUPPORTED_ADDRESS_TYPES = __constant_htons(12),
SCTP_PARAM_ECN_CAPABLE = __constant_htons(0x8000),
+ /* Add-IP: Supported Extensions, Section 4.2 */
+ SCTP_PARAM_SUPPORTED_EXT = __constant_htons(0x8008),
+
/* PR-SCTP Sec 3.1 */
SCTP_PARAM_FWD_TSN_SUPPORT = __constant_htons(0xc000),
@@ -296,6 +299,12 @@ typedef struct sctp_adaptation_ind_param {
__be32 adaptation_ind;
} __attribute__((packed)) sctp_adaptation_ind_param_t;
+/* ADDIP Section 4.2.7 Supported Extensions Parameter */
+typedef struct sctp_supported_ext_param {
+ struct sctp_paramhdr param_hdr;
+ __u8 chunks[0];
+} __attribute__((packed)) sctp_supported_ext_param_t;
+
/* RFC 2960. Section 3.3.3 Initiation Acknowledgement (INIT ACK) (2):
* The INIT ACK chunk is used to acknowledge the initiation of an SCTP
* association.
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index c0d5848..46d215b 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -435,6 +435,7 @@ union sctp_params {
struct sctp_ipv6addr_param *v6;
union sctp_addr_param *addr;
struct sctp_adaptation_ind_param *aind;
+ struct sctp_supported_ext_param *ext;
};
/* RFC 2960. Section 3.3.5 Heartbeat.
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 79856c9..3d8f85f 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -179,6 +179,9 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc,
sctp_supported_addrs_param_t sat;
__be16 types[2];
sctp_adaptation_ind_param_t aiparam;
+ sctp_supported_ext_param_t ext_param;
+ int num_ext = 0;
+ __u8 extensions[3];
/* RFC 2960 3.3.2 Initiation (INIT) (1)
*
@@ -202,11 +205,31 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc,
chunksize = sizeof(init) + addrs_len + SCTP_SAT_LEN(num_types);
chunksize += sizeof(ecap_param);
- if (sctp_prsctp_enable)
+ if (sctp_prsctp_enable) {
chunksize += sizeof(prsctp_param);
+ extensions[num_ext] = SCTP_CID_FWD_TSN;
+ num_ext += 1;
+ }
+ /* ADDIP: Section 4.2.7:
+ * An implementation supporting this extension [ADDIP] MUST list
+ * the ASCONF,the ASCONF-ACK, and the AUTH chunks in its INIT and
+ * INIT-ACK parameters.
+ * XXX: We don't support AUTH just yet, so don't list it. AUTH
+ * support should add it.
+ */
+ if (sctp_addip_enable) {
+ extensions[num_ext] = SCTP_CID_ASCONF;
+ extensions[num_ext+1] = SCTP_CID_ASCONF_ACK;
+ num_ext += 2;
+ }
+
chunksize += sizeof(aiparam);
chunksize += vparam_len;
+ /* If we have any extensions to report, account for that */
+ if (num_ext)
+ chunksize += sizeof(sctp_supported_ext_param_t) + num_ext;
+
/* RFC 2960 3.3.2 Initiation (INIT) (1)
*
* Note 3: An INIT chunk MUST NOT contain more than one Host
@@ -241,12 +264,27 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc,
sctp_addto_chunk(retval, num_types * sizeof(__u16), &types);
sctp_addto_chunk(retval, sizeof(ecap_param), &ecap_param);
+
+ /* Add the supported extensions parameter. Be nice and add this
+ * first before adding the parameters for the extensions themselves
+ */
+ if (num_ext) {
+ ext_param.param_hdr.type = SCTP_PARAM_SUPPORTED_EXT;
+ ext_param.param_hdr.length =
+ htons(sizeof(sctp_supported_ext_param_t) + num_ext);
+ sctp_addto_chunk(retval, sizeof(sctp_supported_ext_param_t),
+ &ext_param);
+ sctp_addto_chunk(retval, num_ext, extensions);
+ }
+
if (sctp_prsctp_enable)
sctp_addto_chunk(retval, sizeof(prsctp_param), &prsctp_param);
+
aiparam.param_hdr.type = SCTP_PARAM_ADAPTATION_LAYER_IND;
aiparam.param_hdr.length = htons(sizeof(aiparam));
aiparam.adaptation_ind = htonl(sp->adaptation_ind);
sctp_addto_chunk(retval, sizeof(aiparam), &aiparam);
+
nodata:
kfree(addrs.v);
return retval;
@@ -264,6 +302,9 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
int cookie_len;
size_t chunksize;
sctp_adaptation_ind_param_t aiparam;
+ sctp_supported_ext_param_t ext_param;
+ int num_ext = 0;
+ __u8 extensions[3];
retval = NULL;
@@ -294,9 +335,19 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
chunksize += sizeof(ecap_param);
/* Tell peer that we'll do PR-SCTP only if peer advertised. */
- if (asoc->peer.prsctp_capable)
+ if (asoc->peer.prsctp_capable) {
chunksize += sizeof(prsctp_param);
+ extensions[num_ext] = SCTP_CID_FWD_TSN;
+ num_ext += 1;
+ }
+ if (sctp_addip_enable) {
+ extensions[num_ext] = SCTP_CID_ASCONF;
+ extensions[num_ext+1] = SCTP_CID_ASCONF_ACK;
+ num_ext += 2;
+ }
+
+ chunksize += sizeof(ext_param) + num_ext;
chunksize += sizeof(aiparam);
/* Now allocate and fill out the chunk. */
@@ -314,6 +365,14 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
sctp_addto_chunk(retval, cookie_len, cookie);
if (asoc->peer.ecn_capable)
sctp_addto_chunk(retval, sizeof(ecap_param), &ecap_param);
+ if (num_ext) {
+ ext_param.param_hdr.type = SCTP_PARAM_SUPPORTED_EXT;
+ ext_param.param_hdr.length =
+ htons(sizeof(sctp_supported_ext_param_t) + num_ext);
+ sctp_addto_chunk(retval, sizeof(sctp_supported_ext_param_t),
+ &ext_param);
+ sctp_addto_chunk(retval, num_ext, extensions);
+ }
if (asoc->peer.prsctp_capable)
sctp_addto_chunk(retval, sizeof(prsctp_param), &prsctp_param);
@@ -1663,6 +1722,28 @@ static int sctp_process_hn_param(const struct sctp_association *asoc,
return 0;
}
+static void sctp_process_ext_param(struct sctp_association *asoc,
+ union sctp_params param)
+{
+ __u16 num_ext = ntohs(param.p->length) - sizeof(sctp_paramhdr_t);
+ int i;
+
+ for (i = 0; i < num_ext; i++) {
+ switch (param.ext->chunks[i]) {
+ case SCTP_CID_FWD_TSN:
+ if (sctp_prsctp_enable &&
+ !asoc->peer.prsctp_capable)
+ asoc->peer.prsctp_capable = 1;
+ break;
+ case SCTP_CID_ASCONF:
+ case SCTP_CID_ASCONF_ACK:
+ /* don't need to do anything for ASCONF */
+ default:
+ break;
+ }
+ }
+}
+
/* RFC 3.2.1 & the Implementers Guide 2.2.
*
* The Parameter Types are encoded such that the
@@ -1779,11 +1860,13 @@ static int sctp_verify_param(const struct sctp_association *asoc,
case SCTP_PARAM_UNRECOGNIZED_PARAMETERS:
case SCTP_PARAM_ECN_CAPABLE:
case SCTP_PARAM_ADAPTATION_LAYER_IND:
+ case SCTP_PARAM_SUPPORTED_EXT:
break;
case SCTP_PARAM_HOST_NAME_ADDRESS:
/* Tell the peer, we won't support this param. */
return sctp_process_hn_param(asoc, param, chunk, err_chunk);
+
case SCTP_PARAM_FWD_TSN_SUPPORT:
if (sctp_prsctp_enable)
break;
@@ -2128,6 +2211,10 @@ static int sctp_process_param(struct sctp_association *asoc,
asoc->peer.adaptation_ind = param.aind->adaptation_ind;
break;
+ case SCTP_PARAM_SUPPORTED_EXT:
+ sctp_process_ext_param(asoc, param);
+ break;
+
case SCTP_PARAM_FWD_TSN_SUPPORT:
if (sctp_prsctp_enable) {
asoc->peer.prsctp_capable = 1;
--
1.5.2.4
* incorrect cksum with tcp/udp on lo with 2.6.20/2.6.21/2.6.22
From: Krzysztof Oledzki @ 2007-09-13 17:55 UTC
To: netdev
Hello,
It seems that, after some not very recent changes, UDP and TCP packets
carrying data sent over loopback have an incorrect checksum:
UDP:
# echo test|nc -u 127.0.0.1 1111
# tcpdump -i lo -n -v -v port 1111
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
19:43:39.340576 IP (tos 0x0, ttl 64, id 15179, offset 0, flags [DF], proto: UDP (17), length: 33) 127.0.0.1.49512 > 127.0.0.1.1111: [bad udp cksum 174c!] UDP, length 5
TCP:
# echo test|nc 127.0.0.1 1111
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
*Correct:
19:44:27.692614 IP (tos 0x0, ttl 64, id 32100, offset 0, flags [DF], proto: TCP (6), length: 60) 127.0.0.1.53804 > 127.0.0.1.1111: S, cksum 0xfd54 (correct), 3426125135:3426125135(0) win 32792 <mss 16396,sackOK,timestamp 1912797227 0,nop,wscale 7>
19:44:27.692674 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto: TCP (6), length: 60) 127.0.0.1.1111 > 127.0.0.1.53804: S, cksum 0xea3f (correct), 3427916955:3427916955(0) ack 3426125136 win 32768 <mss 16396,sackOK,timestamp 1912797227 1912797227,nop,wscale 7>
19:44:27.692711 IP (tos 0x0, ttl 64, id 32101, offset 0, flags [DF], proto: TCP (6), length: 52) 127.0.0.1.53804 > 127.0.0.1.1111: ., cksum 0xd263 (correct), 1:1(0) ack 1 win 257 <nop,nop,timestamp 1912797227 1912797227>
*Incorrect:
19:44:27.692831 IP (tos 0x0, ttl 64, id 32102, offset 0, flags [DF], proto: TCP (6), length: 57) 127.0.0.1.53804 > 127.0.0.1.1111: P, cksum 0xfe2d (incorrect (-> 0xe07c), 1:6(5) ack 1 win 257 <nop,nop,timestamp 1912797227 1912797227>
*Correct:
19:44:27.692859 IP (tos 0x0, ttl 64, id 9399, offset 0, flags [DF], proto: TCP (6), length: 52) 127.0.0.1.1111 > 127.0.0.1.53804: ., cksum 0xd25f (correct), 1:1(0) ack 6 win 256 <nop,nop,timestamp 1912797227 1912797227>
Tested on:
- 2.6.22.6
- 2.6.21.7
- 2.6.20.11
Best regards,
Krzysztof Olędzki
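To check a captured datagram by hand, the following is a minimal sketch of the standard Internet checksum (16-bit one's complement sum) applied to UDP over IPv4, including the pseudo-header. The helper names `csum_add`, `csum_fold` and `udp4_csum_residual` are hypothetical. One possible explanation for captures like the ones above, which this report does not confirm, is that a packet captured on the sending side was seen before its checksum had been filled in.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum as big-endian 16-bit words.
 * An odd trailing byte is padded with a zero low byte. */
uint32_t csum_add(uint32_t sum, const uint8_t *data, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    return sum;
}

/* Fold the carries back into 16 bits and take the one's complement. */
uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* seg is the UDP header plus payload exactly as seen on the wire.
 * With the checksum field zeroed, the return value is the checksum
 * to store. With the stored checksum in place, a return value of 0
 * means the checksum verifies. */
uint16_t udp4_csum_residual(const uint8_t src[4], const uint8_t dst[4],
                            const uint8_t *seg, size_t len)
{
    uint32_t sum = 0;

    sum = csum_add(sum, src, 4);  /* pseudo-header: source address */
    sum = csum_add(sum, dst, 4);  /* pseudo-header: destination address */
    sum += 17;                    /* pseudo-header: zero byte + protocol (UDP) */
    sum += (uint32_t)len;         /* pseudo-header: UDP length */
    sum = csum_add(sum, seg, len);
    return csum_fold(sum);
}
```

For TCP the same routine applies with protocol number 6 and the TCP segment length in the pseudo-header.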
* Re: InfiniBand/RDMA merge plans for 2.6.24
From: Shirley Ma @ 2007-09-13 18:22 UTC
To: Roland Dreier; +Cc: general, linux-kernel, netdev, netdev-owner
In-Reply-To: <adahclymos8.fsf@cisco.com>
Hello Roland,
Since ehca can support a 4K MTU, we would like to see a patch in
IPoIB that allows the link MTU to be up to 4K, instead of the current 2K,
for the 2.6.24 kernel. The idea is that the IPoIB link MTU will pick up the
value returned for the SM's default broadcast MTU. This should be a small
patch; I hope you are OK with this.
Thanks
Shirley
* RE: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
From: Sean Hefty @ 2007-09-13 18:20 UTC
To: 'Roland Dreier', general, linux-kernel, netdev
In-Reply-To: <adahclymos8.fsf@cisco.com>
> - My user_mad P_Key index support patch. I'll test the ioctl to
> change to the new mode and merge this I guess, since Hal and Sean
> have tested this out.
I can give this patch a reviewed-by: too, and I will also try to review a couple
of the pending ipoib patches.
> - Sean's QoS changes. These look fine at first glance, and I just
> plan to understand the backwards compatibility story (ie how this
> works with an old SM) and merge. Anyone who objects let me know.
The new QoS fields fall into fields that are currently reserved, which should be
ignored by an older SM. I've only tested this against OpenSM, however.
> - Sean's IB CM MRA interface changes. Don't know at this point. It
> seems OK but I'm not clear on what if any real-world improvement
> this gives us.
This patch was generated in response to an Intel MPI issue. We've seen MPI take
several minutes to respond to a connection request during the middle of large
application runs. When this happens, the active side times out the connection.
In OFED, we added module parameters to adjust the rdma_cm connection timeout on
the active side, but I believe that sending an MRA from the passive side is a
better solution.
- Sean
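For a sense of scale, here is a small sketch of how a wait of several minutes maps onto the timeout value carried in an MRA. It assumes the InfiniBand CM convention that such timeouts are a 5-bit exponent n meaning 4.096 us * 2^n; the function names `ib_timeout_exponent` and `ib_timeout_ns` are hypothetical and are not part of the ib_cm interface.

```c
#include <stdint.h>

/* Smallest exponent n (0..31) such that 4.096 us * 2^n >= wait_ns.
 * Values larger than the maximum encodable time are clamped to 31. */
unsigned int ib_timeout_exponent(uint64_t wait_ns)
{
    unsigned int n = 0;

    while (n < 31 && ((uint64_t)4096 << n) < wait_ns)
        n++;
    return n;
}

/* Time in nanoseconds represented by exponent n (only 5 bits are used). */
uint64_t ib_timeout_ns(unsigned int n)
{
    return (uint64_t)4096 << (n & 31u);
}
```

Under that assumption, a passive side that needs up to three minutes would ask for exponent 26, which represents roughly 275 seconds.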
* Re: [RFC v3 PATCH 2/21] SCTP: Convert bind_addr_list locking to RCU
From: Vlad Yasevich @ 2007-09-13 18:15 UTC
To: Sridhar Samudrala; +Cc: paulmck, netdev, lksctp-developers
In-Reply-To: <1189706346.2748.20.camel@w-sridhar2.beaverton.ibm.com>
Hi Sridhar
Sridhar Samudrala wrote:
>
> looks good to me too. some minor typos and some comments on
> RCU usage comments inline.
>
> Also, I guess we can remove the sctp_[read/write]_[un]lock macros
> from sctp.h now that you removed the all the users of rwlocks
> in SCTP
OK. I guess I'll pull them.
>
> Thanks
> Sridhar
>>> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
>>> ---
>>> include/net/sctp/structs.h | 7 +--
>>> net/sctp/associola.c | 14 +-----
>>> net/sctp/bind_addr.c | 68 ++++++++++++++++++++----------
>>> net/sctp/endpointola.c | 27 +++---------
>>> net/sctp/ipv6.c | 12 ++---
>>> net/sctp/protocol.c | 25 ++++-------
>>> net/sctp/sm_make_chunk.c | 18 +++-----
>>> net/sctp/socket.c | 98 ++++++++++++-------------------------------
>>> 8 files changed, 106 insertions(+), 163 deletions(-)
>>>
>>>
>>> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
>>> index 7fc369f..d16055f 100644
>>> --- a/net/sctp/bind_addr.c
>>> +++ b/net/sctp/bind_addr.c
>>> @@ -167,7 +167,11 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
>>>
>>> INIT_LIST_HEAD(&addr->list);
>>> INIT_RCU_HEAD(&addr->rcu);
>>> - list_add_tail(&addr->list, &bp->address_list);
>>> +
>>> + /* We always hold a socket lock when calling this function,
>>> + * so rcu_read_lock is not needed.
>>> + */
>>> + list_add_tail_rcu(&addr->list, &bp->address_list);
>
> I am little confused with the comment above.
> Isn't this an update-side of RCU. If so, this should be protected
> by a spin-lock or a mutex rather than rcu_read_lock().
>
Yes, the comment is confusing. I put it there because I removed the rcu_read_lock() that
was also taken in the prior version of the patch. The comment should really say that, since
the socket lock is held, we don't need another synchronizing spin lock in this case.
>>> SCTP_DBG_OBJCNT_INC(addr);
>>>
>>> return 0;
>>> @@ -176,23 +180,35 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
>>> /* Delete an address from the bind address list in the SCTP_bind_addr
>>> * structure.
>>> */
>>> -int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr)
>>> +int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr,
>>> + void (*rcu_call)(struct rcu_head *head,
>>> + void (*func)(struct rcu_head *head)))
>>> {
>>> - struct list_head *pos, *temp;
>>> - struct sctp_sockaddr_entry *addr;
>>> + struct sctp_sockaddr_entry *addr, *temp;
>>>
>>> - list_for_each_safe(pos, temp, &bp->address_list) {
>>> - addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>>> + /* We hold the socket lock when calling this function, so
>>> + * rcu_read_lock is not needed.
>>> + */
>
> Same as above. This is also an update-side of RCU protected
> by socket lock.
Same reason. Prior versions used rcu_spin_lock and I was just making a note that
it's not needed. I'll remove it.
>
>>> + list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
>>> if (sctp_cmp_addr_exact(&addr->a, del_addr)) {
>>> /* Found the exact match. */
>>> - list_del(pos);
>>> - kfree(addr);
>>> - SCTP_DBG_OBJCNT_DEC(addr);
>>> -
>>> - return 0;
>>> + addr->valid = 0;
>>> + list_del_rcu(&addr->list);
>>> + break;
>>> }
>>> }
>>>
>>> + /* Call the rcu callback provided in the args. This function is
>>> + * called by both BH packet processing and user side socket option
>>> + * processing, but it works on different lists in those 2 contexts.
>>> + * Each context provides it's own callback, whether call_rc_bh()
> s/call_rc_bh/call_rcu_bh
yep.
>
>>> + * or call_rcu(), to make sure that we wait an for appropriate time.
> s/an for/for an
yep. fat fingered...
>>> @@ -295,20 +285,17 @@ struct sctp_association *sctp_endpoint_lookup_assoc(
>>> int sctp_endpoint_is_peeled_off(struct sctp_endpoint *ep,
>>> const union sctp_addr *paddr)
>>> {
>>> - struct list_head *pos;
>>> struct sctp_sockaddr_entry *addr;
>>> struct sctp_bind_addr *bp;
>>>
>>> - sctp_read_lock(&ep->base.addr_lock);
>>> bp = &ep->base.bind_addr;
>>> - list_for_each(pos, &bp->address_list) {
>>> - addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>>> - if (sctp_has_association(&addr->a, paddr)) {
>>> - sctp_read_unlock(&ep->base.addr_lock);
>>> + /* This function is called whith the socket lock held,
> s/whith/with
ok
Thanks
-vlad
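The distinction discussed above (writers serialized by a lock the caller already holds, readers taking no lock) can be illustrated with a user-space sketch using C11 atomics. This is not the kernel RCU API and not the patch's code: the types and functions below are hypothetical, entries are pushed at the head rather than the tail, and reclamation is left out entirely. A removed entry is returned to the caller, who must not free it until no reader can still hold a pointer to it, which is what call_rcu() or call_rcu_bh() arranges in the kernel.

```c
#include <stdatomic.h>
#include <stddef.h>

struct addr_entry {
    int addr;                         /* stand-in for union sctp_addr */
    atomic_int valid;                 /* cleared before unlinking, as in the patch */
    struct addr_entry *_Atomic next;
};

struct addr_list {
    struct addr_entry *_Atomic head;
};

/* Update side. The caller must hold the lock that serializes writers
 * (the socket lock in the discussion above). */
void addr_list_add(struct addr_list *l, struct addr_entry *e)
{
    atomic_store_explicit(&e->valid, 1, memory_order_relaxed);
    atomic_store_explicit(&e->next,
                          atomic_load_explicit(&l->head, memory_order_relaxed),
                          memory_order_relaxed);
    /* The release store publishes a fully initialized entry to readers. */
    atomic_store_explicit(&l->head, e, memory_order_release);
}

/* Update side, same locking rule. Unlinks the first match and returns
 * it without freeing; readers may still be traversing it. */
struct addr_entry *addr_list_del(struct addr_list *l, int addr)
{
    struct addr_entry *_Atomic *link = &l->head;
    struct addr_entry *e;

    while ((e = atomic_load_explicit(link, memory_order_relaxed)) != NULL) {
        if (e->addr == addr) {
            atomic_store_explicit(&e->valid, 0, memory_order_relaxed);
            atomic_store_explicit(link,
                                  atomic_load_explicit(&e->next, memory_order_relaxed),
                                  memory_order_release);
            return e;
        }
        link = &e->next;
    }
    return NULL;
}

/* Read side. Takes no lock and may run concurrently with one writer. */
int addr_list_contains(struct addr_list *l, int addr)
{
    struct addr_entry *e = atomic_load_explicit(&l->head, memory_order_acquire);

    while (e != NULL) {
        if (e->addr == addr &&
            atomic_load_explicit(&e->valid, memory_order_relaxed))
            return 1;
        e = atomic_load_explicit(&e->next, memory_order_acquire);
    }
    return 0;
}
```

The point of the sketch is only the division of labor: the update functions rely on an external lock and never take a read-side primitive, which matches the correction to the comments agreed on in this reply.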
* Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
From: Steve Wise @ 2007-09-13 18:04 UTC
To: Roland Dreier; +Cc: netdev, linux-kernel, general
In-Reply-To: <adahclymos8.fsf@cisco.com>
Hey Roland,
I was about to post v2 of my patch to avoid port space collisions with
the native stack. Can we get that into 2.6.24? It is high priority IMO.
I've tried to solicit review on it, but I think folks are reluctant... ;-)
Steve.
Roland Dreier wrote:
> With 2.6.24 probably opening in the not-too-distant future, it's
> probably a good time to review what my plans are for when the merge
> window opens.
>
> At the kernel summit, we discussed patch review (doing a web search
> for "kernel summit" "reviewed-by:" should turn up lots of info on
> this). Due to an unfortunate combination of vacation and conference
> travel, summer colds, and other inconveniences, I am very backed up on
> reviewing. And in any case, I've allowed too much code review to be
> dumped on me -- when there are dozens of people working on IB and RDMA
> stuff, it obviously doesn't work to expect me to do all the reviewing.
>
> Unfortunately, due to the length of the backlog and the fact that
> 2.6.23 seems fairly close, some of the things listed below are going
> to miss the 2.6.24 merge window. So, although the plan is to phase in
> requiring "Reviewed-by:" gently, for this merge, if you can get
> someone other than me to review your work, then the chances of it
> being merged increase dramatically. I'm talking about a real review--
> ideally, someone independent (from another company would be good) who
> is willing to provide a "Reviewed-by:" line that means the reviewer
> has really looked at and thought about the patch. There should be a
> mailing list thread you can point me at where the reviewer comments on
> the patch and a new version of that patch addressing all comments is
> posted (or in exceptional cases, where the patch is perfect to start
> with, where the reviewer says the patch is great).
>
> For example, given the number of IPoIB changes pending, it might be a
> good idea for the people submitting them to get together and trade
> reviews (ie "If you review my patch, I'll review your patch"). There
> are a few cases where getting a review may not be necessary. First of
> all, trivial and obvious patches don't need a review. It's a
> judgement call what is trivial or obvious, and it's always a good idea
> to provide a changelog that makes it clear why a patch is trivial and
> obviously correct. Second, hardware driver patches may not make sense
> to anyone outside of the company whose hardware the driver is for.
> Still, in this case, an internal Reviewed-by: would be nice, and also
> a changelog that explains the reason for the change always helps
> (don't just tell me what your patch does, but also explain what the
> patch fixes and what the impact of the current situation is).
>
> Anyway, here are all the pending things that I'm aware of. As usual,
> if something isn't already in my tree and isn't listed below, I
> probably missed it or dropped it by mistake. Please remind me again
> in that case.
>
> Core:
>
> - My user_mad P_Key index support patch. I'll test the ioctl to
> change to the new mode and merge this I guess, since Hal and Sean
> have tested this out.
>
> - A fix to the user_mad 32-bit big-endian userspace 64/32 problem
> with the method_mask when registering agents. I'll write a patch
> to handle this in a way that doesn't change the ABI for anything
> other than the broken case and hope to get someone to review this
> so it can be merged.
>
> - Sean's QoS changes. These look fine at first glance, and I just
> plan to understand the backwards compatibility story (ie how this
> works with an old SM) and merge. Anyone who objects let me know.
>
> - Sean's IB CM MRA interface changes. Don't know at this point. It
> seems OK but I'm not clear on what if any real-world improvement
> this gives us.
>
> ULPs:
>
> - Pradeep's IPoIB CM support for devices that don't have SRQs. I
> think the basic approach makes sense (I don't think faking SRQs at
> some other layer is really feasible) and I need to find time to
> look at the details to see if the current patch looks workable. I'm
> likely to merge this; getting an independent Reviewed-by: would
> certainly be appreciated too.
>
> - Moni's IPoIB bonding support. This seems mostly an issue of
> getting the core bonding maintainer's attention. However getting a
> Reviewed-by: for the IPoIB changes wouldn't hurt too.
>
> - Rolf's IPoIB MGID scope changes. Certainly we want to fix this
> issue but the specific changes need review.
>
> - Eli and Michael's IPoIB stateless offload (checksum offload, LSO,
> LRO, etc). It's a big series that makes quite a few core changes.
> I think it needs some careful review and is probably at risk of
> missing this merge window. Sorting in order of invasiveness so we
> can merge at least some of it (if splitting it makes sense) might
> be a good idea.
>
> HW specific:
>
> - I already merged patches to enable MSI-X by default for mthca and
> mlx4. I hope there aren't too many systems that get hosed if a
> MSI-X interrupt is generated.
>
> - Jack and Michael's mlx4 FMR support. Will merge I guess, although
> I do hope to have time to address the DMA API abuse that is being
> copied from mthca, so that mlx4 and mthca work in Xen domU.
>
> - ehca patch queue. Will merge, pending fixes for the few minor
> issues I commented on.
>
> - Steve's mthca router mode support. Would be nice to see a review
> from someone at Mellanox.
>
> - Arthur's mthca doorbell alignment fixes. I will experiment with a
> few different approaches and post what I like (and fix mlx4 as
> well). I hope Arthur can review.
>
> - Michael's mlx4 WQE shrinking patch. Not sure yet; I'll reply to
> the latest patch directly.
>
> Here are a few topics that I believe will not be ready in time for the
> 2.6.24 window and will need to wait for 2.6.25:
>
> - Multiple CQ event vector support. I haven't seen any discussions
> about how ULPs or userspace apps should decide which vector to use,
> and hence no progress has been made since we deferred this during
> the 2.6.23 merge window.
>
> - XRC. Given the length of the backlog above and the fact that a
> first draft of this code has not been posted yet, I don't see any
> way that we could have something this major ready in time.
>
> Here is the complete list of patches I have in my for-2.6.24 branch
> waiting for the merge window so far. Mostly I haven't merged anything
> big out of my backlog, so this is essentially all
>
> Ali Ayoub (1):
> IB/sa: Error handling thinko fix
>
> Anton Blanchard (3):
> IB/fmr_pool: Clean up some error messages in fmr_pool.c
> IB/ehca: Make output clearer by removing some debug messages
> IB/ehca: Export module parameters in sysfs
>
> Dotan Barak (1):
> mlx4_core: Use enum value GO_BIT_TIMEOUT_MSECS
>
> Eli Cohen (2):
> IPoIB: Fix typo to end statement with ';' instead of ','
> IPoIB: Fix error path memory leak
>
> Michael S. Tsirkin (2):
> mlx4_core: Enable MSI-X by default
> IB/mthca: Enable MSI-X by default
>
> Peter Oruba (1):
> IB/mthca: Use PCI-X/PCI-Express read control interfaces
>
> Roland Dreier (6):
> IPoIB: Make sure no receives are handled when stopping device
> IB: find_first_zero_bit() takes unsigned pointer
> mlx4_core: Don't free special QPs in QP number bitmap
> IB/mlx4: Use set_data_seg() in mlx4_ib_post_recv()
> IB/ehca: Include <linux/mutex.h> from ehca_classes.h
> IB/mlx4: Fix up SRQ limit_watermark endianness
>
> Steve Wise (1):
> RDMA/cxgb3: Make the iw_cxgb3 module parameters writable
^ permalink raw reply
* Re: [RFC v3 PATCH 2/21] SCTP: Convert bind_addr_list locking to RCU
From: Sridhar Samudrala @ 2007-09-13 17:59 UTC (permalink / raw)
To: paulmck; +Cc: Vlad Yasevich, netdev, lksctp-developers
In-Reply-To: <20070912223352.GJ9830@linux.vnet.ibm.com>
On Wed, 2007-09-12 at 15:33 -0700, Paul E. McKenney wrote:
> On Wed, Sep 12, 2007 at 05:03:42PM -0400, Vlad Yasevich wrote:
> > [... and here is the updated version as promised ...]
> >
> > Since the sctp_sockaddr_entry is now RCU enabled as part of
> > the patch to synchronize sctp_localaddr_list, it makes sense to
> > change all handling of these entries to RCU. This includes the
> > sctp_bind_addrs structure and its list of bound addresses.
> >
> > This list is currently protected by an external rw_lock and that
> > looks like overkill. There are only 2 writers to the list:
> > bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
> > These are already serialized via the socket lock, so they will
> > not step on each other. These are also relatively rare, so we
> > should be good with RCU.
> >
> > The readers are varied and they are easily converted to RCU.
>
> Looks good from an RCU viewpoint -- I must defer to others on
> the networking aspects.
>
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Looks good to me too. Some minor typos and some comments on
RCU usage inline.
Also, I guess we can remove the sctp_[read/write]_[un]lock macros
from sctp.h now that you have removed all the users of rwlocks
in SCTP.
Thanks
Sridhar
>
> > Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
> > ---
> > include/net/sctp/structs.h | 7 +--
> > net/sctp/associola.c | 14 +-----
> > net/sctp/bind_addr.c | 68 ++++++++++++++++++++----------
> > net/sctp/endpointola.c | 27 +++---------
> > net/sctp/ipv6.c | 12 ++---
> > net/sctp/protocol.c | 25 ++++-------
> > net/sctp/sm_make_chunk.c | 18 +++-----
> > net/sctp/socket.c | 98 ++++++++++++-------------------------------
> > 8 files changed, 106 insertions(+), 163 deletions(-)
> >
> > diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> > index a89e361..c2fe2dc 100644
> > --- a/include/net/sctp/structs.h
> > +++ b/include/net/sctp/structs.h
> > @@ -1155,7 +1155,9 @@ int sctp_bind_addr_copy(struct sctp_bind_addr *dest,
> > int flags);
> > int sctp_add_bind_addr(struct sctp_bind_addr *, union sctp_addr *,
> > __u8 use_as_src, gfp_t gfp);
> > -int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *);
> > +int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *,
> > + void (*rcu_call)(struct rcu_head *,
> > + void (*func)(struct rcu_head *)));
> > int sctp_bind_addr_match(struct sctp_bind_addr *, const union sctp_addr *,
> > struct sctp_sock *);
> > union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr *bp,
> > @@ -1226,9 +1228,6 @@ struct sctp_ep_common {
> > * bind_addr.address_list is our set of local IP addresses.
> > */
> > struct sctp_bind_addr bind_addr;
> > -
> > - /* Protection during address list comparisons. */
> > - rwlock_t addr_lock;
> > };
> >
> >
> > diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> > index 2ad1caf..9bad8ba 100644
> > --- a/net/sctp/associola.c
> > +++ b/net/sctp/associola.c
> > @@ -99,7 +99,6 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
> >
> > /* Initialize the bind addr area. */
> > sctp_bind_addr_init(&asoc->base.bind_addr, ep->base.bind_addr.port);
> > - rwlock_init(&asoc->base.addr_lock);
> >
> > asoc->state = SCTP_STATE_CLOSED;
> >
> > @@ -937,8 +936,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
> > {
> > struct sctp_transport *transport;
> >
> > - sctp_read_lock(&asoc->base.addr_lock);
> > -
> > if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
> > (htons(asoc->peer.port) == paddr->v4.sin_port)) {
> > transport = sctp_assoc_lookup_paddr(asoc, paddr);
> > @@ -952,7 +949,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
> > transport = NULL;
> >
> > out:
> > - sctp_read_unlock(&asoc->base.addr_lock);
> > return transport;
> > }
> >
> > @@ -1376,19 +1372,13 @@ int sctp_assoc_set_bind_addr_from_cookie(struct sctp_association *asoc,
> > int sctp_assoc_lookup_laddr(struct sctp_association *asoc,
> > const union sctp_addr *laddr)
> > {
> > - int found;
> > + int found = 0;
> >
> > - sctp_read_lock(&asoc->base.addr_lock);
> > if ((asoc->base.bind_addr.port == ntohs(laddr->v4.sin_port)) &&
> > sctp_bind_addr_match(&asoc->base.bind_addr, laddr,
> > - sctp_sk(asoc->base.sk))) {
> > + sctp_sk(asoc->base.sk)))
> > found = 1;
> > - goto out;
> > - }
> >
> > - found = 0;
> > -out:
> > - sctp_read_unlock(&asoc->base.addr_lock);
> > return found;
> > }
> >
> > diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> > index 7fc369f..d16055f 100644
> > --- a/net/sctp/bind_addr.c
> > +++ b/net/sctp/bind_addr.c
> > @@ -167,7 +167,11 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
> >
> > INIT_LIST_HEAD(&addr->list);
> > INIT_RCU_HEAD(&addr->rcu);
> > - list_add_tail(&addr->list, &bp->address_list);
> > +
> > + /* We always hold a socket lock when calling this function,
> > + * so rcu_read_lock is not needed.
> > + */
> > + list_add_tail_rcu(&addr->list, &bp->address_list);
I am a little confused by the comment above.
Isn't this the update side of RCU? If so, this should be protected
by a spin-lock or a mutex rather than rcu_read_lock().
> > SCTP_DBG_OBJCNT_INC(addr);
> >
> > return 0;
> > @@ -176,23 +180,35 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
> > /* Delete an address from the bind address list in the SCTP_bind_addr
> > * structure.
> > */
> > -int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr)
> > +int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr,
> > + void (*rcu_call)(struct rcu_head *head,
> > + void (*func)(struct rcu_head *head)))
> > {
> > - struct list_head *pos, *temp;
> > - struct sctp_sockaddr_entry *addr;
> > + struct sctp_sockaddr_entry *addr, *temp;
> >
> > - list_for_each_safe(pos, temp, &bp->address_list) {
> > - addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> > + /* We hold the socket lock when calling this function, so
> > + * rcu_read_lock is not needed.
> > + */
Same as above. This is also the update side of RCU, protected
by the socket lock.
> > + list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
> > if (sctp_cmp_addr_exact(&addr->a, del_addr)) {
> > /* Found the exact match. */
> > - list_del(pos);
> > - kfree(addr);
> > - SCTP_DBG_OBJCNT_DEC(addr);
> > -
> > - return 0;
> > + addr->valid = 0;
> > + list_del_rcu(&addr->list);
> > + break;
> > }
> > }
> >
> > + /* Call the rcu callback provided in the args. This function is
> > + * called by both BH packet processing and user side socket option
> > + * processing, but it works on different lists in those 2 contexts.
> > + * Each context provides it's own callback, whether call_rc_bh()
s/call_rc_bh/call_rcu_bh
> > + * or call_rcu(), to make sure that we wait an for appropriate time.
s/an for/for an
> > + */
> > + if (addr && !addr->valid) {
> > + rcu_call(&addr->rcu, sctp_local_addr_free);
> > + SCTP_DBG_OBJCNT_DEC(addr);
> > + }
> > +
> > return -EINVAL;
> > }
> >
> > @@ -302,15 +318,20 @@ int sctp_bind_addr_match(struct sctp_bind_addr *bp,
> > struct sctp_sock *opt)
> > {
> > struct sctp_sockaddr_entry *laddr;
> > - struct list_head *pos;
> > -
> > - list_for_each(pos, &bp->address_list) {
> > - laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> > - if (opt->pf->cmp_addr(&laddr->a, addr, opt))
> > - return 1;
> > + int match = 0;
> > +
> > + rcu_read_lock();
> > + list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> > + if (!laddr->valid)
> > + continue;
> > + if (opt->pf->cmp_addr(&laddr->a, addr, opt)) {
> > + match = 1;
> > + break;
> > + }
> > }
> > + rcu_read_unlock();
> >
> > - return 0;
> > + return match;
> > }
> >
> > /* Find the first address in the bind address list that is not present in
> > @@ -325,18 +346,19 @@ union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr *bp,
> > union sctp_addr *addr;
> > void *addr_buf;
> > struct sctp_af *af;
> > - struct list_head *pos;
> > int i;
> >
> > - list_for_each(pos, &bp->address_list) {
> > - laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> > -
> > + /* This is only called sctp_send_asconf_del_ip() and we hold
> > + * the socket lock in that code patch, so that address list
> > + * can't change.
> > + */
> > + list_for_each_entry(laddr, &bp->address_list, list) {
> > addr_buf = (union sctp_addr *)addrs;
> > for (i = 0; i < addrcnt; i++) {
> > addr = (union sctp_addr *)addr_buf;
> > af = sctp_get_af_specific(addr->v4.sin_family);
> > if (!af)
> > - return NULL;
> > + break;
> >
> > if (opt->pf->cmp_addr(&laddr->a, addr, opt))
> > break;
> > diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
> > index 1404a9e..110d912 100644
> > --- a/net/sctp/endpointola.c
> > +++ b/net/sctp/endpointola.c
> > @@ -92,7 +92,6 @@ static struct sctp_endpoint *sctp_endpoint_init(struct sctp_endpoint *ep,
> >
> > /* Initialize the bind addr area */
> > sctp_bind_addr_init(&ep->base.bind_addr, 0);
> > - rwlock_init(&ep->base.addr_lock);
> >
> > /* Remember who we are attached to. */
> > ep->base.sk = sk;
> > @@ -225,21 +224,14 @@ void sctp_endpoint_put(struct sctp_endpoint *ep)
> > struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep,
> > const union sctp_addr *laddr)
> > {
> > - struct sctp_endpoint *retval;
> > + struct sctp_endpoint *retval = NULL;
> >
> > - sctp_read_lock(&ep->base.addr_lock);
> > if (htons(ep->base.bind_addr.port) == laddr->v4.sin_port) {
> > if (sctp_bind_addr_match(&ep->base.bind_addr, laddr,
> > - sctp_sk(ep->base.sk))) {
> > + sctp_sk(ep->base.sk)))
> > retval = ep;
> > - goto out;
> > - }
> > }
> >
> > - retval = NULL;
> > -
> > -out:
> > - sctp_read_unlock(&ep->base.addr_lock);
> > return retval;
> > }
> >
> > @@ -261,9 +253,7 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc(
> > list_for_each(pos, &ep->asocs) {
> > asoc = list_entry(pos, struct sctp_association, asocs);
> > if (rport == asoc->peer.port) {
> > - sctp_read_lock(&asoc->base.addr_lock);
> > *transport = sctp_assoc_lookup_paddr(asoc, paddr);
> > - sctp_read_unlock(&asoc->base.addr_lock);
> >
> > if (*transport)
> > return asoc;
> > @@ -295,20 +285,17 @@ struct sctp_association *sctp_endpoint_lookup_assoc(
> > int sctp_endpoint_is_peeled_off(struct sctp_endpoint *ep,
> > const union sctp_addr *paddr)
> > {
> > - struct list_head *pos;
> > struct sctp_sockaddr_entry *addr;
> > struct sctp_bind_addr *bp;
> >
> > - sctp_read_lock(&ep->base.addr_lock);
> > bp = &ep->base.bind_addr;
> > - list_for_each(pos, &bp->address_list) {
> > - addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> > - if (sctp_has_association(&addr->a, paddr)) {
> > - sctp_read_unlock(&ep->base.addr_lock);
> > + /* This function is called whith the socket lock held,
s/whith/with
> > + * so the address_list can not change.
> > + */
> > + list_for_each_entry(addr, &bp->address_list, list) {
> > + if (sctp_has_association(&addr->a, paddr))
> > return 1;
> > - }
> > }
> > - sctp_read_unlock(&ep->base.addr_lock);
> >
> > return 0;
> > }
<snip>
Deleted the rest of the patch as it looks good and I have no
comments.
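The deletion path reviewed above (clear the valid flag, unlink the entry,
defer the free until readers are done) can be sketched in plain userspace C.
This is only a single-threaded model under stated assumptions, not the SCTP
code and not real RCU: struct entry, defer_free() and run_deferred() are
hypothetical stand-ins for struct sctp_sockaddr_entry, call_rcu() and the end
of a grace period, and the caller is assumed to hold whatever lock serializes
the writers (the socket lock in the discussion above).

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct sctp_sockaddr_entry. */
struct entry {
	struct entry *next;       /* list linkage, left intact on removal */
	struct entry *defer_next; /* linkage on the deferred-free queue */
	int addr;                 /* stands in for union sctp_addr */
	int valid;                /* cleared before the entry is unlinked */
};

static struct entry *deferred; /* entries waiting for the grace period */
static int freed_count;        /* number of entries actually freed */

/* Model of call_rcu(): remember the entry, free it later. */
static void defer_free(struct entry *e)
{
	e->defer_next = deferred;
	deferred = e;
}

/* Model of the grace period ending: no reader can hold these any more. */
static void run_deferred(void)
{
	while (deferred) {
		struct entry *e = deferred;

		deferred = e->defer_next;
		free(e);
		freed_count++;
	}
}

/* Update side: the caller is assumed to hold the writer lock. */
static int add_entry(struct entry **head, int addr)
{
	struct entry *e = calloc(1, sizeof(*e));

	if (!e)
		return -1;
	e->addr = addr;
	e->valid = 1;
	e->next = *head;
	*head = e;
	return 0;
}

/* Update side: mark invalid, unlink, defer the free. */
static int del_entry(struct entry **head, int addr)
{
	struct entry **pp, *found = NULL;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->addr == addr) {
			found = *pp;
			found->valid = 0;
			*pp = found->next; /* found->next still points into the list */
			break;
		}
	}
	if (!found)
		return -1; /* no such address */

	defer_free(found);
	return 0;
}

/* Read side: skip entries already marked invalid. */
static int match_entry(const struct entry *head, int addr)
{
	const struct entry *e;

	for (e = head; e; e = e->next) {
		if (e->valid && e->addr == addr)
			return 1;
	}
	return 0;
}
```

The sketch keeps a separate found pointer instead of testing the loop cursor
after the loop, so the "nothing matched" case is explicit, and it returns 0
only when an entry was actually removed. Those are choices made for this
model, not statements about what the posted patch does.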
^ permalink raw reply
* InfiniBand/RDMA merge plans for 2.6.24
From: Roland Dreier @ 2007-09-13 17:57 UTC (permalink / raw)
To: general, linux-kernel, netdev
With 2.6.24 probably opening in the not-too-distant future, it's
probably a good time to review what my plans are for when the merge
window opens.
At the kernel summit, we discussed patch review (doing a web search
for "kernel summit" "reviewed-by:" should turn up lots of info on
this). Due to an unfortunate combination of vacation and conference
travel, summer colds, and other inconveniences, I am very backed up on
reviewing. And in any case, I've allowed too much code review to be
dumped on me -- when there are dozens of people working on IB and RDMA
stuff, it obviously doesn't work to expect me to do all the reviewing.
Unfortunately, due to the length of the backlog and the fact that
2.6.23 seems fairly close, some of the things listed below are going
to miss the 2.6.24 merge window. So, although the plan is to phase in
requiring "Reviewed-by:" gently, for this merge, if you can get
someone other than me to review your work, then the chances of it
being merged increase dramatically. I'm talking about a real review--
ideally, someone independent (from another company would be good) who
is willing to provide a "Reviewed-by:" line that means the reviewer
has really looked at and thought about the patch. There should be a
mailing list thread you can point me at where the reviewer comments on
the patch and a new version of that patch addressing all comments is
posted (or in exceptional cases, where the patch is perfect to start
with, where the reviewer says the patch is great).
For example, given the number of IPoIB changes pending, it might be a
good idea for the people submitting them to get together and trade
reviews (ie "If you review my patch, I'll review your patch"). There
are a few cases where getting a review may not be necessary. First of
all, trivial and obvious patches don't need a review. It's a
judgement call what is trivial or obvious, and it's always a good idea
to provide a changelog that makes it clear why a patch is trivial and
obviously correct. Second, hardware driver patches may not make sense
to anyone outside of the company whose hardware the driver is for.
Still, in this case, an internal Reviewed-by: would be nice, and also
a changelog that explains the reason for the change always helps
(don't just tell me what your patch does, but also explain what the
patch fixes and what the impact of the current situation is).
Anyway, here are all the pending things that I'm aware of. As usual,
if something isn't already in my tree and isn't listed below, I
probably missed it or dropped it by mistake. Please remind me again
in that case.
Core:
- My user_mad P_Key index support patch. I'll test the ioctl to
change to the new mode and merge this I guess, since Hal and Sean
have tested this out.
- A fix to the user_mad 32-bit big-endian userspace 64/32 problem
with the method_mask when registering agents. I'll write a patch
to handle this in a way that doesn't change the ABI for anything
other than the broken case and hope to get someone to review this
so it can be merged.
- Sean's QoS changes. These look fine at first glance, and I just
plan to understand the backwards compatibility story (ie how this
works with an old SM) and merge. Anyone who objects let me know.
- Sean's IB CM MRA interface changes. Don't know at this point. It
seems OK but I'm not clear on what if any real-world improvement
this gives us.
ULPs:
- Pradeep's IPoIB CM support for devices that don't have SRQs. I
think the basic approach makes sense (I don't think faking SRQs at
some other layer is really feasible) and I need to find time to
look at the details to see if the current patch looks workable. I'm
likely to merge this; getting an independent Reviewed-by: would
certainly be appreciated too.
- Moni's IPoIB bonding support. This seems mostly an issue of
getting the core bonding maintainer's attention. However, getting a
Reviewed-by: for the IPoIB changes wouldn't hurt either.
- Rolf's IPoIB MGID scope changes. Certainly we want to fix this
issue but the specific changes need review.
- Eli and Michael's IPoIB stateless offload (checksum offload, LSO,
LRO, etc). It's a big series that makes quite a few core changes.
I think it needs some careful review and is probably at risk of
missing this merge window. Sorting in order of invasiveness so we
can merge at least some of it (if splitting it makes sense) might
be a good idea.
HW specific:
- I already merged patches to enable MSI-X by default for mthca and
mlx4. I hope there aren't too many systems that get hosed if an
MSI-X interrupt is generated.
- Jack and Michael's mlx4 FMR support. Will merge I guess, although
I do hope to have time to address the DMA API abuse that is being
copied from mthca, so that mlx4 and mthca work in Xen domU.
- ehca patch queue. Will merge, pending fixes for the few minor
issues I commented on.
- Steve's mthca router mode support. Would be nice to see a review
from someone at Mellanox.
- Arthur's mthca doorbell alignment fixes. I will experiment with a
few different approaches and post what I like (and fix mlx4 as
well). I hope Arthur can review.
- Michael's mlx4 WQE shrinking patch. Not sure yet; I'll reply to
the latest patch directly.
Here are a few topics that I believe will not be ready in time for the
2.6.24 window and will need to wait for 2.6.25:
- Multiple CQ event vector support. I haven't seen any discussions
about how ULPs or userspace apps should decide which vector to use,
and hence no progress has been made since we deferred this during
the 2.6.23 merge window.
- XRC. Given the length of the backlog above and the fact that a
first draft of this code has not been posted yet, I don't see any
way that we could have something this major ready in time.
Here is the complete list of patches I have in my for-2.6.24 branch
waiting for the merge window so far. Mostly I haven't merged anything
big out of my backlog, so this is essentially all
Ali Ayoub (1):
IB/sa: Error handling thinko fix
Anton Blanchard (3):
IB/fmr_pool: Clean up some error messages in fmr_pool.c
IB/ehca: Make output clearer by removing some debug messages
IB/ehca: Export module parameters in sysfs
Dotan Barak (1):
mlx4_core: Use enum value GO_BIT_TIMEOUT_MSECS
Eli Cohen (2):
IPoIB: Fix typo to end statement with ';' instead of ','
IPoIB: Fix error path memory leak
Michael S. Tsirkin (2):
mlx4_core: Enable MSI-X by default
IB/mthca: Enable MSI-X by default
Peter Oruba (1):
IB/mthca: Use PCI-X/PCI-Express read control interfaces
Roland Dreier (6):
IPoIB: Make sure no receives are handled when stopping device
IB: find_first_zero_bit() takes unsigned pointer
mlx4_core: Don't free special QPs in QP number bitmap
IB/mlx4: Use set_data_seg() in mlx4_ib_post_recv()
IB/ehca: Include <linux/mutex.h> from ehca_classes.h
IB/mlx4: Fix up SRQ limit_watermark endianness
Steve Wise (1):
RDMA/cxgb3: Make the iw_cxgb3 module parameters writable
^ permalink raw reply
* Re: [BUG] tg3 cannot do PXE (loses MAC address) after soft reboot
From: Michael Chan @ 2007-09-13 18:05 UTC (permalink / raw)
To: Lucas Nussbaum; +Cc: netdev
In-Reply-To: <20070913154110.GA25781@xanadu.blop.info>
On Thu, 2007-09-13 at 17:41 +0200, Lucas Nussbaum wrote:
> # ethtool -i eth0
> driver: tg3
> version: 3.65
> firmware-version: 5703-v2.21a
> bus-info: 0000:02:02.0
The firmware is quite old and needs to be upgraded to fix the problem.
I'll have someone contact you to get it upgraded.
>
> What do you mean by "what machine" ? The systems are Dell PowerEdge
> 1600SC, but the NICs were bought separately AFAIK.
I assumed the device was on-board which would normally require a BIOS
upgrade. For NICs, it's easier.
^ permalink raw reply
* Re: [PATCH] [RFC] allow admin/users to specify rto_min in milliseconds rather than jiffies
From: Rick Jones @ 2007-09-13 16:56 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20070913103957.5354ea5d@oldman>
> Your observations are correct. rtnetlink can't/shouldn't be doing conversions
> itself. The 'ip' command should use a consistent unit for all values and
> do conversions if necessary.
That being the case I'll start looking to see what is involved in
"leveraging" the time conversion stuff in tc for use in ip.
rick jones
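To illustrate why a value given in jiffies is awkward for users, here is a
minimal sketch of the unit conversion a tool would have to do. It is not the
tc or ip code: ms_to_ticks() and ticks_to_ms() are hypothetical helpers, and
the tick rate is passed in explicitly because it differs between kernel
configurations.

```c
#include <stdint.h>

/*
 * Convert milliseconds to timer ticks for a given tick rate (ticks per
 * second). Rounds up so that a small nonzero timeout never becomes zero.
 */
static uint32_t ms_to_ticks(uint32_t ms, uint32_t hz)
{
	return (uint32_t)(((uint64_t)ms * hz + 999) / 1000);
}

/* Convert timer ticks back to milliseconds, rounding down. */
static uint32_t ticks_to_ms(uint32_t ticks, uint32_t hz)
{
	return (uint32_t)(((uint64_t)ticks * 1000) / hz);
}
```

With these helpers, 200 ms is 50 ticks at a tick rate of 250 but 200 ticks at
a tick rate of 1000, which is why a raw tick count is not a portable way to
express rto_min.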
^ permalink raw reply
* [PATCH] net: Fix the prototype of call_netdevice_notifiers
From: Eric W. Biederman @ 2007-09-13 15:59 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Linux Containers
This replaces the void * parameter with a struct net_device * which
is what is actually required.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
include/linux/netdevice.h | 2 +-
net/core/dev.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0106fa6..90aecc3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,7 +779,7 @@ extern void free_netdev(struct net_device *dev);
extern void synchronize_net(void);
extern int register_netdevice_notifier(struct notifier_block *nb);
extern int unregister_netdevice_notifier(struct notifier_block *nb);
-extern int call_netdevice_notifiers(unsigned long val, void *v);
+extern int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
extern struct net_device *dev_get_by_index(struct net *net, int ifindex);
extern struct net_device *__dev_get_by_index(struct net *net, int ifindex);
extern int dev_restart(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index f119dc0..cc343dd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1206,9 +1206,9 @@ int unregister_netdevice_notifier(struct notifier_block *nb)
* are as for raw_notifier_call_chain().
*/
-int call_netdevice_notifiers(unsigned long val, void *v)
+int call_netdevice_notifiers(unsigned long val, struct net_device *dev)
{
- return raw_notifier_call_chain(&netdev_chain, val, v);
+ return raw_notifier_call_chain(&netdev_chain, val, dev);
}
/* When > 0 there are consumers of rx skb time stamps */
--
1.5.3.rc6.17.g1911
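The effect of the prototype change can be sketched outside the kernel. The
generic chain walker still carries an untyped payload, but the typed wrapper
lets the compiler reject callers that pass anything other than a device
pointer. Everything below is a hypothetical userspace model: struct
net_device is reduced to one field, and register_notifier(), call_chain() and
call_netdev_notifiers() only mimic the shape of the kernel functions.

```c
#include <stddef.h>

/* Hypothetical, minimal stand-in for the kernel's struct net_device. */
struct net_device {
	const char *name;
};

typedef int (*notifier_fn)(unsigned long val, void *data);

#define MAX_NOTIFIERS 4
static notifier_fn chain[MAX_NOTIFIERS];
static size_t chain_len;

static int register_notifier(notifier_fn fn)
{
	if (chain_len == MAX_NOTIFIERS)
		return -1;
	chain[chain_len++] = fn;
	return 0;
}

/* Generic walker: the payload is untyped, as with raw_notifier_call_chain(). */
static int call_chain(unsigned long val, void *data)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < chain_len; i++)
		ret = chain[i](val, data);
	return ret;
}

/* Typed entry point: callers can only pass a struct net_device *. */
static int call_netdev_notifiers(unsigned long val, struct net_device *dev)
{
	return call_chain(val, dev);
}

/* Example callback: casts the payload back, relying on the typed entry point. */
static unsigned long last_event;
static const char *last_name;

static int record_event(unsigned long val, void *data)
{
	struct net_device *dev = data;

	last_event = val;
	last_name = dev->name;
	return 0;
}
```

A callback registered with register_notifier() still receives a void *, so
the type check only covers the calling side, which is the part the patch
tightens.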
^ permalink raw reply related
* Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...
From: Jeff Garzik @ 2007-09-13 15:51 UTC (permalink / raw)
To: Ralf Baechle
Cc: Maciej W. Rozycki, Andrew Morton, netdev, linux-mips,
linux-kernel
In-Reply-To: <20070913151452.GB29665@linux-mips.org>
Ralf Baechle wrote:
> On Thu, Sep 13, 2007 at 03:13:06PM +0100, Maciej W. Rozycki wrote:
>
>> Hmm, works fine with linux-2.6.git#master. I do not recall any recent
>> activity with this driver -- I wonder what the difference is. Let me
>> see...
>
> Hmm... HEAD du jour has no differences for the sb1250-mac between lmo
> and kernel.org.
Net driver patches should apply on top of netdev-2.6.git#upstream, which
is where changes to net drivers are queued for the next release.
The closer we get to the merge window, the greater the diff between
netdev-2.6.git#upstream and linux-2.6.git#master, so
linux-2.6.git#master is not a useful comparison.
Jeff
^ permalink raw reply
* Re: [BUG] tg3 cannot do PXE (loses MAC address) after soft reboot
From: Lucas Nussbaum @ 2007-09-13 15:41 UTC (permalink / raw)
To: Michael Chan; +Cc: netdev
In-Reply-To: <1551EAE59135BE47B544934E30FC4FC002AABA5B@nt-irva-0751.brcm.ad.broadcom.com>
On 13/09/07 at 08:15 -0700, Michael Chan wrote:
> Lucas Nussbaum wrote:
>
> > This used to work, and broke between 2.6.16 and 2.6.17. Using
> > git bisect, I could trace this back to that commit:
> > commit bc1c756741b065cfebf850e4164c0e2aae9d527f
> > Author: Michael Chan <mchan@broadcom.com>
> > Date: Mon Mar 20 17:48:03 2006 -0800
> > [TG3]: Support shutdown WoL.
>
> This may be caused by bugs in early versions of bootcode or
> PXE code. When tg3 powers down the PHY during shutdown, the
> MAC address will become zero in the MAC address register. The
> PXE code or bootcode needs to fetch the MAC address again from
> the NVRAM.
>
> Can you also send me ethtool -i eth0 which will provide the
> bootcode version? What machine are you using? Thanks.
# ethtool -i eth0
driver: tg3
version: 3.65
firmware-version: 5703-v2.21a
bus-info: 0000:02:02.0
What do you mean by "what machine" ? The systems are Dell PowerEdge
1600SC, but the NICs were bought separately AFAIK.
--
| Lucas Nussbaum PhD student |
| lucas.nussbaum@imag.fr LIG / Projet MESCAL |
| jabber: lucas@nussbaum.fr +33 (0)6 64 71 41 65 |
| homepage: http://www-id.imag.fr/~nussbaum/ |
^ permalink raw reply
* [PATCH] ucc_geth: fix compilation
From: Anton Vorontsov @ 2007-09-13 15:23 UTC (permalink / raw)
To: linuxppc-dev; +Cc: netdev
In-Reply-To: <20070912112456.GA15556@localhost.localdomain>
Currently qe_bd_t is used in a call to dma_unmap_single(), which is a
no-op macro on PPC32, so the error is hidden today. Starting with
2.6.24, the macro will be replaced by an empty static function, and
the erroneous use of qe_bd_t will trigger a compilation error.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
---
Reposting this to include netdev in Cc.
drivers/net/ucc_geth.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 12e01b2..9a38dfe 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -2148,7 +2148,7 @@ static void ucc_geth_memclean(struct ucc_geth_private *ugeth)
for (j = 0; j < ugeth->ug_info->bdRingLenTx[i]; j++) {
if (ugeth->tx_skbuff[i][j]) {
dma_unmap_single(NULL,
- ((qe_bd_t *)bd)->buf,
+ ((struct qe_bd *)bd)->buf,
(in_be32((u32 *)bd) &
BD_LENGTH_MASK),
DMA_TO_DEVICE);
--
1.5.0.6
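The reason the bad type went unnoticed can be shown with a small standalone
sketch. A macro that expands to nothing throws its arguments away before the
compiler sees them, while a static inline function has its arguments parsed
and type-checked even if the body is empty. The names unmap_macro(),
unmap_func() and no_such_bd_t are hypothetical and only model the situation
described in the changelog; struct qe_bd here is a two-field stand-in.

```c
#include <stddef.h>

/* Two-field stand-in for the driver's buffer descriptor. */
struct qe_bd {
	unsigned int status_len;
	unsigned int buf;
};

/* Macro form: the arguments are dropped by the preprocessor. */
#define unmap_macro(dev, addr, size, dir) ((void)0)

static int unmap_calls;
static unsigned int last_addr;

/* Function form: every argument must be a valid, correctly typed expression. */
static inline void unmap_func(void *dev, unsigned int addr, size_t size, int dir)
{
	(void)dev;
	(void)size;
	(void)dir;
	last_addr = addr;
	unmap_calls++;
}

static void clean_one(void *bd)
{
	/*
	 * no_such_bd_t is declared nowhere, yet this line compiles,
	 * because the macro never hands its arguments to the compiler.
	 */
	unmap_macro(NULL, ((no_such_bd_t *)bd)->buf, 64, 1);

	/* Here the cast must name a real type or the build fails. */
	unmap_func(NULL, ((struct qe_bd *)bd)->buf, 64, 1);
}
```

Swapping the cast in the second call to the undeclared type turns the hidden
mistake into a compile error, which is the behaviour the changelog expects
once the macro becomes a function.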
^ permalink raw reply related
* Re: [BUG] tg3 cannot do PXE (loses MAC address) after soft reboot
From: Michael Chan @ 2007-09-13 15:15 UTC (permalink / raw)
To: Lucas Nussbaum, netdev
In-Reply-To: <20070913083918.GA5386@xanadu.blop.info>
Lucas Nussbaum wrote:
> This used to work, and broke between 2.6.16 and 2.6.17. Using
> git bisect, I could trace this back to that commit:
> commit bc1c756741b065cfebf850e4164c0e2aae9d527f
> Author: Michael Chan <mchan@broadcom.com>
> Date: Mon Mar 20 17:48:03 2006 -0800
> [TG3]: Support shutdown WoL.
This may be caused by bugs in early versions of bootcode or
PXE code. When tg3 powers down the PHY during shutdown, the
MAC address will become zero in the MAC address register. The
PXE code or bootcode needs to fetch the MAC address again from
the NVRAM.
Can you also send me ethtool -i eth0 which will provide the
bootcode version? What machine are you using? Thanks.
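As a rough illustration of the recovery step described above, boot code could
check whether the address register reads back as all zeros and reload it from
permanent storage. This is a hypothetical sketch, not Broadcom bootcode or
tg3 driver code: the register and NVRAM contents are modelled as plain byte
arrays, and mac_is_zero() and restore_mac() are made-up helper names.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* True if all six bytes of the address are zero. */
static int mac_is_zero(const uint8_t mac[ETH_ALEN])
{
	static const uint8_t zero[ETH_ALEN];

	return memcmp(mac, zero, ETH_ALEN) == 0;
}

/*
 * Reload the register copy from the NVRAM copy if it was cleared.
 * Returns 1 if a reload was needed, 0 if the register was already set.
 */
static int restore_mac(uint8_t reg_mac[ETH_ALEN],
		       const uint8_t nvram_mac[ETH_ALEN])
{
	if (!mac_is_zero(reg_mac))
		return 0;
	memcpy(reg_mac, nvram_mac, ETH_ALEN);
	return 1;
}
```

With the address from the report (00 10 18 01 E5 2F) as the NVRAM copy and an
all-zero register copy, restore_mac() copies the address back and a second
call does nothing.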
>
> During boot, the following messages are displayed:
> Broadcom NetXtreme Gigabit Ethernet Boot Agent v2.2.8
> [...]
> Broadcom UNDI, PXE-2.1 (build 082) v2.2.8
> [...]
> CLIENT MAC ADDR: 00 10 18 01 E5 2F GUID: 44454C4C 4800 1052 8032
> B9C04F53304A
>
> After a soft reboot, the last line is changed to:
> CLIENT MAC ADDR: 00 00 00 00 00 00 GUID: 44454C4C 4800 1052 8032
> B9C04F53304A
>
> lspci -v for the card:
> 02:02.0 Ethernet controller: Broadcom Corporation NetXtreme
> BCM5703X Gigabit Ethernet (rev 02)
> Subsystem: Broadcom Corporation NetXtreme BCM5703 1000Base-T
> Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 177
> Memory at fcf00000 (64-bit, non-prefetchable) [size=64K]
> Capabilities: [40] PCI-X non-bridge device
> Capabilities: [48] Power Management version 2
> Capabilities: [50] Vital Product Data
> Capabilities: [58] Message Signalled Interrupts:
> Mask- 64bit+ Queue=0/3 Enable-
>
> Thank you,
> --
> | Lucas Nussbaum PhD student |
> | lucas.nussbaum@imag.fr LIG / Projet MESCAL |
> | jabber: lucas@nussbaum.fr +33 (0)6 64 71 41 65 |
> | homepage: http://www-id.imag.fr/~nussbaum/ |
>
>
^ permalink raw reply
* Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...
From: Ralf Baechle @ 2007-09-13 15:14 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: Jeff Garzik, Andrew Morton, netdev, linux-mips, linux-kernel
In-Reply-To: <Pine.LNX.4.64N.0709131506040.31069@blysk.ds.pg.gda.pl>
On Thu, Sep 13, 2007 at 03:13:06PM +0100, Maciej W. Rozycki wrote:
> Hmm, works fine with linux-2.6.git#master. I do not recall any recent
> activity with this driver -- I wonder what the difference is. Let me
> see...
Hmm... HEAD du jour has no differences for the sb1250-mac between lmo
and kernel.org.
Ralf
^ permalink raw reply
* Re: Distributed storage. Security attributes and ducumentation update.
From: Paul E. McKenney @ 2007-09-13 15:03 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: netdev, linux-kernel, linux-fsdevel
In-Reply-To: <20070913122259.GA20714@2ka.mipt.ru>
On Thu, Sep 13, 2007 at 04:22:59PM +0400, Evgeniy Polyakov wrote:
> Hi Paul.
>
> On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> > > Further TODO list includes:
> > > * implement optional saving of mirroring/linear information on the remote
> > > nodes (simple)
> > > * implement netlink based setup (simple)
> > > * new redundancy algorithm (complex)
> > >
> > > Homepage:
> > > http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
> >
> > A couple questions below, but otherwise looks good from an RCU viewpoint.
> >
> > Thanx, Paul
>
> Thanks for your comments, and sorry for the late reply; I was on the
> KS/London trip.
> > > + if (--num) {
> > > + list_for_each_entry_rcu(n, &node->shared, shared) {
> >
> > This function is called under rcu_read_lock() or similar, right?
> > (Can't tell from this patch.) It is also OK to call it from under the
> > update-side mutex, of course.
>
> Actually not, but it does not require it, since an entry can not be removed
> during these operations: the appropriate reference counter for the given
> node is being held. It should not be RCU at all.
Ah! Yes, it is OK to use _rcu in this case, but should be avoided
unless doing so eliminates duplicate code or some such. So, agree
with dropping _rcu in this case.
> > > +static int dst_mirror_read(struct dst_request *req)
> > > +{
> > > + struct dst_node *node = req->node, *n, *min_dist_node;
> > > + struct dst_mirror_priv *priv = node->priv;
> > > + u64 dist, d;
> > > + int err;
> > > +
> > > + req->bio_endio = &dst_mirror_read_endio;
> > > +
> > > + do {
> > > + err = -ENODEV;
> > > + min_dist_node = NULL;
> > > + dist = -1ULL;
> > > +
> > > + /*
> > > + * Reading is never performed from the node under resync.
> > > + * If this will cause any troubles (like all nodes must be
> > > + * resynced between each other), this check can be removed
> > > + * and per-chunk dirty bit can be tested instead.
> > > + */
> > > +
> > > + if (!test_bit(DST_NODE_NOTSYNC, &node->flags)) {
> > > + priv = node->priv;
> > > + if (req->start > priv->last_start)
> > > + dist = req->start - priv->last_start;
> > > + else
> > > + dist = priv->last_start - req->start;
> > > + min_dist_node = req->node;
> > > + }
> > > +
> > > + list_for_each_entry_rcu(n, &node->shared, shared) {
> >
> > I see one call to this function that appears to be under the update-side
> > mutex, but I cannot tell if the other calls are safe. (Safe as in either
> > under the update-side mutex or under rcu_read_lock() and friends.)
>
> The same here: those processing functions are called from
> generic_make_request() without any lock on top of them. Each node is
> linked into the list of the first added node, whose reference counter is
> increased in a higher layer. Right now there is no way to add or remove
> nodes after the array has been started; such functionality requires the
> storage tree lock to be taken, and RCU cannot be used (since it requires
> sleeping and I did not investigate sleepable RCU for this purpose).
>
> So, essentially RCU is not used in DST :)
Works for me! "Use the right tool for the job!"
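
For readers following the quoted dst_mirror_read(): the visible part computes
an unsigned distance between the request start and a node's last_start, and
skips nodes marked DST_NODE_NOTSYNC. The rest of the loop is not quoted, so
the following userspace sketch only assumes that the node with the smallest
distance wins (as the name min_dist_node suggests). The struct mirror type
and both function names are hypothetical, not part of DST.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-node state used by the read path. */
struct mirror {
	uint64_t last_start;	/* start of the last request sent to this node */
	int not_synced;		/* non-zero while the node is under resync */
};

/* Absolute difference of two unsigned values without wrapping around. */
static uint64_t abs_dist(uint64_t a, uint64_t b)
{
	return a > b ? a - b : b - a;
}

/*
 * Return the index of the synced mirror closest to 'start',
 * or -1 if every mirror is under resync (or the array is empty).
 */
static int pick_mirror(const struct mirror *m, size_t n, uint64_t start)
{
	uint64_t best = UINT64_MAX;	/* same role as dist = -1ULL */
	int idx = -1;
	size_t i;

	assert(m != NULL || n == 0);

	for (i = 0; i < n; i++) {
		uint64_t d;

		if (m[i].not_synced)
			continue;	/* never read from a node under resync */

		d = abs_dist(start, m[i].last_start);
		if (idx < 0 || d < best) {
			best = d;
			idx = (int)i;
		}
	}
	return idx;
}
```

Comparing before subtracting matters because the operands are unsigned: a
plain start - last_start would wrap to a huge value whenever the request
lies before the previous one.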
> Thanks for review, Paul.
Thanx, Paul
^ permalink raw reply
* RE: [PATCH v3] Make the pr_*() family of macros in kernel.h complete
From: Medve Emilian-EMMEDVE1 @ 2007-09-13 14:32 UTC (permalink / raw)
To: linux-kernel, netdev, i2c, linux-omap-open-source
In-Reply-To: <1189660274.19708.125.camel@localhost>
Hello Joe,
> I expect all the kernel logging functions to be
> overhauled eventually.
>
> I'd prefer a mechanism that somehow supports
> identifying complete messages. I think the new
> pr_<level> functions are not particularly useful
> without a mechanism to avoid or identify multiple
> processors or threads interleaving partial in-progress
> multiple statement messages.
I agree with you that one can think of and propose an improved kernel
logging system, but that might be an incremental effort. For now,
patches like the ones you or I sent are a step in the general direction
of improving kernel logging: they fix an inconsistency and increase the
probability of people logging kernel messages as intended (i.e. at a
minimum, with a loglevel). I don't think that this hurts or delays the
perceived urgency of getting a sub-optimal kernel logging mechanism...
> At some point, sooner or later, the logging functions
> will be improved. Apparently, more likely later.
I'm not sure why it must be later, or why there is resistance to something
a little better, sooner.
Cheerios,
Emil.
^ permalink raw reply
* Re: [CORRECTION][PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c
From: Jeff Garzik @ 2007-09-13 14:16 UTC (permalink / raw)
To: Andrew Morton; +Cc: Micah Gruber, linux-kernel, netdev, Grant Grundler
In-Reply-To: <20070913020346.979647c1.akpm@linux-foundation.org>
Andrew Morton wrote:
> --- a/drivers/net/tulip/uli526x.c~fix-a-potential-null-pointer-dereference-in-uli526x_interrupt
> +++ a/drivers/net/tulip/uli526x.c
> @@ -666,11 +666,6 @@ static irqreturn_t uli526x_interrupt(int
> unsigned long ioaddr = dev->base_addr;
> unsigned long flags;
>
> - if (!dev) {
> - ULI526X_DBUG(1, "uli526x_interrupt() without DEVICE arg", 0);
> - return IRQ_NONE;
> - }
> -
correct / ACK
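
In the quoted hunk, dev->base_addr is read in the declarations before the
if (!dev) test runs, so that test could never protect against a NULL dev;
the patch resolves the inconsistency by dropping it. Where a pointer really
can be NULL, the check has to come before the first dereference. A minimal
userspace sketch of that ordering follows; struct fake_dev, handle_irq() and
the return codes are hypothetical and are not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the device structure. */
struct fake_dev {
	unsigned long base_addr;
};

enum { NOT_HANDLED = 0, HANDLED = 1 };

/*
 * Validate 'dev' first, and only then read through it.
 * On success the I/O base address is stored in *ioaddr_out.
 */
static int handle_irq(const struct fake_dev *dev, unsigned long *ioaddr_out)
{
	assert(ioaddr_out != NULL);

	if (!dev)
		return NOT_HANDLED;	/* nothing has been dereferenced yet */

	*ioaddr_out = dev->base_addr;	/* safe: dev is known to be non-NULL */
	return HANDLED;
}
```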
^ permalink raw reply
* Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...
From: Maciej W. Rozycki @ 2007-09-13 14:13 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Andrew Morton, netdev, linux-mips, linux-kernel
In-Reply-To: <46E8B56E.7060705@pobox.com>
On Wed, 12 Sep 2007, Jeff Garzik wrote:
> > Remove typedefs, volatiles and convert kmalloc()/memset() pairs to
> > kcalloc(). Also reformat the surrounding clutter.
> >
> > Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
> > ---
>
> ACK, but patch does not apply cleanly to netdev-2.6.git#upstream (nor -mm)
Hmm, works fine with linux-2.6.git#master. I do not recall any recent
activity with this driver -- I wonder what the difference is. Let me
see...
Maciej
^ permalink raw reply
* Re: [RFC v2 PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list
From: Vlad Yasevich @ 2007-09-13 13:46 UTC (permalink / raw)
To: Sridhar Samudrala; +Cc: netdev, lksctp-developers
In-Reply-To: <1189638236.27182.12.camel@w-sridhar2.beaverton.ibm.com>
Hi Sridhar
Sridhar Samudrala wrote:
> Vlad,
>
> few minor comments inline.
> otherwise, looks good.
>
> Thanks
> Sridhar
>
>> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
>> index f8aa23d..54ff472 100644
>> --- a/net/sctp/ipv6.c
>> +++ b/net/sctp/ipv6.c
>> @@ -77,13 +77,18 @@
>>
>> #include <asm/uaccess.h>
>>
>> -/* Event handler for inet6 address addition/deletion events. */
>> +/* Event handler for inet6 address addition/deletion events.
>> + * This even is part of the atomic notifier call chain
>> + * and thus happens atomically and can NOT sleep. As a result
>> + * we can't and really don't need to add any locks to guard the
>> + * RCU.
>> + */
>
> Now that we are adding a spin_lock, the above comment is not valid.
> It should be fixed saying that we still need a lock because we use the
> same list for both inet and inet6 address events and they can happen in
> parallel.
Yes, I forgot to fix this comment. Will do.
>> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
>> index e98579b..4688559 100644
>> --- a/net/sctp/protocol.c
>> +++ b/net/sctp/protocol.c
>> @@ -153,6 +153,8 @@ static void sctp_v4_copy_addrlist(struct list_head *addrlist,
>> addr->a.v4.sin_family = AF_INET;
>> addr->a.v4.sin_port = 0;
>> addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
>> + addr->valid = 1;
>> + INIT_RCU_HEAD(&addr->rcu);
>
> This has nothing to do with this patch, but i noticed that
> INIT_LIST_HEAD(&addr->list) is missing here when comparing with
> earlier v6 version of this routine.
Hmm... I thought it looked a little different, but didn't pay too much
attention to it. I'll add a follow-on patch to fix this.
Thanks
-vlad
^ permalink raw reply
* Re: [RFC] af_packet: allow disabling timestamps
From: Eric Dumazet @ 2007-09-13 12:24 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Unai Uribarri, David S. Miller, Evgeniy Polyakov, netdev
In-Reply-To: <20070913124253.60da52f2@oldman>
On Thu, 13 Sep 2007 12:42:53 +0200
Stephen Hemminger <shemminger@linux-foundation.org> wrote:
> Currently, af_packet does not allow disabling timestamps. This patch changes
> that but doesn't force global timestamps on.
>
> This shows up in bugzilla as:
> http://bugzilla.kernel.org/show_bug.cgi?id=4809
>
> Patch against net-2.6.24 tree.
>
I am not sure I understood this patch.
This means that tcpdump/ethereal won't get precise timestamps
(gathered when the packet is received), but imprecise ones (gathered when the sniffer reads the packet).
I added some time ago ktime infrastructure to eventually get nanosecond
precision in libpcap, so I would prefer a step in the right direction :)
Shouldn't we use something like:
[PATCH] af_packet : allow disabling timestamps, or requesting nanosecond precision.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
diff --git a/net/core/sock.c b/net/core/sock.c
index 5a16e38..1c10b9d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -563,6 +563,7 @@ set_rcvbuf:
} else {
sock_reset_flag(sk, SOCK_RCVTSTAMP);
sock_reset_flag(sk, SOCK_RCVTSTAMPNS);
+ sock_disable_timestamp(sk);
}
break;
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 745e2cb..409de44 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -650,12 +650,27 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, struct packe
h->tp_snaplen = snaplen;
h->tp_mac = macoff;
h->tp_net = netoff;
- if (skb->tstamp.tv64)
- tv = ktime_to_timeval(skb->tstamp);
- else
- do_gettimeofday(&tv);
- h->tp_sec = tv.tv_sec;
- h->tp_usec = tv.tv_usec;
+ h->tp_sec = 0;
+ h->tp_usec = 0;
+ if ((sock_flag(sk, SOCK_TIMESTAMP))) {
+ if (sock_flag(sk, SOCK_RCVTSTAMPNS)) {
+ struct timespec ts;
+ if (skb->tstamp.tv64)
+ ts = ktime_to_timespec(skb->tstamp);
+ else
+ getnstimeofday(&ts);
+ h->tp_sec = ts.tv_sec;
+ h->tp_usec = ts.tv_nsec; /* cheat a little bit */
+ }
+ else {
+ if (skb->tstamp.tv64)
+ tv = ktime_to_timeval(skb->tstamp);
+ else
+ do_gettimeofday(&tv);
+ h->tp_sec = tv.tv_sec;
+ h->tp_usec = tv.tv_usec;
+ }
+ }
sll = (struct sockaddr_ll*)((u8*)h + TPACKET_ALIGN(sizeof(*h)));
sll->sll_halen = 0;
@@ -1014,6 +1029,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol)
sock->ops = &packet_ops_spkt;
sock_init_data(sock, sk);
+ sock_enable_timestamp(sk);
po = pkt_sk(sk);
sk->sk_family = PF_PACKET;
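
With the patch above, the sub-second field holds microseconds by default,
nanoseconds when SOCK_RCVTSTAMPNS is set, and both fields stay zero when
timestamps are disabled. A reader of the ring therefore has to know which
mode is active. A minimal userspace sketch of that interpretation follows;
struct ts_fields and both helpers are hypothetical names for illustration,
not an existing API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the two timestamp fields taken from a frame header. */
struct ts_fields {
	uint32_t sec;	/* seconds */
	uint32_t sub;	/* microseconds, or nanoseconds in nanosecond mode */
};

/* Both fields zero is how the patch marks "timestamps disabled". */
static int ts_is_disabled(struct ts_fields t)
{
	return t.sec == 0 && t.sub == 0;
}

/* Convert to nanoseconds; 'ns_mode' says how the sub field must be read. */
static uint64_t ts_to_ns(struct ts_fields t, int ns_mode)
{
	uint64_t ns = (uint64_t)t.sec * 1000000000ULL;

	if (ns_mode) {
		assert(t.sub < 1000000000U);	/* must be below one second */
		ns += t.sub;
	} else {
		assert(t.sub < 1000000U);	/* must be below one second */
		ns += (uint64_t)t.sub * 1000ULL;
	}
	return ns;
}
```

The same raw value decodes to times three orders of magnitude apart depending
on the mode, which is why reusing one field for both units needs an explicit
agreement between kernel and reader.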
^ permalink raw reply related
* Re: Distributed storage. Security attributes and documentation update.
From: Evgeniy Polyakov @ 2007-09-13 12:22 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: netdev, linux-kernel, linux-fsdevel
In-Reply-To: <20070910221445.GL11801@linux.vnet.ibm.com>
Hi Paul.
On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> > Further TODO list includes:
> > * implement optional saving of mirroring/linear information on the remote
> > nodes (simple)
> > * implement netlink based setup (simple)
> > * new redundancy algorithm (complex)
> >
> > Homepage:
> > http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
>
> A couple questions below, but otherwise looks good from an RCU viewpoint.
>
> Thanx, Paul
Thanks for your comments, and sorry for the late reply; I was on a
KS/London trip.
> > + if (--num) {
> > + list_for_each_entry_rcu(n, &node->shared, shared) {
>
> This function is called under rcu_read_lock() or similar, right?
> (Can't tell from this patch.) It is also OK to call it from under the
> update-side mutex, of course.
Actually not, but it does not require it: an entry cannot be removed
during this operation, because the appropriate reference counter for the
given node is held. It should not be RCU at all.
> > +static int dst_mirror_read(struct dst_request *req)
> > +{
> > + struct dst_node *node = req->node, *n, *min_dist_node;
> > + struct dst_mirror_priv *priv = node->priv;
> > + u64 dist, d;
> > + int err;
> > +
> > + req->bio_endio = &dst_mirror_read_endio;
> > +
> > + do {
> > + err = -ENODEV;
> > + min_dist_node = NULL;
> > + dist = -1ULL;
> > +
> > + /*
> > + * Reading is never performed from the node under resync.
> > + * If this will cause any troubles (like all nodes must be
> > + * resynced between each other), this check can be removed
> > + * and per-chunk dirty bit can be tested instead.
> > + */
> > +
> > + if (!test_bit(DST_NODE_NOTSYNC, &node->flags)) {
> > + priv = node->priv;
> > + if (req->start > priv->last_start)
> > + dist = req->start - priv->last_start;
> > + else
> > + dist = priv->last_start - req->start;
> > + min_dist_node = req->node;
> > + }
> > +
> > + list_for_each_entry_rcu(n, &node->shared, shared) {
>
> I see one call to this function that appears to be under the update-side
> mutex, but I cannot tell if the other calls are safe. (Safe as in either
> under the update-side mutex or under rcu_read_lock() and friends.)
The same here: those processing functions are called from
generic_make_request() without any lock on top of them. Each node is linked
into the list of the first added node, whose reference counter is
increased in a higher layer. Right now there is no way to add or remove
nodes after the array has been started; such functionality requires the
storage tree lock to be taken, and RCU cannot be used (since it requires
sleeping and I did not investigate sleepable RCU for this purpose).
So, essentially RCU is not used in DST :)
Thanks for review, Paul.
--
Evgeniy Polyakov
^ permalink raw reply