* [PATCH 0/5] [DCCP]: Queuing policies
@ 2008-04-11 10:24 Tomasz Grobelny
2008-04-14 6:50 ` Gerrit Renker
0 siblings, 1 reply; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-11 10:24 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: dccp, netdev
This is resubmission of queuing policy related patches for DCCP. Queuing
policies allow userspace applications to more effectively use DCCP. Changes
include:
- CodingStyle,
- do not allow applications to change qpolicy after accept or connect,
- naming changes,
- make dccps_qpolicy a non-pointer,
- include dccpd_policy in dccp_skb_cb as a structure not char[16].
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 0/5] [DCCP]: Queuing policies
2008-04-11 10:24 [PATCH 0/5] [DCCP]: Queuing policies Tomasz Grobelny
@ 2008-04-14 6:50 ` Gerrit Renker
2008-04-14 7:39 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Gerrit Renker
0 siblings, 1 reply; 47+ messages in thread
From: Gerrit Renker @ 2008-04-14 6:50 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: Arnaldo Carvalho de Melo, dccp, netdev
| This is resubmission of queuing policy related patches for DCCP. Queuing
| policies allow userspace applications to more effectively use DCCP.
Thanks a lot -- all patches applied cleanly. As promised, I have created
a tree "qpolicy" for these patches, which can be checked out via
# using a recent netdev-2.6 kernel
git checkout -b qpolicy master
git pull git://eden-feed.erg.abdn.ac.uk/dccp_exp qpolicy
I have left your patches unchanged (apart from adapting them to apply
on the most recent base).
There are four other patches in this repository: I am interested in merging
your patches into the test tree and thus have worked on integrating it.
Rather than sending those new 4 patches individually, I will send the result
(combined with your patch set) as a single RFC patch to the list.
Please have a look at the repository and the subsequent patch -- I would
like to know if you are ok with the changes and/or discuss these.
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-14 6:50 ` Gerrit Renker
@ 2008-04-14 7:39 ` Gerrit Renker
2008-04-14 23:45 ` Tomasz Grobelny
0 siblings, 1 reply; 47+ messages in thread
From: Gerrit Renker @ 2008-04-14 7:39 UTC (permalink / raw)
To: Tomasz Grobelny, Arnaldo Carvalho de Melo, dccp, netdev
[DCCP] [RFC]: Reworked queueing policy patch set
This is a re-worked version of Tomasz's priority-queueing patch set for
DCCP. The essential internals of his patch set have been left untouched,
in particular the prio/simple algorithms are the same as before.
The changes for which I am asking for comments are:
1. Fewer files:
---------------
The patch set originally introduced 4 new files:
50 net/dccp/qpolicy.c
48 net/dccp/qpolicy.h
70 net/dccp/qpolicy_prio.c
36 net/dccp/qpolicy_simple.c
Given that each file has less than 100 lines, it seemed better to put them
all into one file and export the declarations in dccp.h, the result is now:
143 net/dccp/qpolicy.c
With less than 150 lines, there is still room for a few more policies.
2. Let pop() return result
--------------------------
* at least for the forseeable future, the underlying queue is sk_write_queue;
* an skb can not be on two lists at the same time;
* thus skb_unlink will always be part of implementing pop();
* having pop() return the skb is more versatile (compare Assembler pop()).
3. Hide internals of implementation
-----------------------------------
There had been discussion on hiding the internals, the present approach
* uses symbolic identifiers (enum) to refer to policies;
* makes the function table local to qpolicy.c, so that
* dccp_sock only needs to take the policy identifier;
* makes tx_qlen a socket member of dccp_sock (there was a bug in the
implementation -- tx_qlen was shared by all sockets using the same policy);
* default policy as well as inheriting policy to child socket now implicit:
- default policy (ID=0) set due to zeroing socket on allocation,
- inheriting policy due to performing sock_copy() on the policy ID;
4. Clerical changes
-------------------
Some work done here, such as adding documentation, aligning identifiers etc.
If people (Tomasz in particular) are ok with the changes, I would like to add
this to the test tree, otherwise will keep it separate as subtree "qpolicy" in
git://eden-feed.erg.abdn.ac.uk/dccp_exp
during discussions.
Gerrit
---
Documentation/networking/dccp.txt | 11 ++
include/linux/dccp.h | 20 +++++
net/dccp/Makefile | 2
net/dccp/dccp.h | 45 +++++++----
net/dccp/output.c | 6 -
net/dccp/proto.c | 26 ++++++
net/dccp/qpolicy.c | 143 ++++++++++++++++++++++++++++++++++++++
7 files changed, 231 insertions(+), 22 deletions(-)
--- /dev/null
+++ b/net/dccp/qpolicy.c
@@ -0,0 +1,143 @@
+/*
+ * net/dccp/qpolicy.c
+ *
+ * An implementation of the DCCP protocol
+ *
+ * Copyright (c) 2008 Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License v2
+ * as published by the Free Software Foundation.
+ */
+#include "dccp.h"
+
+/*
+ * Simple Dequeueing Policy
+ */
+static void simple_push(struct sock *sk, struct sk_buff *skb,
+ const struct dccp_packet_info *dcp)
+{
+ skb_queue_tail(&sk->sk_write_queue, skb);
+}
+
+static int simple_full(struct sock *sk)
+{
+ return dccp_sk(sk)->dccps_tx_qlen &&
+ sk->sk_write_queue.qlen >= dccp_sk(sk)->dccps_tx_qlen;
+}
+
+static struct sk_buff *simple_top(struct sock *sk)
+{
+ return skb_peek(&sk->sk_write_queue);
+}
+
+/*
+ * Priority-based Dequeueing Policy
+ */
+static struct sk_buff *prio_get_worst_skb(struct sock *sk)
+{
+ struct sk_buff *skb, *worst = NULL;
+ int worstp = -128;
+
+ skb_queue_walk(&sk->sk_write_queue, skb)
+ if (DCCP_SKB_CB(skb)->dccpd_policy.priority >= worstp) {
+ worstp = DCCP_SKB_CB(skb)->dccpd_policy.priority;
+ worst = skb;
+ }
+ return worst;
+}
+
+static void prio_push(struct sock *sk, struct sk_buff *skb,
+ const struct dccp_packet_info *dcp)
+{
+ struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
+
+ dcb->dccpd_policy.priority = 127;
+ if (dcp)
+ memcpy(&dcb->dccpd_policy, dcp, sizeof(*dcp));
+
+ skb_queue_tail(&sk->sk_write_queue, skb);
+
+ if (sk->sk_write_queue.qlen >= dccp_sk(sk)->dccps_tx_qlen) {
+ struct sk_buff *worst = prio_get_worst_skb(sk);
+
+ skb_unlink(worst, &sk->sk_write_queue);
+ kfree_skb(worst);
+ }
+}
+
+/** we can always push a packet into the queue => queue is never full */
+static int prio_full(struct sock *sk)
+{
+ return 0;
+}
+
+static struct sk_buff *prio_get_best_skb(struct sock *sk)
+{
+ struct sk_buff *skb, *best = NULL;
+ int bestp = 127;
+
+ skb_queue_walk(&sk->sk_write_queue, skb)
+ if (DCCP_SKB_CB(skb)->dccpd_policy.priority <= bestp) {
+ bestp = DCCP_SKB_CB(skb)->dccpd_policy.priority;
+ best = skb;
+ }
+ return best;
+}
+
+/**
+ * struct dccp_qpolicy_operations - TX Packet Dequeueing Interface
+ * @push: add a new @skb with possibly a struct dccp_packet_info
+ * @full: indicates that no more packets will be admitted
+ * @top: peeks at whatever the queueing policy defines as top
+ */
+static struct dccp_qpolicy_operations {
+ void (*push) (struct sock *sk, struct sk_buff *skb,
+ const struct dccp_packet_info *dcp);
+ int (*full) (struct sock *sk);
+ struct sk_buff* (*top) (struct sock *sk);
+
+} qpol_table[DCCPQ_POLICY_MAX] = {
+ [DCCPQ_POLICY_SIMPLE] = {
+ .push = simple_push,
+ .full = simple_full,
+ .top = simple_top,
+ },
+ [DCCPQ_POLICY_PRIO] = {
+ .push = prio_push,
+ .full = prio_full,
+ .top = prio_get_best_skb,
+ },
+};
+
+/*
+ * Wrappers for methods provided by policy currently in use
+ */
+void qpolicy_push(struct sock *sk, struct sk_buff *skb, struct msghdr *msg)
+{
+ struct dccp_packet_info *dcp = NULL;
+
+ if (msg->msg_control != NULL && msg->msg_controllen == sizeof(*dcp))
+ dcp = (struct dccp_packet_info *)msg->msg_control;
+
+ qpol_table[dccp_sk(sk)->dccps_qpolicy].push(sk, skb, dcp);
+}
+
+int qpolicy_full(struct sock *sk)
+{
+ return qpol_table[dccp_sk(sk)->dccps_qpolicy].full(sk);
+}
+
+struct sk_buff *qpolicy_top(struct sock *sk)
+{
+ return qpol_table[dccp_sk(sk)->dccps_qpolicy].top(sk);
+}
+
+struct sk_buff *qpolicy_pop(struct sock *sk)
+{
+ struct sk_buff *skb = qpolicy_top(sk);
+
+ if (skb)
+ skb_unlink(skb, &sk->sk_write_queue);
+ return skb;
+}
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -195,6 +195,20 @@ enum dccp_feature_numbers {
DCCPF_MAX_CCID_SPECIFIC = 255,
};
+/* DCCP TX Dequeueing Policy */
+enum dccp_packet_dequeueing_policy {
+ DCCPQ_POLICY_SIMPLE,
+ DCCPQ_POLICY_PRIO,
+ DCCPQ_POLICY_MAX
+};
+/**
+ * struct dccp_packet_info - Per-packet TX dequeueing information
+ * @priority: packet priority, lower value => packet more likely to be sent
+ */
+struct dccp_packet_info {
+ __s8 priority;
+};
+
/* DCCP socket options */
#define DCCP_SOCKOPT_PACKET_SIZE 1 /* XXX deprecated, without effect */
#define DCCP_SOCKOPT_SERVICE 2
@@ -208,6 +222,8 @@ enum dccp_feature_numbers {
#define DCCP_SOCKOPT_CCID 13
#define DCCP_SOCKOPT_TX_CCID 14
#define DCCP_SOCKOPT_RX_CCID 15
+#define DCCP_SOCKOPT_QPOLICY_ID 16
+#define DCCP_SOCKOPT_QPOLICY_TXQLEN 17
#define DCCP_SOCKOPT_CCID_RX_INFO 128
#define DCCP_SOCKOPT_CCID_TX_INFO 192
@@ -459,6 +475,8 @@ struct dccp_ackvec;
* @dccps_hc_rx_ccid - CCID used for the receiver (or receiving half-connection)
* @dccps_hc_tx_ccid - CCID used for the sender (or sending half-connection)
* @dccps_options_received - parsed set of retrieved options
+ * @dccps_qpolicy - TX dequeueing policy, one of %dccp_packet_dequeueing_policy
+ * @dccps_tx_qlen - maximum length of the TX queue
* @dccps_role - role of this sock, one of %dccp_role
* @dccps_hc_rx_insert_options - receiver wants to add options when acking
* @dccps_hc_tx_insert_options - sender wants to add options when sending
@@ -501,6 +519,8 @@ struct dccp_sock {
struct ccid *dccps_hc_rx_ccid;
struct ccid *dccps_hc_tx_ccid;
struct dccp_options_received dccps_options_received;
+ __u8 dccps_qpolicy;
+ __u32 dccps_tx_qlen;
enum dccp_role dccps_role:2;
__u8 dccps_hc_rx_insert_options:1;
__u8 dccps_hc_tx_insert_options:1;
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -187,6 +187,7 @@ int dccp_init_sock(struct sock *sk, cons
dp->dccps_rate_last = jiffies;
dp->dccps_role = DCCP_ROLE_UNDEFINED;
dp->dccps_service = DCCP_SERVICE_CODE_IS_ABSENT;
+ dp->dccps_tx_qlen = sysctl_dccp_tx_qlen;
INIT_LIST_HEAD(&dp->dccps_featneg);
/* control socket doesn't need feat nego */
@@ -540,6 +541,20 @@ static int do_dccp_setsockopt(struct soc
case DCCP_SOCKOPT_RECV_CSCOV:
err = dccp_setsockopt_cscov(sk, val, true);
break;
+ case DCCP_SOCKOPT_QPOLICY_ID:
+ if (sk->sk_state != DCCP_CLOSED)
+ err = -EISCONN;
+ else if (val < 0 || val >= DCCPQ_POLICY_MAX)
+ err = -EINVAL;
+ else
+ dp->dccps_qpolicy = val;
+ break;
+ case DCCP_SOCKOPT_QPOLICY_TXQLEN:
+ if (val < 0)
+ err = -EINVAL;
+ else
+ dp->dccps_tx_qlen = val;
+ break;
default:
err = -ENOPROTOOPT;
break;
@@ -643,6 +658,12 @@ static int do_dccp_getsockopt(struct soc
case DCCP_SOCKOPT_RECV_CSCOV:
val = dp->dccps_pcrlen;
break;
+ case DCCP_SOCKOPT_QPOLICY_ID:
+ val = dp->dccps_qpolicy;
+ break;
+ case DCCP_SOCKOPT_QPOLICY_TXQLEN:
+ val = dp->dccps_tx_qlen;
+ break;
case 128 ... 191:
return ccid_hc_rx_getsockopt(dp->dccps_hc_rx_ccid, sk, optname,
len, (u32 __user *)optval, optlen);
@@ -700,8 +721,7 @@ int dccp_sendmsg(struct kiocb *iocb, str
lock_sock(sk);
- if (sysctl_dccp_tx_qlen &&
- (sk->sk_write_queue.qlen >= sysctl_dccp_tx_qlen)) {
+ if (qpolicy_full(sk)) {
rc = -EAGAIN;
goto out_release;
}
@@ -729,7 +749,7 @@ int dccp_sendmsg(struct kiocb *iocb, str
if (rc != 0)
goto out_discard;
- skb_queue_tail(&sk->sk_write_queue, skb);
+ qpolicy_push(sk, skb, msg);
dccp_write_xmit(sk);
out_release:
release_sock(sk);
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -215,6 +215,18 @@ extern void dccp_reqsk_send_ack(struct s
extern void dccp_send_sync(struct sock *sk, const u64 seq,
const enum dccp_pkt_type pkt_type);
+/*
+ * TX Packet Dequeueing Interface
+ */
+extern void qpolicy_push(struct sock *sk, struct sk_buff *skb,
+ struct msghdr *msg);
+extern int qpolicy_full(struct sock *sk);
+extern struct sk_buff *qpolicy_top(struct sock *sk);
+extern struct sk_buff *qpolicy_pop(struct sock *sk);
+
+/*
+ * TX Packet Output and TX Timers
+ */
extern void dccp_write_xmit(struct sock *sk);
extern void dccp_write_space(struct sock *sk);
extern void dccp_flush_write_queue(struct sock *sk, long *time_budget);
@@ -313,14 +325,16 @@ static inline int dccp_bad_service_code(
/**
* dccp_skb_cb - DCCP per-packet control information
- * @dccpd_type: one of %dccp_pkt_type (or unknown)
- * @dccpd_ccval: CCVal field (5.1), see e.g. RFC 4342, 8.1
- * @dccpd_reset_code: one of %dccp_reset_codes
- * @dccpd_reset_data: Data1..3 fields (depend on @dccpd_reset_code)
- * @dccpd_opt_len: total length of all options (5.8) in the packet
- * @dccpd_seq: sequence number
- * @dccpd_ack_seq: acknowledgment number subheader field value
- * This is used for transmission as well as for reception.
+ * @header: IPv4/6 parameters
+ * @dccpd_type: one of %dccp_pkt_type (or unknown)
+ * @dccpd_ccval: CCVal field (5.1), see e.g. RFC 4342, 8.1
+ * @dccpd_reset_code: one of %dccp_reset_codes
+ * @dccpd_reset_data: Data1..3 fields (depend on @dccpd_reset_code)
+ * @dccpd_opt_len: total length of all options (5.8) in the packet
+ * @dccpd_seq: sequence number
+ * @dccpd_ack_seq: acknowledgment number subheader field value
+ * @dccpd_policy: packet dequeueing policy (TX packets only)
+ * This struct is used for transmission as well as for reception.
*/
struct dccp_skb_cb {
union {
@@ -329,13 +343,14 @@ struct dccp_skb_cb {
struct inet6_skb_parm h6;
#endif
} header;
- __u8 dccpd_type:4;
- __u8 dccpd_ccval:4;
- __u8 dccpd_reset_code,
- dccpd_reset_data[3];
- __u16 dccpd_opt_len;
- __u64 dccpd_seq;
- __u64 dccpd_ack_seq;
+ __u8 dccpd_type:4;
+ __u8 dccpd_ccval:4;
+ __u8 dccpd_reset_code,
+ dccpd_reset_data[3];
+ __u16 dccpd_opt_len;
+ __u64 dccpd_seq;
+ __u64 dccpd_ack_seq;
+ struct dccp_packet_info dccpd_policy;
};
#define DCCP_SKB_CB(__skb) ((struct dccp_skb_cb *)&((__skb)->cb[0]))
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -45,6 +45,17 @@ http://linux-net.osdl.org/index.php/DCCP
Socket options
==============
+DCCP_SOCKOPT_QPOLICY_ID sets the dequeueing policy for outgoing packets. It
+takes an integer number as argument and can only be set before establishing
+a connection (i.e. changes during an established connection are not supported).
+Currently defined IDs are "simple" (DCCPQ_POLICY_SIMPLE) and a priority-based
+variant (DCCPQ_POLICY_PRIO). In the latter case, a struct dccp_packet_info can
+be passed as ancillary data in the msg_control field of sendmsg(2).
+
+DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum TX queue length. This is relevant only for
+those TX dequeueing policies that honour this upper bound (currently only the
+simple/default policy does this). This socket option overrides the default value
+of the sysctl /proc/sys/net/dccp/default/tx_qlen.
DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of
service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -238,7 +238,7 @@ static void dccp_xmit_packet(struct sock
{
int err, len;
struct dccp_sock *dp = dccp_sk(sk);
- struct sk_buff *skb = skb_dequeue(&sk->sk_write_queue);
+ struct sk_buff *skb = qpolicy_pop(sk);
if (unlikely(skb == NULL))
return;
@@ -328,7 +328,7 @@ void dccp_write_xmit(struct sock *sk)
struct dccp_sock *dp = dccp_sk(sk);
struct sk_buff *skb;
- while ((skb = skb_peek(&sk->sk_write_queue))) {
+ while ((skb = qpolicy_top(sk))) {
int rc = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb);
switch (ccid_packet_dequeue_eval(rc)) {
@@ -342,7 +342,7 @@ void dccp_write_xmit(struct sock *sk)
dccp_xmit_packet(sk);
break;
case CCID_PACKET_ERR:
- skb_dequeue(&sk->sk_write_queue);
+ qpolicy_pop(sk);
kfree_skb(skb);
dccp_pr_debug("packet discarded due to err=%d\n", rc);
}
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -1,7 +1,7 @@
obj-$(CONFIG_IP_DCCP) += dccp.o dccp_ipv4.o
dccp-y := ccid.o feat.o input.o minisocks.o options.o \
- output.o proto.o timer.o ackvec.o
+ output.o proto.o timer.o ackvec.o qpolicy.o
dccp_ipv4-y := ipv4.o
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-14 7:39 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Gerrit Renker
@ 2008-04-14 23:45 ` Tomasz Grobelny
2008-04-15 15:14 ` Gerrit Renker
0 siblings, 1 reply; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-14 23:45 UTC (permalink / raw)
To: Gerrit Renker, Arnaldo Carvalho de Melo, dccp, netdev
Dnia Monday 14 of April 2008, Gerrit Renker napisał:
> [DCCP] [RFC]: Reworked queueing policy patch set
>
> This is a re-worked version of Tomasz's priority-queueing patch set for
> DCCP. The essential internals of his patch set have been left untouched,
> in particular the prio/simple algorithms are the same as before.
>
> The changes for which I am asking for comments are:
>
> Given that each file has less than 100 lines, it seemed better to put
> them all into one file and export the declarations in dccp.h, the result is
> now:
>
> With less than 150 lines, there is still room for a few more policies.
>
For now the files are small but I would expect at least qpolicy_prio.c to grow
a little bit in the future. What comes to mind are get/setsockopts (for
example statistics about packets dropped by qpolicy), and per packet expiry
times. I didn't look how big the files are but wanted to have clear
separation between specific policies and common code. But if you think that
this organization is better I have no problem with it.
> 2. Let pop() return result
> --------------------------
> * at least for the forseeable future, the underlying queue is
> sk_write_queue; * an skb can not be on two lists at the same time;
> * thus skb_unlink will always be part of implementing pop();
> * having pop() return the skb is more versatile (compare Assembler
> pop()).
>
That should be fine...
> * makes tx_qlen a socket member of dccp_sock (there was a bug in the
> implementation -- tx_qlen was shared by all sockets using the same
> policy);
Haven't tested it thoroughly but it shouldn't be so. After all qpol_txqlen was
part of struct dccp_qpolicy which was in turn part of dccp_sock.
> +/*
> + * Wrappers for methods provided by policy currently in use
> + */
> +void qpolicy_push(struct sock *sk, struct sk_buff *skb, struct msghdr
> *msg) +{
> + struct dccp_packet_info *dcp = NULL;
> +
> + if (msg->msg_control != NULL && msg->msg_controllen == sizeof(*dcp))
> + dcp = (struct dccp_packet_info *)msg->msg_control;
> +
> + qpol_table[dccp_sk(sk)->dccps_qpolicy].push(sk, skb, dcp);
> +}
> +
What happens if application is compiled with dccp_packet_info containing only
one field (priority) and kernel containing two fields (for example priority
and later added expiry time)? Wouldn't that implementation make extensions to
dccp_packet_info impossible?
> @@ -501,6 +519,8 @@ struct dccp_sock {
> struct ccid *dccps_hc_rx_ccid;
> struct ccid *dccps_hc_tx_ccid;
> struct dccp_options_received dccps_options_received;
> + __u8 dccps_qpolicy;
> + __u32 dccps_tx_qlen;
> enum dccp_role dccps_role:2;
> __u8 dccps_hc_rx_insert_options:1;
> __u8 dccps_hc_tx_insert_options:1;
I know that currently none of the policies has any per-socket data. But if it
had were should it go?
> + case DCCP_SOCKOPT_QPOLICY_ID:
> + if (sk->sk_state != DCCP_CLOSED)
> + err = -EISCONN;
What about DCCP_LISTENING state?
> --- a/Documentation/networking/dccp.txt
> +++ b/Documentation/networking/dccp.txt
> @@ -45,6 +45,17 @@ http://linux-net.osdl.org/index.php/DCCP
>
> +DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum TX queue length. This is
> relevant only for +those TX dequeueing policies that honour this upper
> bound (currently only the +simple/default policy does this). This socket
> option overrides the default value +of the sysctl
> /proc/sys/net/dccp/default/tx_qlen.
>
In my implementation prio policy did honour DCCP_SOCKOPT_QPOLICY_TXQLEN
setting. At least it was meant to do so.
Having said "ok" here and there or nothing in other points I haven't tried
applying this patch nor tested your rework.
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-14 23:45 ` Tomasz Grobelny
@ 2008-04-15 15:14 ` Gerrit Renker
2008-04-15 15:21 ` Gerrit Renker
2008-04-15 19:38 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Tomasz Grobelny
0 siblings, 2 replies; 47+ messages in thread
From: Gerrit Renker @ 2008-04-15 15:14 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: Arnaldo Carvalho de Melo, dccp, netdev
| > * makes tx_qlen a socket member of dccp_sock (there was a bug in the
| > implementation -- tx_qlen was shared by all sockets using the same
| > policy);
| Haven't tested it thoroughly but it shouldn't be so. After all qpol_txqlen was
| part of struct dccp_qpolicy which was in turn part of dccp_sock.
|
I have to apologise here -- this was my oversight. The bug I assumed to be there
does not exist, since the tx_qlen was indeed part of the socket. Sorry.
| > +/*
| > + * Wrappers for methods provided by policy currently in use
| > + */
| > +void qpolicy_push(struct sock *sk, struct sk_buff *skb, struct msghdr
| > *msg) +{
| > + struct dccp_packet_info *dcp = NULL;
| > +
| > + if (msg->msg_control != NULL && msg->msg_controllen == sizeof(*dcp))
| > + dcp = (struct dccp_packet_info *)msg->msg_control;
| > +
| > + qpol_table[dccp_sk(sk)->dccps_qpolicy].push(sk, skb, dcp);
| > +}
| > +
| What happens if application is compiled with dccp_packet_info containing only
| one field (priority) and kernel containing two fields (for example priority
| and later added expiry time)? Wouldn't that implementation make extensions to
| dccp_packet_info impossible?
Had not looked at it from this point of view. So you are suggesting
if (msg->msg_control != NULL && msg->msg_controllen >= sizeof(*dcp))
dcp = (struct dccp_packet_info *)msg->msg_control;
instead? Agreed, point #1 to change in the patch.
|
| > @@ -501,6 +519,8 @@ struct dccp_sock {
| > struct ccid *dccps_hc_rx_ccid;
| > struct ccid *dccps_hc_tx_ccid;
| > struct dccp_options_received dccps_options_received;
| > + __u8 dccps_qpolicy;
| > + __u32 dccps_tx_qlen;
| > enum dccp_role dccps_role:2;
| > __u8 dccps_hc_rx_insert_options:1;
| > __u8 dccps_hc_tx_insert_options:1;
| I know that currently none of the policies has any per-socket data. But if it
| had were should it go?
|
I can't see anything wrong with putting everything into dccp_sock. To do it well,
we could consider inserting documentation such as "this section is only for
queueing policies" (as is done very well for struct tcp_sock).
I see several advantages of this:
* by just storing the u8 of the policy ID, the socket as a whole gets smaller,
* when not using nested structs, the elements can be optimised for minimal space,
* the internals of the function pointer table can be hidden,
* the socket carries the same information as is passed on via socket options,
which simplifies querying.
| > + case DCCP_SOCKOPT_QPOLICY_ID:
| > + if (sk->sk_state != DCCP_CLOSED)
| > + err = -EISCONN;
| What about DCCP_LISTENING state?
|
There is a problem with allowing this, considering for example:
* socket is created, and listen() is called,
* socket enters with default policy,
* now user decides to change policy from default ("simple") to "prio",
* before he can do that, the first packet arrives on the incoming
queue, policy is inherited to child socket,
* so this connection ends up with "simple" policy instead of "prio".
| > +DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum TX queue length. This is
| > relevant only for +those TX dequeueing policies that honour this upper
| > bound (currently only the +simple/default policy does this). This socket
| > option overrides the default value +of the sysctl
| > /proc/sys/net/dccp/default/tx_qlen.
| >
| In my implementation prio policy did honour DCCP_SOCKOPT_QPOLICY_TXQLEN
| setting. At least it was meant to do so.
|
The documentation certainly needs more work, I just sketched something
but it is not very good. So this is point #2 which needs to change.
Ideally, it would have a little description of what each policy does and
for which use case it is best suited.
Please take your time and check through the other parts of the patch as
well. I would like to accumulate the issues and then address each of the
points. And also, do some testing.I haven't had time to write test code for
prio policy -- is there something in your SVN repository?
Gerrit
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-15 15:14 ` Gerrit Renker
@ 2008-04-15 15:21 ` Gerrit Renker
2008-04-15 18:01 ` Tomasz Grobelny
2008-04-15 19:38 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Tomasz Grobelny
1 sibling, 1 reply; 47+ messages in thread
From: Gerrit Renker @ 2008-04-15 15:21 UTC (permalink / raw)
To: Tomasz Grobelny, Arnaldo Carvalho de Melo, dccp, netdev
| | What happens if application is compiled with dccp_packet_info containing only
| | one field (priority) and kernel containing two fields (for example priority
| | and later added expiry time)? Wouldn't that implementation make extensions to
| | dccp_packet_info impossible?
| Had not looked at it from this point of view. So you are suggesting
|
| if (msg->msg_control != NULL && msg->msg_controllen >= sizeof(*dcp))
| dcp = (struct dccp_packet_info *)msg->msg_control;
|
| instead? Agreed, point #1 to change in the patch.
|
No, the above is nonsense. If the application uses an earlier version of the API, then
it needs to be recompiled, there is no way it could pretend to have an adequate size.
So my take is that still the '==' is correct. But maybe this is a known problem?
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-15 15:21 ` Gerrit Renker
@ 2008-04-15 18:01 ` Tomasz Grobelny
2008-04-16 6:20 ` Gerrit Renker
0 siblings, 1 reply; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-15 18:01 UTC (permalink / raw)
To: Gerrit Renker, Arnaldo Carvalho de Melo, dccp, netdev
Dnia Tuesday 15 of April 2008, Gerrit Renker napisał:
> | | What happens if application is compiled with dccp_packet_info
> | | containing only one field (priority) and kernel containing two fields
> | | (for example priority and later added expiry time)? Wouldn't that
> | | implementation make extensions to dccp_packet_info impossible?
> |
> | Had not looked at it from this point of view. So you are suggesting
> |
> | if (msg->msg_control != NULL && msg->msg_controllen >= sizeof(*dcp))
> | dcp = (struct dccp_packet_info *)msg->msg_control;
> |
> | instead? Agreed, point #1 to change in the patch.
>
> No, the above is nonsense. If the application uses an earlier version of
> the API, then it needs to be recompiled, there is no way it could pretend
> to have an adequate size.
>
That would effectively stop development of any policy. And it's not about
pretending the structure has different size, it's just about using only the
data that is provided by the application (even though it may be incomplete).
> So my take is that still the '==' is correct. But maybe this is a known
> problem?
>
I can't see a problem with copying smaller struct to bigger one. It would work
like that:
1. set default values for in-kernel structure,
2. copy min(msg_control, sizeof(struct dccp_packet_info)) bytes from userspace
structure to the kernel one.
In that sense the code above is in fact wrong but the idea alone should be ok.
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-15 15:14 ` Gerrit Renker
2008-04-15 15:21 ` Gerrit Renker
@ 2008-04-15 19:38 ` Tomasz Grobelny
2008-04-15 20:04 ` Arnaldo Carvalho de Melo
` (2 more replies)
1 sibling, 3 replies; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-15 19:38 UTC (permalink / raw)
To: Gerrit Renker, Arnaldo Carvalho de Melo, dccp, netdev
Dnia Tuesday 15 of April 2008, Gerrit Renker napisał:
> | > @@ -501,6 +519,8 @@ struct dccp_sock {
> | > struct ccid *dccps_hc_rx_ccid;
> | > struct ccid *dccps_hc_tx_ccid;
> | > struct dccp_options_received dccps_options_received;
> | > + __u8 dccps_qpolicy;
> | > + __u32 dccps_tx_qlen;
> | > enum dccp_role dccps_role:2;
> | > __u8 dccps_hc_rx_insert_options:1;
> | > __u8 dccps_hc_tx_insert_options:1;
> |
> | I know that currently none of the policies has any per-socket data. But
> | if it had were should it go?
>
> I can't see anything wrong with putting everything into dccp_sock. To do it
> well, we could consider inserting documentation such as "this section is
> only for queueing policies" (as is done very well for struct tcp_sock).
>
Let me remind you your comment made on 18/03/2008 on dccp ml to my first
patch:
--- START ---
@@ -545,6 +549,8 @@ struct dccp_sock {
__u8 dccps_hc_tx_insert_options:1;
__u8 dccps_server_timewait:1;
struct timer_list dccps_xmit_timer;
+ struct queuing_policy *dccps_policy;
+ void *dccps_policy_data;
};
I think this should be just one field for the policy, and the
policy_data can be an internal field of `struct queueing_policy'
(compare with struct ackvec or struct ccid).
--- END ---
> I see several advantages of this:
> * by just storing the u8 of the policy ID, the socket as a whole gets
> smaller,
> * the internals of the function pointer table can be hidden,
>
That's ok with me.
> * when not using nested structs, the elements can be optimised for
> minimal space,
I have nothing against it. In fact that's what I did in the first try.
> | > + case DCCP_SOCKOPT_QPOLICY_ID:
> | > + if (sk->sk_state != DCCP_CLOSED)
> | > + err = -EISCONN;
> |
> | What about DCCP_LISTENING state?
>
> There is a problem with allowing this, considering for example:
> (...)
Ok, you are right. If before DCCP_LISTENING there is always DCCP_CLOSED then
there is no need to allow changing qpolicy in DCCP_LISTENING state.
> Please take your time and check through the other parts of the patch as
> well. I would like to accumulate the issues and then address each of the
If I didn't mention some parts of the patch you can assume that I'm ok with
them.
> points. And also, do some testing.I haven't had time to write test code for
> prio policy -- is there something in your SVN repository?
>
Have a look at http://dccp.one.pl/svn/userspace/test/ And I'll also do some
testing of your patch.
One more question (probably more for Arnaldo): is there any timeframe to push
changes from test tree upstream?
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-15 19:38 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Tomasz Grobelny
@ 2008-04-15 20:04 ` Arnaldo Carvalho de Melo
2008-04-17 20:20 ` Tomasz Grobelny
2008-04-15 20:14 ` inconsistent lock state with kernel 2.6.24.4 Bernard Pidoux
2008-04-16 7:43 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Gerrit Renker
2 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2008-04-15 20:04 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: Gerrit Renker, Arnaldo Carvalho de Melo, dccp, netdev
Em Tue, Apr 15, 2008 at 09:38:40PM +0200, Tomasz Grobelny escreveu:
> Dnia Tuesday 15 of April 2008, Gerrit Renker napisał:
> > | > @@ -501,6 +519,8 @@ struct dccp_sock {
> > | > struct ccid *dccps_hc_rx_ccid;
> > | > struct ccid *dccps_hc_tx_ccid;
> > | > struct dccp_options_received dccps_options_received;
> > | > + __u8 dccps_qpolicy;
> > | > + __u32 dccps_tx_qlen;
> > | > enum dccp_role dccps_role:2;
> > | > __u8 dccps_hc_rx_insert_options:1;
> > | > __u8 dccps_hc_tx_insert_options:1;
> > |
> > | I know that currently none of the policies has any per-socket data. But
> > | if it had were should it go?
> >
> > I can't see anything wrong with putting everything into dccp_sock. To do it
> > well, we could consider inserting documentation such as "this section is
> > only for queueing policies" (as is done very well for struct tcp_sock).
> >
> Let me remind you your comment made on 18/03/2008 on dccp ml to my first
> patch:
> --- START ---
> @@ -545,6 +549,8 @@ struct dccp_sock {
> __u8 dccps_hc_tx_insert_options:1;
> __u8 dccps_server_timewait:1;
> struct timer_list dccps_xmit_timer;
> + struct queuing_policy *dccps_policy;
> + void *dccps_policy_data;
> };
>
> I think this should be just one field for the policy, and the
> policy_data can be an internal field of `struct queueing_policy'
> (compare with struct ackvec or struct ccid).
> --- END ---
>
> > I see several advantages of this:
> > * by just storing the u8 of the policy ID, the socket as a whole gets
> > smaller,
> > * the internals of the function pointer table can be hidden,
> >
> That's ok with me.
>
> > * when not using nested structs, the elements can be optimised for
> > minimal space,
> I have nothing against it. In fact that's what I did in the first try.
>
> > | > + case DCCP_SOCKOPT_QPOLICY_ID:
> > | > + if (sk->sk_state != DCCP_CLOSED)
> > | > + err = -EISCONN;
> > |
> > | What about DCCP_LISTENING state?
> >
> > There is a problem with allowing this, considering for example:
> > (...)
> Ok, you are right. If before DCCP_LISTENING there is always DCCP_CLOSED then
> there is no need to allow changing qpolicy in DCCP_LISTENING state.
>
> > Please take your time and check through the other parts of the patch as
> > well. I would like to accumulate the issues and then address each of the
> If I didn't mention some parts of the patch you can assume that I'm ok with
> them.
>
> > points. And also, do some testing.I haven't had time to write test code for
> > prio policy -- is there something in your SVN repository?
> >
> Have a look at http://dccp.one.pl/svn/userspace/test/ And I'll also do some
> testing of your patch.
>
> One more question (probably more for Arnaldo): is there any timeframe to push
> changes from test tree upstream?
Not at this moment, I'm not managing to dedicate time for another round
of reviewing of what is in Gerrit's tree :-(
- Arnaldo
^ permalink raw reply [flat|nested] 47+ messages in thread
* inconsistent lock state with kernel 2.6.24.4
2008-04-15 19:38 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Tomasz Grobelny
2008-04-15 20:04 ` Arnaldo Carvalho de Melo
@ 2008-04-15 20:14 ` Bernard Pidoux
2008-04-16 7:43 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Gerrit Renker
2 siblings, 0 replies; 47+ messages in thread
From: Bernard Pidoux @ 2008-04-15 20:14 UTC (permalink / raw)
To: Linux Netdev List
Hi,
Here is an inconsistent lock state detected with 2.6.24.4 kernel.
I am not sure if it should be reported here or to kernel list ?
=================================
[ INFO: inconsistent lock state ]
2.6.24.4 #7
---------------------------------
inconsistent {in-softirq-W} -> {softirq-on-W} usage.
listen/3831 [HC0[0]:SC0[0]:HE1:SE1] takes:
(_xmit_ETHER){-+..}, at: [<c0262cf6>] netpoll_send_skb+0x116/0x170
{in-softirq-W} state was registered at:
[<c0139992>] __lock_acquire+0x402/0x11d0
[<c013a7e3>] lock_acquire+0x83/0xa0
[<c02badc3>] _spin_lock+0x33/0x60
[<c0265988>] __qdisc_run+0x128/0x1a0
[<c0257c6c>] dev_queue_xmit+0x25c/0x330
[<c8c24a5f>] mld_sendpack+0x2cf/0x2f0 [ipv6]
[<c8c25beb>] mld_ifc_timer_expire+0x17b/0x260 [ipv6]
[<c0120f69>] run_timer_softirq+0x149/0x1b0
[<c011d5c5>] __do_softirq+0x55/0xc0
[<c011d677>] do_softirq+0x47/0x50
[<c011dadc>] irq_exit+0x6c/0x80
[<c010663f>] do_IRQ+0x4f/0xa0
[<c0104bc2>] common_interrupt+0x2e/0x34
[<ffffffff>] 0xffffffff
irq event stamp: 6133329
hardirqs last enabled at (6133329): [<c02bb3e7>]
_spin_unlock_irqrestore+0x47/0
x60
hardirqs last disabled at (6133328): [<c02bb269>]
_spin_lock_irqsave+0x19/0x70
softirqs last enabled at (6133322): [<c011d613>] __do_softirq+0xa3/0xc0
softirqs last disabled at (6133291): [<c011d677>] do_softirq+0x47/0x50
other info that might help us debug this:
2 locks held by listen/3831:
#0: (&tty->atomic_write_lock){--..}, at: [<c0208b5c>]
tty_write_lock+0x1c/0x50
#1: (target_list_lock){--..}, at: [<c8abe10d>] write_msg+0x2d/0xf0
[netconsole
]
stack backtrace:
Pid: 3831, comm: listen Not tainted 2.6.24.4 #7
[<c010513a>] show_trace_log_lvl+0x1a/0x30
[<c0105b52>] show_trace+0x12/0x20
[<c01064ac>] dump_stack+0x6c/0x80
[<c01379c6>] print_usage_bug+0x156/0x160
[<c0138ceb>] mark_lock+0x49b/0x630
[<c01399d8>] __lock_acquire+0x448/0x11d0
[<c013a7e3>] lock_acquire+0x83/0xa0
[<c02bab1a>] __lock_text_start+0x3a/0x50
[<c0262cf6>] netpoll_send_skb+0x116/0x170
[<c0263a7c>] netpoll_send_udp+0x20c/0x280
[<c8abe18c>] write_msg+0xac/0xf0 [netconsole]
[<c01187f7>] __call_console_drivers+0x47/0x60
[<c0118889>] _call_console_drivers+0x79/0x90
[<c0118c27>] release_console_sem+0xd7/0x1f0
[<c02173c1>] do_con_write+0x161/0x1a20
[<c0218cc4>] con_write+0x14/0x30
[<c020b5c3>] write_chan+0x183/0x330
[<c0208d26>] tty_write+0x126/0x1b0
[<c017298f>] vfs_write+0xaf/0x120
[<c0172edd>] sys_write+0x3d/0x70
[<c010412a>] sysenter_past_esp+0x5f/0xa5
=======================
Clocksource tsc unstable (delta = 88012255 ns)
Time: pit clocksource has been installed.
Bernard Pidoux
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-15 18:01 ` Tomasz Grobelny
@ 2008-04-16 6:20 ` Gerrit Renker
2008-04-16 8:36 ` [DCCP] [RFC] [Patchv3 " Gerrit Renker
2008-04-17 20:03 ` [DCCP] [RFC] [Patchv2 " Tomasz Grobelny
0 siblings, 2 replies; 47+ messages in thread
From: Gerrit Renker @ 2008-04-16 6:20 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: Arnaldo Carvalho de Melo, dccp, netdev
| > | | What happens if application is compiled with dccp_packet_info
| > | | containing only one field (priority) and kernel containing two fields
| > | | (for example priority and later added expiry time)? Wouldn't that
| > | | implementation make extensions to dccp_packet_info impossible?
| > |
| > | Had not looked at it from this point of view. So you are suggesting
| > |
| > | if (msg->msg_control != NULL && msg->msg_controllen >= sizeof(*dcp))
| > | dcp = (struct dccp_packet_info *)msg->msg_control;
| > |
| > | instead? Agreed, point #1 to change in the patch.
| >
| > No, the above is nonsense. If the application uses an earlier version of
| > the API, then it needs to be recompiled, there is no way it could pretend
| > to have an adequate size.
| >
| That would effectively stop development of any policy. And it's not about
| pretending the structure has different size, it's just about using only the
| data that is provided by the application (even though it may be incomplete).
|
| > So my take is that still the '==' is correct. But maybe this is a known
| > problem?
| >
| I can't see a problem with copying smaller struct to bigger one. It would work
| like that:
| 1. set default values for in-kernel structure,
| 2. copy min(msg_control, sizeof(struct dccp_packet_info)) bytes from userspace
| structure to the kernel one.
|
| In that sense the code above is in fact wrong but the idea alone should be ok.
| --
There is a more serious problem here: apparently no one tried to compile
the `qpolicy' subtree since the changes on Monday. It fails with the
BUILD_BUG_ON.
I tried yesterday evening and found that there is not even a possibility
of adding a single field to struct dccp_skb_cb: with the addition of the
inet{,6}_skb_parm, the struct has reached its maximum size of 48 bytes
(try printk("%u", sizeof(struct dccp_skb_cb));).
There is a solution to this, will post a revised patch shortly.
With regard to your point above: I think there is a way of keeping the
framework open without bending over backwards to ensure the binary
compatibility.
My question for you in this regard: which policy can you think of whose
priorities/preferences can not be mapped into an unsigned 32-bit
integer? I think that such a number is sufficiently expressive enough
to cater for a wide range of policies. It can be interpreted as
* symbolic value (enum)
* ascii string (similar to DCCP service code)
* millisecond timeout (with room for up to 7 weeks)
* microsecond timeout (with room for up to 1.19 hours)
* priority values (the large range 0..2^32-1 can be partitioned)
* ... ?
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-15 19:38 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Tomasz Grobelny
2008-04-15 20:04 ` Arnaldo Carvalho de Melo
2008-04-15 20:14 ` inconsistent lock state with kernel 2.6.24.4 Bernard Pidoux
@ 2008-04-16 7:43 ` Gerrit Renker
2008-04-17 18:03 ` Tomasz Grobelny
2 siblings, 1 reply; 47+ messages in thread
From: Gerrit Renker @ 2008-04-16 7:43 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: Arnaldo Carvalho de Melo, dccp, netdev
Quoting Tomasz Grobelny:
| Dnia Tuesday 15 of April 2008, Gerrit Renker napisa?:
| > | > @@ -501,6 +519,8 @@ struct dccp_sock {
| > | > struct ccid *dccps_hc_rx_ccid;
| > | > struct ccid *dccps_hc_tx_ccid;
| > | > struct dccp_options_received dccps_options_received;
| > | > + __u8 dccps_qpolicy;
| > | > + __u32 dccps_tx_qlen;
| > | > enum dccp_role dccps_role:2;
| > | > __u8 dccps_hc_rx_insert_options:1;
| > | > __u8 dccps_hc_tx_insert_options:1;
| > |
| > | I know that currently none of the policies has any per-socket data. But
| > | if it had were should it go?
| >
| > I can't see anything wrong with putting everything into dccp_sock. To do it
| > well, we could consider inserting documentation such as "this section is
| > only for queueing policies" (as is done very well for struct tcp_sock).
| >
| Let me remind you your comment made on 18/03/2008 on dccp ml to my first
| patch:
| --- START ---
| @@ -545,6 +549,8 @@ struct dccp_sock {
| __u8 dccps_hc_tx_insert_options:1;
| __u8 dccps_server_timewait:1;
| struct timer_list dccps_xmit_timer;
| + struct queuing_policy *dccps_policy;
| + void *dccps_policy_data;
| };
|
| I think this should be just one field for the policy, and the
| policy_data can be an internal field of `struct queueing_policy'
| (compare with struct ackvec or struct ccid).
| --- END ---
|
Hm, even after reading it again I still find that I don't like void* fields. It
may be a personal thing, but I think using void* as part of a field is bad (and
this was in an even earlier comment).
I think I understand your approach better now, after going through the
patches again.
| Ok, you are right. If before DCCP_LISTENING there is always DCCP_CLOSED then
| there is no need to allow changing qpolicy in DCCP_LISTENING state.
|
Yes, that's right -- the initial state is always DCCP_CLOSED, set in
dccp_init_sock().
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [DCCP] [RFC] [Patchv3 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-16 6:20 ` Gerrit Renker
@ 2008-04-16 8:36 ` Gerrit Renker
2008-04-17 20:03 ` [DCCP] [RFC] [Patchv2 " Tomasz Grobelny
1 sibling, 0 replies; 47+ messages in thread
From: Gerrit Renker @ 2008-04-16 8:36 UTC (permalink / raw)
To: Tomasz Grobelny, Arnaldo Carvalho de Melo, dccp, netdev
This is a reworked version of the patch set, tackling in particular:
With the recent addition of the inet{,6}_skb_parm to struct dccp_skb_cb there
is no more room left for any extensions -- the size of the struct now has
already reached its maximum of 48 bytes.
To support the queueing policy patches developed by Tomasz, this patch therefore
introduces a different concept - to reuse the @priority field of struct sk_buff.
This is possible since all operations are output-only and since skb->priority is
set from sk_priority in net/ipv4/ip_output.c and net/ipv6/ip6_output.c.
sk_priority itself is set from the SO_PRIORITY value in net/core/sock.ci, so that
until the skb is forwarded to the IP layer there is no conflict.
This patch set uses the skb->priority field for DCCP transport-layer queueing
priorities. It does not clear the priority field when passing the skb on to
the IP layer; maybe this should be added.
Other changes:
--------------
1) the "prio" policy previously used the ordering "lowest priority is best",
whereas the SO_PRIORITY setting underlying sk_priority uses "highest
priority is best". To make the socket API a bit more intuitive, this
patch reuses the SO_PRIORITY semantics for the "prio" policy
-- as remarked in the documentation.
2) Consolidated the error checking for push() by
* passing a struct msghdr* instead of its two control member elements,
* setting the skb->priority only if the msg_controllen matches exactly the
sizeof(u32),
* otherwise, the priority is set to 0 (implies lowest possible priority).
3) Made naming scheme consistent (dccp_qpolicy_xxx for externally visible
functions and qpolicy_<policyname>_xxx for internal functions).
4) Added a drop function - I think the simple_push() has become quite good now.
So far compile-tested only.
Gerrit
---
Documentation/networking/dccp.txt | 15 ++++
include/linux/dccp.h | 13 +++
net/dccp/Makefile | 2
net/dccp/dccp.h | 13 +++
net/dccp/output.c | 7 -
net/dccp/proto.c | 26 ++++++-
net/dccp/qpolicy.c | 135 ++++++++++++++++++++++++++++++++++++++
7 files changed, 203 insertions(+), 8 deletions(-)
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -45,6 +45,21 @@ http://linux-net.osdl.org/index.php/DCCP
Socket options
==============
+DCCP_SOCKOPT_QPOLICY_ID sets the dequeuing policy for outgoing packets. It takes
+a policy ID as argument and can only be set before the connection (i.e. changes
+during an established connection are not supported). Currently, two policies are
+defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing special,
+and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows to pass an
+u32 priority value as ancillary data to sendmsg(), where higher numbers indicate
+a higher packet priority (similar to SO_PRIORITY).
+
+DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum length of the output queue. A zero
+value is always interpreted as unbounded queue length. If different from zero,
+the interpretation of this parameter depends on the current dequeuing policy
+(see above): the "simple" policy will enforce a fixed queue size by returning
+EAGAIN, whereas the "prio" policy enforces a fixed queue length by dropping the
+lowest-priority packet first. The default value for this parameter is
+initialised from /proc/sys/net/dccp/default/tx_qlen.
DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of
service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -195,6 +195,13 @@ enum dccp_feature_numbers {
DCCPF_MAX_CCID_SPECIFIC = 255,
};
+/* DCCP priorities for outgoing/queued packets */
+enum dccp_packet_dequeueing_policy {
+ DCCPQ_POLICY_SIMPLE,
+ DCCPQ_POLICY_PRIO,
+ DCCPQ_POLICY_MAX
+};
+
/* DCCP socket options */
#define DCCP_SOCKOPT_PACKET_SIZE 1 /* XXX deprecated, without effect */
#define DCCP_SOCKOPT_SERVICE 2
@@ -208,6 +215,8 @@ enum dccp_feature_numbers {
#define DCCP_SOCKOPT_CCID 13
#define DCCP_SOCKOPT_TX_CCID 14
#define DCCP_SOCKOPT_RX_CCID 15
+#define DCCP_SOCKOPT_QPOLICY_ID 16
+#define DCCP_SOCKOPT_QPOLICY_TXQLEN 17
#define DCCP_SOCKOPT_CCID_RX_INFO 128
#define DCCP_SOCKOPT_CCID_TX_INFO 192
@@ -459,6 +468,8 @@ struct dccp_ackvec;
* @dccps_hc_rx_ccid - CCID used for the receiver (or receiving half-connection)
* @dccps_hc_tx_ccid - CCID used for the sender (or sending half-connection)
* @dccps_options_received - parsed set of retrieved options
+ * @dccps_qpolicy - TX dequeueing policy, one of %dccp_packet_dequeueing_policy
+ * @dccps_tx_qlen - maximum length of the TX queue
* @dccps_role - role of this sock, one of %dccp_role
* @dccps_hc_rx_insert_options - receiver wants to add options when acking
* @dccps_hc_tx_insert_options - sender wants to add options when sending
@@ -501,6 +512,8 @@ struct dccp_sock {
struct ccid *dccps_hc_rx_ccid;
struct ccid *dccps_hc_tx_ccid;
struct dccp_options_received dccps_options_received;
+ __u8 dccps_qpolicy;
+ __u32 dccps_tx_qlen;
enum dccp_role dccps_role:2;
__u8 dccps_hc_rx_insert_options:1;
__u8 dccps_hc_tx_insert_options:1;
--- /dev/null
+++ b/net/dccp/qpolicy.c
@@ -0,0 +1,135 @@
+/*
+ * net/dccp/qpolicy.c
+ *
+ * An implementation of the DCCP protocol
+ *
+ * Copyright (c) 2008 Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License v2
+ * as published by the Free Software Foundation.
+ */
+#include <asm/unaligned.h>
+#include "dccp.h"
+
+/*
+ * Simple Dequeueing Policy:
+ * If tx_qlen is different from 0, enqueue up to tx_qlen elements.
+ */
+static void qpolicy_simple_push(struct sock *sk, struct sk_buff *skb)
+{
+ skb_queue_tail(&sk->sk_write_queue, skb);
+}
+
+static bool qpolicy_simple_full(struct sock *sk)
+{
+ return dccp_sk(sk)->dccps_tx_qlen &&
+ sk->sk_write_queue.qlen >= dccp_sk(sk)->dccps_tx_qlen;
+}
+
+static struct sk_buff *qpolicy_simple_top(struct sock *sk)
+{
+ return skb_peek(&sk->sk_write_queue);
+}
+
+/*
+ * Priority-based Dequeueing Policy:
+ * If tx_qlen is different from 0 and the queue has reached its upper bound
+ * of tx_qlen elements, replace older packets lowest-priority-first.
+ */
+static struct sk_buff *qpolicy_prio_best_skb(struct sock *sk)
+{
+ struct sk_buff *skb, *best = NULL;
+
+ skb_queue_walk(&sk->sk_write_queue, skb)
+ if (best == NULL || skb->priority > best->priority)
+ best = skb;
+ return best;
+}
+
+static struct sk_buff *qpolicy_prio_worst_skb(struct sock *sk)
+{
+ struct sk_buff *skb, *worst = NULL;
+
+ skb_queue_walk(&sk->sk_write_queue, skb)
+ if (worst == NULL || skb->priority < worst->priority)
+ worst = skb;
+ return worst;
+}
+
+static void qpolicy_prio_push(struct sock *sk, struct sk_buff *skb)
+{
+ qpolicy_simple_push(sk, skb);
+ if (qpolicy_simple_full(sk))
+ dccp_qpolicy_drop(sk, qpolicy_prio_worst_skb(sk));
+}
+
+/* we can always push a packet into the queue => queue is never full */
+static bool qpolicy_prio_full(struct sock *sk)
+{
+ return false;
+}
+
+/**
+ * struct dccp_qpolicy_operations - TX Packet Dequeueing Interface
+ * @push: add a new @skb with possibly a struct dccp_packet_info
+ * @full: indicates that no more packets will be admitted
+ * @top: peeks at whatever the queueing policy defines as top
+ */
+static struct dccp_qpolicy_operations {
+ void (*push) (struct sock *sk, struct sk_buff *skb);
+ bool (*full) (struct sock *sk);
+ struct sk_buff* (*top) (struct sock *sk);
+
+} qpol_table[DCCPQ_POLICY_MAX] = {
+ [DCCPQ_POLICY_SIMPLE] = {
+ .push = qpolicy_simple_push,
+ .full = qpolicy_simple_full,
+ .top = qpolicy_simple_top,
+ },
+ [DCCPQ_POLICY_PRIO] = {
+ .push = qpolicy_prio_push,
+ .full = qpolicy_prio_full,
+ .top = qpolicy_prio_best_skb,
+ },
+};
+
+/*
+ * Externally visible interface
+ */
+void dccp_qpolicy_push(struct sock *sk, struct sk_buff *skb, struct msghdr *msg)
+{
+ if (msg->msg_control == NULL || msg->msg_controllen != sizeof(__u32))
+ skb->priority = 0; /* implies lowest-possible priority */
+ else
+ skb->priority = get_unaligned((__u32 *)msg->msg_control);
+
+ qpol_table[dccp_sk(sk)->dccps_qpolicy].push(sk, skb);
+}
+
+bool dccp_qpolicy_full(struct sock *sk)
+{
+ return qpol_table[dccp_sk(sk)->dccps_qpolicy].full(sk);
+}
+
+void dccp_qpolicy_drop(struct sock *sk, struct sk_buff *skb)
+{
+ if (skb != NULL) {
+ skb_unlink(skb, &sk->sk_write_queue);
+ kfree(skb);
+ }
+}
+
+struct sk_buff *dccp_qpolicy_top(struct sock *sk)
+{
+ return qpol_table[dccp_sk(sk)->dccps_qpolicy].top(sk);
+}
+
+struct sk_buff *dccp_qpolicy_pop(struct sock *sk)
+{
+ struct sk_buff *skb = dccp_qpolicy_top(sk);
+
+ if (skb)
+ skb_unlink(skb, &sk->sk_write_queue);
+ return skb;
+}
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -215,6 +215,19 @@ extern void dccp_reqsk_send_ack(struct s
extern void dccp_send_sync(struct sock *sk, const u64 seq,
const enum dccp_pkt_type pkt_type);
+/*
+ * TX Packet Dequeueing Interface
+ */
+extern void dccp_qpolicy_push(struct sock *sk, struct sk_buff *skb,
+ struct msghdr *msg);
+extern bool dccp_qpolicy_full(struct sock *sk);
+extern void dccp_qpolicy_drop(struct sock *sk, struct sk_buff *skb);
+extern struct sk_buff *dccp_qpolicy_top(struct sock *sk);
+extern struct sk_buff *dccp_qpolicy_pop(struct sock *sk);
+
+/*
+ * TX Packet Output and TX Timers
+ */
extern void dccp_write_xmit(struct sock *sk);
extern void dccp_write_space(struct sock *sk);
extern void dccp_flush_write_queue(struct sock *sk, long *time_budget);
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -1,7 +1,7 @@
obj-$(CONFIG_IP_DCCP) += dccp.o dccp_ipv4.o
dccp-y := ccid.o feat.o input.o minisocks.o options.o \
- output.o proto.o timer.o ackvec.o
+ qpolicy.o output.o proto.o timer.o ackvec.o
dccp_ipv4-y := ipv4.o
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -238,7 +238,7 @@ static void dccp_xmit_packet(struct sock
{
int err, len;
struct dccp_sock *dp = dccp_sk(sk);
- struct sk_buff *skb = skb_dequeue(&sk->sk_write_queue);
+ struct sk_buff *skb = dccp_qpolicy_pop(sk);
if (unlikely(skb == NULL))
return;
@@ -328,7 +328,7 @@ void dccp_write_xmit(struct sock *sk)
struct dccp_sock *dp = dccp_sk(sk);
struct sk_buff *skb;
- while ((skb = skb_peek(&sk->sk_write_queue))) {
+ while ((skb = dccp_qpolicy_top(sk))) {
int rc = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb);
switch (ccid_packet_dequeue_eval(rc)) {
@@ -342,8 +342,7 @@ void dccp_write_xmit(struct sock *sk)
dccp_xmit_packet(sk);
break;
case CCID_PACKET_ERR:
- skb_dequeue(&sk->sk_write_queue);
- kfree_skb(skb);
+ dccp_qpolicy_drop(sk, skb);
dccp_pr_debug("packet discarded due to err=%d\n", rc);
}
}
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -187,6 +187,7 @@ int dccp_init_sock(struct sock *sk, cons
dp->dccps_rate_last = jiffies;
dp->dccps_role = DCCP_ROLE_UNDEFINED;
dp->dccps_service = DCCP_SERVICE_CODE_IS_ABSENT;
+ dp->dccps_tx_qlen = sysctl_dccp_tx_qlen;
INIT_LIST_HEAD(&dp->dccps_featneg);
/* control socket doesn't need feat nego */
@@ -540,6 +541,20 @@ static int do_dccp_setsockopt(struct soc
case DCCP_SOCKOPT_RECV_CSCOV:
err = dccp_setsockopt_cscov(sk, val, true);
break;
+ case DCCP_SOCKOPT_QPOLICY_ID:
+ if (sk->sk_state != DCCP_CLOSED)
+ err = -EISCONN;
+ else if (val < 0 || val >= DCCPQ_POLICY_MAX)
+ err = -EINVAL;
+ else
+ dp->dccps_qpolicy = val;
+ break;
+ case DCCP_SOCKOPT_QPOLICY_TXQLEN:
+ if (val < 0)
+ err = -EINVAL;
+ else
+ dp->dccps_tx_qlen = val;
+ break;
default:
err = -ENOPROTOOPT;
break;
@@ -643,6 +658,12 @@ static int do_dccp_getsockopt(struct soc
case DCCP_SOCKOPT_RECV_CSCOV:
val = dp->dccps_pcrlen;
break;
+ case DCCP_SOCKOPT_QPOLICY_ID:
+ val = dp->dccps_qpolicy;
+ break;
+ case DCCP_SOCKOPT_QPOLICY_TXQLEN:
+ val = dp->dccps_tx_qlen;
+ break;
case 128 ... 191:
return ccid_hc_rx_getsockopt(dp->dccps_hc_rx_ccid, sk, optname,
len, (u32 __user *)optval, optlen);
@@ -700,8 +721,7 @@ int dccp_sendmsg(struct kiocb *iocb, str
lock_sock(sk);
- if (sysctl_dccp_tx_qlen &&
- (sk->sk_write_queue.qlen >= sysctl_dccp_tx_qlen)) {
+ if (dccp_qpolicy_full(sk)) {
rc = -EAGAIN;
goto out_release;
}
@@ -729,7 +749,7 @@ int dccp_sendmsg(struct kiocb *iocb, str
if (rc != 0)
goto out_discard;
- skb_queue_tail(&sk->sk_write_queue, skb);
+ dccp_qpolicy_push(sk, skb, msg);
dccp_write_xmit(sk);
out_release:
release_sock(sk);
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-16 7:43 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Gerrit Renker
@ 2008-04-17 18:03 ` Tomasz Grobelny
2008-04-17 18:29 ` Gerrit Renker
0 siblings, 1 reply; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-17 18:03 UTC (permalink / raw)
To: Gerrit Renker, Arnaldo Carvalho de Melo, dccp, netdev
Dnia Wednesday 16 of April 2008, Gerrit Renker napisał:
> Quoting Tomasz Grobelny:
> | Dnia Tuesday 15 of April 2008, Gerrit Renker napisa?:
> | > | > @@ -501,6 +519,8 @@ struct dccp_sock {
> | > | > struct ccid *dccps_hc_rx_ccid;
> | > | > struct ccid *dccps_hc_tx_ccid;
> | > | > struct dccp_options_received dccps_options_received;
> | > | > + __u8 dccps_qpolicy;
> | > | > + __u32 dccps_tx_qlen;
> | > | > enum dccp_role dccps_role:2;
> | > | > __u8 dccps_hc_rx_insert_options:1;
> | > | > __u8 dccps_hc_tx_insert_options:1;
> | > |
> | > | I know that currently none of the policies has any per-socket data.
> | > | But if it had were should it go?
> | >
> | > I can't see anything wrong with putting everything into dccp_sock. To
> | > do it well, we could consider inserting documentation such as "this
> | > section is only for queueing policies" (as is done very well for struct
> | > tcp_sock).
> |
> | Let me remind you your comment made on 18/03/2008 on dccp ml to my first
> | patch:
> | --- START ---
> | @@ -545,6 +549,8 @@ struct dccp_sock {
> | __u8 dccps_hc_tx_insert_options:1;
> | __u8 dccps_server_timewait:1;
> | struct timer_list dccps_xmit_timer;
> | + struct queuing_policy *dccps_policy;
> | + void *dccps_policy_data;
> | };
> |
> | I think this should be just one field for the policy, and the
> | policy_data can be an internal field of `struct queueing_policy'
> | (compare with struct ackvec or struct ccid).
> | --- END ---
>
> Hm, even after reading it again I still find that I don't like void*
> fields. It may be a personal thing, but I think using void* as part of a
> field is bad (and this was in an even earlier comment).
>
The next paragraph was in fact about void* pointers. But the paragraph I
quoted above talks only about whether those three values (policy
numer/pointer, tx_qlen and possibly other data) should be put directly in
struct dccp_sock or grouped in stuct queueing_policy which in turn should be
one field in struct dccp_sock. In the mail from 18/03/2008 you seemed to be
in favour of grouping, in the one from 15/04/2008 you seemed to contradict
your earlier statement. At least that's how I understood it.
But never mind, both ways are ok for me.
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-17 18:03 ` Tomasz Grobelny
@ 2008-04-17 18:29 ` Gerrit Renker
0 siblings, 0 replies; 47+ messages in thread
From: Gerrit Renker @ 2008-04-17 18:29 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: Arnaldo Carvalho de Melo, dccp, netdev
| > | @@ -545,6 +549,8 @@ struct dccp_sock {
| > | __u8 dccps_hc_tx_insert_options:1;
| > | __u8 dccps_server_timewait:1;
| > | struct timer_list dccps_xmit_timer;
| > | + struct queuing_policy *dccps_policy;
| > | + void *dccps_policy_data;
| > | };
| > |
| > | I think this should be just one field for the policy, and the
| > | policy_data can be an internal field of `struct queueing_policy'
| > | (compare with struct ackvec or struct ccid).
| > | --- END ---
| >
| > Hm, even after reading it again I still find that I don't like void*
| > fields. It may be a personal thing, but I think using void* as part of a
| > field is bad (and this was in an even earlier comment).
| >
| The next paragraph was in fact about void* pointers. But the paragraph I
| quoted above talks only about whether those three values (policy
| numer/pointer, tx_qlen and possibly other data) should be put directly in
| struct dccp_sock or grouped in stuct queueing_policy which in turn should be
| one field in struct dccp_sock. In the mail from 18/03/2008 you seemed to be
| in favour of grouping, in the one from 15/04/2008 you seemed to contradict
| your earlier statement. At least that's how I understood it.
| But never mind, both ways are ok for me.
| --
Those things only became clearer when looking at the code for a while. I
have reworked some of your userland code and done some tests with it.
There was a bug with the 32-bit compatibility layer, I have posted a
message to netdev. On x86 it was found to work correctly.
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-16 6:20 ` Gerrit Renker
2008-04-16 8:36 ` [DCCP] [RFC] [Patchv3 " Gerrit Renker
@ 2008-04-17 20:03 ` Tomasz Grobelny
2008-04-18 10:13 ` Gerrit Renker
1 sibling, 1 reply; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-17 20:03 UTC (permalink / raw)
To: Gerrit Renker, Arnaldo Carvalho de Melo, dccp, netdev
Dnia Wednesday 16 of April 2008, Gerrit Renker napisał:
> | > | | What happens if application is compiled with dccp_packet_info
> | > | | containing only one field (priority) and kernel containing two
> | > | | fields (for example priority and later added expiry time)? Wouldn't
> | > | | that implementation make extensions to dccp_packet_info impossible?
> | > |
> | > | Had not looked at it from this point of view. So you are suggesting
> | > |
> | > | if (msg->msg_control != NULL && msg->msg_controllen >= sizeof(*dcp))
> | > | dcp = (struct dccp_packet_info *)msg->msg_control;
> | > |
> | > | instead? Agreed, point #1 to change in the patch.
> | >
> | > No, the above is nonsense. If the application uses an earlier version
> | > of the API, then it needs to be recompiled, there is no way it could
> | > pretend to have an adequate size.
> |
> | That would effectively stop development of any policy. And it's not about
> | pretending the structure has different size, it's just about using only
> | the data that is provided by the application (even though it may be
> | incomplete).
> |
> | > So my take is that still the '==' is correct. But maybe this is a known
> | > problem?
> |
> | I can't see a problem with copying smaller struct to bigger one. It would
> | work like that:
> | 1. set default values for in-kernel structure,
> | 2. copy min(msg_control, sizeof(struct dccp_packet_info)) bytes from
> | userspace structure to the kernel one.
> |
> | In that sense the code above is in fact wrong but the idea alone should
> | be ok. --
>
> There is a more serious problem here: apparently no one tried to compile
> the `qpolicy' subtree since the changes on Monday. It fails with the
> BUILD_BUG_ON.
>
> I tried yesterday evening and found that there is not even a possibility
> of adding a single field to struct dccp_skb_cb: with the addition of the
> inet{,6}_skb_parm, the struct has reached its maximum size of 48 bytes
> (try printk("%u", sizeof(struct dccp_skb_cb));).
>
> There is a solution to this, will post a revised patch shortly.
>
What you proposed in the patch should work ok for qpolicies for now. But
sooner or later the need to add a field to struct dccp_skb_cb will arise. So
maybe we should think about it now...
Other possibility apart from what you proposed in the patch is creating struct
dccp_skb_cb_ext, move to it some fields from struct dccp_skb_cb and in struct
add a pointer to this new struct dccp_skb_cb_ext. Ok, maybe it is not the
prettiest idea but in case somebody needs yet another additional field in
struct dccp_skb_cb_ext it would be nice to have an idea how to do it.
> With regard to your point above: I think there is a way of keeping the
> framework open without bending over backwards to ensure the binary
> compatibility.
>
While we are developing in the test tree compatibility is not important at
all. But real world needs compatibility, especially if it's not that
expensive. Otherwise people will get weird behaviour without any messages
indicating what is wrong. And it's certainly not how code should be written.
It can't be that binary package of VLC (or any other streaming server) cannot
use newer kernel version because we added a new field.
> My question for you in this regard: which policy can you think of whose
> priorities/preferences can not be mapped into an unsigned 32-bit
> integer? I think that such a number is sufficiently expressive enough
> to cater for a wide range of policies. It can be interpreted as
> * symbolic value (enum)
> * ascii string (similar to DCCP service code)
> * millisecond timeout (with room for up to 7 weeks)
> * microsecond timeout (with room for up to 1.19 hours)
> * priority values (the large range 0..2^32-1 can be partitioned)
> * ... ?
>
Putting any of those values in an integer is pretty straightforward. But when
you want to put both timeout and priority things get messy. Would it be
possible at least to use structure like that:
struct dccp_packet_info
{
s8 priority;
u16 timeout_mantissa:10;
u16 timeout_exponent:6;
}
? That would still fit in 32-bit integer (so could be stored in
skb->priority), but would provide a much cleaner interface.
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-15 20:04 ` Arnaldo Carvalho de Melo
@ 2008-04-17 20:20 ` Tomasz Grobelny
0 siblings, 0 replies; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-17 20:20 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Gerrit Renker, dccp, netdev
Dnia Tuesday 15 of April 2008, Arnaldo Carvalho de Melo napisał:
> > One more question (probably more for Arnaldo): is there any timeframe to
> > push changes from test tree upstream?
>
> Not at this moment, I'm not managing to dedicate time for another round
> of reviewing of what is in Gerrit's tree :-(
>
That's quite unfortunate because pushing patches that make use of new kernel
features to other projects would be hard if not impossible...
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-17 20:03 ` [DCCP] [RFC] [Patchv2 " Tomasz Grobelny
@ 2008-04-18 10:13 ` Gerrit Renker
2008-04-19 20:42 ` Tomasz Grobelny
0 siblings, 1 reply; 47+ messages in thread
From: Gerrit Renker @ 2008-04-18 10:13 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: Arnaldo Carvalho de Melo, dccp, netdev
| > There is a more serious problem here: apparently no one tried to compile
| > the `qpolicy' subtree since the changes on Monday. It fails with the
| > BUILD_BUG_ON.
| >
| > I tried yesterday evening and found that there is not even a possibility
| > of adding a single field to struct dccp_skb_cb: with the addition of the
| > inet{,6}_skb_parm, the struct has reached its maximum size of 48 bytes
| > (try printk("%u", sizeof(struct dccp_skb_cb));).
| >
| > There is a solution to this, will post a revised patch shortly.
| >
| What you proposed in the patch should work ok for qpolicies for now. But
| sooner or later the need to add a field to struct dccp_skb_cb will arise. So
| maybe we should think about it now...
| Other possibility apart from what you proposed in the patch is creating struct
| dccp_skb_cb_ext, move to it some fields from struct dccp_skb_cb and in struct
| add a pointer to this new struct dccp_skb_cb_ext. Ok, maybe it is not the
| prettiest idea but in case somebody needs yet another additional field in
| struct dccp_skb_cb_ext it would be nice to have an idea how to do it.
|
When the patch failed to compile I thought about those alternatives.
Trying to extend the dccp_skb_cb over and above what is in there will be
messy, since the IPv4/v6 parameters are required by other subsystems.
And the parts that are already in there are needed by DCCP: sequence/Ack
number, reset code and reset data, packet type and CCVal are required
information to build the packet.
| > With regard to your point above: I think there is a way of keeping the
| > framework open without bending over backwards to ensure the binary
| > compatibility.
| >
| While we are developing in the test tree compatibility is not important at
| all. But real world needs compatibility, especially if it's not that
| expensive. Otherwise people will get weird behaviour without any messages
| indicating what is wrong. And it's certainly not how code should be written.
| It can't be that binary package of VLC (or any other streaming server) cannot
| use newer kernel version because we added a new field.
|
True, but we need to get something working first. There is no point
exporting an API which only works half.
In this regard, another problem with the patch has arisen: the type of
passing priority information will not work on 64-bit architectures in
compatibility mode.
Thanks to an answer by David Miller I found this out today. You can
verify this problem by running the patch e.g. on a sparc64 system: no
packets will be sent, sendmsg() returns with EINVAL.
I am working on this and hope to have something workable tomorrow or
otherwise it will be later next week.
| > My question for you in this regard: which policy can you think of whose
| > priorities/preferences can not be mapped into an unsigned 32-bit
| > integer? I think that such a number is sufficiently expressive enough
| > to cater for a wide range of policies. It can be interpreted as
| > * symbolic value (enum)
| > * ascii string (similar to DCCP service code)
| > * millisecond timeout (with room for up to 7 weeks)
| > * microsecond timeout (with room for up to 1.19 hours)
| > * priority values (the large range 0..2^32-1 can be partitioned)
| > * ... ?
| >
| Putting any of those values in an integer is pretty straightforward. But when
| you want to put both timeout and priority things get messy. Would it be
| possible at least to use structure like that:
| struct dccp_packet_info
| {
| s8 priority;
| u16 timeout_mantissa:10;
| u16 timeout_exponent:6;
| }
| ? That would still fit in 32-bit integer (so could be stored in
| skb->priority), but would provide a much cleaner interface.
| --
Yes that is possible. It need not be a struct, the same effect can be
achieved with bit shifts and bitmasks (this is the way the Ack Vectors
are encoded).
Semantically there is no restriction of what the u32 number represents. And
it is a large space. I don't think that a larger size is required.
And it makes the implementation much simpler, by reusing a field which is
already there and which matches the purpose.
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-18 10:13 ` Gerrit Renker
@ 2008-04-19 20:42 ` Tomasz Grobelny
2008-04-20 16:57 ` Arnaldo Carvalho de Melo
2008-04-22 17:30 ` Gerrit Renker
0 siblings, 2 replies; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-19 20:42 UTC (permalink / raw)
To: Gerrit Renker; +Cc: Arnaldo Carvalho de Melo, dccp, netdev
Dnia Friday 18 of April 2008, Gerrit Renker napisał:
> | > There is a more serious problem here: apparently no one tried to
> | > compile the `qpolicy' subtree since the changes on Monday. It fails
> | > with the BUILD_BUG_ON.
> | >
> | > I tried yesterday evening and found that there is not even a
> | > possibility of adding a single field to struct dccp_skb_cb: with the
> | > addition of the inet{,6}_skb_parm, the struct has reached its maximum
> | > size of 48 bytes (try printk("%u", sizeof(struct dccp_skb_cb));).
> | >
> | > There is a solution to this, will post a revised patch shortly.
> |
> | What you proposed in the patch should work ok for qpolicies for now. But
> | sooner or later the need to add a field to struct dccp_skb_cb will arise.
> | So maybe we should think about it now...
> | Other possibility apart from what you proposed in the patch is creating
> | struct dccp_skb_cb_ext, move to it some fields from struct dccp_skb_cb
> | and in struct add a pointer to this new struct dccp_skb_cb_ext. Ok, maybe
> | it is not the prettiest idea but in case somebody needs yet another
> | additional field in struct dccp_skb_cb_ext it would be nice to have an
> | idea how to do it.
>
> When the patch failed to compile I thought about those alternatives.
> Trying to extend the dccp_skb_cb over and above what is in there will be
> messy, since the IPv4/v6 parameters are required by other subsystems.
>
If inet{,6}_skb_parm is used only outside DCCP code then why at all should it
be placed in struct dccp_skb_cb taking up quite a lot of valuable space? Why
not put it directly in struct sk_buff? Especially that it is present in
struct udp_skb_cb, struct tcp_skb_cb as well.
> | > With regard to your point above: I think there is a way of keeping the
> | > framework open without bending over backwards to ensure the binary
> | > compatibility.
> |
> | While we are developing in the test tree compatibility is not important
> | at all. But real world needs compatibility, especially if it's not that
> | expensive. Otherwise people will get weird behaviour without any messages
> | indicating what is wrong. And it's certainly not how code should be
> | written. It can't be that binary package of VLC (or any other streaming
> | server) cannot use newer kernel version because we added a new field.
>
> True, but we need to get something working first. There is no point
> exporting an API which only works half.
>
You can never assume that the API is feature complete. Even if it "works half"
at the beginning (which is I guess quite normal) it should be easy to make
it "work full" without breaking compatibility when it's not necessary.
> In this regard, another problem with the patch has arisen: the type of
> passing priority information will not work on 64-bit architectures in
> compatibility mode.
>
> Thanks to an answer by David Miller I found this out today. You can
> verify this problem by running the patch e.g. on a sparc64 system: no
> packets will be sent, sendmsg() returns with EINVAL.
>
I don't have any 64 bit system at hand...
Does David's suggestion mean that changes are only required in userspace
application or on the kernel side as well?
> | > My question for you in this regard: which policy can you think of whose
> | > priorities/preferences can not be mapped into an unsigned 32-bit
> | > integer? I think that such a number is sufficiently expressive enough
> | > to cater for a wide range of policies. It can be interpreted as
> | > * symbolic value (enum)
> | > * ascii string (similar to DCCP service code)
> | > * millisecond timeout (with room for up to 7 weeks)
> | > * microsecond timeout (with room for up to 1.19 hours)
> | > * priority values (the large range 0..2^32-1 can be partitioned)
> | > * ... ?
> |
> | Putting any of those values in an integer is pretty straightforward. But
> | when you want to put both timeout and priority things get messy. Would it
> | be possible at least to use structure like that:
> | struct dccp_packet_info
> | {
> | s8 priority;
> | u16 timeout_mantissa:10;
> | u16 timeout_exponent:6;
> | }
> | ? That would still fit in 32-bit integer (so could be stored in
> | skb->priority), but would provide a much cleaner interface.
> | --
>
> Yes that is possible. It need not be a struct, the same effect can be
> achieved with bit shifts and bitmasks (this is the way the Ack Vectors
> are encoded).
>
Yes, it can be achieved by bit operations. But to me a structure is a much
cleaner way. And that is especially important when designing interfaces.
Final effects should be the same but when using structure it is the compiler
that takes care of low level bit operations.
> Semantically there is no restriction of what the u32 number represents. And
> it is a large space. I don't think that a larger size is required.
>
Ok, for now 32 bits are enough and I can't think of the need to use more bits.
I wouldn't be so sure about future though. But ok, if it turns up that we
need more space in the future then we will see, there is no need to worry
about it now. We can indeed assume 32 bits for now.
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-19 20:42 ` Tomasz Grobelny
@ 2008-04-20 16:57 ` Arnaldo Carvalho de Melo
2008-04-20 20:12 ` Tomasz Grobelny
2008-04-22 17:30 ` Gerrit Renker
1 sibling, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2008-04-20 16:57 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: Gerrit Renker, Arnaldo Carvalho de Melo, dccp, netdev
Em Sat, Apr 19, 2008 at 10:42:32PM +0200, Tomasz Grobelny escreveu:
> Dnia Friday 18 of April 2008, Gerrit Renker napisał:
> > | > There is a more serious problem here: apparently no one tried to
> > | > compile the `qpolicy' subtree since the changes on Monday. It fails
> > | > with the BUILD_BUG_ON.
> > | >
> > | > I tried yesterday evening and found that there is not even a
> > | > possibility of adding a single field to struct dccp_skb_cb: with the
> > | > addition of the inet{,6}_skb_parm, the struct has reached its maximum
> > | > size of 48 bytes (try printk("%u", sizeof(struct dccp_skb_cb));).
> > | >
> > | > There is a solution to this, will post a revised patch shortly.
> > |
> > | What you proposed in the patch should work ok for qpolicies for now. But
> > | sooner or later the need to add a field to struct dccp_skb_cb will arise.
> > | So maybe we should think about it now...
> > | Other possibility apart from what you proposed in the patch is creating
> > | struct dccp_skb_cb_ext, move to it some fields from struct dccp_skb_cb
> > | and in struct add a pointer to this new struct dccp_skb_cb_ext. Ok, maybe
> > | it is not the prettiest idea but in case somebody needs yet another
> > | additional field in struct dccp_skb_cb_ext it would be nice to have an
> > | idea how to do it.
> >
> > When the patch failed to compile I thought about those alternatives.
> > Trying to extend the dccp_skb_cb over and above what is in there will be
> > messy, since the IPv4/v6 parameters are required by other subsystems.
> >
> If inet{,6}_skb_parm is used only outside DCCP code then why at all should it
> be placed in struct dccp_skb_cb taking up quite a lot of valuable space? Why
> not put it directly in struct sk_buff? Especially that it is present in
> struct udp_skb_cb, struct tcp_skb_cb as well.
Because all this is used in skb->cb[], a scratchpad for protocols to
use, we can go back to what we had before, that is to not reserve use
for inet6?_skb_parm but be sure to zero it before passing it to IP, as
we don't want IP to be confused with things being non zero there. Then
we can use all its space.
- Arnaldo
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-20 16:57 ` Arnaldo Carvalho de Melo
@ 2008-04-20 20:12 ` Tomasz Grobelny
2008-04-21 11:45 ` Patrick McHardy
0 siblings, 1 reply; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-20 20:12 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: Gerrit Renker, dccp, netdev, Patrick McHardy
Dnia Sunday 20 of April 2008, Arnaldo Carvalho de Melo napisał:
> Em Sat, Apr 19, 2008 at 10:42:32PM +0200, Tomasz Grobelny escreveu:
> > > When the patch failed to compile I thought about those alternatives.
> > > Trying to extend the dccp_skb_cb over and above what is in there will
> > > be messy, since the IPv4/v6 parameters are required by other
> > > subsystems.
> >
> > If inet{,6}_skb_parm is used only outside DCCP code then why at all
> > should it be placed in struct dccp_skb_cb taking up quite a lot of
> > valuable space? Why not put it directly in struct sk_buff? Especially
> > that it is present in struct udp_skb_cb, struct tcp_skb_cb as well.
>
> Because all this is used in skb->cb[], a scratchpad for protocols to
> use, we can go back to what we had before, that is to not reserve use
> for inet6?_skb_parm but be sure to zero it before passing it to IP, as
> we don't want IP to be confused with things being non zero there. Then
> we can use all its space.
>
Several questions regarding this case:
1. What about SCTP? It doesn't have inet6?_skb_parm in it's structure that is
stored in skb->cb. So does it contain a potential bug (that is to be fixed)
or is it not needed there or what?
2. If the sole purpose of this change was to keep skb->cb zeroed then it
doesn't seem to me like the right solution. Wasting about 20 bytes instead of
zeroing them when needed I would consider at least weird. I understand that
TCP and UDP may have enough space left but it just turned out that DCCP
doesn't.
3. If it's IP layer that needs zeroes then why not clear skb->cb in IP layer?
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-20 20:12 ` Tomasz Grobelny
@ 2008-04-21 11:45 ` Patrick McHardy
2008-04-21 13:12 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 47+ messages in thread
From: Patrick McHardy @ 2008-04-21 11:45 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: Arnaldo Carvalho de Melo, Gerrit Renker, dccp, netdev
Tomasz Grobelny wrote:
> Dnia Sunday 20 of April 2008, Arnaldo Carvalho de Melo napisał:
>> Em Sat, Apr 19, 2008 at 10:42:32PM +0200, Tomasz Grobelny escreveu:
>>>> When the patch failed to compile I thought about those alternatives.
>>>> Trying to extend the dccp_skb_cb over and above what is in there will
>>>> be messy, since the IPv4/v6 parameters are required by other
>>>> subsystems.
>>> If inet{,6}_skb_parm is used only outside DCCP code then why at all
>>> should it be placed in struct dccp_skb_cb taking up quite a lot of
>>> valuable space? Why not put it directly in struct sk_buff? Especially
>>> that it is present in struct udp_skb_cb, struct tcp_skb_cb as well.
>> Because all this is used in skb->cb[], a scratchpad for protocols to
>> use, we can go back to what we had before, that is to not reserve use
>> for inet6?_skb_parm but be sure to zero it before passing it to IP, as
>> we don't want IP to be confused with things being non zero there. Then
>> we can use all its space.
>>
> Several questions regarding this case:
> 1. What about SCTP? It doesn't have inet6?_skb_parm in it's structure that is
> stored in skb->cb. So does it contain a potential bug (that is to be fixed)
> or is it not needed there or what?
Judging by a quick grep, SCTP only uses the CB on input and
appears to be fine.
> 2. If the sole purpose of this change was to keep skb->cb zeroed then it
> doesn't seem to me like the right solution. Wasting about 20 bytes instead of
> zeroing them when needed I would consider at least weird. I understand that
> TCP and UDP may have enough space left but it just turned out that DCCP
> doesn't.
It was the safest solution that late in a release. It also
avoids to memset the cb unnecessarily. If the room is not
enough anymore, its easy to go back to using memset.
> 3. If it's IP layer that needs zeroes then why not clear skb->cb in IP layer?
That would certainly work, but it adds unnecessary costs for
the other protocols that don't need this.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-21 11:45 ` Patrick McHardy
@ 2008-04-21 13:12 ` Arnaldo Carvalho de Melo
2008-04-21 16:17 ` Tomasz Grobelny
0 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2008-04-21 13:12 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Tomasz Grobelny, Gerrit Renker, dccp, netdev
Em Mon, Apr 21, 2008 at 01:45:20PM +0200, Patrick McHardy escreveu:
> Tomasz Grobelny wrote:
>> Dnia Sunday 20 of April 2008, Arnaldo Carvalho de Melo napisał:
>>> Em Sat, Apr 19, 2008 at 10:42:32PM +0200, Tomasz Grobelny escreveu:
>>>>> When the patch failed to compile I thought about those alternatives.
>>>>> Trying to extend the dccp_skb_cb over and above what is in there will
>>>>> be messy, since the IPv4/v6 parameters are required by other
>>>>> subsystems.
>>>> If inet{,6}_skb_parm is used only outside DCCP code then why at all
>>>> should it be placed in struct dccp_skb_cb taking up quite a lot of
>>>> valuable space? Why not put it directly in struct sk_buff? Especially
>>>> that it is present in struct udp_skb_cb, struct tcp_skb_cb as well.
>>> Because all this is used in skb->cb[], a scratchpad for protocols to
>>> use, we can go back to what we had before, that is to not reserve use
>>> for inet6?_skb_parm but be sure to zero it before passing it to IP, as
>>> we don't want IP to be confused with things being non zero there. Then
>>> we can use all its space.
>>>
>> Several questions regarding this case:
>> 1. What about SCTP? It doesn't have inet6?_skb_parm in it's structure that
>> is stored in skb->cb. So does it contain a potential bug (that is to be
>> fixed) or is it not needed there or what?
>
> Judging by a quick grep, SCTP only uses the CB on input and
> appears to be fine.
Thanks for checking!
>> 2. If the sole purpose of this change was to keep skb->cb zeroed then it
>> doesn't seem to me like the right solution. Wasting about 20 bytes instead
>> of zeroing them when needed I would consider at least weird. I understand
>> that TCP and UDP may have enough space left but it just turned out that
>> DCCP doesn't.
>
> It was the safest solution that late in a release. It also
> avoids to memset the cb unnecessarily. If the room is not
> enough anymore, its easy to go back to using memset.
Nod, if we don't need the space reserved for the lower layer protocols
in DCCP it is actually the best solution, as we don't need to zero the
cb again before passing it to IP, it gets zeroed at alloc_skb time and
that is it. If we need the space, we have to pay the price of
memset before passing to IP.
The rule here is: all layers expect that the CB be zeroed, not all of
it, just the part that it uses. DCCP gets it zeroed, its only fair that
it passes it to IP zeroed, and hey, not all of it, just what IP uses :-)
>> 3. If it's IP layer that needs zeroes then why not clear skb->cb in IP layer?
>
> That would certainly work, but it adds unnecessary costs for
> the other protocols that don't need this.
Nod again.
- Arnaldo
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-21 13:12 ` Arnaldo Carvalho de Melo
@ 2008-04-21 16:17 ` Tomasz Grobelny
2008-04-22 4:56 ` Patrick McHardy
2008-04-22 17:41 ` Gerrit Renker
0 siblings, 2 replies; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-21 16:17 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Patrick McHardy, Gerrit Renker; +Cc: dccp, netdev
Dnia Monday 21 of April 2008, Arnaldo Carvalho de Melo napisał:
> Em Mon, Apr 21, 2008 at 01:45:20PM +0200, Patrick McHardy escreveu:
> > Tomasz Grobelny wrote:
> >> 2. If the sole purpose of this change was to keep skb->cb zeroed then it
> >> doesn't seem to me like the right solution. Wasting about 20 bytes
> >> instead of zeroing them when needed I would consider at least weird. I
> >> understand that TCP and UDP may have enough space left but it just
> >> turned out that DCCP doesn't.
> >
> > It was the safest solution that late in a release. It also
> > avoids to memset the cb unnecessarily. If the room is not
> > enough anymore, its easy to go back to using memset.
>
> Nod, if we don't need the space reserved for the lower layer protocols
> in DCCP it is actually the best solution, as we don't need to zero the
> cb again before passing it to IP, it gets zeroed at alloc_skb time and
> that is it. If we need the space, we have to pay the price of
> memset before passing to IP.
>
Ok, so in this case the patch for DCCP could be reverted in test tree, is that
right? Were these two deleted memsets zeroing all that was necessary or were
there any other bugs fixed by the patch?
If we iron this out we could finally return to the main subject of this
thread. That is Patch v2 by me and Gerrit...
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-21 16:17 ` Tomasz Grobelny
@ 2008-04-22 4:56 ` Patrick McHardy
2008-04-22 20:45 ` Tomasz Grobelny
2008-04-22 17:41 ` Gerrit Renker
1 sibling, 1 reply; 47+ messages in thread
From: Patrick McHardy @ 2008-04-22 4:56 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: Arnaldo Carvalho de Melo, Gerrit Renker, dccp, netdev
Tomasz Grobelny wrote:
> Dnia Monday 21 of April 2008, Arnaldo Carvalho de Melo napisał:
>> Nod, if we don't need the space reserved for the lower layer protocols
>> in DCCP it is actually the best solution, as we don't need to zero the
>> cb again before passing it to IP, it gets zeroed at alloc_skb time and
>> that is it. If we need the space, we have to pay the price of
>> memset before passing to IP.
>>
> Ok, so in this case the patch for DCCP could be reverted in test tree, is that
> right? Were these two deleted memsets zeroing all that was necessary or were
> there any other bugs fixed by the patch?
No, those two memsets became unnecessary by the addition
of the new cb members. If you want to remove them again,
you need to add those memsets back and additionally add
memsets that zero the first sizeof(inet_skb_parm)/
sizeof(inet6_skb_parm) bytes everywhere else where packets
are passed to IP(v6).
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-19 20:42 ` Tomasz Grobelny
2008-04-20 16:57 ` Arnaldo Carvalho de Melo
@ 2008-04-22 17:30 ` Gerrit Renker
2008-04-22 20:30 ` Tomasz Grobelny
1 sibling, 1 reply; 47+ messages in thread
From: Gerrit Renker @ 2008-04-22 17:30 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: Arnaldo Carvalho de Melo, dccp, netdev
| If inet{,6}_skb_parm is used only outside DCCP code then why at all should it
| be placed in struct dccp_skb_cb taking up quite a lot of valuable space? Why
| not put it directly in struct sk_buff? Especially that it is present in
| struct udp_skb_cb, struct tcp_skb_cb as well.
|
With or without inet{,6}_sbk_parm there are a further unused 32 bits in the cb -- by
restricting the 48-bit sequence/ack numbers from their current u64 type to 48 bit
only. This requires to change the parts which rely on DCCP_PKT_WITHOUT_ACK_SEQ,
but that seems possible to do.
So we can reduce this to the other question "is 32 bit enough for priority information".
| > | While we are developing in the test tree compatibility is not important
| > | at all. But real world needs compatibility, especially if it's not that
| > | expensive. Otherwise people will get weird behaviour without any messages
| > | indicating what is wrong. And it's certainly not how code should be
| > | written. It can't be that binary package of VLC (or any other streaming
| > | server) cannot use newer kernel version because we added a new field.
| >
| > True, but we need to get something working first. There is no point
| > exporting an API which only works half.
| >
| You can never assume that the API is feature complete. Even if it "works half"
| at the beginning (which is I guess quite normal) it should be easy to make
| it "work full" without breaking compatibility when it's not necessary.
|
No I don't agree. Things that work half are worse than constant failures, since
it is never clear when they will fail next. But maybe you didn't mean that.
| > In this regard, another problem with the patch has arisen: the type of
| > passing priority information will not work on 64-bit architectures in
| > compatibility mode.
| >
| > Thanks to an answer by David Miller I found this out today. You can
| > verify this problem by running the patch e.g. on a sparc64 system: no
| > packets will be sent, sendmsg() returns with EINVAL.
| >
| I don't have any 64 bit system at hand...
| Does David's suggestion mean that changes are only required in userspace
| application or on the kernel side as well?
|
In both -- I will send an update in reply to this message and a link to
some user code. Very busy right now.
| > | struct dccp_packet_info
| > | {
| > | s8 priority;
| > | u16 timeout_mantissa:10;
| > | u16 timeout_exponent:6;
| > | }
| > | ? That would still fit in 32-bit integer (so could be stored in
| > | skb->priority), but would provide a much cleaner interface.
| > | --
| >
| > Yes that is possible. It need not be a struct, the same effect can be
| > achieved with bit shifts and bitmasks (this is the way the Ack Vectors
| > are encoded).
| >
| Yes, it can be achieved by bit operations. But to me a structure is a much
| cleaner way. And that is especially important when designing interfaces.
| Final effects should be the same but when using structure it is the compiler
| that takes care of low level bit operations.
|
You are free to use what you think is best. But I think both ways are
equivalent, using e.g. typecasts to get from one to the other. I.e.
I think we have no disagreement here.
| > Semantically there is no restriction of what the u32 number represents. And
| > it is a large space. I don't think that a larger size is required.
| >
| Ok, for now 32 bits are enough and I can't think of the need to use more bits.
| I wouldn't be so sure about future though. But ok, if it turns up that we
| need more space in the future then we will see, there is no need to worry
| about it now. We can indeed assume 32 bits for now.
| --
So it only remains to see whether to use a field in dccp_skb_cb, or to
use skb->priority. I think the latter has advantages, since this field
is only set/used for the SO_PRIORITY field (in socket(7)).
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-21 16:17 ` Tomasz Grobelny
2008-04-22 4:56 ` Patrick McHardy
@ 2008-04-22 17:41 ` Gerrit Renker
2008-04-22 22:42 ` David Miller
1 sibling, 1 reply; 47+ messages in thread
From: Gerrit Renker @ 2008-04-22 17:41 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: dccp, netdev
| If we iron this out we could finally return to the main subject of this
| thread. That is Patch v2 by me and Gerrit...
| --
Fully agree - we just need to decide whether or not to use skb->priority.
Below is as far as I got in integrating your patch last week, it shows
only the major changes. The following bits have been updated:
* skb->priority now cleared before passing the skb onto layer 3;
* order of statements in prio_push() reversed (first dropping worst
skb and then pushing the new skb - this is better when e.g.
tx_qlen=1);
* added general parsing routine for cmsg(3) socket control messages
and defined one for the SOL_DCCP socket level; thanks to advice
by Dave Miller
A new and updated version has been uploaded to
git://eden-feed.erg.abdn.ac.uk/dccp_exp
subtree `qpolicy'
The implementation needs some more testing.
I have uploaded matching userspace code to
http://www.erg.abdn.ac.uk/users/gerrit/dccp/packet_prio_tests.tar.gz
Will send an update next week and then we can discuss my edits, it
remains your patch. Simply too busy this week.
Gerrit
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -195,6 +195,12 @@ enum dccp_feature_numbers {
DCCPF_MAX_CCID_SPECIFIC = 255,
};
+/* DCCP socket control message types for cmsg */
+enum dccp_cmsg_type {
+ DCCP_SCM_PRIORITY = 1,
+ DCCP_SCM_MAX
+};
+
/* DCCP priorities for outgoing/queued packets */
enum dccp_packet_dequeueing_policy {
DCCPQ_POLICY_SIMPLE,
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -706,6 +706,32 @@ int compat_dccp_getsockopt(struct sock *
EXPORT_SYMBOL_GPL(compat_dccp_getsockopt);
#endif
+static int dccp_msghdr_parse(struct msghdr *msg, __u32 **priority)
+{
+ struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);
+
+ for (; cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+
+ if (!CMSG_OK(msg, cmsg))
+ return -EINVAL;
+
+ /* Only look at DCCP-related socket control messages */
+ if (cmsg->cmsg_level != SOL_DCCP)
+ continue;
+
+ switch (cmsg->cmsg_type) {
+ case DCCP_SCM_PRIORITY:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(__u32)))
+ return -EINVAL;
+ *priority = (__u32 *)CMSG_DATA(cmsg);
+ break;
+ default:
+ return -EINVAL;
+ }
+ }
+ return 0;
+}
+
int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
size_t len)
{
@@ -714,11 +740,15 @@ int dccp_sendmsg(struct kiocb *iocb, str
const int noblock = flags & MSG_DONTWAIT;
struct sk_buff *skb;
int rc, size;
+ __u32 *pprio = NULL;
long timeo;
if (len > dp->dccps_mss_cache)
return -EMSGSIZE;
+ if (dccp_msghdr_parse(msg, &pprio))
+ return -EINVAL;
+
lock_sock(sk);
if (dccp_qpolicy_full(sk)) {
@@ -749,7 +779,9 @@ int dccp_sendmsg(struct kiocb *iocb, str
if (rc != 0)
goto out_discard;
- dccp_qpolicy_push(sk, skb, msg);
+ skb->priority = pprio != NULL ? *pprio : 0;
+
+ dccp_qpolicy_push(sk, skb);
dccp_write_xmit(sk);
out_release:
release_sock(sk);
--- a/net/dccp/qpolicy.c
+++ b/net/dccp/qpolicy.c
@@ -97,13 +102,8 @@ static struct dccp_qpolicy_operations {
/*
* Externally visible interface
*/
-void dccp_qpolicy_push(struct sock *sk, struct sk_buff *skb, struct msghdr *msg)
+void dccp_qpolicy_push(struct sock *sk, struct sk_buff *skb)
{
- if (msg->msg_control == NULL || msg->msg_controllen != sizeof(__u32))
- skb->priority = 0; /* implies lowest-possible priority */
- else
- skb->priority = get_unaligned((__u32 *)msg->msg_control);
-
qpol_table[dccp_sk(sk)->dccps_qpolicy].push(sk, skb);
}
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -51,7 +51,11 @@ during an established connection are not
defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing special,
and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows to pass an
u32 priority value as ancillary data to sendmsg(), where higher numbers indicate
-a higher packet priority (similar to SO_PRIORITY).
+a higher packet priority (similar to SO_PRIORITY). This ancillary data needs to
+be formatted using a cmsg(3) message header filled in as follows:
+ cmsg->cmsg_level = SOL_DCCP;
+ cmsg->cmsg_type = DCCP_SCM_PRIORITY;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum length of the output queue. A zero
value is always interpreted as unbounded queue length. If different from zero,
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -257,6 +257,9 @@ static void dccp_xmit_packet(struct sock
DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_DATA;
}
+ /* Clear priority value used by the qpolicy subsystem */
+ skb->priority = 0;
+
err = dccp_transmit_skb(sk, skb);
if (err)
dccp_pr_debug("transmit_skb() returned err=%d\n", err);
--
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-22 17:30 ` Gerrit Renker
@ 2008-04-22 20:30 ` Tomasz Grobelny
2008-04-28 13:10 ` Gerrit Renker
[not found] ` <20080424220704.0483DBC12@poczta.oswiecenia.net>
0 siblings, 2 replies; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-22 20:30 UTC (permalink / raw)
To: Gerrit Renker, Arnaldo Carvalho de Melo; +Cc: dccp, netdev
Dnia Tuesday 22 of April 2008, Gerrit Renker napisał:
> | > | While we are developing in the test tree compatibility is not
> | > | important at all. But real world needs compatibility, especially if
> | > | it's not that expensive. Otherwise people will get weird behaviour
> | > | without any messages indicating what is wrong. And it's certainly not
> | > | how code should be written. It can't be that binary package of VLC
> | > | (or any other streaming server) cannot use newer kernel version
> | > | because we added a new field.
> | >
> | > True, but we need to get something working first. There is no point
> | > exporting an API which only works half.
> |
> | You can never assume that the API is feature complete. Even if it "works
> | half" at the beginning (which is I guess quite normal) it should be easy
> | to make it "work full" without breaking compatibility when it's not
> | necessary.
>
> No I don't agree. Things that work half are worse than constant failures,
> since it is never clear when they will fail next. But maybe you didn't mean
> that.
>
I mean that what we are currently discussing (only priorities) is in itself
useful ("works half"). Adding more features (expiry times) would be a nice
but non essential feature ("work full").
(a bit of reordering)
> So we can reduce this to the other question "is 32 bit enough for priority
> information".
>
That is an important question. But I'm afraid the answer is no (even though I
thought otherwise when writing previous mail). When we want to add packet
expiry times we will need a field for timestamp. Which is quite big (64
bits?). But note that it is only needed for in-kernel implemetation.
Userspace can write this information on as low as 16 bits. This leads me to
conclusions stated below.
> | > Semantically there is no restriction of what the u32 number represents.
> | > And it is a large space. I don't think that a larger size is required.
> |
> | Ok, for now 32 bits are enough and I can't think of the need to use more
> | bits. I wouldn't be so sure about future though. But ok, if it turns up
> | that we need more space in the future then we will see, there is no need
> | to worry about it now. We can indeed assume 32 bits for now.
>
> So it only remains to see whether to use a field in dccp_skb_cb, or to
> use skb->priority. I think the latter has advantages, since this field
> is only set/used for the SO_PRIORITY field (in socket(7)).
>
I think we should distingish two things:
1. How do we pass data along with each packet from userspace to kernelspace.
2. How do we store data in in-kernel skb for internal use.
I'm afraid there was no such separation in my patches and mixing those two
things causes a lots misunderstandings.
Ad 1. In this case we need not care about sizes etc. Clean and extensible
interface is very important. We can also have very different priority data
representantion from what is later stored in skb.
Ad 2. In this case sizes do matter. We can use whichever fields we want. In
particular we need not keep all the data passed from userspace application in
one chunk but we can distribute it among arbitrary skb fields.
So this could look like that:
Userspace application fills in a structure:
struct dccp_packet_info {
s8 priority; //as it is now
u16 expire_after; //up to 65s with 1ms accuracy
}
and passes it to sendmsg(..., struct dccp_packet_info pktinfo), which passes
it to qpolicy_push which passes it to qpolicy_prio_push which does the tricky
stuff with CMSG and does:
skb->priority=pktinfo->priority;
skb->tstamp=now()+pktinfo->expire_after;
(possibly other things that could potentially store data in dccp_skb_cb)
This would allow us to:
1. Use existing skb fields in a clean way. I mean according to their names
(priority, tstamp).
2. Have a clean userspace interface not affected by internal kernel
implementation details.
What do you think of such decoupling?
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-22 4:56 ` Patrick McHardy
@ 2008-04-22 20:45 ` Tomasz Grobelny
2008-04-22 22:06 ` David Miller
0 siblings, 1 reply; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-22 20:45 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Arnaldo Carvalho de Melo, Gerrit Renker, dccp, netdev
Dnia Tuesday 22 of April 2008, napisałeś:
> Tomasz Grobelny wrote:
> > Dnia Monday 21 of April 2008, Arnaldo Carvalho de Melo napisał:
> >> Nod, if we don't need the space reserved for the lower layer protocols
> >> in DCCP it is actually the best solution, as we don't need to zero the
> >> cb again before passing it to IP, it gets zeroed at alloc_skb time and
> >> that is it. If we need the space, we have to pay the price of
> >> memset before passing to IP.
> >
> > Ok, so in this case the patch for DCCP could be reverted in test tree, is
> > that right? Were these two deleted memsets zeroing all that was necessary
> > or were there any other bugs fixed by the patch?
>
> No, those two memsets became unnecessary by the addition
> of the new cb members. If you want to remove them again,
> you need to add those memsets back and additionally add
> memsets that zero the first sizeof(inet_skb_parm)/
> sizeof(inet6_skb_parm) bytes everywhere else where packets
> are passed to IP(v6).
Maybe I wasn't clear enough with my question. I understand that these two
memsets would have to be readded. But my question is: did you identify any
bugs that were caused by junk in skb->cb before applying your patch? If so,
do you have any test cases? It could help to check code for correctness if
one day more space in skb->cb will be needed (which of course doesn't have to
be in the nearest future).
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-22 20:45 ` Tomasz Grobelny
@ 2008-04-22 22:06 ` David Miller
2008-04-23 0:03 ` Patrick McHardy
0 siblings, 1 reply; 47+ messages in thread
From: David Miller @ 2008-04-22 22:06 UTC (permalink / raw)
To: tomasz; +Cc: kaber, acme, gerrit, dccp, netdev
From: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
Date: Tue, 22 Apr 2008 22:45:53 +0200
> Maybe I wasn't clear enough with my question. I understand that
> these two memsets would have to be readded. But my question is: did
> you identify any bugs that were caused by junk in skb->cb before
> applying your patch? If so, do you have any test cases? It could
> help to check code for correctness if one day more space in skb->cb
> will be needed (which of course doesn't have to be in the nearest
> future).
I think Patrick noticed it because it broke netfilter.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-22 17:41 ` Gerrit Renker
@ 2008-04-22 22:42 ` David Miller
2008-04-25 19:33 ` Tomasz Grobelny
2008-04-28 7:21 ` Gerrit Renker
0 siblings, 2 replies; 47+ messages in thread
From: David Miller @ 2008-04-22 22:42 UTC (permalink / raw)
To: gerrit; +Cc: tomasz, dccp, netdev
From: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Date: Tue, 22 Apr 2008 18:41:52 +0100
> | If we iron this out we could finally return to the main subject of this
> | thread. That is Patch v2 by me and Gerrit...
> | --
> Fully agree - we just need to decide whether or not to use skb->priority.
>
> Below is as far as I got in integrating your patch last week, it shows
> only the major changes. The following bits have been updated:
>
> * skb->priority now cleared before passing the skb onto layer 3;
> * order of statements in prio_push() reversed (first dropping worst
> skb and then pushing the new skb - this is better when e.g.
> tx_qlen=1);
> * added general parsing routine for cmsg(3) socket control messages
> and defined one for the SOL_DCCP socket level; thanks to advice
> by Dave Miller
If this usage of skb->priority is going to override the
IP_TOS socket option setting, I don't think it's a good
idea.
Right now every packet output goes through ip_output.c
which sets skb->priority to sk->sk_priority, which is set
by the user via the IP_TOS socket option in ip_sockglue.c
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-22 22:06 ` David Miller
@ 2008-04-23 0:03 ` Patrick McHardy
0 siblings, 0 replies; 47+ messages in thread
From: Patrick McHardy @ 2008-04-23 0:03 UTC (permalink / raw)
To: David Miller; +Cc: tomasz, acme, gerrit, dccp, netdev
David Miller wrote:
> From: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
> Date: Tue, 22 Apr 2008 22:45:53 +0200
>
>> Maybe I wasn't clear enough with my question. I understand that
>> these two memsets would have to be readded. But my question is: did
>> you identify any bugs that were caused by junk in skb->cb before
>> applying your patch? If so, do you have any test cases? It could
>> help to check code for correctness if one day more space in skb->cb
>> will be needed (which of course doesn't have to be in the nearest
>> future).
>
> I think Patrick noticed it because it broke netfilter.
>
Exactly, at least one path to the IP stack didn't clear
the CB (I don't recall the exact function, sorry). The
same is true for IPv6.
The testcase was transfering data over NATed DCCP connections,
some packets had garbage in IPCB(skb)->flags, which caused
IP to skip the POST_ROUTING hook, breaking NAT and conntrack.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 1/1] [DCCP][QPOLICY]: External interface changes
[not found] ` <20080424220704.0483DBC12@poczta.oswiecenia.net>
@ 2008-04-24 22:16 ` Tomasz Grobelny
2008-04-28 15:08 ` Gerrit Renker
1 sibling, 0 replies; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-24 22:16 UTC (permalink / raw)
To: Gerrit Renker; +Cc: acme, dccp, netdev
Dnia Thursday 24 of April 2008, napisałeś:
> + skb->priority = info.priority;
Here, as David pointed out, we are not sure wheter skb->priority can be used.
But that's not the point of this patch. The point is to request for comment
when it comes to separation of externally visible interface and internal per
skb data storage.
> + printk("pushing prio: %d\n", skb->priority);
And that is of course a simple mistake.
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-22 22:42 ` David Miller
@ 2008-04-25 19:33 ` Tomasz Grobelny
2008-04-25 20:40 ` Arnaldo Carvalho de Melo
2008-04-28 7:21 ` Gerrit Renker
1 sibling, 1 reply; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-25 19:33 UTC (permalink / raw)
To: David Miller; +Cc: gerrit, dccp, netdev
Dnia Wednesday 23 of April 2008, David Miller napisał:
> From: Gerrit Renker <gerrit@erg.abdn.ac.uk>
> Date: Tue, 22 Apr 2008 18:41:52 +0100
>
> > | If we iron this out we could finally return to the main subject of this
> > | thread. That is Patch v2 by me and Gerrit...
> > | --
> >
> > Fully agree - we just need to decide whether or not to use skb->priority.
> >
> > Below is as far as I got in integrating your patch last week, it shows
> > only the major changes. The following bits have been updated:
> >
> > * skb->priority now cleared before passing the skb onto layer 3;
> > * order of statements in prio_push() reversed (first dropping worst
> > skb and then pushing the new skb - this is better when e.g.
> > tx_qlen=1);
> > * added general parsing routine for cmsg(3) socket control messages
> > and defined one for the SOL_DCCP socket level; thanks to advice
> > by Dave Miller
>
> If this usage of skb->priority is going to override the
> IP_TOS socket option setting, I don't think it's a good
> idea.
>
> Right now every packet output goes through ip_output.c
> which sets skb->priority to sk->sk_priority, which is set
> by the user via the IP_TOS socket option in ip_sockglue.c
>
But I guess this assignment happens a bit later (that is after outgoing packet
leaves DCCP code). Consequently using skb->priority should not harm as it
will be overwritten. Or did I miss something?
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-25 19:33 ` Tomasz Grobelny
@ 2008-04-25 20:40 ` Arnaldo Carvalho de Melo
2008-04-25 20:58 ` David Miller
0 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2008-04-25 20:40 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: David Miller, gerrit, dccp, netdev
Em Fri, Apr 25, 2008 at 09:33:11PM +0200, Tomasz Grobelny escreveu:
> Dnia Wednesday 23 of April 2008, David Miller napisał:
> > From: Gerrit Renker <gerrit@erg.abdn.ac.uk>
> > Date: Tue, 22 Apr 2008 18:41:52 +0100
> >
> > > | If we iron this out we could finally return to the main subject of this
> > > | thread. That is Patch v2 by me and Gerrit...
> > > | --
> > >
> > > Fully agree - we just need to decide whether or not to use skb->priority.
> > >
> > > Below is as far as I got in integrating your patch last week, it shows
> > > only the major changes. The following bits have been updated:
> > >
> > > * skb->priority now cleared before passing the skb onto layer 3;
> > > * order of statements in prio_push() reversed (first dropping worst
> > > skb and then pushing the new skb - this is better when e.g.
> > > tx_qlen=1);
> > > * added general parsing routine for cmsg(3) socket control messages
> > > and defined one for the SOL_DCCP socket level; thanks to advice
> > > by Dave Miller
> >
> > If this usage of skb->priority is going to override the
> > IP_TOS socket option setting, I don't think it's a good
> > idea.
> >
> > Right now every packet output goes through ip_output.c
> > which sets skb->priority to sk->sk_priority, which is set
> > by the user via the IP_TOS socket option in ip_sockglue.c
> >
> But I guess this assignment happens a bit later (that is after outgoing packet
> leaves DCCP code). Consequently using skb->priority should not harm as it
> will be overwritten. Or did I miss something?
I haven't read all the patches, but I guess Tomasz is on the safe side
as the intended skb->priority usage is limited to DCCP, when IP is
handed the skb it can do as it pleases with skb->priority.
- Arnaldo
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-25 20:40 ` Arnaldo Carvalho de Melo
@ 2008-04-25 20:58 ` David Miller
0 siblings, 0 replies; 47+ messages in thread
From: David Miller @ 2008-04-25 20:58 UTC (permalink / raw)
To: acme; +Cc: tomasz, gerrit, dccp, netdev
From: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Fri, 25 Apr 2008 17:40:24 -0300
> I haven't read all the patches, but I guess Tomasz is on the safe side
> as the intended skb->priority usage is limited to DCCP, when IP is
> handed the skb it can do as it pleases with skb->priority.
Great.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-22 22:42 ` David Miller
2008-04-25 19:33 ` Tomasz Grobelny
@ 2008-04-28 7:21 ` Gerrit Renker
2008-04-28 7:39 ` David Miller
1 sibling, 1 reply; 47+ messages in thread
From: Gerrit Renker @ 2008-04-28 7:21 UTC (permalink / raw)
To: David Miller; +Cc: tomasz, dccp, netdev
| > * skb->priority now cleared before passing the skb onto layer 3;
<snip>
|
| If this usage of skb->priority is going to override the
| IP_TOS socket option setting, I don't think it's a good
| idea.
|
| Right now every packet output goes through ip_output.c
| which sets skb->priority to sk->sk_priority, which is set
| by the user via the IP_TOS socket option in ip_sockglue.c
That is actually the key point I wanted to ask about.
The patch exploits that the assignment happens later, on layer3,
with these conditions:
* Output path only (sent by the DCCP output module on this host).
* It is not clear which value skb->priority has until reaching the
output functions in ipv4/ip_output.c and ipv6/ip6_output.c. There
is no memset on this field in net/core/skbuff.c and the skb comes
from a cache, so presumably skb->priority is undefined until it
is assigned from sk->sk_priority.
* The assignment for appletalk (net/appletalk/aarp.c), econet,
IPv4/6 raw sockets, packet sockets (af_packet.c) is similar, so
the assumption was that the 4 bytes of skb->priority are not
used while the skb is on layer 4.
* The only place where I can see mischief happen is when there is
a fork where the skb is not passed on to layer 3, using either
__copy_skb_header (copies the priority field) or skb_segment.
Since I can currently not see the last point happen within DCCP, I think
that the use of skb->priority for layer 4 (DCCP) is safe.
So the remaining question was whether maintainers would be okay with
this "overloading" of the field (Arnaldo seems ok with it).
Part of the patch tried to make this use obvious, by aligning the
"prio" policy with the semantics of SO_PRIORITY.
Gerrit
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-28 7:21 ` Gerrit Renker
@ 2008-04-28 7:39 ` David Miller
0 siblings, 0 replies; 47+ messages in thread
From: David Miller @ 2008-04-28 7:39 UTC (permalink / raw)
To: gerrit; +Cc: tomasz, dccp, netdev
From: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Date: Mon, 28 Apr 2008 08:21:28 +0100
> Since I can currently not see the last point happen within DCCP, I think
> that the use of skb->priority for layer 4 (DCCP) is safe.
Ok.
> So the remaining question was whether maintainers would be okay with
> this "overloading" of the field (Arnaldo seems ok with it).
I'm okay with it, thanks for the explanation and analysis.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-22 20:30 ` Tomasz Grobelny
@ 2008-04-28 13:10 ` Gerrit Renker
2008-04-28 15:19 ` [DCCP] [RFC] [Patchv3 " Gerrit Renker
2008-04-28 21:03 ` [DCCP] [RFC] [Patchv2 " Tomasz Grobelny
[not found] ` <20080424220704.0483DBC12@poczta.oswiecenia.net>
1 sibling, 2 replies; 47+ messages in thread
From: Gerrit Renker @ 2008-04-28 13:10 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: Arnaldo Carvalho de Melo, dccp, netdev
| > | > True, but we need to get something working first. There is no point
| > | > exporting an API which only works half.
| > |
| > | You can never assume that the API is feature complete. Even if it "works
| > | half" at the beginning (which is I guess quite normal) it should be easy
| > | to make it "work full" without breaking compatibility when it's not
| > | necessary.
| >
| > No I don't agree. Things that work half are worse than constant failures,
| > since it is never clear when they will fail next. But maybe you didn't mean
| > that.
| >
| I mean that what we are currently discussing (only priorities) is in itself
| useful ("works half"). Adding more features (expiry times) would be a nice
| but non essential feature ("work full").
|
It seems I misunderstood you, and agree that with regard to API it is
better to have a little too much discussion than too little, there are
many aspects to consider.
| (a bit of reordering)
| > So we can reduce this to the other question "is 32 bit enough for priority
| > information".
| >
| That is an important question. But I'm afraid the answer is no (even though I
| thought otherwise when writing previous mail). When we want to add packet
| expiry times we will need a field for timestamp. Which is quite big (64
| bits?). But note that it is only needed for in-kernel implemetation.
| Userspace can write this information on as low as 16 bits. This leads me to
| conclusions stated below.
|
I had the same thoughts - it is easy to end up with a wide range of
possible structs, numbers, etc. that are not related and require 20
different APIs.
But the basic line of thought is
* the queue size is always limited since afaik the amount of skbs that
can be queued depends on the socket's write memory (wmem);
* hence a strict priority ordering can be done even with u8 (up to 255)
in not-too-extreme case
* for timeouts, it is not necessary to use struct time{val,spec} (as
also above); if really required, a conversion can be done by a
library (but not a kernel) routine.
| > So it only remains to see whether to use a field in dccp_skb_cb, or to
| > use skb->priority. I think the latter has advantages, since this field
| > is only set/used for the SO_PRIORITY field (in socket(7)).
| >
| I think we should distingish two things:
| 1. How do we pass data along with each packet from userspace to kernelspace.
| 2. How do we store data in in-kernel skb for internal use.
<snip>
|
| Ad 1. In this case we need not care about sizes etc. Clean and extensible
| interface is very important. We can also have very different priority data
| representantion from what is later stored in skb.
| Ad 2. In this case sizes do matter. We can use whichever fields we want. In
| particular we need not keep all the data passed from userspace application in
| one chunk but we can distribute it among arbitrary skb fields.
|
| So this could look like that:
| Userspace application fills in a structure:
| struct dccp_packet_info {
| s8 priority; //as it is now
| u16 expire_after; //up to 65s with 1ms accuracy
| }
| and passes it to sendmsg(..., struct dccp_packet_info pktinfo), which passes
| it to qpolicy_push which passes it to qpolicy_prio_push which does the tricky
| stuff with CMSG and does:
| skb->priority=pktinfo->priority;
| skb->tstamp=now()+pktinfo->expire_after;
| (possibly other things that could potentially store data in dccp_skb_cb)
I notice you have discovered another skb field that could be reused :)
The idea looks good, if you decide to go ahead with it, please just make
sure to document it (and that is a reminder for myself too).
|
| This would allow us to:
| 1. Use existing skb fields in a clean way. I mean according to their names
| (priority, tstamp).
| 2. Have a clean userspace interface not affected by internal kernel
| implementation details.
|
| What do you think of such decoupling?
| --
Yes, great. The concept is good, it is some of the details that need
some more work. In (2) you mention distributing the information among
several skb fields. With regard to above comment, I think we need to do
this with some care, to avoid interactions with other subsystems.
At the moment I have no concrete idea of how to implement a timeout-based
policy ... probably this is something along the line of Ian's patches
and what you sketched above.
Also in (2) is the size question. The bottom line is if 2^32-1 different
numbers are sufficient to represent a range of policies, then we can
work with skb->priority for the moment.
My understanding of the above is
* the per-policy data is an opaque bitstring whose only requirement
is that it fits within 32 bits;
* how the bitstring is interpreted depends on the chosen policy;
* not necessary to always use the full 32 bits (also in your example
above).
I have a tidied-up version of the changes so far which will be posted
later, also with comments on your patch.
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 1/1] [DCCP][QPOLICY]: External interface changes
[not found] ` <20080424220704.0483DBC12@poczta.oswiecenia.net>
2008-04-24 22:16 ` [PATCH 1/1] [DCCP][QPOLICY]: External interface changes Tomasz Grobelny
@ 2008-04-28 15:08 ` Gerrit Renker
2008-04-28 21:29 ` Tomasz Grobelny
1 sibling, 1 reply; 47+ messages in thread
From: Gerrit Renker @ 2008-04-28 15:08 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: acme, dccp, netdev
| Separate internal (in-kernel) storage of per packet data from external
| interface (which is now an easily extensible structure).
|
I can see your point, there is an alternative to realise what you are
aiming to do, suggestions/comments are below.
| --- a/include/linux/dccp.h
| +++ b/include/linux/dccp.h
| @@ -98,6 +98,15 @@ struct dccp_hdr_reset {
| dccph_reset_data[3];
| };
|
| +/**
| + * struct dccp_qpolicy_prio_info - information used by prio queuing policy
| + *
| + * @priority - packet priority (bigger value sent first)
| + */
| +struct dccp_qpolicy_prio_info {
| + __s8 priority;
| +};
| +
| enum dccp_pkt_type {
| DCCP_PKT_REQUEST = 0,
| DCCP_PKT_RESPONSE,
| --- a/net/dccp/proto.c
| +++ b/net/dccp/proto.c
| @@ -706,7 +706,7 @@ int compat_dccp_getsockopt(struct sock *sk, int level, int optname,
| EXPORT_SYMBOL_GPL(compat_dccp_getsockopt);
| #endif
|
| -static int dccp_msghdr_parse(struct msghdr *msg, __u32 **priority)
| +static int dccp_msghdr_parse(struct msghdr *msg, struct cmsghdr **cmsg_qpolicy)
| {
| struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);
|
| @@ -721,9 +721,7 @@ static int dccp_msghdr_parse(struct msghdr *msg, __u32 **priority)
|
| switch (cmsg->cmsg_type) {
| case DCCP_SCM_PRIORITY:
| - if (cmsg->cmsg_len != CMSG_LEN(sizeof(__u32)))
| - return -EINVAL;
| - *priority = (__u32 *)CMSG_DATA(cmsg);
| + *cmsg_qpolicy = cmsg;
| break;
| default:
| return -EINVAL;
Here the cmsg is exported to any current qpolicy. The disadvantages are that
* there is no longer any length check, the functions using this interface
have in the worst case no idea how much the user wanted to pass, so
this is dangerous (compare also RFC 3542, 20.2);
* the separation-of-concerns becomes unclear: now part of the parsing is done
in dccp_msg_hdr_parse() and another part in some of the qpolicy_xxx_push()
routines, so one routine does part of the work of another;
* each qpolicy_xxx_push() routine needs to have a cmsghdr* argument even if it is
not used.
It is however possible to reach the same goal with just a small modification:
* so far only one SCM message (DCCP_SCM_PRIORITY) has been defined;
* maintainers generally accepted the use of skb->priority while on layer 4;
* hence for all sub-variants of a "prio" policy (strict/partial orderings),
one can use DCCP_SCM_PRIORITY, in combination with skb->priority;
* for different policies, define a different DCCP_SCM_xxx message and
store the cmsg data in a different field (to be decided);
* this has the advantage that type/length checking is all in one place,
so that the qpolicy routines need to do qpolicy only.
| --- a/net/dccp/qpolicy.c
| +++ b/net/dccp/qpolicy.c
| @@ -57,8 +58,13 @@ static struct sk_buff *qpolicy_prio_worst_skb(struct sock *sk)
| return worst;
| }
|
| -static void qpolicy_prio_push(struct sock *sk, struct sk_buff *skb)
| +static void qpolicy_prio_push(struct sock *sk, struct sk_buff *skb,
| + struct cmsghdr *cmsg)
| {
| + struct dccp_qpolicy_prio_info info;
| + memcpy(&info, CMSG_DATA(cmsg), min(cmsg->cmsg_len, sizeof(info)));
That is the problem I see - `info' could end up with garbled data, if the user had
accidentally passed `u32' instead of CMSG_LEN(sizeof priority).
The CMSG_LEN() ensures an additional check - that the user is in fact using
proper encapsulation via CMSG_xxx macros and not via msg_control/msg_controllen
directly. The latter leads to hard-to-debug problems: in my case, I found
this works - by accident - on 32-bit Intel, but will fail on 64-bit computers
in compatibility mode.
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [DCCP] [RFC] [Patchv3 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-28 13:10 ` Gerrit Renker
@ 2008-04-28 15:19 ` Gerrit Renker
2008-04-28 20:12 ` Tomasz Grobelny
2008-04-28 21:03 ` [DCCP] [RFC] [Patchv2 " Tomasz Grobelny
1 sibling, 1 reply; 47+ messages in thread
From: Gerrit Renker @ 2008-04-28 15:19 UTC (permalink / raw)
To: Tomasz Grobelny, Arnaldo Carvalho de Melo, dccp, netdev
Here is a tidied-up version, the patch sent last week and stored in
git://eden-feed.erg.abdn.ac.uk/dccp_exp (tree `qpolicy')
was not very good, the updates are:
* the skb->priority is now cleared when leaving the DCCP layer;
* there is now documentation/hints about the use of skb->priority in DCCP;
* there is one question I have, and that can be reverted:
- qpolicy_prio_push() has been replaced by a combination of
- qpolicy_prio_full() [which now does the drop-on-queue-full] and
qpolicy_simple_push() [which does the FIFO-queueing]
This means fewer routines, but if you don't like it, I am ok to revert it.
Please do take time with the comments -- another busy week means I won't
have time to respond properly before the end of the week.
Gerrit
---
Documentation/networking/dccp.txt | 19 +++++
include/linux/dccp.h | 19 +++++
net/dccp/Makefile | 2
net/dccp/dccp.h | 12 +++
net/dccp/output.c | 10 +--
net/dccp/proto.c | 67 +++++++++++++++++++-
net/dccp/qpolicy.c | 124 ++++++++++++++++++++++++++++++++++++++
7 files changed, 245 insertions(+), 8 deletions(-)
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -45,6 +45,25 @@ http://linux-net.osdl.org/index.php/DCCP
Socket options
==============
+DCCP_SOCKOPT_QPOLICY_ID sets the dequeuing policy for outgoing packets. It takes
+a policy ID as argument and can only be set before the connection (i.e. changes
+during an established connection are not supported). Currently, two policies are
+defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing special,
+and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows to pass an
+u32 priority value as ancillary data to sendmsg(), where higher numbers indicate
+a higher packet priority (similar to SO_PRIORITY). This ancillary data needs to
+be formatted using a cmsg(3) message header filled in as follows:
+ cmsg->cmsg_level = SOL_DCCP;
+ cmsg->cmsg_type = DCCP_SCM_PRIORITY;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t)); /* or CMSG_LEN(4) */
+
+DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum length of the output queue. A zero
+value is always interpreted as unbounded queue length. If different from zero,
+the interpretation of this parameter depends on the current dequeuing policy
+(see above): the "simple" policy will enforce a fixed queue size by returning
+EAGAIN, whereas the "prio" policy enforces a fixed queue length by dropping the
+lowest-priority packet first. The default value for this parameter is
+initialised from /proc/sys/net/dccp/default/tx_qlen.
DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of
service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -195,6 +195,19 @@ enum dccp_feature_numbers {
DCCPF_MAX_CCID_SPECIFIC = 255,
};
+/* DCCP socket control message types for cmsg */
+enum dccp_cmsg_type {
+ DCCP_SCM_PRIORITY = 1,
+ DCCP_SCM_MAX
+};
+
+/* DCCP priorities for outgoing/queued packets */
+enum dccp_packet_dequeueing_policy {
+ DCCPQ_POLICY_SIMPLE,
+ DCCPQ_POLICY_PRIO,
+ DCCPQ_POLICY_MAX
+};
+
/* DCCP socket options */
#define DCCP_SOCKOPT_PACKET_SIZE 1 /* XXX deprecated, without effect */
#define DCCP_SOCKOPT_SERVICE 2
@@ -208,6 +221,8 @@ enum dccp_feature_numbers {
#define DCCP_SOCKOPT_CCID 13
#define DCCP_SOCKOPT_TX_CCID 14
#define DCCP_SOCKOPT_RX_CCID 15
+#define DCCP_SOCKOPT_QPOLICY_ID 16
+#define DCCP_SOCKOPT_QPOLICY_TXQLEN 17
#define DCCP_SOCKOPT_CCID_RX_INFO 128
#define DCCP_SOCKOPT_CCID_TX_INFO 192
@@ -459,6 +474,8 @@ struct dccp_ackvec;
* @dccps_hc_rx_ccid - CCID used for the receiver (or receiving half-connection)
* @dccps_hc_tx_ccid - CCID used for the sender (or sending half-connection)
* @dccps_options_received - parsed set of retrieved options
+ * @dccps_qpolicy - TX dequeueing policy, one of %dccp_packet_dequeueing_policy
+ * @dccps_tx_qlen - maximum length of the TX queue
* @dccps_role - role of this sock, one of %dccp_role
* @dccps_hc_rx_insert_options - receiver wants to add options when acking
* @dccps_hc_tx_insert_options - sender wants to add options when sending
@@ -501,6 +518,8 @@ struct dccp_sock {
struct ccid *dccps_hc_rx_ccid;
struct ccid *dccps_hc_tx_ccid;
struct dccp_options_received dccps_options_received;
+ __u8 dccps_qpolicy;
+ __u32 dccps_tx_qlen;
enum dccp_role dccps_role:2;
__u8 dccps_hc_rx_insert_options:1;
__u8 dccps_hc_tx_insert_options:1;
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -187,6 +187,7 @@ int dccp_init_sock(struct sock *sk, cons
dp->dccps_rate_last = jiffies;
dp->dccps_role = DCCP_ROLE_UNDEFINED;
dp->dccps_service = DCCP_SERVICE_CODE_IS_ABSENT;
+ dp->dccps_tx_qlen = sysctl_dccp_tx_qlen;
INIT_LIST_HEAD(&dp->dccps_featneg);
/* control socket doesn't need feat nego */
@@ -540,6 +541,20 @@ static int do_dccp_setsockopt(struct soc
case DCCP_SOCKOPT_RECV_CSCOV:
err = dccp_setsockopt_cscov(sk, val, true);
break;
+ case DCCP_SOCKOPT_QPOLICY_ID:
+ if (sk->sk_state != DCCP_CLOSED)
+ err = -EISCONN;
+ else if (val < 0 || val >= DCCPQ_POLICY_MAX)
+ err = -EINVAL;
+ else
+ dp->dccps_qpolicy = val;
+ break;
+ case DCCP_SOCKOPT_QPOLICY_TXQLEN:
+ if (val < 0)
+ err = -EINVAL;
+ else
+ dp->dccps_tx_qlen = val;
+ break;
default:
err = -ENOPROTOOPT;
break;
@@ -643,6 +658,12 @@ static int do_dccp_getsockopt(struct soc
case DCCP_SOCKOPT_RECV_CSCOV:
val = dp->dccps_pcrlen;
break;
+ case DCCP_SOCKOPT_QPOLICY_ID:
+ val = dp->dccps_qpolicy;
+ break;
+ case DCCP_SOCKOPT_QPOLICY_TXQLEN:
+ val = dp->dccps_tx_qlen;
+ break;
case 128 ... 191:
return ccid_hc_rx_getsockopt(dp->dccps_hc_rx_ccid, sk, optname,
len, (u32 __user *)optval, optlen);
@@ -685,6 +706,32 @@ int compat_dccp_getsockopt(struct sock *
EXPORT_SYMBOL_GPL(compat_dccp_getsockopt);
#endif
+static int dccp_msghdr_parse(struct msghdr *msg, __u32 **priority)
+{
+ struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);
+
+ for (; cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+
+ if (!CMSG_OK(msg, cmsg))
+ return -EINVAL;
+
+ /* Only look at DCCP-related socket control messages */
+ if (cmsg->cmsg_level != SOL_DCCP)
+ continue;
+
+ switch (cmsg->cmsg_type) {
+ case DCCP_SCM_PRIORITY:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(__u32)))
+ return -EINVAL;
+ *priority = (__u32 *)CMSG_DATA(cmsg);
+ break;
+ default:
+ return -EINVAL;
+ }
+ }
+ return 0;
+}
+
int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
size_t len)
{
@@ -693,15 +740,18 @@ int dccp_sendmsg(struct kiocb *iocb, str
const int noblock = flags & MSG_DONTWAIT;
struct sk_buff *skb;
int rc, size;
+ __u32 *pprio = NULL;
long timeo;
if (len > dp->dccps_mss_cache)
return -EMSGSIZE;
+ if (dccp_msghdr_parse(msg, &pprio))
+ return -EINVAL;
+
lock_sock(sk);
- if (sysctl_dccp_tx_qlen &&
- (sk->sk_write_queue.qlen >= sysctl_dccp_tx_qlen)) {
+ if (dccp_qpolicy_full(sk)) {
rc = -EAGAIN;
goto out_release;
}
@@ -729,7 +779,18 @@ int dccp_sendmsg(struct kiocb *iocb, str
if (rc != 0)
goto out_discard;
- skb_queue_tail(&sk->sk_write_queue, skb);
+ /*
+ * Assign the (opaque) qpolicy priority value to skb->priority.
+ *
+ * We are overloading this skb field for use with the qpolicy subystem.
+ * The skb->priority is normally used for the SO_PRIORITY option, which
+ * is initialised from sk_priority. Since the assignment of sk_priority
+ * to skb->priority happens later (on layer 3), we overload this field
+ * for use with queueing priorities as long as the skb is on layer 4.
+ */
+ skb->priority = pprio == NULL ? 0 : *pprio;
+
+ dccp_qpolicy_push(sk, skb);
dccp_write_xmit(sk);
out_release:
release_sock(sk);
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -238,7 +238,7 @@ static void dccp_xmit_packet(struct sock
{
int err, len;
struct dccp_sock *dp = dccp_sk(sk);
- struct sk_buff *skb = skb_dequeue(&sk->sk_write_queue);
+ struct sk_buff *skb = dccp_qpolicy_pop(sk);
if (unlikely(skb == NULL))
return;
@@ -257,6 +257,9 @@ static void dccp_xmit_packet(struct sock
DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_DATA;
}
+ /* Clear priority value used by the qpolicy subsystem */
+ skb->priority = 0;
+
err = dccp_transmit_skb(sk, skb);
if (err)
dccp_pr_debug("transmit_skb() returned err=%d\n", err);
@@ -328,7 +331,7 @@ void dccp_write_xmit(struct sock *sk)
struct dccp_sock *dp = dccp_sk(sk);
struct sk_buff *skb;
- while ((skb = skb_peek(&sk->sk_write_queue))) {
+ while ((skb = dccp_qpolicy_top(sk))) {
int rc = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb);
switch (ccid_packet_dequeue_eval(rc)) {
@@ -342,8 +345,7 @@ void dccp_write_xmit(struct sock *sk)
dccp_xmit_packet(sk);
break;
case CCID_PACKET_ERR:
- skb_dequeue(&sk->sk_write_queue);
- kfree_skb(skb);
+ dccp_qpolicy_drop(sk, skb);
dccp_pr_debug("packet discarded due to err=%d\n", rc);
}
}
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -215,6 +215,18 @@ extern void dccp_reqsk_send_ack(struct s
extern void dccp_send_sync(struct sock *sk, const u64 seq,
const enum dccp_pkt_type pkt_type);
+/*
+ * TX Packet Dequeueing Interface
+ */
+extern void dccp_qpolicy_push(struct sock *sk, struct sk_buff *skb);
+extern bool dccp_qpolicy_full(struct sock *sk);
+extern void dccp_qpolicy_drop(struct sock *sk, struct sk_buff *skb);
+extern struct sk_buff *dccp_qpolicy_top(struct sock *sk);
+extern struct sk_buff *dccp_qpolicy_pop(struct sock *sk);
+
+/*
+ * TX Packet Output and TX Timers
+ */
extern void dccp_write_xmit(struct sock *sk);
extern void dccp_write_space(struct sock *sk);
extern void dccp_flush_write_queue(struct sock *sk, long *time_budget);
--- /dev/null
+++ b/net/dccp/qpolicy.c
@@ -0,0 +1,124 @@
+/*
+ * net/dccp/qpolicy.c
+ *
+ * An implementation of the DCCP protocol
+ *
+ * Copyright (c) 2008 Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License v2
+ * as published by the Free Software Foundation.
+ */
+#include <asm/unaligned.h>
+#include "dccp.h"
+
+/*
+ * Simple Dequeueing Policy:
+ * If tx_qlen is different from 0, enqueue up to tx_qlen elements.
+ */
+static void qpolicy_simple_push(struct sock *sk, struct sk_buff *skb)
+{
+ skb_queue_tail(&sk->sk_write_queue, skb);
+}
+
+static bool qpolicy_simple_full(struct sock *sk)
+{
+ return dccp_sk(sk)->dccps_tx_qlen &&
+ sk->sk_write_queue.qlen >= dccp_sk(sk)->dccps_tx_qlen;
+}
+
+static struct sk_buff *qpolicy_simple_top(struct sock *sk)
+{
+ return skb_peek(&sk->sk_write_queue);
+}
+
+/*
+ * Priority-based Dequeueing Policy:
+ * If tx_qlen is different from 0 and the queue has reached its upper bound
+ * of tx_qlen elements, replace older packets lowest-priority-first.
+ */
+static struct sk_buff *qpolicy_prio_best_skb(struct sock *sk)
+{
+ struct sk_buff *skb, *best = NULL;
+
+ skb_queue_walk(&sk->sk_write_queue, skb)
+ if (best == NULL || skb->priority > best->priority)
+ best = skb;
+ return best;
+}
+
+static struct sk_buff *qpolicy_prio_worst_skb(struct sock *sk)
+{
+ struct sk_buff *skb, *worst = NULL;
+
+ skb_queue_walk(&sk->sk_write_queue, skb)
+ if (worst == NULL || skb->priority < worst->priority)
+ worst = skb;
+ return worst;
+}
+
+static bool qpolicy_prio_full(struct sock *sk)
+{
+ if (qpolicy_simple_full(sk))
+ dccp_qpolicy_drop(sk, qpolicy_prio_worst_skb(sk));
+ return false;
+}
+
+/**
+ * struct dccp_qpolicy_operations - TX Packet Dequeueing Interface
+ * @push: add a new @skb with possibly a struct dccp_packet_info
+ * @full: indicates that no more packets will be admitted
+ * @top: peeks at whatever the queueing policy defines as its top
+ */
+static struct dccp_qpolicy_operations {
+ void (*push) (struct sock *sk, struct sk_buff *skb);
+ bool (*full) (struct sock *sk);
+ struct sk_buff* (*top) (struct sock *sk);
+
+} qpol_table[DCCPQ_POLICY_MAX] = {
+ [DCCPQ_POLICY_SIMPLE] = {
+ .push = qpolicy_simple_push,
+ .full = qpolicy_simple_full,
+ .top = qpolicy_simple_top,
+ },
+ [DCCPQ_POLICY_PRIO] = {
+ .push = qpolicy_simple_push,
+ .full = qpolicy_prio_full,
+ .top = qpolicy_prio_best_skb,
+ },
+};
+
+/*
+ * Externally visible interface
+ */
+void dccp_qpolicy_push(struct sock *sk, struct sk_buff *skb)
+{
+ qpol_table[dccp_sk(sk)->dccps_qpolicy].push(sk, skb);
+}
+
+bool dccp_qpolicy_full(struct sock *sk)
+{
+ return qpol_table[dccp_sk(sk)->dccps_qpolicy].full(sk);
+}
+
+void dccp_qpolicy_drop(struct sock *sk, struct sk_buff *skb)
+{
+ if (skb != NULL) {
+ skb_unlink(skb, &sk->sk_write_queue);
+ kfree_skb(skb);
+ }
+}
+
+struct sk_buff *dccp_qpolicy_top(struct sock *sk)
+{
+ return qpol_table[dccp_sk(sk)->dccps_qpolicy].top(sk);
+}
+
+struct sk_buff *dccp_qpolicy_pop(struct sock *sk)
+{
+ struct sk_buff *skb = dccp_qpolicy_top(sk);
+
+ if (skb)
+ skb_unlink(skb, &sk->sk_write_queue);
+ return skb;
+}
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -1,7 +1,7 @@
obj-$(CONFIG_IP_DCCP) += dccp.o dccp_ipv4.o
dccp-y := ccid.o feat.o input.o minisocks.o options.o \
- output.o proto.o timer.o ackvec.o
+ qpolicy.o output.o proto.o timer.o ackvec.o
dccp_ipv4-y := ipv4.o
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv3 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-28 15:19 ` [DCCP] [RFC] [Patchv3 " Gerrit Renker
@ 2008-04-28 20:12 ` Tomasz Grobelny
0 siblings, 0 replies; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-28 20:12 UTC (permalink / raw)
To: Gerrit Renker; +Cc: Arnaldo Carvalho de Melo, dccp, netdev
Dnia Monday 28 of April 2008, Gerrit Renker napisał:
> Here is a tidied-up version, the patch sent last week and stored in
>
> git://eden-feed.erg.abdn.ac.uk/dccp_exp (tree `qpolicy')
>
> was not very good, the updates are:
>
> * the skb->priority is now cleared when leaving the DCCP layer;
Ok. I only wonder if this kind of cleanup shouldn't happen in qpolicy specific
function. Such as already non-existant qpolicy_prio_pop(). Note that
qpolicy_simple_pop() would not need to clear skb->priority. This kind of
function could act as skb deinitializer. And I think qpolicy_*_pop() should
be added again...
The same question applies to skb initialization. In my opinion it should
happen in qpolicy_*_push(). So that dccp_msghdr_parse code should be moved
there.
This approach would clearly abstract qpolicy subsystem from the rest of dccp
code. We would have packet initialization in qpolicy_*_push() and
deinitialization in qpolicy_*_pop().
I'll try to write a patch that demonstrates it.
> * there is now documentation/hints about the use of skb->priority in DCCP;
There is something I don't understand in your approach regarding
skb->priority. I think I could understand it better it you could sketch how
in your opinion should an interface look like if packet timeout was to be
added.
Would you:
a) pack this data into this u32 identified by DCCP_SCM_PRIORITY. This is what
I thought you meant (but I may be wrong). And something I don't really like.
b) add another parameter type eg. DCCP_SCM_TIMEOUT_MS and use another u32 to
pass this parameter. This is something I like much better,
c) something else. Then what is that?
> * there is one question I have, and that can be reverted:
> - qpolicy_prio_push() has been replaced by a combination of
> - qpolicy_prio_full() [which now does the drop-on-queue-full] and
> qpolicy_simple_push() [which does the FIFO-queueing]
> This means fewer routines, but if you don't like it, I am ok to revert
> it.
>
I don't care that much about internal implementation. But I don't really like
mixing policies. Especially since when prio policy will have more features
these functions will have to be separated again. When making any choices I
always try to think how it will look like when more features are added
(especially packet timeouts and getting statistics).
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-28 13:10 ` Gerrit Renker
2008-04-28 15:19 ` [DCCP] [RFC] [Patchv3 " Gerrit Renker
@ 2008-04-28 21:03 ` Tomasz Grobelny
2008-04-30 7:53 ` Gerrit Renker
1 sibling, 1 reply; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-28 21:03 UTC (permalink / raw)
To: Gerrit Renker; +Cc: Arnaldo Carvalho de Melo, dccp, netdev
Dnia Monday 28 of April 2008, Gerrit Renker napisał:
> | (a bit of reordering)
> |
> | > So we can reduce this to the other question "is 32 bit enough for
> | > priority information".
> |
> | That is an important question. But I'm afraid the answer is no (even
> | though I thought otherwise when writing previous mail). When we want to
> | add packet expiry times we will need a field for timestamp. Which is
> | quite big (64 bits?). But note that it is only needed for in-kernel
> | implemetation. Userspace can write this information on as low as 16 bits.
> | This leads me to conclusions stated below.
>
> I had the same thoughts - it is easy to end up with a wide range of
> possible structs, numbers, etc. that are not related and require 20
> different APIs.
For now I can think of three possible solutions:
a) one structure per policy (of course only if additional data is needed),
b) one structure that could contain parameters from all possible policies
(that would mean that not all fields are used),
c) allow qpolicy to use each cmsg header as different parameter. So that apart
from DCCP_SCM_PRIORITY we could also have DCCP_SCM_TIMEOUT_MS. And this seems
to be a very nice approach. It has several advantages:
- extensible,
- application does not need to pass parameter if it is ok with default value.
It could for example pass only DCCP_SCM_TIMEOUT_MS without DCCP_SCM_PRIORITY,
- it is easy to maintain compatibility.
This series of parameters should be parsed by specific qpolicy that can make
use of them (read: prio). Other policies (read: simple) would not even
attempt to parse the list of parameters.
> But the basic line of thought is
> * the queue size is always limited since afaik the amount of skbs that
> can be queued depends on the socket's write memory (wmem);
Yes.
> * hence a strict priority ordering can be done even with u8 (up to 255)
> in not-too-extreme case
But note that it is not only about ordering but potentially also about when
(and if) packets should be dropped. And that may depend on many factors, not
only ordering, but also current time, current speed, current rtt, etc.
> | So this could look like that:
> | Userspace application fills in a structure:
> | struct dccp_packet_info {
> | s8 priority; //as it is now
> | u16 expire_after; //up to 65s with 1ms accuracy
> | }
> | and passes it to sendmsg(..., struct dccp_packet_info pktinfo), which
> | passes it to qpolicy_push which passes it to qpolicy_prio_push which does
> | the tricky stuff with CMSG and does:
> | skb->priority=pktinfo->priority;
> | skb->tstamp=now()+pktinfo->expire_after;
> | (possibly other things that could potentially store data in dccp_skb_cb)
>
> I notice you have discovered another skb field that could be reused :)
Yes :-)
> | This would allow us to:
> | 1. Use existing skb fields in a clean way. I mean according to their
> | names (priority, tstamp).
> | 2. Have a clean userspace interface not affected by internal kernel
> | implementation details.
> |
> | What do you think of such decoupling?
> | --
>
> Yes, great. The concept is good, it is some of the details that need
> some more work. In (2) you mention distributing the information among
> several skb fields. With regard to above comment, I think we need to do
> this with some care, to avoid interactions with other subsystems.
>
Sure.
> At the moment I have no concrete idea of how to implement a timeout-based
> policy ... probably this is something along the line of Ian's patches
> and what you sketched above.
>
I think that implementing timeouts should not be in separate policy but in
prio policy. Simply one more parameter would have to be passed to this policy
and timed out packets would be dropped in qpolicy_prio_full() or
qpolicy_prio_push() methods. The master question for now is how to pass this
additional parameter. I would be in favour of second struct cmsg* with type
set to DCCP_SCM_TIMEOUT_MS.
> Also in (2) is the size question. The bottom line is if 2^32-1 different
> numbers are sufficient to represent a range of policies, then we can
> work with skb->priority for the moment.
>
> My understanding of the above is
> * the per-policy data is an opaque bitstring whose only requirement
> is that it fits within 32 bits;
> * how the bitstring is interpreted depends on the chosen policy;
You mean data passed between userspace and kernelspace or in-kernel storage?
1. For user to kernel data for me seems to be not expressive enough. Who knows
what these 32-bits are supposed to mean? Ok, we may document that but to me
it is almost like passing void*: you never know what's inside.
2. For in-kernel storage 32-bits is simply not enough for priority+timeout.
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 1/1] [DCCP][QPOLICY]: External interface changes
2008-04-28 15:08 ` Gerrit Renker
@ 2008-04-28 21:29 ` Tomasz Grobelny
0 siblings, 0 replies; 47+ messages in thread
From: Tomasz Grobelny @ 2008-04-28 21:29 UTC (permalink / raw)
To: Gerrit Renker; +Cc: acme, dccp, netdev
Dnia Monday 28 of April 2008, Gerrit Renker napisał:
> | @@ -721,9 +721,7 @@ static int dccp_msghdr_parse(struct msghdr *msg,
> | __u32 **priority)
> |
> | switch (cmsg->cmsg_type) {
> | case DCCP_SCM_PRIORITY:
> | - if (cmsg->cmsg_len != CMSG_LEN(sizeof(__u32)))
> | - return -EINVAL;
> | - *priority = (__u32 *)CMSG_DATA(cmsg);
> | + *cmsg_qpolicy = cmsg;
> | break;
> | default:
> | return -EINVAL;
>
> Here the cmsg is exported to any current qpolicy. The disadvantages
> are that * there is no longer any length check, the functions using this
> interface have in the worst case no idea how much the user wanted to pass,
> so this is dangerous (compare also RFC 3542, 20.2);
The information about size can be retrieved in specific qpolicy. It is not
CMSG_DATA(cmsg) that is passed to qpolicy but cmsg itself along with cmsg_len
field.
> * the separation-of-concerns becomes unclear: now part of the parsing is
> done in dccp_msg_hdr_parse() and another part in some of the
> qpolicy_xxx_push() routines, so one routine does part of the work of
> another;
That is a problem. So I think all the code should be moved to
qpolicy_prio_push().
> * each qpolicy_xxx_push() routine needs to have a cmsghdr* argument even
> if it is not used.
>
Before the change I could say: why call dccp_msghdr_parse() if choosen policy
cannot make use of it's result.
No matter what we will always have either "unnecessary code", "unnecessary
parameter" or something else that is "unnecessary" simply because simple
policy is, well... simple. But it has to be so to keep interfaces clean.
> It is however possible to reach the same goal with just a small
> modification: * so far only one SCM message (DCCP_SCM_PRIORITY) has been
> defined; * maintainers generally accepted the use of skb->priority while on
> layer 4; * hence for all sub-variants of a "prio" policy (strict/partial
> orderings), one can use DCCP_SCM_PRIORITY, in combination with
> skb->priority; * for different policies, define a different DCCP_SCM_xxx
> message and store the cmsg data in a different field (to be decided);
> * this has the advantage that type/length checking is all in one place,
> so that the qpolicy routines need to do qpolicy only.
>
I would use different DCCP_SCM_xxx for different parameters. So that one
qpolicy could use more than one DCCP_SCM_xxx. And parsing of specific
DCCP_SCM_xxx should IMHO happen in qpolicy_*_push().
> | --- a/net/dccp/qpolicy.c
> | +++ b/net/dccp/qpolicy.c
> | @@ -57,8 +58,13 @@ static struct sk_buff *qpolicy_prio_worst_skb(struct
> | sock *sk) return worst;
> | }
> |
> | -static void qpolicy_prio_push(struct sock *sk, struct sk_buff *skb)
> | +static void qpolicy_prio_push(struct sock *sk, struct sk_buff *skb,
> | + struct cmsghdr *cmsg)
> | {
> | + struct dccp_qpolicy_prio_info info;
> | + memcpy(&info, CMSG_DATA(cmsg), min(cmsg->cmsg_len, sizeof(info)));
>
> That is the problem I see - `info' could end up with garbled data, if the
> user had accidentally passed `u32' instead of CMSG_LEN(sizeof priority).
>
Right, but this could easily be fixed by initializing info prior to memcpy.
Then only data provided by user would override defaults. But see below.
> The CMSG_LEN() ensures an additional check - that the user is in fact using
> proper encapsulation via CMSG_xxx macros and not via
> msg_control/msg_controllen directly. The latter leads to hard-to-debug
> problems: in my case, I found this works - by accident - on 32-bit Intel,
> but will fail on 64-bit computers in compatibility mode.
>
Then min(cmsg->cmsg_len, sizeof(info)) could be changed to min(CMSG_LEN(cmsg),
sizeof(info)), I see no problem with that.
That said I think that multiple DCCP_SCM_xxx per policy might be a better
approach.
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-28 21:03 ` [DCCP] [RFC] [Patchv2 " Tomasz Grobelny
@ 2008-04-30 7:53 ` Gerrit Renker
2008-05-02 20:39 ` Tomasz Grobelny
0 siblings, 1 reply; 47+ messages in thread
From: Gerrit Renker @ 2008-04-30 7:53 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: dccp, netdev
| > | > So we can reduce this to the other question "is 32 bit enough for
| > | > priority information".
| > |
| > | That is an important question. But I'm afraid the answer is no (even
| > | though I thought otherwise when writing previous mail). When we want to
| > | add packet expiry times we will need a field for timestamp. Which is
| > | quite big (64 bits?). But note that it is only needed for in-kernel
| > | implemetation. Userspace can write this information on as low as 16 bits.
| > | This leads me to conclusions stated below.
| >
| > I had the same thoughts - it is easy to end up with a wide range of
| > possible structs, numbers, etc. that are not related and require 20
| > different APIs.
| For now I can think of three possible solutions:
| a) one structure per policy (of course only if additional data is needed),
| b) one structure that could contain parameters from all possible policies
| (that would mean that not all fields are used),
| c) allow qpolicy to use each cmsg header as different parameter. So that apart
| from DCCP_SCM_PRIORITY we could also have DCCP_SCM_TIMEOUT_MS. And this seems
| to be a very nice approach. It has several advantages:
| - extensible,
| - application does not need to pass parameter if it is ok with default value.
| It could for example pass only DCCP_SCM_TIMEOUT_MS without DCCP_SCM_PRIORITY,
| - it is easy to maintain compatibility.
| This series of parameters should be parsed by specific qpolicy that can make
| use of them (read: prio). Other policies (read: simple) would not even
| attempt to parse the list of parameters.
|
I think (c) is best, here is what I'd support.
1. Using different DCCP_SCM_xxx is definitively okay:
the cmsg_type is `int', so there is more than enough room.
2. I want to make the relationship between DCCP_SCM_xxx and the type passed
as parameter explicit, i.e.
- each DCCP_SCM_xxx has exactly one parameter type associated with it
(e.g. DCCP_SCM_PRIORITY => uint32_t);
- it is okay to associate the same parameter type with several
different DCCP_SCM_xxx,but not ok to allow one DCCP_SCM_xxx to be
associated with more than one parameter type or parameter length;
- so that there is no confusion about the type of the parameters.
3. Parsing and interpreting parameters to be kept separate, i.e. the qpolicy
routines read skb fields which have either been set from cmsg ancillary
data to sendmsg() or are otherwise initialised to default values.
| > Also in (2) is the size question. The bottom line is if 2^32-1 different
| > numbers are sufficient to represent a range of policies, then we can
| > work with skb->priority for the moment.
| >
| > My understanding of the above is
| > * the per-policy data is an opaque bitstring whose only requirement
| > is that it fits within 32 bits;
| > * how the bitstring is interpreted depends on the chosen policy;
| You mean data passed between userspace and kernelspace or in-kernel storage?
| 1. For user to kernel data for me seems to be not expressive enough. Who knows
| what these 32-bits are supposed to mean? Ok, we may document that but to me
| it is almost like passing void*: you never know what's inside.
To me this means that you agree with (2) above.
With (2) in place, the question does not arise.
| 2. For in-kernel storage 32-bits is simply not enough for priority+timeout.
Ok, but this is a separate question and it is about a yet non-existing policy.
So far there is a "priority" policy, for which I think we agree that 32
bits are enough.
Then there is a yet non-existing "timeout" policy, which has no associated
field yet. If we can assume that e.g. the skb->tstamp field can be used
to store timeouts, then there are separate fields associated with separate
policies.
And both associated parameters can be parsed differently, in particular
there is no requirement to restrict DCCP_SCM_TIMEOUT to use 32 bits - it
could even pass a struct timeval or struct timespec.
Thirdly, there is the aggregate policy "priority+timeout", which can
then use both skb->priority and skb->tstamp.
I.e. to answer the question, I think it is best to implement "timeout" first,
solve the problems it brings up; when that is done, "priority+timeout" will
be easy to do - it could be constructed just out of the existing functions
defined for "priority" and "timeout".
In that manner, other policies can be modularly constructed - for instance
by combining "timeout" with a different form of the "priority" policy.
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-04-30 7:53 ` Gerrit Renker
@ 2008-05-02 20:39 ` Tomasz Grobelny
2008-05-02 20:56 ` Gerrit Renker
0 siblings, 1 reply; 47+ messages in thread
From: Tomasz Grobelny @ 2008-05-02 20:39 UTC (permalink / raw)
To: Gerrit Renker, dccp, netdev
Dnia Wednesday 30 of April 2008, Gerrit Renker napisał:
> | c) allow qpolicy to use each cmsg header as different parameter. So that
> (...)
> I think (c) is best, here is what I'd support.
> (...)
Seems we are getting closer in our views, see the just sent patch.
> Ok, but this is a separate question and it is about a yet non-existing
> policy.
>
Hopefully someone, someday will implement it. So it is good to have a basic
idea how it might work.
> So far there is a "priority" policy, for which I think we agree that 32
> bits are enough.
>
For storing just priority - yes. I'd even say that it is too much...
> And both associated parameters can be parsed differently, in particular
> there is no requirement to restrict DCCP_SCM_TIMEOUT to use 32 bits - it
> could even pass a struct timeval or struct timespec.
>
Yes, that's nice.
> Then there is a yet non-existing "timeout" policy, which has no associated
> field yet. If we can assume that e.g. the skb->tstamp field can be used
> to store timeouts, then there are separate fields associated with separate
> policies.
>
> Thirdly, there is the aggregate policy "priority+timeout", which can
> then use both skb->priority and skb->tstamp.
>
> I.e. to answer the question, I think it is best to implement "timeout"
> first, solve the problems it brings up; when that is done,
> "priority+timeout" will be easy to do - it could be constructed just out of
> the existing functions defined for "priority" and "timeout".
>
> In that manner, other policies can be modularly constructed - for instance
> by combining "timeout" with a different form of the "priority" policy.
>
I'm not entirely sure if such modular constructions would be possible. I
prefer to think of "timeout policy" and "prio policy" as a special cases
of "timeout+prio policy" with respectively DCCP_SCM_PRIORITY and
DCCP_SCM_TIMEOUT not supplied (and thus set to their default values: 0 and
INFINITY).
--
Regards,
Tomasz Grobelny
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set
2008-05-02 20:39 ` Tomasz Grobelny
@ 2008-05-02 20:56 ` Gerrit Renker
0 siblings, 0 replies; 47+ messages in thread
From: Gerrit Renker @ 2008-05-02 20:56 UTC (permalink / raw)
To: Tomasz Grobelny; +Cc: dccp, netdev
| > | c) allow qpolicy to use each cmsg header as different parameter. So that
| > (...)
| > I think (c) is best, here is what I'd support.
| > (...)
| Seems we are getting closer in our views, see the just sent patch.
|
That would be good - what I would like to do is to replace the 12 or so
patches in git://eden-feed.erg.abdn.ac.uk/dccp_exp [qpolicy subtree]
with a single patch and then put it into the test tree.
Will answer regarding the other patch separately and then resubmit the
combined patch to the list - if you are ok with it, you can add your
signed-off or point out where you disagree. It remains your patch.
What I'd still like to do is some testing on different architectures.
| > I.e. to answer the question, I think it is best to implement "timeout"
| > first, solve the problems it brings up; when that is done,
| > "priority+timeout" will be easy to do - it could be constructed just out of
| > the existing functions defined for "priority" and "timeout".
| >
| > In that manner, other policies can be modularly constructed - for instance
| > by combining "timeout" with a different form of the "priority" policy.
| >
| I'm not entirely sure if such modular constructions would be possible. I
| prefer to think of "timeout policy" and "prio policy" as a special cases
| of "timeout+prio policy" with respectively DCCP_SCM_PRIORITY and
| DCCP_SCM_TIMEOUT not supplied (and thus set to their default values: 0 and
| INFINITY).
| --
It depends on the way one looks at it. Your view is top-down, mine is
bottom-up, both can work. Agree with the parameters.
The University of Aberdeen is a charity registered in Scotland, No SC013683.
^ permalink raw reply [flat|nested] 47+ messages in thread
end of thread, other threads:[~2008-05-02 20:56 UTC | newest]
Thread overview: 47+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-11 10:24 [PATCH 0/5] [DCCP]: Queuing policies Tomasz Grobelny
2008-04-14 6:50 ` Gerrit Renker
2008-04-14 7:39 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Gerrit Renker
2008-04-14 23:45 ` Tomasz Grobelny
2008-04-15 15:14 ` Gerrit Renker
2008-04-15 15:21 ` Gerrit Renker
2008-04-15 18:01 ` Tomasz Grobelny
2008-04-16 6:20 ` Gerrit Renker
2008-04-16 8:36 ` [DCCP] [RFC] [Patchv3 " Gerrit Renker
2008-04-17 20:03 ` [DCCP] [RFC] [Patchv2 " Tomasz Grobelny
2008-04-18 10:13 ` Gerrit Renker
2008-04-19 20:42 ` Tomasz Grobelny
2008-04-20 16:57 ` Arnaldo Carvalho de Melo
2008-04-20 20:12 ` Tomasz Grobelny
2008-04-21 11:45 ` Patrick McHardy
2008-04-21 13:12 ` Arnaldo Carvalho de Melo
2008-04-21 16:17 ` Tomasz Grobelny
2008-04-22 4:56 ` Patrick McHardy
2008-04-22 20:45 ` Tomasz Grobelny
2008-04-22 22:06 ` David Miller
2008-04-23 0:03 ` Patrick McHardy
2008-04-22 17:41 ` Gerrit Renker
2008-04-22 22:42 ` David Miller
2008-04-25 19:33 ` Tomasz Grobelny
2008-04-25 20:40 ` Arnaldo Carvalho de Melo
2008-04-25 20:58 ` David Miller
2008-04-28 7:21 ` Gerrit Renker
2008-04-28 7:39 ` David Miller
2008-04-22 17:30 ` Gerrit Renker
2008-04-22 20:30 ` Tomasz Grobelny
2008-04-28 13:10 ` Gerrit Renker
2008-04-28 15:19 ` [DCCP] [RFC] [Patchv3 " Gerrit Renker
2008-04-28 20:12 ` Tomasz Grobelny
2008-04-28 21:03 ` [DCCP] [RFC] [Patchv2 " Tomasz Grobelny
2008-04-30 7:53 ` Gerrit Renker
2008-05-02 20:39 ` Tomasz Grobelny
2008-05-02 20:56 ` Gerrit Renker
[not found] ` <20080424220704.0483DBC12@poczta.oswiecenia.net>
2008-04-24 22:16 ` [PATCH 1/1] [DCCP][QPOLICY]: External interface changes Tomasz Grobelny
2008-04-28 15:08 ` Gerrit Renker
2008-04-28 21:29 ` Tomasz Grobelny
2008-04-15 19:38 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Tomasz Grobelny
2008-04-15 20:04 ` Arnaldo Carvalho de Melo
2008-04-17 20:20 ` Tomasz Grobelny
2008-04-15 20:14 ` inconsistent lock state with kernel 2.6.24.4 Bernard Pidoux
2008-04-16 7:43 ` [DCCP] [RFC] [Patchv2 1/1]: Queuing policies -- reworked version of Tomasz's patch set Gerrit Renker
2008-04-17 18:03 ` Tomasz Grobelny
2008-04-17 18:29 ` Gerrit Renker
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).