* [PATCH v4] sctp: be more restrictive in transport selection on bundled sacks
From: Neil Horman @ 2012-06-29 19:24 UTC (permalink / raw)
To: netdev; +Cc: Neil Horman, Vlad Yaseivch, David S. Miller, linux-sctp
In-Reply-To: <1340742704-2192-1-git-send-email-nhorman@tuxdriver.com>
It was noticed recently that when we send data on a transport, its possible that
we might bundle a sack that arrived on a different transport. While this isn't
a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
2960:
An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
etc.) to the same destination transport address from which it
received the DATA or control chunk to which it is replying. This
rule should also be followed if the endpoint is bundling DATA chunks
together with the reply chunk.
This patch seeks to correct that. It restricts the bundling of sack operations
to only those transports which have moved the ctsn of the association forward
since the last sack. By doing this we guarantee that we only bundle outbound
saks on a transport that has received a chunk since the last sack. This brings
us into stricter compliance with the RFC.
Vlad had initially suggested that we strictly allow only sack bundling on the
transport that last moved the ctsn forward. While this makes sense, I was
concerned that doing so prevented us from bundling in the case where we had
received chunks that moved the ctsn on multiple transports. In those cases, the
RFC allows us to select any of the transports having received chunks to bundle
the sack on. so I've modified the approach to allow for that, by adding a state
variable to each transport that tracks weather it has moved the ctsn since the
last sack. This I think keeps our behavior (and performance), close enough to
our current profile that I think we can do this without a sysctl knob to
enable/disable it.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yaseivch <vyasevich@gmail.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Reported-by: Michele Baldessari <michele@redhat.com>
Reported-by: sorin serban <sserban@redhat.com>
---
Change Notes:
V2)
* Removed unused variable as per Dave M. Request
* Delayed rwnd adjustment until we are sure we will sack (Vlad Y.)
V3)
* Switched test to use pkt->transport rather than chunk->transport
* Modified detection of sacka-able transport. Instead of just setting
and clearning a flag, we now mark each transport and association with
a sack generation tag. We increment the associations generation on
every sack, and assign that generation tag to every transport that
updates the ctsn. This prevents us from having to iterate over a for
loop on every sack, which is much more scalable.
V4)
* Fixed up wrapping comment and logic
---
include/net/sctp/structs.h | 4 ++++
include/net/sctp/tsnmap.h | 3 ++-
net/sctp/associola.c | 1 +
net/sctp/output.c | 9 +++++++--
net/sctp/sm_make_chunk.c | 18 ++++++++++++++++++
net/sctp/sm_sideeffect.c | 2 +-
net/sctp/transport.c | 2 ++
net/sctp/tsnmap.c | 6 +++++-
net/sctp/ulpevent.c | 3 ++-
net/sctp/ulpqueue.c | 2 +-
10 files changed, 43 insertions(+), 7 deletions(-)
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index e4652fe..fecdf31 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -912,6 +912,9 @@ struct sctp_transport {
/* Is this structure kfree()able? */
malloced:1;
+ /* Has this transport moved the ctsn since we last sacked */
+ __u32 sack_generation;
+
struct flowi fl;
/* This is the peer's IP address and port. */
@@ -1584,6 +1587,7 @@ struct sctp_association {
*/
__u8 sack_needed; /* Do we need to sack the peer? */
__u32 sack_cnt;
+ __u32 sack_generation;
/* These are capabilities which our peer advertised. */
__u8 ecn_capable:1, /* Can peer do ECN? */
diff --git a/include/net/sctp/tsnmap.h b/include/net/sctp/tsnmap.h
index e7728bc..2c5d2b4 100644
--- a/include/net/sctp/tsnmap.h
+++ b/include/net/sctp/tsnmap.h
@@ -117,7 +117,8 @@ void sctp_tsnmap_free(struct sctp_tsnmap *map);
int sctp_tsnmap_check(const struct sctp_tsnmap *, __u32 tsn);
/* Mark this TSN as seen. */
-int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn);
+int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn,
+ struct sctp_transport *trans);
/* Mark this TSN and all lower as seen. */
void sctp_tsnmap_skip(struct sctp_tsnmap *map, __u32 tsn);
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 5bc9ab1..b16517e 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -271,6 +271,7 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
*/
asoc->peer.sack_needed = 1;
asoc->peer.sack_cnt = 0;
+ asoc->peer.sack_generation = 1;
/* Assume that the peer will tell us if he recognizes ASCONF
* as part of INIT exchange.
diff --git a/net/sctp/output.c b/net/sctp/output.c
index f1b7d4b..0de6cd5 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -240,14 +240,19 @@ static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
*/
if (sctp_chunk_is_data(chunk) && !pkt->has_sack &&
!pkt->has_cookie_echo) {
- struct sctp_association *asoc;
struct timer_list *timer;
- asoc = pkt->transport->asoc;
+ struct sctp_association *asoc = pkt->transport->asoc;
+
timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
/* If the SACK timer is running, we have a pending SACK */
if (timer_pending(timer)) {
struct sctp_chunk *sack;
+
+ if (pkt->transport->sack_generation !=
+ pkt->transport->asoc->peer.sack_generation)
+ return retval;
+
asoc->a_rwnd = asoc->rwnd;
sack = sctp_make_sack(asoc);
if (sack) {
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index a85eeeb..ae587d4 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -736,6 +736,7 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
__u16 num_gabs, num_dup_tsns;
struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map;
struct sctp_gap_ack_block gabs[SCTP_MAX_GABS];
+ struct sctp_transport *trans;
memset(gabs, 0, sizeof(gabs));
ctsn = sctp_tsnmap_get_ctsn(map);
@@ -805,6 +806,23 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
sctp_tsnmap_get_dups(map));
+ /*
+ * Once we have a sack generated:
+ * 1) Check to see what our sack generation is, if its UINT_MAX, reset
+ * the transports to 1, and wrap the association generation back to
+ * zero
+ * 2) Increment out sack_generation
+ * The idea is that zero is never used as a valid generation for the
+ * association so no transport will match after a wrap event like this,
+ * Until the next sack
+ */
+ if (asoc->peer.sack_generation == UINT_MAX) {
+ list_for_each_entry(trans, &asoc->peer.transport_addr_list,
+ transports)
+ trans->sack_generation = 0;
+ ((struct sctp_association *)asoc)->peer.sack_generation = 0;
+ }
+ ((struct sctp_association *)asoc)->peer.sack_generation++;
nodata:
return retval;
}
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index c96d1a8..8716da1 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1268,7 +1268,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
case SCTP_CMD_REPORT_TSN:
/* Record the arrival of a TSN. */
error = sctp_tsnmap_mark(&asoc->peer.tsn_map,
- cmd->obj.u32);
+ cmd->obj.u32, NULL);
break;
case SCTP_CMD_REPORT_FWDTSN:
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index b026ba0..1dcceb6 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -68,6 +68,8 @@ static struct sctp_transport *sctp_transport_init(struct sctp_transport *peer,
peer->af_specific = sctp_get_af_specific(addr->sa.sa_family);
memset(&peer->saddr, 0, sizeof(union sctp_addr));
+ peer->sack_generation = 0;
+
/* From 6.3.1 RTO Calculation:
*
* C1) Until an RTT measurement has been made for a packet sent to the
diff --git a/net/sctp/tsnmap.c b/net/sctp/tsnmap.c
index f1e40ceb..b5fb7c4 100644
--- a/net/sctp/tsnmap.c
+++ b/net/sctp/tsnmap.c
@@ -114,7 +114,8 @@ int sctp_tsnmap_check(const struct sctp_tsnmap *map, __u32 tsn)
/* Mark this TSN as seen. */
-int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
+int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn,
+ struct sctp_transport *trans)
{
u16 gap;
@@ -133,6 +134,9 @@ int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
*/
map->max_tsn_seen++;
map->cumulative_tsn_ack_point++;
+ if (trans)
+ trans->sack_generation =
+ trans->asoc->peer.sack_generation;
map->base_tsn++;
} else {
/* Either we already have a gap, or about to record a gap, so
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8a84017..33d8947 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -715,7 +715,8 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct sctp_association *asoc,
* can mark it as received so the tsn_map is updated correctly.
*/
if (sctp_tsnmap_mark(&asoc->peer.tsn_map,
- ntohl(chunk->subh.data_hdr->tsn)))
+ ntohl(chunk->subh.data_hdr->tsn),
+ chunk->transport))
goto fail_mark;
/* First calculate the padding, so we don't inadvertently
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index f2d1de7..f5a6a4f 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -1051,7 +1051,7 @@ void sctp_ulpq_renege(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk,
if (chunk && (freed >= needed)) {
__u32 tsn;
tsn = ntohl(chunk->subh.data_hdr->tsn);
- sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn);
+ sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn, chunk->transport);
sctp_ulpq_tail_data(ulpq, chunk, gfp);
sctp_ulpq_partial_delivery(ulpq, chunk, gfp);
--
1.7.7.6
^ permalink raw reply related
* [PATCH net-next] cnic: Fix mmap regression.
From: Michael Chan @ 2012-06-29 19:32 UTC (permalink / raw)
To: davem; +Cc: netdev
commit 1f85d58cdf15354a7120fc9ccc9bb9c45b53af88
cnic: Remove uio mem[0].
introduced a regression as older versions of userspace app still rely
on this mmap. Restore the mmap functionality and get the base address
from pci_resource_start() as the nedev->base_addr has been deprecated for
PCI devices.
Update version to 2.5.12.
Signed-off-by: Michael Chan <mchan@broadocm.com>
---
drivers/net/ethernet/broadcom/cnic.c | 8 +++++++-
drivers/net/ethernet/broadcom/cnic_if.h | 4 ++--
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index f897306..22ad7b6 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -1063,9 +1063,13 @@ static int cnic_init_uio(struct cnic_dev *dev)
uinfo = &udev->cnic_uinfo;
- uinfo->mem[0].memtype = UIO_MEM_NONE;
+ uinfo->mem[0].addr = pci_resource_start(dev->pcidev, 0);
+ uinfo->mem[0].internal_addr = dev->regview;
+ uinfo->mem[0].memtype = UIO_MEM_PHYS;
if (test_bit(CNIC_F_BNX2_CLASS, &dev->flags)) {
+ uinfo->mem[0].size = MB_GET_CID_ADDR(TX_TSS_CID +
+ TX_MAX_TSS_RINGS + 1);
uinfo->mem[1].addr = (unsigned long) cp->status_blk.gen &
PAGE_MASK;
if (cp->ethdev->drv_state & CNIC_DRV_STATE_USING_MSIX)
@@ -1075,6 +1079,8 @@ static int cnic_init_uio(struct cnic_dev *dev)
uinfo->name = "bnx2_cnic";
} else if (test_bit(CNIC_F_BNX2X_CLASS, &dev->flags)) {
+ uinfo->mem[0].size = pci_resource_len(dev->pcidev, 0);
+
uinfo->mem[1].addr = (unsigned long) cp->bnx2x_def_status_blk &
PAGE_MASK;
uinfo->mem[1].size = sizeof(*cp->bnx2x_def_status_blk);
diff --git a/drivers/net/ethernet/broadcom/cnic_if.h b/drivers/net/ethernet/broadcom/cnic_if.h
index 54f68f0..5cb8888 100644
--- a/drivers/net/ethernet/broadcom/cnic_if.h
+++ b/drivers/net/ethernet/broadcom/cnic_if.h
@@ -14,8 +14,8 @@
#include "bnx2x/bnx2x_mfw_req.h"
-#define CNIC_MODULE_VERSION "2.5.11"
-#define CNIC_MODULE_RELDATE "June 27, 2012"
+#define CNIC_MODULE_VERSION "2.5.12"
+#define CNIC_MODULE_RELDATE "June 29, 2012"
#define CNIC_ULP_RDMA 0
#define CNIC_ULP_ISCSI 1
--
1.7.1
^ permalink raw reply related
* Re: [PATCH v3] sctp: be more restrictive in transport selection on bundled sacks
From: Neil Horman @ 2012-06-29 19:21 UTC (permalink / raw)
To: Vlad Yasevich; +Cc: netdev, David S. Miller
In-Reply-To: <4FEDFEC0.2060405@gmail.com>
On Fri, Jun 29, 2012 at 03:15:12PM -0400, Vlad Yasevich wrote:
> On 06/29/2012 02:43 PM, Neil Horman wrote:
> >On Fri, Jun 29, 2012 at 02:29:52PM -0400, Vlad Yasevich wrote:
> >>On 06/29/2012 12:34 PM, Neil Horman wrote:
> >>>It was noticed recently that when we send data on a transport, its possible that
> >>>we might bundle a sack that arrived on a different transport. While this isn't
> >>>a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
> >>>2960:
> >>>
> >>> An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
> >>> etc.) to the same destination transport address from which it
> >>> received the DATA or control chunk to which it is replying. This
> >>> rule should also be followed if the endpoint is bundling DATA chunks
> >>> together with the reply chunk.
> >>>
> >>>This patch seeks to correct that. It restricts the bundling of sack operations
> >>>to only those transports which have moved the ctsn of the association forward
> >>>since the last sack. By doing this we guarantee that we only bundle outbound
> >>>saks on a transport that has received a chunk since the last sack. This brings
> >>>us into stricter compliance with the RFC.
> >>>
> >>>Vlad had initially suggested that we strictly allow only sack bundling on the
> >>>transport that last moved the ctsn forward. While this makes sense, I was
> >>>concerned that doing so prevented us from bundling in the case where we had
> >>>received chunks that moved the ctsn on multiple transports. In those cases, the
> >>>RFC allows us to select any of the transports having received chunks to bundle
> >>>the sack on. so I've modified the approach to allow for that, by adding a state
> >>>variable to each transport that tracks weather it has moved the ctsn since the
> >>>last sack. This I think keeps our behavior (and performance), close enough to
> >>>our current profile that I think we can do this without a sysctl knob to
> >>>enable/disable it.
> >>>
> >>>Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
> >>>CC: Vlad Yaseivch<vyasevich@gmail.com>
> >>>CC: David S. Miller<davem@davemloft.net>
> >>>Reported-by: Michele Baldessari<michele@redhat.com>
> >>>Reported-by: sorin serban<sserban@redhat.com>
> >>>
> >>>---
> >>>Change Notes:
> >>>V2)
> >>> * Removed unused variable as per Dave M. Request
> >>> * Delayed rwnd adjustment until we are sure we will sack (Vlad Y.)
> >>>V3)
> >>> * Switched test to use pkt->transport rather than chunk->transport
> >>> * Modified detection of sacka-able transport. Instead of just setting
> >>> and clearning a flag, we now mark each transport and association with
> >>> a sack generation tag. We increment the associations generation on
> >>> every sack, and assign that generation tag to every transport that
> >>> updates the ctsn. This prevents us from having to iterate over a for
> >>> loop on every sack, which is much more scalable.
> >>>---
> >>> include/net/sctp/structs.h | 4 ++++
> >>> include/net/sctp/tsnmap.h | 3 ++-
> >>> net/sctp/associola.c | 1 +
> >>> net/sctp/output.c | 9 +++++++--
> >>> net/sctp/sm_make_chunk.c | 10 ++++++++++
> >>> net/sctp/sm_sideeffect.c | 2 +-
> >>> net/sctp/transport.c | 2 ++
> >>> net/sctp/tsnmap.c | 6 +++++-
> >>> net/sctp/ulpevent.c | 3 ++-
> >>> net/sctp/ulpqueue.c | 2 +-
> >>> 10 files changed, 35 insertions(+), 7 deletions(-)
> >>>
> >>>diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> >>>index e4652fe..fecdf31 100644
> >>>--- a/include/net/sctp/structs.h
> >>>+++ b/include/net/sctp/structs.h
> >>>@@ -912,6 +912,9 @@ struct sctp_transport {
> >>> /* Is this structure kfree()able? */
> >>> malloced:1;
> >>>
> >>>+ /* Has this transport moved the ctsn since we last sacked */
> >>>+ __u32 sack_generation;
> >>>+
> >>> struct flowi fl;
> >>>
> >>> /* This is the peer's IP address and port. */
> >>>@@ -1584,6 +1587,7 @@ struct sctp_association {
> >>> */
> >>> __u8 sack_needed; /* Do we need to sack the peer? */
> >>> __u32 sack_cnt;
> >>>+ __u32 sack_generation;
> >>>
> >>> /* These are capabilities which our peer advertised. */
> >>> __u8 ecn_capable:1, /* Can peer do ECN? */
> >>>diff --git a/include/net/sctp/tsnmap.h b/include/net/sctp/tsnmap.h
> >>>index e7728bc..2c5d2b4 100644
> >>>--- a/include/net/sctp/tsnmap.h
> >>>+++ b/include/net/sctp/tsnmap.h
> >>>@@ -117,7 +117,8 @@ void sctp_tsnmap_free(struct sctp_tsnmap *map);
> >>> int sctp_tsnmap_check(const struct sctp_tsnmap *, __u32 tsn);
> >>>
> >>> /* Mark this TSN as seen. */
> >>>-int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn);
> >>>+int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn,
> >>>+ struct sctp_transport *trans);
> >>>
> >>> /* Mark this TSN and all lower as seen. */
> >>> void sctp_tsnmap_skip(struct sctp_tsnmap *map, __u32 tsn);
> >>>diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> >>>index 5bc9ab1..6c66adb 100644
> >>>--- a/net/sctp/associola.c
> >>>+++ b/net/sctp/associola.c
> >>>@@ -271,6 +271,7 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
> >>> */
> >>> asoc->peer.sack_needed = 1;
> >>> asoc->peer.sack_cnt = 0;
> >>>+ asoc->peer.sack_generation=0;
> >>>
> >>> /* Assume that the peer will tell us if he recognizes ASCONF
> >>> * as part of INIT exchange.
> >>>diff --git a/net/sctp/output.c b/net/sctp/output.c
> >>>index f1b7d4b..0de6cd5 100644
> >>>--- a/net/sctp/output.c
> >>>+++ b/net/sctp/output.c
> >>>@@ -240,14 +240,19 @@ static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
> >>> */
> >>> if (sctp_chunk_is_data(chunk)&& !pkt->has_sack&&
> >>> !pkt->has_cookie_echo) {
> >>>- struct sctp_association *asoc;
> >>> struct timer_list *timer;
> >>>- asoc = pkt->transport->asoc;
> >>>+ struct sctp_association *asoc = pkt->transport->asoc;
> >>>+
> >>> timer =&asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
> >>>
> >>> /* If the SACK timer is running, we have a pending SACK */
> >>> if (timer_pending(timer)) {
> >>> struct sctp_chunk *sack;
> >>>+
> >>>+ if (pkt->transport->sack_generation !=
> >>>+ pkt->transport->asoc->peer.sack_generation)
> >>>+ return retval;
> >>>+
> >>> asoc->a_rwnd = asoc->rwnd;
> >>> sack = sctp_make_sack(asoc);
> >>> if (sack) {
> >>>diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> >>>index a85eeeb..ffa2a8e 100644
> >>>--- a/net/sctp/sm_make_chunk.c
> >>>+++ b/net/sctp/sm_make_chunk.c
> >>>@@ -736,6 +736,7 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
> >>> __u16 num_gabs, num_dup_tsns;
> >>> struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map;
> >>> struct sctp_gap_ack_block gabs[SCTP_MAX_GABS];
> >>>+ struct sctp_transport *trans;
> >>>
> >>> memset(gabs, 0, sizeof(gabs));
> >>> ctsn = sctp_tsnmap_get_ctsn(map);
> >>>@@ -805,6 +806,15 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
> >>> sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
> >>> sctp_tsnmap_get_dups(map));
> >>>
> >>>+ /*
> >>>+ * Once we have a sack generated, clear the moved_tsn information
> >>>+ * from all the transports
> >>>+ */
> >>>+ if (!asoc->peer.sack_generation)
> >>>+ list_for_each_entry(trans,&asoc->peer.transport_addr_list,
> >>>+ transports)
> >>>+ trans->sack_generation = UINT_MAX;
> >>>+ ((struct sctp_association *)asoc)->peer.sack_generation++;
> >>
> >>Two points here:
> >>1) The commend no longer matches the code
> >Crud, missed that, I'll fix it.
> >
> >>2) Why special case the peer.sack_generations == 0 and set the
> >>transport to UNIT_MAX?
> >>
> >To avoid wrapping problems leading to erroneous bundling errors. Consider a
> >long lived connection with two trasports (A and B).
> >
> >If all traffic is sent on A for a long time (generating UINT_MAX sacks), and the
> >peer chooses that moment to send data on transport B, its possible that we will
> >bundle a sack with that data chunk erroneously, because the associations
> >sack_generation has wrapped, and now matches with the transports, even though we
> >never received data on transport B. The special casing ensures that we never
> >hit that problem.
> >
>
> But you just move this condition to the UINT_MAX value instead. If
> we use the alternate transport at the time that sack_generation ==
> UINT_MAX, we may pick the wrong transport.
>
Yes, i noticed that as I was fixing the comment, thank you!
> You may want to consider value 0 reserved as UNUSED and make
> peer.sack_generation start at 1 and wrap to 1.
>
Thats exactly what I just did :)
Neil
> -vlad
>
^ permalink raw reply
* Re: [PATCH v3] sctp: be more restrictive in transport selection on bundled sacks
From: Vlad Yasevich @ 2012-06-29 19:15 UTC (permalink / raw)
To: Neil Horman; +Cc: netdev, David S. Miller
In-Reply-To: <20120629184310.GA24604@hmsreliant.think-freely.org>
On 06/29/2012 02:43 PM, Neil Horman wrote:
> On Fri, Jun 29, 2012 at 02:29:52PM -0400, Vlad Yasevich wrote:
>> On 06/29/2012 12:34 PM, Neil Horman wrote:
>>> It was noticed recently that when we send data on a transport, its possible that
>>> we might bundle a sack that arrived on a different transport. While this isn't
>>> a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
>>> 2960:
>>>
>>> An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
>>> etc.) to the same destination transport address from which it
>>> received the DATA or control chunk to which it is replying. This
>>> rule should also be followed if the endpoint is bundling DATA chunks
>>> together with the reply chunk.
>>>
>>> This patch seeks to correct that. It restricts the bundling of sack operations
>>> to only those transports which have moved the ctsn of the association forward
>>> since the last sack. By doing this we guarantee that we only bundle outbound
>>> saks on a transport that has received a chunk since the last sack. This brings
>>> us into stricter compliance with the RFC.
>>>
>>> Vlad had initially suggested that we strictly allow only sack bundling on the
>>> transport that last moved the ctsn forward. While this makes sense, I was
>>> concerned that doing so prevented us from bundling in the case where we had
>>> received chunks that moved the ctsn on multiple transports. In those cases, the
>>> RFC allows us to select any of the transports having received chunks to bundle
>>> the sack on. so I've modified the approach to allow for that, by adding a state
>>> variable to each transport that tracks weather it has moved the ctsn since the
>>> last sack. This I think keeps our behavior (and performance), close enough to
>>> our current profile that I think we can do this without a sysctl knob to
>>> enable/disable it.
>>>
>>> Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
>>> CC: Vlad Yaseivch<vyasevich@gmail.com>
>>> CC: David S. Miller<davem@davemloft.net>
>>> Reported-by: Michele Baldessari<michele@redhat.com>
>>> Reported-by: sorin serban<sserban@redhat.com>
>>>
>>> ---
>>> Change Notes:
>>> V2)
>>> * Removed unused variable as per Dave M. Request
>>> * Delayed rwnd adjustment until we are sure we will sack (Vlad Y.)
>>> V3)
>>> * Switched test to use pkt->transport rather than chunk->transport
>>> * Modified detection of sacka-able transport. Instead of just setting
>>> and clearning a flag, we now mark each transport and association with
>>> a sack generation tag. We increment the associations generation on
>>> every sack, and assign that generation tag to every transport that
>>> updates the ctsn. This prevents us from having to iterate over a for
>>> loop on every sack, which is much more scalable.
>>> ---
>>> include/net/sctp/structs.h | 4 ++++
>>> include/net/sctp/tsnmap.h | 3 ++-
>>> net/sctp/associola.c | 1 +
>>> net/sctp/output.c | 9 +++++++--
>>> net/sctp/sm_make_chunk.c | 10 ++++++++++
>>> net/sctp/sm_sideeffect.c | 2 +-
>>> net/sctp/transport.c | 2 ++
>>> net/sctp/tsnmap.c | 6 +++++-
>>> net/sctp/ulpevent.c | 3 ++-
>>> net/sctp/ulpqueue.c | 2 +-
>>> 10 files changed, 35 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
>>> index e4652fe..fecdf31 100644
>>> --- a/include/net/sctp/structs.h
>>> +++ b/include/net/sctp/structs.h
>>> @@ -912,6 +912,9 @@ struct sctp_transport {
>>> /* Is this structure kfree()able? */
>>> malloced:1;
>>>
>>> + /* Has this transport moved the ctsn since we last sacked */
>>> + __u32 sack_generation;
>>> +
>>> struct flowi fl;
>>>
>>> /* This is the peer's IP address and port. */
>>> @@ -1584,6 +1587,7 @@ struct sctp_association {
>>> */
>>> __u8 sack_needed; /* Do we need to sack the peer? */
>>> __u32 sack_cnt;
>>> + __u32 sack_generation;
>>>
>>> /* These are capabilities which our peer advertised. */
>>> __u8 ecn_capable:1, /* Can peer do ECN? */
>>> diff --git a/include/net/sctp/tsnmap.h b/include/net/sctp/tsnmap.h
>>> index e7728bc..2c5d2b4 100644
>>> --- a/include/net/sctp/tsnmap.h
>>> +++ b/include/net/sctp/tsnmap.h
>>> @@ -117,7 +117,8 @@ void sctp_tsnmap_free(struct sctp_tsnmap *map);
>>> int sctp_tsnmap_check(const struct sctp_tsnmap *, __u32 tsn);
>>>
>>> /* Mark this TSN as seen. */
>>> -int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn);
>>> +int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn,
>>> + struct sctp_transport *trans);
>>>
>>> /* Mark this TSN and all lower as seen. */
>>> void sctp_tsnmap_skip(struct sctp_tsnmap *map, __u32 tsn);
>>> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>>> index 5bc9ab1..6c66adb 100644
>>> --- a/net/sctp/associola.c
>>> +++ b/net/sctp/associola.c
>>> @@ -271,6 +271,7 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
>>> */
>>> asoc->peer.sack_needed = 1;
>>> asoc->peer.sack_cnt = 0;
>>> + asoc->peer.sack_generation=0;
>>>
>>> /* Assume that the peer will tell us if he recognizes ASCONF
>>> * as part of INIT exchange.
>>> diff --git a/net/sctp/output.c b/net/sctp/output.c
>>> index f1b7d4b..0de6cd5 100644
>>> --- a/net/sctp/output.c
>>> +++ b/net/sctp/output.c
>>> @@ -240,14 +240,19 @@ static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
>>> */
>>> if (sctp_chunk_is_data(chunk)&& !pkt->has_sack&&
>>> !pkt->has_cookie_echo) {
>>> - struct sctp_association *asoc;
>>> struct timer_list *timer;
>>> - asoc = pkt->transport->asoc;
>>> + struct sctp_association *asoc = pkt->transport->asoc;
>>> +
>>> timer =&asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
>>>
>>> /* If the SACK timer is running, we have a pending SACK */
>>> if (timer_pending(timer)) {
>>> struct sctp_chunk *sack;
>>> +
>>> + if (pkt->transport->sack_generation !=
>>> + pkt->transport->asoc->peer.sack_generation)
>>> + return retval;
>>> +
>>> asoc->a_rwnd = asoc->rwnd;
>>> sack = sctp_make_sack(asoc);
>>> if (sack) {
>>> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
>>> index a85eeeb..ffa2a8e 100644
>>> --- a/net/sctp/sm_make_chunk.c
>>> +++ b/net/sctp/sm_make_chunk.c
>>> @@ -736,6 +736,7 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
>>> __u16 num_gabs, num_dup_tsns;
>>> struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map;
>>> struct sctp_gap_ack_block gabs[SCTP_MAX_GABS];
>>> + struct sctp_transport *trans;
>>>
>>> memset(gabs, 0, sizeof(gabs));
>>> ctsn = sctp_tsnmap_get_ctsn(map);
>>> @@ -805,6 +806,15 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
>>> sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
>>> sctp_tsnmap_get_dups(map));
>>>
>>> + /*
>>> + * Once we have a sack generated, clear the moved_tsn information
>>> + * from all the transports
>>> + */
>>> + if (!asoc->peer.sack_generation)
>>> + list_for_each_entry(trans,&asoc->peer.transport_addr_list,
>>> + transports)
>>> + trans->sack_generation = UINT_MAX;
>>> + ((struct sctp_association *)asoc)->peer.sack_generation++;
>>
>> Two points here:
>> 1) The commend no longer matches the code
> Crud, missed that, I'll fix it.
>
>> 2) Why special case the peer.sack_generations == 0 and set the
>> transport to UNIT_MAX?
>>
> To avoid wrapping problems leading to erroneous bundling errors. Consider a
> long lived connection with two trasports (A and B).
>
> If all traffic is sent on A for a long time (generating UINT_MAX sacks), and the
> peer chooses that moment to send data on transport B, its possible that we will
> bundle a sack with that data chunk erroneously, because the associations
> sack_generation has wrapped, and now matches with the transports, even though we
> never received data on transport B. The special casing ensures that we never
> hit that problem.
>
But you just move this condition to the UINT_MAX value instead. If we
use the alternate transport at the time that sack_generation ==
UINT_MAX, we may pick the wrong transport.
You may want to consider value 0 reserved as UNUSED and make
peer.sack_generation start at 1 and wrap to 1.
-vlad
^ permalink raw reply
* Re: AF_BUS socket address family
From: Casey Schaufler @ 2012-06-29 18:45 UTC (permalink / raw)
To: Vincent Sanders; +Cc: netdev, linux-kernel, David S. Miller
In-Reply-To: <1340988354-26981-1-git-send-email-vincent.sanders@collabora.co.uk>
On 6/29/2012 9:45 AM, Vincent Sanders wrote:
> This series adds the bus address family (AF_BUS) it is against
> net-next as of yesterday.
>
> AF_BUS is a message oriented inter process communication system.
>
> The principle features are:
>
> - Reliable datagram based communication (all sockets are of type
> SOCK_SEQPACKET)
>
> - Multicast message delivery (one to many, unicast as a subset)
>
> - Strict ordering (messages are delivered to every client in the same order)
>
> - Ability to pass file descriptors
>
> - Ability to pass credentials
>
> The basic concept is to provide a virtual bus on which multiple
> processes can communicate and policy is imposed by a "bus master".
>
> Introduction
> ------------
>
> AF_BUS is based upon AF_UNIX but extended for multicast operation and
> removes stream operation, responding to extensive feedback on previous
> approaches we have made the implementation as isolated as
> possible. There are opportunities in the future to integrate the
> socket garbage collector with that of the unix socket implementation.
>
> The impetus for creating this IPC mechanism is to replace the
> underlying transport for D-Bus. The D-Bus system currently emulates this
> IPC mechanism using AF_UNIX sockets in userspace and has numerous
> undesirable behaviours. D-Bus is now widely deployed in many areas and
> has become a de-facto IPC standard. Using this IPC mechanism as a
> transport gives a significant (100% or more) improvement to throughput
> with comparable improvement to latency.
>
> This work was undertaken by Collabora for the GENIVI Alliance and we
> are committed to responding to feedback promptly and intend to continue
> to support this feature into the future.
>
> Operation
> ---------
>
> A bus is created by processes connecting on an AF_BUS socket. The
> "bus master" binds itself instead of connecting to the NULL address.
>
> The socket address is made up of a path component and a numeric
> component. The path component is either a pathname or an abstract
> socket similar to a unix socket. The numeric component is used to
> uniquely identify each connection to the bus. Thus the path identifies
> a specific bus and the numeric component the attachment to that bus.
>
> The numeric component of the address is divided into two fixed parts a
> prefix to identify multicast groups and a suffix which identifies the
> attachment. The kernel allocates a single address in prefix 0 to each
> socket upon connection.
>
> Connections are initially limited to communicating with address the
> bus master (address 0) . The bus master is responsible for making all
> policy decisions around manipulating other attachments including
> building multicast groups.
>
> It is expected that connecting clients use protocol specific messages
> to communicate with the bus master to negotiate differing
> configurations although a bus master might implement a fixed
> behaviour.
>
> AF_BUS itself is protocol agnostic and implements the configured
> policy between attachments which allows for a bus master to leave a
> bus and communication between clients to continue.
>
> Some test code has been written [1] which demonstrates the usage of
> AF_BUS.
>
> Use with BUS_PROTO_DBUS
> -----------------------
>
> The initial aim of AF_BUS is to provide a IPC mechanism suitable for
> use to provide the underlying transport for D-Bus.
>
> A socket created using BUS_PROTO_DBUS indicates that the messages
> passed will be in the D-Bus format. The userspace libraries have been
> updated to use this transport with an updated D-Bus daemon [2] as a bus
> master.
Why don't you go whole hog and put all of D-Bus into the kernel?
>
> The D-Bus protocol allows for multicast groups to be filtered depending
> on message contents. These filters are configured by the bus master
> but need to be enforced on message delivery.
>
> We have simply used the standard kernel netfilter mechanism to achieve
> this. This is used to filter delivery to clients that may be part of a
> multicast group where they are not receiving all messages according to
> policy. If a client wishes to further filter its input provision has
> been made to allow them to use BPF.
>
> The kernel based IPC has several benefits for D-Bus over the userspace
> emulation:
>
> - Context switching between userspace processes is reduced.
> - Message data copying is reduced.
> - System call overheads are reduced.
> - The userspace D-Bus daemon was subject to resource starvation,
> client contention and priority inversion.
> - Latency is reduced
> - Throughput is increased.
>
> The tools for testing these assertions are available [3] and
> consistently show a doubling in throughput and better than halving of
> latency.
Please cross-post Patches 04/15 and 05/15 to the linux-security-module list.
Please cross-post Patch 05/15 to the selinux list.
Where is the analogous patch for the Smack LSM?
>
> [1] http://cgit.collabora.com/git/user/javier/check-unix-multicast.git/log/?h=af-bus
> [2] http://cgit.collabora.com/git/user/rodrigo/dbus.git/
>
> [3] git://github.com/kanchev/dbus-ping.git
> https://github.com/kanchev/dbus-ping/blob/master/dbus-genivi-benchmarking.sh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply
* Re: [PATCH v3] sctp: be more restrictive in transport selection on bundled sacks
From: Neil Horman @ 2012-06-29 18:43 UTC (permalink / raw)
To: Vlad Yasevich; +Cc: netdev, David S. Miller
In-Reply-To: <4FEDF420.50507@gmail.com>
On Fri, Jun 29, 2012 at 02:29:52PM -0400, Vlad Yasevich wrote:
> On 06/29/2012 12:34 PM, Neil Horman wrote:
> >It was noticed recently that when we send data on a transport, its possible that
> >we might bundle a sack that arrived on a different transport. While this isn't
> >a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
> >2960:
> >
> > An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
> > etc.) to the same destination transport address from which it
> > received the DATA or control chunk to which it is replying. This
> > rule should also be followed if the endpoint is bundling DATA chunks
> > together with the reply chunk.
> >
> >This patch seeks to correct that. It restricts the bundling of sack operations
> >to only those transports which have moved the ctsn of the association forward
> >since the last sack. By doing this we guarantee that we only bundle outbound
> >saks on a transport that has received a chunk since the last sack. This brings
> >us into stricter compliance with the RFC.
> >
> >Vlad had initially suggested that we strictly allow only sack bundling on the
> >transport that last moved the ctsn forward. While this makes sense, I was
> >concerned that doing so prevented us from bundling in the case where we had
> >received chunks that moved the ctsn on multiple transports. In those cases, the
> >RFC allows us to select any of the transports having received chunks to bundle
> >the sack on. so I've modified the approach to allow for that, by adding a state
> >variable to each transport that tracks weather it has moved the ctsn since the
> >last sack. This I think keeps our behavior (and performance), close enough to
> >our current profile that I think we can do this without a sysctl knob to
> >enable/disable it.
> >
> >Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
> >CC: Vlad Yaseivch<vyasevich@gmail.com>
> >CC: David S. Miller<davem@davemloft.net>
> >Reported-by: Michele Baldessari<michele@redhat.com>
> >Reported-by: sorin serban<sserban@redhat.com>
> >
> >---
> >Change Notes:
> >V2)
> > * Removed unused variable as per Dave M. Request
> > * Delayed rwnd adjustment until we are sure we will sack (Vlad Y.)
> >V3)
> > * Switched test to use pkt->transport rather than chunk->transport
> > * Modified detection of sacka-able transport. Instead of just setting
> > and clearning a flag, we now mark each transport and association with
> > a sack generation tag. We increment the associations generation on
> > every sack, and assign that generation tag to every transport that
> > updates the ctsn. This prevents us from having to iterate over a for
> > loop on every sack, which is much more scalable.
> >---
> > include/net/sctp/structs.h | 4 ++++
> > include/net/sctp/tsnmap.h | 3 ++-
> > net/sctp/associola.c | 1 +
> > net/sctp/output.c | 9 +++++++--
> > net/sctp/sm_make_chunk.c | 10 ++++++++++
> > net/sctp/sm_sideeffect.c | 2 +-
> > net/sctp/transport.c | 2 ++
> > net/sctp/tsnmap.c | 6 +++++-
> > net/sctp/ulpevent.c | 3 ++-
> > net/sctp/ulpqueue.c | 2 +-
> > 10 files changed, 35 insertions(+), 7 deletions(-)
> >
> >diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> >index e4652fe..fecdf31 100644
> >--- a/include/net/sctp/structs.h
> >+++ b/include/net/sctp/structs.h
> >@@ -912,6 +912,9 @@ struct sctp_transport {
> > /* Is this structure kfree()able? */
> > malloced:1;
> >
> >+ /* Has this transport moved the ctsn since we last sacked */
> >+ __u32 sack_generation;
> >+
> > struct flowi fl;
> >
> > /* This is the peer's IP address and port. */
> >@@ -1584,6 +1587,7 @@ struct sctp_association {
> > */
> > __u8 sack_needed; /* Do we need to sack the peer? */
> > __u32 sack_cnt;
> >+ __u32 sack_generation;
> >
> > /* These are capabilities which our peer advertised. */
> > __u8 ecn_capable:1, /* Can peer do ECN? */
> >diff --git a/include/net/sctp/tsnmap.h b/include/net/sctp/tsnmap.h
> >index e7728bc..2c5d2b4 100644
> >--- a/include/net/sctp/tsnmap.h
> >+++ b/include/net/sctp/tsnmap.h
> >@@ -117,7 +117,8 @@ void sctp_tsnmap_free(struct sctp_tsnmap *map);
> > int sctp_tsnmap_check(const struct sctp_tsnmap *, __u32 tsn);
> >
> > /* Mark this TSN as seen. */
> >-int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn);
> >+int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn,
> >+ struct sctp_transport *trans);
> >
> > /* Mark this TSN and all lower as seen. */
> > void sctp_tsnmap_skip(struct sctp_tsnmap *map, __u32 tsn);
> >diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> >index 5bc9ab1..6c66adb 100644
> >--- a/net/sctp/associola.c
> >+++ b/net/sctp/associola.c
> >@@ -271,6 +271,7 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
> > */
> > asoc->peer.sack_needed = 1;
> > asoc->peer.sack_cnt = 0;
> >+ asoc->peer.sack_generation=0;
> >
> > /* Assume that the peer will tell us if he recognizes ASCONF
> > * as part of INIT exchange.
> >diff --git a/net/sctp/output.c b/net/sctp/output.c
> >index f1b7d4b..0de6cd5 100644
> >--- a/net/sctp/output.c
> >+++ b/net/sctp/output.c
> >@@ -240,14 +240,19 @@ static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
> > */
> > if (sctp_chunk_is_data(chunk)&& !pkt->has_sack&&
> > !pkt->has_cookie_echo) {
> >- struct sctp_association *asoc;
> > struct timer_list *timer;
> >- asoc = pkt->transport->asoc;
> >+ struct sctp_association *asoc = pkt->transport->asoc;
> >+
> > timer =&asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
> >
> > /* If the SACK timer is running, we have a pending SACK */
> > if (timer_pending(timer)) {
> > struct sctp_chunk *sack;
> >+
> >+ if (pkt->transport->sack_generation !=
> >+ pkt->transport->asoc->peer.sack_generation)
> >+ return retval;
> >+
> > asoc->a_rwnd = asoc->rwnd;
> > sack = sctp_make_sack(asoc);
> > if (sack) {
> >diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> >index a85eeeb..ffa2a8e 100644
> >--- a/net/sctp/sm_make_chunk.c
> >+++ b/net/sctp/sm_make_chunk.c
> >@@ -736,6 +736,7 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
> > __u16 num_gabs, num_dup_tsns;
> > struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map;
> > struct sctp_gap_ack_block gabs[SCTP_MAX_GABS];
> >+ struct sctp_transport *trans;
> >
> > memset(gabs, 0, sizeof(gabs));
> > ctsn = sctp_tsnmap_get_ctsn(map);
> >@@ -805,6 +806,15 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
> > sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
> > sctp_tsnmap_get_dups(map));
> >
> >+ /*
> >+ * Once we have a sack generated, clear the moved_tsn information
> >+ * from all the transports
> >+ */
> >+ if (!asoc->peer.sack_generation)
> >+ list_for_each_entry(trans,&asoc->peer.transport_addr_list,
> >+ transports)
> >+ trans->sack_generation = UINT_MAX;
> >+ ((struct sctp_association *)asoc)->peer.sack_generation++;
>
> Two points here:
> 1) The commend no longer matches the code
Crud, missed that, I'll fix it.
> 2) Why special case the peer.sack_generations == 0 and set the
> transport to UNIT_MAX?
>
To avoid wrapping problems leading to erroneous bundling errors. Consider a
long lived connection with two trasports (A and B).
If all traffic is sent on A for a long time (generating UINT_MAX sacks), and the
peer chooses that moment to send data on transport B, its possible that we will
bundle a sack with that data chunk erroneously, because the associations
sack_generation has wrapped, and now matches with the transports, even though we
never received data on transport B. The special casing ensures that we never
hit that problem.
Neil
^ permalink raw reply
* [PATCH net-next 1/6] qlge: Fixed packet transmit errors due to potential driver errors.
From: Jitendra Kalsaria @ 2012-06-29 18:24 UTC (permalink / raw)
To: davem
Cc: netdev, Ron, "Mercer <ron.mercer",
Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
In-Reply-To: <1340994290-28832-1-git-send-email-jitendra.kalsaria@qlogic.com>
From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
qlge driver was acting wrongly when considering TX ring full
as a TX error. TX ring full is expected behavior when NIC is
overwhelmed and is expected to happen, as far as packets are
not lost.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
---
drivers/net/ethernet/qlogic/qlge/qlge_main.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 09d8d33..cdbc860 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -2562,7 +2562,6 @@ static netdev_tx_t qlge_send(struct sk_buff *skb, struct net_device *ndev)
__func__, tx_ring_idx);
netif_stop_subqueue(ndev, tx_ring->wq_id);
atomic_inc(&tx_ring->queue_stopped);
- tx_ring->tx_errors++;
return NETDEV_TX_BUSY;
}
tx_ring_desc = &tx_ring->q[tx_ring->prod_idx];
--
1.7.1
^ permalink raw reply related
* [PATCH net-next 0/6] qlge: bug fix
From: Jitendra Kalsaria @ 2012-06-29 18:24 UTC (permalink / raw)
To: davem
Cc: netdev, Ron, "Mercer <ron.mercer",
Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Please apply it to net-next.
Thanks,
Jitendra
^ permalink raw reply
* [PATCH net-next 5/6] qlge: Categorize receive frame errors from firmware.
From: Jitendra Kalsaria @ 2012-06-29 18:24 UTC (permalink / raw)
To: davem
Cc: netdev, Ron, "Mercer <ron.mercer",
Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
In-Reply-To: <1340994290-28832-1-git-send-email-jitendra.kalsaria@qlogic.com>
From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
---
drivers/net/ethernet/qlogic/qlge/qlge.h | 8 ++++
drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c | 14 +++++++
drivers/net/ethernet/qlogic/qlge/qlge_main.c | 46 ++++++++++++++++++----
3 files changed, 59 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge.h b/drivers/net/ethernet/qlogic/qlge/qlge.h
index 5a639df..e81bbb7 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge.h
+++ b/drivers/net/ethernet/qlogic/qlge/qlge.h
@@ -1535,6 +1535,14 @@ struct nic_stats {
u64 rx_1024_to_1518_pkts;
u64 rx_1519_to_max_pkts;
u64 rx_len_err_pkts;
+ /* Receive Mac Err stats */
+ u64 rx_code_err;
+ u64 rx_oversize_err;
+ u64 rx_undersize_err;
+ u64 rx_preamble_err;
+ u64 rx_frame_len_err;
+ u64 rx_crc_err;
+ u64 rx_err_count;
/*
* These stats come from offset 500h to 5C8h
* in the XGMAC register.
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c b/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
index 966bd96..bbc4136 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
@@ -226,6 +226,13 @@ static char ql_stats_str_arr[][ETH_GSTRING_LEN] = {
{"rx_1024_to_1518_pkts"},
{"rx_1519_to_max_pkts"},
{"rx_len_err_pkts"},
+ {"rx_code_err"},
+ {"rx_oversize_err"},
+ {"rx_undersize_err"},
+ {"rx_preamble_err"},
+ {"rx_frame_len_err"},
+ {"rx_crc_err"},
+ {"rx_err_count"},
{"tx_cbfc_pause_frames0"},
{"tx_cbfc_pause_frames1"},
{"tx_cbfc_pause_frames2"},
@@ -320,6 +327,13 @@ ql_get_ethtool_stats(struct net_device *ndev,
*data++ = s->rx_1024_to_1518_pkts;
*data++ = s->rx_1519_to_max_pkts;
*data++ = s->rx_len_err_pkts;
+ *data++ = s->rx_code_err;
+ *data++ = s->rx_oversize_err;
+ *data++ = s->rx_undersize_err;
+ *data++ = s->rx_preamble_err;
+ *data++ = s->rx_frame_len_err;
+ *data++ = s->rx_crc_err;
+ *data++ = s->rx_err_count;
*data++ = s->tx_cbfc_pause_frames0;
*data++ = s->tx_cbfc_pause_frames1;
*data++ = s->tx_cbfc_pause_frames2;
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index aa514c5..0f56148 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -1433,6 +1433,34 @@ map_error:
return NETDEV_TX_BUSY;
}
+/* Categorizing receive firmware frame errors */
+static void ql_categorize_rx_err(struct ql_adapter *qdev, u8 rx_err)
+{
+ qdev->nic_stats.rx_err_count++;
+
+ switch (rx_err & IB_MAC_IOCB_RSP_ERR_MASK) {
+ case IB_MAC_IOCB_RSP_ERR_CODE_ERR:
+ qdev->nic_stats.rx_code_err++;
+ break;
+ case IB_MAC_IOCB_RSP_ERR_OVERSIZE:
+ qdev->nic_stats.rx_oversize_err++;
+ break;
+ case IB_MAC_IOCB_RSP_ERR_UNDERSIZE:
+ qdev->nic_stats.rx_undersize_err++;
+ break;
+ case IB_MAC_IOCB_RSP_ERR_PREAMBLE:
+ qdev->nic_stats.rx_preamble_err++;
+ break;
+ case IB_MAC_IOCB_RSP_ERR_FRAME_LEN:
+ qdev->nic_stats.rx_frame_len_err++;
+ break;
+ case IB_MAC_IOCB_RSP_ERR_CRC:
+ qdev->nic_stats.rx_crc_err++;
+ default:
+ break;
+ }
+}
+
/* Process an inbound completion from an rx ring. */
static void ql_process_mac_rx_gro_page(struct ql_adapter *qdev,
struct rx_ring *rx_ring,
@@ -1446,6 +1474,12 @@ static void ql_process_mac_rx_gro_page(struct ql_adapter *qdev,
napi->dev = qdev->ndev;
+ if (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_ERR_MASK) {
+ ql_categorize_rx_err(qdev, ib_mac_rsp->flags2);
+ put_page(lbq_desc->p.pg_chunk.page);
+ return;
+ }
+
skb = napi_get_frags(napi);
if (!skb) {
netif_err(qdev, drv, qdev->ndev,
@@ -1502,9 +1536,7 @@ static void ql_process_mac_rx_page(struct ql_adapter *qdev,
/* Frame error, so drop the packet. */
if (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_ERR_MASK) {
- netif_info(qdev, drv, qdev->ndev,
- "Receive error, flags2 = 0x%x\n", ib_mac_rsp->flags2);
- rx_ring->rx_errors++;
+ ql_categorize_rx_err(qdev, ib_mac_rsp->flags2);
goto err_out;
}
@@ -1595,10 +1627,8 @@ static void ql_process_mac_rx_skb(struct ql_adapter *qdev,
/* Frame error, so drop the packet. */
if (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_ERR_MASK) {
- netif_info(qdev, drv, qdev->ndev,
- "Receive error, flags2 = 0x%x\n", ib_mac_rsp->flags2);
+ ql_categorize_rx_err(qdev, ib_mac_rsp->flags2);
dev_kfree_skb_any(skb);
- rx_ring->rx_errors++;
return;
}
@@ -1910,10 +1940,8 @@ static void ql_process_mac_split_rx_intr(struct ql_adapter *qdev,
/* Frame error, so drop the packet. */
if (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_ERR_MASK) {
- netif_info(qdev, drv, qdev->ndev,
- "Receive error, flags2 = 0x%x\n", ib_mac_rsp->flags2);
+ ql_categorize_rx_err(qdev, ib_mac_rsp->flags2);
dev_kfree_skb_any(skb);
- rx_ring->rx_errors++;
return;
}
--
1.7.1
^ permalink raw reply related
* [PATCH net-next 6/6] qlge: Bumped driver version to 1.00.00.31
From: Jitendra Kalsaria @ 2012-06-29 18:24 UTC (permalink / raw)
To: davem
Cc: netdev, Ron, "Mercer <ron.mercer",
Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
In-Reply-To: <1340994290-28832-1-git-send-email-jitendra.kalsaria@qlogic.com>
From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
---
drivers/net/ethernet/qlogic/qlge/qlge.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge.h b/drivers/net/ethernet/qlogic/qlge/qlge.h
index e81bbb7..5a8c00c 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge.h
+++ b/drivers/net/ethernet/qlogic/qlge/qlge.h
@@ -18,7 +18,7 @@
*/
#define DRV_NAME "qlge"
#define DRV_STRING "QLogic 10 Gigabit PCI-E Ethernet Driver "
-#define DRV_VERSION "v1.00.00.30.00.00-01"
+#define DRV_VERSION "v1.00.00.31"
#define WQ_ADDR_ALIGN 0x3 /* 4 byte alignment */
--
1.7.1
^ permalink raw reply related
* [PATCH net-next 4/6] qlge: Fixed double pci free upon tx_ring->q allocation failure.
From: Jitendra Kalsaria @ 2012-06-29 18:24 UTC (permalink / raw)
To: davem
Cc: netdev, Ron, "Mercer <ron.mercer",
Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
In-Reply-To: <1340994290-28832-1-git-send-email-jitendra.kalsaria@qlogic.com>
From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
---
drivers/net/ethernet/qlogic/qlge/qlge_main.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index cdbc860..aa514c5 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -2701,11 +2701,9 @@ static int ql_alloc_tx_resources(struct ql_adapter *qdev,
pci_alloc_consistent(qdev->pdev, tx_ring->wq_size,
&tx_ring->wq_base_dma);
- if ((tx_ring->wq_base == NULL) ||
- tx_ring->wq_base_dma & WQ_ADDR_ALIGN) {
- netif_err(qdev, ifup, qdev->ndev, "tx_ring alloc failed.\n");
- return -ENOMEM;
- }
+ if ((tx_ring->wq_base == NULL) || tx_ring->wq_base_dma & WQ_ADDR_ALIGN)
+ goto err;
+
tx_ring->q =
kmalloc(tx_ring->wq_len * sizeof(struct tx_ring_desc), GFP_KERNEL);
if (tx_ring->q == NULL)
@@ -2713,8 +2711,12 @@ static int ql_alloc_tx_resources(struct ql_adapter *qdev,
return 0;
err:
- pci_free_consistent(qdev->pdev, tx_ring->wq_size,
+ if (tx_ring->wq_base) {
+ pci_free_consistent(qdev->pdev, tx_ring->wq_size,
tx_ring->wq_base, tx_ring->wq_base_dma);
+ tx_ring->wq_base = NULL;
+ }
+ netif_err(qdev, ifup, qdev->ndev, "tx_ring alloc failed.\n");
return -ENOMEM;
}
--
1.7.1
^ permalink raw reply related
* [PATCH net-next 3/6] qlge: Garbage values shown in extra info during selftest.
From: Jitendra Kalsaria @ 2012-06-29 18:24 UTC (permalink / raw)
To: davem
Cc: netdev, Ron, "Mercer <ron.mercer",
Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
In-Reply-To: <1340994290-28832-1-git-send-email-jitendra.kalsaria@qlogic.com>
From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
while running selftest 'ethtool -t' multiple times will get
different values in the 'extra info' section, which was garbage.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
---
drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c b/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
index 81672f5..966bd96 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
@@ -248,6 +248,9 @@ static char ql_stats_str_arr[][ETH_GSTRING_LEN] = {
static void ql_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
{
switch (stringset) {
+ case ETH_SS_TEST:
+ memcpy(buf, *ql_gstrings_test, QLGE_TEST_LEN * ETH_GSTRING_LEN);
+ break;
case ETH_SS_STATS:
memcpy(buf, ql_stats_str_arr, sizeof(ql_stats_str_arr));
break;
@@ -539,6 +542,8 @@ static void ql_self_test(struct net_device *ndev,
{
struct ql_adapter *qdev = netdev_priv(ndev);
+ memset(data, 0, sizeof(u64) * QLGE_TEST_LEN);
+
if (netif_running(ndev)) {
set_bit(QL_SELFTEST, &qdev->flags);
if (eth_test->flags == ETH_TEST_FL_OFFLINE) {
--
1.7.1
^ permalink raw reply related
* [PATCH net-next 2/6] qlge: Stand-up card should not report supporting wol.
From: Jitendra Kalsaria @ 2012-06-29 18:24 UTC (permalink / raw)
To: davem
Cc: netdev, Ron, "Mercer <ron.mercer",
Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
In-Reply-To: <1340994290-28832-1-git-send-email-jitendra.kalsaria@qlogic.com>
From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
---
drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c | 43 ++++++++++++++--------
1 files changed, 27 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c b/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
index 8e2c2a7..81672f5 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
@@ -388,10 +388,14 @@ static void ql_get_drvinfo(struct net_device *ndev,
static void ql_get_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
{
struct ql_adapter *qdev = netdev_priv(ndev);
- /* What we support. */
- wol->supported = WAKE_MAGIC;
- /* What we've currently got set. */
- wol->wolopts = qdev->wol;
+
+ if (qdev->pdev->subsystem_device == 0x0068 ||
+ qdev->pdev->subsystem_device == 0x0180) {
+ /* What we support. */
+ wol->supported = WAKE_MAGIC;
+ /* What we've currently got set. */
+ wol->wolopts = qdev->wol;
+ }
}
static int ql_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
@@ -399,19 +403,26 @@ static int ql_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
struct ql_adapter *qdev = netdev_priv(ndev);
int status;
- if (wol->wolopts & ~WAKE_MAGIC)
- return -EINVAL;
- qdev->wol = wol->wolopts;
-
- netif_info(qdev, drv, qdev->ndev, "Set wol option 0x%x\n", qdev->wol);
- if (!qdev->wol) {
- u32 wol = 0;
- status = ql_mb_wol_mode(qdev, wol);
- netif_err(qdev, drv, qdev->ndev, "WOL %s (wol code 0x%x)\n",
- status == 0 ? "cleared successfully" : "clear failed",
- wol);
+ if (qdev->pdev->subsystem_device == 0x0068 ||
+ qdev->pdev->subsystem_device == 0x0180) {
+ if (wol->wolopts & ~WAKE_MAGIC)
+ return -EINVAL;
+ qdev->wol = wol->wolopts;
+
+ netif_info(qdev, drv, qdev->ndev,
+ "Set wol option 0x%x\n", qdev->wol);
+ if (!qdev->wol) {
+ u32 wol = 0;
+ status = ql_mb_wol_mode(qdev, wol);
+ netif_err(qdev, drv, qdev->ndev,
+ "WOL %s (wol code 0x%x)\n",
+ status == 0 ? "cleared successfully" : "clear failed",
+ wol);
+ }
+ } else {
+ netif_info(qdev, drv, qdev->ndev,
+ "WOL is not supported on stand-up card\n");
}
-
return 0;
}
--
1.7.1
^ permalink raw reply related
* Re: [PATCH v3] sctp: be more restrictive in transport selection on bundled sacks
From: Vlad Yasevich @ 2012-06-29 18:29 UTC (permalink / raw)
To: Neil Horman; +Cc: netdev, David S. Miller
In-Reply-To: <1340987696-19205-1-git-send-email-nhorman@tuxdriver.com>
On 06/29/2012 12:34 PM, Neil Horman wrote:
> It was noticed recently that when we send data on a transport, its possible that
> we might bundle a sack that arrived on a different transport. While this isn't
> a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
> 2960:
>
> An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
> etc.) to the same destination transport address from which it
> received the DATA or control chunk to which it is replying. This
> rule should also be followed if the endpoint is bundling DATA chunks
> together with the reply chunk.
>
> This patch seeks to correct that. It restricts the bundling of sack operations
> to only those transports which have moved the ctsn of the association forward
> since the last sack. By doing this we guarantee that we only bundle outbound
> saks on a transport that has received a chunk since the last sack. This brings
> us into stricter compliance with the RFC.
>
> Vlad had initially suggested that we strictly allow only sack bundling on the
> transport that last moved the ctsn forward. While this makes sense, I was
> concerned that doing so prevented us from bundling in the case where we had
> received chunks that moved the ctsn on multiple transports. In those cases, the
> RFC allows us to select any of the transports having received chunks to bundle
> the sack on. so I've modified the approach to allow for that, by adding a state
> variable to each transport that tracks weather it has moved the ctsn since the
> last sack. This I think keeps our behavior (and performance), close enough to
> our current profile that I think we can do this without a sysctl knob to
> enable/disable it.
>
> Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
> CC: Vlad Yaseivch<vyasevich@gmail.com>
> CC: David S. Miller<davem@davemloft.net>
> Reported-by: Michele Baldessari<michele@redhat.com>
> Reported-by: sorin serban<sserban@redhat.com>
>
> ---
> Change Notes:
> V2)
> * Removed unused variable as per Dave M. Request
> * Delayed rwnd adjustment until we are sure we will sack (Vlad Y.)
> V3)
> * Switched test to use pkt->transport rather than chunk->transport
> * Modified detection of sacka-able transport. Instead of just setting
> and clearning a flag, we now mark each transport and association with
> a sack generation tag. We increment the associations generation on
> every sack, and assign that generation tag to every transport that
> updates the ctsn. This prevents us from having to iterate over a for
> loop on every sack, which is much more scalable.
> ---
> include/net/sctp/structs.h | 4 ++++
> include/net/sctp/tsnmap.h | 3 ++-
> net/sctp/associola.c | 1 +
> net/sctp/output.c | 9 +++++++--
> net/sctp/sm_make_chunk.c | 10 ++++++++++
> net/sctp/sm_sideeffect.c | 2 +-
> net/sctp/transport.c | 2 ++
> net/sctp/tsnmap.c | 6 +++++-
> net/sctp/ulpevent.c | 3 ++-
> net/sctp/ulpqueue.c | 2 +-
> 10 files changed, 35 insertions(+), 7 deletions(-)
>
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index e4652fe..fecdf31 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -912,6 +912,9 @@ struct sctp_transport {
> /* Is this structure kfree()able? */
> malloced:1;
>
> + /* Has this transport moved the ctsn since we last sacked */
> + __u32 sack_generation;
> +
> struct flowi fl;
>
> /* This is the peer's IP address and port. */
> @@ -1584,6 +1587,7 @@ struct sctp_association {
> */
> __u8 sack_needed; /* Do we need to sack the peer? */
> __u32 sack_cnt;
> + __u32 sack_generation;
>
> /* These are capabilities which our peer advertised. */
> __u8 ecn_capable:1, /* Can peer do ECN? */
> diff --git a/include/net/sctp/tsnmap.h b/include/net/sctp/tsnmap.h
> index e7728bc..2c5d2b4 100644
> --- a/include/net/sctp/tsnmap.h
> +++ b/include/net/sctp/tsnmap.h
> @@ -117,7 +117,8 @@ void sctp_tsnmap_free(struct sctp_tsnmap *map);
> int sctp_tsnmap_check(const struct sctp_tsnmap *, __u32 tsn);
>
> /* Mark this TSN as seen. */
> -int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn);
> +int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn,
> + struct sctp_transport *trans);
>
> /* Mark this TSN and all lower as seen. */
> void sctp_tsnmap_skip(struct sctp_tsnmap *map, __u32 tsn);
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 5bc9ab1..6c66adb 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -271,6 +271,7 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
> */
> asoc->peer.sack_needed = 1;
> asoc->peer.sack_cnt = 0;
> + asoc->peer.sack_generation=0;
>
> /* Assume that the peer will tell us if he recognizes ASCONF
> * as part of INIT exchange.
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index f1b7d4b..0de6cd5 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -240,14 +240,19 @@ static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
> */
> if (sctp_chunk_is_data(chunk)&& !pkt->has_sack&&
> !pkt->has_cookie_echo) {
> - struct sctp_association *asoc;
> struct timer_list *timer;
> - asoc = pkt->transport->asoc;
> + struct sctp_association *asoc = pkt->transport->asoc;
> +
> timer =&asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
>
> /* If the SACK timer is running, we have a pending SACK */
> if (timer_pending(timer)) {
> struct sctp_chunk *sack;
> +
> + if (pkt->transport->sack_generation !=
> + pkt->transport->asoc->peer.sack_generation)
> + return retval;
> +
> asoc->a_rwnd = asoc->rwnd;
> sack = sctp_make_sack(asoc);
> if (sack) {
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index a85eeeb..ffa2a8e 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -736,6 +736,7 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
> __u16 num_gabs, num_dup_tsns;
> struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map;
> struct sctp_gap_ack_block gabs[SCTP_MAX_GABS];
> + struct sctp_transport *trans;
>
> memset(gabs, 0, sizeof(gabs));
> ctsn = sctp_tsnmap_get_ctsn(map);
> @@ -805,6 +806,15 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
> sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
> sctp_tsnmap_get_dups(map));
>
> + /*
> + * Once we have a sack generated, clear the moved_tsn information
> + * from all the transports
> + */
> + if (!asoc->peer.sack_generation)
> + list_for_each_entry(trans,&asoc->peer.transport_addr_list,
> + transports)
> + trans->sack_generation = UINT_MAX;
> + ((struct sctp_association *)asoc)->peer.sack_generation++;
Two points here:
1) The commend no longer matches the code
2) Why special case the peer.sack_generations == 0 and set the transport
to UNIT_MAX?
-vlad
> nodata:
> return retval;
> }
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index c96d1a8..8716da1 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -1268,7 +1268,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
> case SCTP_CMD_REPORT_TSN:
> /* Record the arrival of a TSN. */
> error = sctp_tsnmap_mark(&asoc->peer.tsn_map,
> - cmd->obj.u32);
> + cmd->obj.u32, NULL);
> break;
>
> case SCTP_CMD_REPORT_FWDTSN:
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index b026ba0..1dcceb6 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -68,6 +68,8 @@ static struct sctp_transport *sctp_transport_init(struct sctp_transport *peer,
> peer->af_specific = sctp_get_af_specific(addr->sa.sa_family);
> memset(&peer->saddr, 0, sizeof(union sctp_addr));
>
> + peer->sack_generation = 0;
> +
> /* From 6.3.1 RTO Calculation:
> *
> * C1) Until an RTT measurement has been made for a packet sent to the
> diff --git a/net/sctp/tsnmap.c b/net/sctp/tsnmap.c
> index f1e40ceb..b5fb7c4 100644
> --- a/net/sctp/tsnmap.c
> +++ b/net/sctp/tsnmap.c
> @@ -114,7 +114,8 @@ int sctp_tsnmap_check(const struct sctp_tsnmap *map, __u32 tsn)
>
>
> /* Mark this TSN as seen. */
> -int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
> +int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn,
> + struct sctp_transport *trans)
> {
> u16 gap;
>
> @@ -133,6 +134,9 @@ int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
> */
> map->max_tsn_seen++;
> map->cumulative_tsn_ack_point++;
> + if (trans)
> + trans->sack_generation =
> + trans->asoc->peer.sack_generation;
> map->base_tsn++;
> } else {
> /* Either we already have a gap, or about to record a gap, so
> diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
> index 8a84017..33d8947 100644
> --- a/net/sctp/ulpevent.c
> +++ b/net/sctp/ulpevent.c
> @@ -715,7 +715,8 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct sctp_association *asoc,
> * can mark it as received so the tsn_map is updated correctly.
> */
> if (sctp_tsnmap_mark(&asoc->peer.tsn_map,
> - ntohl(chunk->subh.data_hdr->tsn)))
> + ntohl(chunk->subh.data_hdr->tsn),
> + chunk->transport))
> goto fail_mark;
>
> /* First calculate the padding, so we don't inadvertently
> diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
> index f2d1de7..f5a6a4f 100644
> --- a/net/sctp/ulpqueue.c
> +++ b/net/sctp/ulpqueue.c
> @@ -1051,7 +1051,7 @@ void sctp_ulpq_renege(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk,
> if (chunk&& (freed>= needed)) {
> __u32 tsn;
> tsn = ntohl(chunk->subh.data_hdr->tsn);
> - sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn);
> + sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn, chunk->transport);
> sctp_ulpq_tail_data(ulpq, chunk, gfp);
>
> sctp_ulpq_partial_delivery(ulpq, chunk, gfp);
^ permalink raw reply
* Re: [PATCH 1/5] netfilter: ipset: fix interface comparision in hash-netiface sets
From: Florian Westphal @ 2012-06-29 18:24 UTC (permalink / raw)
To: David Laight; +Cc: pablo, netfilter-devel, davem, netdev
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B6F6E@saturn3.aculab.com>
David Laight <David.Laight@ACULAB.COM> wrote:
> > From: Florian Westphal <fw@strlen.de>
> >
> > ifname_compare() assumes that skb->dev is zero-padded,
> > e.g 'eth1\0\0\0\0\0...'. This isn't always the case. e1000 driver does
> >
> > strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);
> >
> > in e1000_probe(), so once device is registered dev->name memory
> contains
> > 'eth1\0:0:3\0\0\0' (or something like that), which makes eth1 compare
> fail.
>
> strncpy() would normally zero-fill the destination buffer
> (at least the libc version does).
>
> So something else must be wrong.
No. driver .probe() runs before the device name is filled in, and no
explict zeroing happens there.
^ permalink raw reply
* Re[2]: BUG: NULL pointer in ctnetlink_conntrack_event
From: Hans Schillstrom @ 2012-06-29 18:22 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netdev, netfilter-devel
>On Fri, Jun 29, 2012 at 02:29:37PM +0200, Hans Schillstrom wrote:
>> Hello,
>>
>> There is a "hard to find" problem in ctnetlink_conntrack_event() when calling
>> netlink_has_listeners() net->nfnl is NULL.
>>
>> The rcu stuff seems to be right at a first look but who knows...
>>
>> The line below fix the problem, but that is not the root cause.
>>
>> int nfnetlink_has_listeners(struct net *net, unsigned int group)
>> {
>> - return netlink_has_listeners(net->nfnl, group);
>> + return net->nfnl ? netlink_has_listeners(net->nfnl, group) : 0 ;
>> }
>>
>> Yes it is a 3.0.26 kernel but this patch is applied
>> netfilter: nf_conntrack: make event callback registration per-netns
>
>I think this patch above is missing some rcu_access_pointer usage.
>
>Please, see patch attached.
Thanks it looks like it's the missing patch.
/Hans
^ permalink raw reply
* Re: AF_BUS socket address family
From: Chris Friesen @ 2012-06-29 18:16 UTC (permalink / raw)
To: Vincent Sanders; +Cc: netdev, linux-kernel, David S. Miller
In-Reply-To: <1340988354-26981-1-git-send-email-vincent.sanders@collabora.co.uk>
On 06/29/2012 10:45 AM, Vincent Sanders wrote:
> This series adds the bus address family (AF_BUS) it is against
> net-next as of yesterday.
>
> AF_BUS is a message oriented inter process communication system.
>
> The principle features are:
>
> - Reliable datagram based communication (all sockets are of type
> SOCK_SEQPACKET)
>
> - Multicast message delivery (one to many, unicast as a subset)
>
> - Strict ordering (messages are delivered to every client in the same order)
>
> - Ability to pass file descriptors
>
> - Ability to pass credentials
>
I haven't had time to look at the code yet, but if you haven't already
I'd like to propose adding the ability for someone with suitable
privileges to eavesdrop on all communications. We've been using
something similar to this (essentially a simplified multicast unix
datagram protocol) for many years now and having a tcpdump-like ability
is very useful for debugging.
Chris
^ permalink raw reply
* Reporting a Kernel Bug
From: Lucas Willian Bocchi @ 2012-06-29 17:49 UTC (permalink / raw)
To: netdev
Dear Sir
Proceeding as oriented, I'm sending the link to these probable bug.
https://bugzilla.kernel.org/show_bug.cgi?id=43901
Thanks for advance.
^ permalink raw reply
* tcp mtu probing oddity - small packets
From: George B. @ 2012-06-29 17:43 UTC (permalink / raw)
To: netdev
This is mainly meant to put this in the archive in case anyone else
has this problem. It was a hair-puller.
Setup is a load balancer doing source NAT to a linux server. Server
had /proc/sys/net/ipv4/tcp_mtu_probing=2
Apparently the PMTU probing somehow gets confused and the flow gets
clamped to the tcp_base_mss value (512 byte packets). But this PMTU
value is apparently cached for all traffic to the IP address and since
that is a source NAT, ALL flows from that IP address get clamped to
512 byte packets. So if something goes wonky with one flow, it
impacts all flows because they all appear to come from the same IP
address.
Setting tcp_mtu_probing to 1 has cleared the issue for now but it may
come back from time to time if a flow gets into trouble and whatever
Linux is doing in the PMTU probing causes the flow to be clamped to
tcp_base_mss but assuming that PMTU will age out at some point after
the problem client goes away, things should return to normal behavior.
There's something about how the PMTU probes work that apparently
cause them to fail when activated in this particular configuration and
clamp the packet size to tcp_base_mss for all flows to that IP
address. If you are behind a source NAT, all flows share the same IP
address and so all flows get clamped.
Linux 2.6.38-11
^ permalink raw reply
* Re: [net-next.git 4/4 (v9)] phy: add the EEE support and the way to access to the MMD registers.
From: Ben Hutchings @ 2012-06-29 17:36 UTC (permalink / raw)
To: Giuseppe CAVALLARO; +Cc: netdev, eric.dumazet, rayagond, davem, yuvalmin
In-Reply-To: <1340867678-18375-5-git-send-email-peppe.cavallaro@st.com>
On Thu, 2012-06-28 at 09:14 +0200, Giuseppe CAVALLARO wrote:
> This patch adds the support for the Energy-Efficient Ethernet (EEE)
> to the Physical Abstraction Layer.
> To support the EEE we have to access to the MMD registers 3.20 and
> 7.60/61. So two new functions have been added to read/write the MMD
> registers (clause 45).
>
> An Ethernet driver (I tested the stmmac) can invoke the phy_init_eee to properly
> check if the EEE is supported by the PHYs and it can also set the clock
> stop enable bit in the 3.0 register.
> The phy_get_eee_err can be used for reporting the number of time where
> the PHY failed to complete its normal wake sequence.
>
> In the end, this patch also adds the EEE ethtool support implementing:
> o phy_ethtool_set_eee
> o phy_ethtool_get_eee
>
> v1: initial patch
> v2: fixed some errors especially on naming convention
> v3: renamed again the mmd read/write functions thank to Ben's feedback
> v4: moved file to phy.c and added the ethtool support.
> v5: fixed phy_adv_to_eee, phy_eee_to_supported, phy_eee_to_adv return
> values according to ethtool API (thanks to Ben's feedback).
> Renamed some macros to avoid too long names.
> v6: fixed kernel-doc comments to be properly parsed.
> Fixed the phy_init_eee function: we need to check which link mode
> was autonegotiated and then the corresponding bits in 7.60 and 7.61
> registers.
> v7: reviewed the way to get the negotiated settings.
> v8: fixed a problem in the phy_init_eee return value erroneously added
> when included the phy_read_status call.
> v9: do not remove the MDIO_AN_EEE_ADV_100TX and MDIO_AN_EEE_ADV_1000T
> and fixed the eee_{cap,lp,adv} declaration as "int" instead of u16.
>
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
[...]
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
(but not tested in any way)
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [RFC] [TCP 1/3] tcp: Add MSG_NEW_PACKET flag to indicate preferable packet boundaries
From: Eric Dumazet @ 2012-06-29 17:15 UTC (permalink / raw)
To: Andreas Gruenbacher; +Cc: netdev, linux-kernel, Herbert Xu, David S. Miller
In-Reply-To: <1340984335.25450.24.camel@gurkel.linbit>
On Fri, 2012-06-29 at 17:38 +0200, Andreas Gruenbacher wrote:
> The primary use case is fast Gigabit (10 or more) Ethernet connections
> with jumbo frames and switches that support them. There, frames will go
> through unchanged and you can zero-copy receive all the time.
>
> Not sure how well the approach scales to other kinds of connections; it
> may work often enough to be worth it. When things get distorted between
> the sender and the receiver and tcp_recvbio() fails, the data can still
> be copied out of the socket as before.
If you have a packet loss, receiver can and will coalesce frames.
^ permalink raw reply
* Re: [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing
From: Pablo Neira Ayuso @ 2012-06-29 17:11 UTC (permalink / raw)
To: Vincent Sanders
Cc: netdev, linux-kernel, David S. Miller, Javier Martinez Canillas,
Alban Crequy
In-Reply-To: <1340988354-26981-14-git-send-email-vincent.sanders@collabora.co.uk>
On Fri, Jun 29, 2012 at 05:45:52PM +0100, Vincent Sanders wrote:
> From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
>
> The netfilter D-Bus module needs to parse D-bus messages sent by
> applications to decide whether a peer can receive or not a D-Bus
> message. Add D-bus message parsing logic to be able to analyze.
Not talking about the entire patchset, only about the part I'm
responsible for.
I don't see why you think this belong to netfilter at all.
This doesn't integrate into the existing filtering infrastructure,
neither it extends it in any way.
> Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
> ---
> net/netfilter/nfdbus/message.c | 194 ++++++++++++++++++++++++++++++++++++++++
> net/netfilter/nfdbus/message.h | 71 +++++++++++++++
> 2 files changed, 265 insertions(+)
> create mode 100644 net/netfilter/nfdbus/message.c
> create mode 100644 net/netfilter/nfdbus/message.h
>
> diff --git a/net/netfilter/nfdbus/message.c b/net/netfilter/nfdbus/message.c
> new file mode 100644
> index 0000000..93c409c
> --- /dev/null
> +++ b/net/netfilter/nfdbus/message.c
> @@ -0,0 +1,194 @@
> +/*
> + * message.c Basic D-Bus message parsing
> + *
> + * Copyright (C) 2010-2012 Collabora Ltd
> + * Authors: Alban Crequy <alban.crequy@collabora.co.uk>
> + * Copyright (C) 2002, 2003, 2004, 2005 Red Hat Inc.
> + * Copyright (C) 2002, 2003 CodeFactory AB
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + *
> + */
> +
> +#include <linux/slab.h>
> +
> +#include "message.h"
> +
> +int dbus_message_type_from_string(const char *type_str)
> +{
> + if (strcmp(type_str, "method_call") == 0)
> + return DBUS_MESSAGE_TYPE_METHOD_CALL;
> + if (strcmp(type_str, "method_return") == 0)
> + return DBUS_MESSAGE_TYPE_METHOD_RETURN;
> + else if (strcmp(type_str, "signal") == 0)
> + return DBUS_MESSAGE_TYPE_SIGNAL;
> + else if (strcmp(type_str, "error") == 0)
> + return DBUS_MESSAGE_TYPE_ERROR;
> + else
> + return DBUS_MESSAGE_TYPE_INVALID;
> +}
> +
> +int dbus_message_parse(unsigned char *message, size_t len,
> + struct dbus_message *dbus_message)
> +{
> + unsigned char *cur;
> + int array_header_len;
> +
> + dbus_message->message = message;
> +
> + if (len < 4 + 4 + 4 + 4 || message[1] == 0 || message[1] > 4)
> + return -EINVAL;
> +
> + dbus_message->type = message[1];
> + dbus_message->body_length = *((u32 *)(message + 4));
> + cur = message + 12;
> + array_header_len = *(u32 *)cur;
> + dbus_message->len_offset = 12;
> + cur += 4;
> + while (cur < message + len
> + && cur < message + 12 + 4 + array_header_len) {
> + int header_code;
> + int signature_len;
> + unsigned char *signature;
> + int str_len;
> + unsigned char *str;
> +
> + /* D-Bus alignment craziness */
> + if ((cur - message) % 8 != 0)
> + cur += 8 - (cur - message) % 8;
> +
> + header_code = *(char *)cur;
> + cur++;
> + signature_len = *(char *)cur;
> + /* All header fields of the current D-Bus spec have a simple
> + * type, either o, s, g, or u */
> + if (signature_len != 1)
> + return -EINVAL;
> + cur++;
> + signature = cur;
> + cur += signature_len + 1;
> + if (signature[0] != 'o' &&
> + signature[0] != 's' &&
> + signature[0] != 'g' &&
> + signature[0] != 'u')
> + return -EINVAL;
> +
> + if (signature[0] == 'u') {
> + cur += 4;
> + continue;
> + }
> +
> + if (signature[0] != 'g') {
> + str_len = *(u32 *)cur;
> + cur += 4;
> + } else {
> + str_len = *(char *)cur;
> + cur += 1;
> + }
> +
> + str = cur;
> + switch (header_code) {
> + case 1:
> + dbus_message->path = str;
> + break;
> + case 2:
> + dbus_message->interface = str;
> + break;
> + case 3:
> + dbus_message->member = str;
> + break;
> + case 6:
> + dbus_message->destination = str;
> + break;
> + case 7:
> + dbus_message->sender = str;
> + break;
> + case 8:
> + dbus_message->body_signature = str;
> + break;
> + }
> + cur += str_len + 1;
> + }
> +
> + dbus_message->padding_end = (8 - (cur - message) % 8) % 8;
> +
> + /* Jump to body D-Bus alignment craziness */
> + if ((cur - message) % 8 != 0)
> + cur += 8 - (cur - message) % 8;
> + dbus_message->new_header_offset = cur - message;
> +
> + if (dbus_message->new_header_offset
> + + dbus_message->body_length != len) {
> + pr_warn("Message truncated? " \
> + "Header %d + Body %d != Length %zd\n",
> + dbus_message->new_header_offset,
> + dbus_message->body_length, len);
> + return -EINVAL;
> + }
> +
> + if (dbus_message->body_signature &&
> + dbus_message->body_signature[0] == 's') {
> + int str_len;
> + str_len = *(u32 *)cur;
> + cur += 4;
> + dbus_message->arg0 = cur;
> + cur += str_len + 1;
> + }
> +
> + if ((cur - message) % 4 != 0)
> + cur += 4 - (cur - message) % 4;
> +
> + if (dbus_message->body_signature &&
> + dbus_message->body_signature[0] == 's' &&
> + dbus_message->body_signature[1] == 's') {
> + int str_len;
> + str_len = *(u32 *)cur;
> + cur += 4;
> + dbus_message->arg1 = cur;
> + cur += str_len + 1;
> + }
> +
> + if ((cur - message) % 4 != 0)
> + cur += 4 - (cur - message) % 4;
> +
> + if (dbus_message->body_signature &&
> + dbus_message->body_signature[0] == 's' &&
> + dbus_message->body_signature[1] == 's' &&
> + dbus_message->body_signature[2] == 's') {
> + int str_len;
> + str_len = *(u32 *)cur;
> + cur += 4;
> + dbus_message->arg2 = cur;
> + cur += str_len + 1;
> + }
> +
> + if ((cur - message) % 4 != 0)
> + cur += 4 - (cur - message) % 4;
> +
> + if (dbus_message->type == DBUS_MESSAGE_TYPE_SIGNAL &&
> + dbus_message->sender && dbus_message->path &&
> + dbus_message->interface && dbus_message->member &&
> + dbus_message->arg0 &&
> + strcmp(dbus_message->sender, "org.freedesktop.DBus") == 0 &&
> + strcmp(dbus_message->interface, "org.freedesktop.DBus") == 0 &&
> + strcmp(dbus_message->path, "/org/freedesktop/DBus") == 0) {
> + if (strcmp(dbus_message->member, "NameAcquired") == 0)
> + dbus_message->name_acquired = dbus_message->arg0;
> + else if (strcmp(dbus_message->member, "NameLost") == 0)
> + dbus_message->name_lost = dbus_message->arg0;
> + }
> +
> + return 0;
> +}
> diff --git a/net/netfilter/nfdbus/message.h b/net/netfilter/nfdbus/message.h
> new file mode 100644
> index 0000000..e3ea4d3
> --- /dev/null
> +++ b/net/netfilter/nfdbus/message.h
> @@ -0,0 +1,71 @@
> +/*
> + * message.h Basic D-Bus message parsing
> + *
> + * Copyright (C) 2010 Collabora Ltd
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + *
> + */
> +
> +#ifndef DBUS_MESSAGE_H
> +#define DBUS_MESSAGE_H
> +
> +#include <linux/list.h>
> +
> +#define DBUS_MAXIMUM_MATCH_RULE_LENGTH 1024
> +
> +/* Types of message */
> +
> +#define DBUS_MESSAGE_TYPE_INVALID 0
> +#define DBUS_MESSAGE_TYPE_METHOD_CALL 1
> +#define DBUS_MESSAGE_TYPE_METHOD_RETURN 2
> +#define DBUS_MESSAGE_TYPE_ERROR 3
> +#define DBUS_MESSAGE_TYPE_SIGNAL 4
> +#define DBUS_NUM_MESSAGE_TYPES 5
> +
> +/* No need to implement a feature-complete parser. It only implement what is
> + * needed by the bus. */
> +struct dbus_message {
> + char *message;
> + size_t len;
> + size_t new_len;
> +
> + /* direct pointers to the fields */
> + int type;
> + char *path;
> + char *interface;
> + char *member;
> + char *destination;
> + char *sender;
> + char *body_signature;
> + int body_length;
> + char *arg0;
> + char *arg1;
> + char *arg2;
> + char *name_acquired;
> + char *name_lost;
> +
> + /* How to add the 'sender' field in the headers */
> + int new_header_offset;
> + int len_offset;
> + int padding_end;
> +};
> +
> +int dbus_message_type_from_string(const char *type_str);
> +
> +int dbus_message_parse(unsigned char *message, size_t len,
> + struct dbus_message *dbus_message);
> +
> +#endif /* DBUS_MESSAGE_H */
> --
> 1.7.10
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH net-next 15/15] netfilter: add netfilter D-Bus module
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
To: netdev, linux-kernel, David S. Miller; +Cc: Alban Crequy
In-Reply-To: <1340988354-26981-1-git-send-email-vincent.sanders@collabora.co.uk>
From: Alban Crequy <alban.crequy@collabora.co.uk>
AF_BUS has netfilter hooks on the packet sending path. This allows the
netfilter subsystem to register netfilter hook handlers.
The netfilter_dbus module allows to inspect D-Bus messages and take
actions based on the information contained on these messages.
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
---
net/netfilter/Kconfig | 2 +
net/netfilter/Makefile | 3 +
net/netfilter/nfdbus/Kconfig | 12 ++
net/netfilter/nfdbus/Makefile | 6 +
net/netfilter/nfdbus/nfdbus.c | 456 +++++++++++++++++++++++++++++++++++++++++
net/netfilter/nfdbus/nfdbus.h | 44 ++++
6 files changed, 523 insertions(+)
create mode 100644 net/netfilter/nfdbus/Kconfig
create mode 100644 net/netfilter/nfdbus/Makefile
create mode 100644 net/netfilter/nfdbus/nfdbus.c
create mode 100644 net/netfilter/nfdbus/nfdbus.h
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c19b214..a105d9b 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -1187,3 +1187,5 @@ endmenu
source "net/netfilter/ipset/Kconfig"
source "net/netfilter/ipvs/Kconfig"
+
+source "net/netfilter/nfdbus/Kconfig"
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 1c5160f..6dd4ade 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -123,3 +123,6 @@ obj-$(CONFIG_IP_SET) += ipset/
# IPVS
obj-$(CONFIG_IP_VS) += ipvs/
+
+# Dbus
+obj-$(CONFIG_NETFILTER_DBUS) += nfdbus/
diff --git a/net/netfilter/nfdbus/Kconfig b/net/netfilter/nfdbus/Kconfig
new file mode 100644
index 0000000..25699a1
--- /dev/null
+++ b/net/netfilter/nfdbus/Kconfig
@@ -0,0 +1,12 @@
+#
+# Netfilter D-Bus module configuration
+#
+config NETFILTER_DBUS
+ tristate "Netfilter D-bus (EXPERIMENTAL)"
+ depends on AF_BUS && CONNECTOR && EXPERIMENTAL
+ ---help---
+ If you say Y here, you will include support for a netfilter hook to
+ parse D-Bus messages sent using the AF_BUS socket address family.
+
+ To compile this as a module, choose M here: the module will be
+ called netfilter_dbus.
diff --git a/net/netfilter/nfdbus/Makefile b/net/netfilter/nfdbus/Makefile
new file mode 100644
index 0000000..1a825f8
--- /dev/null
+++ b/net/netfilter/nfdbus/Makefile
@@ -0,0 +1,6 @@
+#
+# Makefile for the netfilter D-Bus module
+#
+obj-$(CONFIG_NETFILTER_DBUS) += netfilter_dbus.o
+
+netfilter_dbus-y := nfdbus.o message.o matchrule.o
diff --git a/net/netfilter/nfdbus/nfdbus.c b/net/netfilter/nfdbus/nfdbus.c
new file mode 100644
index 0000000..f6642e2
--- /dev/null
+++ b/net/netfilter/nfdbus/nfdbus.c
@@ -0,0 +1,456 @@
+/*
+ * nfdbus.c - Netfilter module for AF_BUS/BUS_PROTO_DBUS.
+ */
+
+#define DRIVER_AUTHOR "Alban Crequy"
+#define DRIVER_DESC "Netfilter module for AF_BUS/BUS_PROTO_DBUS."
+
+#include "nfdbus.h"
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter.h>
+#include <linux/connector.h>
+#include <net/af_bus.h>
+
+#include "message.h"
+#include "matchrule.h"
+
+static struct nf_hook_ops nfho_dbus;
+
+static struct cb_id cn_cmd_id = { CN_IDX_NFDBUS, CN_VAL_NFDBUS };
+
+static unsigned int hash;
+
+/* Scoped by AF_BUS address */
+struct hlist_head matchrules_table[BUS_HASH_SIZE];
+DEFINE_SPINLOCK(matchrules_lock);
+
+static struct bus_match_maker *find_match_maker(struct sockaddr_bus *addr,
+ bool create, bool delete)
+{
+ u64 hash;
+ struct hlist_node *node;
+ struct bus_match_maker *matchmaker;
+ int path_len = strlen(addr->sbus_path);
+
+ hash = csum_partial(addr->sbus_path,
+ strlen(addr->sbus_path), 0);
+ hash ^= addr->sbus_addr.s_addr;
+ hash ^= hash >> 32;
+ hash ^= hash >> 16;
+ hash ^= hash >> 8;
+ hash &= 0xff;
+
+ spin_lock(&matchrules_lock);
+ hlist_for_each_entry(matchmaker, node, &matchrules_table[hash],
+ table_node) {
+ if (addr->sbus_family == matchmaker->addr.sbus_family &&
+ addr->sbus_addr.s_addr == matchmaker->addr.sbus_addr.s_addr &&
+ !memcmp(addr->sbus_path, matchmaker->addr.sbus_path,
+ path_len)) {
+ kref_get(&matchmaker->kref);
+ if (delete)
+ hlist_del(&matchmaker->table_node);
+ spin_unlock(&matchrules_lock);
+ pr_debug("Found matchmaker for hash %llu", hash);
+ return matchmaker;
+ }
+ }
+ spin_unlock(&matchrules_lock);
+
+ if (!create) {
+ pr_debug("Matchmaker for hash %llu not found", hash);
+ return NULL;
+ }
+
+ matchmaker = bus_matchmaker_new(GFP_ATOMIC);
+ matchmaker->addr.sbus_family = addr->sbus_family;
+ matchmaker->addr.sbus_addr.s_addr = addr->sbus_addr.s_addr;
+ memcpy(matchmaker->addr.sbus_path, addr->sbus_path, BUS_PATH_MAX);
+
+ pr_debug("Create new matchmaker for hash %llu\n", hash);
+ spin_lock(&matchrules_lock);
+ hlist_add_head(&matchmaker->table_node, &matchrules_table[hash]);
+ kref_get(&matchmaker->kref);
+ spin_unlock(&matchrules_lock);
+ return matchmaker;
+}
+
+static unsigned int dbus_filter(unsigned int hooknum,
+ struct sk_buff *skb,
+ const struct net_device *in,
+ const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct bus_send_context *sendctx;
+ struct bus_match_maker *matchmaker = NULL;
+ struct bus_match_maker *sender = NULL;
+ struct dbus_message msg = {0,};
+ unsigned char *data;
+ size_t len;
+ int err;
+ int ret;
+
+ if (!skb->sk || skb->sk->sk_family != PF_BUS) {
+ WARN(1, "netfilter_dbus received an invalid skb");
+ return NF_DROP;
+ }
+
+ data = skb->data;
+ sendctx = BUSCB(skb).sendctx;
+ if (!sendctx || !sendctx->sender || !sendctx->sender_socket) {
+ WARN(1, "netfilter_dbus received an AF_BUS packet" \
+ " without context. This is a bug. Dropping the"
+ " packet.");
+ return NF_DROP;
+ }
+
+ if (sendctx->sender_socket->sk->sk_protocol != BUS_PROTO_DBUS) {
+ /* This kernel module is for D-Bus. It must not
+ * interfere with other users of AF_BUS. */
+ return NF_ACCEPT;
+ }
+ if (sendctx->recipient)
+ matchmaker = find_match_maker(sendctx->recipient, false, false);
+
+ len = skb_tail_pointer(skb) - data;
+
+ if (sendctx->to_master && sendctx->main_recipient) {
+ pr_debug("AF_BUS packet to the bus master. ACCEPT.\n");
+ ret = NF_ACCEPT;
+ goto out;
+ }
+
+ if (sendctx->main_recipient && !sendctx->bus_master_side) {
+ pr_debug("AF_BUS packet from a peer to a peer (unicast). ACCEPT.\n");
+ ret = NF_ACCEPT;
+ goto out;
+ }
+
+ err = dbus_message_parse(data, len, &msg);
+ if (err) {
+ if (!sendctx->main_recipient) {
+ pr_debug("AF_BUS packet for an eavesdropper or " \
+ "multicast is not parsable. DROP.\n");
+ ret = NF_DROP;
+ goto out;
+ } else if (sendctx->bus_master_side) {
+ pr_debug("AF_BUS packet from bus master is not parsable. ACCEPT.\n");
+ ret = NF_ACCEPT;
+ goto out;
+ } else {
+ pr_debug("AF_BUS packet from peer is not parsable. DROP.\n");
+ ret = NF_DROP;
+ goto out;
+ }
+ }
+
+ if (sendctx->bus_master_side && !sendctx->main_recipient) {
+ pr_debug("AF_BUS packet '%s' from the bus master is for an " \
+ "eavesdropper. DROP.\n",
+ msg.member ? msg.member : "");
+ ret = NF_DROP;
+ goto out;
+ }
+ if (sendctx->bus_master_side) {
+ if (msg.name_acquired) {
+ pr_debug("New name: %s [%p %p].\n",
+ msg.name_acquired, sendctx->sender,
+ sendctx->recipient);
+
+ sender = find_match_maker(sendctx->sender, true, false);
+ bus_matchmaker_add_name(sender, msg.name_acquired,
+ GFP_ATOMIC);
+ }
+ if (msg.name_lost) {
+ pr_debug("Lost name: %s [%p %p].\n",
+ msg.name_lost, sendctx->sender,
+ sendctx->recipient);
+
+ sender = find_match_maker(sendctx->sender, true, false);
+ bus_matchmaker_remove_name(sender, msg.name_acquired);
+ }
+
+ pr_debug("AF_BUS packet '%s' from the bus master. ACCEPT.\n",
+ msg.member ? msg.member : "");
+ ret = NF_ACCEPT;
+ goto out;
+ }
+
+ pr_debug("Multicast AF_BUS packet, %ld bytes, " \
+ "considering recipient %lld...\n", len,
+ sendctx->recipient ? sendctx->recipient->sbus_addr.s_addr : 0);
+
+ pr_debug("Message type %d %s->%s [iface: %s][member: %s][matchmaker=%p]...\n",
+ msg.type,
+ msg.sender ? msg.sender : "",
+ msg.destination ? msg.destination : "",
+ msg.interface ? msg.interface : "",
+ msg.member ? msg.member : "",
+ matchmaker);
+
+ if (!matchmaker) {
+ pr_debug("No match rules for this recipient. DROP.\n");
+ ret = NF_DROP;
+ goto out;
+ }
+
+ sender = find_match_maker(sendctx->sender, true, false);
+ err = bus_matchmaker_filter(matchmaker, sender, sendctx->eavesdropper,
+ &msg);
+ if (err) {
+ pr_debug("Matchmaker: ACCEPT.\n");
+ ret = NF_ACCEPT;
+ goto out;
+ } else {
+ pr_debug("Matchmaker: DROP.\n");
+ ret = NF_DROP;
+ goto out;
+ }
+
+out:
+ if (matchmaker)
+ kref_put(&matchmaker->kref, bus_matchmaker_free);
+ if (sender)
+ kref_put(&sender->kref, bus_matchmaker_free);
+ return ret;
+}
+
+/* Taken from drbd_nl_send_reply() */
+static void nfdbus_nl_send_reply(struct cn_msg *msg, int ret_code)
+{
+ char buffer[sizeof(struct cn_msg)+sizeof(struct nfdbus_nl_cfg_reply)];
+ struct cn_msg *cn_reply = (struct cn_msg *) buffer;
+ struct nfdbus_nl_cfg_reply *reply =
+ (struct nfdbus_nl_cfg_reply *)cn_reply->data;
+ int rr;
+
+ memset(buffer, 0, sizeof(buffer));
+ cn_reply->id = msg->id;
+
+ cn_reply->seq = msg->seq;
+ cn_reply->ack = msg->ack + 1;
+ cn_reply->len = sizeof(struct nfdbus_nl_cfg_reply);
+ cn_reply->flags = 0;
+
+ reply->ret_code = ret_code;
+
+ rr = cn_netlink_send(cn_reply, 0, GFP_NOIO);
+ if (rr && rr != -ESRCH)
+ pr_debug("nfdbus: cn_netlink_send()=%d\n", rr);
+}
+
+/**
+ * nfdbus_check_perm - check if a pid is allowed to update match rules
+ * @sockaddr_bus: the socket address of the bus
+ * @pid: the process id that wants to update the match rules set
+ *
+ * Test if a given process id is allowed to update the match rules set
+ * for this bus. Only the process that owns the bus master listen socket
+ * is allowed to update the match rules set for the bus.
+ */
+static bool nfdbus_check_perm(struct sockaddr_bus *sbusname, pid_t pid)
+{
+ struct net *net = get_net_ns_by_pid(pid);
+ struct sock *s;
+ struct bus_address *addr;
+ struct hlist_node *node;
+ int offset = (sbusname->sbus_path[0] == '\0');
+ int path_len = strnlen(sbusname->sbus_path + offset, BUS_PATH_MAX);
+ int len;
+ if (!net)
+ return false;
+
+ len = path_len + 1 + sizeof(__kernel_sa_family_t) +
+ sizeof(struct bus_addr);
+
+ spin_lock(&bus_address_lock);
+
+ hlist_for_each_entry(addr, node, &bus_address_table[hash],
+ table_node) {
+ s = addr->sock;
+
+ if (s->sk_protocol != BUS_PROTO_DBUS)
+ continue;
+
+ if (!net_eq(sock_net(s), net))
+ continue;
+
+ if (addr->len == len &&
+ addr->name->sbus_family == sbusname->sbus_family &&
+ addr->name->sbus_addr.s_addr == BUS_MASTER_ADDR &&
+ bus_same_bus(addr->name, sbusname) &&
+ pid_nr(s->sk_peer_pid) == pid) {
+ spin_unlock(&bus_address_lock);
+ return true;
+ }
+ }
+
+ spin_unlock(&bus_address_lock);
+
+ return false;
+}
+
+static void cn_cmd_cb(struct cn_msg *msg, struct netlink_skb_parms *nsp)
+{
+ struct nfdbus_nl_cfg_req *nlp = (struct nfdbus_nl_cfg_req *)msg->data;
+ struct cn_msg *cn_reply;
+ struct nfdbus_nl_cfg_reply *reply;
+ int retcode, rr;
+ pid_t pid = task_tgid_vnr(current);
+ int reply_size = sizeof(struct cn_msg)
+ + sizeof(struct nfdbus_nl_cfg_reply);
+
+ pr_debug("nfdbus: %s nsp->pid=%d pid=%d\n", __func__, nsp->pid, pid);
+
+ if (!nfdbus_check_perm(&nlp->addr, pid)) {
+ pr_debug(KERN_ERR "nfdbus: pid=%d is not allowed!\n", pid);
+ retcode = EPERM;
+ goto fail;
+ }
+
+ cn_reply = kzalloc(reply_size, GFP_KERNEL);
+ if (!cn_reply) {
+ retcode = ENOMEM;
+ goto fail;
+ }
+ reply = (struct nfdbus_nl_cfg_reply *) cn_reply->data;
+
+ if (msg->len < sizeof(struct nfdbus_nl_cfg_req)) {
+ reply->ret_code = EINVAL;
+ } else if (nlp->cmd == NFDBUS_CMD_ADDMATCH) {
+ struct bus_match_rule *rule;
+ struct bus_match_maker *matchmaker;
+ reply->ret_code = 0;
+
+ if (msg->len == 0)
+ reply->ret_code = EINVAL;
+
+ rule = bus_match_rule_parse(nlp->data, GFP_ATOMIC);
+ if (rule) {
+ matchmaker = find_match_maker(&nlp->addr, true, false);
+ pr_debug("Add match rule for matchmaker %p\n",
+ matchmaker);
+ bus_matchmaker_add_rule(matchmaker, rule);
+ kref_put(&matchmaker->kref, bus_matchmaker_free);
+ } else {
+ reply->ret_code = EINVAL;
+ }
+ } else if (nlp->cmd == NFDBUS_CMD_REMOVEMATCH) {
+ struct bus_match_rule *rule;
+ struct bus_match_maker *matchmaker;
+
+ rule = bus_match_rule_parse(nlp->data, GFP_ATOMIC);
+ matchmaker = find_match_maker(&nlp->addr, false, false);
+ if (!matchmaker) {
+ reply->ret_code = EINVAL;
+ } else {
+ pr_debug("Remove match rule for matchmaker %p\n",
+ matchmaker);
+ bus_matchmaker_remove_rule_by_value(matchmaker, rule);
+ kref_put(&matchmaker->kref, bus_matchmaker_free);
+ reply->ret_code = 0;
+ }
+ bus_match_rule_free(rule);
+
+ } else if (nlp->cmd == NFDBUS_CMD_REMOVEALLMATCH) {
+ struct bus_match_maker *matchmaker;
+
+ matchmaker = find_match_maker(&nlp->addr, false, true);
+ if (!matchmaker) {
+ reply->ret_code = EINVAL;
+ } else {
+ pr_debug("Remove matchmaker %p\n", matchmaker);
+ kref_put(&matchmaker->kref, bus_matchmaker_free);
+ kref_put(&matchmaker->kref, bus_matchmaker_free);
+ reply->ret_code = 0;
+ }
+
+ } else {
+ reply->ret_code = EINVAL;
+ }
+
+ cn_reply->id = msg->id;
+ cn_reply->seq = msg->seq;
+ cn_reply->ack = msg->ack + 1;
+ cn_reply->len = sizeof(struct nfdbus_nl_cfg_reply);
+ cn_reply->flags = 0;
+
+ rr = cn_netlink_reply(cn_reply, nsp->pid, GFP_KERNEL);
+ if (rr && rr != -ESRCH)
+ pr_debug("nfdbus: cn_netlink_send()=%d\n", rr);
+ pr_debug("nfdbus: cn_netlink_reply(pid=%d)=%d\n", nsp->pid, rr);
+
+ kfree(cn_reply);
+ return;
+fail:
+ nfdbus_nl_send_reply(msg, retcode);
+}
+
+static int __init nfdbus_init(void)
+{
+ int err;
+ struct bus_addr master_addr;
+
+ master_addr.s_addr = BUS_MASTER_ADDR;
+ hash = bus_compute_hash(master_addr);
+
+ pr_debug("Loading netfilter_dbus\n");
+
+ /* Install D-Bus netfilter hook */
+ nfho_dbus.hook = dbus_filter;
+ nfho_dbus.hooknum = NF_BUS_SENDING;
+ nfho_dbus.pf = NFPROTO_BUS; /* Do not use PF_BUS, you fool! */
+ nfho_dbus.priority = 0;
+ nfho_dbus.owner = THIS_MODULE;
+ err = nf_register_hook(&nfho_dbus);
+ if (err)
+ return err;
+ pr_debug("Netfilter hook for D-Bus: installed.\n");
+
+ /* Install connector hook */
+ err = cn_add_callback(&cn_cmd_id, "nfdbus", cn_cmd_cb);
+ if (err)
+ goto err_cn_cmd_out;
+ pr_debug("Connector hook: installed.\n");
+
+ return 0;
+
+err_cn_cmd_out:
+ nf_unregister_hook(&nfho_dbus);
+
+ return err;
+}
+
+static void __exit nfdbus_cleanup(void)
+{
+ int i;
+ struct hlist_node *node, *tmp;
+ struct bus_match_maker *matchmaker;
+ nf_unregister_hook(&nfho_dbus);
+
+ cn_del_callback(&cn_cmd_id);
+
+ spin_lock(&matchrules_lock);
+ for (i = 0; i < BUS_HASH_SIZE; i++) {
+ hlist_for_each_entry_safe(matchmaker, node, tmp,
+ &matchrules_table[i], table_node) {
+ hlist_del(&matchmaker->table_node);
+ kref_put(&matchmaker->kref, bus_matchmaker_free);
+ }
+ }
+ spin_unlock(&matchrules_lock);
+
+ pr_debug("Unloading netfilter_dbus\n");
+}
+
+module_init(nfdbus_init);
+module_exit(nfdbus_cleanup);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
+MODULE_ALIAS_NET_PF_PROTO(PF_BUS, BUS_PROTO_DBUS);
diff --git a/net/netfilter/nfdbus/nfdbus.h b/net/netfilter/nfdbus/nfdbus.h
new file mode 100644
index 0000000..477bde3
--- /dev/null
+++ b/net/netfilter/nfdbus/nfdbus.h
@@ -0,0 +1,44 @@
+/*
+ * nfdbus.h Netfilter module for AF_BUS/BUS_PROTO_DBUS.
+ *
+ * Copyright (C) 2012 Collabora Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#ifndef NETFILTER_DBUS_H
+#define NETFILTER_DBUS_H
+
+#include <linux/types.h>
+#include <linux/bus.h>
+
+#define NFDBUS_CMD_ADDMATCH 0x01
+#define NFDBUS_CMD_REMOVEMATCH 0x02
+#define NFDBUS_CMD_REMOVEALLMATCH 0x03
+
+struct nfdbus_nl_cfg_req {
+ __u32 cmd;
+ __u32 len;
+ struct sockaddr_bus addr;
+ __u64 pad;
+ unsigned char data[0];
+};
+
+struct nfdbus_nl_cfg_reply {
+ __u32 ret_code;
+};
+
+#endif /* NETFILTER_DBUS_H */
--
1.7.10
^ permalink raw reply related
* [PATCH net-next 14/15] netfilter: nfdbus: Add D-bus match rule implementation
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
To: netdev, linux-kernel, David S. Miller
Cc: Javier Martinez Canillas, Alban Crequy
In-Reply-To: <1340988354-26981-1-git-send-email-vincent.sanders@collabora.co.uk>
From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
The D-Bus netfilter module needs to decode D-Bus match rules to decide
if a given peer can receive or not a D-Bus message. Add a match rule
implementation to be used by the netfilter D-Bus module.
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
---
net/netfilter/nfdbus/matchrule.c | 1132 ++++++++++++++++++++++++++++++++++++++
net/netfilter/nfdbus/matchrule.h | 82 +++
2 files changed, 1214 insertions(+)
create mode 100644 net/netfilter/nfdbus/matchrule.c
create mode 100644 net/netfilter/nfdbus/matchrule.h
diff --git a/net/netfilter/nfdbus/matchrule.c b/net/netfilter/nfdbus/matchrule.c
new file mode 100644
index 0000000..4106bd5
--- /dev/null
+++ b/net/netfilter/nfdbus/matchrule.c
@@ -0,0 +1,1132 @@
+/*
+ * matchrule.c D-Bus match rule implementation
+ *
+ * Based on signals.c from dbus
+ *
+ * Copyright (C) 2010 Collabora, Ltd.
+ * Copyright (C) 2003, 2005 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#include "matchrule.h"
+
+#include <linux/rbtree.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+
+#include "message.h"
+
+enum bus_match_flags {
+ BUS_MATCH_MESSAGE_TYPE = 1 << 0,
+ BUS_MATCH_INTERFACE = 1 << 1,
+ BUS_MATCH_MEMBER = 1 << 2,
+ BUS_MATCH_SENDER = 1 << 3,
+ BUS_MATCH_DESTINATION = 1 << 4,
+ BUS_MATCH_PATH = 1 << 5,
+ BUS_MATCH_ARGS = 1 << 6,
+ BUS_MATCH_PATH_NAMESPACE = 1 << 7,
+ BUS_MATCH_CLIENT_IS_EAVESDROPPING = 1 << 8
+};
+
+struct bus_match_rule {
+ /* For debugging only*/
+ char *rule_text;
+
+ unsigned int flags; /**< BusMatchFlags */
+
+ int message_type;
+ char *interface;
+ char *member;
+ char *sender;
+ char *destination;
+ char *path;
+
+ unsigned int *arg_lens;
+ char **args;
+ int args_len;
+
+ /* bus_match_rule is attached to rule_pool, either in a simple
+ * double-linked list if the rule does not have any interface, or in a
+ * red-black tree sorted by interface. If several rules can have the
+ * same interface, the first one is attached with struct rb_node and the
+ * next ones are in the list
+ */
+
+ struct rb_node node;
+ /* Doubly-linked non-circular list. If the rule has an interface, it is
+ * in the rb tree and the single head is right here. Otherwise, the
+ * single head is in rule_pool->rules_without_iface. With this data
+ * structure, we don't need any allocation to insert or remove the rule.
+ */
+ struct hlist_head first;
+ struct hlist_node list;
+
+ /* used to delete all names from the tree */
+ struct list_head del_list;
+};
+
+struct dbus_name {
+ struct rb_node node;
+ char *name;
+
+ /* used to delete all names from the tree */
+ struct list_head del_list;
+};
+
+#define BUS_MATCH_ARG_IS_PATH 0x8000000u
+
+#define DBUS_STRING_MAX_LENGTH 1024
+
+/** Max length of a match rule string; to keep people from hosing the
+ * daemon with some huge rule
+ */
+#define DBUS_MAXIMUM_MATCH_RULE_LENGTH 1024
+
+struct bus_match_rule *bus_match_rule_new(gfp_t gfp_flags)
+{
+ struct bus_match_rule *rule;
+
+ rule = kzalloc(sizeof(struct bus_match_rule), gfp_flags);
+ if (rule == NULL)
+ return NULL;
+
+ return rule;
+}
+
+void bus_match_rule_free(struct bus_match_rule *rule)
+{
+ kfree(rule->rule_text);
+ kfree(rule->interface);
+ kfree(rule->member);
+ kfree(rule->sender);
+ kfree(rule->destination);
+ kfree(rule->path);
+ kfree(rule->arg_lens);
+
+ /* can't use dbus_free_string_array() since there
+ * are embedded NULL
+ */
+ if (rule->args) {
+ int i;
+
+ i = 0;
+ while (i < rule->args_len) {
+ kfree(rule->args[i]);
+ ++i;
+ }
+
+ kfree(rule->args);
+ }
+
+ kfree(rule);
+}
+
+static int
+bus_match_rule_set_message_type(struct bus_match_rule *rule,
+ int type,
+ gfp_t gfp_flags)
+{
+ rule->flags |= BUS_MATCH_MESSAGE_TYPE;
+
+ rule->message_type = type;
+
+ return 1;
+}
+
+static int
+bus_match_rule_set_interface(struct bus_match_rule *rule,
+ const char *interface,
+ gfp_t gfp_flags)
+{
+ char *new;
+
+ WARN_ON(!interface);
+
+ new = kstrdup(interface, gfp_flags);
+ if (new == NULL)
+ return 0;
+
+ rule->flags |= BUS_MATCH_INTERFACE;
+ kfree(rule->interface);
+ rule->interface = new;
+
+ return 1;
+}
+
+static int
+bus_match_rule_set_member(struct bus_match_rule *rule,
+ const char *member,
+ gfp_t gfp_flags)
+{
+ char *new;
+
+ WARN_ON(!member);
+
+ new = kstrdup(member, gfp_flags);
+ if (new == NULL)
+ return 0;
+
+ rule->flags |= BUS_MATCH_MEMBER;
+ kfree(rule->member);
+ rule->member = new;
+
+ return 1;
+}
+
+static int
+bus_match_rule_set_sender(struct bus_match_rule *rule,
+ const char *sender,
+ gfp_t gfp_flags)
+{
+ char *new;
+
+ WARN_ON(!sender);
+
+ new = kstrdup(sender, gfp_flags);
+ if (new == NULL)
+ return 0;
+
+ rule->flags |= BUS_MATCH_SENDER;
+ kfree(rule->sender);
+ rule->sender = new;
+
+ return 1;
+}
+
+static int
+bus_match_rule_set_destination(struct bus_match_rule *rule,
+ const char *destination,
+ gfp_t gfp_flags)
+{
+ char *new;
+
+ WARN_ON(!destination);
+
+ new = kstrdup(destination, gfp_flags);
+ if (new == NULL)
+ return 0;
+
+ rule->flags |= BUS_MATCH_DESTINATION;
+ kfree(rule->destination);
+ rule->destination = new;
+
+ return 1;
+}
+
+#define ISWHITE(c) (((c) == ' ') || ((c) == '\t') || ((c) == '\n') || \
+ ((c) == '\r'))
+
+static int find_key(const char *str, int start, char *key, int *value_pos)
+{
+ const char *p;
+ const char *s;
+ const char *key_start;
+ const char *key_end;
+
+ s = str;
+
+ p = s + start;
+
+ while (*p && ISWHITE(*p))
+ ++p;
+
+ key_start = p;
+
+ while (*p && *p != '=' && !ISWHITE(*p))
+ ++p;
+
+ key_end = p;
+
+ while (*p && ISWHITE(*p))
+ ++p;
+
+ if (key_start == key_end) {
+ /* Empty match rules or trailing whitespace are OK */
+ *value_pos = p - s;
+ return 1;
+ }
+
+ if (*p != '=') {
+ pr_warn("Match rule has a key with no subsequent '=' character");
+ return 0;
+ }
+ ++p;
+
+ strncat(key, key_start, key_end - key_start);
+
+ *value_pos = p - s;
+
+ return 1;
+}
+
+static int find_value(const char *str, int start, const char *key, char *value,
+ int *value_end)
+{
+ const char *p;
+ const char *s;
+ char quote_char;
+ int orig_len;
+
+ orig_len = strlen(value);
+
+ s = str;
+
+ p = s + start;
+
+ quote_char = '\0';
+
+ while (*p) {
+ if (quote_char == '\0') {
+ switch (*p) {
+ case '\0':
+ goto done;
+
+ case '\'':
+ quote_char = '\'';
+ goto next;
+
+ case ',':
+ ++p;
+ goto done;
+
+ case '\\':
+ quote_char = '\\';
+ goto next;
+
+ default:
+ strncat(value, p, 1);
+ }
+ } else if (quote_char == '\\') {
+ /*\ only counts as an escape if escaping a quote mark */
+ if (*p != '\'')
+ strncat(value, "\\", 1);
+
+ strncat(value, p, 1);
+
+ quote_char = '\0';
+ } else {
+ if (*p == '\'')
+ quote_char = '\0';
+ else
+ strncat(value, p, 1);
+ }
+
+next:
+ ++p;
+ }
+
+done:
+
+ if (quote_char == '\\')
+ strncat(value, "\\", 1);
+ else if (quote_char == '\'') {
+ pr_warn("Unbalanced quotation marks in match rule");
+ return 0;
+ }
+
+ /* Zero-length values are allowed */
+
+ *value_end = p - s;
+
+ return 1;
+}
+
+/* duplicates aren't allowed so the real legitimate max is only 6 or
+ * so. Leaving extra so we don't have to bother to update it.
+ * FIXME this is sort of busted now with arg matching, but we let
+ * you match on up to 10 args for now
+ */
+#define MAX_RULE_TOKENS 16
+
+/* this is slightly too high level to be termed a "token"
+ * but let's not be pedantic.
+ */
+struct rule_token {
+ char *key;
+ char *value;
+};
+
+static int tokenize_rule(const char *rule_text,
+ struct rule_token tokens[MAX_RULE_TOKENS],
+ gfp_t gfp_flags)
+{
+ int i;
+ int pos;
+ int retval;
+
+ retval = 0;
+
+ i = 0;
+ pos = 0;
+ while (i < MAX_RULE_TOKENS &&
+ pos < strlen(rule_text)) {
+ char *key;
+ char *value;
+
+ key = kzalloc(DBUS_STRING_MAX_LENGTH, gfp_flags);
+ if (!key) {
+ pr_err("Out of memory");
+ return 0;
+ }
+
+ value = kzalloc(DBUS_STRING_MAX_LENGTH, gfp_flags);
+ if (!value) {
+ kfree(key);
+ pr_err("Out of memory");
+ return 0;
+ }
+
+ if (!find_key(rule_text, pos, key, &pos))
+ goto out;
+
+ if (strlen(key) == 0)
+ goto next;
+
+ tokens[i].key = key;
+
+ if (!find_value(rule_text, pos, tokens[i].key, value, &pos))
+ goto out;
+
+ tokens[i].value = value;
+
+next:
+ ++i;
+ }
+
+ retval = 1;
+
+out:
+ if (!retval) {
+ i = 0;
+ while (tokens[i].key || tokens[i].value) {
+ kfree(tokens[i].key);
+ kfree(tokens[i].value);
+ tokens[i].key = NULL;
+ tokens[i].value = NULL;
+ ++i;
+ }
+ }
+
+ return retval;
+}
+
+/*
+ * The format is comma-separated with strings quoted with single quotes
+ * as for the shell (to escape a literal single quote, use '\'').
+ *
+ * type='signal',sender='org.freedesktop.DBus',interface='org.freedesktop.DBus',
+ * member='Foo', path='/bar/foo',destination=':452345.34'
+ *
+ */
+struct bus_match_rule *bus_match_rule_parse(const char *rule_text,
+ gfp_t gfp_flags)
+{
+ struct bus_match_rule *rule;
+ struct rule_token tokens[MAX_RULE_TOKENS+1]; /* NULL termination + 1 */
+ int i;
+
+ if (strlen(rule_text) > DBUS_MAXIMUM_MATCH_RULE_LENGTH) {
+ pr_warn("Match rule text is %ld bytes, maximum is %d",
+ strlen(rule_text),
+ DBUS_MAXIMUM_MATCH_RULE_LENGTH);
+ return NULL;
+ }
+
+ memset(tokens, '\0', sizeof(tokens));
+
+ rule = bus_match_rule_new(gfp_flags);
+ if (rule == NULL) {
+ pr_err("Out of memory");
+ goto failed;
+ }
+
+ rule->rule_text = kstrdup(rule_text, gfp_flags);
+ if (rule->rule_text == NULL) {
+ pr_err("Out of memory");
+ goto failed;
+ }
+
+ if (!tokenize_rule(rule_text, tokens, gfp_flags))
+ goto failed;
+
+ i = 0;
+ while (tokens[i].key != NULL) {
+ const char *key = tokens[i].key;
+ const char *value = tokens[i].value;
+
+ if (strcmp(key, "type") == 0) {
+ int t;
+
+ if (rule->flags & BUS_MATCH_MESSAGE_TYPE) {
+ pr_warn("Key %s specified twice in match rule\n",
+ key);
+ goto failed;
+ }
+
+ t = dbus_message_type_from_string(value);
+
+ if (t == DBUS_MESSAGE_TYPE_INVALID) {
+ pr_warn("Invalid message type (%s) in match rule\n",
+ value);
+ goto failed;
+ }
+
+ if (!bus_match_rule_set_message_type(rule, t,
+ gfp_flags)) {
+ pr_err("Out of memeory");
+ goto failed;
+ }
+ } else if (strcmp(key, "sender") == 0) {
+ if (rule->flags & BUS_MATCH_SENDER) {
+ pr_warn("Key %s specified twice in match rule\n",
+ key);
+ goto failed;
+ }
+
+ if (!bus_match_rule_set_sender(rule, value,
+ gfp_flags)) {
+ pr_err("Out of memeory");
+ goto failed;
+ }
+ } else if (strcmp(key, "interface") == 0) {
+ if (rule->flags & BUS_MATCH_INTERFACE) {
+ pr_warn("Key %s specified twice in match rule\n",
+ key);
+ goto failed;
+ }
+
+ if (!bus_match_rule_set_interface(rule, value,
+ gfp_flags)) {
+ pr_err("Out of memeory");
+ goto failed;
+ }
+ } else if (strcmp(key, "member") == 0) {
+ if (rule->flags & BUS_MATCH_MEMBER) {
+ pr_warn("Key %s specified twice in match rule\n",
+ key);
+ goto failed;
+ }
+
+ if (!bus_match_rule_set_member(rule, value,
+ gfp_flags)) {
+ pr_err("Out of memeory");
+ goto failed;
+ }
+ } else if (strcmp(key, "destination") == 0) {
+ if (rule->flags & BUS_MATCH_DESTINATION) {
+ pr_warn("Key %s specified twice in match rule\n",
+ key);
+ goto failed;
+ }
+
+ if (!bus_match_rule_set_destination(rule, value,
+ gfp_flags)) {
+ pr_err("Out of memeory");
+ goto failed;
+ }
+ } else if (strcmp(key, "eavesdrop") == 0) {
+ if (strcmp(value, "true") == 0) {
+ rule->flags |= BUS_MATCH_CLIENT_IS_EAVESDROPPING;
+ } else if (strcmp(value, "false") == 0) {
+ rule->flags &= ~(BUS_MATCH_CLIENT_IS_EAVESDROPPING);
+ } else {
+ pr_warn("eavesdrop='%s' is invalid, " \
+ "it should be 'true' or 'false'\n",
+ value);
+ goto failed;
+ }
+ } else if (strncmp(key, "arg", 3) != 0) {
+ pr_warn("Unknown key \"%s\" in match rule\n",
+ key);
+ goto failed;
+ }
+
+ ++i;
+ }
+
+ goto out;
+
+failed:
+ if (rule) {
+ bus_match_rule_free(rule);
+ rule = NULL;
+ }
+
+out:
+
+ i = 0;
+ while (tokens[i].key || tokens[i].value) {
+ WARN_ON(i >= MAX_RULE_TOKENS);
+ kfree(tokens[i].key);
+ kfree(tokens[i].value);
+ ++i;
+ }
+
+ return rule;
+}
+
+/* return the match rule containing the hlist_head. It may not be the first
+ * match rule in the list. */
+struct bus_match_rule *match_rule_search(struct rb_root *root,
+ const char *interface)
+{
+ struct rb_node *node = root->rb_node;
+
+ while (node) {
+ struct bus_match_rule *data =
+ container_of(node, struct bus_match_rule, node);
+ int result;
+
+ result = strcmp(interface, data->interface);
+
+ if (result < 0)
+ node = node->rb_left;
+ else if (result > 0)
+ node = node->rb_right;
+ else
+ return data;
+ }
+ return NULL;
+}
+
+void match_rule_insert(struct rb_root *root, struct bus_match_rule *data)
+{
+ struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+ /* Figure out where to put new node */
+ while (*new) {
+ struct bus_match_rule *this =
+ container_of(*new, struct bus_match_rule, node);
+ int result = strcmp(data->interface, this->interface);
+
+ parent = *new;
+ if (result < 0)
+ new = &((*new)->rb_left);
+ else if (result > 0)
+ new = &((*new)->rb_right);
+ else {
+ /* the head is not used */
+ INIT_HLIST_HEAD(&data->first);
+ /* Add it at the beginning of the list */
+ hlist_add_head(&data->list, &this->first);
+ return;
+ }
+ }
+
+ /* this rule is single in its list */
+ INIT_HLIST_HEAD(&data->first);
+ hlist_add_head(&data->list, &data->first);
+
+ /* Add new node and rebalance tree. */
+ rb_link_node(&data->node, parent, new);
+ rb_insert_color(&data->node, root);
+}
+
+struct bus_match_maker *bus_matchmaker_new(gfp_t gfp_flags)
+{
+ struct bus_match_maker *matchmaker;
+ int i;
+
+ matchmaker = kzalloc(sizeof(struct bus_match_maker), gfp_flags);
+ if (matchmaker == NULL)
+ return NULL;
+
+ for (i = DBUS_MESSAGE_TYPE_INVALID; i < DBUS_NUM_MESSAGE_TYPES; i++) {
+ struct rule_pool *p = matchmaker->rules_by_type + i;
+
+ p->rules_by_iface = RB_ROOT;
+ }
+
+ kref_init(&matchmaker->kref);
+
+ return matchmaker;
+}
+
+void bus_matchmaker_free(struct kref *kref)
+{
+ struct bus_match_maker *matchmaker;
+ struct list_head del_list;
+ struct rb_node *n;
+ int i;
+
+ matchmaker = container_of(kref, struct bus_match_maker, kref);
+
+ /* free names */
+ INIT_LIST_HEAD(&del_list);
+ n = matchmaker->names.rb_node;
+ if (n) {
+ struct dbus_name *dbus_name, *cur, *tmp;
+
+ dbus_name = rb_entry(n, struct dbus_name, node);
+ list_add_tail(&dbus_name->del_list, &del_list);
+
+ list_for_each_entry(cur, &del_list, del_list) {
+ struct dbus_name *right, *left;
+ if (cur->node.rb_right) {
+ right = rb_entry(cur->node.rb_right,
+ struct dbus_name, node);
+ list_add_tail(&right->del_list, &del_list);
+ }
+ if (cur->node.rb_left) {
+ left = rb_entry(cur->node.rb_left,
+ struct dbus_name, node);
+ list_add_tail(&left->del_list, &del_list);
+ }
+ }
+ list_for_each_entry_safe(dbus_name, tmp, &del_list, del_list) {
+ kfree(dbus_name->name);
+ list_del(&dbus_name->del_list);
+ kfree(dbus_name);
+ }
+ }
+ WARN_ON(!list_empty_careful(&del_list));
+
+ /* free match rules */
+ for (i = 0 ; i < DBUS_NUM_MESSAGE_TYPES ; i++) {
+ struct rule_pool *pool = matchmaker->rules_by_type + i;
+ struct bus_match_rule *match_rule, *cur, *tmp;
+ struct hlist_node *list_tmp, *list_tmp2;
+
+ /* free match rules from the list */
+ hlist_for_each_entry_safe(cur, list_tmp, list_tmp2,
+ &pool->rules_without_iface, list) {
+ bus_match_rule_free(cur);
+ }
+
+ /* free match rules from the tree */
+ if (!pool->rules_by_iface.rb_node)
+ continue;
+ match_rule = rb_entry(pool->rules_by_iface.rb_node,
+ struct bus_match_rule, node);
+ list_add_tail(&match_rule->del_list, &del_list);
+
+ list_for_each_entry(cur, &del_list, del_list) {
+ struct bus_match_rule *right, *left;
+ if (cur->node.rb_right) {
+ right = rb_entry(cur->node.rb_right,
+ struct bus_match_rule, node);
+ list_add_tail(&right->del_list, &del_list);
+ }
+ if (cur->node.rb_left) {
+ left = rb_entry(cur->node.rb_left,
+ struct bus_match_rule, node);
+ list_add_tail(&left->del_list, &del_list);
+ }
+ }
+ list_for_each_entry_safe(match_rule, tmp, &del_list, del_list) {
+ /* keep a ref during the loop to ensure the first
+ * iteration of the loop does not delete it */
+ hlist_for_each_entry_safe(cur, list_tmp, list_tmp2,
+ &match_rule->first, list) {
+ if (cur != match_rule)
+ bus_match_rule_free(cur);
+ }
+ list_del(&match_rule->del_list);
+ bus_match_rule_free(match_rule);
+ }
+ WARN_ON(!list_empty_careful(&del_list));
+ }
+
+ kfree(matchmaker);
+}
+
+/* The rule can't be modified after it's added. */
+int bus_matchmaker_add_rule(struct bus_match_maker *matchmaker,
+ struct bus_match_rule *rule)
+{
+ struct rule_pool *pool;
+
+ WARN_ON(rule->message_type < 0);
+ WARN_ON(rule->message_type >= DBUS_NUM_MESSAGE_TYPES);
+
+ pool = matchmaker->rules_by_type + rule->message_type;
+
+ if (rule->interface)
+ match_rule_insert(&pool->rules_by_iface, rule);
+ else
+ hlist_add_head(&rule->list, &pool->rules_without_iface);
+
+ return 1;
+}
+
+static int match_rule_equal(struct bus_match_rule *a,
+ struct bus_match_rule *b)
+{
+ if (a->flags != b->flags)
+ return 0;
+
+ if ((a->flags & BUS_MATCH_MESSAGE_TYPE) &&
+ a->message_type != b->message_type)
+ return 0;
+
+ if ((a->flags & BUS_MATCH_MEMBER) &&
+ strcmp(a->member, b->member) != 0)
+ return 0;
+
+ if ((a->flags & BUS_MATCH_PATH) &&
+ strcmp(a->path, b->path) != 0)
+ return 0;
+
+ if ((a->flags & BUS_MATCH_INTERFACE) &&
+ strcmp(a->interface, b->interface) != 0)
+ return 0;
+
+ if ((a->flags & BUS_MATCH_SENDER) &&
+ strcmp(a->sender, b->sender) != 0)
+ return 0;
+
+ if ((a->flags & BUS_MATCH_DESTINATION) &&
+ strcmp(a->destination, b->destination) != 0)
+ return 0;
+
+ if (a->flags & BUS_MATCH_ARGS) {
+ int i;
+
+ if (a->args_len != b->args_len)
+ return 0;
+
+ i = 0;
+ while (i < a->args_len) {
+ int length;
+
+ if ((a->args[i] != NULL) != (b->args[i] != NULL))
+ return 0;
+
+ if (a->arg_lens[i] != b->arg_lens[i])
+ return 0;
+
+ length = a->arg_lens[i] & ~BUS_MATCH_ARG_IS_PATH;
+
+ if (a->args[i] != NULL) {
+ WARN_ON(!b->args[i]);
+ if (memcmp(a->args[i], b->args[i], length) != 0)
+ return 0;
+ }
+
+ ++i;
+ }
+ }
+
+ return 1;
+}
+
+/* Remove a single rule which is equal to the given rule by value */
+void bus_matchmaker_remove_rule_by_value(struct bus_match_maker *matchmaker,
+ struct bus_match_rule *rule)
+{
+ struct rule_pool *pool;
+
+ WARN_ON(rule->message_type < 0);
+ WARN_ON(rule->message_type >= DBUS_NUM_MESSAGE_TYPES);
+
+ pool = matchmaker->rules_by_type + rule->message_type;
+
+ if (rule->interface) {
+ struct bus_match_rule *head =
+ match_rule_search(&pool->rules_by_iface,
+ rule->interface);
+
+ struct hlist_node *cur;
+ struct bus_match_rule *cur_rule;
+ hlist_for_each_entry(cur_rule, cur, &head->first, list) {
+ if (match_rule_equal(cur_rule, rule)) {
+ hlist_del(cur);
+ if (hlist_empty(&head->first))
+ rb_erase(&head->node,
+ &pool->rules_by_iface);
+ bus_match_rule_free(cur_rule);
+ break;
+ }
+ }
+ } else {
+ struct hlist_head *head = &pool->rules_without_iface;
+
+ struct hlist_node *cur;
+ struct bus_match_rule *cur_rule;
+ hlist_for_each_entry(cur_rule, cur, head, list) {
+ if (match_rule_equal(cur_rule, rule)) {
+ hlist_del(cur);
+ bus_match_rule_free(cur_rule);
+ break;
+ }
+ }
+ }
+
+}
+
+static int connection_is_primary_owner(struct bus_match_maker *connection,
+ const char *service_name)
+{
+ struct rb_node *node = connection->names.rb_node;
+
+ if (!service_name)
+ return 0;
+
+ while (node) {
+ struct dbus_name *data = container_of(node, struct dbus_name,
+ node);
+ int result;
+
+ result = strcmp(service_name, data->name);
+
+ if (result < 0)
+ node = node->rb_left;
+ else if (result > 0)
+ node = node->rb_right;
+ else
+ return 1;
+ }
+ return 0;
+}
+
+static int match_rule_matches(struct bus_match_maker *matchmaker,
+ struct bus_match_maker *sender,
+ int eavesdrop,
+ struct bus_match_rule *rule,
+ const struct dbus_message *message)
+{
+ /* Don't consider the rule if this is a eavesdropping match rule
+ * and eavesdropping is not allowed on that peer */
+ if ((rule->flags & BUS_MATCH_CLIENT_IS_EAVESDROPPING) && !eavesdrop)
+ return 0;
+
+ /* Since D-Bus 1.5.6, match rules do not match messages which have a
+ * DESTINATION field unless the match rule specifically requests this
+ * by specifying eavesdrop='true' in the match rule. */
+ if (message->destination &&
+ !(rule->flags & BUS_MATCH_CLIENT_IS_EAVESDROPPING))
+ return 0;
+
+ if (rule->flags & BUS_MATCH_MEMBER) {
+ const char *member;
+
+ WARN_ON(!rule->member);
+
+ member = message->member;
+ if (member == NULL)
+ return 0;
+
+ if (strcmp(member, rule->member) != 0)
+ return 0;
+ }
+
+ if (rule->flags & BUS_MATCH_SENDER) {
+ WARN_ON(!rule->sender);
+
+ if (sender == NULL) {
+ if (strcmp(rule->sender,
+ "org.freedesktop.DBus") != 0)
+ return 0;
+ } else
+ if (!connection_is_primary_owner(sender, rule->sender))
+ return 0;
+ }
+
+ if (rule->flags & BUS_MATCH_DESTINATION) {
+ const char *destination;
+
+ WARN_ON(!rule->destination);
+
+ destination = message->destination;
+ if (destination == NULL)
+ return 0;
+
+ /* This will not just work out of the box because it this is
+ * an eavesdropping match rule. */
+ if (matchmaker == NULL) {
+ if (strcmp(rule->destination,
+ "org.freedesktop.DBus") != 0)
+ return 0;
+ } else
+ if (!connection_is_primary_owner(matchmaker,
+ rule->destination))
+ return 0;
+ }
+
+ if (rule->flags & BUS_MATCH_PATH) {
+ const char *path;
+
+ WARN_ON(!rule->path);
+
+ path = message->path;
+ if (path == NULL)
+ return 0;
+
+ if (strcmp(path, rule->path) != 0)
+ return 0;
+ }
+
+ return 1;
+}
+
+static bool get_recipients_from_list(struct bus_match_maker *matchmaker,
+ struct bus_match_maker *sender,
+ int eavesdrop,
+ struct hlist_head *rules,
+ const struct dbus_message *message)
+{
+ struct hlist_node *cur;
+ struct bus_match_rule *rule;
+
+ if (rules == NULL) {
+ pr_debug("no rules of this type\n");
+ return 0;
+ }
+
+ hlist_for_each_entry(rule, cur, rules, list) {
+ if (match_rule_matches(matchmaker, sender, eavesdrop, rule,
+ message)) {
+ pr_debug("[YES] deliver with match rule \"%s\"\n",
+ rule->rule_text);
+ return 1;
+ } else {
+ pr_debug("[NO] deliver with match rule \"%s\"\n",
+ rule->rule_text);
+ }
+ }
+ pr_debug("[NO] no match rules\n");
+ return 0;
+}
+
+static struct hlist_head
+*bus_matchmaker_get_rules(struct bus_match_maker *matchmaker,
+ int message_type, const char *interface)
+{
+ static struct hlist_head empty = {0,};
+ struct rule_pool *p;
+
+ WARN_ON(message_type < 0);
+ WARN_ON(message_type >= DBUS_NUM_MESSAGE_TYPES);
+
+ p = matchmaker->rules_by_type + message_type;
+
+ if (interface == NULL)
+ return &p->rules_without_iface;
+ else {
+ struct bus_match_rule *rule =
+ match_rule_search(&p->rules_by_iface, interface);
+ if (rule)
+ return &rule->first;
+ else
+ return ∅
+ }
+}
+
+bool bus_matchmaker_filter(struct bus_match_maker *matchmaker,
+ struct bus_match_maker *sender,
+ int eavesdrop,
+ const struct dbus_message *message)
+{
+ int type;
+ const char *interface;
+ struct hlist_head *neither, *just_type, *just_iface, *both;
+
+ type = message->type;
+ interface = message->interface;
+
+ neither = bus_matchmaker_get_rules(matchmaker,
+ DBUS_MESSAGE_TYPE_INVALID, NULL);
+ just_type = just_iface = both = NULL;
+
+ if (interface != NULL)
+ just_iface = bus_matchmaker_get_rules(matchmaker,
+ DBUS_MESSAGE_TYPE_INVALID,
+ interface);
+
+ if (type > DBUS_MESSAGE_TYPE_INVALID && type < DBUS_NUM_MESSAGE_TYPES) {
+ just_type = bus_matchmaker_get_rules(matchmaker, type, NULL);
+
+ if (interface != NULL)
+ both = bus_matchmaker_get_rules(matchmaker, type,
+ interface);
+ }
+
+ if (get_recipients_from_list(matchmaker, sender, eavesdrop, neither,
+ message))
+ return 1;
+ if (get_recipients_from_list(matchmaker, sender, eavesdrop, just_iface,
+ message))
+ return 1;
+ if (get_recipients_from_list(matchmaker, sender, eavesdrop, just_type,
+ message))
+ return 1;
+ if (get_recipients_from_list(matchmaker, sender, eavesdrop, both,
+ message))
+ return 1;
+
+ return connection_is_primary_owner(matchmaker, message->destination);
+}
+
+void bus_matchmaker_add_name(struct bus_match_maker *matchmaker,
+ const char *name,
+ gfp_t gfp_flags)
+{
+ struct dbus_name *dbus_name;
+ struct rb_node **new = &(matchmaker->names.rb_node), *parent = NULL;
+
+ dbus_name = kmalloc(sizeof(struct dbus_name), gfp_flags);
+ if (!dbus_name)
+ return;
+ dbus_name->name = kstrdup(name, gfp_flags);
+ if (!dbus_name->name)
+ return;
+
+ /* Figure out where to put new node */
+ while (*new) {
+ struct dbus_name *this = container_of(*new, struct dbus_name,
+ node);
+ int result = strcmp(dbus_name->name, this->name);
+
+ parent = *new;
+ if (result < 0)
+ new = &((*new)->rb_left);
+ else if (result > 0)
+ new = &((*new)->rb_right);
+ else
+ return;
+ }
+
+ /* Add new node and rebalance tree. */
+ rb_link_node(&dbus_name->node, parent, new);
+ rb_insert_color(&dbus_name->node, &matchmaker->names);
+}
+
+void bus_matchmaker_remove_name(struct bus_match_maker *matchmaker,
+ const char *name)
+{
+ struct rb_node *node = matchmaker->names.rb_node;
+
+ while (node) {
+ struct dbus_name *data = container_of(node, struct dbus_name,
+ node);
+ int result;
+
+ result = strcmp(name, data->name);
+
+ if (result < 0)
+ node = node->rb_left;
+ else if (result > 0)
+ node = node->rb_right;
+ else {
+ rb_erase(&data->node, &matchmaker->names);
+ kfree(data->name);
+ kfree(data);
+ }
+ }
+
+}
+
diff --git a/net/netfilter/nfdbus/matchrule.h b/net/netfilter/nfdbus/matchrule.h
new file mode 100644
index 0000000..e16580c
--- /dev/null
+++ b/net/netfilter/nfdbus/matchrule.h
@@ -0,0 +1,82 @@
+/*
+ * signals.h Bus signal connection implementation
+ *
+ * Copyright (C) 2003 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#ifndef BUS_SIGNALS_H
+#define BUS_SIGNALS_H
+
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+#include <net/af_bus.h>
+
+#include "message.h"
+
+struct bus_match_rule *bus_match_rule_new(gfp_t gfp_flags);
+void bus_match_rule_free(struct bus_match_rule *rule);
+
+struct bus_match_rule *bus_match_rule_parse(const char *rule_text,
+ gfp_t gfp_flags);
+
+struct rule_pool {
+ /* Maps non-NULL interface names to a list of bus_match_rule */
+ struct rb_root rules_by_iface;
+
+ /* List of bus_match_rule which don't specify an interface */
+ struct hlist_head rules_without_iface;
+};
+
+struct bus_match_maker {
+ struct sockaddr_bus addr;
+
+ struct hlist_node table_node;
+
+ /* Pools of rules, grouped by the type of message they match. 0
+ * (DBUS_MESSAGE_TYPE_INVALID) represents rules that do not specify a
+ * message type.
+ */
+ struct rule_pool rules_by_type[DBUS_NUM_MESSAGE_TYPES];
+
+ struct rb_root names;
+
+ struct kref kref;
+};
+
+
+struct bus_match_maker *bus_matchmaker_new(gfp_t gfp_flags);
+void bus_matchmaker_free(struct kref *kref);
+
+int bus_matchmaker_add_rule(struct bus_match_maker *matchmaker,
+ struct bus_match_rule *rule);
+void bus_matchmaker_remove_rule_by_value(struct bus_match_maker *matchmaker,
+ struct bus_match_rule *value);
+
+bool bus_matchmaker_filter(struct bus_match_maker *matchmaker,
+ struct bus_match_maker *sender,
+ int eavesdrop,
+ const struct dbus_message *message);
+
+void bus_matchmaker_add_name(struct bus_match_maker *matchmaker,
+ const char *name, gfp_t gfp_flags);
+void bus_matchmaker_remove_name(struct bus_match_maker *matchmaker,
+ const char *name);
+
+#endif /* BUS_SIGNALS_H */
--
1.7.10
^ permalink raw reply related
* [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
To: netdev, linux-kernel, David S. Miller
Cc: Javier Martinez Canillas, Alban Crequy
In-Reply-To: <1340988354-26981-1-git-send-email-vincent.sanders@collabora.co.uk>
From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
The netfilter D-Bus module needs to parse D-bus messages sent by
applications to decide whether a peer can receive or not a D-Bus
message. Add D-bus message parsing logic to be able to analyze.
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
---
net/netfilter/nfdbus/message.c | 194 ++++++++++++++++++++++++++++++++++++++++
net/netfilter/nfdbus/message.h | 71 +++++++++++++++
2 files changed, 265 insertions(+)
create mode 100644 net/netfilter/nfdbus/message.c
create mode 100644 net/netfilter/nfdbus/message.h
diff --git a/net/netfilter/nfdbus/message.c b/net/netfilter/nfdbus/message.c
new file mode 100644
index 0000000..93c409c
--- /dev/null
+++ b/net/netfilter/nfdbus/message.c
@@ -0,0 +1,194 @@
+/*
+ * message.c Basic D-Bus message parsing
+ *
+ * Copyright (C) 2010-2012 Collabora Ltd
+ * Authors: Alban Crequy <alban.crequy@collabora.co.uk>
+ * Copyright (C) 2002, 2003, 2004, 2005 Red Hat Inc.
+ * Copyright (C) 2002, 2003 CodeFactory AB
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#include <linux/slab.h>
+
+#include "message.h"
+
+int dbus_message_type_from_string(const char *type_str)
+{
+ if (strcmp(type_str, "method_call") == 0)
+ return DBUS_MESSAGE_TYPE_METHOD_CALL;
+ if (strcmp(type_str, "method_return") == 0)
+ return DBUS_MESSAGE_TYPE_METHOD_RETURN;
+ else if (strcmp(type_str, "signal") == 0)
+ return DBUS_MESSAGE_TYPE_SIGNAL;
+ else if (strcmp(type_str, "error") == 0)
+ return DBUS_MESSAGE_TYPE_ERROR;
+ else
+ return DBUS_MESSAGE_TYPE_INVALID;
+}
+
+int dbus_message_parse(unsigned char *message, size_t len,
+ struct dbus_message *dbus_message)
+{
+ unsigned char *cur;
+ int array_header_len;
+
+ dbus_message->message = message;
+
+ if (len < 4 + 4 + 4 + 4 || message[1] == 0 || message[1] > 4)
+ return -EINVAL;
+
+ dbus_message->type = message[1];
+ dbus_message->body_length = *((u32 *)(message + 4));
+ cur = message + 12;
+ array_header_len = *(u32 *)cur;
+ dbus_message->len_offset = 12;
+ cur += 4;
+ while (cur < message + len
+ && cur < message + 12 + 4 + array_header_len) {
+ int header_code;
+ int signature_len;
+ unsigned char *signature;
+ int str_len;
+ unsigned char *str;
+
+ /* D-Bus alignment craziness */
+ if ((cur - message) % 8 != 0)
+ cur += 8 - (cur - message) % 8;
+
+ header_code = *(char *)cur;
+ cur++;
+ signature_len = *(char *)cur;
+ /* All header fields of the current D-Bus spec have a simple
+ * type, either o, s, g, or u */
+ if (signature_len != 1)
+ return -EINVAL;
+ cur++;
+ signature = cur;
+ cur += signature_len + 1;
+ if (signature[0] != 'o' &&
+ signature[0] != 's' &&
+ signature[0] != 'g' &&
+ signature[0] != 'u')
+ return -EINVAL;
+
+ if (signature[0] == 'u') {
+ cur += 4;
+ continue;
+ }
+
+ if (signature[0] != 'g') {
+ str_len = *(u32 *)cur;
+ cur += 4;
+ } else {
+ str_len = *(char *)cur;
+ cur += 1;
+ }
+
+ str = cur;
+ switch (header_code) {
+ case 1:
+ dbus_message->path = str;
+ break;
+ case 2:
+ dbus_message->interface = str;
+ break;
+ case 3:
+ dbus_message->member = str;
+ break;
+ case 6:
+ dbus_message->destination = str;
+ break;
+ case 7:
+ dbus_message->sender = str;
+ break;
+ case 8:
+ dbus_message->body_signature = str;
+ break;
+ }
+ cur += str_len + 1;
+ }
+
+ dbus_message->padding_end = (8 - (cur - message) % 8) % 8;
+
+ /* Jump to body D-Bus alignment craziness */
+ if ((cur - message) % 8 != 0)
+ cur += 8 - (cur - message) % 8;
+ dbus_message->new_header_offset = cur - message;
+
+ if (dbus_message->new_header_offset
+ + dbus_message->body_length != len) {
+ pr_warn("Message truncated? " \
+ "Header %d + Body %d != Length %zd\n",
+ dbus_message->new_header_offset,
+ dbus_message->body_length, len);
+ return -EINVAL;
+ }
+
+ if (dbus_message->body_signature &&
+ dbus_message->body_signature[0] == 's') {
+ int str_len;
+ str_len = *(u32 *)cur;
+ cur += 4;
+ dbus_message->arg0 = cur;
+ cur += str_len + 1;
+ }
+
+ if ((cur - message) % 4 != 0)
+ cur += 4 - (cur - message) % 4;
+
+ if (dbus_message->body_signature &&
+ dbus_message->body_signature[0] == 's' &&
+ dbus_message->body_signature[1] == 's') {
+ int str_len;
+ str_len = *(u32 *)cur;
+ cur += 4;
+ dbus_message->arg1 = cur;
+ cur += str_len + 1;
+ }
+
+ if ((cur - message) % 4 != 0)
+ cur += 4 - (cur - message) % 4;
+
+ if (dbus_message->body_signature &&
+ dbus_message->body_signature[0] == 's' &&
+ dbus_message->body_signature[1] == 's' &&
+ dbus_message->body_signature[2] == 's') {
+ int str_len;
+ str_len = *(u32 *)cur;
+ cur += 4;
+ dbus_message->arg2 = cur;
+ cur += str_len + 1;
+ }
+
+ if ((cur - message) % 4 != 0)
+ cur += 4 - (cur - message) % 4;
+
+ if (dbus_message->type == DBUS_MESSAGE_TYPE_SIGNAL &&
+ dbus_message->sender && dbus_message->path &&
+ dbus_message->interface && dbus_message->member &&
+ dbus_message->arg0 &&
+ strcmp(dbus_message->sender, "org.freedesktop.DBus") == 0 &&
+ strcmp(dbus_message->interface, "org.freedesktop.DBus") == 0 &&
+ strcmp(dbus_message->path, "/org/freedesktop/DBus") == 0) {
+ if (strcmp(dbus_message->member, "NameAcquired") == 0)
+ dbus_message->name_acquired = dbus_message->arg0;
+ else if (strcmp(dbus_message->member, "NameLost") == 0)
+ dbus_message->name_lost = dbus_message->arg0;
+ }
+
+ return 0;
+}
diff --git a/net/netfilter/nfdbus/message.h b/net/netfilter/nfdbus/message.h
new file mode 100644
index 0000000..e3ea4d3
--- /dev/null
+++ b/net/netfilter/nfdbus/message.h
@@ -0,0 +1,71 @@
+/*
+ * message.h Basic D-Bus message parsing
+ *
+ * Copyright (C) 2010 Collabora Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#ifndef DBUS_MESSAGE_H
+#define DBUS_MESSAGE_H
+
+#include <linux/list.h>
+
+#define DBUS_MAXIMUM_MATCH_RULE_LENGTH 1024
+
+/* Types of message */
+
+#define DBUS_MESSAGE_TYPE_INVALID 0
+#define DBUS_MESSAGE_TYPE_METHOD_CALL 1
+#define DBUS_MESSAGE_TYPE_METHOD_RETURN 2
+#define DBUS_MESSAGE_TYPE_ERROR 3
+#define DBUS_MESSAGE_TYPE_SIGNAL 4
+#define DBUS_NUM_MESSAGE_TYPES 5
+
+/* No need to implement a feature-complete parser. It only implement what is
+ * needed by the bus. */
+struct dbus_message {
+ char *message;
+ size_t len;
+ size_t new_len;
+
+ /* direct pointers to the fields */
+ int type;
+ char *path;
+ char *interface;
+ char *member;
+ char *destination;
+ char *sender;
+ char *body_signature;
+ int body_length;
+ char *arg0;
+ char *arg1;
+ char *arg2;
+ char *name_acquired;
+ char *name_lost;
+
+ /* How to add the 'sender' field in the headers */
+ int new_header_offset;
+ int len_offset;
+ int padding_end;
+};
+
+int dbus_message_type_from_string(const char *type_str);
+
+int dbus_message_parse(unsigned char *message, size_t len,
+ struct dbus_message *dbus_message);
+
+#endif /* DBUS_MESSAGE_H */
--
1.7.10
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox