Netdev List
 help / color / mirror / Atom feed
* [PATCH v2 1/5] net: fec: reset phy after pinctrl setup
From: Shawn Guo @ 2012-06-27 13:45 UTC (permalink / raw)
  To: David S. Miller
  Cc: Lothar Waßmann, Florian Fainelli, netdev, linux-arm-kernel,
	devicetree-discuss, Shawn Guo
In-Reply-To: <1340804724-29410-1-git-send-email-shawn.guo@linaro.org>

In case that bootloader or platform initialization does not set up
fec pins, the fec_reset_phy will not be able to succeed, because
fec_reset_phy is currently called before devm_pinctrl_get_select_default.
Move fec_reset_phy call to the place between devm_pinctrl_get_select_default
and fec_enet_init to have above case be taken care.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
---
 drivers/net/ethernet/freescale/fec.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c
index ff7f4c5..e868a37 100644
--- a/drivers/net/ethernet/freescale/fec.c
+++ b/drivers/net/ethernet/freescale/fec.c
@@ -1593,8 +1593,6 @@ fec_probe(struct platform_device *pdev)
 		fep->phy_interface = ret;
 	}
 
-	fec_reset_phy(pdev);
-
 	for (i = 0; i < FEC_IRQ_NUM; i++) {
 		irq = platform_get_irq(pdev, i);
 		if (irq < 0) {
@@ -1634,6 +1632,8 @@ fec_probe(struct platform_device *pdev)
 	clk_prepare_enable(fep->clk_ahb);
 	clk_prepare_enable(fep->clk_ipg);
 
+	fec_reset_phy(pdev);
+
 	ret = fec_enet_init(ndev);
 	if (ret)
 		goto failed_init;
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH v2 2/5] net: fec: enable regulator for fec phy
From: Shawn Guo @ 2012-06-27 13:45 UTC (permalink / raw)
  To: David S. Miller
  Cc: Lothar Waßmann, Florian Fainelli, netdev, linux-arm-kernel,
	devicetree-discuss, Shawn Guo
In-Reply-To: <1340804724-29410-1-git-send-email-shawn.guo@linaro.org>

If bootloader or platform initialization code does not enable the
power supply to fec phy, we need to do it in fec driver before calling
fec_reset_phy to have the phy powered on.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
---
 drivers/net/ethernet/freescale/fec.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c
index e868a37..4dce9e3 100644
--- a/drivers/net/ethernet/freescale/fec.c
+++ b/drivers/net/ethernet/freescale/fec.c
@@ -49,6 +49,7 @@
 #include <linux/of_gpio.h>
 #include <linux/of_net.h>
 #include <linux/pinctrl/consumer.h>
+#include <linux/regulator/consumer.h>
 
 #include <asm/cacheflush.h>
 
@@ -1546,6 +1547,7 @@ fec_probe(struct platform_device *pdev)
 	const struct of_device_id *of_id;
 	static int dev_id;
 	struct pinctrl *pinctrl;
+	struct regulator *reg_phy;
 
 	of_id = of_match_device(fec_dt_ids, &pdev->dev);
 	if (of_id)
@@ -1632,6 +1634,16 @@ fec_probe(struct platform_device *pdev)
 	clk_prepare_enable(fep->clk_ahb);
 	clk_prepare_enable(fep->clk_ipg);
 
+	reg_phy = devm_regulator_get(&pdev->dev, "phy");
+	if (!IS_ERR(reg_phy)) {
+		ret = regulator_enable(reg_phy);
+		if (ret) {
+			dev_err(&pdev->dev,
+				"Failed to enable phy regulator: %d\n", ret);
+			goto failed_regulator;
+		}
+	}
+
 	fec_reset_phy(pdev);
 
 	ret = fec_enet_init(ndev);
@@ -1655,6 +1667,7 @@ failed_register:
 	fec_enet_mii_remove(fep);
 failed_mii_init:
 failed_init:
+failed_regulator:
 	clk_disable_unprepare(fep->clk_ahb);
 	clk_disable_unprepare(fep->clk_ipg);
 failed_pin:
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH v2 3/5] net: fec: use managed function devm_gpio_request_one
From: Shawn Guo @ 2012-06-27 13:45 UTC (permalink / raw)
  To: David S. Miller
  Cc: Lothar Waßmann, Florian Fainelli, netdev, linux-arm-kernel,
	devicetree-discuss, Shawn Guo
In-Reply-To: <1340804724-29410-1-git-send-email-shawn.guo@linaro.org>

Using gpio_request_one will require the probe fail-out call gpio_free,
which is missing currently.  Change to use devm_gpio_request_one to
fix the problem.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
---
 drivers/net/ethernet/freescale/fec.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c
index 4dce9e3..f174070 100644
--- a/drivers/net/ethernet/freescale/fec.c
+++ b/drivers/net/ethernet/freescale/fec.c
@@ -1513,7 +1513,8 @@ static void __devinit fec_reset_phy(struct platform_device *pdev)
 		return;
 
 	phy_reset = of_get_named_gpio(np, "phy-reset-gpios", 0);
-	err = gpio_request_one(phy_reset, GPIOF_OUT_INIT_LOW, "phy-reset");
+	err = devm_gpio_request_one(&pdev->dev, phy_reset,
+				    GPIOF_OUT_INIT_LOW, "phy-reset");
 	if (err) {
 		pr_debug("FEC: failed to get gpio phy-reset: %d\n", err);
 		return;
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH v2 4/5] net: fec: phy-reset-gpios is optional
From: Shawn Guo @ 2012-06-27 13:45 UTC (permalink / raw)
  To: David S. Miller
  Cc: Lothar Waßmann, Florian Fainelli, netdev, linux-arm-kernel,
	devicetree-discuss, Shawn Guo
In-Reply-To: <1340804724-29410-1-git-send-email-shawn.guo@linaro.org>

The phy-reset-gpios is an optional property for fec device tree boot.
Change the binding document to match the driver code.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
---
 Documentation/devicetree/bindings/net/fsl-fec.txt |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt b/Documentation/devicetree/bindings/net/fsl-fec.txt
index 7ab9e1a..0428920 100644
--- a/Documentation/devicetree/bindings/net/fsl-fec.txt
+++ b/Documentation/devicetree/bindings/net/fsl-fec.txt
@@ -7,10 +7,10 @@ Required properties:
 - phy-mode : String, operation mode of the PHY interface.
   Supported values are: "mii", "gmii", "sgmii", "tbi", "rmii",
   "rgmii", "rgmii-id", "rgmii-rxid", "rgmii-txid", "rtbi", "smii".
-- phy-reset-gpios : Should specify the gpio for phy reset
 
 Optional properties:
 - local-mac-address : 6 bytes, mac address
+- phy-reset-gpios : Should specify the gpio for phy reset
 
 Example:
 
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH v2 5/5] net: fec: add phy-reset-duration for device tree probe
From: Shawn Guo @ 2012-06-27 13:45 UTC (permalink / raw)
  To: David S. Miller
  Cc: Lothar Waßmann, Florian Fainelli, netdev, linux-arm-kernel,
	devicetree-discuss, Shawn Guo
In-Reply-To: <1340804724-29410-1-git-send-email-shawn.guo@linaro.org>

Different boards may require different phy reset duration.  Add property
phy-reset-duration for device tree probe, so that the boards that need
a longer reset duration can specify it in their device tree.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
---
 Documentation/devicetree/bindings/net/fsl-fec.txt |    4 ++++
 drivers/net/ethernet/freescale/fec.c              |    8 +++++++-
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt b/Documentation/devicetree/bindings/net/fsl-fec.txt
index 0428920..f7a2fef 100644
--- a/Documentation/devicetree/bindings/net/fsl-fec.txt
+++ b/Documentation/devicetree/bindings/net/fsl-fec.txt
@@ -11,6 +11,10 @@ Required properties:
 Optional properties:
 - local-mac-address : 6 bytes, mac address
 - phy-reset-gpios : Should specify the gpio for phy reset
+- phy-reset-duration : Reset duration in milliseconds.  Should present
+  only if property "phy-reset-gpios" is available.  Missing the property
+  will have the duration be 1 millisecond.  Numbers greater than 1000 are
+  invalid and 1 millisecond will be used instead.
 
 Example:
 
diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c
index f174070..dafd797 100644
--- a/drivers/net/ethernet/freescale/fec.c
+++ b/drivers/net/ethernet/freescale/fec.c
@@ -1507,11 +1507,17 @@ static int __devinit fec_get_phy_mode_dt(struct platform_device *pdev)
 static void __devinit fec_reset_phy(struct platform_device *pdev)
 {
 	int err, phy_reset;
+	int msec = 1;
 	struct device_node *np = pdev->dev.of_node;
 
 	if (!np)
 		return;
 
+	of_property_read_u32(np, "phy-reset-duration", &msec);
+	/* A sane reset duration should not be longer than 1s */
+	if (msec > 1000)
+		msec = 1;
+
 	phy_reset = of_get_named_gpio(np, "phy-reset-gpios", 0);
 	err = devm_gpio_request_one(&pdev->dev, phy_reset,
 				    GPIOF_OUT_INIT_LOW, "phy-reset");
@@ -1519,7 +1525,7 @@ static void __devinit fec_reset_phy(struct platform_device *pdev)
 		pr_debug("FEC: failed to get gpio phy-reset: %d\n", err);
 		return;
 	}
-	msleep(1);
+	msleep(msec);
 	gpio_set_value(phy_reset, 1);
 }
 #else /* CONFIG_OF */
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH v2] sctp: be more restrictive in transport selection on bundled sacks
From: Neil Horman @ 2012-06-27 14:23 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Vlad Yaseivch, David S. Miller
In-Reply-To: <1340742704-2192-1-git-send-email-nhorman@tuxdriver.com>

It was noticed recently that when we send data on a transport, its possible that
we might bundle a sack that arrived on a different transport.  While this isn't
a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
2960:

 An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
   etc.) to the same destination transport address from which it
   received the DATA or control chunk to which it is replying.  This
   rule should also be followed if the endpoint is bundling DATA chunks
   together with the reply chunk.

This patch seeks to correct that.  It restricts the bundling of sack operations
to only those transports which have moved the ctsn of the association forward
since the last sack.  By doing this we guarantee that we only bundle outbound
saks on a transport that has received a chunk since the last sack.  This brings
us into stricter compliance with the RFC.

Vlad had initially suggested that we strictly allow only sack bundling on the
transport that last moved the ctsn forward.  While this makes sense, I was
concerned that doing so prevented us from bundling in the case where we had
received chunks that moved the ctsn on multiple transports.  In those cases, the
RFC allows us to select any of the transports having received chunks to bundle
the sack on.  so I've modified the approach to allow for that, by adding a state
variable to each transport that tracks weather it has moved the ctsn since the
last sack.  This I think keeps our behavior (and performance), close enough to
our current profile that I think we can do this without a sysctl knob to
enable/disable it.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yaseivch <vyasevich@gmail.com>
CC: David S. Miller <davem@davemloft.net>
Reported-by: Michele Baldessari <michele@redhat.com>
Reported-by: sorin serban <sserban@redhat.com>

---
Change Notes:
V2)
	* Removed unused variable as per Dave M. Request
	* Delayed rwnd adjustment until we are sure we will sack (Vlad Y.)
---
 include/net/sctp/structs.h |    5 ++++-
 include/net/sctp/tsnmap.h  |    3 ++-
 net/sctp/output.c          |   10 +++++++---
 net/sctp/sm_make_chunk.c   |    7 +++++++
 net/sctp/sm_sideeffect.c   |    2 +-
 net/sctp/tsnmap.c          |    5 ++++-
 net/sctp/ulpevent.c        |    3 ++-
 net/sctp/ulpqueue.c        |    2 +-
 8 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index e4652fe..712bf09 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -910,7 +910,10 @@ struct sctp_transport {
 		pmtu_pending:1,
 
 		/* Is this structure kfree()able? */
-		malloced:1;
+		malloced:1,
+
+		/* Has this transport moved the ctsn since we last sacked */
+		moved_ctsn:1;
 
 	struct flowi fl;
 
diff --git a/include/net/sctp/tsnmap.h b/include/net/sctp/tsnmap.h
index e7728bc..2c5d2b4 100644
--- a/include/net/sctp/tsnmap.h
+++ b/include/net/sctp/tsnmap.h
@@ -117,7 +117,8 @@ void sctp_tsnmap_free(struct sctp_tsnmap *map);
 int sctp_tsnmap_check(const struct sctp_tsnmap *, __u32 tsn);
 
 /* Mark this TSN as seen.  */
-int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn);
+int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn,
+		     struct sctp_transport *trans);
 
 /* Mark this TSN and all lower as seen. */
 void sctp_tsnmap_skip(struct sctp_tsnmap *map, __u32 tsn);
diff --git a/net/sctp/output.c b/net/sctp/output.c
index f1b7d4b..6c8cb9e 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -240,17 +240,21 @@ static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
 	 */
 	if (sctp_chunk_is_data(chunk) && !pkt->has_sack &&
 	    !pkt->has_cookie_echo) {
-		struct sctp_association *asoc;
 		struct timer_list *timer;
-		asoc = pkt->transport->asoc;
+		struct sctp_association *asoc = pkt->transport->asoc;
+
 		timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
 
 		/* If the SACK timer is running, we have a pending SACK */
 		if (timer_pending(timer)) {
 			struct sctp_chunk *sack;
-			asoc->a_rwnd = asoc->rwnd;
+
+			if (chunk->transport && !chunk->transport->moved_ctsn)
+				return retval;
+
 			sack = sctp_make_sack(asoc);
 			if (sack) {
+				asoc->a_rwnd = asoc->rwnd;
 				retval = sctp_packet_append_chunk(pkt, sack);
 				asoc->peer.sack_needed = 0;
 				if (del_timer(timer))
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index a85eeeb..805babe 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -736,6 +736,7 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
 	__u16 num_gabs, num_dup_tsns;
 	struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map;
 	struct sctp_gap_ack_block gabs[SCTP_MAX_GABS];
+	struct sctp_transport *trans;
 
 	memset(gabs, 0, sizeof(gabs));
 	ctsn = sctp_tsnmap_get_ctsn(map);
@@ -805,6 +806,12 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
 		sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
 				 sctp_tsnmap_get_dups(map));
 
+	/*
+	 * Once we have a sack generated, clear the moved_tsn information
+	 * from all the transports
+	 */
+	list_for_each_entry(trans, &asoc->peer.transport_addr_list, transports)
+		trans->moved_ctsn = 0;
 nodata:
 	return retval;
 }
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index c96d1a8..8716da1 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1268,7 +1268,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
 		case SCTP_CMD_REPORT_TSN:
 			/* Record the arrival of a TSN.  */
 			error = sctp_tsnmap_mark(&asoc->peer.tsn_map,
-						 cmd->obj.u32);
+						 cmd->obj.u32, NULL);
 			break;
 
 		case SCTP_CMD_REPORT_FWDTSN:
diff --git a/net/sctp/tsnmap.c b/net/sctp/tsnmap.c
index f1e40ceb..619c638 100644
--- a/net/sctp/tsnmap.c
+++ b/net/sctp/tsnmap.c
@@ -114,7 +114,8 @@ int sctp_tsnmap_check(const struct sctp_tsnmap *map, __u32 tsn)
 
 
 /* Mark this TSN as seen.  */
-int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
+int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn,
+		     struct sctp_transport *trans)
 {
 	u16 gap;
 
@@ -133,6 +134,8 @@ int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
 		 */
 		map->max_tsn_seen++;
 		map->cumulative_tsn_ack_point++;
+		if (trans)
+			trans->moved_ctsn = 1;
 		map->base_tsn++;
 	} else {
 		/* Either we already have a gap, or about to record a gap, so
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8a84017..33d8947 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -715,7 +715,8 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct sctp_association *asoc,
 	 * can mark it as received so the tsn_map is updated correctly.
 	 */
 	if (sctp_tsnmap_mark(&asoc->peer.tsn_map,
-			     ntohl(chunk->subh.data_hdr->tsn)))
+			     ntohl(chunk->subh.data_hdr->tsn),
+			     chunk->transport))
 		goto fail_mark;
 
 	/* First calculate the padding, so we don't inadvertently
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index f2d1de7..f5a6a4f 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -1051,7 +1051,7 @@ void sctp_ulpq_renege(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk,
 	if (chunk && (freed >= needed)) {
 		__u32 tsn;
 		tsn = ntohl(chunk->subh.data_hdr->tsn);
-		sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn);
+		sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn, chunk->transport);
 		sctp_ulpq_tail_data(ulpq, chunk, gfp);
 
 		sctp_ulpq_partial_delivery(ulpq, chunk, gfp);
-- 
1.7.7.6

^ permalink raw reply related

* Re: [PATCH v2] sctp: be more restrictive in transport selection on bundled sacks
From: Vlad Yasevich @ 2012-06-27 15:10 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, David S. Miller
In-Reply-To: <1340807003-23139-1-git-send-email-nhorman@tuxdriver.com>

On 06/27/2012 10:23 AM, Neil Horman wrote:
> It was noticed recently that when we send data on a transport, its possible that
> we might bundle a sack that arrived on a different transport.  While this isn't
> a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
> 2960:
>
>   An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
>     etc.) to the same destination transport address from which it
>     received the DATA or control chunk to which it is replying.  This
>     rule should also be followed if the endpoint is bundling DATA chunks
>     together with the reply chunk.
>
> This patch seeks to correct that.  It restricts the bundling of sack operations
> to only those transports which have moved the ctsn of the association forward
> since the last sack.  By doing this we guarantee that we only bundle outbound
> saks on a transport that has received a chunk since the last sack.  This brings
> us into stricter compliance with the RFC.
>
> Vlad had initially suggested that we strictly allow only sack bundling on the
> transport that last moved the ctsn forward.  While this makes sense, I was
> concerned that doing so prevented us from bundling in the case where we had
> received chunks that moved the ctsn on multiple transports.  In those cases, the
> RFC allows us to select any of the transports having received chunks to bundle
> the sack on.  so I've modified the approach to allow for that, by adding a state
> variable to each transport that tracks weather it has moved the ctsn since the
> last sack.  This I think keeps our behavior (and performance), close enough to
> our current profile that I think we can do this without a sysctl knob to
> enable/disable it.
>
> Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
> CC: Vlad Yaseivch<vyasevich@gmail.com>
> CC: David S. Miller<davem@davemloft.net>
> Reported-by: Michele Baldessari<michele@redhat.com>
> Reported-by: sorin serban<sserban@redhat.com>
>
> ---
> Change Notes:
> V2)
> 	* Removed unused variable as per Dave M. Request
> 	* Delayed rwnd adjustment until we are sure we will sack (Vlad Y.)
> ---
>   include/net/sctp/structs.h |    5 ++++-
>   include/net/sctp/tsnmap.h  |    3 ++-
>   net/sctp/output.c          |   10 +++++++---
>   net/sctp/sm_make_chunk.c   |    7 +++++++
>   net/sctp/sm_sideeffect.c   |    2 +-
>   net/sctp/tsnmap.c          |    5 ++++-
>   net/sctp/ulpevent.c        |    3 ++-
>   net/sctp/ulpqueue.c        |    2 +-
>   8 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index e4652fe..712bf09 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -910,7 +910,10 @@ struct sctp_transport {
>   		pmtu_pending:1,
>
>   		/* Is this structure kfree()able? */
> -		malloced:1;
> +		malloced:1,
> +
> +		/* Has this transport moved the ctsn since we last sacked */
> +		moved_ctsn:1;
>
>   	struct flowi fl;
>
> diff --git a/include/net/sctp/tsnmap.h b/include/net/sctp/tsnmap.h
> index e7728bc..2c5d2b4 100644
> --- a/include/net/sctp/tsnmap.h
> +++ b/include/net/sctp/tsnmap.h
> @@ -117,7 +117,8 @@ void sctp_tsnmap_free(struct sctp_tsnmap *map);
>   int sctp_tsnmap_check(const struct sctp_tsnmap *, __u32 tsn);
>
>   /* Mark this TSN as seen.  */
> -int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn);
> +int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn,
> +		     struct sctp_transport *trans);
>
>   /* Mark this TSN and all lower as seen. */
>   void sctp_tsnmap_skip(struct sctp_tsnmap *map, __u32 tsn);
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index f1b7d4b..6c8cb9e 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -240,17 +240,21 @@ static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
>   	 */
>   	if (sctp_chunk_is_data(chunk)&&  !pkt->has_sack&&
>   	!pkt->has_cookie_echo) {
> -		struct sctp_association *asoc;
>   		struct timer_list *timer;
> -		asoc = pkt->transport->asoc;
> +		struct sctp_association *asoc = pkt->transport->asoc;
> +
>   		timer =&asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
>
>   		/* If the SACK timer is running, we have a pending SACK */
>   		if (timer_pending(timer)) {
>   			struct sctp_chunk *sack;
> -			asoc->a_rwnd = asoc->rwnd;
> +
> +			if (chunk->transport&&  !chunk->transport->moved_ctsn)
> +				return retval;
> +

I didn't think of this yesterday, but I think it would be much better to 
use pkt->transport here since you are adding the chunk to the packet and 
it will go out on the transport of the packet.  You are also guaranteed 
that pkt->transport is set.

>   			sack = sctp_make_sack(asoc);
>   			if (sack) {
> +				asoc->a_rwnd = asoc->rwnd;
>   				retval = sctp_packet_append_chunk(pkt, sack);
>   				asoc->peer.sack_needed = 0;
>   				if (del_timer(timer))
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index a85eeeb..805babe 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -736,6 +736,7 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
>   	__u16 num_gabs, num_dup_tsns;
>   	struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map;
>   	struct sctp_gap_ack_block gabs[SCTP_MAX_GABS];
> +	struct sctp_transport *trans;
>
>   	memset(gabs, 0, sizeof(gabs));
>   	ctsn = sctp_tsnmap_get_ctsn(map);
> @@ -805,6 +806,12 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
>   		sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
>   				 sctp_tsnmap_get_dups(map));
>
> +	/*
> +	 * Once we have a sack generated, clear the moved_tsn information
> +	 * from all the transports
> +	 */
> +	list_for_each_entry(trans,&asoc->peer.transport_addr_list, transports)
> +		trans->moved_ctsn = 0;
>   nodata:
>   	return retval;
>   }
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index c96d1a8..8716da1 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -1268,7 +1268,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
>   		case SCTP_CMD_REPORT_TSN:
>   			/* Record the arrival of a TSN.  */
>   			error = sctp_tsnmap_mark(&asoc->peer.tsn_map,
> -						 cmd->obj.u32);
> +						 cmd->obj.u32, NULL);
>   			break;
>
>   		case SCTP_CMD_REPORT_FWDTSN:
> diff --git a/net/sctp/tsnmap.c b/net/sctp/tsnmap.c
> index f1e40ceb..619c638 100644
> --- a/net/sctp/tsnmap.c
> +++ b/net/sctp/tsnmap.c
> @@ -114,7 +114,8 @@ int sctp_tsnmap_check(const struct sctp_tsnmap *map, __u32 tsn)
>
>
>   /* Mark this TSN as seen.  */
> -int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
> +int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn,
> +		     struct sctp_transport *trans)
>   {
>   	u16 gap;
>
> @@ -133,6 +134,8 @@ int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
>   		 */
>   		map->max_tsn_seen++;
>   		map->cumulative_tsn_ack_point++;
> +		if (trans)
> +			trans->moved_ctsn = 1;
>   		map->base_tsn++;
>   	} else {
>   		/* Either we already have a gap, or about to record a gap, so
> diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
> index 8a84017..33d8947 100644
> --- a/net/sctp/ulpevent.c
> +++ b/net/sctp/ulpevent.c
> @@ -715,7 +715,8 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct sctp_association *asoc,
>   	 * can mark it as received so the tsn_map is updated correctly.
>   	 */
>   	if (sctp_tsnmap_mark(&asoc->peer.tsn_map,
> -			     ntohl(chunk->subh.data_hdr->tsn)))
> +			     ntohl(chunk->subh.data_hdr->tsn),
> +			     chunk->transport))
>   		goto fail_mark;
>
>   	/* First calculate the padding, so we don't inadvertently
> diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
> index f2d1de7..f5a6a4f 100644
> --- a/net/sctp/ulpqueue.c
> +++ b/net/sctp/ulpqueue.c
> @@ -1051,7 +1051,7 @@ void sctp_ulpq_renege(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk,
>   	if (chunk&&  (freed>= needed)) {
>   		__u32 tsn;
>   		tsn = ntohl(chunk->subh.data_hdr->tsn);
> -		sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn);
> +		sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn, chunk->transport);
>   		sctp_ulpq_tail_data(ulpq, chunk, gfp);
>
>   		sctp_ulpq_partial_delivery(ulpq, chunk, gfp);

Also, I think you need to reset this bit in sctp_transport_reset(). 
Consider a potential association restart after SACKs have been received.

-vlad

^ permalink raw reply

* [patch net-next] virtio_net: allow to change mac when iface is running
From: Jiri Pirko @ 2012-06-27 15:27 UTC (permalink / raw)
  To: netdev; +Cc: davem, rusty, mst, virtualization, brouer

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/virtio_net.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f18149a..36a16d5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -679,11 +679,12 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct virtio_device *vdev = vi->vdev;
-	int ret;
+	struct sockaddr *addr = p;
 
-	ret = eth_mac_addr(dev, p);
-	if (ret)
-		return ret;
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+	memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
+	dev->addr_assign_type &= ~NET_ADDR_RANDOM;
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
 		vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
-- 
1.7.10.4

^ permalink raw reply related

* Fw: [Bug 43901] New: Packet Dropping when attach TBF to PRIO qdisc
From: Stephen Hemminger @ 2012-06-27 15:27 UTC (permalink / raw)
  To: netdev



Begin forwarded message:

Date: Wed, 27 Jun 2012 12:14:28 +0000 (UTC)
From: bugzilla-daemon@bugzilla.kernel.org
To: shemminger@linux-foundation.org
Subject: [Bug 43901] New: Packet Dropping when attach TBF to PRIO qdisc


https://bugzilla.kernel.org/show_bug.cgi?id=43901

           Summary: Packet Dropping when attach TBF to PRIO qdisc
           Product: Networking
           Version: 2.5
    Kernel Version: 3.4.4
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
        AssignedTo: shemminger@linux-foundation.org
        ReportedBy: lucas.bocchi@gmail.com
        Regression: No


Created an attachment (id=74371)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=74371)
Kernel config file

Friends

When I do a upgrade in one server kernel with 2.6.32.5 kernel to actual 3.4.4
kernel, my TBF qdisc attached to a PRIO qdisc stops to work.

The first classes won't have any heavy traffic to justify the starvation of
another classes, but packets are dropped with 

See some information about the bug
Gnu C                  4.7
Gnu make               3.81
binutils               2.22
util-linux             2.20.1
mount                  support
module-init-tools      8
e2fsprogs              1.42.4
PPP                    2.4.5
Linux C Library        2.13
Dynamic linker (ldd)   2.13
Procps                 3.3.2
Net-tools              1.60
Kbd                    1.15.3
Sh-utils               8.13
Modules Loaded         sch_tbf xt_mark xt_mac xt_state xt_connmark cls_fw
cls_u32 sch_prio sha1_ssse3 sha1_generic arc4 ecb ppp_mppe ppp_async crc_ccitt
ppp_generic slhc xt_tcpudp ipt_MASQUERADE iptable_mangle iptable_nat nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_filter ip_tables x_tables
nfsd exportfs nfs nfs_acl auth_rpcgss lockd sunrpc ipv6 loop snd_hda_codec_via
iTCO_wdt snd_hda_intel snd_hda_codec acpi_cpufreq mperf freq_table coretemp
intel_agp intel_gtt rng_core snd_pcm evdev rtc_cmos i2c_i801 snd_page_alloc
i2c_core pcspkr snd_timer processor snd microcode soundcore button ext3 mbcache
jbd sr_mod cdrom ata_generic sd_mod pata_acpi usb_storage uas ata_piix thermal
thermal_sys uhci_hcd libata scsi_mod r8169 mii piix ide_pci_generic ide_core
ehci_hcd


The TC commands
tc qdisc del dev eth0 root
tc qdisc add dev eth0 root handle 01:0 prio bands 5 priomap 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4
tc qdisc add dev eth0 parent 01:01 handle 11:0 pfifo limit 100
tc qdisc add dev eth0 parent 01:02 handle 12:0 pfifo limit 100
tc qdisc add dev eth0 parent 01:03 handle 13:0 pfifo limit 100
tc qdisc add dev eth0 parent 01:04 handle 14:0 tbf rate 3Mbit latency 2s burst
10k
tc qdisc add dev eth0 parent 01:05 handle 15:0 tbf rate 3Mbit latency 2s burst
10k


tc filter add dev eth0 parent 01:00 prio 1 u32 match u8 64 0xff at 8 flowid 1:1
tc filter add dev eth0 parent 01:00 prio 1 u32 match ip protocol 6 0xff match
u8 0x10 0xff at nexthdr+13 flowid 1:1
tc filter add dev eth0 parent 01:00 prio 1 u32 match ip protocol 6 0xff match
u8 0x05 0x0f at 0 match u16 0x0000 0xffc0 at 2 match u8 0x10 0xff at 33 flowid
1:1
tc filter add dev eth0 parent 01:00 prio 2 protocol ip u32 match ip protocol 1
0xff flowid 1:1
tc filter add dev eth0 parent 01:00 prio 3 protocol ip u32 match ip dst
172.16.0.250 flowid 1:2
tc filter add dev eth0 parent 01:00 prio 3 protocol ip u32 match ip dst
172.16.0.58 flowid 1:2
tc filter add dev eth0 parent 01:00 prio 4 protocol ip u32 match ip sport 8291
0xffff flowid 1:3
tc filter add dev eth0 parent 01:00 prio 5 protocol ip handle 1 fw flowid 1:4

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply

* Re: [PATCH 7/7] netlink: Get rid of obsolete rtnetlink macros
From: Stephen Hemminger @ 2012-06-27 15:35 UTC (permalink / raw)
  To: Thomas Graf; +Cc: davem, netdev
In-Reply-To: <a10752bf3eb27e13e61b817664d58a69f9179fc9.1340788373.git.tgraf@suug.ch>

On Wed, 27 Jun 2012 11:36:16 +0200
Thomas Graf <tgraf@suug.ch> wrote:

> Removes all RTA_GET*() and RTA_PUT*() variations, as well as the
> the unused rtattr_strcmp(). Get rid of rtm_get_table() by moving
> it to its only user decnet.
> 
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---

Nack. The RTA_ macros are exported to user space through kernel
headers process. If you do this it will break iproute2 build.

^ permalink raw reply

* Re: HSR: How to set IF_OPER_LOWERLAYERDOWN?
From: Stephen Hemminger @ 2012-06-27 15:40 UTC (permalink / raw)
  To: Arvid Brodin; +Cc: netdev@vger.kernel.org, Javier Boticario, Bruno Ferreira
In-Reply-To: <4FEA64F3.8000400@xdin.com>

On Wed, 27 Jun 2012 01:42:12 +0000
Arvid Brodin <Arvid.Brodin@xdin.com> wrote:

> On 2012-06-27 00:33, Stephen Hemminger wrote:
> > On Tue, 26 Jun 2012 22:28:51 +0000
> > Arvid Brodin <Arvid.Brodin@xdin.com> wrote:
> > 
> >> Hi,
> >>
> >> According to Documentation/networking/operstates.txt a network interface have an
> >> operational state and an administrative state.
> >>
> >> If I understand things correctly the administrative state is the desired state set by
> >> userspace, and the operational state is the actual state which depends on things like the
> >> administrative state, whether a carrier is present, or (for virtual interfaces lite VLAN)
> >> whether the lower interface is available.
> >>
> >>
> >> In the driver I'm writing (for the "HSR" redundancy protocol) a hsr (virtual) interface is
> >> useable as long as any of its (physical) slaves are useable. I.e. the operstate of a hsr
> >> device might be set like this:
> >>
> >> void hsr_set_operstate()
> >> {
> >> 	if (!is_admin_up(hsr_dev)) /* Check IFF_UP */ {
> >> 		set_operstate(hsr_dev, IF_OPER_DOWN);
> >> 		return;
> >> 	}
> >>
> >> 	if (is_operstate_up(slave1) || is_operstate_up(slave2)) /* Check IF_OPER_UP */
> >> 		set_operstate(hsr_dev, IF_OPER_UP);
> >> 	else
> >> 		set_operstate(hsr_dev, IF_OPER_LOWERLAYERDOWN);
> >> }
> > 
> > 
> > According to 802.1X example in documentation to set it down you need to set IF_OPER_DORMANT
> > not IF_OPER_LOWERLAYERDOWN.  Probably a kernel bug in there somwhere.
> > 
> 
> Hmm... if you're referring to the example in Documentation/networking/operstates.txt it
> seems to me that the usage of IF_OPER_DORMANT there is in compliance with RFC2863 - i.e.
> the device is waiting for some kind of handshake to finish. I don't think it has anything
> to do with taking the device down?
> 
> Oh, and I see now that set_operstate() is called from do_setlink() in
> net/core/rtnetlink.c, which means this function is used to set operstate from userspace.
> The limitations then fits with the description in operstates.txt, and this function is
> probably not meant to set an interface's operational state from within the kernel. I wrote
> my own hsr_set_operstate that accepts any values, and it seems to work... (?)
> 
> 
> Is there a way to get notifications when an interface's operational state change?
> NETDEV_CHANGE seems to trigger on carrier change, and NETDEV_UP/DOWN triggers when the
> administrative state changes - is there something similar for operational state?
> 
> (Unfortunately NETDEV_UP/DOWN triggers before the operational state for the interface in
> question changes accordingly, so it's not possible to just check dev->operstate in the
> handler for these messages. NETDEV_CHANGE seems to trigger after the interface's
> operational state has been changed, though.)
> 

I think the original intention was for kernel drivers to use
netif_carrier_on/off rather than manipulating operstate directly.

^ permalink raw reply

* Re: [PATCHv1] net: added support for 40GbE link.
From: Ben Hutchings @ 2012-06-27 16:03 UTC (permalink / raw)
  To: Parav Pandit; +Cc: netdev
In-Reply-To: <054018eb-a85c-4401-a010-7479aa38d2ef@exht1.ad.emulex.com>

On Wed, 2012-06-27 at 19:26 +0530, Parav Pandit wrote:
> 1. removed code replication for tov calculation for 1G, 10G and
> made is common for speed > 1G (1G, 10G, 40G, 100G).
> 2. defines values for #4 different 40G Phys (KR4, LF4, SR4, CR4)
> 
> Signed-off-by: Parav Pandit <parav.pandit@emulex.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [net-next.git 4/4 (v8)] phy: add the EEE support and the way to access to the MMD registers.
From: Ben Hutchings @ 2012-06-27 16:15 UTC (permalink / raw)
  To: Giuseppe CAVALLARO; +Cc: netdev, eric.dumazet, rayagond, davem, yuvalmin
In-Reply-To: <1340258599-3083-5-git-send-email-peppe.cavallaro@st.com>

On Thu, 2012-06-21 at 08:03 +0200, Giuseppe CAVALLARO wrote:
[...]
> v8: fixed a problem in the phy_init_eee return value erroneously added
>     when included the phy_read_status call.

Almost there. :-/

[...]
> +int phy_init_eee(struct phy_device *phydev, bool clk_stop_enable)
> +{
> +	int ret = -EPROTONOSUPPORT;
> +
> +	/* According to 802.3az,the EEE is supported only in full duplex-mode.
> +	 * Also EEE feature is active when core is operating with MII, GMII
> +	 * or RGMII.
> +	 */
> +	if ((phydev->duplex == DUPLEX_FULL) &&
> +	    ((phydev->interface == PHY_INTERFACE_MODE_MII) ||
> +	    (phydev->interface == PHY_INTERFACE_MODE_GMII) ||
> +	    (phydev->interface == PHY_INTERFACE_MODE_RGMII))) {
> +		u16 eee_lp, eee_cap, eee_adv;
> +		u32 lp, cap, adv;
> +		int idx, status;
> +
> +		/* Read phy status to properly get the right settings */
> +		status = phy_read_status(phydev);
> +		if (status)
> +			return status;
> +
> +		/* First check if the EEE ability is supported */
> +		eee_cap = phy_read_mmd_indirect(phydev->bus, MDIO_PCS_EEE_ABLE,
> +						MDIO_MMD_PCS, phydev->addr);
> +		if (eee_cap < 0)
> +			return eee_cap;
> +
> +		cap = phy_eee_to_supported(eee_cap);
> +		if (!cap)
> +			goto eee_exit;
> +
> +		/* Check which link settings negotiated and verify it in
> +		 * the EEE advertising registers.
> +		 */
> +		eee_lp = phy_read_mmd_indirect(phydev->bus, MDIO_AN_EEE_LPABLE,
> +					       MDIO_MMD_AN, phydev->addr);
> +		if (eee_lp < 0)
> +			return eee_lp;
> +
> +		eee_adv = phy_read_mmd_indirect(phydev->bus, MDIO_AN_EEE_ADV,
> +						MDIO_MMD_AN, phydev->addr);
> +		if (eee_adv < 0)
> +			return eee_adv;

You check for eee_{cap,lp,adv} < 0 but that's impossible since the
variables are declared unsigned (u16).  (I wonder what compiler you are
using, as I would expect this to result in a warning.)  I think they
need to be declared int.

> +		adv = phy_eee_to_adv(eee_adv);
> +		lp = phy_eee_to_adv(eee_lp);
> +		idx = phy_find_setting(phydev->speed, phydev->duplex);
> +		if ((lp & adv & settings[idx].setting))
> +			goto eee_exit;
> +
> +		if (clk_stop_enable) {
> +			/* Configure the PHY to stop receiving xMII
> +			 * clock while it is signaling LPI.
> +			 */
> +			u32 val = phy_read_mmd_indirect(phydev->bus, MDIO_CTRL1,
> +							MDIO_MMD_PCS,
> +							phydev->addr);
> +			if (val < 0)
> +				return val;

Same problem here.

[...]
> --- a/include/linux/mdio.h
> +++ b/include/linux/mdio.h
[...]
> @@ -237,9 +241,18 @@
>  #define MDIO_AN_10GBT_STAT_MS		0x4000	/* Master/slave config */
>  #define MDIO_AN_10GBT_STAT_MSFLT	0x8000	/* Master/slave config fault */
>  
> -/* AN EEE Advertisement register. */
> -#define MDIO_AN_EEE_ADV_100TX		0x0002	/* Advertise 100TX EEE cap */
> -#define MDIO_AN_EEE_ADV_1000T		0x0004	/* Advertise 1000T EEE cap */
[...]

This header is exported to userland so I don't think these definitions
can be removed.  But you could comment that they're redundant with the
following MDIO_EEE_* definitions.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH 7/18] netfilter: nfnetlink_log: Move away from NLMSG_PUT().
From: Ben Hutchings @ 2012-06-27 16:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120626.220223.1090653207727010874.davem@davemloft.net>

On Tue, 2012-06-26 at 22:02 -0700, David Miller wrote:
> And use nlmsg_data() while we're here too.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  net/netfilter/nfnetlink_log.c |   29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
> index 3c3cfc0..169ab59 100644
> --- a/net/netfilter/nfnetlink_log.c
> +++ b/net/netfilter/nfnetlink_log.c
> @@ -326,18 +326,20 @@ __nfulnl_send(struct nfulnl_instance *inst)
>  {
>  	int status = -1;
>  
> -	if (inst->qlen > 1)
> -		NLMSG_PUT(inst->skb, 0, 0,
> -			  NLMSG_DONE,
> -			  sizeof(struct nfgenmsg));
> -
> +	if (inst->qlen > 1) {
> +		struct nlmsghdr *nlh = nlmsg_put(inst->skb, 0, 0,
> +						 NLMSG_DONE,
> +						 sizeof(struct nfgenmsg),
> +						 0);
> +		if (!nlh)
> +			goto out;
> +	}
>  	status = nfnetlink_unicast(inst->skb, &init_net, inst->peer_pid,
>  				   MSG_DONTWAIT);
>  
>  	inst->qlen = 0;
>  	inst->skb = NULL;
> -
> -nlmsg_failure:
> +out:
>  	return status;
>  }
[...]

It looks like this also leaks the skb on failure.  At least,
__nfulnl_flush(inst) is expected to dipose of inst->skb.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH v2] sctp: be more restrictive in transport selection on bundled sacks
From: Neil Horman @ 2012-06-27 17:28 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, David S. Miller
In-Reply-To: <4FEB2262.1030803@gmail.com>

On Wed, Jun 27, 2012 at 11:10:26AM -0400, Vlad Yasevich wrote:
> On 06/27/2012 10:23 AM, Neil Horman wrote:
> >It was noticed recently that when we send data on a transport, its possible that
> >we might bundle a sack that arrived on a different transport.  While this isn't
> >a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
> >2960:
> >
> >  An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
> >    etc.) to the same destination transport address from which it
> >    received the DATA or control chunk to which it is replying.  This
> >    rule should also be followed if the endpoint is bundling DATA chunks
> >    together with the reply chunk.
> >
> >This patch seeks to correct that.  It restricts the bundling of sack operations
> >to only those transports which have moved the ctsn of the association forward
> >since the last sack.  By doing this we guarantee that we only bundle outbound
> >saks on a transport that has received a chunk since the last sack.  This brings
> >us into stricter compliance with the RFC.
> >
> >Vlad had initially suggested that we strictly allow only sack bundling on the
> >transport that last moved the ctsn forward.  While this makes sense, I was
> >concerned that doing so prevented us from bundling in the case where we had
> >received chunks that moved the ctsn on multiple transports.  In those cases, the
> >RFC allows us to select any of the transports having received chunks to bundle
> >the sack on.  so I've modified the approach to allow for that, by adding a state
> >variable to each transport that tracks weather it has moved the ctsn since the
> >last sack.  This I think keeps our behavior (and performance), close enough to
> >our current profile that I think we can do this without a sysctl knob to
> >enable/disable it.
> >
> >Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
> >CC: Vlad Yaseivch<vyasevich@gmail.com>
> >CC: David S. Miller<davem@davemloft.net>
> >Reported-by: Michele Baldessari<michele@redhat.com>
> >Reported-by: sorin serban<sserban@redhat.com>
> >
> >---
> >Change Notes:
> >V2)
> >	* Removed unused variable as per Dave M. Request
> >	* Delayed rwnd adjustment until we are sure we will sack (Vlad Y.)
> >---
> >  include/net/sctp/structs.h |    5 ++++-
> >  include/net/sctp/tsnmap.h  |    3 ++-
> >  net/sctp/output.c          |   10 +++++++---
> >  net/sctp/sm_make_chunk.c   |    7 +++++++
> >  net/sctp/sm_sideeffect.c   |    2 +-
> >  net/sctp/tsnmap.c          |    5 ++++-
> >  net/sctp/ulpevent.c        |    3 ++-
> >  net/sctp/ulpqueue.c        |    2 +-
> >  8 files changed, 28 insertions(+), 9 deletions(-)
> >
> >diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> >index e4652fe..712bf09 100644
> >--- a/include/net/sctp/structs.h
> >+++ b/include/net/sctp/structs.h
> >@@ -910,7 +910,10 @@ struct sctp_transport {
> >  		pmtu_pending:1,
> >
> >  		/* Is this structure kfree()able? */
> >-		malloced:1;
> >+		malloced:1,
> >+
> >+		/* Has this transport moved the ctsn since we last sacked */
> >+		moved_ctsn:1;
> >
> >  	struct flowi fl;
> >
> >diff --git a/include/net/sctp/tsnmap.h b/include/net/sctp/tsnmap.h
> >index e7728bc..2c5d2b4 100644
> >--- a/include/net/sctp/tsnmap.h
> >+++ b/include/net/sctp/tsnmap.h
> >@@ -117,7 +117,8 @@ void sctp_tsnmap_free(struct sctp_tsnmap *map);
> >  int sctp_tsnmap_check(const struct sctp_tsnmap *, __u32 tsn);
> >
> >  /* Mark this TSN as seen.  */
> >-int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn);
> >+int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn,
> >+		     struct sctp_transport *trans);
> >
> >  /* Mark this TSN and all lower as seen. */
> >  void sctp_tsnmap_skip(struct sctp_tsnmap *map, __u32 tsn);
> >diff --git a/net/sctp/output.c b/net/sctp/output.c
> >index f1b7d4b..6c8cb9e 100644
> >--- a/net/sctp/output.c
> >+++ b/net/sctp/output.c
> >@@ -240,17 +240,21 @@ static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
> >  	 */
> >  	if (sctp_chunk_is_data(chunk)&&  !pkt->has_sack&&
> >  	!pkt->has_cookie_echo) {
> >-		struct sctp_association *asoc;
> >  		struct timer_list *timer;
> >-		asoc = pkt->transport->asoc;
> >+		struct sctp_association *asoc = pkt->transport->asoc;
> >+
> >  		timer =&asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
> >
> >  		/* If the SACK timer is running, we have a pending SACK */
> >  		if (timer_pending(timer)) {
> >  			struct sctp_chunk *sack;
> >-			asoc->a_rwnd = asoc->rwnd;
> >+
> >+			if (chunk->transport&&  !chunk->transport->moved_ctsn)
> >+				return retval;
> >+
> 
> I didn't think of this yesterday, but I think it would be much
> better to use pkt->transport here since you are adding the chunk to
> the packet and it will go out on the transport of the packet.  You
> are also guaranteed that pkt->transport is set.
> 
I don't think it really matters, as the chunk transport is used to lookup the
packet that we append to, and if the chunk transport is unset, its somewhat
questionable as to weather we should bundle, but if packet->transport is set,
its probably worth it to avoid the extra conditional.

> >  			sack = sctp_make_sack(asoc);
> >  			if (sack) {
> >+				asoc->a_rwnd = asoc->rwnd;
> >  				retval = sctp_packet_append_chunk(pkt, sack);
> >  				asoc->peer.sack_needed = 0;
> >  				if (del_timer(timer))
> >diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> >index a85eeeb..805babe 100644
> >--- a/net/sctp/sm_make_chunk.c
> >+++ b/net/sctp/sm_make_chunk.c
> >@@ -736,6 +736,7 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
> >  	__u16 num_gabs, num_dup_tsns;
> >  	struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map;
> >  	struct sctp_gap_ack_block gabs[SCTP_MAX_GABS];
> >+	struct sctp_transport *trans;
> >
> >  	memset(gabs, 0, sizeof(gabs));
> >  	ctsn = sctp_tsnmap_get_ctsn(map);
> >@@ -805,6 +806,12 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
> >  		sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
> >  				 sctp_tsnmap_get_dups(map));
> >
> >+	/*
> >+	 * Once we have a sack generated, clear the moved_tsn information
> >+	 * from all the transports
> >+	 */
> >+	list_for_each_entry(trans,&asoc->peer.transport_addr_list, transports)
> >+		trans->moved_ctsn = 0;
> >  nodata:
> >  	return retval;
> >  }
> >diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> >index c96d1a8..8716da1 100644
> >--- a/net/sctp/sm_sideeffect.c
> >+++ b/net/sctp/sm_sideeffect.c
> >@@ -1268,7 +1268,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
> >  		case SCTP_CMD_REPORT_TSN:
> >  			/* Record the arrival of a TSN.  */
> >  			error = sctp_tsnmap_mark(&asoc->peer.tsn_map,
> >-						 cmd->obj.u32);
> >+						 cmd->obj.u32, NULL);
> >  			break;
> >
> >  		case SCTP_CMD_REPORT_FWDTSN:
> >diff --git a/net/sctp/tsnmap.c b/net/sctp/tsnmap.c
> >index f1e40ceb..619c638 100644
> >--- a/net/sctp/tsnmap.c
> >+++ b/net/sctp/tsnmap.c
> >@@ -114,7 +114,8 @@ int sctp_tsnmap_check(const struct sctp_tsnmap *map, __u32 tsn)
> >
> >
> >  /* Mark this TSN as seen.  */
> >-int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
> >+int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn,
> >+		     struct sctp_transport *trans)
> >  {
> >  	u16 gap;
> >
> >@@ -133,6 +134,8 @@ int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
> >  		 */
> >  		map->max_tsn_seen++;
> >  		map->cumulative_tsn_ack_point++;
> >+		if (trans)
> >+			trans->moved_ctsn = 1;
> >  		map->base_tsn++;
> >  	} else {
> >  		/* Either we already have a gap, or about to record a gap, so
> >diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
> >index 8a84017..33d8947 100644
> >--- a/net/sctp/ulpevent.c
> >+++ b/net/sctp/ulpevent.c
> >@@ -715,7 +715,8 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct sctp_association *asoc,
> >  	 * can mark it as received so the tsn_map is updated correctly.
> >  	 */
> >  	if (sctp_tsnmap_mark(&asoc->peer.tsn_map,
> >-			     ntohl(chunk->subh.data_hdr->tsn)))
> >+			     ntohl(chunk->subh.data_hdr->tsn),
> >+			     chunk->transport))
> >  		goto fail_mark;
> >
> >  	/* First calculate the padding, so we don't inadvertently
> >diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
> >index f2d1de7..f5a6a4f 100644
> >--- a/net/sctp/ulpqueue.c
> >+++ b/net/sctp/ulpqueue.c
> >@@ -1051,7 +1051,7 @@ void sctp_ulpq_renege(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk,
> >  	if (chunk&&  (freed>= needed)) {
> >  		__u32 tsn;
> >  		tsn = ntohl(chunk->subh.data_hdr->tsn);
> >-		sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn);
> >+		sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn, chunk->transport);
> >  		sctp_ulpq_tail_data(ulpq, chunk, gfp);
> >
> >  		sctp_ulpq_partial_delivery(ulpq, chunk, gfp);
> 
> Also, I think you need to reset this bit in sctp_transport_reset().
> Consider a potential association restart after SACKs have been
> received.
> 
Yeah, thats true.  I'll add that in. 

Thanks!
Neil

> -vlad
> 

^ permalink raw reply

* [PATCH v2 0/4] netdev/phy: 10G PHY support.
From: David Daney @ 2012-06-27 17:33 UTC (permalink / raw)
  To: Grant Likely, Rob Herring,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, David S. Miller,
	netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-mips-6z/3iImG2C8G8FEW9MqTrA,
	afleming-Re5JQEeQqe8AvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, David Daney

From: David Daney <david.daney-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

The only non-cosmetic change from v1 is to pass an additional argument
to get_phy_device() that indicates that the PHY uses 802.3 clause 45
signaling, previously I had been using a high order bit of the addr
parameter for this.

There are also changes from v1 in the code and comment formatting.
These should now be closer to what David Miller prefers.

>From v1:

The existing PHY driver infrastructure supports IEEE 802.3 Clause 22
PHYs used with 10/100/1000MB Ethernet.  For 10G Ethernet, many PHYs
use 802.3 Clause 45.  These patches attempt to add core support for
this as well as a driver for BCM87XX 10G PHY devices.

This is reworked from patches I send about 9 months ago:

http://marc.info/?l=linux-netdev&m=131844282403852

Several of the patches have device tree bindings in them, so the
device tree people get to enjoy them too.

David Daney (4):
  netdev/phy: Handle IEEE802.3 clause 45 Ethernet PHYs
  netdev/phy/of: Handle IEEE802.3 clause 45 Ethernet PHYs in
    of_mdiobus_register()
  netdev/phy/of: Add more methods for binding PHY devices to drivers.
  netdev/phy: Add driver for Broadcom BCM87XX 10G Ethernet PHYs

 .../devicetree/bindings/net/broadcom-bcm87xx.txt   |   29 +++
 Documentation/devicetree/bindings/net/phy.txt      |   12 +-
 drivers/net/phy/Kconfig                            |    5 +
 drivers/net/phy/Makefile                           |    1 +
 drivers/net/phy/bcm87xx.c                          |  238 ++++++++++++++++++++
 drivers/net/phy/mdio_bus.c                         |    9 +-
 drivers/net/phy/phy_device.c                       |  105 ++++++++-
 drivers/of/of_mdio.c                               |   16 +-
 include/linux/phy.h                                |   24 ++-
 9 files changed, 424 insertions(+), 15 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/broadcom-bcm87xx.txt
 create mode 100644 drivers/net/phy/bcm87xx.c

-- 
1.7.2.3

^ permalink raw reply

* [PATCH v2 1/4] netdev/phy: Handle IEEE802.3 clause 45 Ethernet PHYs
From: David Daney @ 2012-06-27 17:33 UTC (permalink / raw)
  To: Grant Likely, Rob Herring, devicetree-discuss, David S. Miller,
	netdev
  Cc: linux-kernel, linux-mips, afleming, David Daney
In-Reply-To: <1340818418-10382-1-git-send-email-ddaney.cavm@gmail.com>

From: David Daney <david.daney@cavium.com>

The IEEE802.3 clause 45 MDIO bus protocol allows for directly
addressing PHY registers using a 21 bit address, and is used by many
10G Ethernet PHYS.  Already existing is the ability of MDIO bus
drivers to use clause 45, with the MII_ADDR_C45 flag.  Here we add
struct phy_c45_device_ids to hold the device identifier registers
present in clause 45. struct phy_device gets a couple of new fields:
c45_ids to hold the identifiers and is_c45 to signal that it is clause
45.

get_phy_device() gets a new parameter is_c45 to indicate that the PHY
device should use the clause 45 protocol, and its callers are adjusted
to pass false.  The follow-on patch to of_mdio.c will pass true where
appropriate.

EXPORT phy_device_create() so that the follow-on patch to of_mdio.c
can use it to create phy devices for PHYs, that have non-standard
device identifier registers, based on the device tree bindings.

Signed-off-by: David Daney <david.daney@cavium.com>
---
 drivers/net/phy/mdio_bus.c   |    2 +-
 drivers/net/phy/phy_device.c |  105 ++++++++++++++++++++++++++++++++++++++---
 drivers/of/of_mdio.c         |    2 +-
 include/linux/phy.h          |   18 +++++++-
 4 files changed, 116 insertions(+), 11 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 31470b0..2cee6d2 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -232,7 +232,7 @@ struct phy_device *mdiobus_scan(struct mii_bus *bus, int addr)
 	struct phy_device *phydev;
 	int err;
 
-	phydev = get_phy_device(bus, addr);
+	phydev = get_phy_device(bus, addr, false);
 	if (IS_ERR(phydev) || phydev == NULL)
 		return phydev;
 
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 18ab0da..ef4cdee 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -152,8 +152,8 @@ int phy_scan_fixups(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(phy_scan_fixups);
 
-static struct phy_device* phy_device_create(struct mii_bus *bus,
-					    int addr, int phy_id)
+struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
+			bool is_c45, struct phy_c45_device_ids *c45_ids)
 {
 	struct phy_device *dev;
 
@@ -174,8 +174,11 @@ static struct phy_device* phy_device_create(struct mii_bus *bus,
 
 	dev->autoneg = AUTONEG_ENABLE;
 
+	dev->is_c45 = is_c45;
 	dev->addr = addr;
 	dev->phy_id = phy_id;
+	if (c45_ids)
+		dev->c45_ids = *c45_ids;
 	dev->bus = bus;
 	dev->dev.parent = bus->parent;
 	dev->dev.bus = &mdio_bus_type;
@@ -200,20 +203,99 @@ static struct phy_device* phy_device_create(struct mii_bus *bus,
 
 	return dev;
 }
+EXPORT_SYMBOL(phy_device_create);
+
+/**
+ * get_phy_c45_ids - reads the specified addr for its 802.3-c45 IDs.
+ * @bus: the target MII bus
+ * @addr: PHY address on the MII bus
+ * @phy_id: where to store the ID retrieved.
+ * @c45_ids: where to store the c45 ID information.
+ *
+ *   If the PHY devices-in-package appears to be valid, it and the
+ *   corresponding identifiers are stored in @c45_ids, zero is stored
+ *   in @phy_id.  Otherwise 0xffffffff is stored in @phy_id.  Returns
+ *   zero on success.
+ *
+ */
+static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
+			   struct phy_c45_device_ids *c45_ids) {
+	int phy_reg;
+	int i, reg_addr;
+	const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
+
+	/* Find first non-zero Devices In package.  Device
+	 * zero is reserved, so don't probe it.
+	 */
+	for (i = 1;
+	     i < num_ids && c45_ids->devices_in_package == 0;
+	     i++) {
+		reg_addr = MII_ADDR_C45 | i << 16 | 6;
+		phy_reg = mdiobus_read(bus, addr, reg_addr);
+		if (phy_reg < 0)
+			return -EIO;
+		c45_ids->devices_in_package = (phy_reg & 0xffff) << 16;
+
+		reg_addr = MII_ADDR_C45 | i << 16 | 5;
+		phy_reg = mdiobus_read(bus, addr, reg_addr);
+		if (phy_reg < 0)
+			return -EIO;
+		c45_ids->devices_in_package |= (phy_reg & 0xffff);
+
+		/* If mostly Fs, there is no device there,
+		 * let's get out of here.
+		 */
+		if ((c45_ids->devices_in_package & 0x1fffffff) == 0x1fffffff) {
+			*phy_id = 0xffffffff;
+			return 0;
+		}
+	}
+
+	/* Now probe Device Identifiers for each device present. */
+	for (i = 1; i < num_ids; i++) {
+		if (!(c45_ids->devices_in_package & (1 << i)))
+			continue;
+
+		reg_addr = MII_ADDR_C45 | i << 16 | MII_PHYSID1;
+		phy_reg = mdiobus_read(bus, addr, reg_addr);
+		if (phy_reg < 0)
+			return -EIO;
+		c45_ids->device_ids[i] = (phy_reg & 0xffff) << 16;
+
+		reg_addr = MII_ADDR_C45 | i << 16 | MII_PHYSID2;
+		phy_reg = mdiobus_read(bus, addr, reg_addr);
+		if (phy_reg < 0)
+			return -EIO;
+		c45_ids->device_ids[i] |= (phy_reg & 0xffff);
+	}
+	*phy_id = 0;
+	return 0;
+}
 
 /**
  * get_phy_id - reads the specified addr for its ID.
  * @bus: the target MII bus
  * @addr: PHY address on the MII bus
  * @phy_id: where to store the ID retrieved.
+ * @is_c45: If true the PHY uses the 802.3 clause 45 protocol
+ * @c45_ids: where to store the c45 ID information.
+ *
+ * Description: In the case of a 802.3-c22 PHY, reads the ID registers
+ *   of the PHY at @addr on the @bus, stores it in @phy_id and returns
+ *   zero on success.
+ *
+ *   In the case of a 802.3-c45 PHY, get_phy_c45_ids() is invoked, and
+ *   its return value is in turn returned.
  *
- * Description: Reads the ID registers of the PHY at @addr on the
- *   @bus, stores it in @phy_id and returns zero on success.
  */
-static int get_phy_id(struct mii_bus *bus, int addr, u32 *phy_id)
+static int get_phy_id(struct mii_bus *bus, int addr, u32 *phy_id,
+		      bool is_c45, struct phy_c45_device_ids *c45_ids)
 {
 	int phy_reg;
 
+	if (is_c45)
+		return get_phy_c45_ids(bus, addr, phy_id, c45_ids);
+
 	/* Grab the bits from PHYIR1, and put them
 	 * in the upper half */
 	phy_reg = mdiobus_read(bus, addr, MII_PHYSID1);
@@ -238,17 +320,19 @@ static int get_phy_id(struct mii_bus *bus, int addr, u32 *phy_id)
  * get_phy_device - reads the specified PHY device and returns its @phy_device struct
  * @bus: the target MII bus
  * @addr: PHY address on the MII bus
+ * @is_c45: If true the PHY uses the 802.3 clause 45 protocol
  *
  * Description: Reads the ID registers of the PHY at @addr on the
  *   @bus, then allocates and returns the phy_device to represent it.
  */
-struct phy_device * get_phy_device(struct mii_bus *bus, int addr)
+struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45)
 {
 	struct phy_device *dev = NULL;
 	u32 phy_id;
+	struct phy_c45_device_ids c45_ids = {0};
 	int r;
 
-	r = get_phy_id(bus, addr, &phy_id);
+	r = get_phy_id(bus, addr, &phy_id, is_c45, &c45_ids);
 	if (r)
 		return ERR_PTR(r);
 
@@ -256,7 +340,7 @@ struct phy_device * get_phy_device(struct mii_bus *bus, int addr)
 	if ((phy_id & 0x1fffffff) == 0x1fffffff)
 		return NULL;
 
-	dev = phy_device_create(bus, addr, phy_id);
+	dev = phy_device_create(bus, addr, phy_id, is_c45, &c45_ids);
 
 	return dev;
 }
@@ -449,6 +533,11 @@ static int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 	/* Assume that if there is no driver, that it doesn't
 	 * exist, and we should use the genphy driver. */
 	if (NULL == d->driver) {
+		if (phydev->is_c45) {
+			pr_err("No driver for phy %x\n", phydev->phy_id);
+			return -ENODEV;
+		}
+
 		d->driver = &genphy_driver.driver;
 
 		err = d->driver->probe(d);
diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index 2574abd..6c24cad 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -79,7 +79,7 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np)
 				mdio->irq[addr] = PHY_POLL;
 		}
 
-		phy = get_phy_device(mdio, addr);
+		phy = get_phy_device(mdio, addr, false);
 		if (!phy || IS_ERR(phy)) {
 			dev_err(&mdio->dev, "error probing PHY at address %i\n",
 				addr);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index c291cae..597d05d 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -243,6 +243,15 @@ enum phy_state {
 	PHY_RESUMING
 };
 
+/**
+ * struct phy_c45_device_ids - 802.3-c45 Device Identifiers
+ * @devices_in_package: Bit vector of devices present.
+ * @device_ids: The device identifer for each present device.
+ */
+struct phy_c45_device_ids {
+	u32 devices_in_package;
+	u32 device_ids[8];
+};
 
 /* phy_device: An instance of a PHY
  *
@@ -250,6 +259,8 @@ enum phy_state {
  * bus: Pointer to the bus this PHY is on
  * dev: driver model device structure for this PHY
  * phy_id: UID for this device found during discovery
+ * c45_ids: 802.3-c45 Device Identifers if is_c45.
+ * is_c45:  Set to true if this phy uses clause 45 addressing.
  * state: state of the PHY for management purposes
  * dev_flags: Device-specific flags used by the PHY driver.
  * addr: Bus address of PHY
@@ -285,6 +296,9 @@ struct phy_device {
 
 	u32 phy_id;
 
+	struct phy_c45_device_ids c45_ids;
+	bool is_c45;
+
 	enum phy_state state;
 
 	u32 dev_flags;
@@ -480,7 +494,9 @@ static inline int phy_write(struct phy_device *phydev, u32 regnum, u16 val)
 	return mdiobus_write(phydev->bus, phydev->addr, regnum, val);
 }
 
-struct phy_device* get_phy_device(struct mii_bus *bus, int addr);
+struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
+		bool is_c45, struct phy_c45_device_ids *c45_ids);
+struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45);
 int phy_device_register(struct phy_device *phy);
 int phy_init_hw(struct phy_device *phydev);
 struct phy_device * phy_attach(struct net_device *dev,
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH v2 2/4] netdev/phy/of: Handle IEEE802.3 clause 45 Ethernet PHYs in of_mdiobus_register()
From: David Daney @ 2012-06-27 17:33 UTC (permalink / raw)
  To: Grant Likely, Rob Herring, devicetree-discuss, David S. Miller,
	netdev
  Cc: linux-kernel, linux-mips, afleming, David Daney
In-Reply-To: <1340818418-10382-1-git-send-email-ddaney.cavm@gmail.com>

From: David Daney <david.daney@cavium.com>

Define two new "compatible" values for Ethernet
PHYs. "ethernet-phy-ieee802.3-c22" and "ethernet-phy-ieee802.3-c45"
are used to indicate a PHY uses the corresponding protocol.

If a PHY is "compatible" with "ethernet-phy-ieee802.3-c45", we
indicate this so that get_phy_device() can properly probe the device.

If get_phy_device() fails, it was probably due to failing the probe of
the PHY identifier registers.  Since we have the device tree telling
us the PHY exists, go ahead and add it anyhow with a phy_id of zero.
There may be a driver match based on the "compatible" property.

Signed-off-by: David Daney <david.daney@cavium.com>
---
 Documentation/devicetree/bindings/net/phy.txt |   12 +++++++++++-
 drivers/of/of_mdio.c                          |   16 ++++++++++++----
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
index bb8c742..7cd18fb 100644
--- a/Documentation/devicetree/bindings/net/phy.txt
+++ b/Documentation/devicetree/bindings/net/phy.txt
@@ -14,10 +14,20 @@ Required properties:
  - linux,phandle :  phandle for this node; likely referenced by an
    ethernet controller node.
 
+Optional Properties:
+
+- compatible: Compatible list, may contain
+  "ethernet-phy-ieee802.3-c22" or "ethernet-phy-ieee802.3-c45" for
+  PHYs that implement IEEE802.3 clause 22 or IEEE802.3 clause 45
+  specifications. If neither of these are specified, the default is to
+  assume clause 22. The compatible list may also contain other
+  elements.
+
 Example:
 
 ethernet-phy@0 {
-	linux,phandle = <2452000>
+	compatible = "ethernet-phy-ieee802.3-c22";
+	linux,phandle = <2452000>;
 	interrupt-parent = <40000>;
 	interrupts = <35 1>;
 	reg = <0>;
diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index 6c24cad..8e6c25f 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -57,6 +57,7 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np)
 		const __be32 *paddr;
 		u32 addr;
 		int len;
+		bool is_c45;
 
 		/* A PHY must have a reg property in the range [0-31] */
 		paddr = of_get_property(child, "reg", &len);
@@ -79,11 +80,18 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np)
 				mdio->irq[addr] = PHY_POLL;
 		}
 
-		phy = get_phy_device(mdio, addr, false);
+		is_c45 = of_device_is_compatible(child,
+						 "ethernet-phy-ieee802.3-c45");
+		phy = get_phy_device(mdio, addr, is_c45);
+
 		if (!phy || IS_ERR(phy)) {
-			dev_err(&mdio->dev, "error probing PHY at address %i\n",
-				addr);
-			continue;
+			phy = phy_device_create(mdio, addr, 0, false, NULL);
+			if (!phy || IS_ERR(phy)) {
+				dev_err(&mdio->dev,
+					"error creating PHY at address %i\n",
+					addr);
+				continue;
+			}
 		}
 
 		/* Associate the OF node with the device structure so it
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH v2 3/4] netdev/phy/of: Add more methods for binding PHY devices to drivers.
From: David Daney @ 2012-06-27 17:33 UTC (permalink / raw)
  To: Grant Likely, Rob Herring, devicetree-discuss, David S. Miller,
	netdev
  Cc: linux-kernel, linux-mips, afleming, David Daney
In-Reply-To: <1340818418-10382-1-git-send-email-ddaney.cavm@gmail.com>

From: David Daney <david.daney@cavium.com>

Allow PHY drivers to supply their own device matching function
(match_phy_device()), or to be matched OF compatible properties.

PHYs following IEEE802.3 clause 45 have more than one device
identifier constants, which breaks the default device matching code.
Other 10G PHYs don't follow the standard manufacturer/device
identifier register layout standards, but they do use the standard
MDIO bus protocols for register access.  Both of these require
adjustments to the PHY driver to device matching code.

If the there is an of_node associated with such a PHY, we can match it
to its driver using the "compatible" properties, just as we do with
certain platform devices.  If the "compatible" property match fails,
first check if there is a driver supplied matching function, and if
not fall back to the existing identifier matching rules.

Signed-off-by: David Daney <david.daney@cavium.com>
---
 drivers/net/phy/mdio_bus.c |    7 +++++++
 include/linux/phy.h        |    6 ++++++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 2cee6d2..170eb41 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -25,6 +25,7 @@
 #include <linux/init.h>
 #include <linux/delay.h>
 #include <linux/device.h>
+#include <linux/of_device.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/skbuff.h>
@@ -308,6 +309,12 @@ static int mdio_bus_match(struct device *dev, struct device_driver *drv)
 	struct phy_device *phydev = to_phy_device(dev);
 	struct phy_driver *phydrv = to_phy_driver(drv);
 
+	if (of_driver_match_device(dev, drv))
+		return 1;
+
+	if (phydrv->match_phy_device)
+		return phydrv->match_phy_device(phydev);
+
 	return ((phydrv->phy_id & phydrv->phy_id_mask) ==
 		(phydev->phy_id & phydrv->phy_id_mask));
 }
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 597d05d..7eac80a 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -426,6 +426,12 @@ struct phy_driver {
 	/* Clears up any memory if needed */
 	void (*remove)(struct phy_device *phydev);
 
+	/* Returns true if this is a suitable driver for the given
+	 * phydev.  If NULL, matching is based on phy_id and
+	 * phy_id_mask.
+	 */
+	int (*match_phy_device)(struct phy_device *phydev);
+
 	/* Handles ethtool queries for hardware time stamping. */
 	int (*ts_info)(struct phy_device *phydev, struct ethtool_ts_info *ti);
 
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH v2 4/4] netdev/phy: Add driver for Broadcom BCM87XX 10G Ethernet PHYs
From: David Daney @ 2012-06-27 17:33 UTC (permalink / raw)
  To: Grant Likely, Rob Herring, devicetree-discuss, David S. Miller,
	netdev
  Cc: linux-kernel, linux-mips, afleming, David Daney
In-Reply-To: <1340818418-10382-1-git-send-email-ddaney.cavm@gmail.com>

From: David Daney <david.daney@cavium.com>

Add a driver for BCM8706 and BCM8727 devices.  These are a 10Gig PHYs
which use MII_ADDR_C45 addressing.  They are always 10G full duplex, so
there is no autonegotiation.  All we do is report link state and send
interrupts when it changes.

If the PHY has a device tree of_node associated with it, the
"broadcom,c45-reg-init" property is used to supply register
initialization values when config_init() is called.

Signed-off-by: David Daney <david.daney@cavium.com>
---
 .../devicetree/bindings/net/broadcom-bcm87xx.txt   |   29 +++
 drivers/net/phy/Kconfig                            |    5 +
 drivers/net/phy/Makefile                           |    1 +
 drivers/net/phy/bcm87xx.c                          |  238 ++++++++++++++++++++
 4 files changed, 273 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/broadcom-bcm87xx.txt
 create mode 100644 drivers/net/phy/bcm87xx.c

diff --git a/Documentation/devicetree/bindings/net/broadcom-bcm87xx.txt b/Documentation/devicetree/bindings/net/broadcom-bcm87xx.txt
new file mode 100644
index 0000000..7c86d5e
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/broadcom-bcm87xx.txt
@@ -0,0 +1,29 @@
+The Broadcom BCM87XX devices are a family of 10G Ethernet PHYs.  They
+have these bindings in addition to the standard PHY bindings.
+
+Compatible: Should contain "broadcom,bcm8706" or "broadcom,bcm8727" and
+            "ethernet-phy-ieee802.3-c45"
+
+Optional Properties:
+
+- broadcom,c45-reg-init : one of more sets of 4 cells.  The first cell
+  is the MDIO Manageable Device (MMD) address, the second a register
+  address within the MMD, the third cell contains a mask to be ANDed
+  with the existing register value, and the fourth cell is ORed with
+  he result to yield the new register value.  If the third cell has a
+  value of zero, no read of the existing value is performed.
+
+Example:
+
+	ethernet-phy@5 {
+		reg = <5>;
+		compatible = "broadcom,bcm8706", "ethernet-phy-ieee802.3-c45";
+		interrupt-parent = <&gpio>;
+		interrupts = <12 8>; /* Pin 12, active low */
+		/*
+		 * Set PMD Digital Control Register for
+		 * GPIO[1] Tx/Rx
+		 * GPIO[0] R64 Sync Acquired
+		 */
+		broadcom,c45-reg-init = <1 0xc808 0xff8f 0x70>;
+	};
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 944cdfb..3090dc6 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -67,6 +67,11 @@ config BCM63XX_PHY
 	---help---
 	  Currently supports the 6348 and 6358 PHYs.
 
+config BCM87XX_PHY
+	tristate "Driver for Broadcom BCM8706 and BCM8727 PHYs"
+	help
+	  Currently supports the BCM8706 and BCM8727 10G Ethernet PHYs.
+
 config ICPLUS_PHY
 	tristate "Drivers for ICPlus PHYs"
 	---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index f51af68..6d2dc6c 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_SMSC_PHY)		+= smsc.o
 obj-$(CONFIG_VITESSE_PHY)	+= vitesse.o
 obj-$(CONFIG_BROADCOM_PHY)	+= broadcom.o
 obj-$(CONFIG_BCM63XX_PHY)	+= bcm63xx.o
+obj-$(CONFIG_BCM87XX_PHY)	+= bcm87xx.o
 obj-$(CONFIG_ICPLUS_PHY)	+= icplus.o
 obj-$(CONFIG_REALTEK_PHY)	+= realtek.o
 obj-$(CONFIG_LSI_ET1011C_PHY)	+= et1011c.o
diff --git a/drivers/net/phy/bcm87xx.c b/drivers/net/phy/bcm87xx.c
new file mode 100644
index 0000000..f5f0562
--- /dev/null
+++ b/drivers/net/phy/bcm87xx.c
@@ -0,0 +1,238 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2011 - 2012 Cavium, Inc.
+ */
+
+#include <linux/module.h>
+#include <linux/phy.h>
+#include <linux/of.h>
+
+#define PHY_ID_BCM8706	0x0143bdc1
+#define PHY_ID_BCM8727	0x0143bff0
+
+#define BCM87XX_PMD_RX_SIGNAL_DETECT	(MII_ADDR_C45 | 0x1000a)
+#define BCM87XX_10GBASER_PCS_STATUS	(MII_ADDR_C45 | 0x30020)
+#define BCM87XX_XGXS_LANE_STATUS	(MII_ADDR_C45 | 0x40018)
+
+#define BCM87XX_LASI_CONTROL (MII_ADDR_C45 | 0x39002)
+#define BCM87XX_LASI_STATUS (MII_ADDR_C45 | 0x39005)
+
+#if IS_ENABLED(CONFIG_OF_MDIO)
+/* Set and/or override some configuration registers based on the
+ * marvell,reg-init property stored in the of_node for the phydev.
+ *
+ * broadcom,c45-reg-init = <devid reg mask value>,...;
+ *
+ * There may be one or more sets of <devid reg mask value>:
+ *
+ * devid: which sub-device to use.
+ * reg: the register.
+ * mask: if non-zero, ANDed with existing register value.
+ * value: ORed with the masked value and written to the regiser.
+ *
+ */
+static int bcm87xx_of_reg_init(struct phy_device *phydev)
+{
+	const __be32 *paddr;
+	const __be32 *paddr_end;
+	int len, ret;
+
+	if (!phydev->dev.of_node)
+		return 0;
+
+	paddr = of_get_property(phydev->dev.of_node,
+				"broadcom,c45-reg-init", &len);
+	if (!paddr)
+		return 0;
+
+	paddr_end = paddr + (len /= sizeof(*paddr));
+
+	ret = 0;
+
+	while (paddr + 3 < paddr_end) {
+		u16 devid	= be32_to_cpup(paddr++);
+		u16 reg		= be32_to_cpup(paddr++);
+		u16 mask	= be32_to_cpup(paddr++);
+		u16 val_bits	= be32_to_cpup(paddr++);
+		int val;
+		u32 regnum = MII_ADDR_C45 | (devid << 16) | reg;
+		val = 0;
+		if (mask) {
+			val = phy_read(phydev, regnum);
+			if (val < 0) {
+				ret = val;
+				goto err;
+			}
+			val &= mask;
+		}
+		val |= val_bits;
+
+		ret = phy_write(phydev, regnum, val);
+		if (ret < 0)
+			goto err;
+	}
+err:
+	return ret;
+}
+#else
+static int bcm87xx_of_reg_init(struct phy_device *phydev)
+{
+	return 0;
+}
+#endif /* CONFIG_OF_MDIO */
+
+static int bcm87xx_config_init(struct phy_device *phydev)
+{
+	phydev->supported = SUPPORTED_10000baseR_FEC;
+	phydev->advertising = ADVERTISED_10000baseR_FEC;
+	phydev->state = PHY_NOLINK;
+
+	bcm87xx_of_reg_init(phydev);
+
+	return 0;
+}
+
+static int bcm87xx_config_aneg(struct phy_device *phydev)
+{
+	return -EINVAL;
+}
+
+static int bcm87xx_read_status(struct phy_device *phydev)
+{
+	int rx_signal_detect;
+	int pcs_status;
+	int xgxs_lane_status;
+
+	rx_signal_detect = phy_read(phydev, BCM87XX_PMD_RX_SIGNAL_DETECT);
+	if (rx_signal_detect < 0)
+		return rx_signal_detect;
+
+	if ((rx_signal_detect & 1) == 0)
+		goto no_link;
+
+	pcs_status = phy_read(phydev, BCM87XX_10GBASER_PCS_STATUS);
+	if (pcs_status < 0)
+		return pcs_status;
+
+	if ((pcs_status & 1) == 0)
+		goto no_link;
+
+	xgxs_lane_status = phy_read(phydev, BCM87XX_XGXS_LANE_STATUS);
+	if (xgxs_lane_status < 0)
+		return xgxs_lane_status;
+
+	if ((xgxs_lane_status & 0x1000) == 0)
+		goto no_link;
+
+	phydev->speed = 10000;
+	phydev->link = 1;
+	phydev->duplex = 1;
+	return 0;
+
+no_link:
+	phydev->link = 0;
+	return 0;
+}
+
+static int bcm87xx_config_intr(struct phy_device *phydev)
+{
+	int reg, err;
+
+	reg = phy_read(phydev, BCM87XX_LASI_CONTROL);
+
+	if (reg < 0)
+		return reg;
+
+	if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
+		reg |= 1;
+	else
+		reg &= ~1;
+
+	err = phy_write(phydev, BCM87XX_LASI_CONTROL, reg);
+	return err;
+}
+
+static int bcm87xx_did_interrupt(struct phy_device *phydev)
+{
+	int reg;
+
+	reg = phy_read(phydev, BCM87XX_LASI_STATUS);
+
+	if (reg < 0) {
+		dev_err(&phydev->dev,
+			"Error: Read of BCM87XX_LASI_STATUS failed: %d\n", reg);
+		return 0;
+	}
+	return (reg & 1) != 0;
+}
+
+static int bcm87xx_ack_interrupt(struct phy_device *phydev)
+{
+	/* Reading the LASI status clears it. */
+	bcm87xx_did_interrupt(phydev);
+	return 0;
+}
+
+static int bcm8706_match_phy_device(struct phy_device *phydev)
+{
+	return phydev->c45_ids.device_ids[4] == PHY_ID_BCM8706;
+}
+
+static int bcm8727_match_phy_device(struct phy_device *phydev)
+{
+	return phydev->c45_ids.device_ids[4] == PHY_ID_BCM8727;
+}
+
+static struct phy_driver bcm8706_driver = {
+	.phy_id		= PHY_ID_BCM8706,
+	.phy_id_mask	= 0xffffffff,
+	.name		= "Broadcom BCM8706",
+	.flags		= PHY_HAS_INTERRUPT,
+	.config_init	= bcm87xx_config_init,
+	.config_aneg	= bcm87xx_config_aneg,
+	.read_status	= bcm87xx_read_status,
+	.ack_interrupt	= bcm87xx_ack_interrupt,
+	.config_intr	= bcm87xx_config_intr,
+	.did_interrupt	= bcm87xx_did_interrupt,
+	.match_phy_device = bcm8706_match_phy_device,
+	.driver		= { .owner = THIS_MODULE },
+};
+
+static struct phy_driver bcm8727_driver = {
+	.phy_id		= PHY_ID_BCM8727,
+	.phy_id_mask	= 0xffffffff,
+	.name		= "Broadcom BCM8727",
+	.flags		= PHY_HAS_INTERRUPT,
+	.config_init	= bcm87xx_config_init,
+	.config_aneg	= bcm87xx_config_aneg,
+	.read_status	= bcm87xx_read_status,
+	.ack_interrupt	= bcm87xx_ack_interrupt,
+	.config_intr	= bcm87xx_config_intr,
+	.did_interrupt	= bcm87xx_did_interrupt,
+	.match_phy_device = bcm8727_match_phy_device,
+	.driver		= { .owner = THIS_MODULE },
+};
+
+static int __init bcm87xx_init(void)
+{
+	int ret;
+
+	ret = phy_driver_register(&bcm8706_driver);
+	if (ret)
+		goto err;
+
+	ret = phy_driver_register(&bcm8727_driver);
+err:
+	return ret;
+}
+module_init(bcm87xx_init);
+
+static void __exit bcm87xx_exit(void)
+{
+	phy_driver_unregister(&bcm8706_driver);
+	phy_driver_unregister(&bcm8727_driver);
+}
+module_exit(bcm87xx_exit);
-- 
1.7.2.3

^ permalink raw reply related

* Re: [PATCH] Build fix in drivers/net/wireless/ath/ath9k/main.c
From: John W. Linville @ 2012-06-27 18:27 UTC (permalink / raw)
  To: Arvydas Sidorenko
  Cc: mcgrof, jouni, vthiagar, senthilb, linux-wireless, ath9k-devel,
	netdev, linux-kernel
In-Reply-To: <1340824779-5157-1-git-send-email-asido4@gmail.com>

Thanks, but Sujith beat you to the same fix.

John

On Wed, Jun 27, 2012 at 09:19:39PM +0200, Arvydas Sidorenko wrote:
> Commit fad29cd2f59949581050a937786c2c9bc78b2f04 broke the build if
> no CONFIG_ATH9K_BTCOEX_SUPPORT is enabled.
> 
> Signed-off-by: Arvydas Sidorenko <asido4@gmail.com>
> ---
>  drivers/net/wireless/ath/ath9k/main.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
> index c14cf5a..e4e73f0 100644
> --- a/drivers/net/wireless/ath/ath9k/main.c
> +++ b/drivers/net/wireless/ath/ath9k/main.c
> @@ -151,8 +151,10 @@ static void __ath_cancel_work(struct ath_softc *sc)
>  	cancel_delayed_work_sync(&sc->tx_complete_work);
>  	cancel_delayed_work_sync(&sc->hw_pll_work);
>  
> +#ifdef CONFIG_ATH9K_BTCOEX_SUPPORT
>  	if (ath9k_hw_mci_is_enabled(sc->sc_ah))
>  		cancel_work_sync(&sc->mci_work);
> +#endif
>  }
>  
>  static void ath_cancel_work(struct ath_softc *sc)
> -- 
> 1.7.8.6
> 
> 

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: [PATCH v2] l2tp: use per-cpu variables for u64_stats updates
From: Eric Dumazet @ 2012-06-27 19:03 UTC (permalink / raw)
  To: Tom Parkin; +Cc: netdev, David.Laight, James Chapman
In-Reply-To: <1340798457-28270-1-git-send-email-tparkin@katalix.com>

On Wed, 2012-06-27 at 13:00 +0100, Tom Parkin wrote:
> This patch fixes a race condition in l2tp when updating tunnel and
> session statistics.  Previously it was possible for multiple threads
> to concurrently call u64_stats_update*(), which lead to statistics
> readers blocking forever.
> 
> This race was discovered on an AMD64 SMP machine running a 32bit
> kernel.  Running "ip l2tp" while sending data over an Ethernet
> pseudowire resulted in an occasional soft lockup in
> u64_stats_fetch_begin() called from l2tp_nl_session_send().
> 
> For safe lockless update of l2tp stats, data is now stored in per-cpu
> variables.  These per-cpu datasets are then summed at read time via.
> an extra helper function l2tp_stats_copy() which has been added to
> l2tp_core.c.
> 

Do we really need 64bits stats on 32bit arches for l2tp ?

> Signed-off-by: Tom Parkin <tparkin@katalix.com>
> Signed-off-by: James Chapman <jchapman@katalix.com>
> ---
>  net/l2tp/l2tp_core.c    |  286 ++++++++++++++++++++++++++++-------------------
>  net/l2tp/l2tp_core.h    |   44 ++++++--
>  net/l2tp/l2tp_debugfs.c |   42 ++++---
>  net/l2tp/l2tp_netlink.c |   64 ++++-------
>  net/l2tp/l2tp_ppp.c     |   75 ++++++++-----
>  5 files changed, 301 insertions(+), 210 deletions(-)
> 
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 32b2155..ab2ffc0 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -320,6 +320,43 @@ struct l2tp_tunnel *l2tp_tunnel_find_nth(struct net *net, int nth)
>  }
>  EXPORT_SYMBOL_GPL(l2tp_tunnel_find_nth);
>  
> +/*
> + * Sum tunnel/session statistics across all CPUs
> + */
> +int l2tp_stats_copy(struct l2tp_stats *cpustats, struct l2tp_stats *dest)
> +{
> +	int i;
> +	unsigned int start;
> +
> +	if (!cpustats || !dest)
> +		return 1;
> +
> +	memset(dest, 0, sizeof(struct l2tp_stats));
> +
> +	for_each_possible_cpu(i) {
> +		struct l2tp_stats *stats = per_cpu_ptr(cpustats, i);
> +
> +		do {
> +			start = u64_stats_fetch_begin(&stats->tx.syncp);
> +			dest->tx.packets += stats->tx.packets;
> +			dest->tx.bytes += stats->tx.bytes;
> +			dest->tx.errors += stats->tx.errors;

You cant do the sum in 'dest', since if loop is restarted, you'll have
accumulation of all values.

> +		} while (u64_stats_fetch_retry(&stats->tx.syncp, start));
> +
> +		do {
> +			start = u64_stats_fetch_begin(&stats->rx.syncp);
> +			dest->rx.packets += stats->rx.packets;
> +			dest->rx.bytes += stats->rx.bytes;
> +			dest->rx.errors += stats->rx.errors;
> +			dest->rx.seq_discards += stats->rx.seq_discards;
> +			dest->rx.oos_packets += stats->rx.oos_packets;

same problem

> +		} while (u64_stats_fetch_retry(&stats->rx.syncp, start));
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(l2tp_stats_copy);
> +


>  
>  static void pppol2tp_copy_stats(struct pppol2tp_ioc_stats *dest,
> -				struct l2tp_stats *stats)
> +				struct l2tp_stats *cpustats)
>  {
> -	dest->tx_packets = stats->tx_packets;
> -	dest->tx_bytes = stats->tx_bytes;
> -	dest->tx_errors = stats->tx_errors;
> -	dest->rx_packets = stats->rx_packets;
> -	dest->rx_bytes = stats->rx_bytes;
> -	dest->rx_seq_discards = stats->rx_seq_discards;
> -	dest->rx_oos_packets = stats->rx_oos_packets;
> -	dest->rx_errors = stats->rx_errors;
> +	struct l2tp_stats tmp;
> +
> +	if (0 != l2tp_stats_copy(cpustats, &tmp))
> +		return;

	if (l2tp_stats_copy(cpustats, &tmp) != 0)
		return;

> +
> +	dest->tx_packets = tmp.tx.packets;
> +	dest->tx_bytes = tmp.tx.bytes;
> +	dest->tx_errors = tmp.tx.errors;
> +	dest->rx_packets = tmp.rx.packets;
> +	dest->rx_bytes = tmp.rx.bytes;
> +	dest->rx_seq_discards = tmp.rx.seq_discards;
> +	dest->rx_oos_packets = tmp.rx.oos_packets;
> +	dest->rx_errors = tmp.rx.errors;
>  

^ permalink raw reply

* [PATCH] Build fix in drivers/net/wireless/ath/ath9k/main.c
From: Arvydas Sidorenko @ 2012-06-27 19:19 UTC (permalink / raw)
  To: mcgrof
  Cc: jouni, vthiagar, senthilb, linville, linux-wireless, ath9k-devel,
	netdev, linux-kernel, asido4

Commit fad29cd2f59949581050a937786c2c9bc78b2f04 broke the build if
no CONFIG_ATH9K_BTCOEX_SUPPORT is enabled.

Signed-off-by: Arvydas Sidorenko <asido4@gmail.com>
---
 drivers/net/wireless/ath/ath9k/main.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index c14cf5a..e4e73f0 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -151,8 +151,10 @@ static void __ath_cancel_work(struct ath_softc *sc)
 	cancel_delayed_work_sync(&sc->tx_complete_work);
 	cancel_delayed_work_sync(&sc->hw_pll_work);
 
+#ifdef CONFIG_ATH9K_BTCOEX_SUPPORT
 	if (ath9k_hw_mci_is_enabled(sc->sc_ah))
 		cancel_work_sync(&sc->mci_work);
+#endif
 }
 
 static void ath_cancel_work(struct ath_softc *sc)
-- 
1.7.8.6

^ permalink raw reply related

* Re: [PATCH v2] sctp: be more restrictive in transport selection on bundled sacks
From: Vlad Yasevich @ 2012-06-27 19:44 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, David S. Miller, linux-sctp
In-Reply-To: <20120627172802.GA9250@neilslaptop.think-freely.org>

On 06/27/2012 01:28 PM, Neil Horman wrote:
> On Wed, Jun 27, 2012 at 11:10:26AM -0400, Vlad Yasevich wrote:
>> On 06/27/2012 10:23 AM, Neil Horman wrote:
>>> It was noticed recently that when we send data on a transport, its possible that
>>> we might bundle a sack that arrived on a different transport.  While this isn't
>>> a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
>>> 2960:
>>>
>>>   An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
>>>     etc.) to the same destination transport address from which it
>>>     received the DATA or control chunk to which it is replying.  This
>>>     rule should also be followed if the endpoint is bundling DATA chunks
>>>     together with the reply chunk.
>>>
>>> This patch seeks to correct that.  It restricts the bundling of sack operations
>>> to only those transports which have moved the ctsn of the association forward
>>> since the last sack.  By doing this we guarantee that we only bundle outbound
>>> saks on a transport that has received a chunk since the last sack.  This brings
>>> us into stricter compliance with the RFC.
>>>
>>> Vlad had initially suggested that we strictly allow only sack bundling on the
>>> transport that last moved the ctsn forward.  While this makes sense, I was
>>> concerned that doing so prevented us from bundling in the case where we had
>>> received chunks that moved the ctsn on multiple transports.  In those cases, the
>>> RFC allows us to select any of the transports having received chunks to bundle
>>> the sack on.  so I've modified the approach to allow for that, by adding a state
>>> variable to each transport that tracks weather it has moved the ctsn since the
>>> last sack.  This I think keeps our behavior (and performance), close enough to
>>> our current profile that I think we can do this without a sysctl knob to
>>> enable/disable it.
>>>
>>> Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
>>> CC: Vlad Yaseivch<vyasevich@gmail.com>
>>> CC: David S. Miller<davem@davemloft.net>
>>> Reported-by: Michele Baldessari<michele@redhat.com>
>>> Reported-by: sorin serban<sserban@redhat.com>
>>>
>>> ---
>>> Change Notes:
>>> V2)
>>> 	* Removed unused variable as per Dave M. Request
>>> 	* Delayed rwnd adjustment until we are sure we will sack (Vlad Y.)
>>> ---
>>>   include/net/sctp/structs.h |    5 ++++-
>>>   include/net/sctp/tsnmap.h  |    3 ++-
>>>   net/sctp/output.c          |   10 +++++++---
>>>   net/sctp/sm_make_chunk.c   |    7 +++++++
>>>   net/sctp/sm_sideeffect.c   |    2 +-
>>>   net/sctp/tsnmap.c          |    5 ++++-
>>>   net/sctp/ulpevent.c        |    3 ++-
>>>   net/sctp/ulpqueue.c        |    2 +-
>>>   8 files changed, 28 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
>>> index e4652fe..712bf09 100644
>>> --- a/include/net/sctp/structs.h
>>> +++ b/include/net/sctp/structs.h
>>> @@ -910,7 +910,10 @@ struct sctp_transport {
>>>   		pmtu_pending:1,
>>>
>>>   		/* Is this structure kfree()able? */
>>> -		malloced:1;
>>> +		malloced:1,
>>> +
>>> +		/* Has this transport moved the ctsn since we last sacked */
>>> +		moved_ctsn:1;
>>>
>>>   	struct flowi fl;
>>>
>>> diff --git a/include/net/sctp/tsnmap.h b/include/net/sctp/tsnmap.h
>>> index e7728bc..2c5d2b4 100644
>>> --- a/include/net/sctp/tsnmap.h
>>> +++ b/include/net/sctp/tsnmap.h
>>> @@ -117,7 +117,8 @@ void sctp_tsnmap_free(struct sctp_tsnmap *map);
>>>   int sctp_tsnmap_check(const struct sctp_tsnmap *, __u32 tsn);
>>>
>>>   /* Mark this TSN as seen.  */
>>> -int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn);
>>> +int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn,
>>> +		     struct sctp_transport *trans);
>>>
>>>   /* Mark this TSN and all lower as seen. */
>>>   void sctp_tsnmap_skip(struct sctp_tsnmap *map, __u32 tsn);
>>> diff --git a/net/sctp/output.c b/net/sctp/output.c
>>> index f1b7d4b..6c8cb9e 100644
>>> --- a/net/sctp/output.c
>>> +++ b/net/sctp/output.c
>>> @@ -240,17 +240,21 @@ static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
>>>   	 */
>>>   	if (sctp_chunk_is_data(chunk)&&   !pkt->has_sack&&
>>>   	!pkt->has_cookie_echo) {
>>> -		struct sctp_association *asoc;
>>>   		struct timer_list *timer;
>>> -		asoc = pkt->transport->asoc;
>>> +		struct sctp_association *asoc = pkt->transport->asoc;
>>> +
>>>   		timer =&asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
>>>
>>>   		/* If the SACK timer is running, we have a pending SACK */
>>>   		if (timer_pending(timer)) {
>>>   			struct sctp_chunk *sack;
>>> -			asoc->a_rwnd = asoc->rwnd;
>>> +
>>> +			if (chunk->transport&&   !chunk->transport->moved_ctsn)
>>> +				return retval;
>>> +
>>
>> I didn't think of this yesterday, but I think it would be much
>> better to use pkt->transport here since you are adding the chunk to
>> the packet and it will go out on the transport of the packet.  You
>> are also guaranteed that pkt->transport is set.
>>
> I don't think it really matters, as the chunk transport is used to lookup the
> packet that we append to, and if the chunk transport is unset, its somewhat
> questionable as to weather we should bundle, but if packet->transport is set,
> its probably worth it to avoid the extra conditional.
>

Just looked at the code flow.  chunk->transport may not be set until the 
end of sctp_packet_append_chunk.  For new data, transport may not be 
set.  For retransmitted data, transport is set to last transport data 
was sent on.  So, we could be looking at the wrong transport.  What you 
are trying to decided is if the current transport we will be used can 
take the SACK, but you may not be looking at the current transport. 
Looking at packet->transport is the correct thing to do.

-vlad

>>>   			sack = sctp_make_sack(asoc);
>>>   			if (sack) {
>>> +				asoc->a_rwnd = asoc->rwnd;
>>>   				retval = sctp_packet_append_chunk(pkt, sack);
>>>   				asoc->peer.sack_needed = 0;
>>>   				if (del_timer(timer))
>>> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
>>> index a85eeeb..805babe 100644
>>> --- a/net/sctp/sm_make_chunk.c
>>> +++ b/net/sctp/sm_make_chunk.c
>>> @@ -736,6 +736,7 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
>>>   	__u16 num_gabs, num_dup_tsns;
>>>   	struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map;
>>>   	struct sctp_gap_ack_block gabs[SCTP_MAX_GABS];
>>> +	struct sctp_transport *trans;
>>>
>>>   	memset(gabs, 0, sizeof(gabs));
>>>   	ctsn = sctp_tsnmap_get_ctsn(map);
>>> @@ -805,6 +806,12 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
>>>   		sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
>>>   				 sctp_tsnmap_get_dups(map));
>>>
>>> +	/*
>>> +	 * Once we have a sack generated, clear the moved_tsn information
>>> +	 * from all the transports
>>> +	 */
>>> +	list_for_each_entry(trans,&asoc->peer.transport_addr_list, transports)
>>> +		trans->moved_ctsn = 0;
>>>   nodata:
>>>   	return retval;
>>>   }
>>> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
>>> index c96d1a8..8716da1 100644
>>> --- a/net/sctp/sm_sideeffect.c
>>> +++ b/net/sctp/sm_sideeffect.c
>>> @@ -1268,7 +1268,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
>>>   		case SCTP_CMD_REPORT_TSN:
>>>   			/* Record the arrival of a TSN.  */
>>>   			error = sctp_tsnmap_mark(&asoc->peer.tsn_map,
>>> -						 cmd->obj.u32);
>>> +						 cmd->obj.u32, NULL);
>>>   			break;
>>>
>>>   		case SCTP_CMD_REPORT_FWDTSN:
>>> diff --git a/net/sctp/tsnmap.c b/net/sctp/tsnmap.c
>>> index f1e40ceb..619c638 100644
>>> --- a/net/sctp/tsnmap.c
>>> +++ b/net/sctp/tsnmap.c
>>> @@ -114,7 +114,8 @@ int sctp_tsnmap_check(const struct sctp_tsnmap *map, __u32 tsn)
>>>
>>>
>>>   /* Mark this TSN as seen.  */
>>> -int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
>>> +int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn,
>>> +		     struct sctp_transport *trans)
>>>   {
>>>   	u16 gap;
>>>
>>> @@ -133,6 +134,8 @@ int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
>>>   		 */
>>>   		map->max_tsn_seen++;
>>>   		map->cumulative_tsn_ack_point++;
>>> +		if (trans)
>>> +			trans->moved_ctsn = 1;
>>>   		map->base_tsn++;
>>>   	} else {
>>>   		/* Either we already have a gap, or about to record a gap, so
>>> diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
>>> index 8a84017..33d8947 100644
>>> --- a/net/sctp/ulpevent.c
>>> +++ b/net/sctp/ulpevent.c
>>> @@ -715,7 +715,8 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct sctp_association *asoc,
>>>   	 * can mark it as received so the tsn_map is updated correctly.
>>>   	 */
>>>   	if (sctp_tsnmap_mark(&asoc->peer.tsn_map,
>>> -			     ntohl(chunk->subh.data_hdr->tsn)))
>>> +			     ntohl(chunk->subh.data_hdr->tsn),
>>> +			     chunk->transport))
>>>   		goto fail_mark;
>>>
>>>   	/* First calculate the padding, so we don't inadvertently
>>> diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
>>> index f2d1de7..f5a6a4f 100644
>>> --- a/net/sctp/ulpqueue.c
>>> +++ b/net/sctp/ulpqueue.c
>>> @@ -1051,7 +1051,7 @@ void sctp_ulpq_renege(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk,
>>>   	if (chunk&&   (freed>= needed)) {
>>>   		__u32 tsn;
>>>   		tsn = ntohl(chunk->subh.data_hdr->tsn);
>>> -		sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn);
>>> +		sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn, chunk->transport);
>>>   		sctp_ulpq_tail_data(ulpq, chunk, gfp);
>>>
>>>   		sctp_ulpq_partial_delivery(ulpq, chunk, gfp);
>>
>> Also, I think you need to reset this bit in sctp_transport_reset().
>> Consider a potential association restart after SACKs have been
>> received.
>>
> Yeah, thats true.  I'll add that in.
>
> Thanks!
> Neil
>
>> -vlad
>>

^ permalink raw reply

* Re: [PATCH v2 net-next] tcp: avoid tx starvation by SYNACK packets
From: Florian Westphal @ 2012-06-27 19:50 UTC (permalink / raw)
  To: David Miller
  Cc: brouer, eric.dumazet, hans.schillstrom, subramanian.vijay,
	dave.taht, netdev, ncardwell, therbert, mph
In-Reply-To: <20120626.235423.588696200884989114.davem@davemloft.net>

David Miller <davem@davemloft.net> wrote:
> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Date: Wed, 27 Jun 2012 08:32:13 +0200
> 
> > Using it as default, might be "dangerous" and open an attack vector
> > on SYN cookies in Linux.
> 
> If it's dangerous for syncookies then it's just as dangerous for
> the routing hash and the socket hashes where we use it already.
> Therefore, this sounds like a baseless claim to me.

I doubt using jhash is safe for syncookies.

There a several differences to other uses in kernel:
- all hash input except u32 cookie_secret[2] is known
- we transmit hash result (i.e, its visible to 3rd party)
- we do not re-seed the secret, ever

it should be quite easy to recompute cookie_secret[] from known syncookie
values?

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox