Netdev List
* Re: [PATCH net-next] tipc: cleanup function namespace
From: Neil Horman @ 2010-10-15 11:01 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Paul Gortmaker, David Miller, netdev, allan.stephens, Jon Maloy
In-Reply-To: <20101014151333.1446a90c@nehalam>

On Thu, Oct 14, 2010 at 03:13:33PM -0700, Stephen Hemminger wrote:
> On Thu, 14 Oct 2010 17:44:27 -0400
> Paul Gortmaker <paul.gortmaker@windriver.com> wrote:
> 
> > [Re: [PATCH net-next] tipc: cleanup function namespace] On 14/10/2010 (Thu 11:33) Stephen Hemminger wrote:
> > 
> > > On Thu, 14 Oct 2010 13:53:21 -0400
> > > Paul Gortmaker <paul.gortmaker@windriver.com> wrote:
> > > 
> > > > On 10-10-13 09:29 PM, Neil Horman wrote:
> > > > > On Wed, Oct 13, 2010 at 08:23:24PM -0400, Paul Gortmaker wrote:
> > > > >> On 10-10-13 07:20 PM, Stephen Hemminger wrote:
> > > > >>> Do some cleanups of TIPC based on make namespacecheck
> > > > >>>     1. Don't export unused symbols
> > > > >>>     2. Eliminate dead code
> > > > >>>     3. Make functions and variables local
> > > > >>>     4. Rename buf_acquire to tipc_buf_acquire since it is used in several files
> > > > >>>
> > > > >>> Compile tested only.
> > > > > >>> This may break out-of-tree kernel modules that depend on TIPC routines.
> > > > >>
> > > > >> Hi Stephen,
> > > > >>
> > > > >> When I first started looking at TIPC code, I too came to the
> > > > >> same conclusion as you did and was about to do #1,2,3 -- but
> > > > >> then I was told that the exported symbols were part of an API
> > > > >> and might be in use by folks here and there as per this thread:
> > > > >>
> > > > >> http://www.mail-archive.com/netdev@vger.kernel.org/msg30208.html
> > > > >>
> > > > > > I think it's telling that the argument in the above thread for keeping the API
> > > > > > was that users of it were out there and 'likely to contribute' in the future.
> > > > > That thread was 3 years ago.  They might be using the API from outside the
> > > > > kernel tree, but they're not planning on contributing.  As Christoph noted,
> > > > > they're freeloaders.  The community really doesn't need or want to maintain an
> > > > > API like that.  If these users are your customers, and removing the API is
> > > > > > unacceptable, perhaps it's time to move the entire TIPC module out of tree.
> > > > 
> > > > As I'd said -- I don't know what the use cases of these API users are,
> > > > and so as far as I know they aren't customers either.  For what it is
> > > > worth, know that I personally wouldn't try and use a business case to
> > > > justify a technically wrong decision here on netdev anyway.
> > > > 
> > > > I was just describing the history of the situation, and suggesting
> > > > one possible slower approach of phasing it out as a courtesy to those
> > > > users, in the same way that the kernel community has extended that
> > > > same courtesy with other things in feature-removal.txt
> > > > 
> > > > In the end, since Jon is OK with the removal, and is in the process of
> > > > communicating this to the API users he is aware of, I sure don't have
> > > > any reason to try and save the API.  If folks are good with having it
> > > > just go away overnight, then great -- I'll be just as happy to see it
> > > > disappear as you and Stephen.  So, a long winded way of saying...
> > > > 
> > > > Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> > > 
> > > How about putting an entry in feature-removal.txt with a short (6 month)
> > > window?
> > 
> > I'm fine with that too.
> > 
> > P.
> > 
> > From 5a15a26de63a29fcb6cb7a7fb83b6d2fc63cbadb Mon Sep 17 00:00:00 2001
> > From: Paul Gortmaker <paul.gortmaker@windriver.com>
> > Date: Thu, 14 Oct 2010 17:29:08 -0400
> > Subject: [PATCH] TIPC: Document the demise of the Native API for March 2011
> > 
> > The native API in the TIPC code exists as a bunch of functions
> > and exported symbols that aren't actually used by any currently
> > in-tree kernel code/modules.
> > 
> > Since this code is anomalous to the general guiding principle that
> > the kernel should not be libc, coverage tools and people intending
> > to do general cleanups keep finding this code and suggesting that
> > it be removed.
> > 
> > It seems the right thing to do is to just finally delete it once
> > and for all, after giving a reasonable window for any existing
> > users to find alternative solutions to their custom use case(s).
> > 
> > Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> > ---
> >  Documentation/feature-removal-schedule.txt |   12 ++++++++++++
> >  1 files changed, 12 insertions(+), 0 deletions(-)
> > 
> > diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
> > index f456389..1def37e 100644
> > --- a/Documentation/feature-removal-schedule.txt
> > +++ b/Documentation/feature-removal-schedule.txt
> > @@ -573,3 +573,15 @@ Why:	Hareware scan is the prefer method for iwlwifi devices for
> >  Who:	Wey-Yi Guy <wey-yi.w.guy@intel.com>
> >  
> >  ----------------------------
> > +
> > +What:	TIPC: Delete all code and exported symbols specific to Native API
> > +When:	March 2011
> > +Why:	The TIPC Native API, as described here:
> > +	http://tipc.sourceforge.net/doc/tipc_1.7_prog_guide.html#native_api
> > +	is implemented by exporting a bunch of otherwise unused functions
> > +	for possible modular linkage by custom end-user code.  This goes
> > +	against the general concept that the kernel should not be libc.
> > +
> > +Who:	Paul Gortmaker <paul.gortmaker@windriver.com>
> > +	
> > +----------------------------
> 
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>



* Re: [PATCH net-next 3/5] tipc: Optimizations to bearer enabling logic
From: Neil Horman @ 2010-10-15 11:00 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: davem, netdev, allan.stephens
In-Reply-To: <20101015011139.GB5048@windriver.com>

On Thu, Oct 14, 2010 at 09:11:51PM -0400, Paul Gortmaker wrote:
> [Re: [PATCH net-next 3/5] tipc: Optimizations to bearer enabling logic] On 13/10/2010 (Wed 10:58) Neil Horman wrote:
> 
> > On Tue, Oct 12, 2010 at 08:25:56PM -0400, Paul Gortmaker wrote:
> > > From: Allan Stephens <allan.stephens@windriver.com>
> > > 
> > > Introduces "enabling" state during activation of a new TIPC bearer,
> > > which supplements the existing "disabled" and "enabled" states.
> > > This change allows the new bearer to be added without having to
> > > temporarily block the processing of incoming packets on existing
> > > bearers during the binding of the new bearer to its associated
> > > interface. It also makes it unnecessary to zero out the entire
> > > bearer structure at the start of activation.
> > > 
> 
> [...]
> 
> > > +	b_ptr->state = BEARER_ENABLING;
> > >  	strcpy(b_ptr->publ.name, name);
> > > +	b_ptr->priority = priority;
> > > +
> > > +	write_unlock_bh(&tipc_net_lock);
> > Why the 3rd state?  Doesn't seem needed. 
> 
> I'm a bit disappointed in myself for also not noticing that it
> was set but never tested for.  The following should give the
> same end result but without the obfuscation of an extra state.
> 
> This one also doesn't explicitly depend on any other changes,
> so if it is now OK, the option is there for it to be applied
> independently of the others that haven't been reworked yet.
> 
> Thanks,
> Paul.
> 
> 
> From 86d0d5c92439d0a3f5a0f165aa8bd842d377dae9 Mon Sep 17 00:00:00 2001
> From: Allan Stephens <allan.stephens@windriver.com>
> Date: Thu, 14 Oct 2010 16:09:23 -0400
> Subject: [PATCH] tipc: Optimizations to bearer enabling logic
> 
> Allow new bearers to be added without having to temporarily block
> the processing of incoming packets on existing bearers during the
> binding of the new bearer to its associated interface. Eliminates
> zeroing out of the new bearer structure at the start of activation,
> since it is already in that state.
> 
> Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>  net/tipc/bearer.c |    7 ++++---
>  1 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
> index fd9c06c..2ff8181 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -556,14 +556,15 @@ restart:
>  	}
>  
>  	b_ptr = &tipc_bearers[bearer_id];
> -	memset(b_ptr, 0, sizeof(struct bearer));
> -
>  	strcpy(b_ptr->publ.name, name);
> +
> +	write_unlock_bh(&tipc_net_lock);
>  	res = m_ptr->enable_bearer(&b_ptr->publ);
>  	if (res) {
>  		warn("Bearer <%s> rejected, enable failure (%d)\n", name, -res);
> -		goto failed;
> +		return res;
>  	}
> +	write_lock_bh(&tipc_net_lock);
>  
This definitely looks more concise, but I don't see why it's necessary to drop
the tipc_net_lock around the call to enable_bearer.  The only caller of
tipc_register_media sets the enable_bearer pointer to the enable_bearer
function, and I don't see any path through that function which would
potentially retake that lock.  In fact, it seems dropping it might be dangerous
if other paths (like from cfg_named_msg_event) tried to enable a bearer in
parallel with a user space directive from the netlink socket.  Without the
protection of tipc_net_lock, you could use the same eth_bearer twice and
corrupt that array.

Neil

>  	b_ptr->identity = bearer_id;
>  	b_ptr->media = m_ptr;
> -- 
> 1.7.2.1
> 


* [PATCH 5/7] can: mcp251x: write intf only when needed
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
  To: socketcan-core; +Cc: netdev, Marc Kleine-Budde
In-Reply-To: <1287139762-23356-1-git-send-email-mkl@pengutronix.de>

This patch introduces a variable "clear_intf" that holds the bits that
should be cleared. The register is only read-modify-written if
"clear_intf" is set.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/mcp251x.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index 7e2f951..f5e2edd 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -125,6 +125,8 @@
 #  define CANINTF_TX0IF 0x04
 #  define CANINTF_RX1IF 0x02
 #  define CANINTF_RX0IF 0x01
+#  define CANINTF_ERR_TX \
+	(CANINTF_ERRIF | CANINTF_TX2IF | CANINTF_TX1IF | CANINTF_TX0IF)
 #define EFLG	      0x2d
 #  define EFLG_EWARN	0x01
 #  define EFLG_RXWAR	0x02
@@ -769,10 +771,12 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 	while (!priv->force_quit) {
 		enum can_state new_state;
 		u8 intf, eflag;
+		u8 clear_intf = 0;
 		int can_id = 0, data1 = 0;
 
 		mcp251x_read_2regs(spi, CANINTF, &intf, &eflag);
 
+		/* receive buffer 0 */
 		if (intf & CANINTF_RX0IF) {
 			mcp251x_hw_rx(spi, 0);
 			/* Free one buffer ASAP */
@@ -780,10 +784,17 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 					   0x00);
 		}
 
-		if (intf & CANINTF_RX1IF)
+		/* receive buffer 1 */
+		if (intf & CANINTF_RX1IF) {
 			mcp251x_hw_rx(spi, 1);
+			clear_intf |= CANINTF_RX1IF;
+		}
 
-		mcp251x_write_bits(spi, CANINTF, intf, 0x00);
+		/* any error or tx interrupt we need to clear? */
+		if (intf & CANINTF_ERR_TX)
+			clear_intf |= intf & CANINTF_ERR_TX;
+		if (clear_intf)
+			mcp251x_write_bits(spi, CANINTF, clear_intf, 0x00);
 
 		if (eflag)
 			mcp251x_write_bits(spi, EFLG, eflag, 0x00);
-- 
1.7.0.4
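
The mask computation in this patch can be modelled outside the kernel. What
follows is a minimal user-space sketch, not driver code: the helper name
intf_bits_to_clear() is hypothetical, and the CANINTF_ERRIF, CANINTF_TX2IF
and CANINTF_TX1IF values are assumed from the MCP2515 datasheet rather than
taken from the hunks above.

```c
#include <stdint.h>

/*
 * CANINTF flag bits. RX0IF, RX1IF and TX0IF appear in the patch context;
 * the ERRIF, TX2IF and TX1IF values are assumptions (datasheet values).
 */
#define CANINTF_ERRIF 0x20
#define CANINTF_TX2IF 0x10
#define CANINTF_TX1IF 0x08
#define CANINTF_TX0IF 0x04
#define CANINTF_RX1IF 0x02
#define CANINTF_RX0IF 0x01
#define CANINTF_ERR_TX \
	(CANINTF_ERRIF | CANINTF_TX2IF | CANINTF_TX1IF | CANINTF_TX0IF)

/*
 * Hypothetical helper: given the CANINTF value read at the top of the
 * loop, return the bits the deferred bit-modify should clear. RX0IF is
 * left out because the handler clears it right after reading buffer 0.
 */
static uint8_t intf_bits_to_clear(uint8_t intf)
{
	uint8_t clear_intf = 0;

	if (intf & CANINTF_RX1IF)
		clear_intf |= CANINTF_RX1IF;

	/* Any pending error or TX interrupt is cleared as well. */
	clear_intf |= intf & CANINTF_ERR_TX;

	return clear_intf;
}
```

When the helper returns 0, the SPI transfer for the bit-modify can be skipped
entirely, which is the saving this patch is after.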



* [PATCH 7/7] can: mcp251x: optimize 2515, rx int gets cleared automatically
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
  To: socketcan-core; +Cc: netdev, Marc Kleine-Budde
In-Reply-To: <1287139762-23356-1-git-send-email-mkl@pengutronix.de>

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/mcp251x.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index 28281f9..a42c3fa 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -787,15 +787,20 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 		/* receive buffer 0 */
 		if (intf & CANINTF_RX0IF) {
 			mcp251x_hw_rx(spi, 0);
-			/* Free one buffer ASAP */
-			mcp251x_write_bits(spi, CANINTF, intf & CANINTF_RX0IF,
-					   0x00);
+			/*
+			 * Free one buffer ASAP
+			 * (The MCP2515 does this automatically.)
+			 */
+			if (mcp251x_is_2510(spi))
+				mcp251x_write_bits(spi, CANINTF, CANINTF_RX0IF, 0x00);
 		}
 
 		/* receive buffer 1 */
 		if (intf & CANINTF_RX1IF) {
 			mcp251x_hw_rx(spi, 1);
-			clear_intf |= CANINTF_RX1IF;
+			/* the MCP2515 does this automatically */
+			if (mcp251x_is_2510(spi))
+				clear_intf |= CANINTF_RX1IF;
 		}
 
 		/* any error or tx interrupt we need to clear? */
-- 
1.7.0.4



* [PATCH 6/7] can: mcp251x: define helper functions mcp251x_is_2510, mcp251x_is_2515
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
  To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1287139762-23356-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
 drivers/net/can/mcp251x.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index f5e2edd..28281f9 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -252,6 +252,16 @@ struct mcp251x_priv {
 	int restart_tx;
 };
 
+#define MCP251X_IS(_model) \
+static inline int mcp251x_is_##_model(struct spi_device *spi) \
+{ \
+	struct mcp251x_platform_data *pdata = spi->dev.platform_data; \
+	return pdata->model == CAN_MCP251X_MCP##_model; \
+}
+
+MCP251X_IS(2510);
+MCP251X_IS(2515);
+
 static void mcp251x_clean(struct net_device *net)
 {
 	struct mcp251x_priv *priv = netdev_priv(net);
@@ -362,10 +372,9 @@ static void mcp251x_write_bits(struct spi_device *spi, u8 reg,
 static void mcp251x_hw_tx_frame(struct spi_device *spi, u8 *buf,
 				int len, int tx_buf_idx)
 {
-	struct mcp251x_platform_data *pdata = spi->dev.platform_data;
 	struct mcp251x_priv *priv = dev_get_drvdata(&spi->dev);
 
-	if (pdata->model == CAN_MCP251X_MCP2510) {
+	if (mcp251x_is_2510(spi)) {
 		int i;
 
 		for (i = 1; i < TXBDAT_OFF + len; i++)
@@ -408,9 +417,8 @@ static void mcp251x_hw_rx_frame(struct spi_device *spi, u8 *buf,
 				int buf_idx)
 {
 	struct mcp251x_priv *priv = dev_get_drvdata(&spi->dev);
-	struct mcp251x_platform_data *pdata = spi->dev.platform_data;
 
-	if (pdata->model == CAN_MCP251X_MCP2510) {
+	if (mcp251x_is_2510(spi)) {
 		int i, len;
 
 		for (i = 1; i < RXBDAT_OFF; i++)
-- 
1.7.0.4
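
The MCP251X_IS() macro relies on preprocessor token pasting to generate one
predicate per chip model. A minimal user-space sketch of the same idea
follows. The enum and struct here are simplified stand-ins (their values and
layout are assumptions), and the predicates take the platform data directly
instead of a struct spi_device.

```c
#include <stdbool.h>

/* Simplified stand-ins for the driver's types; values are assumptions. */
enum mcp251x_model {
	CAN_MCP251X_MCP2510 = 0x2510,
	CAN_MCP251X_MCP2515 = 0x2515,
};

struct mcp251x_platform_data {
	enum mcp251x_model model;
};

/*
 * Expands to one predicate per model. The ## operator pastes the model
 * number into both the function name and the enumerator name.
 */
#define MCP251X_IS(_model) \
static inline bool mcp251x_is_##_model(const struct mcp251x_platform_data *pdata) \
{ \
	return pdata->model == CAN_MCP251X_MCP##_model; \
}

MCP251X_IS(2510)
MCP251X_IS(2515)
```

Callers can then write mcp251x_is_2510(pdata) instead of open-coding the
comparison at every site.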


* [PATCH 4/7] can: mcp251x: read-modify-write eflag only when needed
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
  To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1287139762-23356-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

From: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

Use read-modify-write instead of a simple write to change the register
contents, to close the existing race window between the original manual
read and write.

Signed-off-by: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
 drivers/net/can/mcp251x.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index 9b3466a..7e2f951 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -785,7 +785,8 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 
 		mcp251x_write_bits(spi, CANINTF, intf, 0x00);
 
-		mcp251x_write_reg(spi, EFLG, 0x00);
+		if (eflag)
+			mcp251x_write_bits(spi, EFLG, eflag, 0x00);
 
 		/* Update can state */
 		if (eflag & EFLG_TXBO) {
-- 
1.7.0.4
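
The race this patch closes can be illustrated with a small user-space model.
This is a sketch under simplifying assumptions, not driver code: the register
is a plain byte, bit_modify() only mimics the effect of a masked write, and
the two scenario functions are hypothetical names introduced for the
comparison.

```c
#include <stdint.h>

/* Flag values as they appear in the driver context quoted in this thread. */
#define EFLG_EWARN 0x01
#define EFLG_RXWAR 0x02

/* Model of a masked write: only the bits set in mask are changed. */
static uint8_t bit_modify(uint8_t reg, uint8_t mask, uint8_t val)
{
	return (uint8_t)((reg & ~mask) | (val & mask));
}

/*
 * The driver read 'snapshot'; the chip then raised 'late' before the
 * clear. A plain write of 0x00 wipes the late flags too.
 */
static uint8_t clear_with_plain_write(uint8_t snapshot, uint8_t late)
{
	uint8_t reg = snapshot | late;

	(void)reg;	/* previous contents are ignored by a full write */
	return 0x00;
}

/* A masked write clears only what was observed; late flags survive. */
static uint8_t clear_with_bit_modify(uint8_t snapshot, uint8_t late)
{
	uint8_t reg = snapshot | late;

	return bit_modify(reg, snapshot, 0x00);
}
```

With snapshot = EFLG_EWARN and late = EFLG_RXWAR, the plain write leaves 0x00
in the register, while the masked write leaves EFLG_RXWAR set for the next
loop iteration to handle.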


* [PATCH 3/7] can: mcp251x: allow to read two registers in one spi transfer
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
  To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde,
	Uwe Kleine-König
In-Reply-To: <1287139762-23356-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

From: Sascha Hauer <s.hauer@pengutronix.de>

This patch is based on earlier work by David Jander.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: David Jander <david@protonic.nl>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/mcp251x.c |   20 +++++++++++++++++---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index fdea752..9b3466a 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -319,6 +319,20 @@ static u8 mcp251x_read_reg(struct spi_device *spi, uint8_t reg)
 	return val;
 }
 
+static void mcp251x_read_2regs(struct spi_device *spi, uint8_t reg,
+		uint8_t *v1, uint8_t *v2)
+{
+	struct mcp251x_priv *priv = dev_get_drvdata(&spi->dev);
+
+	priv->spi_tx_buf[0] = INSTRUCTION_READ;
+	priv->spi_tx_buf[1] = reg;
+
+	mcp251x_spi_trans(spi, 4);
+
+	*v1 = priv->spi_rx_buf[2];
+	*v2 = priv->spi_rx_buf[3];
+}
+
 static void mcp251x_write_reg(struct spi_device *spi, u8 reg, uint8_t val)
 {
 	struct mcp251x_priv *priv = dev_get_drvdata(&spi->dev);
@@ -754,10 +768,11 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 	mutex_lock(&priv->mcp_lock);
 	while (!priv->force_quit) {
 		enum can_state new_state;
-		u8 intf = mcp251x_read_reg(spi, CANINTF);
-		u8 eflag;
+		u8 intf, eflag;
 		int can_id = 0, data1 = 0;
 
+		mcp251x_read_2regs(spi, CANINTF, &intf, &eflag);
+
 		if (intf & CANINTF_RX0IF) {
 			mcp251x_hw_rx(spi, 0);
 			/* Free one buffer ASAP */
@@ -770,7 +785,6 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 
 		mcp251x_write_bits(spi, CANINTF, intf, 0x00);
 
-		eflag = mcp251x_read_reg(spi, EFLG);
 		mcp251x_write_reg(spi, EFLG, 0x00);
 
 		/* Update can state */
-- 
1.7.0.4
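
The combined read works because the chip keeps returning consecutive
registers while further bytes are clocked out after a READ instruction. A
minimal user-space sketch with a mocked SPI transfer follows. The names
fake_regs, fake_spi_trans() and read_2regs() are hypothetical, and the
INSTRUCTION_READ and CANINTF values are assumptions taken from the datasheet,
not from the hunks above.

```c
#include <stdint.h>
#include <string.h>

#define INSTRUCTION_READ 0x03	/* assumed opcode value */
#define CANINTF		 0x2c	/* assumed; the register just before EFLG */
#define EFLG		 0x2d	/* as defined in the driver */

static uint8_t fake_regs[256];	/* simulated register file */

/*
 * Mock full-duplex transfer: after the opcode and address bytes, every
 * further byte clocked out returns the next register in sequence.
 */
static void fake_spi_trans(const uint8_t *tx, uint8_t *rx, int len)
{
	int i;

	memset(rx, 0, (size_t)len);
	if (len < 2 || tx[0] != INSTRUCTION_READ)
		return;
	for (i = 2; i < len; i++)
		rx[i] = fake_regs[(uint8_t)(tx[1] + (i - 2))];
}

/* One 4-byte transfer yields the register at 'reg' and its successor. */
static void read_2regs(uint8_t reg, uint8_t *v1, uint8_t *v2)
{
	uint8_t tx[4] = { INSTRUCTION_READ, reg, 0, 0 };
	uint8_t rx[4];

	fake_spi_trans(tx, rx, 4);
	*v1 = rx[2];
	*v2 = rx[3];
}
```

Calling read_2regs(CANINTF, &intf, &eflag) therefore replaces two separate
3-byte register reads with a single 4-byte transfer.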


* [PATCH 2/7] can: mcp251x: increase rx_errors on overflow, not only rx_over_errors
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
  To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1287139762-23356-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

From: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

Signed-off-by: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
 drivers/net/can/mcp251x.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index c06e023..fdea752 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -816,10 +816,14 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 		if (intf & CANINTF_ERRIF) {
 			/* Handle overflow counters */
 			if (eflag & (EFLG_RX0OVR | EFLG_RX1OVR)) {
-				if (eflag & EFLG_RX0OVR)
+				if (eflag & EFLG_RX0OVR) {
 					net->stats.rx_over_errors++;
-				if (eflag & EFLG_RX1OVR)
+					net->stats.rx_errors++;
+				}
+				if (eflag & EFLG_RX1OVR) {
 					net->stats.rx_over_errors++;
+					net->stats.rx_errors++;
+				}
 				can_id |= CAN_ERR_CRTL;
 				data1 |= CAN_ERR_CRTL_RX_OVERFLOW;
 			}
-- 
1.7.0.4


* [PATCH 1/7] can: mcp251x: fix NOHZ local_softirq_pending 08 warning
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
  To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1287139762-23356-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

This patch replaces netif_rx() with netif_rx_ni(), which has to be used
from the threaded interrupt handler, i.e. in process context.

Thanks to Christian Pellegrin for pointing at the right fix:
481a8199142c050b72bff8a1956a49fd0a75bbe0 by Oliver Hartkopp.

Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
 drivers/net/can/mcp251x.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index b11a0cb..c06e023 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -451,7 +451,7 @@ static void mcp251x_hw_rx(struct spi_device *spi, int buf_idx)
 
 	priv->net->stats.rx_packets++;
 	priv->net->stats.rx_bytes += frame->can_dlc;
-	netif_rx(skb);
+	netif_rx_ni(skb);
 }
 
 static void mcp251x_hw_sleep(struct spi_device *spi)
@@ -676,7 +676,7 @@ static void mcp251x_error_skb(struct net_device *net, int can_id, int data1)
 	if (skb) {
 		frame->can_id = can_id;
 		frame->data[1] = data1;
-		netif_rx(skb);
+		netif_rx_ni(skb);
 	} else {
 		dev_err(&net->dev,
 			"cannot allocate error skb\n");
-- 
1.7.0.4


* [PATCH V2 0/7] can: mcp251x: fix and optimize driver
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
  To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

Moin,

this series of patches improves the mcp251x driver. It first fixes the
local_softirq_pending problem. Then the number of SPI transfers is reduced
in order to optimise the driver.

This series has been tested with a mcp2515 on i.MX35.

Changes since V1:
- Fix broken encoding in S-o-b

Please review, test and consider applying.

regards, Marc

---

The following changes since commit cd2638a86c7b90e77ce623c09de2a26177f2a5c1:
  Carolyn Wyborny (1):
        igb: add check for fiber/serdes devices to igb_set_spd_dplx;

are available in the git repository at:

  git://git.pengutronix.de/git/mkl/linux-2.6.git can/mcp251x-for-net-next

Marc Kleine-Budde (4):
      can: mcp251x: fix NOHZ local_softirq_pending 08 warning
      can: mcp251x: write intf only when needed
      can: mcp251x: define helper functions mcp251x_is_2510, mcp251x_is_2515
      can: mcp251x: optimize 2515, rx int gets cleared automatically

Sascha Hauer (3):
      can: mcp251x: increase rx_errors on overflow, not only rx_over_errors
      can: mcp251x: allow to read two registers in one spi transfer
      can: mcp251x: read-modify-write eflag only when needed

 drivers/net/can/mcp251x.c |   77 +++++++++++++++++++++++++++++++++++----------
 1 files changed, 60 insertions(+), 17 deletions(-)


* Re: [PATCH net-next 2/5] tipc: Simplify bearer shutdown logic
From: Neil Horman @ 2010-10-15 10:48 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: davem, netdev, allan.stephens
In-Reply-To: <20101014235825.GA5048@windriver.com>

On Thu, Oct 14, 2010 at 07:58:26PM -0400, Paul Gortmaker wrote:
> [Re: [PATCH net-next 2/5] tipc: Simplify bearer shutdown logic] On 13/10/2010 (Wed 10:39) Neil Horman wrote:
> 
> > On Tue, Oct 12, 2010 at 08:25:55PM -0400, Paul Gortmaker wrote:
> > > From: Allan Stephens <allan.stephens@windriver.com>
> > > 
> > > Disable all active bearers when TIPC is shut down without having to do
> > > a name-based search to locate each bearer object.
> > > 
> > It seems like you're doing a good deal more in this patch than just disabling
> > all active bearers without doing a name search.  The description is implemented
> > > in the for loop of tipc_bearer_stop.  What's the rest of it for?
> 
> It seems the original needlessly bloated out the patch size by
> swapping the order of tipc_bearer_find_interface & bearer_find
> in the file (now fixed) - and you are right, the locking change
> wasn't properly covered in the commit log.  The extra test you'd
> suggested tossing out is also now gone.
> 
> This change doesn't explicitly depend on any other changes,
> so if it is now OK, the option is there for it to be applied
> independently of the others that haven't been reworked yet.
> 
> Thanks,
> Paul.
> 
> 
> From 1771ad642cb076dbeb71e3533a25cb2f07df9cd8 Mon Sep 17 00:00:00 2001
> From: Allan Stephens <allan.stephens@windriver.com>
> Date: Sat, 4 Sep 2010 09:29:04 -0400
> Subject: [PATCH] tipc: Simplify bearer shutdown logic
> 
> Optimize processing in TIPC's bearer shutdown code, including:
> 
> 1. Remove an unnecessary check to see if TIPC bearers can exist.
> 2. Don't release spinlocks before calling a media-specific disabling
> routine, since the routine can't sleep.
> 3. Make bearer_disable() operate directly on a struct bearer, instead
> of needlessly taking a name and then mapping that to the struct.
> 
> Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>  net/tipc/bearer.c |   38 +++++++++++---------------------------
>  1 files changed, 11 insertions(+), 27 deletions(-)
> 
> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
> index 9c10c6b..fd9c06c 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -288,9 +288,6 @@ static struct bearer *bearer_find(const char *name)
>  	struct bearer *b_ptr;
>  	u32 i;
>  
> -	if (tipc_mode != TIPC_NET_MODE)
> -		return NULL;
> -
>  	for (i = 0, b_ptr = tipc_bearers; i < MAX_BEARERS; i++, b_ptr++) {
>  		if (b_ptr->active && (!strcmp(b_ptr->publ.name, name)))
>  			return b_ptr;
> @@ -630,30 +627,17 @@ int tipc_block_bearer(const char *name)
>   * Note: This routine assumes caller holds tipc_net_lock.
>   */
>  
> -static int bearer_disable(const char *name)
> +static int bearer_disable(struct bearer *b_ptr)
>  {
> -	struct bearer *b_ptr;
>  	struct link *l_ptr;
>  	struct link *temp_l_ptr;
>  
> -	b_ptr = bearer_find(name);
> -	if (!b_ptr) {
> -		warn("Attempt to disable unknown bearer <%s>\n", name);
> -		return -EINVAL;
> -	}
> -
> -	info("Disabling bearer <%s>\n", name);
> +	info("Disabling bearer <%s>\n", b_ptr->publ.name);
>  	tipc_disc_stop_link_req(b_ptr->link_req);
>  	spin_lock_bh(&b_ptr->publ.lock);
>  	b_ptr->link_req = NULL;
>  	b_ptr->publ.blocked = 1;
> -	if (b_ptr->media->disable_bearer) {
> -		spin_unlock_bh(&b_ptr->publ.lock);
> -		write_unlock_bh(&tipc_net_lock);
> -		b_ptr->media->disable_bearer(&b_ptr->publ);
> -		write_lock_bh(&tipc_net_lock);
> -		spin_lock_bh(&b_ptr->publ.lock);
> -	}
> +	b_ptr->media->disable_bearer(&b_ptr->publ);
>  	list_for_each_entry_safe(l_ptr, temp_l_ptr, &b_ptr->links, link_list) {
>  		tipc_link_delete(l_ptr);
>  	}
> @@ -664,10 +648,16 @@ static int bearer_disable(const char *name)
>  
>  int tipc_disable_bearer(const char *name)
>  {
> +	struct bearer *b_ptr;
>  	int res;
>  
>  	write_lock_bh(&tipc_net_lock);
> -	res = bearer_disable(name);
> +	b_ptr = bearer_find(name);
> +	if (b_ptr == NULL) {
> +		warn("Attempt to disable unknown bearer <%s>\n", name);
> +		res = -EINVAL;
> +	} else
> +		res = bearer_disable(b_ptr);
>  	write_unlock_bh(&tipc_net_lock);
>  	return res;
>  }
> @@ -680,13 +670,7 @@ void tipc_bearer_stop(void)
>  
>  	for (i = 0; i < MAX_BEARERS; i++) {
>  		if (tipc_bearers[i].active)
> -			tipc_bearers[i].publ.blocked = 1;
> -	}
> -	for (i = 0; i < MAX_BEARERS; i++) {
> -		if (tipc_bearers[i].active)
> -			bearer_disable(tipc_bearers[i].publ.name);
> +			bearer_disable(&tipc_bearers[i]);
>  	}
>  	media_count = 0;
>  }
> -
> -
> -- 
> 1.7.2.1
> 
> 

Yes, this looks much better, thank you.
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>



* [PATCH 2/7] can: mcp251x: increase rx_errors on overflow, not only rx_over_errors
From: Marc Kleine-Budde @ 2010-10-15 10:34 UTC (permalink / raw)
  To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1287138845-20561-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

From: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

Signed-off-by: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
 drivers/net/can/mcp251x.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index c06e023..fdea752 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -816,10 +816,14 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 		if (intf & CANINTF_ERRIF) {
 			/* Handle overflow counters */
 			if (eflag & (EFLG_RX0OVR | EFLG_RX1OVR)) {
-				if (eflag & EFLG_RX0OVR)
+				if (eflag & EFLG_RX0OVR) {
 					net->stats.rx_over_errors++;
-				if (eflag & EFLG_RX1OVR)
+					net->stats.rx_errors++;
+				}
+				if (eflag & EFLG_RX1OVR) {
 					net->stats.rx_over_errors++;
+					net->stats.rx_errors++;
+				}
 				can_id |= CAN_ERR_CRTL;
 				data1 |= CAN_ERR_CRTL_RX_OVERFLOW;
 			}
-- 
1.7.0.4


* [PATCH 1/7] can: mcp251x: fix NOHZ local_softirq_pending 08 warning
From: Marc Kleine-Budde @ 2010-10-15 10:33 UTC (permalink / raw)
  To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1287138845-20561-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

This patch replaces netif_rx() with netif_rx_ni(), which has to be used
from the threaded interrupt handler, i.e. in process context.

Thanks to Christian Pellegrin for pointing at the right fix:
481a8199142c050b72bff8a1956a49fd0a75bbe0 by Oliver Hartkopp.

Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
 drivers/net/can/mcp251x.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index b11a0cb..c06e023 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -451,7 +451,7 @@ static void mcp251x_hw_rx(struct spi_device *spi, int buf_idx)
 
 	priv->net->stats.rx_packets++;
 	priv->net->stats.rx_bytes += frame->can_dlc;
-	netif_rx(skb);
+	netif_rx_ni(skb);
 }
 
 static void mcp251x_hw_sleep(struct spi_device *spi)
@@ -676,7 +676,7 @@ static void mcp251x_error_skb(struct net_device *net, int can_id, int data1)
 	if (skb) {
 		frame->can_id = can_id;
 		frame->data[1] = data1;
-		netif_rx(skb);
+		netif_rx_ni(skb);
 	} else {
 		dev_err(&net->dev,
 			"cannot allocate error skb\n");
-- 
1.7.0.4


* (unknown), 
From: Marc Kleine-Budde @ 2010-10-15 10:33 UTC (permalink / raw)
  To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

Moin,

this series of patches improves the mcp251x driver. It first fixes the
local_softirq_pending problem. Then the number of SPI transfers is reduced
in order to optimise the driver.

This series has been tested with a mcp2515 on i.MX35.

Please review and test,
cheers, Marc


The following changes since commit cd2638a86c7b90e77ce623c09de2a26177f2a5c1:
  Carolyn Wyborny (1):
        igb: add check for fiber/serdes devices to igb_set_spd_dplx;

are available in the git repository at:

  git://git.pengutronix.de/git/mkl/linux-2.6.git can/mcp251x-for-net-next

Marc Kleine-Budde (4):
      can: mcp251x: fix NOHZ local_softirq_pending 08 warning
      can: mcp251x: write intf only when needed
      can: mcp251x: define helper functions mcp251x_is_2510, mcp251x_is_2515
      can: mcp251x: optimize 2515, rx int gets cleared automatically

Sascha Hauer (3):
      can: mcp251x: increase rx_errors on overflow, not only rx_over_errors
      can: mcp251x: allow to read two registers in one spi transfer
      can: mcp251x: read-modify-write eflag only when needed

 drivers/net/can/mcp251x.c |   77 +++++++++++++++++++++++++++++++++++----------
 1 files changed, 60 insertions(+), 17 deletions(-)

^ permalink raw reply

* [PATCH] connector: remove lazy workqueue creation
From: Tejun Heo @ 2010-10-15  9:55 UTC (permalink / raw)
  To: Evgeniy Polyakov, netdev@vger.kernel.org, Frederic Weisbecker,
	David S. Miller

Commit 1a5645bc (connector: create connector workqueue only while
needed once) implements lazy workqueue creation for the connector
workqueue.  With cmwq now in place, lazy workqueue creation no longer
makes much sense and adds a lot of complexity.  Remove it and
allocate an ordered workqueue during initialization.

This also removes a call to flush_scheduled_work() which is deprecated
and scheduled to be removed.
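
The resulting lifecycle can be modelled in userspace roughly as
follows. This is only a sketch under simplifying assumptions: all
toy_* names are hypothetical, work runs synchronously at flush time
instead of on a kernel worker, and there is no locking. It shows the
queue being created with the device, used without a NULL check or
fallback, and flushed then destroyed unconditionally.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void (*work_fn)(void *arg);

struct toy_work {
	work_fn fn;
	void *arg;
	struct toy_work *next;
};

/* Hypothetical ordered queue: items run one at a time, in queueing order. */
struct toy_wq {
	struct toy_work *head, *tail;
};

struct toy_dev {
	struct toy_wq *wq;	/* valid for the whole lifetime of the device */
};

/* The queue is created up front; failure fails the whole allocation. */
static struct toy_dev *toy_dev_alloc(void)
{
	struct toy_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->wq = calloc(1, sizeof(*dev->wq));
	if (!dev->wq) {
		free(dev);
		return NULL;
	}
	return dev;
}

/* No "does the queue exist yet?" branch is needed when queueing. */
static int toy_queue_work(struct toy_dev *dev, work_fn fn, void *arg)
{
	struct toy_work *w = malloc(sizeof(*w));

	if (!w)
		return -1;
	w->fn = fn;
	w->arg = arg;
	w->next = NULL;
	if (dev->wq->tail)
		dev->wq->tail->next = w;
	else
		dev->wq->head = w;
	dev->wq->tail = w;
	return 0;
}

/* Run everything pending, strictly in FIFO order. */
static void toy_flush(struct toy_dev *dev)
{
	struct toy_work *w;

	while ((w = dev->wq->head) != NULL) {
		dev->wq->head = w->next;
		if (!dev->wq->head)
			dev->wq->tail = NULL;
		w->fn(w->arg);
		free(w);
	}
}

/* Teardown: flush, then destroy, with no conditionals. */
static void toy_dev_free(struct toy_dev *dev)
{
	toy_flush(dev);
	free(dev->wq);
	free(dev);
}

/* Example work item: append one character to a shared log. */
struct toy_log {
	char buf[16];
	size_t len;
};

struct toy_item {
	struct toy_log *log;
	char c;
};

static void toy_log_char(void *arg)
{
	struct toy_item *it = arg;

	if (it->log->len < sizeof(it->log->buf) - 1)
		it->log->buf[it->log->len++] = it->c;
}
```

Queueing three toy_item entries and then calling toy_flush() runs them
in the order they were queued, which is the ordering property an
ordered workqueue is meant to give.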

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
---
 drivers/connector/cn_queue.c  |   75 ++++--------------------------------------
 drivers/connector/connector.c |    9 ++---
 include/linux/connector.h     |    8 ----
 3 files changed, 12 insertions(+), 80 deletions(-)

Index: work/drivers/connector/cn_queue.c
===================================================================
--- work.orig/drivers/connector/cn_queue.c
+++ work/drivers/connector/cn_queue.c
@@ -31,48 +31,6 @@
 #include <linux/connector.h>
 #include <linux/delay.h>

-
-/*
- * This job is sent to the kevent workqueue.
- * While no event is once sent to any callback, the connector workqueue
- * is not created to avoid a useless waiting kernel task.
- * Once the first event is received, we create this dedicated workqueue which
- * is necessary because the flow of data can be high and we don't want
- * to encumber keventd with that.
- */
-static void cn_queue_create(struct work_struct *work)
-{
-	struct cn_queue_dev *dev;
-
-	dev = container_of(work, struct cn_queue_dev, wq_creation);
-
-	dev->cn_queue = create_singlethread_workqueue(dev->name);
-	/* If we fail, we will use keventd for all following connector jobs */
-	WARN_ON(!dev->cn_queue);
-}
-
-/*
- * Queue a data sent to a callback.
- * If the connector workqueue is already created, we queue the job on it.
- * Otherwise, we queue the job to kevent and queue the connector workqueue
- * creation too.
- */
-int queue_cn_work(struct cn_callback_entry *cbq, struct work_struct *work)
-{
-	struct cn_queue_dev *pdev = cbq->pdev;
-
-	if (likely(pdev->cn_queue))
-		return queue_work(pdev->cn_queue, work);
-
-	/* Don't create the connector workqueue twice */
-	if (atomic_inc_return(&pdev->wq_requested) == 1)
-		schedule_work(&pdev->wq_creation);
-	else
-		atomic_dec(&pdev->wq_requested);
-
-	return schedule_work(work);
-}
-
 void cn_queue_wrapper(struct work_struct *work)
 {
 	struct cn_callback_entry *cbq =
@@ -111,11 +69,7 @@ cn_queue_alloc_callback_entry(char *name

 static void cn_queue_free_callback(struct cn_callback_entry *cbq)
 {
-	/* The first jobs have been sent to kevent, flush them too */
-	flush_scheduled_work();
-	if (cbq->pdev->cn_queue)
-		flush_workqueue(cbq->pdev->cn_queue);
-
+	flush_workqueue(cbq->pdev->cn_queue);
 	kfree(cbq);
 }

@@ -193,11 +147,14 @@ struct cn_queue_dev *cn_queue_alloc_dev(
 	atomic_set(&dev->refcnt, 0);
 	INIT_LIST_HEAD(&dev->queue_list);
 	spin_lock_init(&dev->queue_lock);
-	init_waitqueue_head(&dev->wq_created);

 	dev->nls = nls;

-	INIT_WORK(&dev->wq_creation, cn_queue_create);
+	dev->cn_queue = alloc_ordered_workqueue(dev->name, 0);
+	if (!dev->cn_queue) {
+		kfree(dev);
+		return NULL;
+	}

 	return dev;
 }
@@ -205,25 +162,9 @@ struct cn_queue_dev *cn_queue_alloc_dev(
 void cn_queue_free_dev(struct cn_queue_dev *dev)
 {
 	struct cn_callback_entry *cbq, *n;
-	long timeout;
-	DEFINE_WAIT(wait);

-	/* Flush the first pending jobs queued on kevent */
-	flush_scheduled_work();
-
-	/* If the connector workqueue creation is still pending, wait for it */
-	prepare_to_wait(&dev->wq_created, &wait, TASK_UNINTERRUPTIBLE);
-	if (atomic_read(&dev->wq_requested) && !dev->cn_queue) {
-		timeout = schedule_timeout(HZ * 2);
-		if (!timeout && !dev->cn_queue)
-			WARN_ON(1);
-	}
-	finish_wait(&dev->wq_created, &wait);
-
-	if (dev->cn_queue) {
-		flush_workqueue(dev->cn_queue);
-		destroy_workqueue(dev->cn_queue);
-	}
+	flush_workqueue(dev->cn_queue);
+	destroy_workqueue(dev->cn_queue);

 	spin_lock_bh(&dev->queue_lock);
 	list_for_each_entry_safe(cbq, n, &dev->queue_list, callback_entry)
Index: work/drivers/connector/connector.c
===================================================================
--- work.orig/drivers/connector/connector.c
+++ work/drivers/connector/connector.c
@@ -133,7 +133,8 @@ static int cn_call_callback(struct sk_bu
 					__cbq->data.skb == NULL)) {
 				__cbq->data.skb = skb;

-				if (queue_cn_work(__cbq, &__cbq->work))
+				if (queue_work(dev->cbdev->cn_queue,
+					       &__cbq->work))
 					err = 0;
 				else
 					err = -EINVAL;
@@ -148,13 +149,11 @@ static int cn_call_callback(struct sk_bu
 					d->callback = __cbq->data.callback;
 					d->free = __new_cbq;

-					__new_cbq->pdev = __cbq->pdev;
-
 					INIT_WORK(&__new_cbq->work,
 							&cn_queue_wrapper);

-					if (queue_cn_work(__new_cbq,
-						    &__new_cbq->work))
+					if (queue_work(dev->cbdev->cn_queue,
+						       &__new_cbq->work))
 						err = 0;
 					else {
 						kfree(__new_cbq);
Index: work/include/linux/connector.h
===================================================================
--- work.orig/include/linux/connector.h
+++ work/include/linux/connector.h
@@ -88,12 +88,6 @@ struct cn_queue_dev {
 	unsigned char name[CN_CBQ_NAMELEN];

 	struct workqueue_struct *cn_queue;
-	/* Sent to kevent to create cn_queue only when needed */
-	struct work_struct wq_creation;
-	/* Tell if the wq_creation job is pending/completed */
-	atomic_t wq_requested;
-	/* Wait for cn_queue to be created */
-	wait_queue_head_t wq_created;

 	struct list_head queue_list;
 	spinlock_t queue_lock;
@@ -141,8 +135,6 @@ int cn_netlink_send(struct cn_msg *, u32
 int cn_queue_add_callback(struct cn_queue_dev *dev, char *name, struct cb_id *id, void (*callback)(struct cn_msg *, struct netlink_skb_parms *));
 void cn_queue_del_callback(struct cn_queue_dev *dev, struct cb_id *id);

-int queue_cn_work(struct cn_callback_entry *cbq, struct work_struct *work);
-
 struct cn_queue_dev *cn_queue_alloc_dev(char *name, struct sock *);
 void cn_queue_free_dev(struct cn_queue_dev *dev);


^ permalink raw reply

* [PATCH 08/22] rds: stop including asm-generic/bitops/le.h
From: Akinobu Mita @ 2010-10-15  9:46 UTC (permalink / raw)
  To: linux-kernel, linux-arch, Arnd Bergmann, Christoph Hellwig,
	Andrew Morton
  Cc: Akinobu Mita, Andy Grover, rds-devel, David S. Miller, netdev
In-Reply-To: <1287135981-17604-1-git-send-email-akinobu.mita@gmail.com>

No need to include asm-generic/bitops/le.h as all architectures
provide little endian bit operations now.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andy Grover <andy.grover@oracle.com>
Cc: rds-devel@oss.oracle.com
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
---
 net/rds/cong.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/net/rds/cong.c b/net/rds/cong.c
index c6784d5..15a65f0 100644
--- a/net/rds/cong.c
+++ b/net/rds/cong.c
@@ -34,8 +34,6 @@
 #include <linux/types.h>
 #include <linux/rbtree.h>
 
-#include <asm-generic/bitops/le.h>
-
 #include "rds.h"
 
 /*
-- 
1.7.1.231.gd0b16

^ permalink raw reply related

* [PATCH v13 16/16] An example how to alloc user buffer based on napi_gro_frags() interface.
From: xiaohui.xin @ 2010-10-15  9:12 UTC (permalink / raw)
  To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>

From: Xin Xiaohui <xiaohui.xin@intel.com>

This example is based on the ixgbe driver, which uses napi_gro_frags().
It can get buffers directly from the guest side using netdev_alloc_page()
and release guest buffers using netdev_free_page().

Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
---
 drivers/net/ixgbe/ixgbe_main.c |   24 ++++++++++++++++++++----
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index a4a5263..47663ac 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1032,7 +1032,14 @@ static inline void ixgbe_release_rx_desc(struct ixgbe_hw *hw,
 static bool is_rx_buffer_mapped_as_page(struct ixgbe_rx_buffer *bi,
 					struct net_device *dev)
 {
-	return true;
+	return dev_is_mpassthru(dev);
+}
+
+static u32 get_page_skb_offset(struct net_device *dev)
+{
+	if (!dev_is_mpassthru(dev))
+		return 0;
+	return dev->mp_port->vnet_hlen;
 }
 
 /**
@@ -1105,7 +1112,8 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter,
 				adapter->alloc_rx_page_failed++;
 				goto no_buffers;
 			}
-			bi->page_skb_offset = 0;
+			bi->page_skb_offset =
+				get_page_skb_offset(adapter->netdev);
 			bi->dma = dma_map_page(&pdev->dev, bi->page_skb,
 					bi->page_skb_offset,
 					(PAGE_SIZE / 2),
@@ -1242,8 +1250,10 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			len = le16_to_cpu(rx_desc->wb.upper.length);
 		}
 
-		if (is_no_buffer(rx_buffer_info))
+		if (is_no_buffer(rx_buffer_info)) {
+			printk("no buffers\n");
 			break;
+		}
 		cleaned = true;
 
 		if (!rx_buffer_info->mapped_as_page) {
@@ -1299,6 +1309,11 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 						rx_buffer_info->page_skb,
 						rx_buffer_info->page_skb_offset,
 						len);
+				if (dev_is_mpassthru(netdev) &&
+						netdev->mp_port->hash)
+					skb_shinfo(skb)->destructor_arg =
+						netdev->mp_port->hash(netdev,
+						rx_buffer_info->page_skb);
 				rx_buffer_info->page_skb = NULL;
 				skb->len += len;
 				skb->data_len += len;
@@ -1316,7 +1331,8 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			                   upper_len);
 
 			if ((rx_ring->rx_buf_len > (PAGE_SIZE / 2)) ||
-			    (page_count(rx_buffer_info->page) != 1))
+			    (page_count(rx_buffer_info->page) != 1) ||
+				dev_is_mpassthru(netdev))
 				rx_buffer_info->page = NULL;
 			else
 				get_page(rx_buffer_info->page);
-- 
1.7.3

^ permalink raw reply related

* [PATCH v13 15/16] An example how to modify NIC driver to use napi_gro_frags() interface
From: xiaohui.xin @ 2010-10-15  9:12 UTC (permalink / raw)
  To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>

From: Xin Xiaohui <xiaohui.xin@intel.com>

This example is based on the ixgbe driver.
It provides the API is_rx_buffer_mapped_as_page() to indicate
whether the driver uses the napi_gro_frags() interface or not.
The example allocates 2 pages for DMA for one ring descriptor
using netdev_alloc_page(). When packets arrive, the
napi_gro_frags() interface is used to allocate the skb and to
receive the packets.

Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
---
 drivers/net/ixgbe/ixgbe.h      |    3 +
 drivers/net/ixgbe/ixgbe_main.c |  163 +++++++++++++++++++++++++++++++---------
 2 files changed, 131 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
index 9e15eb9..89367ca 100644
--- a/drivers/net/ixgbe/ixgbe.h
+++ b/drivers/net/ixgbe/ixgbe.h
@@ -131,6 +131,9 @@ struct ixgbe_rx_buffer {
 	struct page *page;
 	dma_addr_t page_dma;
 	unsigned int page_offset;
+	u16 mapped_as_page;
+	struct page *page_skb;
+	unsigned int page_skb_offset;
 };
 
 struct ixgbe_queue_stats {
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index e32af43..a4a5263 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1029,6 +1029,12 @@ static inline void ixgbe_release_rx_desc(struct ixgbe_hw *hw,
 	IXGBE_WRITE_REG(hw, IXGBE_RDT(rx_ring->reg_idx), val);
 }
 
+static bool is_rx_buffer_mapped_as_page(struct ixgbe_rx_buffer *bi,
+					struct net_device *dev)
+{
+	return true;
+}
+
 /**
  * ixgbe_alloc_rx_buffers - Replace used receive buffers; packet split
  * @adapter: address of board private structure
@@ -1045,13 +1051,17 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter,
 	i = rx_ring->next_to_use;
 	bi = &rx_ring->rx_buffer_info[i];
 
+
 	while (cleaned_count--) {
 		rx_desc = IXGBE_RX_DESC_ADV(*rx_ring, i);
 
+		bi->mapped_as_page =
+			is_rx_buffer_mapped_as_page(bi, adapter->netdev);
+
 		if (!bi->page_dma &&
 		    (rx_ring->flags & IXGBE_RING_RX_PS_ENABLED)) {
 			if (!bi->page) {
-				bi->page = alloc_page(GFP_ATOMIC);
+				bi->page = netdev_alloc_page(adapter->netdev);
 				if (!bi->page) {
 					adapter->alloc_rx_page_failed++;
 					goto no_buffers;
@@ -1068,7 +1078,7 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter,
 						    DMA_FROM_DEVICE);
 		}
 
-		if (!bi->skb) {
+		if (!bi->mapped_as_page && !bi->skb) {
 			struct sk_buff *skb;
 			/* netdev_alloc_skb reserves 32 bytes up front!! */
 			uint bufsz = rx_ring->rx_buf_len + SMP_CACHE_BYTES;
@@ -1088,6 +1098,19 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter,
 			                         rx_ring->rx_buf_len,
 						 DMA_FROM_DEVICE);
 		}
+
+		if (bi->mapped_as_page && !bi->page_skb) {
+			bi->page_skb = netdev_alloc_page(adapter->netdev);
+			if (!bi->page_skb) {
+				adapter->alloc_rx_page_failed++;
+				goto no_buffers;
+			}
+			bi->page_skb_offset = 0;
+			bi->dma = dma_map_page(&pdev->dev, bi->page_skb,
+					bi->page_skb_offset,
+					(PAGE_SIZE / 2),
+					PCI_DMA_FROMDEVICE);
+		}
 		/* Refresh the desc even if buffer_addrs didn't change because
 		 * each write-back erases this info. */
 		if (rx_ring->flags & IXGBE_RING_RX_PS_ENABLED) {
@@ -1165,6 +1188,13 @@ struct ixgbe_rsc_cb {
 	bool delay_unmap;
 };
 
+static bool is_no_buffer(struct ixgbe_rx_buffer *rx_buffer_info)
+{
+	return (!rx_buffer_info->skb ||
+		!rx_buffer_info->page_skb) &&
+		!rx_buffer_info->page;
+}
+
 #define IXGBE_RSC_CB(skb) ((struct ixgbe_rsc_cb *)(skb)->cb)
 
 static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
@@ -1174,6 +1204,7 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	struct net_device *netdev = adapter->netdev;
 	struct pci_dev *pdev = adapter->pdev;
+	struct napi_struct *napi = &q_vector->napi;
 	union ixgbe_adv_rx_desc *rx_desc, *next_rxd;
 	struct ixgbe_rx_buffer *rx_buffer_info, *next_buffer;
 	struct sk_buff *skb;
@@ -1211,32 +1242,68 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			len = le16_to_cpu(rx_desc->wb.upper.length);
 		}
 
+		if (is_no_buffer(rx_buffer_info))
+			break;
 		cleaned = true;
-		skb = rx_buffer_info->skb;
-		prefetch(skb->data);
-		rx_buffer_info->skb = NULL;
 
-		if (rx_buffer_info->dma) {
-			if ((adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) &&
-			    (!(staterr & IXGBE_RXD_STAT_EOP)) &&
-				 (!(skb->prev))) {
-				/*
-				 * When HWRSC is enabled, delay unmapping
-				 * of the first packet. It carries the
-				 * header information, HW may still
-				 * access the header after the writeback.
-				 * Only unmap it when EOP is reached
-				 */
-				IXGBE_RSC_CB(skb)->delay_unmap = true;
-				IXGBE_RSC_CB(skb)->dma = rx_buffer_info->dma;
-			} else {
-				dma_unmap_single(&pdev->dev,
-				                 rx_buffer_info->dma,
-				                 rx_ring->rx_buf_len,
-				                 DMA_FROM_DEVICE);
+		if (!rx_buffer_info->mapped_as_page) {
+			skb = rx_buffer_info->skb;
+			prefetch(skb->data);
+			rx_buffer_info->skb = NULL;
+
+			if (rx_buffer_info->dma) {
+				if ((adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) &&
+						(!(staterr & IXGBE_RXD_STAT_EOP)) &&
+						(!(skb->prev))) {
+					/*
+					 * When HWRSC is enabled, delay unmapping
+					 * of the first packet. It carries the
+					 * header information, HW may still
+					 * access the header after the writeback.
+					 * Only unmap it when EOP is reached
+					 */
+					IXGBE_RSC_CB(skb)->delay_unmap = true;
+					IXGBE_RSC_CB(skb)->dma = rx_buffer_info->dma;
+				} else
+					dma_unmap_single(&pdev->dev,
+							rx_buffer_info->dma,
+							rx_ring->rx_buf_len,
+							DMA_FROM_DEVICE);
+				rx_buffer_info->dma = 0;
+				skb_put(skb, len);
+			}
+		} else {
+			skb = napi_get_frags(napi);
+			prefetch(rx_buffer_info->page_skb_offset);
+			rx_buffer_info->skb = NULL;
+			if (rx_buffer_info->dma) {
+				if ((adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) &&
+						(!(staterr & IXGBE_RXD_STAT_EOP)) &&
+						(!(skb->prev))) {
+					/*
+					 * When HWRSC is enabled, delay unmapping
+					 * of the first packet. It carries the
+					 * header information, HW may still
+					 * access the header after the writeback.
+					 * Only unmap it when EOP is reached
+					 */
+					IXGBE_RSC_CB(skb)->delay_unmap = true;
+					IXGBE_RSC_CB(skb)->dma = rx_buffer_info->dma;
+				} else
+					dma_unmap_page(&pdev->dev, rx_buffer_info->dma,
+							PAGE_SIZE / 2,
+							PCI_DMA_FROMDEVICE);
+				rx_buffer_info->dma = 0;
+				skb_fill_page_desc(skb,
+						skb_shinfo(skb)->nr_frags,
+						rx_buffer_info->page_skb,
+						rx_buffer_info->page_skb_offset,
+						len);
+				rx_buffer_info->page_skb = NULL;
+				skb->len += len;
+				skb->data_len += len;
+				skb->truesize += len;
 			}
-			rx_buffer_info->dma = 0;
-			skb_put(skb, len);
 		}
 
 		if (upper_len) {
@@ -1283,10 +1350,16 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 				skb = ixgbe_transform_rsc_queue(skb, &(rx_ring->rsc_count));
 			if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) {
 				if (IXGBE_RSC_CB(skb)->delay_unmap) {
-					dma_unmap_single(&pdev->dev,
-							 IXGBE_RSC_CB(skb)->dma,
-					                 rx_ring->rx_buf_len,
-							 DMA_FROM_DEVICE);
+					if (!rx_buffer_info->mapped_as_page)
+						dma_unmap_single(&pdev->dev,
+								IXGBE_RSC_CB(skb)->dma,
+								rx_ring->rx_buf_len,
+								DMA_FROM_DEVICE);
+					else
+						dma_unmap_page(&pdev->dev,
+								IXGBE_RSC_CB(skb)->dma,
+								PAGE_SIZE / 2,
+								DMA_FROM_DEVICE);
 					IXGBE_RSC_CB(skb)->dma = 0;
 					IXGBE_RSC_CB(skb)->delay_unmap = false;
 				}
@@ -1304,6 +1377,11 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 				rx_buffer_info->dma = next_buffer->dma;
 				next_buffer->skb = skb;
 				next_buffer->dma = 0;
+				if (rx_buffer_info->mapped_as_page) {
+					rx_buffer_info->page_skb =
+							next_buffer->page_skb;
+					next_buffer->page_skb = NULL;
+				}
 			} else {
 				skb->next = next_buffer->skb;
 				skb->next->prev = skb;
@@ -1323,7 +1401,8 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		total_rx_bytes += skb->len;
 		total_rx_packets++;
 
-		skb->protocol = eth_type_trans(skb, adapter->netdev);
+		if (!rx_buffer_info->mapped_as_page)
+			skb->protocol = eth_type_trans(skb, adapter->netdev);
 #ifdef IXGBE_FCOE
 		/* if ddp, not passing to ULD unless for FCP_RSP or error */
 		if (adapter->flags & IXGBE_FLAG_FCOE_ENABLED) {
@@ -1332,7 +1411,14 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 				goto next_desc;
 		}
 #endif /* IXGBE_FCOE */
-		ixgbe_receive_skb(q_vector, skb, staterr, rx_ring, rx_desc);
+
+		if (!rx_buffer_info->mapped_as_page)
+			ixgbe_receive_skb(q_vector, skb, staterr,
+						rx_ring, rx_desc);
+		else {
+			skb_record_rx_queue(skb, rx_ring->queue_index);
+			napi_gro_frags(napi);
+		}
 
 next_desc:
 		rx_desc->wb.upper.status_error = 0;
@@ -3622,9 +3708,16 @@ static void ixgbe_clean_rx_ring(struct ixgbe_adapter *adapter,
 
 		rx_buffer_info = &rx_ring->rx_buffer_info[i];
 		if (rx_buffer_info->dma) {
-			dma_unmap_single(&pdev->dev, rx_buffer_info->dma,
-			                 rx_ring->rx_buf_len,
-					 DMA_FROM_DEVICE);
+			if (!rx_buffer_info->mapped_as_page)
+				dma_unmap_single(&pdev->dev, rx_buffer_info->dma,
+						rx_ring->rx_buf_len,
+						PCI_DMA_FROMDEVICE);
+			else {
+				dma_unmap_page(&pdev->dev, rx_buffer_info->dma,
+						PAGE_SIZE / 2,
+						PCI_DMA_FROMDEVICE);
+				rx_buffer_info->page_skb = NULL;
+			}
 			rx_buffer_info->dma = 0;
 		}
 		if (rx_buffer_info->skb) {
@@ -3651,7 +3744,7 @@ static void ixgbe_clean_rx_ring(struct ixgbe_adapter *adapter,
 				       PAGE_SIZE / 2, DMA_FROM_DEVICE);
 			rx_buffer_info->page_dma = 0;
 		}
-		put_page(rx_buffer_info->page);
+		netdev_free_page(adapter->netdev, rx_buffer_info->page);
 		rx_buffer_info->page = NULL;
 		rx_buffer_info->page_offset = 0;
 	}
-- 
1.7.3

^ permalink raw reply related

* [PATCH v13 14/16] Provides multiple submits and asynchronous notifications.
From: xiaohui.xin @ 2010-10-15  9:12 UTC (permalink / raw)
  To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>

From: Xin Xiaohui <xiaohui.xin@intel.com>

    The vhost-net backend currently supports only synchronous send/recv
    operations. This patch provides multiple submits and asynchronous
    notifications, which are needed for the zero-copy case.
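
    The submit/notify split can be pictured with the userspace sketch
    below. It is a simplified model, not the vhost code: the toy_* names
    are hypothetical, and locking is left out although a real notifier
    list touched from a completion callback would need it. Several
    requests may be outstanding at once; each completion appends the
    request to a per-queue list, and a later handler pass drains that
    list and reports the finished heads.

```c
#include <stddef.h>

/* Hypothetical request: 'head' names the descriptor it was built from. */
struct toy_req {
	unsigned int head;
	size_t nbytes;
	struct toy_req *next;
};

/* Hypothetical virtqueue with a FIFO of completed requests. */
struct toy_vq {
	struct toy_req *done_head, *done_tail;
	unsigned int used[8];	/* heads reported back, in completion order */
	unsigned int nused;
};

/* Completion callback: called whenever the lower layer finishes a request. */
static void toy_complete(struct toy_vq *vq, struct toy_req *req, size_t nbytes)
{
	req->nbytes = nbytes;
	req->next = NULL;
	if (vq->done_tail)
		vq->done_tail->next = req;
	else
		vq->done_head = req;
	vq->done_tail = req;
}

/* Take the oldest completed request off the list, or NULL if none. */
static struct toy_req *toy_dequeue(struct toy_vq *vq)
{
	struct toy_req *req = vq->done_head;

	if (req) {
		vq->done_head = req->next;
		if (!vq->done_head)
			vq->done_tail = NULL;
	}
	return req;
}

/* Handler pass: report every finished request; returns total bytes seen. */
static size_t toy_handle_notify(struct toy_vq *vq)
{
	struct toy_req *req;
	size_t total = 0;

	while ((req = toy_dequeue(vq)) != NULL) {
		if (vq->nused < 8)
			vq->used[vq->nused++] = req->head;
		total += req->nbytes;
	}
	return total;
}
```

    Requests that have been submitted but not yet completed simply do
    not appear on the list, so a handler pass never blocks waiting for
    them; they are picked up by a later pass.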

    Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
---
 drivers/vhost/net.c   |  355 +++++++++++++++++++++++++++++++++++++++++++++----
 drivers/vhost/vhost.c |   78 +++++++++++
 drivers/vhost/vhost.h |   15 ++-
 3 files changed, 423 insertions(+), 25 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7c80082..17c599a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -24,6 +24,8 @@
 #include <linux/if_arp.h>
 #include <linux/if_tun.h>
 #include <linux/if_macvlan.h>
+#include <linux/mpassthru.h>
+#include <linux/aio.h>
 
 #include <net/sock.h>
 
@@ -32,6 +34,7 @@
 /* Max number of bytes transferred before requeueing the job.
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_NET_WEIGHT 0x80000
+static struct kmem_cache *notify_cache;
 
 enum {
 	VHOST_NET_VQ_RX = 0,
@@ -49,6 +52,7 @@ struct vhost_net {
 	struct vhost_dev dev;
 	struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
 	struct vhost_poll poll[VHOST_NET_VQ_MAX];
+	struct kmem_cache *cache;
 	/* Tells us whether we are polling a socket for TX.
 	 * We only do this when socket buffer fills up.
 	 * Protected by tx vq lock. */
@@ -109,11 +113,184 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
 	net->tx_poll_state = VHOST_NET_POLL_STARTED;
 }
 
+struct kiocb *notify_dequeue(struct vhost_virtqueue *vq)
+{
+	struct kiocb *iocb = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vq->notify_lock, flags);
+	if (!list_empty(&vq->notifier)) {
+		iocb = list_first_entry(&vq->notifier,
+				struct kiocb, ki_list);
+		list_del(&iocb->ki_list);
+	}
+	spin_unlock_irqrestore(&vq->notify_lock, flags);
+	return iocb;
+}
+
+static void handle_iocb(struct kiocb *iocb)
+{
+	struct vhost_virtqueue *vq = iocb->private;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vq->notify_lock, flags);
+	list_add_tail(&iocb->ki_list, &vq->notifier);
+	spin_unlock_irqrestore(&vq->notify_lock, flags);
+}
+
+static int is_async_vq(struct vhost_virtqueue *vq)
+{
+	return (vq->link_state == VHOST_VQ_LINK_ASYNC);
+}
+
+static void handle_async_rx_events_notify(struct vhost_net *net,
+		struct vhost_virtqueue *vq,
+		struct socket *sock)
+{
+	struct kiocb *iocb = NULL;
+	struct vhost_log *vq_log = NULL;
+	int rx_total_len = 0;
+	unsigned int head, log, in, out;
+	int size;
+
+	if (!is_async_vq(vq))
+		return;
+
+	if (sock->sk->sk_data_ready)
+		sock->sk->sk_data_ready(sock->sk, 0);
+
+	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
+		vq->log : NULL;
+
+	while ((iocb = notify_dequeue(vq)) != NULL) {
+		if (!iocb->ki_left) {
+			vhost_add_used_and_signal(&net->dev, vq,
+					iocb->ki_pos, iocb->ki_nbytes);
+			size = iocb->ki_nbytes;
+			head = iocb->ki_pos;
+			rx_total_len += iocb->ki_nbytes;
+
+			if (iocb->ki_dtor)
+				iocb->ki_dtor(iocb);
+			kmem_cache_free(net->cache, iocb);
+
+			/* when log is enabled, recomputing the log is needed,
+			 * since these buffers are in async queue, may not get
+			 * the log info before.
+			 */
+			if (unlikely(vq_log)) {
+				if (!log)
+					__vhost_get_vq_desc(&net->dev, vq,
+							vq->iov,
+							ARRAY_SIZE(vq->iov),
+							&out, &in, vq_log,
+							&log, head);
+				vhost_log_write(vq, vq_log, log, size);
+			}
+			if (unlikely(rx_total_len >= VHOST_NET_WEIGHT)) {
+				vhost_poll_queue(&vq->poll);
+				break;
+			}
+		} else {
+			int i = 0;
+			int count = iocb->ki_left;
+			int hc = count;
+			while (count--) {
+				if (iocb) {
+					vq->heads[i].id = iocb->ki_pos;
+					vq->heads[i].len = iocb->ki_nbytes;
+					size = iocb->ki_nbytes;
+					head = iocb->ki_pos;
+					rx_total_len += iocb->ki_nbytes;
+
+					if (iocb->ki_dtor)
+						iocb->ki_dtor(iocb);
+					kmem_cache_free(net->cache, iocb);
+
+					if (unlikely(vq_log)) {
+						if (!log)
+							__vhost_get_vq_desc(
+							&net->dev, vq, vq->iov,
+							ARRAY_SIZE(vq->iov),
+							&out, &in, vq_log,
+							&log, head);
+						vhost_log_write(
+							vq, vq_log, log, size);
+					}
+				} else
+					break;
+
+				i++;
+				if (count)
+					iocb = notify_dequeue(vq);
+			}
+			vhost_add_used_and_signal_n(
+					&net->dev, vq, vq->heads, hc);
+		}
+	}
+}
+
+static void handle_async_tx_events_notify(struct vhost_net *net,
+		struct vhost_virtqueue *vq)
+{
+	struct kiocb *iocb = NULL;
+	struct list_head *entry, *tmp;
+	unsigned long flags;
+	int tx_total_len = 0;
+
+	if (!is_async_vq(vq))
+		return;
+
+	spin_lock_irqsave(&vq->notify_lock, flags);
+	list_for_each_safe(entry, tmp, &vq->notifier) {
+		iocb = list_entry(entry,
+				struct kiocb, ki_list);
+		if (!iocb->ki_flags)
+			continue;
+		list_del(&iocb->ki_list);
+		vhost_add_used_and_signal(&net->dev, vq,
+				iocb->ki_pos, 0);
+		tx_total_len += iocb->ki_nbytes;
+
+		if (iocb->ki_dtor)
+			iocb->ki_dtor(iocb);
+
+		kmem_cache_free(net->cache, iocb);
+		if (unlikely(tx_total_len >= VHOST_NET_WEIGHT)) {
+			vhost_poll_queue(&vq->poll);
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&vq->notify_lock, flags);
+}
+
+static struct kiocb *create_iocb(struct vhost_net *net,
+		struct vhost_virtqueue *vq,
+		unsigned head)
+{
+	struct kiocb *iocb = NULL;
+
+	if (!is_async_vq(vq))
+		return NULL;
+
+	iocb = kmem_cache_zalloc(net->cache, GFP_KERNEL);
+	if (!iocb)
+		return NULL;
+	iocb->private = vq;
+	iocb->ki_pos = head;
+	iocb->ki_dtor = handle_iocb;
+	if (vq == &net->dev.vqs[VHOST_NET_VQ_RX])
+		iocb->ki_user_data = vq->num;
+	iocb->ki_iovec = vq->hdr;
+	return iocb;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
 {
 	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
+	struct kiocb *iocb = NULL;
 	unsigned out, in, s;
 	int head;
 	struct msghdr msg = {
@@ -146,6 +323,10 @@ static void handle_tx(struct vhost_net *net)
 	if (wmem < sock->sk->sk_sndbuf / 2)
 		tx_poll_stop(net);
 	hdr_size = vq->vhost_hlen;
+	if (!vq->vhost_hlen && is_async_vq(vq))
+		hdr_size = vq->sock_hlen;
+
+	handle_async_tx_events_notify(net, vq);
 
 	for (;;) {
 		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
@@ -157,11 +338,14 @@ static void handle_tx(struct vhost_net *net)
 			break;
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
 		if (head == vq->num) {
-			wmem = atomic_read(&sock->sk->sk_wmem_alloc);
-			if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
-				tx_poll_start(net, sock);
-				set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
-				break;
+			if (!is_async_vq(vq)) {
+				wmem = atomic_read(&sock->sk->sk_wmem_alloc);
+				if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
+					tx_poll_start(net, sock);
+					set_bit(SOCK_ASYNC_NOSPACE,
+					&sock->flags);
+					break;
+				}
 			}
 			if (unlikely(vhost_enable_notify(vq))) {
 				vhost_disable_notify(vq);
@@ -178,6 +362,13 @@ static void handle_tx(struct vhost_net *net)
 		s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
 		msg.msg_iovlen = out;
 		len = iov_length(vq->iov, out);
+		/* if async operations supported */
+		if (is_async_vq(vq)) {
+			iocb = create_iocb(net, vq, head);
+			if (!iocb)
+				break;
+		}
+
 		/* Sanity check */
 		if (!len) {
 			vq_err(vq, "Unexpected header len for TX: "
@@ -186,12 +377,18 @@ static void handle_tx(struct vhost_net *net)
 			break;
 		}
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
-		err = sock->ops->sendmsg(NULL, sock, &msg, len);
+		err = sock->ops->sendmsg(iocb, sock, &msg, len);
 		if (unlikely(err < 0)) {
+			if (is_async_vq(vq))
+				kmem_cache_free(net->cache, iocb);
 			vhost_discard_vq_desc(vq, 1);
 			tx_poll_start(net, sock);
 			break;
 		}
+
+		if (is_async_vq(vq))
+			continue;
+
 		if (err != len)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
@@ -203,6 +400,8 @@ static void handle_tx(struct vhost_net *net)
 		}
 	}
 
+	handle_async_tx_events_notify(net, vq);
+
 	mutex_unlock(&vq->mutex);
 	unuse_mm(net->dev.mm);
 }
@@ -396,7 +595,8 @@ static void handle_rx_big(struct vhost_net *net)
 static void handle_rx_mergeable(struct vhost_net *net)
 {
 	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
-	unsigned uninitialized_var(in), log;
+	unsigned uninitialized_var(in), log, out;
+	struct kiocb *iocb;
 	struct vhost_log *vq_log;
 	struct msghdr msg = {
 		.msg_name = NULL,
@@ -417,28 +617,44 @@ static void handle_rx_mergeable(struct vhost_net *net)
 	size_t vhost_hlen, sock_hlen;
 	size_t vhost_len, sock_len;
 	struct socket *sock = rcu_dereference(vq->private_data);
-	if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
+	if (!sock || (skb_queue_empty(&sock->sk->sk_receive_queue) &&
+		      !is_async_vq(vq)))
 		return;
-
 	use_mm(net->dev.mm);
 	mutex_lock(&vq->mutex);
 	vhost_disable_notify(vq);
 	vhost_hlen = vq->vhost_hlen;
 	sock_hlen = vq->sock_hlen;
 
+	/* In async cases, when write log is enabled, in case the submitted
+	 * buffers did not get log info before the log enabling, so we'd
+	 * better recompute the log info when needed. We do this in
+	 * handle_async_rx_events_notify().
+	 */
+
 	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
 		vq->log : NULL;
 
-	while ((sock_len = peek_head_len(sock->sk))) {
-		sock_len += sock_hlen;
-		vhost_len = sock_len + vhost_hlen;
-		headcount = get_rx_bufs(vq, vq->heads, vhost_len,
+	handle_async_rx_events_notify(net, vq, sock);
+
+	while (is_async_vq(vq) || (sock_len = peek_head_len(sock->sk))) {
+		if (is_async_vq(vq))
+			headcount = vhost_get_vq_desc(&net->dev, vq, vq->iov,
+						      ARRAY_SIZE(vq->iov),
+						      &out, &in,
+						      vq->log, &log);
+		else {
+			sock_len += sock_hlen;
+			vhost_len = sock_len + vhost_hlen;
+			headcount = get_rx_bufs(vq, vq->heads, vhost_len,
 					&in, vq_log, &log);
+		}
 		/* On error, stop handling until the next kick. */
 		if (unlikely(headcount < 0))
 			break;
 		/* OK, now we need to know about added descriptors. */
-		if (!headcount) {
+		if ((!headcount && !is_async_vq(vq)) ||
+			(headcount == vq->num && is_async_vq(vq))) {
 			if (unlikely(vhost_enable_notify(vq))) {
 				/* They have slipped one in as we were
 				 * doing that: check again. */
@@ -450,16 +666,41 @@ static void handle_rx_mergeable(struct vhost_net *net)
 			break;
 		}
 		/* We don't need to be notified again. */
-		if (unlikely((vhost_hlen)))
-			/* Skip header. TODO: support TSO. */
-			move_iovec_hdr(vq->iov, vq->hdr, vhost_hlen, in);
-		else
-			/* Copy the header for use in VIRTIO_NET_F_MRG_RXBUF:
-			 * needed because sendmsg can modify msg_iov. */
-			copy_iovec_hdr(vq->iov, vq->hdr, sock_hlen, in);
+		if (unlikely((vhost_hlen))) {
+			if (is_async_vq(vq))
+				vq->hdr[0].iov_len = vhost_hlen;
+			else
+				/* Skip header. TODO: support TSO. */
+				move_iovec_hdr(vq->iov, vq->hdr,
+						vhost_hlen, in);
+		} else {
+			if (is_async_vq(vq))
+				vq->hdr[0].iov_len = sock_hlen;
+			else
+				/* Copy the header for use in
+				 * VIRTIO_NET_F_MRG_RXBUF:
+				 * needed because sendmsg can
+				 * modify msg_iov. */
+				copy_iovec_hdr(vq->iov, vq->hdr,
+						sock_hlen, in);
+		}
 		msg.msg_iovlen = in;
-		err = sock->ops->recvmsg(NULL, sock, &msg,
+		if (is_async_vq(vq)) {
+			iocb = create_iocb(net, vq, headcount);
+			if (!iocb)
+				break;
+		}
+		err = sock->ops->recvmsg(iocb, sock, &msg,
 					 sock_len, MSG_DONTWAIT | MSG_TRUNC);
+		if (is_async_vq(vq)) {
+			if (err < 0) {
+				kmem_cache_free(net->cache, iocb);
+				vhost_discard_vq_desc(vq, headcount);
+				break;
+			}
+			continue;
+		}
+
 		/* Userspace might have consumed the packet meanwhile:
 		 * it's not supposed to do this usually, but might be hard
 		 * to prevent. Discard data we got (if any) and keep going. */
@@ -496,6 +737,8 @@ static void handle_rx_mergeable(struct vhost_net *net)
 		}
 	}
 
+	handle_async_rx_events_notify(net, vq, sock);
+
 	mutex_unlock(&vq->mutex);
 	unuse_mm(net->dev.mm);
 }
@@ -561,6 +804,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
 	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
 	n->tx_poll_state = VHOST_NET_POLL_DISABLED;
+	n->cache = NULL;
 
 	f->private_data = n;
 
@@ -624,6 +868,21 @@ static void vhost_net_flush(struct vhost_net *n)
 	vhost_net_flush_vq(n, VHOST_NET_VQ_RX);
 }
 
+static void vhost_async_cleanup(struct vhost_net *n)
+{
+	/* clean the notifier */
+	struct vhost_virtqueue *vq;
+	struct kiocb *iocb = NULL;
+	if (n->cache) {
+		vq = &n->dev.vqs[VHOST_NET_VQ_RX];
+		while ((iocb = notify_dequeue(vq)) != NULL)
+			kmem_cache_free(n->cache, iocb);
+		vq = &n->dev.vqs[VHOST_NET_VQ_TX];
+		while ((iocb = notify_dequeue(vq)) != NULL)
+			kmem_cache_free(n->cache, iocb);
+	}
+}
+
 static int vhost_net_release(struct inode *inode, struct file *f)
 {
 	struct vhost_net *n = f->private_data;
@@ -640,6 +899,7 @@ static int vhost_net_release(struct inode *inode, struct file *f)
 	/* We do an extra flush before freeing memory,
 	 * since jobs can re-queue themselves. */
 	vhost_net_flush(n);
+	vhost_async_cleanup(n);
 	kfree(n);
 	return 0;
 }
@@ -691,21 +951,61 @@ static struct socket *get_tap_socket(int fd)
 	return sock;
 }
 
-static struct socket *get_socket(int fd)
+static struct socket *get_mp_socket(int fd)
+{
+	struct file *file = fget(fd);
+	struct socket *sock;
+	if (!file)
+		return ERR_PTR(-EBADF);
+	sock = mp_get_socket(file);
+	if (IS_ERR(sock))
+		fput(file);
+	return sock;
+}
+
+static struct socket *get_socket(struct vhost_virtqueue *vq, int fd,
+				 enum vhost_vq_link_state *state)
 {
 	struct socket *sock;
 	/* special case to disable backend */
 	if (fd == -1)
 		return NULL;
+
+	*state = VHOST_VQ_LINK_SYNC;
+
 	sock = get_raw_socket(fd);
 	if (!IS_ERR(sock))
 		return sock;
 	sock = get_tap_socket(fd);
 	if (!IS_ERR(sock))
 		return sock;
+	/* If we don't have notify_cache, then don't do mpassthru */
+	if (!notify_cache)
+		return ERR_PTR(-ENOTSOCK);
+	/* If we don't have mergeable buffers, then don't do mpassthru */
+	if (vhost_has_feature(vq->dev, VIRTIO_NET_F_MRG_RXBUF)) {
+		sock = get_mp_socket(fd);
+		if (!IS_ERR(sock)) {
+			*state = VHOST_VQ_LINK_ASYNC;
+			return sock;
+		}
+	}
 	return ERR_PTR(-ENOTSOCK);
 }
 
+static void vhost_init_link_state(struct vhost_net *n, int index)
+{
+	struct vhost_virtqueue *vq = n->vqs + index;
+
+	WARN_ON(!mutex_is_locked(&vq->mutex));
+	if (vq->link_state == VHOST_VQ_LINK_ASYNC) {
+		INIT_LIST_HEAD(&vq->notifier);
+		spin_lock_init(&vq->notify_lock);
+		if (!n->cache)
+			n->cache = notify_cache;
+	}
+}
+
 static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 {
 	struct socket *sock, *oldsock;
@@ -729,12 +1029,14 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 		r = -EFAULT;
 		goto err_vq;
 	}
-	sock = get_socket(fd);
+	sock = get_socket(vq, fd, &vq->link_state);
 	if (IS_ERR(sock)) {
 		r = PTR_ERR(sock);
 		goto err_vq;
 	}
 
+	vhost_init_link_state(n, index);
+
 	/* start polling new socket */
 	oldsock = vq->private_data;
 	if (sock != oldsock) {
@@ -879,6 +1181,9 @@ static struct miscdevice vhost_net_misc = {
 
 static int vhost_net_init(void)
 {
+	notify_cache = kmem_cache_create("vhost_kiocb",
+					sizeof(struct kiocb), 0,
+					SLAB_HWCACHE_ALIGN, NULL);
 	return misc_register(&vhost_net_misc);
 }
 module_init(vhost_net_init);
@@ -886,6 +1191,8 @@ module_init(vhost_net_init);
 static void vhost_net_exit(void)
 {
 	misc_deregister(&vhost_net_misc);
+	if (notify_cache)
+		kmem_cache_destroy(notify_cache);
 }
 module_exit(vhost_net_exit);
 
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index dd3d6f7..295d9ab 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1015,6 +1015,84 @@ static int get_indirect(struct vhost_dev *dev, struct vhost_virtqueue *vq,
 	return 0;
 }
 
+/* Re-walk the descriptor chain starting at head to recompute the log. */
+int __vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
+			struct iovec iov[], unsigned int iov_size,
+			unsigned int *out_num, unsigned int *in_num,
+			struct vhost_log *log, unsigned int *log_num,
+			unsigned int head)
+{
+	struct vring_desc desc;
+	unsigned int i, found = 0;
+	int ret;
+
+	/* When we start there are neither input nor output descriptors. */
+	*out_num = *in_num = 0;
+	if (unlikely(log))
+		*log_num = 0;
+
+	i = head;
+	do {
+		unsigned iov_count = *in_num + *out_num;
+		if (unlikely(i >= vq->num)) {
+			vq_err(vq, "Desc index is %u > %u, head = %u",
+					i, vq->num, head);
+			return -EINVAL;
+		}
+		if (unlikely(++found > vq->num)) {
+			vq_err(vq, "Loop detected: last one at %u "
+					"vq size %u head %u\n",
+					i, vq->num, head);
+			return -EINVAL;
+		}
+		ret = copy_from_user(&desc, vq->desc + i, sizeof desc);
+		if (unlikely(ret)) {
+			vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
+					i, vq->desc + i);
+			return -EFAULT;
+		}
+		if (desc.flags & VRING_DESC_F_INDIRECT) {
+			ret = get_indirect(dev, vq, iov, iov_size,
+					out_num, in_num,
+					log, log_num, &desc);
+			if (unlikely(ret < 0)) {
+				vq_err(vq, "Failure detected "
+				       "in indirect descriptor at idx %d\n", i);
+				return ret;
+			}
+			continue;
+		}
+
+		ret = translate_desc(dev, desc.addr, desc.len, iov + iov_count,
+				iov_size - iov_count);
+		if (unlikely(ret < 0)) {
+			vq_err(vq, "Translation failure %d descriptor idx %d\n",
+					ret, i);
+			return ret;
+		}
+		if (desc.flags & VRING_DESC_F_WRITE) {
+			/* If this is an input descriptor,
+			 * increment that count. */
+			*in_num += ret;
+			if (unlikely(log)) {
+				log[*log_num].addr = desc.addr;
+				log[*log_num].len = desc.len;
+				++*log_num;
+			}
+		} else {
+			/* If it's an output descriptor, they're all supposed
+			 * to come before any input descriptors. */
+			if (unlikely(*in_num)) {
+				vq_err(vq, "Descriptor has out after in: "
+						"idx %d\n", i);
+				return -EINVAL;
+			}
+			*out_num += ret;
+		}
+	} while ((i = next_desc(&desc)) != -1);
+
+	return head;
+}
 /* This looks in the virtqueue and for the first available buffer, and converts
  * it to an iovec for convenient access.  Since descriptors consist of some
  * number of output then some number of input descriptors, it's actually two
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index afd7729..915336d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -55,6 +55,11 @@ struct vhost_log {
 	u64 len;
 };
 
+enum vhost_vq_link_state {
+	VHOST_VQ_LINK_SYNC = 0,
+	VHOST_VQ_LINK_ASYNC = 1,
+};
+
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
 	struct vhost_dev *dev;
@@ -110,6 +115,10 @@ struct vhost_virtqueue {
 	/* Log write descriptors */
 	void __user *log_base;
 	struct vhost_log log[VHOST_NET_MAX_SG];
+	/* Differentiate async sockets for zero-copy from normal ones */
+	enum vhost_vq_link_state link_state;
+	struct list_head notifier;
+	spinlock_t notify_lock;
 };
 
 struct vhost_dev {
@@ -136,7 +145,11 @@ void vhost_dev_cleanup(struct vhost_dev *);
 long vhost_dev_ioctl(struct vhost_dev *, unsigned int ioctl, unsigned long arg);
 int vhost_vq_access_ok(struct vhost_virtqueue *vq);
 int vhost_log_access_ok(struct vhost_dev *);
-
+int __vhost_get_vq_desc(struct vhost_dev *, struct vhost_virtqueue *,
+			  struct iovec iov[], unsigned int iov_count,
+			  unsigned int *out_num, unsigned int *in_num,
+			  struct vhost_log *log, unsigned int *log_num,
+			  unsigned int head);
 int vhost_get_vq_desc(struct vhost_dev *, struct vhost_virtqueue *,
 		      struct iovec iov[], unsigned int iov_count,
 		      unsigned int *out_num, unsigned int *in_num,
-- 
1.7.3



* [PATCH v13 13/16] Add a kconfig entry and make entry for mp device.
From: xiaohui.xin @ 2010-10-15  9:12 UTC (permalink / raw)
  To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>

From: Xin Xiaohui <xiaohui.xin@intel.com>

Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
 drivers/vhost/Kconfig  |   10 ++++++++++
 drivers/vhost/Makefile |    2 ++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index e4e2fd1..a6b8cbf 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -9,3 +9,13 @@ config VHOST_NET
 	  To compile this driver as a module, choose M here: the module will
 	  be called vhost_net.
 
+config MEDIATE_PASSTHRU
+	tristate "mediate passthru network driver (EXPERIMENTAL)"
+	depends on VHOST_NET
+	---help---
+	  Zero-copy network I/O support. It is called mediate passthru to
+	  distinguish it from hardware passthru.
+
+	  To compile this driver as a module, choose M here: the module will
+	  be called mpassthru.
+
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 72dd020..c18b9fc 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_VHOST_NET) += vhost_net.o
 vhost_net-y := vhost.o net.o
+
+obj-$(CONFIG_MEDIATE_PASSTHRU) += mpassthru.o
-- 
1.7.3



* [PATCH v13 11/16] Add header file for mp device.
From: xiaohui.xin @ 2010-10-15  9:12 UTC (permalink / raw)
  To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>

From: Xin Xiaohui <xiaohui.xin@intel.com>

Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
 include/linux/mpassthru.h |  133 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 133 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/mpassthru.h

diff --git a/include/linux/mpassthru.h b/include/linux/mpassthru.h
new file mode 100644
index 0000000..1115f55
--- /dev/null
+++ b/include/linux/mpassthru.h
@@ -0,0 +1,133 @@
+#ifndef __MPASSTHRU_H
+#define __MPASSTHRU_H
+
+#include <linux/types.h>
+#include <linux/if_ether.h>
+#include <linux/ioctl.h>
+
+/* ioctl defines */
+#define MPASSTHRU_BINDDEV      _IOW('M', 213, int)
+#define MPASSTHRU_UNBINDDEV    _IO('M', 214)
+#define MPASSTHRU_SET_MEM_LOCKED       _IOW('M', 215, unsigned long)
+#define MPASSTHRU_GET_MEM_LOCKED_NEED  _IOR('M', 216, unsigned long)
+
+#define COPY_THRESHOLD (L1_CACHE_BYTES * 4)
+#define COPY_HDR_LEN   (L1_CACHE_BYTES < 64 ? 64 : L1_CACHE_BYTES)
+
+#define DEFAULT_NEED   ((8192*2*2)*4096)
+
+struct frag {
+	u16     offset;
+	u16     size;
+};
+
+#define HASH_BUCKETS    (8192*2)
+struct page_info {
+	struct list_head        list;
+	struct page_info        *next;
+	struct page_info        *prev;
+	struct page             *pages[MAX_SKB_FRAGS];
+	struct sk_buff          *skb;
+	struct page_pool        *pool;
+
+	/* The pointer relayed to the skb, to indicate whether
+	 * it is externally allocated or kernel allocated
+	 */
+	struct skb_ext_page    ext_page;
+	/* flag to indicate read or write */
+#define INFO_READ                      0
+#define INFO_WRITE                     1
+	unsigned                flags;
+	/* exact number of locked pages */
+	unsigned                pnum;
+
+	/* The fields below are for the backend
+	 * driver, currently vhost-net.
+	 */
+	/* the related kiocb structure */
+	struct kiocb            *iocb;
+	/* the ring descriptor index */
+	unsigned int            desc_pos;
+	/* the iovecs coming from the backend; we only
+	 * need a few of them */
+	struct iovec            hdr[2];
+	struct iovec            iov[2];
+};
+
+struct page_pool {
+	/* the queue for rx side */
+	struct list_head        readq;
+	/* the lock to protect readq */
+	spinlock_t              read_lock;
+	/* record the original rlimit */
+	struct rlimit           o_rlim;
+	/* number of pages userspace wants locked */
+	int                     locked_pages;
+	/* currently locked pages */
+	int                     cur_pages;
+	/* the memory locked before */
+	unsigned long		orig_locked_vm;
+	/* the corresponding device */
+	struct net_device       *dev;
+	/* the mp_port corresponding to dev */
+	struct mp_port          port;
+	/* the hash_table list to find each locked page */
+	struct page_info        **hash_table;
+};
+
+static struct kmem_cache *ext_page_info_cache;
+
+#ifdef __KERNEL__
+#if defined(CONFIG_MEDIATE_PASSTHRU) || defined(CONFIG_MEDIATE_PASSTHRU_MODULE)
+struct socket *mp_get_socket(struct file *);
+struct page_pool *page_pool_create(struct net_device *dev,
+				   struct socket *sock);
+int async_recvmsg(struct kiocb *iocb, struct page_pool *pool,
+		  struct iovec *iov, int count, int flags);
+int async_sendmsg(struct sock *sk, struct kiocb *iocb,
+		  struct page_pool *pool, struct iovec *iov,
+		  int count);
+void async_data_ready(struct sock *sk, struct page_pool *pool);
+void dev_change_state(struct net_device *dev);
+void page_pool_destroy(struct mm_struct *mm, struct page_pool *pool);
+#else
+#include <linux/err.h>
+#include <linux/errno.h>
+struct file;
+struct socket;
+static inline struct socket *mp_get_socket(struct file *f)
+{
+	return ERR_PTR(-EINVAL);
+}
+static inline struct page_pool *page_pool_create(struct net_device *dev,
+		struct socket *sock)
+{
+	return ERR_PTR(-EINVAL);
+}
+static inline int async_recvmsg(struct kiocb *iocb, struct page_pool *pool,
+		struct iovec *iov, int count, int flags)
+{
+	return -EINVAL;
+}
+static inline int async_sendmsg(struct sock *sk, struct kiocb *iocb,
+		struct page_pool *pool, struct iovec *iov,
+		int count)
+{
+	return -EINVAL;
+}
+static inline void async_data_ready(struct sock *sk, struct page_pool *pool)
+{
+	return;
+}
+static inline void dev_change_state(struct net_device *dev)
+{
+	return;
+}
+static inline void page_pool_destroy(struct mm_struct *mm,
+				     struct page_pool *pool)
+{
+	return;
+}
+#endif /* CONFIG_MEDIATE_PASSTHRU */
+#endif /* __KERNEL__ */
+#endif /* __MPASSTHRU_H */
-- 
1.7.3


* [PATCH v13 09/16] Don't do skb recycle if the device uses external buffers.
From: xiaohui.xin @ 2010-10-15  9:12 UTC (permalink / raw)
  To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>

From: Xin Xiaohui <xiaohui.xin@intel.com>

Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
 net/core/skbuff.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 02439e0..196aa99 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -558,6 +558,12 @@ bool skb_recycle_check(struct sk_buff *skb, int skb_size)
 	if (skb_shared(skb) || skb_cloned(skb))
 		return false;
 
+	/* If the device does mediate passthru, the skb may
+	 * hold an external buffer, so don't recycle it.
+	 */
+	if (dev_is_mpassthru(skb->dev))
+		return false;
+
 	skb_release_head_state(skb);
 
 	shinfo = skb_shinfo(skb);
-- 
1.7.3


* [PATCH v13 08/16] Modify netdev_free_page() to release external buffer
From: xiaohui.xin @ 2010-10-15  9:12 UTC (permalink / raw)
  To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>

From: Xin Xiaohui <xiaohui.xin@intel.com>

Currently, external buffers can come from the mp device, so such pages
are released through the mp device's callbacks instead of being freed
directly.

Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
 include/linux/skbuff.h |    4 +++-
 net/core/skbuff.c      |   24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 696e690..8cfde3e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1585,9 +1585,11 @@ static inline struct page *netdev_alloc_page(struct net_device *dev)
 	return __netdev_alloc_page(dev, GFP_ATOMIC);
 }
 
+extern void __netdev_free_page(struct net_device *dev, struct page *page);
+
 static inline void netdev_free_page(struct net_device *dev, struct page *page)
 {
-	__free_page(page);
+	__netdev_free_page(dev, page);
 }
 
 /**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f39d372..02439e0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -299,6 +299,30 @@ struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__netdev_alloc_page);
 
+void netdev_free_ext_page(struct net_device *dev, struct page *page)
+{
+	struct skb_ext_page *ext_page = NULL;
+	if (dev_is_mpassthru(dev) && dev->mp_port->hash) {
+		ext_page = dev->mp_port->hash(dev, page);
+		if (ext_page)
+			ext_page->dtor(ext_page);
+		else
+			__free_page(page);
+	}
+}
+EXPORT_SYMBOL(netdev_free_ext_page);
+
+void __netdev_free_page(struct net_device *dev, struct page *page)
+{
+	if (dev_is_mpassthru(dev)) {
+		netdev_free_ext_page(dev, page);
+		return;
+	}
+
+	__free_page(page);
+}
+EXPORT_SYMBOL(__netdev_free_page);
+
 void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
 		int size)
 {
-- 
1.7.3



* [PATCH v13 07/16] Modify netdev_alloc_page() to get external buffer
From: xiaohui.xin @ 2010-10-15  9:12 UTC (permalink / raw)
  To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>

From: Xin Xiaohui <xiaohui.xin@intel.com>

Currently, external buffers can be obtained from the mp device.

Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
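The allocation path in this patch can be sketched in plain userspace C as
follows. This is only an illustration under stated assumptions, not the
kernel code: struct port, struct dev, struct ext_page, port_ctor() and
dev_alloc_page() are hypothetical stand-ins, and the sketch assumes that
"the device is in mediate passthru mode" simply means that an mp_port is
attached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_BYTES 64	/* toy page size, for illustration only */

/* Hypothetical stand-in for struct skb_ext_page. */
struct ext_page {
	void *page;
};

/* Hypothetical stand-in for struct mp_port: the external owner that
 * hands out pages through a constructor callback. */
struct port {
	struct ext_page *(*ctor)(struct port *port, int npages);
	struct ext_page slot;
	char backing[PAGE_BYTES];
};

/* Hypothetical stand-in for struct net_device. */
struct dev {
	struct port *mp_port;	/* NULL when not in passthru mode */
};

/* Toy constructor: hands out the single backing buffer. */
static struct ext_page *port_ctor(struct port *port, int npages)
{
	assert(port != NULL);
	if (npages != 1)
		return NULL;
	port->slot.page = port->backing;
	return &port->slot;
}

/* Same shape as the patched allocator: ask the external owner first,
 * and fall back to the normal allocator otherwise. */
static void *dev_alloc_page(struct dev *dev)
{
	struct port *port = dev->mp_port;

	if (port) {
		struct ext_page *ext = port->ctor(port, 1);

		return ext ? ext->page : NULL;
	}
	return malloc(PAGE_BYTES);
}
```

With a port attached, dev_alloc_page() returns the port's backing buffer;
without one, it returns ordinary heap memory that the caller must free.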
 net/core/skbuff.c |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5e6d69c..f39d372 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -262,11 +262,38 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 }
 EXPORT_SYMBOL(__netdev_alloc_skb);
 
+struct page *netdev_alloc_ext_pages(struct net_device *dev, int npages)
+{
+	struct mp_port *port;
+	struct skb_ext_page *ext_page = NULL;
+
+	port = dev->mp_port;
+	if (!port)
+		goto out;
+	ext_page = port->ctor(port, NULL, npages);
+	if (ext_page)
+		return ext_page->page;
+out:
+	return NULL;
+
+}
+EXPORT_SYMBOL(netdev_alloc_ext_pages);
+
+struct page *netdev_alloc_ext_page(struct net_device *dev)
+{
+	return netdev_alloc_ext_pages(dev, 1);
+
+}
+EXPORT_SYMBOL(netdev_alloc_ext_page);
+
 struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
 {
 	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
 	struct page *page;
 
+	if (dev_is_mpassthru(dev))
+		return netdev_alloc_ext_page(dev);
+
 	page = alloc_pages_node(node, gfp_mask, 0);
 	return page;
 }
-- 
1.7.3


* [PATCH v13 06/16] Use callback to deal with skb_release_data() specially.
From: xiaohui.xin @ 2010-10-15  9:12 UTC (permalink / raw)
  To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>

From: Xin Xiaohui <xiaohui.xin@intel.com>

If the buffer is external, use the callback to destroy it.

Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
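The release path described above follows a common ownership pattern: the
party that supplied a buffer also supplies the callback that tears it
down. A minimal userspace sketch of that pattern is shown below. All names
in it (struct ext_page, struct buf, ext_page_dtor(), buf_release_data())
are hypothetical stand-ins chosen for illustration, not the kernel
definitions, and the dev_is_mp flag only models the result of
dev_is_mpassthru(skb->dev).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct skb_ext_page: an externally owned
 * buffer that carries its own destructor. */
struct ext_page {
	void (*dtor)(struct ext_page *page);
	int released;		/* set by the destructor, for illustration */
};

/* Hypothetical stand-in for an skb: head is locally allocated, and
 * destructor_arg is non-NULL only when external pages are attached. */
struct buf {
	void *head;
	void *destructor_arg;
	int dev_is_mp;		/* models dev_is_mpassthru(skb->dev) */
};

/* Toy destructor: a real owner would unpin or return the page here. */
static void ext_page_dtor(struct ext_page *page)
{
	page->released = 1;
}

/* Same shape as the hunk: run the owner's callback first, then free
 * the part that was allocated locally. */
static void buf_release_data(struct buf *b)
{
	assert(b != NULL);
	if (b->dev_is_mp) {
		struct ext_page *page = b->destructor_arg;

		if (page && page->dtor)
			page->dtor(page);
	}
	free(b->head);
	b->head = NULL;
}
```

The callback runs only when the buffer belongs to a passthru device and an
external page is actually attached; in every other case only the locally
allocated head is freed.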
 net/core/skbuff.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c83b421..5e6d69c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -210,6 +210,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 
 	/* make sure we initialize shinfo sequentially */
 	shinfo = skb_shinfo(skb);
+	shinfo->destructor_arg = NULL;
 	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
 	atomic_set(&shinfo->dataref, 1);
 
@@ -343,6 +344,13 @@ static void skb_release_data(struct sk_buff *skb)
 		if (skb_has_frags(skb))
 			skb_drop_fraglist(skb);
 
+		if (skb->dev && dev_is_mpassthru(skb->dev)) {
+			struct skb_ext_page *ext_page =
+				skb_shinfo(skb)->destructor_arg;
+			if (ext_page && ext_page->dtor)
+				ext_page->dtor(ext_page);
+		}
+
 		kfree(skb->head);
 	}
 }
-- 
1.7.3

