Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] net: fix htmldocs sunrpc, clnt.c
From: Benny Halevy @ 2009-09-24 16:46 UTC (permalink / raw)
  To: Randy Dunlap, Jaswinder Singh Rajput
  Cc: Ricardo Labiaga, Andy Adamson, Trond Myklebust, David Miller,
	linux-nfs, netdev, LKML
In-Reply-To: <4ABB9C17.3020307@oracle.com>

On Sep. 24, 2009, 19:19 +0300, Randy Dunlap <randy.dunlap@oracle.com> wrote:
> Jaswinder Singh Rajput wrote:
>>   DOCPROC Documentation/DocBook/networking.xml
>>   Warning(net/sunrpc/clnt.c:647): No description found for parameter 'req'
>>   Warning(net/sunrpc/clnt.c:647): No description found for parameter 'tk_ops'
>>   Warning(net/sunrpc/clnt.c:647): Excess function parameter 'ops' description in 'rpc_run_bc_task'
>>
>> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
> 
> Ack.  Already sent, but possibly lost.

Ack.  thanks!

> 
>> Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
>> Cc: Benny Halevy <bhalevy@panasas.com>
>> Cc: Andy Adamson <andros@netapp.com>
>> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
>> Cc: Randy Dunlap <randy.dunlap@oracle.com>
>> Cc: David Miller <davem@davemloft.net>
>> ---
>>  net/sunrpc/clnt.c |    5 +++--
>>  1 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>> index a417d5a..38829e2 100644
>> --- a/net/sunrpc/clnt.c
>> +++ b/net/sunrpc/clnt.c
>> @@ -640,10 +640,11 @@ EXPORT_SYMBOL_GPL(rpc_call_async);
>>  /**
>>   * rpc_run_bc_task - Allocate a new RPC task for backchannel use, then run
>>   * rpc_execute against it
>> - * @ops: RPC call ops
>> + * @req: RPC request
>> + * @tk_ops: RPC call ops
>>   */
>>  struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req,
>> -					const struct rpc_call_ops *tk_ops)
>> +				const struct rpc_call_ops *tk_ops)
>>  {
>>  	struct rpc_task *task;
>>  	struct xdr_buf *xbufp = &req->rq_snd_buf;
> 

^ permalink raw reply

* Re: ixgbe patch to provide NIC's tx/rx counters via ethtool
From: Rick Jones @ 2009-09-24 16:30 UTC (permalink / raw)
  To: Ben Greear; +Cc: NetDev
In-Reply-To: <4ABAD48C.9010808@candelatech.com>

Ben Greear wrote:
> Rick Jones wrote:
>> Ben Greear wrote:
>>
>>> When LRO is enabled, the received packet and byte counters represent the
>>> LRO'd packets, not the packets/bytes on the wire. 
>>
>>
>> When LRO is enabled, are all the bytes on the wire actually 
>> transferred into the host?
> 
> No...the ethernet, IP and TCP headers and such are not, for packets that 
> are combined into a single large SKB.
> 
> That is why the driver counts them wrong.  The bytes are off by a few 
> percentage points, but the packet count is off by an order of magnitude.

An overly philosphical question perhaps, but are ethtool stats supposed to 
represent what was on the wire, or what entered the host?

rick

^ permalink raw reply

* Re: [PATCH] net: fix htmldocs sunrpc, clnt.c
From: Randy Dunlap @ 2009-09-24 16:19 UTC (permalink / raw)
  To: Jaswinder Singh Rajput
  Cc: Ricardo Labiaga, Benny Halevy, Andy Adamson, Trond Myklebust,
	David Miller, linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev, LKML
In-Reply-To: <1253794781.5860.29.camel-6Ww87KsxWewAvxtiuMwx3w@public.gmane.org>

Jaswinder Singh Rajput wrote:
>   DOCPROC Documentation/DocBook/networking.xml
>   Warning(net/sunrpc/clnt.c:647): No description found for parameter 'req'
>   Warning(net/sunrpc/clnt.c:647): No description found for parameter 'tk_ops'
>   Warning(net/sunrpc/clnt.c:647): Excess function parameter 'ops' description in 'rpc_run_bc_task'
> 
> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Ack.  Already sent, but possibly lost.

> Cc: Ricardo Labiaga <Ricardo.Labiaga-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
> Cc: Benny Halevy <bhalevy-C4P08NqkoRlBDgjK7y7TUQ@public.gmane.org>
> Cc: Andy Adamson <andros-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
> Cc: Trond Myklebust <Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
> Cc: Randy Dunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> Cc: David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> ---
>  net/sunrpc/clnt.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index a417d5a..38829e2 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -640,10 +640,11 @@ EXPORT_SYMBOL_GPL(rpc_call_async);
>  /**
>   * rpc_run_bc_task - Allocate a new RPC task for backchannel use, then run
>   * rpc_execute against it
> - * @ops: RPC call ops
> + * @req: RPC request
> + * @tk_ops: RPC call ops
>   */
>  struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req,
> -					const struct rpc_call_ops *tk_ops)
> +				const struct rpc_call_ops *tk_ops)
>  {
>  	struct rpc_task *task;
>  	struct xdr_buf *xbufp = &req->rq_snd_buf;

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next-2.6] ixgbe: correct the parameter description
From: Jiri Pirko @ 2009-09-24 14:36 UTC (permalink / raw)
  To: davem; +Cc: netdev

ccffad25b5136958d4769ed6de5e87992dd9c65c changed parameters for function
ixgbe_update_uc_addr_list_generic but parameter description was not updated.
This patch corrects it.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/drivers/net/ixgbe/ixgbe_common.c b/drivers/net/ixgbe/ixgbe_common.c
index 6621e17..2c7db17 100644
--- a/drivers/net/ixgbe/ixgbe_common.c
+++ b/drivers/net/ixgbe/ixgbe_common.c
@@ -1355,9 +1355,7 @@ static void ixgbe_add_uc_addr(struct ixgbe_hw *hw, u8 *addr, u32 vmdq)
 /**
  *  ixgbe_update_uc_addr_list_generic - Updates MAC list of secondary addresses
  *  @hw: pointer to hardware structure
- *  @addr_list: the list of new addresses
- *  @addr_count: number of addresses
- *  @next: iterator function to walk the address list
+ *  @uc_list: the list of new addresses
  *
  *  The given list replaces any existing list.  Clears the secondary addrs from
  *  receive address registers.  Uses unused receive address registers for the

^ permalink raw reply related

* Re: [PATCH v2] ems_pci: fix size of CAN controllers BAR mapping for CPC-PCI v2
From: Wolfgang Grandegger @ 2009-09-24 14:17 UTC (permalink / raw)
  To: Sebastian Haas
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <20090924135505.13453.61811.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>

Sebastian Haas wrote:
> The driver mapped only 128 bytes of the CAN controller address space when a
> CPC-PCI v2 was detected (incl. CPC-104P). This patch will fix it by always
> mapping the whole address space (4096 bytes on all boards) of the
> corresponding PCI BAR.
> 
> Signed-off-by: Sebastian Haas <haas-zsNKPWJ8Pib6hrUXjxyGrA@public.gmane.org>
Signed-off-by: Wolfgang Grandegger <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>

Thanks,

Wolfgang.

^ permalink raw reply

* [PATCH v2] ems_pci: fix size of CAN controllers BAR mapping for CPC-PCI v2
From: Sebastian Haas @ 2009-09-24 13:55 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, wg-5Yr1BZd7O62+XT7JhA+gdA

The driver mapped only 128 bytes of the CAN controller address space when a
CPC-PCI v2 was detected (incl. CPC-104P). This patch will fix it by always
mapping the whole address space (4096 bytes on all boards) of the
corresponding PCI BAR.

Signed-off-by: Sebastian Haas <haas-zsNKPWJ8Pib6hrUXjxyGrA@public.gmane.org>
---

 drivers/net/can/sja1000/ems_pci.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/can/sja1000/ems_pci.c b/drivers/net/can/sja1000/ems_pci.c
index 7d84b8a..fd04789 100644
--- a/drivers/net/can/sja1000/ems_pci.c
+++ b/drivers/net/can/sja1000/ems_pci.c
@@ -94,12 +94,14 @@ struct ems_pci_card {
 #define EMS_PCI_CDR             (CDR_CBP | CDR_CLKOUT_MASK)
 
 #define EMS_PCI_V1_BASE_BAR     1
-#define EMS_PCI_V1_MEM_SIZE     4096
+#define EMS_PCI_V1_CONF_SIZE    4096 /* size of PITA control area */
 #define EMS_PCI_V2_BASE_BAR     2
-#define EMS_PCI_V2_MEM_SIZE     128
+#define EMS_PCI_V2_CONF_SIZE    128 /* size of PLX control area */
 #define EMS_PCI_CAN_BASE_OFFSET 0x400 /* offset where the controllers starts */
 #define EMS_PCI_CAN_CTRL_SIZE   0x200 /* memory size for each controller */
 
+#define EMS_PCI_BASE_SIZE  4096 /* size of controller area */
+
 static struct pci_device_id ems_pci_tbl[] = {
 	/* CPC-PCI v1 */
 	{PCI_VENDOR_ID_SIEMENS, 0x2104, PCI_ANY_ID, PCI_ANY_ID,},
@@ -224,7 +226,7 @@ static int __devinit ems_pci_add_card(struct pci_dev *pdev,
 	struct sja1000_priv *priv;
 	struct net_device *dev;
 	struct ems_pci_card *card;
-	int max_chan, mem_size, base_bar;
+	int max_chan, conf_size, base_bar;
 	int err, i;
 
 	/* Enabling PCI device */
@@ -251,22 +253,22 @@ static int __devinit ems_pci_add_card(struct pci_dev *pdev,
 		card->version = 2; /* CPC-PCI v2 */
 		max_chan = EMS_PCI_V2_MAX_CHAN;
 		base_bar = EMS_PCI_V2_BASE_BAR;
-		mem_size = EMS_PCI_V2_MEM_SIZE;
+		conf_size = EMS_PCI_V2_CONF_SIZE;
 	} else {
 		card->version = 1; /* CPC-PCI v1 */
 		max_chan = EMS_PCI_V1_MAX_CHAN;
 		base_bar = EMS_PCI_V1_BASE_BAR;
-		mem_size = EMS_PCI_V1_MEM_SIZE;
+		conf_size = EMS_PCI_V1_CONF_SIZE;
 	}
 
 	/* Remap configuration space and controller memory area */
-	card->conf_addr = pci_iomap(pdev, 0, mem_size);
+	card->conf_addr = pci_iomap(pdev, 0, conf_size);
 	if (card->conf_addr == NULL) {
 		err = -ENOMEM;
 		goto failure_cleanup;
 	}
 
-	card->base_addr = pci_iomap(pdev, base_bar, mem_size);
+	card->base_addr = pci_iomap(pdev, base_bar, EMS_PCI_BASE_SIZE);
 	if (card->base_addr == NULL) {
 		err = -ENOMEM;
 		goto failure_cleanup;

-- 
EMS Dr. Thomas Wuensche e.K.
Sonnenhang 3
85304 Ilmmuenster
HRA Neuburg a.d. Donau, HR-Nr. 70.106
Phone: +49-8441-490260
Fax  : +49-8441-81860
http://www.ems-wuensche.com

^ permalink raw reply related

* [PATCH] net: fix htmldocs sunrpc, clnt.c
From: Jaswinder Singh Rajput @ 2009-09-24 12:19 UTC (permalink / raw)
  To: Ricardo Labiaga, Benny Halevy, Andy Adamson, Trond Myklebust,
	Randy Dunlap <randy.


  DOCPROC Documentation/DocBook/networking.xml
  Warning(net/sunrpc/clnt.c:647): No description found for parameter 'req'
  Warning(net/sunrpc/clnt.c:647): No description found for parameter 'tk_ops'
  Warning(net/sunrpc/clnt.c:647): Excess function parameter 'ops' description in 'rpc_run_bc_task'

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Miller <davem@davemloft.net>
---
 net/sunrpc/clnt.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index a417d5a..38829e2 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -640,10 +640,11 @@ EXPORT_SYMBOL_GPL(rpc_call_async);
 /**
  * rpc_run_bc_task - Allocate a new RPC task for backchannel use, then run
  * rpc_execute against it
- * @ops: RPC call ops
+ * @req: RPC request
+ * @tk_ops: RPC call ops
  */
 struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req,
-					const struct rpc_call_ops *tk_ops)
+				const struct rpc_call_ops *tk_ops)
 {
 	struct rpc_task *task;
 	struct xdr_buf *xbufp = &req->rq_snd_buf;
-- 
1.6.0.6



^ permalink raw reply related

* Re: [PATCH] ems_pci: fix size of CAN controllers BAR mapping for CPC-PCI v2
From: Sebastian Haas @ 2009-09-24 11:52 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, Wolfgang Grandegger
In-Reply-To: <4ABB2FBF.8080906-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Wolfgang,

Wolfgang Grandegger schrieb:
> Hi Sebastian,
> 
> Sebastian Haas wrote:
>> The driver mapped only 128 bytes of the CAN controller address space when a
>> CPC-PCI v2 was detected (incl. CPC-104P). This patch will fix it by always
>> mapping the whole address space (4096 bytes on all boards) of the
>> corresponding PCI BAR.
>>
>> Signed-off-by: Sebastian Haas <haas-zsNKPWJ8Pib6hrUXjxyGrA@public.gmane.org>
>> ---
>>
>>  drivers/net/can/sja1000/ems_pci.c |    8 +++++---
>>  1 files changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/can/sja1000/ems_pci.c b/drivers/net/can/sja1000/ems_pci.c
>> index 7d84b8a..ba98063 100644
>> --- a/drivers/net/can/sja1000/ems_pci.c
>> +++ b/drivers/net/can/sja1000/ems_pci.c
>> @@ -94,12 +94,14 @@ struct ems_pci_card {
>>  #define EMS_PCI_CDR             (CDR_CBP | CDR_CLKOUT_MASK)
>>  
>>  #define EMS_PCI_V1_BASE_BAR     1
>> -#define EMS_PCI_V1_MEM_SIZE     4096
>> +#define EMS_PCI_V1_MEM_SIZE     4096 /* size of PITA control area */
>>  #define EMS_PCI_V2_BASE_BAR     2
>> -#define EMS_PCI_V2_MEM_SIZE     128
>> +#define EMS_PCI_V2_MEM_SIZE     128 /* size of PLX control area */
>>  #define EMS_PCI_CAN_BASE_OFFSET 0x400 /* offset where the controllers starts */
>>  #define EMS_PCI_CAN_CTRL_SIZE   0x200 /* memory size for each controller */
>>  
>> +#define EMS_PCI_CONTR_MEM_SIZE  4096 /* size of controller area */
>> +
>>  static struct pci_device_id ems_pci_tbl[] = {
>>  	/* CPC-PCI v1 */
>>  	{PCI_VENDOR_ID_SIEMENS, 0x2104, PCI_ANY_ID, PCI_ANY_ID,},
>> @@ -266,7 +268,7 @@ static int __devinit ems_pci_add_card(struct pci_dev *pdev,
>>  		goto failure_cleanup;
>>  	}
>>  
>> -	card->base_addr = pci_iomap(pdev, base_bar, mem_size);
>> +	card->base_addr = pci_iomap(pdev, base_bar, EMS_PCI_CONTR_MEM_SIZE);
>>  	if (card->base_addr == NULL) {
>>  		err = -ENOMEM;
>>  		goto failure_cleanup;
> 
> I see. To avoid confusion I suggest renaming some variables and defines:
> 
> s/EMS_PCI_V1_MEM_SIZE/EMS_PCI_V1_CONF_SIZE/
> s/EMS_PCI_V2_MEM_SIZE/EMS_PCI_V2_CONF_SIZE/
> s/mem_size/conf_size/
> s/EMS_PCI_CONTR_MEM_SIZE/EMS_PCI_BASE_SIZE/
> 
> Would that not be more appropriate?
Okay, I just wanted to minimize changes. Will change it and resubmit.

Sebastian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkq7XZQACgkQpqRB8PJG7XwjeQCfZZJP1FDD7TLBf2hK3dV64GgX
3oMAn2jIkdOx9Euc1UihiK/BXLSIUp06
=W9tL
-----END PGP SIGNATURE-----
-- 
EMS Dr. Thomas Wuensche e.K.
Sonnenhang 3
85304 Ilmmuenster
HRA Neuburg a.d. Donau, HR-Nr. 70.106
Phone: +49-8441-490260
Fax  : +49-8441-81860
http://www.ems-wuensche.com

^ permalink raw reply

* Re: r8169 chips on some Intel D945GSEJT boards fail to work after PXE boot
From: Simon Farnsworth @ 2009-09-24 11:12 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev
In-Reply-To: <20090923205723.GA28058@electric-eye.fr.zoreil.com>

Francois Romieu wrote:
> Simon Farnsworth <simon.farnsworth@onelan.com> :
> [...]
>> Some boards are good, and just work, whether I boot via PXE or boot from
>> the local disk; dmesg.working and lspci.working are from a good board.
>>
>> Some boards are bad; they work fine if I boot from local disk (including
>> network), but the kernel cannot detect link, or send or receive data if
>> I PXE boot. dmesg.broken and lspci.broken are from a bad board.
> 
> No cunning theroy in sight but does reducing the amount of memory on a
> bad board from 1 Go to 512 Mo turn it into a good one ?
> 
We've tried this, and we've tried 2GB and 1GB modules; the failure to
boot sticks with the board, not with the memory module. On my most
recent attempt, the failing board isn't showing a correctable error
status, so I've not yet tried your patch, on the assumption that it just
clears the error status.

Is my assumption wrong? If not, is there anything else I can do that
would help you diagnose this?

> The failing board exhibits a correctable error status bit. Clearing it
> is the least we can do.
> 
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 50c6a3c..79bc4ab 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -2200,6 +2200,11 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	tp->pcie_cap = pci_find_capability(pdev, PCI_CAP_ID_EXP);
>  	if (!tp->pcie_cap && netif_msg_probe(tp))
>  		dev_info(&pdev->dev, "no PCI Express capability\n");
> +	else {
> +		pci_write_config_word(pdev, tp->pcie_cap + PCI_EXP_DEVSTA,
> +				      PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED |
> +				      PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD);
> +	}
>  
>  	RTL_W16(IntrMask, 0x0000);
>  

-- 
Simon Farnsworth


^ permalink raw reply

* Re: pktgen: tricks
From: Eric Dumazet @ 2009-09-24 10:32 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Stephen Hemminger, Jesper Dangaard Brouer, Robert Olsson, netdev
In-Reply-To: <200909241310.05297.denys@visp.net.lb>

Denys Fedoryschenko a écrit :
> On Thursday 24 September 2009 03:41:41 Stephen Hemminger wrote:
>> Other kernel config help:
>>   - turn off lock dependency checker, kmecheck, page alloc debug
>>     basically anything that slows stuff down
>>   - turn off content group scheduler
> Maybe, but i'm not sure (i can't test it):
> Disable randomize VA space? On embedded boards it was helping. 
> In some case disabling SMP helped, when various SMP locks involved, but maybe 
> not for pktgen.
> 
>

pktgen is a kernel module, and is not affected by randomize VA space.

But of course, disabling SMP must help, as long as your machine needs
one cpu only :)


^ permalink raw reply

* Re: pktgen: tricks
From: Denys Fedoryschenko @ 2009-09-24 10:10 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jesper Dangaard Brouer, Robert Olsson, netdev
In-Reply-To: <20090923174141.1d350103@s6510>

On Thursday 24 September 2009 03:41:41 Stephen Hemminger wrote:
> Other kernel config help:
>   - turn off lock dependency checker, kmecheck, page alloc debug
>     basically anything that slows stuff down
>   - turn off content group scheduler
Maybe, but i'm not sure (i can't test it):
Disable randomize VA space? On embedded boards it was helping. 
In some case disabling SMP helped, when various SMP locks involved, but maybe 
not for pktgen.


> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply

* Re: [PATCH] ems_pci: fix size of CAN controllers BAR mapping for CPC-PCI v2
From: Wolfgang Grandegger @ 2009-09-24  8:37 UTC (permalink / raw)
  To: Sebastian Haas
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <20090923153747.30154.99860.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>

Hi Sebastian,

Sebastian Haas wrote:
> The driver mapped only 128 bytes of the CAN controller address space when a
> CPC-PCI v2 was detected (incl. CPC-104P). This patch will fix it by always
> mapping the whole address space (4096 bytes on all boards) of the
> corresponding PCI BAR.
> 
> Signed-off-by: Sebastian Haas <haas-zsNKPWJ8Pib6hrUXjxyGrA@public.gmane.org>
> ---
> 
>  drivers/net/can/sja1000/ems_pci.c |    8 +++++---
>  1 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/can/sja1000/ems_pci.c b/drivers/net/can/sja1000/ems_pci.c
> index 7d84b8a..ba98063 100644
> --- a/drivers/net/can/sja1000/ems_pci.c
> +++ b/drivers/net/can/sja1000/ems_pci.c
> @@ -94,12 +94,14 @@ struct ems_pci_card {
>  #define EMS_PCI_CDR             (CDR_CBP | CDR_CLKOUT_MASK)
>  
>  #define EMS_PCI_V1_BASE_BAR     1
> -#define EMS_PCI_V1_MEM_SIZE     4096
> +#define EMS_PCI_V1_MEM_SIZE     4096 /* size of PITA control area */
>  #define EMS_PCI_V2_BASE_BAR     2
> -#define EMS_PCI_V2_MEM_SIZE     128
> +#define EMS_PCI_V2_MEM_SIZE     128 /* size of PLX control area */
>  #define EMS_PCI_CAN_BASE_OFFSET 0x400 /* offset where the controllers starts */
>  #define EMS_PCI_CAN_CTRL_SIZE   0x200 /* memory size for each controller */
>  
> +#define EMS_PCI_CONTR_MEM_SIZE  4096 /* size of controller area */
> +
>  static struct pci_device_id ems_pci_tbl[] = {
>  	/* CPC-PCI v1 */
>  	{PCI_VENDOR_ID_SIEMENS, 0x2104, PCI_ANY_ID, PCI_ANY_ID,},
> @@ -266,7 +268,7 @@ static int __devinit ems_pci_add_card(struct pci_dev *pdev,
>  		goto failure_cleanup;
>  	}
>  
> -	card->base_addr = pci_iomap(pdev, base_bar, mem_size);
> +	card->base_addr = pci_iomap(pdev, base_bar, EMS_PCI_CONTR_MEM_SIZE);
>  	if (card->base_addr == NULL) {
>  		err = -ENOMEM;
>  		goto failure_cleanup;

I see. To avoid confusion I suggest renaming some variables and defines:

s/EMS_PCI_V1_MEM_SIZE/EMS_PCI_V1_CONF_SIZE/
s/EMS_PCI_V2_MEM_SIZE/EMS_PCI_V2_CONF_SIZE/
s/mem_size/conf_size/
s/EMS_PCI_CONTR_MEM_SIZE/EMS_PCI_BASE_SIZE/

Would that not be more appropriate?

Wolfgang.

^ permalink raw reply

* Re: [AX25] kernel panic
From: Jarek Poplawski @ 2009-09-24  8:07 UTC (permalink / raw)
  To: Bernard Pidoux F6BVP
  Cc: Bernard Pidoux, Ralf Baechle DL5RB, Linux Netdev List, linux-hams
In-Reply-To: <4ABA9058.3010605@free.fr>

On Wed, Sep 23, 2009 at 11:17:12PM +0200, Bernard Pidoux F6BVP wrote:
> Hi Jarek,
Hi Bernard,

>
> After applying your second patch I had to wait until today before I catched
> these kernel messages.
> The last three ones where single lines.
> There was no kernel panic. 

Probably you didn't hit the some kind of traffic/problems, because
there was only some additional reporting in this patch vs. take 1.

In the meantime I found some strangeness in refcounting around
ax25_send_frame, and below is a patch which IMHO should fix it. I
don't know if it matters in your case (the previous report suggested
there should be something more), but let's try. Please, don't remove
the previous debugging patch yet, while testing this one.

Thanks,
Jarek P.
--- (ax25_send_frame patch take 1 for testing)

 include/net/netrom.h  |    2 ++
 net/ax25/ax25_out.c   |    6 ++++++
 net/netrom/nr_route.c |   11 ++++++-----
 net/rose/rose_link.c  |    8 ++++++++
 net/rose/rose_route.c |    5 +++++
 5 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/net/netrom.h b/include/net/netrom.h
index 15696b1..ab170a6 100644
--- a/include/net/netrom.h
+++ b/include/net/netrom.h
@@ -132,6 +132,8 @@ static __inline__ void nr_node_put(struct nr_node *nr_node)
 static __inline__ void nr_neigh_put(struct nr_neigh *nr_neigh)
 {
 	if (atomic_dec_and_test(&nr_neigh->refcount)) {
+		if (nr_neigh->ax25)
+			ax25_cb_put(nr_neigh->ax25);
 		kfree(nr_neigh->digipeat);
 		kfree(nr_neigh);
 	}
diff --git a/net/ax25/ax25_out.c b/net/ax25/ax25_out.c
index bf706f8..1491260 100644
--- a/net/ax25/ax25_out.c
+++ b/net/ax25/ax25_out.c
@@ -92,6 +92,12 @@ ax25_cb *ax25_send_frame(struct sk_buff *skb, int paclen, ax25_address *src, ax2
 #endif
 	}
 
+	/*
+	 * There is one ref for the state machine; a caller needs
+	 * one more to put it back, just like with the existing one.
+	 */
+	ax25_cb_hold(ax25);
+
 	ax25_cb_add(ax25);
 
 	ax25->state = AX25_STATE_1;
diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index 4eb1ac9..850ffc0 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -842,12 +842,13 @@ int nr_route_frame(struct sk_buff *skb, ax25_cb *ax25)
 	dptr  = skb_push(skb, 1);
 	*dptr = AX25_P_NETROM;
 
-	ax25s = ax25_send_frame(skb, 256, (ax25_address *)dev->dev_addr, &nr_neigh->callsign, nr_neigh->digipeat, nr_neigh->dev);
-	if (nr_neigh->ax25 && ax25s) {
-		/* We were already holding this ax25_cb */
+	ax25s = nr_neigh->ax25;
+	nr_neigh->ax25 = ax25_send_frame(skb, 256,
+					 (ax25_address *)dev->dev_addr,
+					 &nr_neigh->callsign,
+					 nr_neigh->digipeat, nr_neigh->dev);
+	if (ax25s)
 		ax25_cb_put(ax25s);
-	}
-	nr_neigh->ax25 = ax25s;
 
 	dev_put(dev);
 	ret = (nr_neigh->ax25 != NULL);
diff --git a/net/rose/rose_link.c b/net/rose/rose_link.c
index bd86a63..5ef5f69 100644
--- a/net/rose/rose_link.c
+++ b/net/rose/rose_link.c
@@ -101,13 +101,17 @@ static void rose_t0timer_expiry(unsigned long param)
 static int rose_send_frame(struct sk_buff *skb, struct rose_neigh *neigh)
 {
 	ax25_address *rose_call;
+	ax25_cb *ax25s;
 
 	if (ax25cmp(&rose_callsign, &null_ax25_address) == 0)
 		rose_call = (ax25_address *)neigh->dev->dev_addr;
 	else
 		rose_call = &rose_callsign;
 
+	ax25s = neigh->ax25;
 	neigh->ax25 = ax25_send_frame(skb, 260, rose_call, &neigh->callsign, neigh->digipeat, neigh->dev);
+	if (ax25s)
+		ax25_cb_put(ax25s);
 
 	return (neigh->ax25 != NULL);
 }
@@ -120,13 +124,17 @@ static int rose_send_frame(struct sk_buff *skb, struct rose_neigh *neigh)
 static int rose_link_up(struct rose_neigh *neigh)
 {
 	ax25_address *rose_call;
+	ax25_cb *ax25s;
 
 	if (ax25cmp(&rose_callsign, &null_ax25_address) == 0)
 		rose_call = (ax25_address *)neigh->dev->dev_addr;
 	else
 		rose_call = &rose_callsign;
 
+	ax25s = neigh->ax25;
 	neigh->ax25 = ax25_find_cb(rose_call, &neigh->callsign, neigh->digipeat, neigh->dev);
+	if (ax25s)
+		ax25_cb_put(ax25s);
 
 	return (neigh->ax25 != NULL);
 }
diff --git a/net/rose/rose_route.c b/net/rose/rose_route.c
index 9478d9b..8e5d5ac 100644
--- a/net/rose/rose_route.c
+++ b/net/rose/rose_route.c
@@ -234,6 +234,8 @@ static void rose_remove_neigh(struct rose_neigh *rose_neigh)
 
 	if ((s = rose_neigh_list) == rose_neigh) {
 		rose_neigh_list = rose_neigh->next;
+		if (rose_neigh->ax25)
+			ax25_cb_put(rose_neigh->ax25);
 		kfree(rose_neigh->digipeat);
 		kfree(rose_neigh);
 		return;
@@ -242,6 +244,8 @@ static void rose_remove_neigh(struct rose_neigh *rose_neigh)
 	while (s != NULL && s->next != NULL) {
 		if (s->next == rose_neigh) {
 			s->next = rose_neigh->next;
+			if (rose_neigh->ax25)
+				ax25_cb_put(rose_neigh->ax25);
 			kfree(rose_neigh->digipeat);
 			kfree(rose_neigh);
 			return;
@@ -814,6 +818,7 @@ void rose_link_failed(ax25_cb *ax25, int reason)
 
 	if (rose_neigh != NULL) {
 		rose_neigh->ax25 = NULL;
+		ax25_cb_put(ax25);
 
 		rose_del_route_by_neigh(rose_neigh);
 		rose_kill_by_neigh(rose_neigh);

^ permalink raw reply related

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Avi Kivity @ 2009-09-24  8:03 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Ira W. Snyder, Michael S. Tsirkin, netdev, virtualization, kvm,
	linux-kernel, mingo, linux-mm, akpm, hpa, Rusty Russell, s.hetze,
	alacrityvm-devel
In-Reply-To: <4ABA78DC.7070604@redhat.com>

On 09/23/2009 10:37 PM, Avi Kivity wrote:
>
> Example: feature negotiation.  If it happens in userspace, it's easy 
> to limit what features we expose to the guest.  If it happens in the 
> kernel, we need to add an interface to let the kernel know which 
> features it should expose to the guest.  We also need to add an 
> interface to let userspace know which features were negotiated, if we 
> want to implement live migration.  Something fairly trivial bloats 
> rapidly.

btw, we have this issue with kvm reporting cpuid bits to the guest.  
Instead of letting kvm talk directly to the hardware and the guest, kvm 
gets the cpuid bits from the hardware, strips away features it doesn't 
support, exposes that to userspace, and expects userspace to program the 
cpuid bits it wants to expose to the guest (which may be different than 
what kvm exposed to userspace, and different from guest to guest).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH 3/3 V2] i2400m-sdio: select IWMC3200TOP in Kconfig
From: Tomas Winkler @ 2009-09-24  8:00 UTC (permalink / raw)
  To: davem, linville, netdev, linux-wireless, linux-mmc
  Cc: yi.zhu, inaky.perez-gonzalez, cindy.h.kao, guy.cohen,
	ron.rindjunsky, Tomas Winkler

i2400m-sdio requires iwmc3200top for its operation

create separate config option to separate 3200 specifics
from eventual further wimax sdio HW.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
---
V2: create separate config option as discussed with Inaky

 drivers/net/wimax/i2400m/Kconfig |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/net/wimax/i2400m/Kconfig b/drivers/net/wimax/i2400m/Kconfig
index d623b3d..9723c5c 100644
--- a/drivers/net/wimax/i2400m/Kconfig
+++ b/drivers/net/wimax/i2400m/Kconfig
@@ -31,6 +31,15 @@ config WIMAX_I2400M_SDIO
 
 	  If unsure, it is safe to select M (module).
 
+config WIMAX_IWMC3200_SDIO
+	bool "Intel Wireless Multicom WiMAX Connection 3200 over SDIO"
+	depends on WIMAX_I2400M_SDIO
+	select IWMC3200TOP
+	help
+	  Select if you have a device based on the Intel Multicom WiMAX
+          Connection 3200 over SDIO.
+ 
+
 config WIMAX_I2400M_DEBUG_LEVEL
 	int "WiMAX i2400m debug level"
 	depends on WIMAX_I2400M
-- 
1.6.0.6

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


^ permalink raw reply related

* [PATCH] tunnels: Optimize tx path
From: Eric Dumazet @ 2009-09-24  7:43 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List

We currently dirty a cache line to update tunnel device stats
(tx_packets/tx_bytes). We better use the txq->tx_bytes/tx_packets
counters that already are present in cpu cache, in the cache
line shared with txq->_xmit_lock

This patch extends IPTUNNEL_XMIT() macro to use txq pointer
provided by the caller.

Also &tunnel->dev->stats can be replaced by &dev->stats

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/ipip.h |    6 +++---
 net/ipv4/ip_gre.c  |    5 +++--
 net/ipv4/ipip.c    |    5 +++--
 net/ipv6/sit.c     |    5 +++--
 4 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/include/net/ipip.h b/include/net/ipip.h
index 5d3036f..8f990bb 100644
--- a/include/net/ipip.h
+++ b/include/net/ipip.h
@@ -50,9 +50,9 @@ struct ip_tunnel_prl_entry
 	ip_select_ident(iph, &rt->u.dst, NULL);				\
 									\
 	err = ip_local_out(skb);					\
-	if (net_xmit_eval(err) == 0) {					\
-		stats->tx_bytes += pkt_len;				\
-		stats->tx_packets++;					\
+	if (likely(net_xmit_eval(err) == 0)) {				\
+		txq->tx_bytes += pkt_len;				\
+		txq->tx_packets++;					\
 	} else {							\
 		stats->tx_errors++;					\
 		stats->tx_aborted_errors++;				\
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index d9645c9..45b8785 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -665,7 +665,8 @@ drop_nolock:
 static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
-	struct net_device_stats *stats = &tunnel->dev->stats;
+	struct net_device_stats *stats = &dev->stats;
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);
 	struct iphdr  *old_iph = ip_hdr(skb);
 	struct iphdr  *tiph;
 	u8     tos;
@@ -818,7 +819,7 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
 		struct sk_buff *new_skb = skb_realloc_headroom(skb, max_headroom);
 		if (!new_skb) {
 			ip_rt_put(rt);
-			stats->tx_dropped++;
+			txq->tx_dropped++;
 			dev_kfree_skb(skb);
 			tunnel->recursion--;
 			return NETDEV_TX_OK;
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 62548cb..40b3730 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -390,7 +390,8 @@ static int ipip_rcv(struct sk_buff *skb)
 static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
-	struct net_device_stats *stats = &tunnel->dev->stats;
+	struct net_device_stats *stats = &dev->stats;
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);
 	struct iphdr  *tiph = &tunnel->parms.iph;
 	u8     tos = tunnel->parms.iph.tos;
 	__be16 df = tiph->frag_off;
@@ -483,7 +484,7 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 		struct sk_buff *new_skb = skb_realloc_headroom(skb, max_headroom);
 		if (!new_skb) {
 			ip_rt_put(rt);
-			stats->tx_dropped++;
+			txq->tx_dropped++;
 			dev_kfree_skb(skb);
 			tunnel->recursion--;
 			return NETDEV_TX_OK;
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0ae4f64..a4aeb2b 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -613,7 +613,8 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 				     struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
-	struct net_device_stats *stats = &tunnel->dev->stats;
+	struct net_device_stats *stats = &dev->stats;
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);
 	struct iphdr  *tiph = &tunnel->parms.iph;
 	struct ipv6hdr *iph6 = ipv6_hdr(skb);
 	u8     tos = tunnel->parms.iph.tos;
@@ -751,7 +752,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 		struct sk_buff *new_skb = skb_realloc_headroom(skb, max_headroom);
 		if (!new_skb) {
 			ip_rt_put(rt);
-			stats->tx_dropped++;
+			txq->tx_dropped++;
 			dev_kfree_skb(skb);
 			tunnel->recursion--;
 			return NETDEV_TX_OK;

^ permalink raw reply related

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Avi Kivity @ 2009-09-24  7:18 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Ira W. Snyder, Michael S. Tsirkin, netdev, virtualization, kvm,
	linux-kernel, mingo, linux-mm, akpm, hpa, Rusty Russell, s.hetze,
	alacrityvm-devel
In-Reply-To: <4ABA8FDC.5010008@gmail.com>

On 09/24/2009 12:15 AM, Gregory Haskins wrote:
>
>>> There are various aspects about designing high-performance virtual
>>> devices such as providing the shortest paths possible between the
>>> physical resources and the consumers.  Conversely, we also need to
>>> ensure that we meet proper isolation/protection guarantees at the same
>>> time.  What this means is there are various aspects to any
>>> high-performance PV design that require to be placed in-kernel to
>>> maximize the performance yet properly isolate the guest.
>>>
>>> For instance, you are required to have your signal-path (interrupts and
>>> hypercalls), your memory-path (gpa translation), and
>>> addressing/isolation model in-kernel to maximize performance.
>>>
>>>        
>> Exactly.  That's what vhost puts into the kernel and nothing more.
>>      
> Actually, no.  Generally, _KVM_ puts those things into the kernel, and
> vhost consumes them.  Without KVM (or something equivalent), vhost is
> incomplete.  One of my goals with vbus is to generalize the "something
> equivalent" part here.
>    

I don't really see how vhost and vbus are different here.  vhost expects 
signalling to happen through a couple of eventfds and requires someone 
to supply them and implement kernel support (if needed).  vbus requires 
someone to write a connector to provide the signalling implementation.  
Neither will work out-of-the-box when implementing virtio-net over 
falling dominos, for example.

>>> Vbus accomplishes its in-kernel isolation model by providing a
>>> "container" concept, where objects are placed into this container by
>>> userspace.  The host kernel enforces isolation/protection by using a
>>> namespace to identify objects that is only relevant within a specific
>>> container's context (namely, a "u32 dev-id").  The guest addresses the
>>> objects by its dev-id, and the kernel ensures that the guest can't
>>> access objects outside of its dev-id namespace.
>>>
>>>        
>> vhost manages to accomplish this without any kernel support.
>>      
> No, vhost manages to accomplish this because of KVMs kernel support
> (ioeventfd, etc).   Without a KVM-like in-kernel support, vhost is a
> merely a kind of "tuntap"-like clone signalled by eventfds.
>    

Without a vbus-connector-falling-dominos, vbus-venet can't do anything 
either.  Both vhost and vbus need an interface, vhost's is just narrower 
since it doesn't do configuration or enumeration.

> This goes directly to my rebuttal of your claim that vbus places too
> much in the kernel.  I state that, one way or the other, address decode
> and isolation _must_ be in the kernel for performance.  Vbus does this
> with a devid/container scheme.  vhost+virtio-pci+kvm does it with
> pci+pio+ioeventfd.
>    

vbus doesn't do kvm guest address decoding for the fast path.  It's 
still done by ioeventfd.

>>   The guest
>> simply has not access to any vhost resources other than the guest->host
>> doorbell, which is handed to the guest outside vhost (so it's somebody
>> else's problem, in userspace).
>>      
> You mean _controlled_ by userspace, right?  Obviously, the other side of
> the kernel still needs to be programmed (ioeventfd, etc).  Otherwise,
> vhost would be pointless: e.g. just use vanilla tuntap if you don't need
> fast in-kernel decoding.
>    

Yes (though for something like level-triggered interrupts we're probably 
keeping it in userspace, enjoying the benefits of vhost data path while 
paying more for signalling).

>>> All that is required is a way to transport a message with a "devid"
>>> attribute as an address (such as DEVCALL(devid)) and the framework
>>> provides the rest of the decode+execute function.
>>>
>>>        
>> vhost avoids that.
>>      
> No, it doesn't avoid it.  It just doesn't specify how its done, and
> relies on something else to do it on its behalf.
>    

That someone else can be in userspace, apart from the actual fast path.

> Conversely, vbus specifies how its done, but not how to transport the
> verb "across the wire".  That is the role of the vbus-connector abstraction.
>    

So again, vbus does everything in the kernel (since it's so easy and 
cheap) but expects a vbus-connector.  vhost does configuration in 
userspace (since it's so clunky and fragile) but expects a couple of 
eventfds.

>>> Contrast this to vhost+virtio-pci (called simply "vhost" from here).
>>>
>>>        
>> It's the wrong name.  vhost implements only the data path.
>>      
> Understood, but vhost+virtio-pci is what I am contrasting, and I use
> "vhost" for short from that point on because I am too lazy to type the
> whole name over and over ;)
>    

If you #define A A+B+C don't expect intelligent conversation afterwards.

>>> It is not immune to requiring in-kernel addressing support either, but
>>> rather it just does it differently (and its not as you might expect via
>>> qemu).
>>>
>>> Vhost relies on QEMU to render PCI objects to the guest, which the guest
>>> assigns resources (such as BARs, interrupts, etc).
>>>        
>> vhost does not rely on qemu.  It relies on its user to handle
>> configuration.  In one important case it's qemu+pci.  It could just as
>> well be the lguest launcher.
>>      
> I meant vhost=vhost+virtio-pci here.  Sorry for the confusion.
>
> The point I am making specifically is that vhost in general relies on
> other in-kernel components to function.  I.e. It cannot function without
> having something like the PCI model to build an IO namespace.  That
> namespace (in this case, pio addresses+data tuples) are used for the
> in-kernel addressing function under KVM + virtio-pci.
>
> The case of the lguest launcher is a good one to highlight.  Yes, you
> can presumably also use lguest with vhost, if the requisite facilities
> are exposed to lguest-bus, and some eventfd based thing like ioeventfd
> is written for the host (if it doesnt exist already).
>
> And when the next virt design "foo" comes out, it can make a "foo-bus"
> model, and implement foo-eventfd on the backend, etc, etc.
>    

It's exactly the same with vbus needing additional connectors for 
additional transports.

> Ira can make ira-bus, and ira-eventfd, etc, etc.
>
> Each iteration will invariably introduce duplicated parts of the stack.
>    

Invariably?  Use libraries (virtio-shmem.ko, libvhost.so).


>> For the N+1th time, no.  vhost is perfectly usable without pci.  Can we
>> stop raising and debunking this point?
>>      
> Again, I understand vhost is decoupled from PCI, and I don't mean to
> imply anything different.  I use PCI as an example here because a) its
> the only working example of vhost today (to my knowledge), and b) you
> have stated in the past that PCI is the only "right" way here, to
> paraphrase.  Perhaps you no longer feel that way, so I apologize if you
> feel you already recanted your position on PCI and I missed it.
>    

For kvm/x86 pci definitely remains king.  I was talking about the two 
lguest users and Ira.

> I digress.  My point here isn't PCI.  The point here is the missing
> component for when PCI is not present.  The component that is partially
> satisfied by vbus's devid addressing scheme.  If you are going to use
> vhost, and you don't have PCI, you've gotta build something to replace it.
>    

Yes, that's why people have keyboards.  They'll write that glue code if 
they need it.  If it turns out to be a hit an people start having virtio 
transport module writing parties, they'll figure out a way to share code.

>>> All you really need is a simple decode+execute mechanism, and a way to
>>> program it from userspace control.  vbus tries to do just that:
>>> commoditize it so all you need is the transport of the control messages
>>> (like DEVCALL()), but the decode+execute itself is reuseable, even
>>> across various environments (like KVM or Iras rig).
>>>
>>>        
>> If you think it should be "commodotized", write libvhostconfig.so.
>>      
> I know you are probably being facetious here, but what do you propose
> for the parts that must be in-kernel?
>    

On the guest side, virtio-shmem.ko can unify the ring access.  It 
probably makes sense even today.  On the host side I eventfd is the 
kernel interface and libvhostconfig.so can provide the configuration 
when an existing ABI is not imposed.

>>> And your argument, I believe, is that vbus allows both to be implemented
>>> in the kernel (though to reiterate, its optional) and is therefore a bad
>>> design, so lets discuss that.
>>>
>>> I believe the assertion is that things like config-space are best left
>>> to userspace, and we should only relegate fast-path duties to the
>>> kernel.  The problem is that, in my experience, a good deal of
>>> config-space actually influences the fast-path and thus needs to
>>> interact with the fast-path mechanism eventually anyway.
>>> Whats left
>>> over that doesn't fall into this category may cheaply ride on existing
>>> plumbing, so its not like we created something new or unnatural just to
>>> support this subclass of config-space.
>>>
>>>        
>> Flexibility is reduced, because changing code in the kernel is more
>> expensive than in userspace, and kernel/user interfaces aren't typically
>> as wide as pure userspace interfaces.  Security is reduced, since a bug
>> in the kernel affects the host, while a bug in userspace affects just on
>> guest.
>>      
> For a mac-address attribute?  Thats all we are really talking about
> here.  These points you raise, while true of any kernel code I suppose,
> are a bit of a stretch in this context.
>    

Look at the virtio-net feature negotiation.  There's a lot more there 
than the MAC address, and it's going to grow.

>> Example: feature negotiation.  If it happens in userspace, it's easy to
>> limit what features we expose to the guest.
>>      
> Its not any harder in the kernel.  I do this today.
>
> And when you are done negotiating said features, you will generally have
> to turn around and program the feature into the backend anyway (e.g.
> ioctl() to vhost module).  Now you have to maintain some knowledge of
> that particular feature and how to program it in two places.
>    

No, you can leave it enabled unconditionally in vhost (the guest won't 
use what it doesn't know about).

> Conversely, I am eliminating the (unnecessary) middleman by letting the
> feature negotiating take place directly between the two entities that
> will consume it.
>    

The middleman is necessary, if you want to support live migration, or to 
restrict a guest to a subset of your features.

>>   If it happens in the
>> kernel, we need to add an interface to let the kernel know which
>> features it should expose to the guest.
>>      
> You need this already either way for both models anyway.  As an added
> bonus, vbus has generalized that interface using sysfs attributes, so
> all models are handled in a similar and community accepted way.
>    

vhost doesn't need it since userspace takes care of it.

>>   We also need to add an
>> interface to let userspace know which features were negotiated, if we
>> want to implement live migration.  Something fairly trivial bloats rapidly.
>>      
> Can you elaborate on the requirements for live-migration?  Wouldnt an
> opaque save/restore model work here? (e.g. why does userspace need to be
> able to interpret the in-kernel state, just pass it along as a blob to
> the new instance).
>    

A blob would work, if you commit to forward and backward compatibility 
in the kernel side (i.e. an older kernel must be able to accept a blob 
from a newer one).  I don't like blobs though, they tie you to the 
implemenetation.

>> As you can see above, userspace needs to be involved in this, and the
>> number of interfaces required is smaller if it's in userspace:
>>      
> Actually, no.  My experience has been the opposite.  Anytime I sat down
> and tried to satisfy your request to move things to the userspace,
> things got ugly and duplicative really quick.  I suspect part of the
> reason you may think its easier because you already have part of
> virtio-net in userspace and its surrounding support, but that is not the
> case moving forward for new device types.
>    

I can't comment on your experience, but we'll definitely build on 
existing code for new device types.

>> you only
>> need to know which features the kernel supports (they can be enabled
>> unconditionally, just not exposed).
>>
>> Further, some devices are perfectly happy to be implemented in
>> userspace, so we need userspace configuration support anyway.  Why
>> reimplement it in the kernel?
>>      
> Thats fine.  vbus is targetted for high-performance IO.  So if you have
> a robust userspace (like KVM+QEMU) and low-performance constraints (say,
> for a console or something), put it in userspace and vbus is not
> involved.  I don't care.
>    

So now the hypothetical non-pci hypervisor needs to support two busses.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [BUG] af_unix race in close?
From: Stephen Hemminger @ 2009-09-24  5:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jike Song
In-Reply-To: <4ABAF727.8060905@gmail.com>

On Thu, 24 Sep 2009 06:35:51 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> inted (2.6.31-0.125.4.2.rc5.git2.fc12.i686 #1) AOA110
> > EIP: 0060:[] EFLAGS: 00010202 CPU: 0
> > EIP is at unix_write_space+0x45/0x87
> > EAX: 6b6b6b6b EBX: ec988780 ECX: 00000000 EDX: 6b6b6b8f
> > ESI: ec988950 EDI: ffffff20 EBP: ec941e28 ESP: ec941e1c
> >  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > Process metacity (pid: 6809, ti=ec940000 task=e63095c0 task.ti=ec940000)
> > Stack:
> >  37dc7803 ec988780 000000e1 ec941e40 c0772142 37dc7803 dcc1c900 dcc1c900  
> > <0> c07f6a02 ec941e50 c0775766 37dc7803 dcc1c900 ec941e60 c07754ae 37dc7803
> > <0> dcc1c900 ec941e78 c07755db 37dc7803 ec98b0c0 dcc1c900 00000000 ec941ea0  
> > Call Trace:
> >  [] ? sock_wfree+0x44/0x68
> >  [] ? unix_release_sock+0x182/0x1e0
> >  [] ? skb_release_head_state+0x6c/0xcb
> >  [] ? __kfree_skb+0x20/0x94
> >  [] ? kfree_skb+0x68/0x7f
> >  [] ? unix_release_sock+0x182/0x1e0
> >  [] ? unix_release+0x2f/0x42
> >  [] ? sock_release+0x29/0x7f
> >  [] ? sock_close+0x30/0x45
> >  [] ? __fput+0x101/0x1a2

Good thanks. It should probably go up to stable as well.

^ permalink raw reply

* Re: [BUG] af_unix race in close?
From: Eric Dumazet @ 2009-09-24  4:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev, Jike Song
In-Reply-To: <20090923165421.60e0d49c@s6510>

Stephen Hemminger a écrit :
> This oops seems to show lots of times:
> http://www.kerneloops.org/guilty.php?guilty=unix_write_space&version=2.6.31-release&start=2064384&end=2097151&class=oops
> Looks like race in unix domain socket close with data outstanding.
> 
> BUG: unable to handle kernel paging request at 6b6b6b8f
> IP: [] unix_write_space+0x45/0x87
> *pde = 00000000 
> Oops: 0000 [#1] SMP 
> last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT1/charge_full
> Modules linked in: ext2 fuse nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_multipath uinput uvcvideo videodev v4l1_compat arc4 snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support ecb serio_raw i2c_i801 snd_hda_intel joydev snd_hda_codec snd_hwdep snd_pcm snd_timer ath5k r8169 snd mac80211 mii soundcore ath snd_page_alloc jmb38x_ms cfg80211 memstick rfkill wmi squashfs vfat fat mmc_block i915 sdhci_pci ata_generic pata_acpi sdhci mmc_core drm i2c_algo_bit i2c_core usb_storage video output [last unloaded: microcode]
> 
> Pid: 6809, comm: metacity Not tainted (2.6.31-0.125.4.2.rc5.git2.fc12.i686 #1) AOA110
> EIP: 0060:[] EFLAGS: 00010202 CPU: 0
> EIP is at unix_write_space+0x45/0x87
> EAX: 6b6b6b6b EBX: ec988780 ECX: 00000000 EDX: 6b6b6b8f
> ESI: ec988950 EDI: ffffff20 EBP: ec941e28 ESP: ec941e1c
>  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Process metacity (pid: 6809, ti=ec940000 task=e63095c0 task.ti=ec940000)
> Stack:
>  37dc7803 ec988780 000000e1 ec941e40 c0772142 37dc7803 dcc1c900 dcc1c900
> <0> c07f6a02 ec941e50 c0775766 37dc7803 dcc1c900 ec941e60 c07754ae 37dc7803
> <0> dcc1c900 ec941e78 c07755db 37dc7803 ec98b0c0 dcc1c900 00000000 ec941ea0
> Call Trace:
>  [] ? sock_wfree+0x44/0x68
>  [] ? unix_release_sock+0x182/0x1e0
>  [] ? skb_release_head_state+0x6c/0xcb
>  [] ? __kfree_skb+0x20/0x94
>  [] ? kfree_skb+0x68/0x7f
>  [] ? unix_release_sock+0x182/0x1e0
>  [] ? unix_release+0x2f/0x42
>  [] ? sock_release+0x29/0x7f
>  [] ? sock_close+0x30/0x45
>  [] ? __fput+0x101/0x1a2
>  [] ? fput+0x27/0x3a
>  [] ? filp_close+0x64/0x7f
>  [] ? put_files_struct+0x68/0xbd
>  [] ? exit_files+0x43/0x59
>  [] ? do_exit+0x1d6/0x648
>  [] ? audit_syscall_entry+0x134/0x167
>  [] ? do_group_exit+0x72/0x99
>  [] ? sys_exit_group+0x27/0x3c
>  [] ? syscall_call+0x7/0xb
> Code: 00 89 45 f4 31 c0 89 f0 e8 9a 76 02 00 8b 83 dc 00 00 00 c1 e0 02 3b 83 e4 00 00 00 7f 32 8b 83 a4 00 00 00 85 c0 74 17 8d 50 24 <39> 50 24 74 0f b9 01 00 00 00 ba 01 00 00 00 e8 bb cf c3 ff b9 
> EIP: [] unix_write_space+0x45/0x87 SS:ESP 0068:ec941e1c
> CR2: 000000006b6b6b8f
> ---[ end trace 4a36bd1eb2fc9896 ]---
> 

Hello Stephen

I already took a look at the problem, and I re-sent possible fix for this yesterday

http://patchwork.ozlabs.org/patch/34162/

First reporter I am aware of was Jike Song

Thanks

^ permalink raw reply

* Re: [PATCH 6/8] [RFC] CAIF Protocol Stack
From: Randy Dunlap @ 2009-09-24  4:00 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, Kim.xx.Lilliestierna
In-Reply-To: <1253727091-10383-1-git-send-email-sjur.brandeland@stericsson.com>

On Wed, 23 Sep 2009 19:31:31 +0200 sjur.brandeland@stericsson.com wrote:

> From: Kim Lilliestierna <Kim.xx.Lilliestierna@ericsson.com>
> 
> Signed-off-by: sjur.brandeland@stericsson.com
> 
> ---
>  drivers/net/caif/Kconfig      |   64 +++
>  drivers/net/caif/Makefile     |   29 ++
>  drivers/net/caif/chnl_tty.c   |  220 +++++++++++
>  drivers/net/caif/phyif_loop.c |  309 +++++++++++++++
>  drivers/net/caif/phyif_ser.c  |  189 +++++++++
>  drivers/net/caif/phyif_shm.c  |  870 +++++++++++++++++++++++++++++++++++++++++
>  drivers/net/caif/shm.h        |   95 +++++
>  drivers/net/caif/shm_cfgifc.c |   60 +++
>  drivers/net/caif/shm_mbxifc.c |   98 +++++
>  drivers/net/caif/shm_smbx.c   |   81 ++++
>  10 files changed, 2015 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/net/caif/Kconfig
>  create mode 100644 drivers/net/caif/Makefile
>  create mode 100644 drivers/net/caif/chnl_tty.c
>  create mode 100644 drivers/net/caif/phyif_loop.c
>  create mode 100644 drivers/net/caif/phyif_ser.c
>  create mode 100644 drivers/net/caif/phyif_shm.c
>  create mode 100644 drivers/net/caif/shm.h
>  create mode 100644 drivers/net/caif/shm_cfgifc.c
>  create mode 100644 drivers/net/caif/shm_mbxifc.c
>  create mode 100644 drivers/net/caif/shm_smbx.c
> 
> diff --git a/drivers/net/caif/Kconfig b/drivers/net/caif/Kconfig
> new file mode 100644
> index 0000000..3cbe302
> --- /dev/null
> +++ b/drivers/net/caif/Kconfig
> @@ -0,0 +1,64 @@
> +#
> +# CAIF net configurations
> +#
> +
> +if CAIF
> +
> +# Include physical drivers
> +# should be broken out into its own config file
> +# source "drivers/net/caif/Kconfig"
> +
> +# Some options here should be mande platform dependent

                                 made

> +
> +comment "CAIF physical drivers"
> +
> +config CAIF_TTY
> +	tristate "CAIF TTY transport driver "

	no trailing space, please (before the final ")

> +	default CAIF
> +	---help---
> +	The CAIF TTY transport driver
> +	If sou say yes here you will also need to build a users space utility to set the line disicpline on the the

	If you ......                                     userspace [or user space]

	drop final (duplicate) "the" above.

> +	tty, see Documentation/net/caif/examples/linedsc
> +
> +config CAIF_SHM
> +	tristate "CAIF shared memory transport driver"
> +	default n
> +	---help---
> +	The caif low level driver for the shared memory driver

	End sentences with a period ('.').

> +	Be aware that if you enable this you need to also enable a low level shared memory driver

	End above with period.  Begin below with "The".

> +        the default is to include the loopback test driver.
> +
> +config CAIF_LOOPBACK
> +	tristate "CAIF loopback driver test driver"
> +	default CAIF
> +	---help---
> +	Loopback test driver
> +
> +if CAIF_SHM
> +
> +comment "CAIF shared memory low level physical drivers"
> +
> +config CAIF_SHM_LOOPBACK
> +	tristate "Caif shared memory loopback driver"
> +	default CAIF_USE_SHM
> +	---help---
> +	loop back driver that emulates a real shared memory transport
> +	mainly used for debugging.
> +
> +config CAIF_MBXIF
> +	tristate "Caif shared mailbox interface"
> +	default CAIF_SHM
> +	---help---
> +	Generic shared mailbox interface
> +
> +config CAIF_SMBX
> +	tristate "Use Simluated Mail box"
> +	default CAIF_MBXIF
> +	---help---
> +	Answer y if you whant to use a simulated mail box interface for caif shared memory transport

	                want
	End above sentence with a period.

> +	Mainly used for debugging and as example driver
> +	This can also be built as a module

Ditto.

> +
> +endif #CAIF_USE_SHM
> +
> +endif # CAIF



---
~Randy

^ permalink raw reply

* Re: pull request: wireless-next-2.6 2009-09-23
From: John W. Linville @ 2009-09-24  3:50 UTC (permalink / raw)
  To: David Miller
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20090923.162449.141768862.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

On Wed, Sep 23, 2009 at 04:24:49PM -0700, David Miller wrote:

> Also, please put the branch name, even if it's simply 'master'
> in your GIT pull urls you give me.

Noted...thanks!

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org			might be all we have.  Be ready.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: ixgbe patch to provide NIC's tx/rx counters via ethtool
From: Ben Greear @ 2009-09-24  2:08 UTC (permalink / raw)
  To: Rick Jones; +Cc: NetDev
In-Reply-To: <4ABAB727.2020507@hp.com>

Rick Jones wrote:
> Ben Greear wrote:
>> When LRO is enabled, the received packet and byte counters represent the
>> LRO'd packets, not the packets/bytes on the wire. 
>
> When LRO is enabled, are all the bytes on the wire actually 
> transferred into the host?
No...the ethernet, IP and TCP headers and such are not, for packets that 
are combined into a single
large SKB.

That is why the driver counts them wrong.  The bytes are off by a few 
percentage points, but the
packet count is off by an order of magnitude.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com



^ permalink raw reply

* Re: [PATCH 2/5] Implement loss counting on TFRC-SP receiver
From: Ivo Calado @ 2009-09-24  1:43 UTC (permalink / raw)
  To: Gerrit Renker; +Cc: dccp, netdev
In-Reply-To: <425e6efa0909231501p499059a4y3530d1a9f55b5a2a@mail.gmail.com>

On Sat, Sep 19, 2009 at 9:11 AM,  <gerrit@erg.abdn.ac.uk> wrote:
>>>                s64 len = dccp_delta_seqno(cur->li_seqno,
>>> cong_evt_seqno);
>>>                if ((len <= 0) ||
>>>                    (!tfrc_lh_closed_check(cur,
>>> cong_evt->tfrchrx_ccval))) {
>>> +                       cur->li_losses += rh->num_losses;
>>>                        return false;
>>>                }
>>> This has a multiplicative effect, since rh->num_losses is added to
> cur->li_losses
>>> each time the condition is evaluated. E.g. if 3 times in a row
> reordered
>>> (earlier)
>>> sequence numbers arrive, or if the CCvals do not differ (high-speed
> networks),
>>> we end up with 3 * rh->num_losses, which can't be correct.
>>
>>
>> The following code would be correct then?
>>
>>               if ((len <= 0) ||
>>                   (!tfrc_lh_closed_check(cur, cong_evt->tfrchrx_ccval)))
> {
>> +                       cur->li_losses += rh->num_losses;
>> +                       rh->num_losses  = 0;
>>                       return false;
>> With this change I suppose the could be fixed. With that, the
>> rh->num_losses couldn't added twice. Am I correct?
>>
>>
> The function tfrc_lh_interval_add() is called when
>  * __two_after_loss() returns true (a new loss is detected) or
>  * a data packet is ECN-CE marked.
>
> I am still not sure about the 'len <= 0' case; this would be true
> if an ECN-marked packet arrives whose sequence number is 'before'
> the start of the current loss interval, or if a loss is detected
> which is older than the start of the current loss interval.
>
> The other case (tfrc_lh_closed_check) returns 1 if the current loss
> interval is 'closed' according to RFC 4342, 10.2.
>
> Intuitively, in the first case it refers to the preceding loss
> interval (i.e. not cur->...), in the second case it seems correct.
>
> Doing the first case is complicated due to going back in history.
> The simplest solution I can think of at the moment is to ignore
> the exception-case of reordered packets and do something like
>
>    if (len <= 0) {
>       /* FIXME: this belongs into the previous loss interval */
> tfrc_pr_debug("Warning: ignoring loss due to reordering");
>       return false;
>    }
>    if (!tfrc_lh_closed_check(...)) {
>        // your code from above
>    }

Okay, i'll add your sugestion. But i don't know how this would be fixed at all.

>
> However, there is a much deeper underlying question: currently the
> implementation is not really what the specification says; if we
> wanted to abide by the letter of the law, we would have to implement
> the Loss Intervals Option first, and then sort out such details as
> above. Discussion continues further below.
>

Yes, the loss intervals options is mandatory. I'll discuss this too below.

>>> --- dccp_tree_work4.orig/net/dccp/ccids/lib/packet_history_sp.c +++
> dccp_tree_work4/net/dccp/ccids/lib/packet_history_sp.c
>>> @@ -244,6 +244,7 @@
>>>                h->loss_count = 3;
>>>                tfrc_sp_rx_hist_entry_from_skb(tfrc_rx_hist_entry(h, 3),
>                                               skb, n3);
>>> +               h->num_losses = dccp_loss_count(s2, s3, n3);
>>>                return 1;
>>>        }
>>> This only measures the gap between s2 and s3, but the "hole" starts at s0,
>>> so it would need to be dccp_loss_count(s0, s3, n3). Algorithm is
> documented at
>>> http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/\
>>> ccid3/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
>>
>> <snip>
>>
>>>  }
>>> Here it is also between s0 and s2, not between s0 and s3. It is case
> VI(c.3).
>>> However, the above still is a crude approximation, since it only
> measures between
>>> the last sequence number received before the loss and the third
> sequence
>>> number
>>> after the loss. It would be better to either
>>>  * use the first sequence number after the loss (this can be s1, s2, or
> s3) or
>>>  * check if there are more holes between the first/second and the
> second/third
>>>   sequence numbers after the loss.
>>> The second option would be the correct one, it should also take the NDP
> counts
>>> of each gap into account. And already we have a fairly complicated
> algorithm.
>>
>>
>>
>> I'll study loss_detection_algorithm_notes.txt and correct the code. But
> I have one question, that i don't know if is already answered by the
> documentation:
>> Further holes, between the the first and third packet received after the
> hole are accounted only in future calls to the function, right? Because
> the receiver needs to receive more packets to confirm loss, right?
>> So, it's really necessary to look for other holes after the loss? Will
> not this other holes be identified as losses in future?
> I stand corrected, you are right: only the hole between
>  * the highest sequence number before the loss (S0) and
>  * the first sequence number after the loss
>   (S1 or S3 depending on reordering)
> are relevant.

Thanks, already corrected in next patch.

>
> Continuing the point from above, I would like to ask which way you would
> like to go with your implementation:
>  (a) receiver computes the Loss Event Rate, sender just uses this value
>  (b) receiver only gathers the data (loss intervals, lost packets),
>     sender does all the verification, book-keeping, and computation.
>
> From reading your patches, I think it is going in the direction of (a).
> But if this is the case, we don't need the Dropped Packets Option from
> RFC 5622, 8.7. By definition it only makes sense if Loss Intervals
> Options are also present.
>
> So it is necessary to decide whether to go the full way, which means
>  * support Loss Intervals and Dropped Packets alike
>  * modify TFRC library (it will be a redesign)
>  * modify receiver code
>  * modify sender code,
> or to use the present approach where
>  * the receiver computes the Loss Rate and
>  * a Mandatory Send Loss Event Rate feature is present during feature
>   negotiation, to avoid problems with incompatible senders
>   (there is a comment explaining this, in net/dccp/feat.c).
>
> Thoughts?
>

Initially I conceived that the receiver would compute the loss event
rate and send it, along with loss intervals and dropped packets
option. And this would enable the sender to just use the loss event
rate reported, as is done currently (in current implementation of
TFRC), or compute it itself, using loss intervals option (in case of
CCID3) and dropped packets option too (CCID4's case).
In my code, I compute the loss event rate using the options, then
compare the two values. I don't know if would be better to keep all
the loss event rate calc only at one side, sender or receiver.

I believe that the first way is better (to "support Loss Intervals and
Dropped Packets alike..."), because RFC requires loss intervals option
to be sent. And so, proceed and implement dropped packets option for
TFRC-SP. You are right, this would need a redesign and rewrite of
sender and receiver code.




-- 
Ivo Augusto Andrade Rocha Calado
MSc. Candidate
Embedded Systems and Pervasive Computing Lab - http://embedded.ufcg.edu.br
Systems and Computing Department - http://www.dsc.ufcg.edu.br
Electrical Engineering and Informatics Center - http://www.ceei.ufcg.edu.br
Federal University of Campina Grande - http://www.ufcg.edu.br

PGP: 0x03422935
Quidquid latine dictum sit, altum viditur.

^ permalink raw reply

* Re: r8169, enabling TX checksumming breaks things?
From: David Dillow @ 2009-09-24  1:12 UTC (permalink / raw)
  To: Denys Fedoryschenko; +Cc: romieu, netdev
In-Reply-To: <200909231724.23132.denys@visp.net.lb>

On Wed, 2009-09-23 at 17:24 +0300, Denys Fedoryschenko wrote:
> On Wednesday 23 September 2009 17:02:24 David Dillow wrote:
> > On Wed, 2009-09-23 at 09:15 +0300, Denys Fedoryschenko wrote:
> > > Hi
> > >
> > > Is it expected that:
> > > 1)TX checksumming is off by default
> > > 2)If i try to enable it over ethtool -K eth0 tx on , TCP sessions on
> > > proxy getting stuck, even in tcpdump looks everything fine and packets
> > > reaching destination, i don't understand what is a reason of failure.
> > > Maybe if this feature supposed to not work - user must not be able just
> > > to turn it on?
> >
> > It is broken for large swaths of the hardware -- I have patches that got
> > it and TSO working on my hardware, and they provide a framework to see
> > about getting it working on yours.
> >
> > Basically, the fields are in different places depending on the chip
> > revision. I'll try to dig those out tonight and send them along so we
> > can experiment.
> Thanks, i have 8 hosts (4 hosts with RTL8168b/8111b. and 4 with 
> RTL8168d/8111d) to test. Ready for patches to test them :-)

This is the patch I have against HEAD, which Works For Me (TM). If I
recall correctly the list in rtl8169_set_tso_csum() is from combing the
vendor drivers, but I'm not 100% on that. Too much time has passed.

I added the MAC version to the driver banner, which should make it
easier to figure out which MAC you have, and to change that function if
it does not work for you out of the box. It seems that some chips don't
work with this at all, based on testing by Michael Riepe (XID 3c4000c0).

As should be obvious, this isn't ready to go anywhere near upstream yet
I think.

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 50c6a3c..c58062e 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -28,7 +28,7 @@
 #include <asm/io.h>
 #include <asm/irq.h>
 
-#define RTL8169_VERSION "2.3LK-NAPI"
+#define RTL8169_VERSION "2.3LK-NAPI-TSO"
 #define MODULENAME "r8169"
 #define PFX MODULENAME ": "
 
@@ -384,13 +384,19 @@ enum desc_status_bit {
 	FirstFrag	= (1 << 29), /* First segment of a packet */
 	LastFrag	= (1 << 28), /* Final segment of a packet */
 
-	/* Tx private */
+	/* Tx private -- TxDesc->opts1 */
 	LargeSend	= (1 << 27), /* TCP Large Send Offload (TSO) */
 	MSSShift	= 16,        /* MSS value position */
 	MSSMask		= 0xfff,     /* MSS value + LargeSend bit: 12 bits */
 	IPCS		= (1 << 18), /* Calculate IP checksum */
 	UDPCS		= (1 << 17), /* Calculate UDP/IP checksum */
 	TCPCS		= (1 << 16), /* Calculate TCP/IP checksum */
+
+	/* Tx private -- TxDesc->opts2 */
+	UDPCS_2		= (1 << 31), /* Calculate UDP/IP checksum */
+	TCPCS_2		= (1 << 30), /* Calculate TCP/IP checksum */
+	IPCS_2		= (1 << 29), /* Calculate IP checksum */
+	MSSShift_2	= 18,        /* MSS value position */
 	TxVlanTag	= (1 << 17), /* Add VLAN tag */
 
 	/* Rx private */
@@ -2310,11 +2316,12 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (netif_msg_probe(tp)) {
 		u32 xid = RTL_R32(TxConfig) & 0x9cf0f8ff;
 
-		printk(KERN_INFO "%s: %s at 0x%lx, "
+		printk(KERN_INFO "%s: %s (%d) at 0x%lx, "
 		       "%2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x, "
 		       "XID %08x IRQ %d\n",
 		       dev->name,
 		       rtl_chip_info[tp->chipset].name,
+		       rtl_chip_info[tp->chipset].mac_version,
 		       dev->base_addr,
 		       dev->dev_addr[0], dev->dev_addr[1],
 		       dev->dev_addr[2], dev->dev_addr[3],
@@ -3299,7 +3306,7 @@ static void rtl8169_tx_timeout(struct net_device *dev)
 }
 
 static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
-			      u32 opts1)
+			      u32 opts1, u32 opts2)
 {
 	struct skb_shared_info *info = skb_shinfo(skb);
 	unsigned int cur_frag, entry;
@@ -3323,6 +3330,7 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
 		status = opts1 | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
 
 		txd->opts1 = cpu_to_le32(status);
+		txd->opts2 = cpu_to_le32(opts2);
 		txd->addr = cpu_to_le64(mapping);
 
 		tp->tx_skb[entry].len = len;
@@ -3336,24 +3344,64 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
 	return cur_frag;
 }
 
-static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev)
+static void rtl8169_set_tso_csum(struct sk_buff *skb, struct net_device *dev,
+				u32 *opts1, u32 *opts2)
 {
-	if (dev->features & NETIF_F_TSO) {
-		u32 mss = skb_shinfo(skb)->gso_size;
+	struct rtl8169_private *tp = netdev_priv(dev);
+	const struct iphdr *ip = ip_hdr(skb);
+	u32 mss = 0;
 
-		if (mss)
-			return LargeSend | ((mss & MSSMask) << MSSShift);
-	}
-	if (skb->ip_summed == CHECKSUM_PARTIAL) {
-		const struct iphdr *ip = ip_hdr(skb);
+	if (dev->features & NETIF_F_TSO)
+		mss = skb_shinfo(skb)->gso_size;
 
-		if (ip->protocol == IPPROTO_TCP)
-			return IPCS | TCPCS;
-		else if (ip->protocol == IPPROTO_UDP)
-			return IPCS | UDPCS;
-		WARN_ON(1);	/* we need a WARN() */
+	switch (tp->mac_version) {
+	case RTL_GIGA_MAC_VER_01: case RTL_GIGA_MAC_VER_02:
+	case RTL_GIGA_MAC_VER_03: case RTL_GIGA_MAC_VER_04:
+	case RTL_GIGA_MAC_VER_05: case RTL_GIGA_MAC_VER_06:
+	case RTL_GIGA_MAC_VER_10: case RTL_GIGA_MAC_VER_11:
+	case RTL_GIGA_MAC_VER_12: case RTL_GIGA_MAC_VER_13:
+	case RTL_GIGA_MAC_VER_16: case RTL_GIGA_MAC_VER_17:
+		/* LargeSend implies checksums, as IPCS et al occupy the
+		 * same bits as the MSS.
+		 *
+		 * FIXME: MSS may actually go in opts2 on all MAC versions,
+		 * but I only have the one below so I'll keep this as it was.
+		 */
+		if (mss) {
+			*opts1 |= LargeSend;
+			*opts1 |= (mss & MSSMask) << MSSShift;
+		} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+			if (ip->protocol == IPPROTO_TCP)
+				*opts1 |= IPCS | TCPCS;
+			else if (ip->protocol == IPPROTO_UDP)
+				*opts1 |= IPCS | UDPCS;
+			else
+				WARN_ON_ONCE(1);
+		}
+		break;
+	default:
+		/* LargeSend implies checksums
+		 */
+		if (mss) {
+			/* This used to put (mss & MSSMask) << MSSShift
+			 * in opts1, but that causes my NIC to retransmit
+			 * the last packet over and over and over again...
+			 *
+			 * I have XID 281000c0, so this may be different
+			 * for other chips. Testers welcomed!
+			 */
+			*opts1 |= LargeSend;
+			*opts2 |= (mss & MSSMask) << MSSShift_2;
+		} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+			if (ip->protocol == IPPROTO_TCP)
+				*opts2 |= IPCS_2 | TCPCS_2;
+			else if (ip->protocol == IPPROTO_UDP)
+				*opts2 |= IPCS_2 | UDPCS_2;
+			else
+				WARN_ON_ONCE(1);
+		}
+		break;
 	}
-	return 0;
 }
 
 static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
@@ -3365,7 +3413,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	void __iomem *ioaddr = tp->mmio_addr;
 	dma_addr_t mapping;
 	u32 status, len;
-	u32 opts1;
+	u32 opts1, opts2;
 
 	if (unlikely(TX_BUFFS_AVAIL(tp) < skb_shinfo(skb)->nr_frags)) {
 		if (netif_msg_drv(tp)) {
@@ -3379,9 +3427,11 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	if (unlikely(le32_to_cpu(txd->opts1) & DescOwn))
 		goto err_stop;
 
-	opts1 = DescOwn | rtl8169_tso_csum(skb, dev);
+	opts1 = DescOwn;
+	opts2 = rtl8169_tx_vlan_tag(tp, skb);
+	rtl8169_set_tso_csum(skb, dev, &opts1, &opts2);
 
-	frags = rtl8169_xmit_frags(tp, skb, opts1);
+	frags = rtl8169_xmit_frags(tp, skb, opts1, opts2);
 	if (frags) {
 		len = skb_headlen(skb);
 		opts1 |= FirstFrag;
@@ -3395,7 +3445,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	tp->tx_skb[entry].len = len;
 	txd->addr = cpu_to_le64(mapping);
-	txd->opts2 = cpu_to_le32(rtl8169_tx_vlan_tag(tp, skb));
+	txd->opts2 = cpu_to_le32(opts2);
 
 	wmb();
 



^ permalink raw reply related

* Re: pktgen: tricks
From: Rick Jones @ 2009-09-24  1:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jesper Dangaard Brouer, Robert Olsson, netdev
In-Reply-To: <20090923174141.1d350103@s6510>

Stephen Hemminger wrote:
> On Tue, 22 Sep 2009 22:49:02 -0700
> Stephen Hemminger <shemminger@vyatta.com> wrote:
> 
> 
>>I thought others want to know how to get maximum speed of pktgen.
>>
>>1. Run nothing else (even X11), just a command line
>>2. Make sure ethernet flow control is disabled
>>   ethtool -A eth0 autoneg off rx off tx off
>>3. Make sure clocksource is TSC.  On my old SMP Opteron's
>>   needed to get patch since in 2.6.30 or later, the clock guru's
>>   decided to remove it on all non Intel machines.  Look for patch
>>   than enables "tsc=reliable"
>>4. Compile Ethernet drivers in, the overhead of the indirect
>>   function call required for modules (or cache footprint),
>>   slows things down.
>>5. Increase transmit ring size to 1000
>>   ethtool -G eth0 tx 1000
>>
>>Result: OK: 70408581(c70405979+d2602) nsec, 100000000 (60byte,0frags)
>>  1420281pps 681Mb/sec (681734880bps) errors: 0
> 
> 
> Other kernel config help:
>   - turn off lock dependency checker, kmecheck, page alloc debug
>     basically anything that slows stuff down
>   - turn off content group scheduler

I and thought netperf was getting away from real-world?-)

rick jones

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox