Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] net: phy: smsc: force all capable mode if the phy is started in powerdown mode
From: Philippe Reynes @ 2012-12-01 20:44 UTC (permalink / raw)
  To: netdev, linux-kernel, davem
  Cc: otavio, javier, jkosina, eric.jarrige, julien.boibessot,
	thomas.petazzoni, Philippe Reynes

A SMSC PHY in power down mode can't be used.
If a SMSC PHY is in this mode in the config_init
stage, the mode "all capable" is set. So the PHY
could then be used.

Signed-off-by: Philippe Reynes <tremyfr@yahoo.fr>
---
 drivers/net/phy/smsc.c  |   27 ++++++++++++++++++++++++++-
 include/linux/smscphy.h |    5 +++++
 2 files changed, 31 insertions(+), 1 deletions(-)

diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index 88e3991..63018d0 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -43,7 +43,32 @@ static int smsc_phy_ack_interrupt(struct phy_device *phydev)
 
 static int smsc_phy_config_init(struct phy_device *phydev)
 {
-	int rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
+	/*
+	 * If the SMSC PHY is in power down mode, then set it
+	 * in all capable mode before using it.
+	 */
+	int rc = phy_read(phydev, MII_LAN83C185_SPECIAL_MODES);
+	if (rc < 0)
+		return rc;
+
+	if ((rc & MII_LAN83C185_MODE_MASK) == MII_LAN83C185_MODE_POWERDOWN) {
+		int timeout = 50000;
+
+		/* set "all capable" mode and reset the phy */
+		rc |= MII_LAN83C185_MODE_ALL;
+		phy_write(phydev, MII_LAN83C185_SPECIAL_MODES, rc);
+		phy_write(phydev, MII_BMCR, BMCR_RESET);
+
+		/* wait end of reset (max 500 ms) */
+		do {
+			udelay(10);
+			if (timeout-- == 0)
+				return -1;
+			rc = phy_read(phydev, MII_BMCR);
+		} while (rc & BMCR_RESET);
+	}
+
+	rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
 	if (rc < 0)
 		return rc;
 
diff --git a/include/linux/smscphy.h b/include/linux/smscphy.h
index ce718cb..f4bf16e 100644
--- a/include/linux/smscphy.h
+++ b/include/linux/smscphy.h
@@ -4,6 +4,7 @@
 #define MII_LAN83C185_ISF 29 /* Interrupt Source Flags */
 #define MII_LAN83C185_IM  30 /* Interrupt Mask */
 #define MII_LAN83C185_CTRL_STATUS 17 /* Mode/Status Register */
+#define MII_LAN83C185_SPECIAL_MODES 18 /* Special Modes Register */
 
 #define MII_LAN83C185_ISF_INT1 (1<<1) /* Auto-Negotiation Page Received */
 #define MII_LAN83C185_ISF_INT2 (1<<2) /* Parallel Detection Fault */
@@ -22,4 +23,8 @@
 #define MII_LAN83C185_EDPWRDOWN (1 << 13) /* EDPWRDOWN */
 #define MII_LAN83C185_ENERGYON  (1 << 1)  /* ENERGYON */
 
+#define MII_LAN83C185_MODE_MASK      0xE0
+#define MII_LAN83C185_MODE_POWERDOWN 0xC0 /* Power Down mode */
+#define MII_LAN83C185_MODE_ALL       0xE0 /* All capable mode */
+
 #endif /* __LINUX_SMSCPHY_H__ */
-- 
1.7.4.4

^ permalink raw reply related

* Re: [PATCH] irda: ep7211-sir: Convert to platform_diver
From: Arnd Bergmann @ 2012-12-01 20:50 UTC (permalink / raw)
  To: Alexander Shiyan; +Cc: netdev, Samuel Ortiz
In-Reply-To: <1354360046-4818-1-git-send-email-shc_work@mail.ru>

On Saturday 01 December 2012, Alexander Shiyan wrote:
> This patch converts ep7211-sir driver to platform_driver.
> Since driver can be used not only for EP7211 CPU, function names
> was be renamed to generic clps711x...
> 
> Signed-off-by: Alexander Shiyan <shc_work@mail.ru>

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply

* Re: GRO + splice panics in 3.7.0-rc5
From: Willy Tarreau @ 2012-12-01 20:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <20121201194304.GI25450@1wt.eu>

[-- Attachment #1: Type: text/plain, Size: 2878 bytes --]

OK now I have a simple and reliable reproducer for the bug. It
just creates 2 TCP sockets connected to each other and splices
from one to the other. It looks 100% reliable, I'm attaching it
hoping it can help, it's much easier and more reliable than
starting haproxy with clients and servers!

I'm noting that it only crashes when I send more than one page
of data. 4096 bytes and below are OK, 4097 and above do crash.

I have added a few printk() and it dies in get_page() called from
do_tcp_sendpages() :

  888                  i = skb_shinfo(skb)->nr_frags;
  889  printk(KERN_DEBUG "%d: page=%p i=%d skb=%p offset=%d\n", __LINE__, page, i, skb, offset);
  890                  can_coalesce = skb_can_coalesce(skb, i, page, offset);
  891  printk(KERN_DEBUG "%d: can_coalesce=%d\n", __LINE__, can_coalesce);
  892                  if (!can_coalesce && i >= MAX_SKB_FRAGS) {
  893  printk(KERN_DEBUG "%d\n", __LINE__);
  894                          tcp_mark_push(tp, skb);
  895                          goto new_segment;
  896                  }
  897                  if (!sk_wmem_schedule(sk, copy)) {
  898  printk(KERN_DEBUG "%d\n", __LINE__);
  899                          goto wait_for_memory;
  900                  }
  901
  902                  if (can_coalesce) {
  903  printk(KERN_DEBUG "%d\n", __LINE__);
  904                          skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
  905                  } else {
  906  printk(KERN_DEBUG "%d\n", __LINE__);
  907                          get_page(page);
  908  printk(KERN_DEBUG "%d\n", __LINE__);
  909                          skb_fill_page_desc(skb, i, page, offset, copy);
  910                  }
  911
  912  printk(KERN_DEBUG "%d\n", __LINE__);
  913                  skb->len += copy;
  914                  skb->data_len += copy;

dmesg starts with :

  889: page=f77b9ca0 i=0 skb=f5b91540 offset=0
  891: can_coalesce=0
  906
  908
  912
  871
  875
  880
  889: page=f6e7a300 i=0 skb=f46f5c80 offset=2513
  891: can_coalesce=0
  906
  908
  912
  889: page=0000fb80 i=1 skb=f46f5c80 offset=0
  891: can_coalesce=0
  906
  BUG: unable to handle kernel paging request at 0000fb80
IP: [<c13b4878>] tcp_sendpage+0x568/0x6d0

If that can help, I also noticed this one once with a
page that could look valid, so the error was propagated
further (the send() was 64kB) :

  ...
  906
  908
  912
  889: page=f6e79f80 i=6 skb=f37a1c80 offset=0
  891: can_coalesce=0
  906
  908
  912
  889: page=c13d58a7 i=7 skb=f37a1c80 offset=927
  891: can_coalesce=0
  906
  908
  912
  889: page=c13b4310 i=8 skb=f37a1c80 offset=3877
  891: can_coalesce=0
  906
  BUG: unable to handle kernel paging request at 8b31752a
  IP: [<c1094371>] __get_page_tail+0x31/0x90
  *pde = 00000000 

Now I'm really at loss, so do not hesitate to ask me for more
info, because I don't know where to look.

Cheers,
Willy


[-- Attachment #2: splice-bug.c --]
[-- Type: text/plain, Size: 907 bytes --]

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void fail(const char *e)
{
	perror(e);
	exit(1);
}

int main(int argc, char **argv)
{
	struct sockaddr sa;
	socklen_t sl;
	int p[2];
	int l, s, r;
	char buffer[4097];

	l = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (l < 0)
		fail("L: socket()");

	if (listen(l, 1) < 0)
		fail("L: listen()");

	sl = sizeof(sa);
	if (getsockname(l, &sa, &sl) < 0)
		fail("L: getsockname()");

	if ((s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0)
		fail("S: socket()");

	if (connect(s, &sa, sl) < 0)
		fail("S: connect()");

	sl = sizeof(sa);
	if ((r = accept(l, &sa, &sl)) < 0)
		fail("R: accept()");

	pipe(p);

	send(s, buffer, sizeof(buffer), MSG_DONTWAIT);

	l = splice(r, NULL, p[1], NULL, 1048576, 3);
	splice(p[0], NULL, s, NULL, l, 3);

	exit(0);
}

^ permalink raw reply

* WARNING!!! VIRUS DETECTED AND SPY, UPDATE NOW!!!
From: Webmail System Administrator @ 2012-12-01  4:00 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=win-1251, Size: 395 bytes --]

 Your webmail account is corrupted by virus, you are to advise to upgrade.
 You need to upgrade your email limit quota to 10GB within the next 48
hours. Use the below web link to upgrade your valid email account:

https://docs.google.com/a/blumail.org/spreadsheet/viewform?formkey=dHV1ODN5bnkxNDVxODgtb0RMTk9DNlE6MQ

Thank you for using our email.
Copyright ©2012 Email Helpdesk Centre.

^ permalink raw reply

* Re: GRO + splice panics in 3.7.0-rc5
From: Eric Dumazet @ 2012-12-01 21:47 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev
In-Reply-To: <20121201205227.GA28390@1wt.eu>

On Sat, 2012-12-01 at 21:52 +0100, Willy Tarreau wrote:
> OK now I have a simple and reliable reproducer for the bug. It
> just creates 2 TCP sockets connected to each other and splices
> from one to the other. It looks 100% reliable, I'm attaching it
> hoping it can help, it's much easier and more reliable than
> starting haproxy with clients and servers!
> 
> I'm noting that it only crashes when I send more than one page
> of data. 4096 bytes and below are OK, 4097 and above do crash.
> 
> I have added a few printk() and it dies in get_page() called from
> do_tcp_sendpages() :
> 
>   888                  i = skb_shinfo(skb)->nr_frags;
>   889  printk(KERN_DEBUG "%d: page=%p i=%d skb=%p offset=%d\n", __LINE__, page, i, skb, offset);
>   890                  can_coalesce = skb_can_coalesce(skb, i, page, offset);
>   891  printk(KERN_DEBUG "%d: can_coalesce=%d\n", __LINE__, can_coalesce);
>   892                  if (!can_coalesce && i >= MAX_SKB_FRAGS) {
>   893  printk(KERN_DEBUG "%d\n", __LINE__);
>   894                          tcp_mark_push(tp, skb);
>   895                          goto new_segment;
>   896                  }
>   897                  if (!sk_wmem_schedule(sk, copy)) {
>   898  printk(KERN_DEBUG "%d\n", __LINE__);
>   899                          goto wait_for_memory;
>   900                  }
>   901
>   902                  if (can_coalesce) {
>   903  printk(KERN_DEBUG "%d\n", __LINE__);
>   904                          skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
>   905                  } else {
>   906  printk(KERN_DEBUG "%d\n", __LINE__);
>   907                          get_page(page);
>   908  printk(KERN_DEBUG "%d\n", __LINE__);
>   909                          skb_fill_page_desc(skb, i, page, offset, copy);
>   910                  }
>   911
>   912  printk(KERN_DEBUG "%d\n", __LINE__);
>   913                  skb->len += copy;
>   914                  skb->data_len += copy;
> 
> dmesg starts with :
> 
>   889: page=f77b9ca0 i=0 skb=f5b91540 offset=0
>   891: can_coalesce=0
>   906
>   908
>   912
>   871
>   875
>   880
>   889: page=f6e7a300 i=0 skb=f46f5c80 offset=2513
>   891: can_coalesce=0
>   906
>   908
>   912
>   889: page=0000fb80 i=1 skb=f46f5c80 offset=0
>   891: can_coalesce=0
>   906
>   BUG: unable to handle kernel paging request at 0000fb80
> IP: [<c13b4878>] tcp_sendpage+0x568/0x6d0
> 
> If that can help, I also noticed this one once with a
> page that could look valid, so the error was propagated
> further (the send() was 64kB) :
> 
>   ...
>   906
>   908
>   912
>   889: page=f6e79f80 i=6 skb=f37a1c80 offset=0
>   891: can_coalesce=0
>   906
>   908
>   912
>   889: page=c13d58a7 i=7 skb=f37a1c80 offset=927
>   891: can_coalesce=0
>   906
>   908
>   912
>   889: page=c13b4310 i=8 skb=f37a1c80 offset=3877
>   891: can_coalesce=0
>   906
>   BUG: unable to handle kernel paging request at 8b31752a
>   IP: [<c1094371>] __get_page_tail+0x31/0x90
>   *pde = 00000000 
> 
> Now I'm really at loss, so do not hesitate to ask me for more
> info, because I don't know where to look.

Thanks a lot Willy

I believe do_tcp_sendpages() needs a fix, I'll send a patch asap 

^ permalink raw reply

* Re: [RFT PATCH] 8139cp: properly support change of MTU values
From: Francois Romieu @ 2012-12-01 21:49 UTC (permalink / raw)
  To: John Greene; +Cc: netdev, David S. Miller
In-Reply-To: <1354308680-9236-1-git-send-email-jogreene@redhat.com>

John Greene <jogreene@redhat.com> :
[...]
> Testing: has been test on virtual 8139cp setup without issue,
> have no access real hardware 8139cp, need testing help.

You should Cc: David Woodhouse <dwmw2@infradead.org> for testing.

[...]
> diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
> index 6cb96b4..7847c83 100644
> --- a/drivers/net/ethernet/realtek/8139cp.c
> +++ b/drivers/net/ethernet/realtek/8139cp.c
[...]
> @@ -1244,22 +1241,11 @@ static int cp_change_mtu(struct net_device *dev, int new_mtu)
>  		return 0;
>  	}
>  
> -	spin_lock_irqsave(&cp->lock, flags);
> -
> -	cp_stop_hw(cp);			/* stop h/w and free rings */
> -	cp_clean_rings(cp);
> -
> +	/* network IS up, close it, reset MTU, and come up again. */
> +	cp_close(dev);
>  	dev->mtu = new_mtu;
> -	cp_set_rxbufsize(cp);		/* set new rx buf size */
> -
> -	rc = cp_init_rings(cp);		/* realloc and restart h/w */
> -	cp_start_hw(cp);
> -
> -	spin_unlock_irqrestore(&cp->lock, flags);
> -
> -	return rc;
> +	return cp_open(dev);

netif_running is true since cp_close() is not running from dev_close.
Bad things will happen if cp_open fails while cp_interrupt was waiting
for the lock.

-- 
Ueimor

^ permalink raw reply

* Re: GRO + splice panics in 3.7.0-rc5
From: Eric Dumazet @ 2012-12-01 22:32 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev
In-Reply-To: <1354398458.20109.528.camel@edumazet-glaptop>

On Sat, 2012-12-01 at 13:47 -0800, Eric Dumazet wrote:

> Thanks a lot Willy
> 
> I believe do_tcp_sendpages() needs a fix, I'll send a patch asap 
> 

Could you try the following patch ?

do_tcp_sendpages() looks really wrong, as only one page is provided by
the caller.

Thanks !

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e6eace1..6976dba 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -831,8 +831,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
 	return mss_now;
 }
 
-static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
-			 size_t psize, int flags)
+static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
+				size_t size, int flags)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int mss_now, size_goal;
@@ -859,12 +859,9 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffse
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 		goto out_err;
 
-	while (psize > 0) {
+	while (size > 0) {
 		struct sk_buff *skb = tcp_write_queue_tail(sk);
-		struct page *page = pages[poffset / PAGE_SIZE];
 		int copy, i;
-		int offset = poffset % PAGE_SIZE;
-		int size = min_t(size_t, psize, PAGE_SIZE - offset);
 		bool can_coalesce;
 
 		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
@@ -913,8 +910,8 @@ new_segment:
 			TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH;
 
 		copied += copy;
-		poffset += copy;
-		if (!(psize -= copy))
+		offset += copy;
+		if (!(size -= copy))
 			goto out;
 
 		if (skb->len < size_goal || (flags & MSG_OOB))
@@ -961,7 +958,7 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 					flags);
 
 	lock_sock(sk);
-	res = do_tcp_sendpages(sk, &page, offset, size, flags);
+	res = do_tcp_sendpages(sk, page, offset, size, flags);
 	release_sock(sk);
 	return res;
 }

^ permalink raw reply related

* Re: GRO + splice panics in 3.7.0-rc5
From: Willy Tarreau @ 2012-12-01 22:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1354401121.20109.531.camel@edumazet-glaptop>

On Sat, Dec 01, 2012 at 02:32:01PM -0800, Eric Dumazet wrote:
> On Sat, 2012-12-01 at 13:47 -0800, Eric Dumazet wrote:
> 
> > Thanks a lot Willy
> > 
> > I believe do_tcp_sendpages() needs a fix, I'll send a patch asap 
> > 
> 
> Could you try the following patch ?
> 
> do_tcp_sendpages() looks really wrong, as only one page is provided by
> the caller.

Excellent Eric, it's rock solid now both with the reproducer and with
haproxy!  Feel free to add my Tested-By if you want.

I really think we should feed this to Linus quickly before he releases 3.7,
as I'm realizing that it would really not be nice to have a user-triggerable
crash in a release :-/

Thanks !
Willy

^ permalink raw reply

* [PATCH net] tcp: fix crashes in do_tcp_sendpages()
From: Eric Dumazet @ 2012-12-01 23:07 UTC (permalink / raw)
  To: Willy Tarreau, David Miller; +Cc: netdev
In-Reply-To: <20121201224044.GL25450@1wt.eu>

From: Eric Dumazet <edumazet@google.com>

Recent network changes allowed high order pages being used
for skb fragments.

This uncovered a bug in do_tcp_sendpages() which was assuming its caller
provided an array of order-0 page pointers.

We only have to deal with a single page in this function, and its order
is irrelevant.

Reported-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c |   15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 083092e..e457c7a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -830,8 +830,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
 	return mss_now;
 }
 
-static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
-			 size_t psize, int flags)
+static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
+				size_t size, int flags)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int mss_now, size_goal;
@@ -858,12 +858,9 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffse
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 		goto out_err;
 
-	while (psize > 0) {
+	while (size > 0) {
 		struct sk_buff *skb = tcp_write_queue_tail(sk);
-		struct page *page = pages[poffset / PAGE_SIZE];
 		int copy, i;
-		int offset = poffset % PAGE_SIZE;
-		int size = min_t(size_t, psize, PAGE_SIZE - offset);
 		bool can_coalesce;
 
 		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
@@ -912,8 +909,8 @@ new_segment:
 			TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH;
 
 		copied += copy;
-		poffset += copy;
-		if (!(psize -= copy))
+		offset += copy;
+		if (!(size -= copy))
 			goto out;
 
 		if (skb->len < size_goal || (flags & MSG_OOB))
@@ -960,7 +957,7 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 					flags);
 
 	lock_sock(sk);
-	res = do_tcp_sendpages(sk, &page, offset, size, flags);
+	res = do_tcp_sendpages(sk, page, offset, size, flags);
 	release_sock(sk);
 	return res;
 }

^ permalink raw reply related

* [PATCH net 1/1] 8139cp: fix coherent mapping leak in error path.
From: Francois Romieu @ 2012-12-01 23:08 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: David Miller, dwmw2, jasowang, netdev, John Greene

cp_open
[...]
        rc = cp_alloc_rings(cp);
        if (rc)
                return rc;

cp_alloc_rings
[...]
        mem = dma_alloc_coherent(&cp->pdev->dev, CP_RING_BYTES,
                                 &cp->ring_dma, GFP_KERNEL);

- cp_alloc_rings never frees the coherent mapping it allocates
- neither do cp_open when cp_alloc_rings fails

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
---
 drivers/net/ethernet/realtek/8139cp.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
index b01f83a..609125a 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -1060,17 +1060,22 @@ static int cp_init_rings (struct cp_private *cp)
 
 static int cp_alloc_rings (struct cp_private *cp)
 {
+	struct device *d = &cp->pdev->dev;
 	void *mem;
+	int rc;
 
-	mem = dma_alloc_coherent(&cp->pdev->dev, CP_RING_BYTES,
-				 &cp->ring_dma, GFP_KERNEL);
+	mem = dma_alloc_coherent(d, CP_RING_BYTES, &cp->ring_dma, GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
 
 	cp->rx_ring = mem;
 	cp->tx_ring = &cp->rx_ring[CP_RX_RING_SIZE];
 
-	return cp_init_rings(cp);
+	rc = cp_init_rings(cp);
+	if (rc < 0)
+		dma_free_coherent(d, CP_RING_BYTES, cp->rx_ring, cp->ring_dma);
+
+	return rc;
 }
 
 static void cp_clean_rings (struct cp_private *cp)
-- 
1.7.11.7

^ permalink raw reply related

* Re: Wireless regression in workqueue: use mod_delayed_work() instead of __cancel + queue
From: Anders Kaseorg @ 2012-12-01 23:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Herbert Xu, John W. Linville, netdev, linux-wireless,
	linux-kernel
In-Reply-To: <20121201143926.GB2685@htj.dyndns.org>

On Sat, 1 Dec 2012, Tejun Heo wrote:
> Can you please test this one too?  Thanks!
> 
> […]
> +	if (!delay) {
> +		__queue_work(cpu, wq, &dwork->work);
> +		return;
> +	}
> +
> […]
> -	if (!delay)
> -		return queue_work_on(cpu, wq, &dwork->work);
> -

Yes, this one fixes the bug too (on v3.7.0-rc7).

Anders

^ permalink raw reply

* [PATCH v2 07/17] solos-pci: Wait for pending TX to complete when releasing vcc
From: David Woodhouse @ 2012-12-02  0:17 UTC (permalink / raw)
  To: netdev; +Cc: chas, krzysiek
In-Reply-To: <1354235736-26833-8-git-send-email-dwmw2@infradead.org>

[-- Attachment #1: Type: text/plain, Size: 2566 bytes --]

We should no longer be calling the old pop routine for the vcc, after
vcc_release() has completed. Make sure we wait for any pending TX skbs
to complete, by waiting for our own PKT_PCLOSE control skb to be sent.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
v2: Don't put a struct completion into skb->cb[]. It doesn't fit, so
instead use the existing card->param_wq, hold a refcount on the skb, and
wait for the fpga_tx code to free it.

I won't repost the rest of the series; they're materially unchanged
apart from a tiny amount of massaging to make them apply with this
change, so it would just be noise.

 drivers/atm/solos-pci.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index 9851093..026bdc1 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -866,6 +866,7 @@ static int popen(struct atm_vcc *vcc)
 static void pclose(struct atm_vcc *vcc)
 {
 	struct solos_card *card = vcc->dev->dev_data;
+	unsigned char port = SOLOS_CHAN(vcc->dev);
 	struct sk_buff *skb;
 	struct pkt_hdr *header;
 
@@ -881,11 +882,18 @@ static void pclose(struct atm_vcc *vcc)
 	header->vci = cpu_to_le16(vcc->vci);
 	header->type = cpu_to_le16(PKT_PCLOSE);
 
-	fpga_queue(card, SOLOS_CHAN(vcc->dev), skb, NULL);
+	skb_get(skb);
+	fpga_queue(card, port, skb, NULL);
 
 	clear_bit(ATM_VF_ADDR, &vcc->flags);
 	clear_bit(ATM_VF_READY, &vcc->flags);
 
+	if (!wait_event_timeout(card->param_wq, !skb_shared(skb), 5 * HZ))
+		dev_warn(&card->dev->dev,
+			 "Timeout waiting for VCC close on port %d\n", port);
+
+	dev_kfree_skb(skb);
+
 	/* Hold up vcc_destroy_socket() (our caller) until solos_bh() in the
 	   tasklet has finished processing any incoming packets (and, more to
 	   the point, using the vcc pointer). */
@@ -1011,9 +1019,10 @@ static uint32_t fpga_tx(struct solos_card *card)
 			if (vcc) {
 				atomic_inc(&vcc->stats->tx);
 				solos_pop(vcc, oldskb);
-			} else
+			} else {
 				dev_kfree_skb_irq(oldskb);
-
+				wake_up(&card->param_wq);
+			}
 		}
 	}
 	/* For non-DMA TX, write the 'TX start' bit for all four ports simultaneously */
@@ -1345,6 +1354,8 @@ static struct pci_driver fpga_driver = {
 
 static int __init solos_pci_init(void)
 {
+	BUILD_BUG_ON(sizeof(struct solos_skb_cb) > sizeof(((struct sk_buff *)0)->cb));
+
 	printk(KERN_INFO "Solos PCI Driver Version %s\n", VERSION);
 	return pci_register_driver(&fpga_driver);
 }
-- 
1.8.0


-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply related

* Re: [PATCH 00/17] ATM fixes for pppoatm/br2684
From: David Woodhouse @ 2012-12-02  0:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, chas, krzysiek
In-Reply-To: <20121201.114440.331740099591161757.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 2447 bytes --]

On Sat, 2012-12-01 at 11:44 -0500, David Miller wrote:
> 
> > drivers/atm/solos-pci.c: In function ‘solos_pci_init’:
> > drivers/atm/solos-pci.c:1329:2: error: size of unnamed array is
> negative
> 
> It's from adding the completion to the solos skb cb, you can't do
> that.  It won't fit on 64-bit when all debugging kconfig options are
> enabled.

Thanks for catching that. I've just posted a [v2] version of the
offending patch, which no longer puts a completion into the skb cb.

I'd appreciate a slightly more clueful eye looking over the incremental
patch (below) just to confirm that the new method is correct, but it
certainly seems to work. This version is identical to the one I posted
earlier, except that I use dev_kfree_skb() in the pclose() function
instead of dev_kfree_skb_any(). We know this will be called from a
suitable context, and it even uses GFP_KERNEL a few lines higher up.

If that's OK, please pull the resulting tree from
	git://git.infradead.org/users/dwmw2/atm.git


diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index e59bcfd..6619a8a 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -92,7 +92,6 @@ struct pkt_hdr {
 };
 
 struct solos_skb_cb {
-	struct completion c;
 	struct atm_vcc *vcc;
 	uint32_t dma_addr;
 };
@@ -853,13 +852,14 @@ static void pclose(struct atm_vcc *vcc)
 	header->vci = cpu_to_le16(vcc->vci);
 	header->type = cpu_to_le16(PKT_PCLOSE);
 
-	init_completion(&SKB_CB(skb)->c);
-
+	skb_get(skb);
 	fpga_queue(card, port, skb, NULL);
 
-	if (!wait_for_completion_timeout(&SKB_CB(skb)->c, 5 * HZ))
-		dev_warn(&card->dev->dev, "Timeout waiting for VCC close on port %d\n",
-			 port);
+	if (!wait_event_timeout(card->param_wq, !skb_shared(skb), 5 * HZ))
+		dev_warn(&card->dev->dev,
+			 "Timeout waiting for VCC close on port %d\n", port);
+
+	dev_kfree_skb(skb);
 
 	/* Hold up vcc_destroy_socket() (our caller) until solos_bh() in the
 	   tasklet has finished processing any incoming packets (and, more to
@@ -990,10 +990,8 @@ static uint32_t fpga_tx(struct solos_card *card)
 				atomic_inc(&vcc->stats->tx);
 				solos_pop(vcc, oldskb);
 			} else {
-				struct pkt_hdr *header = (void *)oldskb->data;
-				if (le16_to_cpu(header->type) == PKT_PCLOSE)
-					complete(&SKB_CB(oldskb)->c);
 				dev_kfree_skb_irq(oldskb);
+				wake_up(&card->param_wq);
 			}
 		}
 	}

-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply related

* Re: [PATCH 00/17] ATM fixes for pppoatm/br2684
From: David Woodhouse @ 2012-12-02  0:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, chas, krzysiek
In-Reply-To: <1354383226.21562.351.camel@shinybook.infradead.org>

[-- Attachment #1: Type: text/plain, Size: 1676 bytes --]

On Sat, 2012-12-01 at 17:33 +0000, David Woodhouse wrote:
> 
> Very glad I added the BUILD_BUG_ON on the cb struct size now. Perhaps
> there should be a generic helper for that? Something like
>  skb_cb_cast(struct foo_cb, skb) could do it automatically...?

Something like this, perhaps? Using skb_cast_cb() would then make it
fairly much impossible to accidentally overflow the size of the skb cb.

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f2af494..b4cb2cf 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -567,6 +567,11 @@ static inline struct rtable *skb_rtable(const struct sk_buff *skb)
 	return (struct rtable *)skb_dst(skb);
 }
 
+#define skb_cast_cb(skb, type)  ({				\
+	BUILD_BUG_ON(sizeof(type) > sizeof((skb)->cb));		\
+	(type *)(skb)->cb;					\
+	})
+
 extern void kfree_skb(struct sk_buff *skb);
 extern void skb_tx_error(struct sk_buff *skb);
 extern void consume_skb(struct sk_buff *skb);
diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index 6619a8a..8dccf82 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -97,7 +97,7 @@ struct solos_skb_cb {
 };
 
 
-#define SKB_CB(skb)		((struct solos_skb_cb *)skb->cb)
+#define SKB_CB(skb)	skb_cast_cb(skb, struct solos_skb_cb)
 
 #define PKT_DATA	0
 #define PKT_COMMAND	1
@@ -1324,8 +1324,6 @@ static struct pci_driver fpga_driver = {
 
 static int __init solos_pci_init(void)
 {
-	BUILD_BUG_ON(sizeof(struct solos_skb_cb) > sizeof(((struct sk_buff *)0)->cb));
-
 	printk(KERN_INFO "Solos PCI Driver Version %s\n", VERSION);
 	return pci_register_driver(&fpga_driver);
 }


-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply related

* [PATCH] workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delay
From: Tejun Heo @ 2012-12-02  1:11 UTC (permalink / raw)
  To: Anders Kaseorg
  Cc: Herbert Xu, John W. Linville, netdev, linux-wireless,
	linux-kernel, Zlatko Calusic, Joonsoo Kim
In-Reply-To: <alpine.DEB.2.00.1212011852100.25064@dr-wily.mit.edu>

>From 8852aac25e79e38cc6529f20298eed154f60b574 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Sat, 1 Dec 2012 16:23:42 -0800

8376fe22c7 ("workqueue: implement mod_delayed_work[_on]()")
implemented mod_delayed_work[_on]() using the improved
try_to_grab_pending().  The function is later used, among others, to
replace [__]candel_delayed_work() + queue_delayed_work() combinations.

Unfortunately, a delayed_work item w/ zero @delay is handled slightly
differently by mod_delayed_work_on() compared to
queue_delayed_work_on().  The latter skips timer altogether and
directly queues it using queue_work_on() while the former schedules
timer which will expire on the closest tick.  This means, when @delay
is zero, that [__]cancel_delayed_work() + queue_delayed_work_on()
makes the target item immediately executable while
mod_delayed_work_on() may induce delay of upto a full tick.

This somewhat subtle difference breaks some of the converted users.
e.g. block queue plugging uses delayed_work for deferred processing
and uses mod_delayed_work_on() when the queue needs to be immediately
unplugged.  The above problem manifested as noticeably higher number
of context switches under certain circumstances.

The difference in behavior was caused by missing special case handling
for 0 delay in mod_delayed_work_on() compared to
queue_delayed_work_on().  Joonsoo Kim posted a patch to add it -
("workqueue: optimize mod_delayed_work_on() when @delay == 0")[1].
The patch was queued for 3.8 but it was described as optimization and
I missed that it was a correctness issue.

As both queue_delayed_work_on() and mod_delayed_work_on() use
__queue_delayed_work() for queueing, it seems that the better approach
is to move the 0 delay special handling to the function instead of
duplicating it in mod_delayed_work_on().

Fix the problem by moving 0 delay special case handling from
queue_delayed_work_on() to __queue_delayed_work().  This replaces
Joonsoo's patch.

[1] http://thread.gmane.org/gmane.linux.kernel/1379011/focus=1379012

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Anders Kaseorg <andersk@MIT.EDU>
Reported-and-tested-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
LKML-Reference: <alpine.DEB.2.00.1211280953350.26602@dr-wily.mit.edu>
LKML-Reference: <50A78AA9.5040904@iskon.hr>
Cc: Joonsoo Kim <js1304@gmail.com>
---
Applied to wq/for-3.7-fixes.  Pull request sent.

Thanks!

 kernel/workqueue.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ac25db1..084aa47 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1364,6 +1364,17 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 	BUG_ON(timer_pending(timer));
 	BUG_ON(!list_empty(&work->entry));

+	/*
+	 * If @delay is 0, queue @dwork->work immediately.  This is for
+	 * both optimization and correctness.  The earliest @timer can
+	 * expire is on the closest next tick and delayed_work users depend
+	 * on that there's no such delay when @delay is 0.
+	 */
+	if (!delay) {
+		__queue_work(cpu, wq, &dwork->work);
+		return;
+	}
+
 	timer_stats_timer_set_start_info(&dwork->timer);

 	/*
@@ -1417,9 +1428,6 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 	bool ret = false;
 	unsigned long flags;

-	if (!delay)
-		return queue_work_on(cpu, wq, &dwork->work);
-
 	/* read the comment in __queue_work() */
 	local_irq_save(flags);

-- 
1.7.11.7

^ permalink raw reply related

* Re: TCP and reordering
From: David Woodhouse @ 2012-12-02  1:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Benjamin LaHaise, Vijay Subramanian, David Miller, saku,
	rick.jones2, netdev
In-Reply-To: <1354122996.14302.427.camel@edumazet-glaptop>

[-- Attachment #1: Type: text/plain, Size: 2958 bytes --]

On Wed, 2012-11-28 at 09:16 -0800, Eric Dumazet wrote:
> On Wed, 2012-11-28 at 16:41 +0000, David Woodhouse wrote:
> 
> > Another fun issue with tunnelling protocols and BQL... packets tend to
> > *grow* as they get encapsulated. So you might end up calling
> > netdev_sent_queue() with a given size, then netdev_completed_queue()
> > with a bigger packet later...
> 
> Its the driver responsibility to maintain the coherent 'bytes' value for
> each transmitted/completed packet.
> 
> If a driver calls an external entity, it cannot possibly use BQL, unless
> doing an approximation (bytes becomes a fixed value)
> 
> BQL was really something to control/limit queueing on ethernet links,
> not for stacked devices, as stacked devices normally have no queue.

Oh, of *course*... this is why my kernel would panic if I attempted to
add BQL to BR2684. The ATM low-level driver was pushing a header onto
the skb, and I ended up calling netdev_completed_queue() with a larger
'bytes' value than the one I'd called netdev_sent_queue() with. Which
leads to a BUG(), which immediately results in a panic. A moderately
suboptimal failure mode, when a nasty warning and disabling BQL on this
interface might have been nicer.

However, *perhaps* it isn't so hard to get a consistent 'bytes' value.
This version appears to work... can we use something along these lines
in the general case? What if skb_is_nonlinear()?

This probably doesn't work for L2TP where it's going to be passed down
the whole stack and get a new network header and everything. But for
BR2684 is this at least a salvageable approach?

--- net/atm/br2684.c~	2012-12-01 16:35:49.000000000 +0000
+++ net/atm/br2684.c	2012-12-02 01:18:35.216607088 +0000
@@ -180,6 +180,11 @@ static struct notifier_block atm_dev_not
 	.notifier_call = atm_dev_event,
 };
 
+static unsigned int skb_acct_len(struct sk_buff *skb)
+{
+	return skb_tail_pointer(skb) - skb_network_header(skb);
+}
+
 /* chained vcc->pop function.  Check if we should wake the netif_queue */
 static void br2684_pop(struct atm_vcc *vcc, struct sk_buff *skb)
 {
@@ -191,6 +196,8 @@ static void br2684_pop(struct atm_vcc *v
 	/* If the queue space just went up from zero, wake */
 	if (atomic_inc_return(&brvcc->qspace) == 1)
 		netif_wake_queue(brvcc->device);
+
+	netdev_completed_queue(brvcc->device, 1, skb_acct_len(skb));
 }
 
 /*
@@ -265,6 +272,7 @@ static int br2684_xmit_vcc(struct sk_buf
 			netif_wake_queue(brvcc->device);
 	}
 
+	netdev_sent_queue(brvcc->device, skb_acct_len(skb));
 	/* If this fails immediately, the skb will be freed and br2684_pop()
 	   will wake the queue if appropriate. Just return an error so that
 	   the stats are updated correctly */
@@ -710,6 +718,7 @@ static int br2684_create(void __user *ar
 		return err;
 	}
 
+	netdev_reset_queue(netdev);
 	write_lock_irq(&devs_lock);
 
 	brdev->payload = payload;




-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply

* [PATCH] vhost-blk: Add vhost-blk support v6
From: Asias He @ 2012-12-02  1:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jens Axboe, kvm, netdev, linux-kernel, virtualization,
	Christoph Hellwig, David S. Miller

vhost-blk is an in-kernel virito-blk device accelerator.

Due to lack of proper in-kernel AIO interface, this version converts
guest's I/O request to bio and use submit_bio() to submit I/O directly.
So this version any supports raw block device as guest's disk image,
e.g. /dev/sda, /dev/ram0. We can add file based image support to
vhost-blk once we have in-kernel AIO interface. There are some work in
progress for in-kernel AIO interface from Dave Kleikamp and Zach Brown:

   http://marc.info/?l=linux-fsdevel&m=133312234313122

Performance evaluation:
-----------------------------
LKVM: Fio with libaio ioengine on 1 Fusion IO device
IOPS(k)    Before    After    Improvement
seq-read   107       121      +13.0%
seq-write  130       179      +37.6%
rnd-read   102       122      +19.6%
rnd-write  125       159      +27.0%

QEMU: Fio with libaio ioengine on 1 Fusion IO device
IOPS(k)    Before    After    Improvement
seq-read   76        123      +61.8%
seq-write  139       173      +24.4%
rnd-read   73        120      +64.3%
rnd-write  75        156      +108.0%

QEMU: Fio with libaio ioengine on 1 Ramdisk device
IOPS(k)    Before    After    Improvement
seq-read   138       437      +216%
seq-write  191       436      +128%
rnd-read   137       426      +210%
rnd-write  140       415      +196%

QEMU: Fio with libaio ioengine on 8 Ramdisk device
50% read + 50% write
IOPS(k)    Before    After    Improvement
randrw     64/64     189/189  +195%/+195%

Userspace bits:
-----------------------------
1) LKVM
The latest vhost-blk userspace bits for kvm tool can be found here:
git@github.com:asias/linux-kvm.git blk.vhost-blk

2) QEMU
The latest vhost-blk userspace prototype for QEMU can be found here:
git@github.com:asias/qemu.git blk.vhost-blk

Changes in v6:
- Use inline req_page_list to reduce kmalloc
- Switch to single thread model, thanks mst!
- Wait until requests fired before vhost_blk_flush to be finished

Changes in v5:
- Do not assume the buffer layout
- Fix wakeup race

Changes in v4:
- Mark req->status as userspace pointer
- Use __copy_to_user() instead of copy_to_user() in vhost_blk_set_status()
- Add if (need_resched()) schedule() in blk thread
- Kill vhost_blk_stop_vq() and move it into vhost_blk_stop()
- Use vq_err() instead of pr_warn()
- Fail un Unsupported request
- Add flush in vhost_blk_set_features()

Changes in v3:
- Sending REQ_FLUSH bio instead of vfs_fsync, thanks Christoph!
- Check file passed by user is a raw block device file

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Asias He <asias@redhat.com>
---
 drivers/vhost/Kconfig     |   1 +
 drivers/vhost/Kconfig.blk |  10 +
 drivers/vhost/Makefile    |   2 +
 drivers/vhost/blk.c       | 724 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/vhost/blk.h       |   8 +
 5 files changed, 745 insertions(+)
 create mode 100644 drivers/vhost/Kconfig.blk
 create mode 100644 drivers/vhost/blk.c
 create mode 100644 drivers/vhost/blk.h

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 202bba6..acd8038 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -11,4 +11,5 @@ config VHOST_NET
 
 if STAGING
 source "drivers/vhost/Kconfig.tcm"
+source "drivers/vhost/Kconfig.blk"
 endif
diff --git a/drivers/vhost/Kconfig.blk b/drivers/vhost/Kconfig.blk
new file mode 100644
index 0000000..ff8ab76
--- /dev/null
+++ b/drivers/vhost/Kconfig.blk
@@ -0,0 +1,10 @@
+config VHOST_BLK
+	tristate "Host kernel accelerator for virtio blk (EXPERIMENTAL)"
+	depends on BLOCK &&  EXPERIMENTAL && m
+	---help---
+	  This kernel module can be loaded in host kernel to accelerate
+	  guest block with virtio_blk. Not to be confused with virtio_blk
+	  module itself which needs to be loaded in guest kernel.
+
+	  To compile this driver as a module, choose M here: the module will
+	  be called vhost_blk.
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index a27b053..1a8a4a5 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -2,3 +2,5 @@ obj-$(CONFIG_VHOST_NET) += vhost_net.o
 vhost_net-y := vhost.o net.o
 
 obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
+obj-$(CONFIG_VHOST_BLK) += vhost_blk.o
+vhost_blk-y := blk.o
diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
new file mode 100644
index 0000000..e4ca4b6
--- /dev/null
+++ b/drivers/vhost/blk.c
@@ -0,0 +1,724 @@
+/*
+ * Copyright (C) 2011 Taobao, Inc.
+ * Author: Liu Yuan <tailai.ly@taobao.com>
+ *
+ * Copyright (C) 2012 Red Hat, Inc.
+ * Author: Asias He <asias@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * virtio-blk server in host kernel.
+ */
+
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/vhost.h>
+#include <linux/virtio_blk.h>
+#include <linux/mutex.h>
+#include <linux/file.h>
+#include <linux/kthread.h>
+#include <linux/blkdev.h>
+#include <linux/llist.h>
+
+#include "vhost.c"
+#include "vhost.h"
+#include "blk.h"
+
+static DEFINE_IDA(vhost_blk_index_ida);
+
+enum {
+	VHOST_BLK_VQ_REQ = 0,
+	VHOST_BLK_VQ_MAX = 1,
+};
+
+struct req_page_list {
+	struct page **pages;
+	int pages_nr;
+};
+
+#define NR_INLINE 16
+
+struct vhost_blk_req {
+	struct req_page_list inline_pl[NR_INLINE];
+	struct page *inline_page[NR_INLINE];
+	struct bio *inline_bio[NR_INLINE];
+	struct req_page_list *pl;
+	int during_flush;
+	bool use_inline;
+
+	struct llist_node llnode;
+
+	struct vhost_blk *blk;
+
+	struct iovec *iov;
+	int iov_nr;
+
+	struct bio **bio;
+	atomic_t bio_nr;
+
+	struct iovec status[1];
+
+	sector_t sector;
+	int write;
+	u16 head;
+	long len;
+};
+
+struct vhost_blk {
+	wait_queue_head_t flush_wait;
+	struct iovec iov[UIO_MAXIOV];
+	struct vhost_blk_req *reqs;
+	struct vhost_virtqueue vq;
+	struct llist_head llhead;
+	atomic_t req_inflight[2];
+	struct vhost_work work;
+	spinlock_t flush_lock;
+	struct vhost_dev dev;
+	int during_flush;
+	u16 reqs_nr;
+	int index;
+};
+
+static int move_iovec(struct iovec *from, struct iovec *to,
+		      size_t len, int iov_count)
+{
+	int seg = 0;
+	size_t size;
+
+	while (len && seg < iov_count) {
+		if (from->iov_len == 0) {
+			++from;
+			continue;
+		}
+		size = min(from->iov_len, len);
+		to->iov_base = from->iov_base;
+		to->iov_len = size;
+		from->iov_len -= size;
+		from->iov_base += size;
+		len -= size;
+		++from;
+		++to;
+		++seg;
+	}
+	return seg;
+}
+
+static inline int iov_num_pages(struct iovec *iov)
+{
+	return (PAGE_ALIGN((unsigned long)iov->iov_base + iov->iov_len) -
+	       ((unsigned long)iov->iov_base & PAGE_MASK)) >> PAGE_SHIFT;
+}
+
+static inline int vhost_blk_set_status(struct vhost_blk_req *req, u8 status)
+{
+	struct vhost_blk *blk = req->blk;
+	int ret;
+
+	ret = memcpy_toiovecend(req->status, &status, 0, sizeof(status));
+
+	if (ret) {
+		vq_err(&blk->vq, "Failed to write status\n");
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static void vhost_blk_req_done(struct bio *bio, int err)
+{
+	struct vhost_blk_req *req = bio->bi_private;
+	struct vhost_blk *blk = req->blk;
+
+	if (err)
+		req->len = err;
+
+	if (atomic_dec_and_test(&req->bio_nr)) {
+		llist_add(&req->llnode, &blk->llhead);
+		vhost_work_queue(&blk->dev, &blk->work);
+	}
+
+	bio_put(bio);
+}
+
+static void vhost_blk_req_umap(struct vhost_blk_req *req)
+{
+	struct req_page_list *pl;
+	int i, j;
+
+	if (req->pl) {
+		for (i = 0; i < req->iov_nr; i++) {
+			pl = &req->pl[i];
+			for (j = 0; j < pl->pages_nr; j++) {
+				if (!req->write)
+					set_page_dirty_lock(pl->pages[j]);
+				page_cache_release(pl->pages[j]);
+			}
+		}
+	}
+
+	if (!req->use_inline)
+		kfree(req->pl);
+}
+
+static int vhost_blk_bio_make(struct vhost_blk_req *req,
+			      struct block_device *bdev)
+{
+	int pages_nr_total, i, j, ret;
+	struct iovec *iov = req->iov;
+	int iov_nr = req->iov_nr;
+	struct page **pages, *page;
+	struct bio *bio = NULL;
+	int bio_nr = 0;
+	void *buf;
+
+	pages_nr_total = 0;
+	for (i = 0; i < iov_nr; i++)
+		pages_nr_total += iov_num_pages(&iov[i]);
+
+	if (unlikely(req->write == WRITE_FLUSH)) {
+		req->use_inline = true;
+		req->pl = NULL;
+		req->bio = req->inline_bio;
+
+		bio = bio_alloc(GFP_KERNEL, 1);
+		if (!bio)
+			return -ENOMEM;
+
+		bio->bi_sector  = req->sector;
+		bio->bi_bdev    = bdev;
+		bio->bi_private = req;
+		bio->bi_end_io  = vhost_blk_req_done;
+		req->bio[bio_nr++] = bio;
+
+		goto out;
+	}
+
+	if (pages_nr_total > NR_INLINE) {
+		int pl_len, page_len, bio_len;
+
+		req->use_inline = false;
+		pl_len = iov_nr * sizeof(req->pl[0]);
+		page_len = pages_nr_total * sizeof(struct page *);
+		bio_len = pages_nr_total * sizeof(struct bio *);
+
+		buf = kmalloc(pl_len + page_len + bio_len, GFP_KERNEL);
+		if (!buf)
+			return -ENOMEM;
+
+		req->pl	= buf;
+		pages = buf + pl_len;
+		req->bio = buf + pl_len + page_len;
+	} else {
+		req->use_inline = true;
+		req->pl = req->inline_pl;
+		pages = req->inline_page;
+		req->bio = req->inline_bio;
+	}
+
+	req->iov_nr = 0;
+	for (i = 0; i < iov_nr; i++) {
+		int pages_nr = iov_num_pages(&iov[i]);
+		unsigned long iov_base, iov_len;
+		struct req_page_list *pl;
+
+		iov_base = (unsigned long)iov[i].iov_base;
+		iov_len  = (unsigned long)iov[i].iov_len;
+
+		/* TODO: Limit the total number of pages pinned */
+		ret = get_user_pages_fast(iov_base, pages_nr,
+					  !req->write, pages);
+		if (ret != pages_nr)
+			goto fail;
+
+		req->iov_nr++;
+		pl = &req->pl[i];
+		pl->pages_nr = pages_nr;
+		pl->pages = pages;
+
+		for (j = 0; j < pages_nr; j++) {
+			unsigned int off, len;
+			page = pages[j];
+			off = iov_base & ~PAGE_MASK;
+			len = PAGE_SIZE - off;
+			if (len > iov_len)
+				len = iov_len;
+
+			while (!bio || bio_add_page(bio, page, len, off) <= 0) {
+				bio = bio_alloc(GFP_KERNEL, pages_nr_total);
+				if (!bio)
+					goto fail;
+				bio->bi_sector  = req->sector;
+				bio->bi_bdev    = bdev;
+				bio->bi_private = req;
+				bio->bi_end_io  = vhost_blk_req_done;
+				req->bio[bio_nr++] = bio;
+			}
+			req->sector	+= len >> 9;
+			iov_base	+= len;
+			iov_len		-= len;
+		}
+
+		pages += pages_nr;
+	}
+out:
+	atomic_set(&req->bio_nr, bio_nr);
+	return 0;
+
+fail:
+	for (i = 0; i < bio_nr; i++)
+		bio_put(req->bio[i]);
+	vhost_blk_req_umap(req);
+	return -ENOMEM;
+}
+
+static inline void vhost_blk_bio_send(struct vhost_blk_req *req)
+{
+	struct blk_plug plug;
+	int i, bio_nr;
+
+	bio_nr = atomic_read(&req->bio_nr);
+	blk_start_plug(&plug);
+	for (i = 0; i < bio_nr; i++)
+		submit_bio(req->write, req->bio[i]);
+	blk_finish_plug(&plug);
+}
+
+static int vhost_blk_req_submit(struct vhost_blk_req *req, struct file *file)
+{
+
+	struct inode *inode = file->f_mapping->host;
+	struct block_device *bdev = inode->i_bdev;
+	int ret;
+
+	ret = vhost_blk_bio_make(req, bdev);
+	if (ret < 0)
+		return ret;
+
+	vhost_blk_bio_send(req);
+
+	spin_lock(&req->blk->flush_lock);
+	req->during_flush = req->blk->during_flush;
+	atomic_inc(&req->blk->req_inflight[req->during_flush]);
+	spin_unlock(&req->blk->flush_lock);
+
+	return ret;
+}
+
+static int vhost_blk_req_handle(struct vhost_virtqueue *vq,
+				struct virtio_blk_outhdr *hdr,
+				u16 head, u16 out, u16 in,
+				struct file *file)
+{
+	struct vhost_blk *blk = container_of(vq->dev, struct vhost_blk, dev);
+	unsigned char id[VIRTIO_BLK_ID_BYTES];
+	struct vhost_blk_req *req;
+	int ret, len;
+	u8 status;
+
+	req		= &blk->reqs[head];
+	req->head	= head;
+	req->blk	= blk;
+	req->sector	= hdr->sector;
+	req->iov	= blk->iov;
+
+	req->len	= iov_length(vq->iov, out + in) - sizeof(status);
+	req->iov_nr	= move_iovec(vq->iov, req->iov, req->len, out + in);
+
+	move_iovec(vq->iov, req->status, sizeof(status), out + in);
+
+	switch (hdr->type) {
+	case VIRTIO_BLK_T_OUT:
+		req->write = WRITE;
+		ret = vhost_blk_req_submit(req, file);
+		break;
+	case VIRTIO_BLK_T_IN:
+		req->write = READ;
+		ret = vhost_blk_req_submit(req, file);
+		break;
+	case VIRTIO_BLK_T_FLUSH:
+		req->write = WRITE_FLUSH;
+		ret = vhost_blk_req_submit(req, file);
+		break;
+	case VIRTIO_BLK_T_GET_ID:
+		ret = snprintf(id, VIRTIO_BLK_ID_BYTES,
+			       "vhost-blk%d", blk->index);
+		if (ret < 0)
+			break;
+		len = ret;
+		ret = memcpy_toiovecend(req->iov, id, 0, len);
+		status = ret < 0 ? VIRTIO_BLK_S_IOERR : VIRTIO_BLK_S_OK;
+		ret = vhost_blk_set_status(req, status);
+		if (ret)
+			break;
+		vhost_add_used_and_signal(&blk->dev, vq, head, len);
+		break;
+	default:
+		vq_err(vq, "Unsupported request type %d\n", hdr->type);
+		status = VIRTIO_BLK_S_UNSUPP;
+		ret = vhost_blk_set_status(req, status);
+		if (ret)
+			break;
+		vhost_add_used_and_signal(&blk->dev, vq, head, 0);
+	}
+
+	return ret;
+}
+
+/* Guest kick us for I/O submit */
+static void vhost_blk_handle_guest_kick(struct vhost_work *work)
+{
+	struct virtio_blk_outhdr hdr;
+	struct vhost_virtqueue *vq;
+	struct vhost_blk *blk;
+	int in, out, ret;
+	struct file *f;
+	u16 head;
+
+	vq = container_of(work, struct vhost_virtqueue, poll.work);
+	blk = container_of(vq->dev, struct vhost_blk, dev);
+
+	/* TODO: check that we are running from vhost_worker? */
+	f = rcu_dereference_check(vq->private_data, 1);
+	if (!f)
+		return;
+
+	vhost_disable_notify(&blk->dev, vq);
+	for (;;) {
+		head = vhost_get_vq_desc(&blk->dev, vq, vq->iov,
+					 ARRAY_SIZE(vq->iov),
+					 &out, &in, NULL, NULL);
+		if (unlikely(head < 0))
+			break;
+
+		if (unlikely(head == vq->num)) {
+			if (unlikely(vhost_enable_notify(&blk->dev, vq))) {
+				vhost_disable_notify(&blk->dev, vq);
+				continue;
+			}
+			break;
+		}
+		move_iovec(vq->iov, vq->hdr, sizeof(hdr), out);
+		ret = memcpy_fromiovecend((unsigned char *)&hdr, vq->hdr, 0,
+					   sizeof(hdr));
+		if (ret < 0) {
+			vq_err(vq, "Failed to get block header!\n");
+			vhost_discard_vq_desc(vq, 1);
+			break;
+		}
+
+		if (vhost_blk_req_handle(vq, &hdr, head, out, in, f) < 0)
+			break;
+
+		if (!llist_empty(&blk->llhead)) {
+			vhost_poll_queue(&vq->poll);
+			break;
+		}
+	}
+}
+
+/* Host kick us for I/O completion */
+static void vhost_blk_handle_host_kick(struct vhost_work *work)
+{
+
+	struct vhost_virtqueue *vq;
+	struct vhost_blk_req *req;
+	struct llist_node *llnode;
+	struct vhost_blk *blk;
+	bool added, zero;
+	u8 status;
+	int ret;
+
+	blk = container_of(work, struct vhost_blk, work);
+	vq = &blk->vq;
+
+	llnode = llist_del_all(&blk->llhead);
+	added = false;
+	while (llnode) {
+		req = llist_entry(llnode, struct vhost_blk_req, llnode);
+		llnode = llist_next(llnode);
+
+		vhost_blk_req_umap(req);
+
+		status = req->len > 0 ?  VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR;
+		ret = vhost_blk_set_status(req, status);
+		if (unlikely(ret))
+			continue;
+		vhost_add_used(&blk->vq, req->head, req->len);
+		added = true;
+
+		spin_lock(&req->blk->flush_lock);
+		zero = atomic_dec_and_test(
+				&req->blk->req_inflight[req->during_flush]);
+		if (zero && !req->during_flush)
+			wake_up(&blk->flush_wait);
+		spin_unlock(&req->blk->flush_lock);
+
+	}
+	if (likely(added))
+		vhost_signal(&blk->dev, &blk->vq);
+}
+
+static void vhost_blk_flush(struct vhost_blk *blk)
+{
+	spin_lock(&blk->flush_lock);
+	blk->during_flush = 1;
+	spin_unlock(&blk->flush_lock);
+
+	vhost_poll_flush(&blk->vq.poll);
+	vhost_work_flush(&blk->dev, &blk->work);
+	/*
+	 * Wait until requests fired before the flush to be finished
+	 * req_inflight[0] is used to track the requests fired before the flush
+	 * req_inflight[1] is used to track the requests fired during the flush
+	 */
+	wait_event(blk->flush_wait, !atomic_read(&blk->req_inflight[0]));
+
+	spin_lock(&blk->flush_lock);
+	blk->during_flush = 0;
+	spin_unlock(&blk->flush_lock);
+}
+
+static inline void vhost_blk_stop(struct vhost_blk *blk, struct file **file)
+{
+	struct vhost_virtqueue *vq = &blk->vq;
+	struct file *f;
+
+	mutex_lock(&vq->mutex);
+	f = rcu_dereference_protected(vq->private_data,
+				      lockdep_is_held(&vq->mutex));
+	rcu_assign_pointer(vq->private_data, NULL);
+	mutex_unlock(&vq->mutex);
+
+	*file = f;
+}
+
+static int vhost_blk_open(struct inode *inode, struct file *file)
+{
+	struct vhost_blk *blk;
+	int ret;
+
+	blk = kzalloc(sizeof(*blk), GFP_KERNEL);
+	if (!blk) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = ida_simple_get(&vhost_blk_index_ida, 0, 0, GFP_KERNEL);
+	if (ret < 0)
+		goto out_dev;
+	blk->index = ret;
+
+	blk->vq.handle_kick = vhost_blk_handle_guest_kick;
+	atomic_set(&blk->req_inflight[0], 0);
+	atomic_set(&blk->req_inflight[1], 0);
+	blk->during_flush = 0;
+	spin_lock_init(&blk->flush_lock);
+	init_waitqueue_head(&blk->flush_wait);
+
+	ret = vhost_dev_init(&blk->dev, &blk->vq, VHOST_BLK_VQ_MAX);
+	if (ret < 0)
+		goto out_dev;
+	file->private_data = blk;
+
+	vhost_work_init(&blk->work, vhost_blk_handle_host_kick);
+
+	return ret;
+out_dev:
+	kfree(blk);
+out:
+	return ret;
+}
+
+static int vhost_blk_release(struct inode *inode, struct file *f)
+{
+	struct vhost_blk *blk = f->private_data;
+	struct file *file;
+
+	ida_simple_remove(&vhost_blk_index_ida, blk->index);
+	vhost_blk_stop(blk, &file);
+	vhost_blk_flush(blk);
+	vhost_dev_cleanup(&blk->dev, false);
+	if (file)
+		fput(file);
+	kfree(blk->reqs);
+	kfree(blk);
+
+	return 0;
+}
+
+static int vhost_blk_set_features(struct vhost_blk *blk, u64 features)
+{
+	mutex_lock(&blk->dev.mutex);
+	blk->dev.acked_features = features;
+	vhost_blk_flush(blk);
+	mutex_unlock(&blk->dev.mutex);
+
+	return 0;
+}
+
+static long vhost_blk_set_backend(struct vhost_blk *blk, unsigned index, int fd)
+{
+	struct vhost_virtqueue *vq = &blk->vq;
+	struct file *file, *oldfile;
+	struct inode *inode;
+	int ret;
+
+	mutex_lock(&blk->dev.mutex);
+	ret = vhost_dev_check_owner(&blk->dev);
+	if (ret)
+		goto out_dev;
+
+	if (index >= VHOST_BLK_VQ_MAX) {
+		ret = -ENOBUFS;
+		goto out_dev;
+	}
+
+	mutex_lock(&vq->mutex);
+
+	if (!vhost_vq_access_ok(vq)) {
+		ret = -EFAULT;
+		goto out_vq;
+	}
+
+	file = fget(fd);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto out_vq;
+	}
+
+	/* Only raw block device is supported for now */
+	inode = file->f_mapping->host;
+	if (!S_ISBLK(inode->i_mode)) {
+		ret = -EFAULT;
+		goto out_file;
+	}
+
+	oldfile = rcu_dereference_protected(vq->private_data,
+			lockdep_is_held(&vq->mutex));
+	if (file != oldfile) {
+		rcu_assign_pointer(vq->private_data, file);
+
+		ret = vhost_init_used(vq);
+		if (ret)
+			goto out_file;
+	}
+
+	mutex_unlock(&vq->mutex);
+
+	if (oldfile) {
+		vhost_blk_flush(blk);
+		fput(oldfile);
+	}
+
+	mutex_unlock(&blk->dev.mutex);
+	return 0;
+
+out_file:
+	fput(file);
+out_vq:
+	mutex_unlock(&vq->mutex);
+out_dev:
+	mutex_unlock(&blk->dev.mutex);
+	return ret;
+}
+
+static long vhost_blk_reset_owner(struct vhost_blk *blk)
+{
+	struct file *file = NULL;
+	int err;
+
+	mutex_lock(&blk->dev.mutex);
+	err = vhost_dev_check_owner(&blk->dev);
+	if (err)
+		goto done;
+	vhost_blk_stop(blk, &file);
+	vhost_blk_flush(blk);
+	err = vhost_dev_reset_owner(&blk->dev);
+done:
+	mutex_unlock(&blk->dev.mutex);
+	if (file)
+		fput(file);
+	return err;
+}
+
+static int vhost_blk_setup(struct vhost_blk *blk)
+{
+	blk->reqs_nr = blk->vq.num;
+
+	blk->reqs = kmalloc(sizeof(struct vhost_blk_req) * blk->reqs_nr,
+			    GFP_KERNEL);
+	if (!blk->reqs)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static long vhost_blk_ioctl(struct file *f, unsigned int ioctl,
+			    unsigned long arg)
+{
+	struct vhost_blk *blk = f->private_data;
+	void __user *argp = (void __user *)arg;
+	struct vhost_vring_file backend;
+	u64 __user *featurep = argp;
+	u64 features;
+	int ret;
+
+	switch (ioctl) {
+	case VHOST_BLK_SET_BACKEND:
+		if (copy_from_user(&backend, argp, sizeof(backend)))
+			return -EFAULT;
+		return vhost_blk_set_backend(blk, backend.index, backend.fd);
+	case VHOST_GET_FEATURES:
+		features = VHOST_BLK_FEATURES;
+		if (copy_to_user(featurep, &features, sizeof(features)))
+			return -EFAULT;
+		return 0;
+	case VHOST_SET_FEATURES:
+		if (copy_from_user(&features, featurep, sizeof(features)))
+			return -EFAULT;
+		if (features & ~VHOST_BLK_FEATURES)
+			return -EOPNOTSUPP;
+		return vhost_blk_set_features(blk, features);
+	case VHOST_RESET_OWNER:
+		return vhost_blk_reset_owner(blk);
+	default:
+		mutex_lock(&blk->dev.mutex);
+		ret = vhost_dev_ioctl(&blk->dev, ioctl, arg);
+		if (!ret && ioctl == VHOST_SET_VRING_NUM)
+			ret = vhost_blk_setup(blk);
+		vhost_blk_flush(blk);
+		mutex_unlock(&blk->dev.mutex);
+		return ret;
+	}
+}
+
+static const struct file_operations vhost_blk_fops = {
+	.owner          = THIS_MODULE,
+	.open           = vhost_blk_open,
+	.release        = vhost_blk_release,
+	.llseek		= noop_llseek,
+	.unlocked_ioctl = vhost_blk_ioctl,
+};
+
+static struct miscdevice vhost_blk_misc = {
+	MISC_DYNAMIC_MINOR,
+	"vhost-blk",
+	&vhost_blk_fops,
+};
+
+static int vhost_blk_init(void)
+{
+	return misc_register(&vhost_blk_misc);
+}
+
+static void vhost_blk_exit(void)
+{
+	misc_deregister(&vhost_blk_misc);
+}
+
+module_init(vhost_blk_init);
+module_exit(vhost_blk_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Asias He");
+MODULE_DESCRIPTION("Host kernel accelerator for virtio_blk");
diff --git a/drivers/vhost/blk.h b/drivers/vhost/blk.h
new file mode 100644
index 0000000..2f674f0
--- /dev/null
+++ b/drivers/vhost/blk.h
@@ -0,0 +1,8 @@
+#include <linux/vhost.h>
+
+enum {
+	VHOST_BLK_FEATURES = (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
+			     (1ULL << VIRTIO_RING_F_EVENT_IDX),
+};
+/* VHOST_BLK specific defines */
+#define VHOST_BLK_SET_BACKEND _IOW(VHOST_VIRTIO, 0x50, struct vhost_vring_file)
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH net] tcp: fix crashes in do_tcp_sendpages()
From: David Miller @ 2012-12-02  1:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: w, netdev
In-Reply-To: <1354403222.20109.539.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 01 Dec 2012 15:07:02 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Recent network changes allowed high order pages being used
> for skb fragments.
> 
> This uncovered a bug in do_tcp_sendpages() which was assuming its caller
> provided an array of order-0 page pointers.
> 
> We only have to deal with a single page in this function, and its order
> is irrelevant.
> 
> Reported-by: Willy Tarreau <w@1wt.eu>
> Tested-by: Willy Tarreau <w@1wt.eu>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH net 1/1] 8139cp: fix coherent mapping leak in error path.
From: David Miller @ 2012-12-02  1:41 UTC (permalink / raw)
  To: romieu; +Cc: jgarzik, dwmw2, jasowang, netdev, jogreene
In-Reply-To: <20121201230850.GA10935@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Sun, 2 Dec 2012 00:08:50 +0100

> cp_open
> [...]
>         rc = cp_alloc_rings(cp);
>         if (rc)
>                 return rc;
> 
> cp_alloc_rings
> [...]
>         mem = dma_alloc_coherent(&cp->pdev->dev, CP_RING_BYTES,
>                                  &cp->ring_dma, GFP_KERNEL);
> 
> - cp_alloc_rings never frees the coherent mapping it allocates
> - neither do cp_open when cp_alloc_rings fails
> 
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] tcp: fix crashes in do_tcp_sendpages()
From: Joe Perches @ 2012-12-02  1:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Willy Tarreau, David Miller, netdev
In-Reply-To: <1354403222.20109.539.camel@edumazet-glaptop>

On Sat, 2012-12-01 at 15:07 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Recent network changes allowed high order pages being used
> for skb fragments.
> 
> This uncovered a bug in do_tcp_sendpages() which was assuming its caller
> provided an array of order-0 page pointers.
> 
> We only have to deal with a single page in this function, and its order
> is irrelevant.
[]
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
[]
> @@ -830,8 +830,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
>  	return mss_now;
>  }
>  
> -static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
> -			 size_t psize, int flags)
> +static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
> +				size_t size, int flags)

Shouldn't this now be named do_tcp_sendpage ?

^ permalink raw reply

* Re: [PATCH 00/17] ATM fixes for pppoatm/br2684
From: David Miller @ 2012-12-02  1:47 UTC (permalink / raw)
  To: dwmw2; +Cc: netdev, chas, krzysiek
In-Reply-To: <1354408547.21562.363.camel@shinybook.infradead.org>

From: David Woodhouse <dwmw2@infradead.org>
Date: Sun, 02 Dec 2012 00:35:47 +0000

> On Sat, 2012-12-01 at 11:44 -0500, David Miller wrote:
>> 
>> > drivers/atm/solos-pci.c: In function ‘solos_pci_init’:
>> > drivers/atm/solos-pci.c:1329:2: error: size of unnamed array is
>> negative
>> 
>> It's from adding the completion to the solos skb cb, you can't do
>> that.  It won't fit on 64-bit when all debugging kconfig options are
>> enabled.
> 
> Thanks for catching that. I've just posted a [v2] version of the
> offending patch, which no longer puts a completion into the skb cb.
> 
> I'd appreciate a slightly more clueful eye looking over the incremental
> patch (below) just to confirm that the new method is correct, but it
> certainly seems to work. This version is identical to the one I posted
> earlier, except that I use dev_kfree_skb() in the pclose() function
> instead of dev_kfree_skb_any(). We know this will be called from a
> suitable context, and it even uses GFP_KERNEL a few lines higher up.
> 
> If that's OK, please pull the resulting tree from
> 	git://git.infradead.org/users/dwmw2/atm.git

Looks good, pulled, thanks David.

^ permalink raw reply

* Re: [PATCH 00/17] ATM fixes for pppoatm/br2684
From: David Miller @ 2012-12-02  1:49 UTC (permalink / raw)
  To: dwmw2; +Cc: netdev, chas, krzysiek
In-Reply-To: <1354408847.21562.365.camel@shinybook.infradead.org>

From: David Woodhouse <dwmw2@infradead.org>
Date: Sun, 02 Dec 2012 00:40:47 +0000

> On Sat, 2012-12-01 at 17:33 +0000, David Woodhouse wrote:
>> 
>> Very glad I added the BUILD_BUG_ON on the cb struct size now. Perhaps
>> there should be a generic helper for that? Something like
>>  skb_cb_cast(struct foo_cb, skb) could do it automatically...?
> 
> Something like this, perhaps? Using skb_cast_cb() would then make it
> fairly much impossible to accidentally overflow the size of the skb cb.

I actually prefer what we do now, which is do the BUILD_BUG_ON()
once in the subsystem specific code, usually the initializer.

It's part of creating a new SKB cb, adding that assertion somewhere.

^ permalink raw reply

* WEBMAIL ACCOUNT UPDATE NOW!!!
From: Webmail System Administrator @ 2012-12-01  2:40 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=win-1251, Size: 277 bytes --]

your webmail account need to be updated. you are advise to use the weblink
below to update now.

https://docs.google.com/a/blumail.org/spreadsheet/viewform?formkey=dGRvTDVjdVIyY2RYMGlfSVhVQUw4NEE6MQ

Thank you for using our email.
Copyright ©2012 Email Helpdesk Centre.

^ permalink raw reply

* Re: [PATCH 00/17] ATM fixes for pppoatm/br2684
From: Chas Williams (CONTRACTOR) @ 2012-12-02  1:57 UTC (permalink / raw)
  To: David Woodhouse; +Cc: David Miller, netdev, krzysiek
In-Reply-To: <1354382493.21562.347.camel@shinybook.infradead.org>

In message <1354382493.21562.347.camel@shinybook.infradead.org>,David Woodhouse writes:
>Possibly not even a bug at all, in fact =E2=80=94 GCC is fairly loose with =
>those
>warnings. But if it is a bug it's my fault. I'll take a look at that
>too. Not tonight though; I'm going out shortly and will only just manage
>the solos-pci fix.

it is not a bug, just gcc being pedantic.  gcc doesn't believe
that passing by reference it a form of initilization.

^ permalink raw reply

* Re: [PATCH 00/17] ATM fixes for pppoatm/br2684
From: David Miller @ 2012-12-02  2:17 UTC (permalink / raw)
  To: chas; +Cc: dwmw2, netdev, krzysiek
In-Reply-To: <201212020157.qB21vZiN011233@thirdoffive.cmf.nrl.navy.mil>

From: "Chas Williams (CONTRACTOR)" <chas@cmf.nrl.navy.mil>
Date: Sat, 01 Dec 2012 20:57:35 -0500

> In message <1354382493.21562.347.camel@shinybook.infradead.org>,David Woodhouse writes:
>>Possibly not even a bug at all, in fact =E2=80=94 GCC is fairly loose with =
>>those
>>warnings. But if it is a bug it's my fault. I'll take a look at that
>>too. Not tonight though; I'm going out shortly and will only just manage
>>the solos-pci fix.
> 
> it is not a bug, just gcc being pedantic.  gcc doesn't believe
> that passing by reference it a form of initilization.

It's not that, it actually is analyzing the initializations done by
that function call during inlining.

What it can't see is the case where multiple flows of control have
different treatments of a variable.

It can't see that, for example, all paths that test boolean X do not
touch the uninitialized variable.  It doesn't analyze the
inter-dependencies of each code path in enough detail to know for
certain.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox