[PATCH v2] rds: rds-stress show all zeros after few minutes

* [PATCH v2] rds: rds-stress show all zeros after few minutes
@ 2016-03-31  6:29 shamir rabinovitch
  2016-03-31 20:02 ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: shamir rabinovitch @ 2016-03-31  6:29 UTC
  To: rds-devel, netdev; +Cc: davem, shamir.rabinovitch

The issue can be seen on platforms that use a page size of 8K or more
while the rds fragment size is 4K. On those platforms a single page is
shared between 2 or more rds fragments. Each fragment has its own
offset, and the rds congestion map code needs to take this offset into
account. Not taking this offset into account leads to reading a data
fragment as a congestion map fragment, and the rds transmit path hangs
due to far congestion map corruption.

Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>

Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Tested-by: Anand Bibhuti <anand.bibhuti@oracle.com>
---
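For illustration only (not part of the patch): a minimal userspace
sketch of the addressing problem described above. It assumes an 8K page
shared by two 4K fragments, where the second fragment carries the
congestion map. The names demo_frag, demo_src_old, demo_src_new and
demo_read_cong_byte are hypothetical and do not exist in net/rds; only
the pointer arithmetic mirrors the change to rds_ib_cong_recv().

```c
#include <assert.h>
#include <string.h>

#define DEMO_PAGE_SIZE 8192u /* assumed page size (8K) */
#define DEMO_FRAG_SIZE 4096u /* rds fragment size named in the description */

/* Hypothetical stand-in for a fragment's scatterlist entry. */
struct demo_frag {
	const unsigned char *page; /* base address of the mapped page */
	unsigned int offset;       /* where the fragment starts in the page */
};

/* Pre-patch arithmetic: ignores where the fragment starts in the page. */
static const unsigned char *demo_src_old(const struct demo_frag *f,
					 unsigned int frag_off)
{
	assert(frag_off < DEMO_FRAG_SIZE);
	return f->page + frag_off;
}

/* Patched arithmetic: page base + fragment offset + offset in fragment. */
static const unsigned char *demo_src_new(const struct demo_frag *f,
					 unsigned int frag_off)
{
	assert(frag_off < DEMO_FRAG_SIZE);
	return f->page + f->offset + frag_off;
}

/*
 * Fill one page with a data fragment (0xDD) followed by a congestion
 * map fragment (0xCC), then read the first byte of the congestion map
 * fragment with either the old or the patched arithmetic.
 */
static unsigned char demo_read_cong_byte(int patched)
{
	static unsigned char page[DEMO_PAGE_SIZE];
	struct demo_frag cong = { page, DEMO_FRAG_SIZE };

	memset(page, 0xDD, DEMO_FRAG_SIZE);
	memset(page + DEMO_FRAG_SIZE, 0xCC, DEMO_FRAG_SIZE);

	return patched ? *demo_src_new(&cong, 0) : *demo_src_old(&cong, 0);
}
```

With the old arithmetic the read lands in the data fragment at the
start of the page; with the fragment offset added it lands in the
congestion map fragment. On a 4K-page platform the offset is always 0,
which would explain why the problem only shows up with larger pages.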
 net/rds/ib_recv.c |    2 +-
 net/rds/iw_recv.c |    2 +-
 net/rds/page.c    |    5 +++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 977fb86..abc8cc8 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -796,7 +796,7 @@ static void rds_ib_cong_recv(struct rds_connection *conn,
 
 		addr = kmap_atomic(sg_page(&frag->f_sg));
 
-		src = addr + frag_off;
+		src = addr + frag->f_sg.offset + frag_off;
 		dst = (void *)map->m_page_addrs[map_page] + map_off;
 		for (k = 0; k < to_copy; k += 8) {
 			/* Record ports that became uncongested, ie
diff --git a/net/rds/iw_recv.c b/net/rds/iw_recv.c
index a66d179..62a1738 100644
--- a/net/rds/iw_recv.c
+++ b/net/rds/iw_recv.c
@@ -585,7 +585,7 @@ static void rds_iw_cong_recv(struct rds_connection *conn,
 
 		addr = kmap_atomic(frag->f_page);
 
-		src = addr + frag_off;
+		src = addr + frag->f_offset + frag_off;
 		dst = (void *)map->m_page_addrs[map_page] + map_off;
 		for (k = 0; k < to_copy; k += 8) {
 			/* Record ports that became uncongested, ie
diff --git a/net/rds/page.c b/net/rds/page.c
index 5a14e6d..715cbaa 100644
--- a/net/rds/page.c
+++ b/net/rds/page.c
@@ -135,8 +135,9 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
 			if (rem->r_offset != 0)
 				rds_stats_inc(s_page_remainder_hit);
 
-			rem->r_offset += bytes;
-			if (rem->r_offset == PAGE_SIZE) {
+			/* some hw (e.g. sparc) require aligned memory */
+			rem->r_offset += ALIGN(bytes, 8);
+			if (rem->r_offset >= PAGE_SIZE) {
 				__free_page(rem->r_page);
 				rem->r_page = NULL;
 			}
-- 
1.7.1
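
For illustration only: a small userspace sketch of the offset
bookkeeping changed in the net/rds/page.c hunk. DEMO_ALIGN is an assumed
stand-in for the kernel's ALIGN() macro, and demo_remainder / demo_take
are hypothetical names; the real rds_page_remainder_alloc() also
allocates and frees pages, which is only modeled here by a flag. The
caller is assumed to have checked that the request fits in the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 8192u /* assumed page size (8K) */

/* Round x up to a multiple of a; a must be a power of two. */
#define DEMO_ALIGN(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical model of the per-cpu page remainder state. */
struct demo_remainder {
	size_t r_offset;    /* next free byte in the current page */
	bool page_released; /* set once the page has been used up */
};

/*
 * Hand out `bytes` from the current page and return the offset at
 * which the chunk starts. Advancing by the aligned size keeps every
 * later chunk 8-byte aligned, and ">=" releases the page even when the
 * padded offset does not land exactly on the page boundary.
 */
static size_t demo_take(struct demo_remainder *rem, size_t bytes)
{
	size_t start = rem->r_offset;

	assert(bytes <= DEMO_PAGE_SIZE - rem->r_offset);

	rem->r_offset += DEMO_ALIGN(bytes, 8);
	if (rem->r_offset >= DEMO_PAGE_SIZE)
		rem->page_released = true; /* kernel: __free_page() + NULL */

	return start;
}
```

Taking 13 bytes and then 3 bytes from a fresh page yields chunk offsets
0 and 16 rather than 0 and 13, so the second chunk starts on an 8-byte
boundary.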


* Re: [PATCH v2] rds: rds-stress show all zeros after few minutes
  2016-03-31  6:29 [PATCH v2] rds: rds-stress show all zeros after few minutes shamir rabinovitch
@ 2016-03-31 20:02 ` David Miller
  2016-04-03 12:29   ` Shamir Rabinovitch
  0 siblings, 1 reply; 4+ messages in thread
From: David Miller @ 2016-03-31 20:02 UTC
  To: shamir.rabinovitch; +Cc: rds-devel, netdev

From: shamir rabinovitch <shamir.rabinovitch@oracle.com>
Date: Thu, 31 Mar 2016 02:29:22 -0400

> The issue can be seen on platforms that use a page size of 8K or more
> while the rds fragment size is 4K. On those platforms a single page is
> shared between 2 or more rds fragments. Each fragment has its own
> offset, and the rds congestion map code needs to take this offset into
> account. Not taking this offset into account leads to reading a data
> fragment as a congestion map fragment, and the rds transmit path hangs
> due to far congestion map corruption.
> 
> Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
> 
> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
> Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Tested-by: Anand Bibhuti <anand.bibhuti@oracle.com>

This doesn't apply cleanly to my current tree, please respin.


* Re: [PATCH v2] rds: rds-stress show all zeros after few minutes
  2016-03-31 20:02 ` David Miller
@ 2016-04-03 12:29   ` Shamir Rabinovitch
  2016-04-03 17:11     ` santosh.shilimkar
  0 siblings, 1 reply; 4+ messages in thread
From: Shamir Rabinovitch @ 2016-04-03 12:29 UTC
  To: David Miller; +Cc: rds-devel, netdev

On Thu, Mar 31, 2016 at 04:02:46PM -0400, David Miller wrote:
> From: shamir rabinovitch <shamir.rabinovitch@oracle.com>
> Date: Thu, 31 Mar 2016 02:29:22 -0400
> 
> > The issue can be seen on platforms that use a page size of 8K or more
> > while the rds fragment size is 4K. On those platforms a single page is
> > shared between 2 or more rds fragments. Each fragment has its own
> > offset, and the rds congestion map code needs to take this offset into
> > account. Not taking this offset into account leads to reading a data
> > fragment as a congestion map fragment, and the rds transmit path hangs
> > due to far congestion map corruption.
> > 
> > Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
> > 
> > Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
> > Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
> > Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> > Tested-by: Anand Bibhuti <anand.bibhuti@oracle.com>
> 
> This doesn't apply cleanly to my current tree, please respin.

Sorry for the trouble.

I re-sent the patch, rebased on net-next master, and split it up
according to the comments from Santosh Shilimkar.

BR, Shamir


* Re: [PATCH v2] rds: rds-stress show all zeros after few minutes
  2016-04-03 12:29   ` Shamir Rabinovitch
@ 2016-04-03 17:11     ` santosh.shilimkar
  0 siblings, 0 replies; 4+ messages in thread
From: santosh.shilimkar @ 2016-04-03 17:11 UTC
  To: Shamir Rabinovitch, David Miller; +Cc: rds-devel, netdev

On 4/3/16 5:29 AM, Shamir Rabinovitch wrote:
> On Thu, Mar 31, 2016 at 04:02:46PM -0400, David Miller wrote:
>> From: shamir rabinovitch <shamir.rabinovitch@oracle.com>
>> Date: Thu, 31 Mar 2016 02:29:22 -0400
>>
>>> The issue can be seen on platforms that use a page size of 8K or more
>>> while the rds fragment size is 4K. On those platforms a single page is
>>> shared between 2 or more rds fragments. Each fragment has its own
>>> offset, and the rds congestion map code needs to take this offset into
>>> account. Not taking this offset into account leads to reading a data
>>> fragment as a congestion map fragment, and the rds transmit path hangs
>>> due to far congestion map corruption.
>>>
>>> Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
>>>
>>> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
>>> Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
>>> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>>> Tested-by: Anand Bibhuti <anand.bibhuti@oracle.com>
>>
>> This doesn't apply cleanly to my current tree, please respin.
>
> Sorry for the trouble.
>
> I re-sent the patch, rebased on net-next master, and split it up
> according to the comments from Santosh Shilimkar.
>
Thanks Shamir. The updated version looks fine. You already
have my ack included.


Regards,
Santosh
