* strange bottleneck with SMB 2.0
@ 2015-08-19 11:11 Yale Zhang
From: Yale Zhang @ 2015-08-19 11:11 UTC (permalink / raw)
To: linux-cifs-u79uwXL29TY76Z2rM5mHXA
SMB developers/users,
I'm experiencing a strange bottleneck when my files are mounted as SMB
2.0. When I launch multiple processes in parallel for benchmarking,
only the 1st one starts, and the rest won't start until the 1st one
finishes:
--------------------------------------- test programs --------------------------------
#!/bin/sh
./a.out&
./a.out&
./a.out&
wait
a.out is just a C program like this:
#include <stdio.h>

int main(void)
{
    printf("greetings\n");
    while (1);   /* spin forever so the process stays alive */
    return 0;
}
Apparently, this only affects SMB 2.0. I tried it with SMB 2.1, SMB
3.0, & SMB 3.02, and everything starts in parallel as expected.
I'm assuming SMB 3 and especially SMB 2.1 would share a common
implementation. How could 2.0 have the problem but not 3? It almost
seems the bottleneck is a feature instead of a bug? 8(
Can it still be fixed?
-Yale
* Re: strange bottleneck with SMB 2.0
From: Steve French @ 2015-08-19 12:40 UTC (permalink / raw)
To: Yale Zhang; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

What kernel version?

On Wed, Aug 19, 2015 at 6:11 AM, Yale Zhang
<yzhang1985-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> SMB developers/users,
>
> I'm experiencing a strange bottleneck when my files are mounted as SMB
> 2.0. When I launch multiple processes in parallel for benchmarking,
> only the 1st one starts, and the rest won't start until the 1st one
> finishes:
> [...]
> -Yale
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Thanks,

Steve
* Re: strange bottleneck with SMB 2.0
From: Jeff Layton @ 2015-08-20 12:57 UTC (permalink / raw)
To: Yale Zhang; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

On Wed, 19 Aug 2015 04:11:31 -0700
Yale Zhang <yzhang1985-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> I'm experiencing a strange bottleneck when my files are mounted as SMB
> 2.0. When I launch multiple processes in parallel for benchmarking,
> only the 1st one starts, and the rest won't start until the 1st one
> finishes:
> [...]

Probably. It'd be interesting to see what the other tasks are blocking
on. After firing up the second one, can you run:

# cat /proc/<pid of second a.out>/stack

...and paste the stack trace here? That should tell us what those other
processes are doing.

--
Jeff Layton <jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>
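Jeff's suggestion can be scripted so the stacks of every a.out instance are grabbed at once. A minimal sketch, not from the thread, assuming pgrep is available and that reading /proc/<pid>/stack is permitted (it needs CONFIG_STACKTRACE and usually root):

```shell
#!/bin/sh
# Dump the kernel stack of every process whose name matches $1.
dump_stacks() {
    for pid in $(pgrep -x "$1"); do
        echo "=== $1 pid $pid ==="
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack unavailable)"
    done
}

dump_stacks a.out
```

If no process matches, the function simply prints nothing.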
* Re: strange bottleneck with SMB 2.0
From: Yale Zhang @ 2015-09-29 0:09 UTC (permalink / raw)
To: Jeff Layton; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Sorry about the delay. I haven't been spending time on this issue
because I can just use SMB 3. But for anyone else who is stuck, here's
my diagnosis:

I also found that the 2nd & 3rd instances of a.out don't always need
to wait for the 1st one to finish before starting. They consistently
start 35.5s after the 1st one.

Here are my observations after the 1st program launches and the 2nd &
3rd are prevented from starting:

1. The 1st a.out is in the running state R+.
2. The 2nd a.out still hasn't started. Bash has forked itself to call
   exec("a.out"), but ps still shows the forked process as Bash, not
   a.out. The process is in the D+ state, meaning it's blocked inside
   the kernel. I tried getting the kernel stack trace as Jeff
   suggested, but "cat /proc/3718/stack" hangs!

Eventually, when a.out starts 35.5s after the 1st one, I see this:

[<ffffffff81113a35>] __alloc_pages_nodemask+0x1a5/0x990
[<ffffffff811b9aaa>] load_elf_binary+0xda/0xeb0
[<ffffffff811375a2>] __vma_link_rb+0x62/0xb0
[<ffffffff81150cb8>] alloc_pages_vma+0x158/0x210
[<ffffffff813623e5>] cpumask_any_but+0x25/0x40
[<ffffffff8104c8a2>] flush_tlb_page+0x32/0x90
[<ffffffff8113c932>] page_add_new_anon_rmap+0x72/0xe0
[<ffffffff8112ffbf>] wp_page_copy+0x31f/0x450
[<ffffffff81159885>] cache_alloc_refill+0x85/0x340
[<ffffffff8115a06f>] kmem_cache_alloc+0x14f/0x1b0
[<ffffffff81070dc0>] prepare_creds+0x20/0xd0
[<ffffffff81167f45>] SyS_faccessat+0x65/0x270
[<ffffffff81045f23>] __do_page_fault+0x253/0x4c0
[<ffffffff817558d7>] system_call_fastpath+0x12/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff

But that trace is probably irrelevant, because by then the process is
in the running state.

Absolutely bizarre. And appalling, since it's like you're hurt and
can't even yell for help (view the kernel stack).

On Thu, Aug 20, 2015 at 5:57 AM, Jeff Layton
<jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org> wrote:
> Probably. It'd be interesting to see what the other tasks are blocking
> on. After firing up the second one can you run:
>
> # cat /proc/<pid of second a.out>/stack
>
> ...and paste the stack trace here? That should tell us what those other
> processes are doing.
> [...]
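The D+ observation above can be checked without ps: a task's state letter is field 3 of /proc/<pid>/stat. A small sketch (it assumes the process name in field 2 contains no spaces, which would otherwise shift the fields):

```shell
#!/bin/sh
# Print the scheduler state letter of a pid: R (running), S (sleeping),
# D (uninterruptible sleep inside the kernel - the blocked shell's state),
# Z (zombie), etc.
proc_state() {
    awk '{ print $3 }' "/proc/$1/stat"
}

proc_state $$    # the current shell: normally R or S
```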
* Re: strange bottleneck with SMB 2.0
From: Steve French @ 2015-09-29 0:18 UTC (permalink / raw)
To: Yale Zhang; Cc: Jeff Layton, linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

The good news is that we really, really don't want to encourage SMB2.0
(SMB2.1 and later have better performance and security), and we want to
encourage SMB3.0 (SMB3.02 is fine too, but SMB3.11 is still
experimental). So we want users to mount with "vers=3.0", except to
Samba, where the Unix Extensions to CIFS make it an interesting
tradeoff which is better: "vers=3.0" or cifs with Unix Extensions.

Perhaps the odd behavior difference has to do with the lack of
multicredit/large read/large write support in SMB2. Note that
rsize/wsize is only 64K in SMB2 as a result - SMB2.1 and later get 1MB
read/write sizes, which is better.

On Mon, Sep 28, 2015 at 7:09 PM, Yale Zhang
<yzhang1985-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Sorry about the delay. I haven't been spending time on this issue
> because I can just use SMB 3. But for anyone else who is stuck, here's
> my diagnosis:
> [...]

--
Thanks,

Steve
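Which dialect a cifs mount actually negotiated can be read back from /proc/mounts, where the options field carries vers=. A small sketch (the sample line below is made up for illustration, not taken from the thread):

```shell
#!/bin/sh
# Print "<mountpoint> <dialect>" for each cifs entry in
# /proc/mounts-style input on stdin.
# Real use: get_smb_vers < /proc/mounts
get_smb_vers() {
    awk '$3 == "cifs" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^vers=/)
                print $2, substr(opts[i], 6)
    }'
}

# Hypothetical sample line for an SMB 2.0 mount:
printf '//server/share /mnt cifs rw,vers=2.0,rsize=65536 0 0\n' | get_smb_vers
```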
* Re: strange bottleneck with SMB 2.0
From: Yale Zhang @ 2015-08-19 13:41 UTC (permalink / raw)
To: Steve French; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

It happens as late as 4.1.6.

On Wed, Aug 19, 2015 at 5:38 AM, Steve French
<smfrench-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> What kernel version?
> [...]
end of thread, other threads: [~2015-09-29 0:18 UTC | newest]

Thread overview: 6+ messages
-- links below jump to the message on this page --
2015-08-19 11:11 strange bottleneck with SMB 2.0 Yale Zhang
2015-08-19 12:40 ` Steve French
2015-08-20 12:57 ` Jeff Layton
2015-09-29 0:09 ` Yale Zhang
2015-09-29 0:18 ` Steve French
2015-08-19 13:41 ` Yale Zhang