public inbox for linux-kernel@vger.kernel.org
* Select/Poll
@ 2004-06-02  5:33 jyotiraditya
  2004-06-02  5:54 ` Select/Poll David Schwartz
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: jyotiraditya @ 2004-06-02  5:33 UTC (permalink / raw)
  To: linux-kernel

Hello All, 

In one of the threads, titled "Linux's implementation of poll() not
scalable?", Linus stated the following:
**************
Neither poll() nor select() have this problem: they don't get more
expensive as you have more and more events - their expense is the number
of file descriptors, not the number of events per se. In fact, both poll()
and select() tend to perform _better_ when you have pending events, as
they are both amenable to optimizations when there is no need for waiting,
and scanning the arrays can use early-out semantics.
************** 

Please help me understand the above. I'm using select() in a server to read
from multiple FDs; the clients send fixed-size messages in a loop on these
FDs, and the server maintaining those FDs does not receive all of them.
Some of the last messages sent by each client are lost, and if the number
of clients (and hence the number of FDs in the server) is increased, the
data loss grows proportionally.
e.g.: 5 clients each send 100 messages to 1 server, and the server receives
   96 messages from each client.
   10 clients each send 100 messages to 1 server, and the server again
   receives 96 from each client.

If a small sleep is introduced between sends, the data loss decreases.
Please also explain the algorithm select() uses to detect readable FDs, and
how it performs better as the number of FDs increases.

Thanks and Regards,
Jyotiraditya 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: Select/Poll
  2004-06-02  5:33 Select/Poll jyotiraditya
@ 2004-06-02  5:54 ` David Schwartz
  2004-06-02  6:12   ` Select/Poll Ben Greear
  2004-06-02  6:09 ` Select/Poll Ben Greear
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: David Schwartz @ 2004-06-02  5:54 UTC (permalink / raw)
  To: linux-kernel


> In one of the threads, titled "Linux's implementation of poll() not
> scalable?", Linus stated the following:
> **************
> Neither poll() nor select() have this problem: they don't get more
> expensive as you have more and more events - their expense is the number
> of file descriptors, not the number of events per se. In fact, both poll()
> and select() tend to perform _better_ when you have pending events, as
> they are both amenable to optimizations when there is no need for waiting,
> and scanning the arrays can use early-out semantics.
> **************
>
> Please help me understand the above. I'm using select() in a server to read
> from multiple FDs; the clients send fixed-size messages in a loop on these
> FDs, and the server maintaining those FDs does not receive all of them.
> Some of the last messages sent by each client are lost, and if the number
> of clients (and hence the number of FDs in the server) is increased, the
> data loss grows proportionally.
> e.g.: 5 clients each send 100 messages to 1 server, and the server receives
>    96 messages from each client.
>    10 clients each send 100 messages to 1 server, and the server again
>    receives 96 from each client.
>
> If a small sleep is introduced between sends, the data loss decreases.
> Please also explain the algorithm select() uses to detect readable FDs, and
> how it performs better as the number of FDs increases.

	Your issue has nothing to do with select or poll scalability, it has to do
with the fact that UDP is unreliable and you must provide your own send
timing. A UDP server or client cannot just send 100 messages in one shot and
expect the other end to get all of them. They probably won't all even make
it to the wire, so the recipient can't solve the problem.

	DS




* Re: Select/Poll
  2004-06-02  5:33 Select/Poll jyotiraditya
  2004-06-02  5:54 ` Select/Poll David Schwartz
@ 2004-06-02  6:09 ` Ben Greear
  2004-06-02  7:05 ` Select/Poll Vadim Lobanov
  2004-06-02 15:28 ` Select/Poll khandelw
  3 siblings, 0 replies; 10+ messages in thread
From: Ben Greear @ 2004-06-02  6:09 UTC (permalink / raw)
  To: jyotiraditya; +Cc: linux-kernel

jyotiraditya@softhome.net wrote:
> Hello All,
> In one of the threads, titled "Linux's implementation of poll() not
> scalable?", Linus stated the following:
> **************
> Neither poll() nor select() have this problem: they don't get more
> expensive as you have more and more events - their expense is the number
> of file descriptors, not the number of events per se. In fact, both poll()
> and select() tend to perform _better_ when you have pending events, as
> they are both amenable to optimizations when there is no need for waiting,
> and scanning the arrays can use early-out semantics.
> **************
> Please help me understand the above. I'm using select() in a server to read
> from multiple FDs; the clients send fixed-size messages in a loop on these
> FDs, and the server maintaining those FDs does not receive all of them.
> Some of the last messages sent by each client are lost, and if the number
> of clients (and hence the number of FDs in the server) is increased, the
> data loss grows proportionally.
> e.g.: 5 clients each send 100 messages to 1 server, and the server receives
>   96 messages from each client.
>   10 clients each send 100 messages to 1 server, and the server again
>   receives 96 from each client.
> If a small sleep is introduced between sends, the data loss decreases.
> Please also explain the algorithm select() uses to detect readable FDs, and
> how it performs better as the number of FDs increases.

Try increasing your socket buffers so that the kernel will queue up more
packets while your user-space server is waking up to read them.

I used to have no problem receiving data on up to 1024 file descriptors
using select, but if you need more than 1024 you will have to move to poll,
because fd_set has a maximum size of 1024 by default...

To increase your buffers, google for these files:
/proc/sys/net/core/wmem_max
/proc/sys/net/core/rmem_max
/proc/sys/net/core/netdev_max_backlog
...

Here is some sample code I use to set the buffer size based on the
maximum rate I think this socket will want to send:

/* Needs <sys/socket.h>, <cstdint>, <cerrno>, <cstring>, <algorithm>;
 * VLOG_WRN/VLOG_INF are this codebase's logging macros.  Despite the
 * name, this sets both the send and the receive buffer size. */
int set_sock_wr_buffer_size(int desc, uint32_t mx_rate) {
    int sz = mx_rate / 40;  /* rough guess at buffering needed for mx_rate */
    if (sz < 32000) {
       sz = 32000;
    }
    if (sz > 4096000) {
       sz = 4096000;
    }

    /* If the kernel refuses a size, halve it and retry. */
    while (sz >= 32000) {
       if (setsockopt(desc, SOL_SOCKET, SO_SNDBUF, (void*)&sz,
                      sizeof(sz)) < 0) {
          VLOG_WRN(VLOG << "ERROR: setting send buffer to: " << sz << " failed: "
                   << strerror(errno) << endl);
          sz = sz >> 1;
       }
       else {
          VLOG_INF(VLOG << "Set SNDBUF sz to: " << sz << " for desc: " << desc << endl);
          break;
       }
    }

    sz = std::max(2048000, sz);  /* ask for at least 2MB on the receive side */
    while (sz >= 32000) {
       if (setsockopt(desc, SOL_SOCKET, SO_RCVBUF, (void*)&sz,
                      sizeof(sz)) < 0) {
          VLOG_WRN(VLOG << "ERROR: setting receive buffer to: " << sz << " failed: "
                   << strerror(errno) << endl);
          sz = sz >> 1;
       }
       else {
          VLOG_INF(VLOG << "Set RCVBUF sz to: " << sz << " for desc: " << desc << endl);
          break;
       }
    }

    return sz;
}//set_sock_wr_buffer_size


Ben


> Thanks and Regards,
> Jyotiraditya


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Select/Poll
  2004-06-02  5:54 ` Select/Poll David Schwartz
@ 2004-06-02  6:12   ` Ben Greear
  2004-06-02  6:38     ` Select/Poll bert hubert
  0 siblings, 1 reply; 10+ messages in thread
From: Ben Greear @ 2004-06-02  6:12 UTC (permalink / raw)
  To: davids; +Cc: linux-kernel

David Schwartz wrote:

> 	Your issue has nothing to do with select or poll scalability, it has to do
> with the fact that UDP is unreliable and you must provide your own send
> timing. A UDP server or client cannot just send 100 messages in one shot and
> expect the other end to get all of them. They probably won't all even make
> it to the wire, so the recipient can't solve the problem.

You can check that they get to the wire in (almost?) all cases by watching
the return value of the sendto call.  And if you have decent buffers on the
receive side and a clean transport, you can send at very high speeds without
dropping any significant number of packets, even when using select/poll and
non-blocking sockets...

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Select/Poll
  2004-06-02  6:12   ` Select/Poll Ben Greear
@ 2004-06-02  6:38     ` bert hubert
  0 siblings, 0 replies; 10+ messages in thread
From: bert hubert @ 2004-06-02  6:38 UTC (permalink / raw)
  To: Ben Greear; +Cc: davids, linux-kernel

> You can check that they get to the wire in (almost?) all cases by watching

QoS settings may drop your packet before actually hitting the wire, but
after being enqueued to the kernel.

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO


* Re: Select/Poll
  2004-06-02  5:33 Select/Poll jyotiraditya
  2004-06-02  5:54 ` Select/Poll David Schwartz
  2004-06-02  6:09 ` Select/Poll Ben Greear
@ 2004-06-02  7:05 ` Vadim Lobanov
  2004-06-02 14:11   ` Select/Poll Davide Libenzi
  2004-06-02 15:28 ` Select/Poll khandelw
  3 siblings, 1 reply; 10+ messages in thread
From: Vadim Lobanov @ 2004-06-02  7:05 UTC (permalink / raw)
  To: jyotiraditya; +Cc: linux-kernel

On Tue, 1 Jun 2004 jyotiraditya@softhome.net wrote:

> Hello All, 
> 
> In one of the threads, titled "Linux's implementation of poll() not
> scalable?", Linus stated the following:
> **************
> Neither poll() nor select() have this problem: they don't get more
> expensive as you have more and more events - their expense is the number
> of file descriptors, not the number of events per se. In fact, both poll()
> and select() tend to perform _better_ when you have pending events, as
> they are both amenable to optimizations when there is no need for waiting,
> and scanning the arrays can use early-out semantics.
> ************** 
> 
> Please help me understand the above. I'm using select() in a server to read
> from multiple FDs; the clients send fixed-size messages in a loop on these
> FDs, and the server maintaining those FDs does not receive all of them.
> Some of the last messages sent by each client are lost, and if the number
> of clients (and hence the number of FDs in the server) is increased, the
> data loss grows proportionally.
> e.g.: 5 clients each send 100 messages to 1 server, and the server receives
>    96 messages from each client.
>    10 clients each send 100 messages to 1 server, and the server again
>    receives 96 from each client.
>
> If a small sleep is introduced between sends, the data loss decreases.
> Please also explain the algorithm select() uses to detect readable FDs, and
> how it performs better as the number of FDs increases.
> 
> Thanks and Regards,
> Jyotiraditya 

I think everyone else already hit on the main points of UDP, so I'll pass 
on to the second question. :)

I believe there is some confusion between the terms "events" and "FDs". As
far as I know, both poll() and select() scale O(n) (in other words,
linearly) with the number of watched FDs, but O(1) (in other words, no
effect) with the number of received events. Let's put this into more
concrete terms:

Suppose you select/poll on an array of 100 FDs, none of which currently has
a pending event. What the kernel will do for you, in essence, is loop,
querying each of the 100 FDs in turn to see whether it has received new
events. If one of them has received an event, select/poll will return that
FD. In the end it reduces to a simple loop over the FDs to determine when
events arrive, and it is exactly this loop that gives it O(n) behavior.

However, if by the time select/poll is called there are already pending
events on the FD set, then the syscall can return immediately with the
events already present. In this case it does not need to start looping over
the FDs waiting for events, and you will not observe the O(n) waiting
behavior. Notice that this favorable scenario is more likely to occur when
you have more events coming in. I think this is what Linus meant when he
said that select/poll like to have events waiting, for a faster return time.

As a very quick and very much simplistic summary: for select/poll, the more
incoming events you get and the fewer FDs you watch, the better off you
are. In your case, though, I do not think you have to worry much about
scalability. If you _really_ want to, however, check out epoll - it should
be standardized in the 2.6.x kernels (though my glibc still has VERY big
issues with it).

And as a final word, I have no doubts that someone out there who is more 
knowledgeable can correct me wherever it may be needed. Such corrections 
are welcome, since I get to learn something new in that case. :)

-VadimL



* Re: Select/Poll
  2004-06-02  7:05 ` Select/Poll Vadim Lobanov
@ 2004-06-02 14:11   ` Davide Libenzi
  0 siblings, 0 replies; 10+ messages in thread
From: Davide Libenzi @ 2004-06-02 14:11 UTC (permalink / raw)
  To: Vadim Lobanov; +Cc: jyotiraditya, Linux Kernel Mailing List

On Wed, 2 Jun 2004, Vadim Lobanov wrote:

> scalability much. If you _really_ want to, however, check epoll - should 
> be standardized on the 2.6.x kernels (though my glibc still has VERY big 
> issues with it).

(s/should/is/)
The "VERY big issues" statement is kinda hard to debug. Did you try to
report your issues here (if kernel-related) or on the glibc mailing list?



- Davide



* Re: Select/Poll
  2004-06-02  5:33 Select/Poll jyotiraditya
                   ` (2 preceding siblings ...)
  2004-06-02  7:05 ` Select/Poll Vadim Lobanov
@ 2004-06-02 15:28 ` khandelw
  2004-06-03 15:10   ` Select/Poll Mike Jagdis
  3 siblings, 1 reply; 10+ messages in thread
From: khandelw @ 2004-06-02 15:28 UTC (permalink / raw)
  To: jyotiraditya; +Cc: linux-kernel

Hello,
   Can you give more details, like which machine, which vendor, etc.? On a
Sony Vaio PCG-FRV31 laptop / Red Hat 9.0, my select-multiplexed server used
to fail after firing some 36,000+ requests. With select I believe you should
not get any packet loss...

- Amit

PS. If you can post the code, that would be great...


Quoting jyotiraditya@softhome.net:

> Hello All,
>
> In one of the threads, titled "Linux's implementation of poll() not
> scalable?", Linus stated the following:
> **************
> Neither poll() nor select() have this problem: they don't get more
> expensive as you have more and more events - their expense is the number
> of file descriptors, not the number of events per se. In fact, both poll()
> and select() tend to perform _better_ when you have pending events, as
> they are both amenable to optimizations when there is no need for waiting,
> and scanning the arrays can use early-out semantics.
> **************
>
> Please help me understand the above. I'm using select() in a server to read
> from multiple FDs; the clients send fixed-size messages in a loop on these
> FDs, and the server maintaining those FDs does not receive all of them.
> Some of the last messages sent by each client are lost, and if the number
> of clients (and hence the number of FDs in the server) is increased, the
> data loss grows proportionally.
> e.g.: 5 clients each send 100 messages to 1 server, and the server receives
>    96 messages from each client.
>    10 clients each send 100 messages to 1 server, and the server again
>    receives 96 from each client.
>
> If a small sleep is introduced between sends, the data loss decreases.
> Please also explain the algorithm select() uses to detect readable FDs, and
> how it performs better as the number of FDs increases.
>
> Thanks and Regards,
> Jyotiraditya




* Re: Select/Poll
  2004-06-02 15:28 ` Select/Poll khandelw
@ 2004-06-03 15:10   ` Mike Jagdis
  2004-06-03 15:53     ` Select/Poll khandelw
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Jagdis @ 2004-06-03 15:10 UTC (permalink / raw)
  To: khandelw; +Cc: jyotiraditya, linux-kernel

On Wed, Jun 02, 2004 at 11:28:29AM -0400, khandelw@cs.fsu.edu wrote:
> Hello,
>    Can you give more details - Like which machine which vendor etc.,
> On a sony vaio pcg frv31 laptop/ redhat 9.0/ after firing some 36,000+ request
> my select multiplexed server used to fail. With select I believe you not get
> any packet loss...

Then you'd be wrong. Poll/select tell you when descriptors
are readable/writable. They do *not* impose any magic queuing
mechanism that guarantees the buffers won't overflow. If the
low-level protocol is not flow-controlled, like UDP, you *have*
to read data faster than it arrives and not write data faster
than it is being transmitted.

Mike

-- 
Mike Jagdis                        Web: http://www.eris-associates.co.uk
Eris Associates Limited            Tel: +44 7780 608 368
Reading, England                   Fax: +44 118 926 6974


* Re: Select/Poll
  2004-06-03 15:10   ` Select/Poll Mike Jagdis
@ 2004-06-03 15:53     ` khandelw
  0 siblings, 0 replies; 10+ messages in thread
From: khandelw @ 2004-06-03 15:53 UTC (permalink / raw)
  To: Mike Jagdis; +Cc: jyotiraditya, linux-kernel

I meant it in the context of TCP; I thought that was implicit enough,
because if he was using UDP then packet loss is to be expected (though it
will not necessarily happen).

- Amit Khandelwal

 Quoting Mike Jagdis <mjagdis@eris-associates.co.uk>:

> On Wed, Jun 02, 2004 at 11:28:29AM -0400, khandelw@cs.fsu.edu wrote:
> > Hello,
> >    Can you give more details, like which machine, which vendor, etc.?
> > On a Sony Vaio PCG-FRV31 laptop / Red Hat 9.0, my select-multiplexed
> > server used to fail after firing some 36,000+ requests. With select I
> > believe you should not get any packet loss...
>
> Then you'd be wrong. Poll/select tell you when descriptors
> are readable/writable. They do *not* impose any magic queuing
> mechanism that guarantees the buffers won't overflow. If the
> low level protocol is non-flow controlled like UDP you *have*
> to read data faster than it arrives and not write data faster
> than it is being transmitted.
>
> Mike
>
> --
> Mike Jagdis                        Web: http://www.eris-associates.co.uk
> Eris Associates Limited            Tel: +44 7780 608 368
> Reading, England                   Fax: +44 118 926 6974
>




end of thread, other threads:[~2004-06-03 15:55 UTC | newest]

Thread overview: 10+ messages
-- links below jump to the message on this page --
2004-06-02  5:33 Select/Poll jyotiraditya
2004-06-02  5:54 ` Select/Poll David Schwartz
2004-06-02  6:12   ` Select/Poll Ben Greear
2004-06-02  6:38     ` Select/Poll bert hubert
2004-06-02  6:09 ` Select/Poll Ben Greear
2004-06-02  7:05 ` Select/Poll Vadim Lobanov
2004-06-02 14:11   ` Select/Poll Davide Libenzi
2004-06-02 15:28 ` Select/Poll khandelw
2004-06-03 15:10   ` Select/Poll Mike Jagdis
2004-06-03 15:53     ` Select/Poll khandelw
