netdev.vger.kernel.org archive mirror
* shutdown() and SHUT_RD on TCP sockets - broken?
@ 2003-07-01 11:23 mtk-lists
  0 siblings, 0 replies; 7+ messages in thread
From: mtk-lists @ 2003-07-01 11:23 UTC
  To: netdev

Hello,

I've done quite a lot of searching, but have so far not found an answer
to the question of why the behaviour described below occurs on
Linux.

According to SUSv3, if we perform a shutdown(fd, SHUT_RD) on a socket,
then further reads on that socket should be disabled.  In the AF_UNIX
domain, all is fine -- things operate as I expect.  However, for
TCP sockets, things are different (tested on Linux 2.2.14 and 2.4.20):

1. If we perform a read() on the socket and there is no data, then 0
(EOF) is (immediately) returned.  (This is what I expected.)

2. However, the peer can still write() to the socket, and afterwards
we can read() that data from the socket, even though the reading half
of the socket should be shut down.  Instead of this behaviour, I
expected the read() to continue to return 0 as in point 1.  This is what
we see, for example, on FreeBSD 4.8, Tru64 5.1B, and HP/UX 11.

I thought that most implementations (other than Linux) did things 
this way, but I've just now gone and tested things on Solaris 8, 
and it seems to behave in the same way as Linux.

I've read the relevant source code to confirm the anomalous behaviour
described here.  But why do things happen this way on Linux?

3. (A side point.) Looking at Stevens UNPv1, p161, there is a statement 
that after a SHUT_RD, "any data for a TCP socket is acknowledged and 
then silently discarded".  This implies to me that the sender could keep
on writing to the socket and never block.  However, on Linux, if the peer 
keeps sending to a socket, then eventually (the channel is filled and) it
blocks.  I see that this also occurs on FreeBSD 4.8, Tru64 5.1B, 
HP/UX 11 and Solaris 8.  Have I misunderstood Stevens, or has
something changed since the implementation he described 
(or was his statement wrong)?  (In the AF_UNIX domain on Linux, the
peer gets SIGPIPE/EPIPE if it keeps writing after a local SHUT_RD.)
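
The behaviour in points 1 and 2, and the AF_UNIX comparison at the end of point 3, can be reproduced with a short program. This is a minimal sketch, assuming Python's socket module (a thin wrapper over the same shutdown()/recv()/send() calls) and a Linux host; the helper names unix_shut_rd_demo(), tcp_shut_rd_demo() and recv_or_none() are hypothetical. Per the reports above, one would expect EOF and then EPIPE in the AF_UNIX case, and EOF followed by the peer's data in the TCP case.

```python
import socket
import time


def recv_or_none(sock, n):
    """recv() that returns None instead of hanging if nothing arrives."""
    try:
        return sock.recv(n)
    except socket.timeout:
        return None


def unix_shut_rd_demo():
    """AF_UNIX stream pair: what do reader and writer see after SHUT_RD?"""
    reader, writer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        reader.shutdown(socket.SHUT_RD)
        eof = reader.recv(100)            # empty queue + reads disabled -> b""
        try:
            writer.send(b"hello")         # Python ignores SIGPIPE, so EPIPE
            error = None                  # surfaces as BrokenPipeError
        except BrokenPipeError as exc:
            error = exc
        return eof, error
    finally:
        reader.close()
        writer.close()


def tcp_shut_rd_demo(payload=b"after SHUT_RD"):
    """Loopback TCP pair: EOF on an empty queue (point 1), then a peer write (point 2)."""
    listener = socket.create_server(("127.0.0.1", 0))   # port 0: kernel picks one
    writer = socket.create_connection(listener.getsockname())
    reader, _ = listener.accept()
    listener.close()
    try:
        reader.settimeout(1.0)            # never hang if a stack behaves differently
        reader.shutdown(socket.SHUT_RD)
        first = recv_or_none(reader, 100) # point 1: b"" immediately
        writer.sendall(payload)           # the peer is never told about SHUT_RD
        time.sleep(0.1)                   # let loopback delivery complete
        second = recv_or_none(reader, 100)  # point 2: the data on Linux
        return first, second
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    print("AF_UNIX:", unix_shut_rd_demo())
    print("TCP:    ", tcp_shut_rd_demo())
```

On a stack that behaves as FreeBSD 4.8 reportedly does, the second TCP read would instead be expected to return b"".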

Thanks

Michael

-- 
+++ GMX - Mail, Messaging & more  http://www.gmx.net +++

* Re: shutdown() and SHUT_RD on TCP sockets - broken?
@ 2003-07-08 16:09 mtk-lists
  2003-07-08 16:28 ` James Morris
  2003-07-08 16:55 ` Andi Kleen
  0 siblings, 2 replies; 7+ messages in thread
From: mtk-lists @ 2003-07-08 16:09 UTC
  To: netdev

Hello,

There was no response to the note below.  Is netdev the right place to 
raise this subject?

Cheers

Michael

------- Forwarded message follows -------
Date sent:      	Tue, 1 Jul 2003 13:23:49 +0200 (MEST)
From:           	mtk-lists@gmx.net
To:             	netdev@oss.sgi.com
BCC to:         	michael.kerrisk@gmx.net
Subject:        	shutdown() and SHUT_RD on TCP sockets - broken?

* Re: shutdown() and SHUT_RD on TCP sockets - broken?
  2003-07-08 16:09 mtk-lists
@ 2003-07-08 16:28 ` James Morris
  2003-07-08 17:03   ` kuznet
  2003-07-08 16:55 ` Andi Kleen
  1 sibling, 1 reply; 7+ messages in thread
From: James Morris @ 2003-07-08 16:28 UTC
  To: mtk-lists; +Cc: netdev, kuznet

On Tue, 8 Jul 2003 mtk-lists@gmx.net wrote:

> Hello,
> 
> There was no response to the note below.  Is netdev the right place to 
> raise this subject?

Yes.  I believe Alexey has some reservations about the specified behavior.

- James
-- 
James Morris
<jmorris@intercode.com.au>

* Re: shutdown() and SHUT_RD on TCP sockets - broken?
  2003-07-08 16:09 mtk-lists
  2003-07-08 16:28 ` James Morris
@ 2003-07-08 16:55 ` Andi Kleen
  1 sibling, 0 replies; 7+ messages in thread
From: Andi Kleen @ 2003-07-08 16:55 UTC
  To: mtk-lists; +Cc: netdev

On Tue, 8 Jul 2003 18:09:14 +0200 (MEST)
mtk-lists@gmx.net wrote:

> 1. If we perform a read() on the socket and there is no data, then 0
> (EOF) is (immediately) returned.  (This is what I expected.)
> 
> 2. However, the peer can still write() to the socket, and afterwards
> we can read() that data from the socket, even though the reading half
> of the socket should be shut down.  Instead of this behaviour, I
> expected the read() to continue to return 0 as in point 1.  This is what we
> see for example in FreeBSD 4.8, Tru64 5.1B, and HP/UX 11.  

The problem is that it adds a new check to the input path. It's not clear how
the check could be done outside the fast path (one way would be to forcibly
shrink the window and drop the receiver into the slow path, but that would
be a severe protocol violation if the shrunk window leaked out in
some ACK). I don't think it's a good idea to add a check for such
an obscure situation to the fast path.
> 
> 3. (A side point.) Looking at Stevens UNPv1, p161, there is a statement
> that after a SHUT_RD, "any data for a TCP socket is acknowledged and then
> silently discarded".  This implies to me that the sender could keep on
> writing to the socket and never block.  However, on Linux, if the peer
> keeps sending to a socket, then eventually (the channel is filled and) it
> blocks.  I see that this also occurs on FreeBSD 4.8, Tru64 5.1B, HP/UX 11

That's because the data is not discarded, so the window fills.

> and Solaris 8.  Have I misunderstood Stevens, or has something changed
> since the implementation he described (or was his statement wrong)?  (In

Probably Stevens was confused.

-Andi
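
Andi's explanation for point 3 (the data is queued rather than discarded, so the window fills) can be checked with a non-blocking writer: if the receiver really acknowledged and discarded data after SHUT_RD, as UNPv1 describes, the writer would never see EAGAIN. A sketch, assuming Python's socket module on a loopback connection; bytes_until_block() and its chunk and limit parameters are hypothetical.

```python
import socket


def bytes_until_block(chunk=65536, limit=256 * 1024 * 1024):
    """Count how many bytes a non-blocking writer can queue after the
    reader has done SHUT_RD (and never reads) before send() would block."""
    listener = socket.create_server(("127.0.0.1", 0))   # port 0: kernel picks one
    writer = socket.create_connection(listener.getsockname())
    reader, _ = listener.accept()
    listener.close()
    try:
        reader.shutdown(socket.SHUT_RD)
        writer.setblocking(False)
        block = b"x" * chunk
        sent = 0
        while sent < limit:               # safety cap in case nothing ever blocks
            try:
                sent += writer.send(block)
            except BlockingIOError:       # EAGAIN: a blocking write would block here
                return sent, True
        return sent, False
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    sent, blocked = bytes_until_block()
    print(f"queued {sent} bytes before blocking: {blocked}")
```

The count reflects the receiver's buffer plus the writer's send buffer, so it varies with system settings; the point is only that it is finite.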


* Re: shutdown() and SHUT_RD on TCP sockets - broken?
  2003-07-08 16:28 ` James Morris
@ 2003-07-08 17:03   ` kuznet
  0 siblings, 0 replies; 7+ messages in thread
From: kuznet @ 2003-07-08 17:03 UTC
  To: James Morris; +Cc: mtk-lists, netdev

Hello!

> blocks.  I see that this also occurs on FreeBSD 4.8, Tru64 5.1B,
> HP/UX 11 and Solaris 8.  Have I misunderstood Stevens,

Most likely, this is one of those rare cases where Stevens forgot to check
the statement.

From the viewpoint of TCP, the behaviour described in Stevens' book
is highly unnatural. SHUT_RD on TCP does not make any sense.


> described here.  But, why do things happen in this way on Linux?

Actually, you could check one more thing. What happens after FreeBSD 4.8
returns 0 from read()? Does it eventually open the window?

As you found, all the stacks ignore SHUT_RD when receiving data
and queue it anyway. And on read(), Linux and, apparently, Solaris
prefer to return this data.

Alexey
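
One hypothetical way to run the experiment Alexey suggests: fill the window after SHUT_RD, call read() until it reports EOF or would block, then check whether the writer can make progress again. A stack that returns 0 but never frees the queued data would be expected to leave the writer stuck. This is a sketch assuming Python's socket module and a loopback TCP connection; the function name, max_chunks cap and 0.2 s settle time are assumptions, not part of the original discussion.

```python
import socket
import time


def window_reopens_after_read(chunk=65536, max_chunks=4096):
    """Fill the window after SHUT_RD, read until EOF or EAGAIN, then report
    (bytes read, whether the writer could send again)."""
    listener = socket.create_server(("127.0.0.1", 0))
    writer = socket.create_connection(listener.getsockname())
    reader, _ = listener.accept()
    listener.close()
    try:
        reader.shutdown(socket.SHUT_RD)
        writer.setblocking(False)
        block = b"x" * chunk
        for _ in range(max_chunks):       # step 1: fill receive window and send buffer
            try:
                writer.send(block)
            except BlockingIOError:
                break
        reader.setblocking(False)
        drained = 0
        while True:                       # step 2: read until EOF (b"") or EAGAIN
            try:
                data = reader.recv(chunk)
            except BlockingIOError:
                break
            if not data:
                break
            drained += len(data)
        time.sleep(0.2)                   # step 3: allow a window update to reach the writer
        try:
            writer.send(block)
            reopened = True
        except BlockingIOError:
            reopened = False
        return drained, reopened
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    print(window_reopens_after_read())
```

On Linux, per point 2, the reads should return queued data and free space; the interesting case is a stack whose first read() returns 0.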


* Re: shutdown() and SHUT_RD on TCP sockets - broken?
@ 2003-07-09 10:11 mtk-lists
  2003-07-09 10:38 ` Andi Kleen
  0 siblings, 1 reply; 7+ messages in thread
From: mtk-lists @ 2003-07-09 10:11 UTC
  To: kuznet, Andi Kleen; +Cc: netdev

Hello Alexey and Andi,

[Alexey]

> > blocks.  I see that this also occurs on FreeBSD 4.8, Tru64 5.1B,
> > HP/UX 11 and Solaris 8.  Have I misunderstood Stevens,
> 
> Most likely, it is that rare case when Stevens forgot to check the
> statement.

Yes, it certainly doesn't correspond to any current implementation
that I could find, anyway.

I should of course have added that (as you are probably well aware) SUSv3 is
vague, but does say:

SHUT_RD Disables further receive operations.

which suggests that we shouldn't be able to read any more.  It seems to me
that the only ways of satisfying that requirement are either to discard data
(à la Stevens) or to send an RST to the writing peer (more on that in a moment)
so that it stops sending.
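
The options discussed in this thread can be compared in a toy model of a receiver with a fixed window. This is not real TCP and not any kernel's code: ToyReceiver, Policy and simulate() are hypothetical names, and the policy labels only record which systems were reported in this thread to behave that way. It shows why only discarding (à la Stevens) keeps the writer from ever blocking, why an RST stops it outright, and why both queueing behaviours let the window fill.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    QUEUE_AND_DELIVER = "queue; read() returns the data (Linux, Solaris 8)"
    QUEUE_BUT_EOF = "queue; read() returns 0 (FreeBSD 4.8, Tru64 5.1B, HP/UX 11)"
    ACK_AND_DISCARD = "acknowledge, then silently discard (UNPv1, p161)"
    RESET = "answer with RST, as for a closed socket (proposed)"


@dataclass
class ToyReceiver:
    """Toy receive side with a fixed window, after shutdown(SHUT_RD)."""
    policy: Policy
    window: int = 8                       # bytes of buffer space advertised
    queue: bytearray = field(default_factory=bytearray)
    reset_sent: bool = False

    def on_segment(self, data: bytes) -> int:
        """Handle incoming data; return how many bytes are acknowledged."""
        if self.policy is Policy.RESET:
            self.reset_sent = True        # the sender's next write fails instead
            return 0
        if self.policy is Policy.ACK_AND_DISCARD:
            return len(data)              # acked, nothing queued, window never fills
        room = self.window - len(self.queue)
        accepted = bytes(data[:room])     # anything beyond the window is not acked
        self.queue += accepted
        return len(accepted)

    def read(self, n: int) -> bytes:
        """read() after SHUT_RD: only QUEUE_AND_DELIVER hands back queued data."""
        if self.policy is not Policy.QUEUE_AND_DELIVER:
            return b""
        out = bytes(self.queue[:n])
        del self.queue[:n]
        return out


def simulate(policy: Policy, payload: bytes = b"0123456789ABCDEF"):
    """Deliver one 16-byte segment after SHUT_RD, then read once."""
    rx = ToyReceiver(policy)
    acked = rx.on_segment(payload)
    return acked, rx.read(100), rx.reset_sent


if __name__ == "__main__":
    for p in Policy:
        print(f"{p.name:18} {simulate(p)}  # {p.value}")
```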

> From viewpoint of TCP the behaviour described in Stevens' book
> is highly unnatural. SHUT_RD on TCP does not make any sense.

A while back I had some communication with Andi Kleen on this point,
and he suggested that TCP could send an RST in this case, much
as occurs if the reader close()s the socket.  Is this not a viable option?
(Perhaps not, for the reasons Andi outlined in his mail to this list, quoted
below.)

> > described here.  But, why do things happen in this way on Linux?
> 
> Actually, you could check one more thing. What does happen after freebsd
> 4.8 returns 0 on read()? Does it open window eventually?

I'm not quite sure what you mean here.  Can you elaborate on what
type of experiment I should perform and what you expect I might see?

[Andi]

> > 1. If we perform a read() on the socket and there is no data, then 0
> > (EOF) is (immediately) returned.  (This is what I expected.)
> > 
> > 2. However, the peer can still write() to the socket, and afterwards we
> > can read() that data from the socket, even though the reading half of the
> > socket should be shut down.  Instead of this behaviour, I expected the
> > read() to continue to return 0 as in point 1.  This is what we see for
> > example in FreeBSD 4.8, Tru64 5.1B, and HP/UX 11.
> 
> The problem is that it adds a new check to the input path. It's not clear
> how the check can be done outside the fast path (one way would be to shrink
> the window forcedly and drop the receiver into slow path, but that would be
> a severe protocol violation if the shrunk window leaks out with some ACK).
> I don't think it's a good idea to add a check for such an obscure situation
> to the fast path.

Andi, I already noted your idea about sending an RST in this case.  I assume
the above is the practical reason that makes implementing this difficult?

> > 3. (A side point.) Looking at Stevens UNPv1, p161, there is a statement
> > that after a SHUT_RD, "any data for a TCP socket is acknowledged and then
> > silently discarded".  This implies to me that the sender could keep on
> > writing to the socket and never block.  However, on Linux, if the peer
> > keeps sending to a socket, then eventually (the channel is filled and) it
> > blocks.  I see that this also occurs on FreeBSD 4.8, Tru64 5.1B, HP/UX 11
> 
> That's because the data is not discarded so the window fills. 

Yes, I should perhaps have added that, in the circumstances, blocking at this
point is not surprising (to me).

> > and Solaris 8.  Have I misunderstood Stevens, or has something changed
> > since the implementation he described (or was his statement wrong)?  (In
> 
> Probably Stevens was confused.

There seems to be a consensus emerging ;-).

Cheers,

Michael


* Re: shutdown() and SHUT_RD on TCP sockets - broken?
  2003-07-09 10:11 shutdown() and SHUT_RD on TCP sockets - broken? mtk-lists
@ 2003-07-09 10:38 ` Andi Kleen
  0 siblings, 0 replies; 7+ messages in thread
From: Andi Kleen @ 2003-07-09 10:38 UTC
  To: mtk-lists; +Cc: kuznet, netdev

On Wed, 9 Jul 2003 12:11:19 +0200 (MEST)
mtk-lists@gmx.net wrote:

> 
> > From viewpoint of TCP the behaviour described in Stevens' book
> > is highly unnatural. SHUT_RD on TCP does not make any sense.
> 
> A while back I had some communication with Andi Kleen on this point,
> and he suggested that the TCP could send an RST in this case, much 

Linux sends an RST when data arrives that the user can no longer read
because the receiving socket is already closed. It would make sense to extend
this behaviour to SHUT_RD, but there is no natural place to implement it
outside the fast path, and the case is so obscure that it is not worth slowing
down the common cases.

-Andi
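
The existing behaviour Andi refers to, an RST in answer to data for a socket that has already been closed, can be observed from the writer's side: its first write after the peer's close() usually succeeds locally, the arriving data draws an RST, and a later write fails. A sketch assuming Python's socket module and a loopback connection on Linux; write_after_peer_close() and its retry count are hypothetical, and which errno appears (EPIPE or ECONNRESET) may depend on the stack and on timing.

```python
import socket
import time


def write_after_peer_close(attempts=5):
    """Close the receiving socket, keep writing from the peer, and return the
    error the writer eventually sees (None if none within `attempts` writes)."""
    listener = socket.create_server(("127.0.0.1", 0))
    writer = socket.create_connection(listener.getsockname())
    reader, _ = listener.accept()
    listener.close()
    reader.close()                        # receiver is gone; its FIN reaches the writer
    try:
        for _ in range(attempts):
            try:
                writer.send(b"data nobody can read")  # data for a closed socket draws an RST
            except (BrokenPipeError, ConnectionResetError) as exc:
                return exc
            time.sleep(0.1)               # give the RST time to arrive
        return None
    finally:
        writer.close()


if __name__ == "__main__":
    print(repr(write_after_peer_close()))
```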
