Spipe doesn't close when (established) connection times out (due to sleeping, network interruption, etc.)

Discussion:

Blake Rainwater

2012-11-18 21:47:00 UTC

I am trying to use spiped to trigger updates on git repositories, so
that the clients pull changes when they occur on the server. I own and
control both the server and the clients, and all of them run Linux
(the server has an ARM processor). I have posted some of the code for
this at https://gist.github.com/4107319 .

The server starts a spiped daemon that redirects an outward-facing
port (encrypted) to an inward-facing port (decrypted), and then uses
socat to spawn a program when a client connects. The program
(tail -f -n 1 /srv/vcvm/ctl/repo-updates) sends the path of a changed
repo to the connected client, because each repo appends its path to
that file when it is updated.

The client connects to the server notification daemon using spipe and
pipes the output of spipe to the monitor-sync function, which contains
an infinite read loop. When the server daemon is killed, the socket is
closed and the client-side program exits properly (since spipe
closes). However, when the client is disconnected (wifi turned off,
network cable unplugged, client put to sleep, etc.) for a very long
time (e.g. overnight) and the network is then reconnected, the
client-side daemon no longer receives updates, yet spipe keeps running
as if it were still connected. After short disconnections (tested up
to the order of minutes, though I suspect the limit may be a few
hours) the client receives any updates that occurred during the
disconnection, and the notifications work properly.

The typical use of my netbook will trigger this bug in my code, since
I keep it asleep most of the time, sometimes even for days at a time,
and may not actually shut the netbook down for weeks at a time.

Alternative solutions:
1. I could use read timeouts in the monitor-sync function, but that
seems a bit kludgy and would possibly mean more polling overhead.
2. I could try changing kernel settings. I think the relevant one is
net.ipv4.tcp_keepalive_time, but I am unsure whether it will help.
3. I have modified the spiped code to drop the connection if the
outgoing connection is disconnected, instead of only if both are
disconnected (see commit
https://github.com/brainwater/spiped/commit/fdee65943ad6c2b4a79b685895fe612709ddc184
), and have had mixed results in testing it. I have done little
testing so far. When I used gdb to put a breakpoint in the
callback_pipestatus function and then disconnected the network cable,
it hit the breakpoint a few hours afterwards. When I was running my
modified code it also seemed to exit properly, though that was over
the course of a few hours that were hazy due to having a few drinks
(don't drink and debug). When I tried it again, it didn't seem to work
properly. Also, I don't have a good idea of what that part of the code
is supposed to do, and I am concerned that my change could cause
improper behavior on the server.
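As a rough illustration of the read-timeout alternative, here is a
minimal Python sketch. The original monitor-sync is a shell function,
and this is not a port of it. The names read_lines_with_timeout,
on_line, and on_idle are hypothetical. The idea is that whenever no
data arrives for timeout_s seconds, on_idle gets a chance to run a
connection check and abort the loop so that the caller can reconnect.

```python
import os
import select


def read_lines_with_timeout(fd, timeout_s, on_line, on_idle):
    """Read newline-terminated lines from file descriptor fd.

    on_line(text) is called for every complete line.
    on_idle() is called whenever timeout_s seconds pass without data;
    if it returns False the loop stops (e.g. a connection check failed).
    Returns "eof" when the writer closed the stream, or "idle-abort".
    """
    buf = b""
    while True:
        # Wait until fd is readable or the timeout expires.
        ready, _, _ = select.select([fd], [], [], timeout_s)
        if not ready:
            if not on_idle():
                return "idle-abort"
            continue
        chunk = os.read(fd, 4096)
        if not chunk:
            # Zero bytes means the other end closed the stream.
            return "eof"
        buf += chunk
        # Hand every complete line to the callback; keep any partial line.
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            on_line(line.decode(errors="replace"))
```

In this setup fd would be the read end of a pipe from spipe, and
on_idle would return False when whatever connection check is chosen
fails. The cost is one wakeup per timeout period, which is the polling
overhead mentioned above.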

There is a good chance I will use the timeouts with a connection
check, but I will also do more testing with the spiped change that I
made.

If you have any suggestions for me, or further description of the
conditions when the spipe and/or spiped daemon shuts down or triggers
disconnections, they would be appreciated.

--Blake Rainwater

Colin Percival

2012-11-22 01:17:23 UTC

Post by Blake Rainwater
The client connects to the server notification daemon using spipe and
pipes the output of spipe to the monitor-sync function, which contains
an infinite read loop. When the server daemon is killed, the socket is
closed and the client-side program exits properly (since spipe
closes). However, when the client is disconnected (wifi turned off,
network cable unplugged, client put to sleep, etc.) for a very long
time (e.g. overnight) and the network is then reconnected, the
client-side daemon no longer receives updates, yet spipe keeps running
as if it were still connected. After short disconnections (tested up
to the order of minutes, though I suspect the limit may be a few
hours) the client receives any updates that occurred during the
disconnection, and the notifications work properly.

Ok, I'm guessing what's happening here is:
1. Your server daemon writes something to its socket;
2. This is read by spiped, which encrypts it and sends it to your netbook;
3. Either a network device sends back a RST because your netbook is not
connected, or your server gets tired of retransmitting, and the operating
system resets the connection;
4. On the server, spiped sees the connection reset, and closes the other
end of the connection;
5. When your laptop wakes up, it sees a lack of traffic and interprets
this as an idle connection rather than a dead one.

In this sense, spipe/spiped is acting exactly the same way as if you had
a direct TCP connection -- but a TCP connection without keepalives turned
on, whereas many programs enable them.

Probably the best solution here is for me to add SO_KEEPALIVE by default
and add a command-line option to disable it. Ultimately it's a matter of
balancing dropping connections when you're really gone vs. not dropping
them in the event of temporary network glitches.
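For reference, SO_KEEPALIVE is a per-socket option. On Linux the
net.ipv4.tcp_keepalive_time sysctl (7200 seconds by default) only
affects sockets that have this option set, so changing the sysctl
alone does nothing for a socket without keepalives. Below is a minimal
Python sketch of turning it on. spiped itself is written in C and this
is not its code; enable_keepalive and its default values are
hypothetical, and the three TCP_KEEP* tuning options are only applied
where the platform's socket module exposes them.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Turn on TCP keepalive probes for an existing TCP socket.

    idle_s, interval_s and probes are illustrative values, not
    recommendations.
    """
    # Portable part: ask the kernel to probe an idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Platform-specific part: per-socket overrides of the system-wide
    # timers. Skipped silently where the constant is not defined.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),  # time between probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

Shorter timers detect a vanished peer sooner but also drop connections
during brief outages, which is the trade-off described above.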

Post by Blake Rainwater
If you have any suggestions for me, or further description of the
conditions when the spipe and/or spiped daemon shuts down or triggers
disconnections, they would be appreciated.

The connection handling of spipe/spiped is TCP-transparent: If one end of
the connection is shut down, a shutdown will be passed through; and the
connection is closed if it is shut down in both directions OR if there is
an error in either direction.
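The rule in the paragraph above can be written down as a small state
model. This is a toy Python sketch of the described behavior only, not
spiped's actual code; the class name PipeState and the direction
labels "a_to_b" and "b_to_a" are hypothetical.

```python
class PipeState:
    """Toy model of a TCP-transparent relay between endpoints a and b."""

    def __init__(self):
        # Which directions have been shut down so far.
        self.shut = {"a_to_b": False, "b_to_a": False}
        # Shutdowns that were passed through to the other side, in order.
        self.forwarded = []
        self.closed = False

    def on_shutdown(self, direction):
        """One side finished sending: pass the shutdown through."""
        if self.closed:
            return
        self.shut[direction] = True
        self.forwarded.append(direction)
        # Close only once BOTH directions have been shut down.
        if all(self.shut.values()):
            self.closed = True

    def on_error(self, direction):
        """An error in EITHER direction closes the whole connection."""
        self.closed = True
```

A single shutdown leaves the connection half-open so that the other
direction can keep flowing, while any error tears it down at once.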

--
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid

Colin Percival

2013-04-06 04:52:44 UTC

It turns out that this was a bug in spiped -- when a connection was
reset, it failed to drop it (instead, it accidentally treated a
connection reset the same as a connection shutdown -- shutting down
the other end of the connection but not closing it).
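For readers unfamiliar with the distinction: with the BSD socket API,
an orderly shutdown by the peer shows up as a read that returns zero
bytes, whereas a reset shows up as an error (ECONNRESET). A minimal
Python sketch of telling the two apart follows. classify_read is a
hypothetical helper that assumes a blocking socket with no timeout; it
is not the spiped code or the actual fix.

```python
import socket


def classify_read(sock, bufsize=4096):
    """Read once from sock and say what kind of event it was.

    Returns ("data", bytes), ("eof", None) for an orderly shutdown by
    the peer, or ("error", exception) for a reset or other failure.
    """
    try:
        data = sock.recv(bufsize)
    except OSError as exc:
        # ConnectionResetError (ECONNRESET) is a subclass of OSError.
        return ("error", exc)
    if data == b"":
        # Zero bytes: the peer shut down its sending side cleanly.
        return ("eof", None)
    return ("data", data)
```

A relay would forward a shutdown on "eof" but drop the connection
entirely on "error"; treating the second case like the first is the
kind of mix-up described above.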

This is fixed in spiped 1.3.0.
