I'm currently running a production app which has several independently running application server classes going at any given time. Each of these uses one Selector to provide support for asynchronous I/O operations. Lately I noticed that when bouncing one of these servers I'd have problems bringing it back up because a series of "ghost listeners" and "ghost connections" were colliding with the ports I was interested in.
So, I got out a local port-scanner and did some digging. To my chagrin, I discovered that every time I made a call to Selector.open() a new TCP connection was made from my application to my application on an internal port. In Java 1.4.2_02 this occurred on the "primary" network adapter. In Java 1.5 this occurred on the loopback adapter. Unfortunately for me, neither is acceptable, because my app regularly binds and unbinds for listening on varying adapters, including the wildcard adapter (0.0.0.0), and I can't have my own process colliding with itself trying to listen on ports.
Okay, so then I did some forum searching with the help of a couple of co-workers. It turns out these connections are "normal" and related to something called the "wakeupPipe", or "wakeup pipe". Also, this seems somewhat related to something we call the "runaway select event" in-house (where Selector.select(x) returns 0 before the timeout is up, over and over again, which we've long since worked around to support Java 1.4.2_02).
This problem occurs on Windows 2000 and Windows Server 2003. I've attached a code snippet below that will duplicate the problem (and flood a system with extraneous TCP connections if left running long enough).
My questions are:
1) Why in the world did this wakeup pipe have to be implemented as a TCP connection (rather than in-memory)?
2) Why is this not documented anywhere in the Java APIs, or am I missing the documentation of it?
3) Is there some way to control the behaviour of this "wakeup pipe"? (ie: make it be in-memory only, a file, or specify port-range, IP etc...)
4) Isn't it dangerous to create a library based on undocumented and randomly allocated TCP connections that can't be controlled via configuration?
Basically on #4 I want to know why/if running this code wouldn't be a major problem on any system that opens and closes ports for listening regularly. (And yes, aside from the fact that it doesn't explicitly clean up the selectors before exiting.)
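The snippet referred to above did not survive in this copy of the thread. As a stand-in, here is a minimal sketch (not the original poster's code) of the kind of reproduction described: open several Selectors, hold them open, and inspect the process with netstat or a port scanner in the meantime. The class and method names are hypothetical, and it uses current Java syntax rather than 1.4. Whether extra loopback connections actually show up depends on the platform and JDK version.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

public class SelectorOpenDemo {
    /**
     * Opens {@code count} selectors, keeps them open for {@code holdMillis}
     * so the process can be inspected externally, then closes them all.
     * Returns how many selectors reported isOpen() while held.
     */
    static int openSelectors(int count, long holdMillis)
            throws IOException, InterruptedException {
        List<Selector> selectors = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                selectors.add(Selector.open());
            }
            int open = 0;
            for (Selector s : selectors) {
                if (s.isOpen()) {
                    open++;
                }
            }
            // Window in which to run e.g. "netstat -an" against this process.
            Thread.sleep(holdMillis);
            return open;
        } finally {
            for (Selector s : selectors) {
                s.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Holding 10 selectors open for 30 seconds...");
        int open = openSelectors(10, 30_000);
        System.out.println("Selectors that were open: " + open);
    }
}
```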
> My questions are:
> 1) Why in the world did this wakeup pipe have to be implemented as a TCP connection (rather than in-memory)?
The wakeup pipe is implemented as a java.nio.channels.Pipe. Internally, Pipes are in turn implemented using whatever O/S facility is appropriate. On Windows this is a TCP/IP listening port randomly assigned by the O/S and bound to the loopback address 127.0.0.1 (i.e. passing {"127.0.0.1", 0} to the bind() method). I think this was the only choice, given that the Pipe channels must be SelectableChannels, i.e. under the hood they must be usable in WSAAsyncSelect(), which Windows native pipes are not.
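To make the "Pipe channels must be SelectableChannels" point concrete, here is a small sketch using only the public java.nio API. It registers the read end of a Pipe with a Selector and wakes the selector by writing one byte to the write end, which is roughly the role a wakeup pipe plays. This illustrates the public contract only; it is not the internal sun.nio.ch code, and the class and method names are made up for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class PipeWakeupSketch {
    /**
     * Writes one byte into a Pipe whose source end is registered with a
     * Selector, and returns the number of keys the selector reports ready.
     */
    static int selectOnPipe() throws IOException {
        Pipe pipe = Pipe.open();
        Selector selector = Selector.open();
        try {
            // A channel must be non-blocking before it can be registered.
            pipe.source().configureBlocking(false);
            pipe.source().register(selector, SelectionKey.OP_READ);

            // Writing to the sink makes the source readable,
            // which in turn unblocks select().
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));

            // Wait up to 5 seconds for the source to become readable.
            return selector.select(5000);
        } finally {
            selector.close();
            pipe.sink().close();
            pipe.source().close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Ready keys: " + selectOnPipe());
    }
}
```

Application code would normally just call Selector.wakeup() rather than manage a Pipe of its own.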
> 2) Why is this not documented anywhere in the Java APIs, or am I missing the documentation of it?
It's not a Java API issue, it is a platform issue having to do with the 64-fd limitation of WSAAsyncSelect() on WINSOCK 2.
> 3) Is there some way to control the behaviour of this "wakeup pipe"? (ie: make it be in-memory only, a file, or specify port-range, IP etc...)
Not that I can see.
> 4) Isn't it dangerous to create a library based on undocumented and randomly allocated TCP connections that can't be controlled via configuration?
> Basically on #4 I want to know why/if running this code wouldn't be a major problem on any system that opens and closes ports for listening regularly. (And yes, aside from the fact that it doesn't explicitly clean up the selectors before exiting.)
The only danger I can see is that if you have a large number of Selectors open at the same time you are consuming the port number space on the loopback 127.0.0.1. It's pretty large, tens of thousands (although you can also run into buffer space limitations and limits imposed by various Windows versions). I don't know why anyone would use more than a couple of Selectors myself, certainly not hundreds or thousands. I don't see that your application's behaviour of unbinding/rebinding repeatedly would have any unwanted interaction with the wakeup pipes.
Basically you should ensure all Selectors are closed after use anyway, even regardless of the wakeup pipe.
Re: Random TCP connections created in Selector.open (in NIO)
Dec 15, 2004 3:05 PM
(reply 2 of 6) (In reply to #1)
> Basically you should ensure all Selectors are closed after use anyway, even regardless of the wakeup pipe.
We do clean them up when they're not in use.
> I don't know why anyone would use more than a couple of Selectors myself,
We have a single monolithic process which runs several totally independent communication modules. They are independent so that they can be dynamically class-loaded without affecting one another or bringing the monolithic process down. We could certainly do everything with a single selector (or two, if OP_READ and OP_WRITE are still better off separated) but we'd pretty well ruin the modular nature of our code-base.
> I don't see that your application's behaviour of unbinding/rebinding repeatedly would have any unwanted interaction with the wakeup pipes.
One typical configuration had 26 communication modules, and therefore 26 selectors, active at the same time. We had an issue in production where one of the ServerSocketChannels failed to bind because it collided with the wakeup pipe range. Of course, this was on Java 1.4.2_02, which binds on the primary adapter for the system and not the loopback adapter. Yes, we can repeatedly try to bind on a port and perform other workarounds, but why should we have to? How could we have expected this behavior? (It may be a Windows limitation that caused Sun to choose their implementation method, but non-Java TCP apps on Windows don't have these problems...)
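For readers wondering what the "repeatedly try to bind" workaround mentioned above might look like, here is a minimal sketch. It is an assumption about the general shape of such a workaround, not the poster's production code. The class name, method name, and retry parameters are hypothetical.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class BindRetrySketch {
    /**
     * Tries to bind a listening channel to {@code address}, retrying up to
     * {@code attempts} times with {@code delayMillis} between tries.
     * Rethrows the last BindException if every attempt fails.
     */
    static ServerSocketChannel bindWithRetry(InetSocketAddress address,
                                             int attempts,
                                             long delayMillis)
            throws IOException, InterruptedException {
        BindException last = null;
        for (int i = 0; i < attempts; i++) {
            ServerSocketChannel channel = ServerSocketChannel.open();
            try {
                channel.socket().bind(address);
                return channel; // caller owns and must close the channel
            } catch (BindException e) {
                // Port is in use (possibly by a short-lived internal socket).
                channel.close();
                last = e;
                Thread.sleep(delayMillis);
            } catch (IOException e) {
                // Any other failure is not worth retrying.
                channel.close();
                throw e;
            }
        }
        throw last != null ? last : new BindException("no bind attempts made");
    }
}
```

A retry only helps if the colliding socket goes away in the meantime. Because the wakeup pipe connection lives as long as its Selector, a collision with it would persist until that Selector is closed.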
Note: The problem appears exacerbated from having the listen ports of these wakeup pipe connections stay open for long periods of time (rather than closing as soon as the pipe is established).
> > 3) Is there some way to control the behaviour of this "wakeup pipe"? (ie: make it be in-memory only, a file, or specify port-range, IP etc...)
> Not that I can see.
Well, considering the behavior changed between 1.4.2_02 and 1.5 it can't be all that inaccessible a fix. Perhaps using an extra TCP connection was necessary in some cases, but obviously binding to ("127.0.0.1", 0) isn't the only choice since it has changed recently and those values could easily be made configurable (given access to the code involved).
Actually, I'm also wondering if a single (known-default/configurable) listen port wouldn't be adequate for all of these wakeup pipe TCP connections.
-Bob
P.S. - Sorry if I'm a bit abrasive, but this is a rather frustrating predicament.
Re: Random TCP connections created in Selector.open (in NIO)
Dec 15, 2004 3:22 PM
(reply 3 of 6) (In reply to #2)
Hmmm ...
> We had an issue in production where one of the ServerSocketChannels failed to bind because it collided with the wakeup pipe range. Of course, this was on Java 1.4.2_02 which binds on the primary adapter for the system and not the loopback adapter.
This seems back to front. By default Java binds to INADDR_ANY which is all the interfaces, which is why you got the collision on the loopback port which was already there. If it bound the socket to a specific non-loopback NIC there would be no collision with any loopback port, they are different number spaces.
Are you able to create all the ServerSockets before any of the Selectors?
Or, if your hosts aren't multihomed, is it practical for the application to bind its ServerSockets to the primary NIC (i.e. the non-loopback)?
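As a rough illustration of the suggestion above, the sketch below binds a ServerSocketChannel to one specific local address instead of the wildcard. The class and method names are hypothetical, and using InetAddress.getLocalHost() to find the "primary NIC" is an assumption that will not hold on every host, so a real application would take the address from its configuration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class SpecificBindSketch {
    /**
     * Opens a listening channel bound to one specific local address.
     * Passing port 0 lets the O/S pick a free port on that address.
     */
    static ServerSocketChannel listenOn(InetAddress address, int port)
            throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open();
        try {
            channel.socket().bind(new InetSocketAddress(address, port));
            return channel;
        } catch (IOException e) {
            channel.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: getLocalHost() resolves to the primary, non-loopback NIC.
        InetAddress nic = InetAddress.getLocalHost();
        try (ServerSocketChannel channel = listenOn(nic, 0)) {
            System.out.println("Listening on "
                    + channel.socket().getLocalSocketAddress());
        }
    }
}
```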
> Yes, we can repeatedly try to bind on a port and perform other work-arounds, but why should we have to? How could we have expected this behavior? (It may be a Windows limitation that caused Sun to choose their implementation method, but non-Java TCP apps on windows don't have these problems...)
Agreed, but then again non-Java TCP apps don't try to implement select() for arbitrary numbers of sockets to agree with *nix platforms, they can generally live with <= 64.
> Note: The problem appears exacerbated from having the listen ports of these wakeup pipe connections stay open for long periods of time (rather than closing as soon as the pipe is established).
Would this help? There would still be the connected port with the same number & this might inhibit a new listening port with that number. Haven't tried this myself.
> Well, considering the behavior changed between 1.4.2_02 and 1.5 it can't be all that inaccessible a fix. Perhaps using an extra TCP connection was necessary in some cases, but obviously binding to ("127.0.0.1", 0) isn't the only choice since it has changed recently and those values could easily be made configurable (given access to the code involved).
It changed from binding to 0, i.e. INADDR_ANY, in 1.4 to binding to 127.0.0.1 in 1.5, probably in an effort to vacate the port space for the physical NICs.
Given access to the code involved you can change anything. In the SCSL code it is sun.nio.ch.WindowsSelectorImpl.java in src/windows/classes.
> Actually, I'm also wondering if a single (known-default/configurable) listen port wouldn't be adequate for all of these wakeup pipe TCP connections.
Re: Random TCP connections created in Selector.open (in NIO)
Dec 15, 2004 4:45 PM
(reply 4 of 6) (In reply to #3)
> This seems back to front. By default Java binds to INADDR_ANY which is all the interfaces,
If you run the code-snippet I originally included in 1.4.2_02 and use netstat or tcpview or cports you'll see that the wakeup pipe TCP connections are indeed bound to a specific network adapter, not to 0.0.0.0 (the wildcard adapter) or to 127.0.0.1 (the loopback adapter). I don't know what the source looks like for this part of the Selector implementation (yet), but that's the behavior.
> Are you able to create all the ServerSockets before any of the Selectors?
No, that would make it impossible to do dynamic class loading on one module while another was running.
> or, if your hosts aren't multihomed, is it practical for the application to bind its ServerSockets to the primary NIC (i.e. the non-loopback)?
Each communications module in my app binds based on configuration parameters, changeable at run-time. They can bind on loopback, wild-card, a specific NIC, or the default primary NIC for the system (they also play weddings and bar-mitzvahs).
When my app needs to bind on the same address the wakeup pipes are using, that's when I get into trouble.
> Agreed, but then again non-Java TCP apps don't try to implement select() for arbitrary numbers of sockets to agree with *nix platforms, they can generally live with <= 64.
Okay, I'm starting to understand a bit more why this is necessary overhead.
> Would [closing pipes for listening] help? There would still be the connected port with the same number & this might inhibit a new listening port with that number. Haven't tried this myself.
Hmm... I thought so, but I'll do some more testing. I may have missed something here.
> Given access to the code involved you can change anything. In the SCSL code it is sun.nio.ch.WindowsSelectorImpl.java in src/windows/classes.
SCSL [which evidently stands for SUN COMMUNITY SOURCE LICENSE]? sun.nio.ch.WindowsSelectorImpl.java? src/windows/classes? Where do I go to download all this stuff and can I still get it for 1.4.2 or is 1.5 all that's currently available?
Re: Random TCP connections created in Selector.open (in NIO)
Dec 15, 2004 7:27 PM
(reply 5 of 6) (In reply to #4)
The SCSL code for 1.4.2 is still available; I saw it a couple of days ago. The 1.4.2 code I have binds Pipes to 0.0.0.0, and this is also still how Sockets and ServerSockets behave by default. In 1.5 the behaviour for Pipes changed to 127.0.0.1.
I can't really recommend that you modify Sun code. It might be better to bind your ServerSockets to a specific IP address, if you can manage it simply. Horrible solution.
Re: Random TCP connections created in Selector.open (in NIO)
Dec 17, 2004 10:52 AM
(reply 6 of 6) (In reply to #5)
> might be better to bind your ServerSockets to a specific IP address if you can manage it simply
When you say this you're really asking me to disable the ability of my application to bind on the loopback adapter. I guess this is what I'm going to have to do for the time being, but if I get the time I will be looking into getting Sun's implementation changed to use a single known/configurable port, if possible.