Strange error 10038 in select()
JJe
2004-08-12 06:41:05 UTC
Hello group

My application works with a number of TCP connections, which are handled
in a loop using the select() function. Occasionally (very rarely, in fact)
my application hangs in an endless loop. The reason is that select()
starts returning error 10038 (An operation was attempted on
something that is not a socket.).

I was sure all the sockets were valid; nevertheless, I added debug
code to my application to locate the socket that causes the
error. Here is the application in pseudocode:

void mainloop()
{

    .
    .
    .

    fd_set rset, wset, eset;
    FD_ZERO(&rset);
    FD_ZERO(&wset);
    FD_ZERO(&eset);

    for (all my sockets)
        FD_SET(socket, (appropriate set));

    error = select(sockmax, &rset, &wset, &eset, &to);
    if (error == SOCKET_ERROR && GetLastError() == 10038)
        debug10038();

    .
    .
    .
}

void debug10038()
{
    for (all my sockets) {

        fd_set rset, wset, eset;
        FD_ZERO(&rset);
        FD_ZERO(&wset);
        FD_ZERO(&eset);

        FD_SET(socket, &rset);
        FD_SET(socket, &wset);
        FD_SET(socket, &eset);

        error = select(sockmax, &rset, &wset, &eset, &to);
        if (error == SOCKET_ERROR && GetLastError() == 10038) {
            printf("Invalid socket found!\n");
            assert(0);
        }
    }
}


To my surprise however, all the sockets were really correct. The
debug10038() function always finishes without indicating a socket being
illegal.

Can somebody tell me why select() could be returning error 10038 even
though its parameters are correct? Thank you.

JJ.
Alun Jones [MSFT]
2004-08-12 16:16:12 UTC
Post by JJe
To my surprise however, all the sockets were really correct. The
debug10038() function always finishes without indicating a socket being
illegal.
Can somebody tell me why select() could be returning error 10038 even
though its parameters are correct? Thank you.
The only situation I can think of that would cause an error when sockets are
collected together, but no error when sockets are separated, is that you are
providing sockets of different types.

From the MSDN documentation for select
<http://msdn.microsoft.com/library/en-us/winsock/winsock/select_2.asp>:

"The sockets contained within the fd_set structures must be associated with
a single service provider. For the purpose of this restriction, sockets are
considered to be from the same service provider if the WSAPROTOCOL_INFO
structures describing their protocols have the same providerId value."

Check to make absolutely certain that the sockets you have are all TCP
sockets. Your debug10038 routine could use the getsockopt() call on each
socket with the SO_PROTOCOL_INFO option to get a WSAPROTOCOL_INFO object,
and compare providerId values.
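
A minimal sketch of that check, for illustration only. It is
Windows-specific (the whole block is guarded by _WIN32), and the helper
name first_provider_mismatch and its return codes are hypothetical, not
something from the application discussed here. It queries each socket with
getsockopt(SO_PROTOCOL_INFO) and compares the ProviderId field against that
of the first socket:

```c
#ifdef _WIN32
#include <winsock2.h>               /* link with Ws2_32.lib */
#include <string.h>

/*
 * Hypothetical helper: returns the index of the first socket whose
 * provider differs from that of socks[0], -1 if all providers match
 * (or count is 0), and -2 if getsockopt() fails on some socket.
 */
int first_provider_mismatch(const SOCKET *socks, int count)
{
    WSAPROTOCOL_INFO first, cur;
    int i, len;

    memset(&first, 0, sizeof(first));
    for (i = 0; i < count; i++) {
        len = (int)sizeof(cur);
        if (getsockopt(socks[i], SOL_SOCKET, SO_PROTOCOL_INFO,
                       (char *)&cur, &len) == SOCKET_ERROR)
            return -2;              /* inspect WSAGetLastError() here */
        if (i == 0)
            first = cur;            /* reference provider */
        else if (memcmp(&first.ProviderId, &cur.ProviderId,
                        sizeof(cur.ProviderId)) != 0)
            return i;               /* provider differs from socks[0] */
    }
    return -1;
}
#endif /* _WIN32 */
```

Calling it on the same list of sockets that goes into the fd_sets, right
after select() fails, would show whether a mixed-provider set is a
plausible cause.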

Alun.
~~~~
JJe
2004-08-13 12:36:49 UTC
Post by Alun Jones [MSFT]
Post by JJe
To my surprise however, all the sockets were really correct. The
debug10038() function always finishes without indicating a socket being
illegal.
Can somebody tell me why select() could be returning error 10038 even
though its parameters are correct? Thank you.
Check to make absolutely certain that the sockets you have are all TCP
sockets. Your debug10038 routine could use the getsockopt() call on each
socket with the SO_PROTOCOL_INFO option to get a WSAPROTOCOL_INFO object,
and compare providerId values.
Thanks! Most probably this was the cause of my problems. I have
analyzed several cases, and in all of them I found third-party
software installed that dynamically creates or removes custom providers.
I fixed the problem by forcing a specific providerId in the call to
WSASocket().

Thanks again.

JJ.
Arkady Frenkel
2004-08-15 07:51:39 UTC
It would be nice if the great old MSDN article
"Give Me a Handle, and I'll Show You an Object"
by Ruediger Asche (more than 10 years ago) were applicable to Winsock
too :)
Arkady
JJe
2004-08-31 07:03:59 UTC
Post by JJe
Post by Alun Jones [MSFT]
Post by JJe
Can somebody tell me why select() could be returning error 10038 even
though its parameters are correct? Thank you.
Check to make absolutely certain that the sockets you have are all TCP
sockets. Your debug10038 routine could use the getsockopt() call on each
socket with the SO_PROTOCOL_INFO option to get a WSAPROTOCOL_INFO object,
and compare providerId values.
Thanks! Most probably this was the cause of my problems. I have
analyzed several cases, and in all of them I found third-party
software installed that dynamically creates or removes custom providers.
I fixed the problem by forcing a specific providerId in the call to
WSASocket().
Hello again

Unfortunately, although the occurrence is much less frequent, the error
still persists. :-(

All the sockets again pass the debug function without errors. I have
improved the function so that it now saves the WSAPROTOCOL_INFO structure
of every socket. All of them indicate the MSAFD Tcpip provider. There is
always one UDP socket and many TCP sockets, all of which are in
nonblocking mode. I have no idea what I could be doing wrong. Any ideas?

Thanks.

JJ.
Alun Jones [MSFT]
2004-09-01 18:17:11 UTC
Post by JJe
Unfortunately, although the occurrence is much less frequent, the error
still persists. :-(
When you say "less frequent", do you mean it happens on fewer machines, or
happens on one machine with lower frequency than before?
Post by JJe
All the sockets again pass the debug function without errors. I have
improved the function so that it now saves the WSAPROTOCOL_INFO structure
of every socket. All of them indicate the MSAFD Tcpip provider. There is
always one UDP socket and many TCP sockets, all of which are in
nonblocking mode. I have no idea what I could be doing wrong. Any ideas?
All I can suggest at the moment is that you may want to separate out the UDP
and TCP sockets.

Check that the sockets' WSAPROTOCOL_INFO structure contains exactly the same
providerId value. Is the error you're getting still the WSAENOTSOCK error,
or is it something else?

Alun.
~~~~
JJe
2004-09-02 06:00:39 UTC
Post by Alun Jones [MSFT]
Post by JJe
Unfortunately, although the occurrence is much less frequent, the error
still persists. :-(
When you say "less frequent", do you mean it happens on fewer machines, or
happens on one machine with lower frequency than before?
Post by JJe
All the sockets again pass the debug function without errors. I have
improved the function so that it now saves the WSAPROTOCOL_INFO structure
of every socket. All of them indicate the MSAFD Tcpip provider. There is
always one UDP socket and many TCP sockets, all of which are in
nonblocking mode. I have no idea what I could be doing wrong. Any ideas?
All I can suggest at the moment is that you may want to separate out the UDP
and TCP sockets.
Check that the sockets' WSAPROTOCOL_INFO structure contains exactly the same
providerId value. Is the error you're getting still the WSAENOTSOCK error,
or is it something else?
It is happening on fewer machines. There have been two cases so far.

Yes, the error is still 10038 (WSAENOTSOCK). The UDP socket and all of
the TCP sockets contain the following providerId (MSAFD Tcpip):

{E70F1AA0-AB8B-11CF-8CA3-00805F48A192}

Of course I could separate the UDP socket. It just seems strange,
however, because its providerId is the same. In one of the two cases my
application had worked flawlessly for almost 4 days before the error
occurred.

Thanks,
JJ.
Alun Jones [MSFT]
2004-09-02 19:52:50 UTC
Post by JJe
Of course I could separate the UDP socket. It just seems strange,
however, because its providerId is the same. In one of the two cases my
application had worked flawlessly for almost 4 days before the error
occurred.
Is it possible that this might be happening because the socket is invalid at
the time of calling select(), but has become valid again by the time you
call your validation routine?

The Windows socket stack is pretty quick to reuse old handles - once you've
called closesocket() on a socket handle, that handle may be used again
immediately in the next socket() call your process makes. This is one of
the hazards of writing multithreaded Winsock programs - that you may be
using a handle that has been closed and reopened by another thread.
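
The reuse itself is easy to observe. The following is only a sketch: the
function name handle_value_reused is made up for illustration, and the
POSIX branch exists only so the same file builds outside Windows. It closes
a socket, opens a new one, and reports whether the new socket received the
same handle value. On Winsock reuse is likely but not guaranteed, so treat
the result as an observation, not a contract:

```c
#ifdef _WIN32
#include <winsock2.h>               /* link with Ws2_32.lib */
typedef SOCKET sock_t;
#else
#include <sys/socket.h>
#include <unistd.h>
typedef int sock_t;
#define INVALID_SOCKET (-1)
#define closesocket close
#endif

/*
 * Hypothetical demo: returns 1 if a freshly created socket got the same
 * handle value as one that was just closed, 0 if it got a different
 * value, and -1 if a socket could not be created at all.
 */
int handle_value_reused(void)
{
    sock_t old_handle, new_handle;
    int result = -1;
#ifdef _WIN32
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return -1;
#endif
    old_handle = socket(AF_INET, SOCK_STREAM, 0);
    if (old_handle != INVALID_SOCKET) {
        closesocket(old_handle);    /* old_handle is now a stale value */
        new_handle = socket(AF_INET, SOCK_STREAM, 0);
        if (new_handle != INVALID_SOCKET) {
            /* Equal values mean code still holding old_handle would
               silently operate on the new socket. */
            result = (new_handle == old_handle);
            closesocket(new_handle);
        }
    }
#ifdef _WIN32
    WSACleanup();
#endif
    return result;
}
```

If this returns 1, a validation pass that runs after select() has failed
can find every handle valid even though one of them was closed at the
moment select() was called.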

It's possible that you have worked this down to the sort of basis where the
law of diminishing returns sets in. That's for you to decide, obviously,
but now you're at the point where there's little I can suggest unless you
can create a scenario that is going to generate this more repeatably.

Alun.
~~~~

Nathan D. Lee
2004-08-12 17:11:35 UTC
I notice that in your debug routine you re-add all of your sockets to
different fd_sets.

Rather than concluding that your sockets are all actually valid, one other
very real possibility is that your main-loop fd_set has become
corrupted, or that you left in a handle that had been closed.

In debug10038 you also add each socket to EACH fd_set, whereas in the main
loop the pseudocode appears to add it to only ONE of the sets.

I realize that may be nitpicking, but it is such a strange issue that I need
to ask anyway. I still tend to think the issue may be tied up in the actual
code hidden behind the pseudocode "for all my sockets".
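
One way to narrow this down, sketched here under assumptions: keep the
exact array of handles that was used to fill the fd_sets and probe that
same array immediately after select() fails, without rebuilding anything.
The helper name find_rejected_sockets is hypothetical, and the POSIX branch
is included only so the sketch compiles outside Windows. It uses
getsockopt(SO_TYPE) as a cheap query that fails when a handle is not a
live socket:

```c
#include <stddef.h>
#ifdef _WIN32
#include <winsock2.h>               /* link with Ws2_32.lib */
typedef SOCKET sock_t;
typedef int optlen_t;
#else
#include <sys/socket.h>
typedef int sock_t;
typedef socklen_t optlen_t;
#endif

/*
 * Hypothetical helper: probes each handle in socks[] and stores the
 * indices of those the stack rejects in bad_idx[] (which must have room
 * for count entries). Returns the number of rejected handles.
 */
size_t find_rejected_sockets(const sock_t *socks, size_t count,
                             size_t *bad_idx)
{
    size_t i, bad = 0;

    for (i = 0; i < count; i++) {
        int type = 0;
        optlen_t len = (optlen_t)sizeof(type);

        /* Fails with WSAENOTSOCK on Winsock (EBADF or ENOTSOCK on POSIX)
           when the handle does not refer to an open socket. */
        if (getsockopt(socks[i], SOL_SOCKET, SO_TYPE,
                       (char *)&type, &len) != 0)
            bad_idx[bad++] = i;
    }
    return bad;
}
```

Because it walks the same array that fed the failing select() call, a
closed or never-initialized entry shows up by index instead of being
masked by a freshly built set.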


Cheers!

Nathan Lee
Systems Programmer/Analyst Specialist
Marshfield Clinic, Marshfield, WI