aMule Bug Tracker - aMule
View Issue Details
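ID: 0000590
Project: aMule
Category: External Conn
View Status: public
Date Submitted: 2005-10-02 20:10
Last Update: 2008-07-14 15:10
Reporter: schlumi
Assigned To: lfroen
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 2.0.3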
 
0000590: SELECT'ing on an invalid FD
Hi,

I'm running amuled 2.0.3 on NetBSD 3.0 beta. It ran very stably for me until recently, when I tried to download many more files than I usually do (normally around 20-30 simultaneously, now around 60). Upon startup everything works normally, but after around 15 minutes (I suspect the number of open connections has risen above a certain threshold by then) the problems start:
amuled starts consuming a lot of CPU time (basically 100%) and download speed drops to practically zero, while the uploads keep running just fine. Though the daemon remains responsive, it won't download anymore. I ran ktrace on amuled to see what triggers the problem; here is the significant part:

 14671 amuled 0.000029 CALL select(0x103,0x8430004,0x8434004,0,0xbfbfe7a0)
 14671 amuled 0.000103 RET select 3
 14671 amuled 0.000029 CALL select(0x103,0x8430004,0x8434004,0,0xbfbfe7a0)
 14671 amuled 0.000143 RET select 4
 14671 amuled 0.000014 CALL recvfrom(0xe1,0xbfbfe72f,1,2,0,0)
 14671 amuled 0.000009 GIO fd 225 read 1 bytes
       "\M-E"
 14671 amuled 0.000004 RET recvfrom 1
 14671 amuled 0.000017 CALL select(0xe2,0xbfbfe640,0xbfbfe620,0xbfbfe600,0xbfbfe5f8)
 14671 amuled 0.000008 RET select 1
 14671 amuled 0.000005 CALL recvfrom(0xe1,0xbfbfe5f7,1,2,0,0)
 14671 amuled 0.000006 GIO fd 225 read 1 bytes
       "\M-E"
 14671 amuled 0.000003 RET recvfrom 1
 14671 amuled 0.000055 CALL recvfrom(0xe1,0x8238500,0x1e8480,0,0,0)
 14671 amuled 0.000020 GIO fd 225 read 126 bytes
 14671 amuled 0.000005 RET recvfrom 126/0x7e
 14671 amuled 0.000199 CALL gettimeofday(0xbfbfe2d0,0)
 14671 amuled 0.000007 RET gettimeofday 0
 14671 amuled 0.000008 CALL gettimeofday(0xbfbfe220,0)
 14671 amuled 0.000004 RET gettimeofday 0
 14671 amuled 0.000042 CALL __sigaction_sigtramp(0xd,0xbfbfe100,0xbfbfe150,0xbd90dec0,1)
 14671 amuled 0.000006 RET __sigaction_sigtramp -1 errno 22 Invalid argument
 14671 amuled 0.000005 CALL __sigaction_sigtramp(0xd,0xbfbfe100,0xbfbfe150,0xbd90dea4,2)
 14671 amuled 0.000005 RET __sigaction_sigtramp 0
 14671 amuled 0.000005 CALL sendto(0xe1,0x8e01360,0x16,0,0,0)
 14671 amuled 0.000096 GIO fd 225 wrote 22 bytes
 14671 amuled 0.000006 RET sendto 22/0x16
 14671 amuled 0.000011 CALL __sigaction_sigtramp(0xd,0xbfbfe100,0xbfbfe150,0xbd90dec0,1)
 14671 amuled 0.000004 RET __sigaction_sigtramp -1 errno 22 Invalid argument
 14671 amuled 0.000004 CALL __sigaction_sigtramp(0xd,0xbfbfe100,0xbfbfe150,0xbd90dea4,2)
 14671 amuled 0.000004 RET __sigaction_sigtramp 0
 14671 amuled 0.000020 CALL gettimeofday(0xbfbfe2f8,0)
 14671 amuled 0.000005 RET gettimeofday 0
 14671 amuled 0.000023 CALL gettimeofday(0xbfbfe280,0)
 14671 amuled 0.000005 RET gettimeofday 0
 14671 amuled 0.000053 CALL gettimeofday(0xbfbfe640,0)
 14671 amuled 0.000006 RET gettimeofday 0
 14671 amuled 0.000030 CALL select(0x103,0x8430004,0x8434004,0,0xbfbfe7a0)
 14671 amuled 0.000119 RET select 4
 14671 amuled 0.000013 CALL recvfrom(0x101,0xbfbfe72f,1,2,0,0)
 14671 amuled 0.000007 GIO fd 257 read 0 bytes
 14671 amuled 0.000004 RET recvfrom 0
 14671 amuled 0.000030 CALL shutdown(0x101,2)
 14671 amuled 0.000032 RET shutdown 0
 14671 amuled 0.000018 CALL close(0x101)
 14671 amuled 0.000018 RET close 0
 14671 amuled 0.000111 CALL select(0x103,0x8430004,0x8434004,0,0xbfbfe7a0)
 14671 amuled 0.000101 RET select -1 errno 9 Bad file descriptor
 14671 amuled 0.000025 CALL select(0x103,0x8430004,0x8434004,0,0xbfbfe7a0)
 14671 amuled 0.000092 RET select -1 errno 9 Bad file descriptor

[the last 2 lines now keep repeating literally several million times]
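
The repeating EBADF result in the trace is what select() reports when one of the sets it is given still contains a descriptor that has been closed. As a minimal sketch of one way to break such a loop (not aMule's actual code), the helper below probes every descriptor in a set with fcntl(F_GETFD) and clears the ones that are no longer open. The name purge_bad_fds and its signature are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: after select() fails with EBADF, drop every
 * descriptor in `set` that the kernel no longer considers open.
 * Returns the number of descriptors removed.
 */
static int purge_bad_fds(fd_set *set, int nfds)
{
    int removed = 0;

    assert(set != NULL);
    if (nfds > FD_SETSIZE)
        nfds = FD_SETSIZE;          /* never index past the fd_set */

    for (int fd = 0; fd < nfds; fd++) {
        if (!FD_ISSET(fd, set))
            continue;
        /* F_GETFD fails with EBADF if fd is not an open descriptor. */
        if (fcntl(fd, F_GETFD) == -1 && errno == EBADF) {
            FD_CLR(fd, set);
            removed++;
        }
    }
    return removed;
}
```

A caller would run this once when select() returns -1 with errno set to EBADF and then retry with the cleaned set, rather than calling select() again with unchanged arguments.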

It seems as if the closed fd isn't purged from the array of fds that is passed to select. I also noted that you call select with more than 0xFF file descriptors to watch. On NetBSD, select will only look at the first 0xFF fds by default; a workaround is given on the select manpage, but it doesn't seem like you implement it. A temporary fix is to limit the maximum number of connections to a number somewhat lower than 0xFF. In fact, even the 100% CPU usage issue *seems* to have gone away after I did that.
BTW: I have of course checked that I don't hit any ulimits or other kernel-setting-imposed resource boundaries.
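
The two suspicions above (a closed fd left in the set, and descriptors beyond what an fd_set can hold) can be illustrated with a small bookkeeping sketch. This is not taken from aMule; struct watch_set and the watch_* functions are hypothetical names, and the sketch simply assumes that a descriptor is always cleared from the set before it is closed and that anything at or above FD_SETSIZE is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical bookkeeping for the descriptors passed to select(). */
struct watch_set {
    fd_set read_fds;   /* master copy; select() gets a scratch copy */
    int max_fd;        /* highest fd currently watched, or -1 */
};

static void watch_init(struct watch_set *ws)
{
    FD_ZERO(&ws->read_fds);
    ws->max_fd = -1;
}

/* Refuse descriptors that do not fit into an fd_set. */
static bool watch_add(struct watch_set *ws, int fd)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return false;
    FD_SET(fd, &ws->read_fds);
    if (fd > ws->max_fd)
        ws->max_fd = fd;
    return true;
}

/* Remove fd from the set first, then close it. */
static void watch_close(struct watch_set *ws, int fd)
{
    if (fd >= 0 && fd < FD_SETSIZE) {
        FD_CLR(fd, &ws->read_fds);
        /* Shrink max_fd down to the highest descriptor still watched. */
        while (ws->max_fd >= 0 && !FD_ISSET(ws->max_fd, &ws->read_fds))
            ws->max_fd--;
    }
    close(fd);
}

/* Poll once with a zero timeout and return select()'s result. */
static int watch_poll(const struct watch_set *ws)
{
    fd_set scratch = ws->read_fds;   /* select() modifies its arguments */
    struct timeval tv = {0, 0};

    assert(ws->max_fd < FD_SETSIZE);
    return select(ws->max_fd + 1, &scratch, NULL, NULL, &tv);
}
```

With this ordering, a select() call issued after a connection is shut down never sees the stale descriptor, and a connection whose fd does not fit into the set is rejected up front instead of being silently ignored.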
No tags attached.
Issue History
2005-10-02 20:10  schlumi   New Issue
2005-10-30 17:07  Xaignar   Status: new => assigned
2005-10-30 17:07  Xaignar   Assigned To: => lfroen
2008-07-14 15:10  Wuischke  Status: assigned => resolved
2008-07-14 15:10  Wuischke  Resolution: open => fixed

There are no notes attached to this issue.