0000590: SELECT'ing on an invalid FD


ID: 0000590
Project: aMule
Category: External Conn
View Status: public
Date Submitted: 2005-10-02 20:10
Last Update: 2008-07-14 15:10
Reporter: schlumi
Assigned To: lfroen
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Platform:
OS Version:
Product Version: 2.0.3
Target Version:
Fixed in Version:

Summary

0000590: SELECT'ing on an invalid FD

Description

Hi,

I'm running amuled 2.0.3 on NetBSD 3.0 beta. It was very stable for me until recently, when I tried to download many more files than I usually do (normally around 20-30 simultaneously, now around 60). On startup everything works normally, but after around 15 minutes (I suspect the number of open connections has risen above a certain threshold by then) the problems start:
amuled starts consuming a lot of CPU time (basically 100%) and download speed drops to practically zero, while the uploads keep running just fine. The daemon remains responsive, but it won't download anymore. I ran ktrace on amuled to see what triggers the problem; here is the significant part:

14671 amuled 0.000029 CALL select(0x103,0x8430004,0x8434004,0,0xbfbfe7a0)
14671 amuled 0.000103 RET select 3
14671 amuled 0.000029 CALL select(0x103,0x8430004,0x8434004,0,0xbfbfe7a0)
14671 amuled 0.000143 RET select 4
14671 amuled 0.000014 CALL recvfrom(0xe1,0xbfbfe72f,1,2,0,0)
14671 amuled 0.000009 GIO fd 225 read 1 bytes
"\M-E"
14671 amuled 0.000004 RET recvfrom 1
14671 amuled 0.000017 CALL select(0xe2,0xbfbfe640,0xbfbfe620,0xbfbfe600,0xbfbfe5f8)
14671 amuled 0.000008 RET select 1
14671 amuled 0.000005 CALL recvfrom(0xe1,0xbfbfe5f7,1,2,0,0)
14671 amuled 0.000006 GIO fd 225 read 1 bytes
"\M-E"
14671 amuled 0.000003 RET recvfrom 1
14671 amuled 0.000055 CALL recvfrom(0xe1,0x8238500,0x1e8480,0,0,0)
14671 amuled 0.000020 GIO fd 225 read 126 bytes
14671 amuled 0.000005 RET recvfrom 126/0x7e
14671 amuled 0.000199 CALL gettimeofday(0xbfbfe2d0,0)
14671 amuled 0.000007 RET gettimeofday 0
14671 amuled 0.000008 CALL gettimeofday(0xbfbfe220,0)
14671 amuled 0.000004 RET gettimeofday 0
14671 amuled 0.000042 CALL __sigaction_sigtramp(0xd,0xbfbfe100,0xbfbfe150,0xbd90dec0,1)
14671 amuled 0.000006 RET __sigaction_sigtramp -1 errno 22 Invalid argument
14671 amuled 0.000005 CALL __sigaction_sigtramp(0xd,0xbfbfe100,0xbfbfe150,0xbd90dea4,2)
14671 amuled 0.000005 RET __sigaction_sigtramp 0
14671 amuled 0.000005 CALL sendto(0xe1,0x8e01360,0x16,0,0,0)
14671 amuled 0.000096 GIO fd 225 wrote 22 bytes
14671 amuled 0.000006 RET sendto 22/0x16
14671 amuled 0.000011 CALL __sigaction_sigtramp(0xd,0xbfbfe100,0xbfbfe150,0xbd90dec0,1)
14671 amuled 0.000004 RET __sigaction_sigtramp -1 errno 22 Invalid argument
14671 amuled 0.000004 CALL __sigaction_sigtramp(0xd,0xbfbfe100,0xbfbfe150,0xbd90dea4,2)
14671 amuled 0.000004 RET __sigaction_sigtramp 0
14671 amuled 0.000020 CALL gettimeofday(0xbfbfe2f8,0)
14671 amuled 0.000005 RET gettimeofday 0
14671 amuled 0.000023 CALL gettimeofday(0xbfbfe280,0)
14671 amuled 0.000005 RET gettimeofday 0
14671 amuled 0.000053 CALL gettimeofday(0xbfbfe640,0)
14671 amuled 0.000006 RET gettimeofday 0
14671 amuled 0.000030 CALL select(0x103,0x8430004,0x8434004,0,0xbfbfe7a0)
14671 amuled 0.000119 RET select 4
14671 amuled 0.000013 CALL recvfrom(0x101,0xbfbfe72f,1,2,0,0)
14671 amuled 0.000007 GIO fd 257 read 0 bytes
14671 amuled 0.000004 RET recvfrom 0
14671 amuled 0.000030 CALL shutdown(0x101,2)
14671 amuled 0.000032 RET shutdown 0
14671 amuled 0.000018 CALL close(0x101)
14671 amuled 0.000018 RET close 0
14671 amuled 0.000111 CALL select(0x103,0x8430004,0x8434004,0,0xbfbfe7a0)
14671 amuled 0.000101 RET select -1 errno 9 Bad file descriptor
14671 amuled 0.000025 CALL select(0x103,0x8430004,0x8434004,0,0xbfbfe7a0)
14671 amuled 0.000092 RET select -1 errno 9 Bad file descriptor

[the last 2 lines now keep repeating literally several million times]
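
The tail of the trace can be reproduced in isolation: select() fails with EBADF as soon as any descriptor in its sets has been closed, and it keeps failing until that descriptor is removed. Below is a minimal sketch of a loop that recovers instead of spinning. It is not aMule's code; `prune_closed_fds` and `select_pruning` are hypothetical names, and it assumes plain fd_set-based polling for readability only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: drop every descriptor from `set` that the kernel
 * reports as not open. Returns the number of descriptors removed. */
static int prune_closed_fds(fd_set *set, int nfds)
{
    int removed = 0;

    assert(set != NULL);
    if (nfds > FD_SETSIZE)
        nfds = FD_SETSIZE;              /* never index past the fd_set */

    for (int fd = 0; fd < nfds; fd++) {
        /* fcntl(F_GETFD) fails with EBADF on a closed descriptor. */
        if (FD_ISSET(fd, set) && fcntl(fd, F_GETFD) == -1 && errno == EBADF) {
            FD_CLR(fd, set);
            removed++;
        }
    }
    return removed;
}

/* Hypothetical wrapper: wait for readability on `watched`. On EBADF the
 * stale descriptors are pruned and the call is retried; if nothing could
 * be pruned the error is returned, so the caller never busy-loops. */
static int select_pruning(fd_set *watched, int nfds, fd_set *ready,
                          struct timeval *timeout)
{
    for (;;) {
        *ready = *watched;              /* select() overwrites its sets */
        int n = select(nfds, ready, NULL, NULL, timeout);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno != EBADF || prune_closed_fds(watched, nfds) == 0)
            return -1;
    }
}
```

A caller would keep one long-lived `watched` set and pass a scratch `ready` set on each iteration. Pruning on EBADF is only a safety net; clearing the descriptor from the set at the point where it is closed is the cleaner fix.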

It seems as if the closed fd (0x101, closed just before the first failing select) isn't purged from the fd sets that are passed to select. I also noticed that you call select with more than 0xFF file descriptors to watch (nfds is 0x103 in the trace). On NetBSD, select only looks at the first 0xFF fds by default; a workaround is given on the select manpage, but it doesn't seem like you implement it. A temporary fix is to limit the maximum number of connections to somewhat less than 0xFF. In fact, even the 100% CPU usage issue *seems* to have gone away after I did that.
BTW: I have of course checked that I don't hit any ulimits or other resource limits imposed by kernel settings.

Issue History
Date Modified	Username	Field	Change
2005-10-02 20:10	schlumi	New Issue
2005-10-30 17:07	Xaignar	Status	new => assigned
2005-10-30 17:07	Xaignar	Assigned To	=> lfroen
2008-07-14 15:10	Wuischke	Status	assigned => resolved
2008-07-14 15:10	Wuischke	Resolution	open => fixed

Relationships

Notes
There are no notes attached to this issue.