aMule Bug Tracker - aMule
View Issue Details
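ID: 0000780
Project: aMule
Category: Servers
View Status: public
Date Submitted: 2006-01-11 20:41
Last Update: 2006-09-26 05:08
Reporter: lao
Assigned To: Kry
Priority: normal
Severity: major
Reproducibility: random
Status: resolved
Resolution: fixed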
Product Version: SVN
Fixed in Version: SVN
0000780: Kad does not reconnect on loss
I'm not sure what would cause Kad to be disconnected, even though ed2k continues to work; but I have now several times seen Kad marked as Not Running after several hours of uptime.

The connections to the servers and clients were working, and restarting Kad through "Bootstrap from known clients" works.

What bothers me is not so much why it loses the connection (though that may be the problem), but that it does not reconnect, even though "Reconnect on loss" is selected.

This appears (most recently) in monolithic, but I have noticed the daemon behave the same way in the past.
Running Jan-7 CVS snapshot.
No tags attached.
patch kadreconnect.patch (303) 2006-01-13 22:49
http://bugs.amule.org/file_download.php?file_id=118&type=bug
patch kadretry.patch (1,873) 2006-02-05 09:05
http://bugs.amule.org/file_download.php?file_id=129&type=bug
patch kadretry.patch (1,874) 2006-02-05 09:14
http://bugs.amule.org/file_download.php?file_id=130&type=bug
Issue History
2006-01-11 20:41 | lao | New Issue
2006-01-12 00:22 | pcmaster | Note Added: 0001762
2006-01-12 01:18 | lao | Note Added: 0001765
2006-01-13 22:49 | lao | File Added: kadreconnect.patch
2006-01-13 22:59 | lao | Note Added: 0001778
2006-01-13 23:17 | lao | Note Deleted: 0001778
2006-01-13 23:33 | lao | Note Added: 0001779
2006-01-14 00:50 | Kry | Status: new => assigned
2006-01-14 00:50 | Kry | Assigned To: => Kry
2006-01-14 18:33 | lao | Note Added: 0001781
2006-02-05 09:05 | lao | File Added: kadretry.patch
2006-02-05 09:07 | lao | Note Added: 0001857
2006-02-05 09:14 | lao | File Added: kadretry.patch
2006-02-05 09:14 | lao | Note Edited: 0001857
2006-05-24 04:04 | Kry | Note Added: 0002002
2006-05-24 04:10 | Kry | Note Added: 0002003
2006-08-22 21:17 | hramrach | Note Added: 0002097
2006-09-26 05:08 | Kry | Status: assigned => resolved
2006-09-26 05:08 | Kry | Fixed in Version: => SVN
2006-09-26 05:08 | Kry | Resolution: open => fixed
2006-09-26 05:08 | Kry | Note Added: 0002125

Notes
(0001762)
pcmaster   
2006-01-12 00:22   
Yes, the daemon has the same problem. Yesterday I restarted my aMule because of this: Kad was unable to reconnect. After a few restarts, I got the two networks connected. See bug 779.
(0001765)
lao   
2006-01-12 01:18   
After looking at the source, the problem seems to be this code in CamuleApp::OnCoreTimer (amule.cpp)

if( Kademlia::CKademlia::isRunning() ) {
    Kademlia::CKademlia::process();
    if(Kademlia::CKademlia::getPrefs()->hasLostConnection()) {
        StopKad();
    }
}

hasLostConnection() will return true if the Kad client has not received a packet for over 20 minutes. IIRC this is in keeping with what my debug logs showed.

So while the connection is dropped, no attempt is made to reconnect the client based on the known nodes (which works if you do it manually).
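
To make the timeout check described above concrete, here is a minimal sketch of how such a test could work. It is not aMule's actual code: the struct `KadPrefsSketch`, the field `lastContact`, and the constant `kKadTimeoutSeconds` are hypothetical names, and treating "no packet seen yet" as "not lost" is an assumption.

```cpp
#include <ctime>

// Assumed timeout: 20 minutes, as described in the note above.
const std::time_t kKadTimeoutSeconds = 20 * 60;

// Hypothetical stand-in for the Kad preferences object (not aMule's class).
struct KadPrefsSketch {
    std::time_t lastContact = 0;  // time of the last received Kad packet, 0 = none yet

    // Record that a Kad packet arrived at time 'now'.
    void onPacketReceived(std::time_t now) { lastContact = now; }

    // True once more than the timeout has passed since the last packet.
    // Assumption: before any packet has been seen, the connection is not "lost".
    bool hasLostConnection(std::time_t now) const {
        return lastContact != 0 && (now - lastContact) > kKadTimeoutSeconds;
    }
};
```

With a check like this, the timer code quoted above stops Kad once the timeout passes, and nothing in that branch starts it again.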
(0001779)
lao   
2006-01-13 23:33   
I've tried to patch this (see the uploaded file, which I can't seem to delete) by simply calling StartKad(), but this never actually seems to execute the disconnect/reconnect, at least not according to the logs.

That doesn't seem to work: the connection stays active and reports varying network information, but it seems to stagnate, and searches return very few (0-5) results.

It's possible, though unlikely, that I never actually hit the timeout in the last 24 hours.

I'll investigate further when I find some time, unless someone beats me to it.
(0001781)
lao   
2006-01-14 18:33   
Conclusion.

In short, the above patch can be applied and fixes this issue. When the code actually gets executed, it works as expected.

The issues with Kad searches I noticed, and that made me think something was wrong, appear to be completely unrelated to this. It seems as if Kad generally returns no (very few?) results for single-word searches. This appears in various older versions I tested.

Is this something known/expected with Kad? Or is it worth a new ticket?
(0001857)
lao   
2006-02-05 09:07   
(edited on: 2006-02-05 09:14)
Uploaded new patch, to retry the reconnect should it fail.

Could be prettier though.. :)

EDIT: fixed a typo.
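
For readers without the attachment, the general idea of retrying a failed reconnect could be sketched as below. This is not the content of kadretry.patch: `KadRetrySketch`, `onConnectionLost`, `onTimerTick`, and the `tryStart` callback are all hypothetical names chosen for illustration.

```cpp
#include <functional>

// Hypothetical retry helper: keeps trying to restart Kad on each timer tick
// until one attempt reports success.
struct KadRetrySketch {
    bool wantReconnect = false;  // set when the connection was lost
    int attempts = 0;            // restart attempts since the last loss

    // Called when the loss of the Kad connection is detected.
    void onConnectionLost() {
        wantReconnect = true;
        attempts = 0;
    }

    // Called from a periodic timer. 'tryStart' returns true if Kad came up.
    void onTimerTick(const std::function<bool()>& tryStart) {
        if (!wantReconnect) {
            return;  // nothing to do while connected
        }
        ++attempts;
        if (tryStart()) {
            wantReconnect = false;  // reconnected, stop retrying
        }
    }
};
```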

(0002002)
Kry   
2006-05-24 04:04   
Kinda late, but reviewing.
(0002003)
Kry   
2006-05-24 04:10   
For now, I just added a

                if (thePrefs::Reconnect()) {
                    StartKad();
                }

to the core timer code.

I don't see much point in your other code, but you're free to point out my stupidity.
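
Put together with the timer code quoted earlier in this issue, the change could behave roughly as in the following self-contained sketch. The exact placement of the new check is an assumption (here, right after StopKad()), and `KadSketch` with its fields is a hypothetical stand-in for the application state, not aMule's classes.

```cpp
// Hypothetical model of the core timer logic with the reconnect check added.
struct KadSketch {
    bool running = false;         // stands in for CKademlia::isRunning()
    bool lostConnection = false;  // stands in for hasLostConnection()
    bool reconnectPref = true;    // stands in for thePrefs::Reconnect()
    int starts = 0;               // number of StartKad() calls, for inspection

    void StartKad() {
        running = true;
        lostConnection = false;   // assumption: a fresh start clears the lost state
        ++starts;
    }

    void StopKad() { running = false; }

    // One pass of the core timer.
    void OnCoreTimer() {
        if (running) {
            if (lostConnection) {
                StopKad();
                // Assumed placement of the added check.
                if (reconnectPref) {
                    StartKad();
                }
            }
        }
    }
};
```

In this model, a lost connection with "Reconnect on loss" enabled leaves Kad running after the next timer pass; with the preference disabled, Kad stays stopped as before.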
(0002097)
hramrach   
2006-08-22 21:17   
I see this problem (and it can be easily reproduced/tested this way) when I use suspend to disk/hibernation.

After resume, Kademlia stops and ed2k resumes normal operation (although all peers get disconnected, of course).
(0002125)
Kry   
2006-09-26 05:08   
Solved I guess.