View Issue Details
ID: 0000780
Project: aMule
Category: Servers
View Status: public
Date Submitted: 2006-01-11 20:41
Last Update: 2006-09-26 05:08
Reporter: lao
Assigned To: Kry
Priority: normal
Severity: major
Reproducibility: random
Status: resolved
Resolution: fixed
Platform / OS / OS Version: (not specified)
Product Version: SVN
Target Version: (not specified)
Fixed in Version: SVN
Summary: 0000780: Kad does not reconnect on loss
Description: I'm not sure what causes Kad to be disconnected while ed2k continues to work, but several times now I have seen Kad marked as Not Running after several hours of uptime.

The connections to the servers and clients were working, and restarting Kad through "Bootstrap from known clients" works.

What bothers me is not so much why it loses the connection (though that may be the problem), but that it does not reconnect, even though "Reconnect on loss" is selected.

This most recently happened in the monolithic build, but I have seen the daemon behave the same way in the past.
Additional Information: Running the Jan-7 CVS snapshot.
Tags: No tags attached.
Fixed in Revision: (not specified)
Operating System: (not specified)
Attached Files:
  kadreconnect.patch (303 bytes) 2006-01-13 22:49
  kadretry.patch (1,873 bytes) 2006-02-05 09:05
  kadretry.patch (1,874 bytes) 2006-02-05 09:14


-  Notes
(0001762)
pcmaster (reporter)
2006-01-12 00:22

Yes, the daemon has the same problem. Yesterday I restarted aMule because of this: Kad was unable to reconnect. After a few restarts, I got both networks connected. See bug 779.
(0001765)
lao (reporter)
2006-01-12 01:18

After looking at the source, the problem seems to be this code in CamuleApp::OnCoreTimer (amule.cpp):

if (Kademlia::CKademlia::isRunning()) {
    Kademlia::CKademlia::process();
    // True once no Kad packet has been received for over 20 minutes.
    if (Kademlia::CKademlia::getPrefs()->hasLostConnection()) {
        StopKad();  // Kad is stopped here, and nothing starts it again.
    }
}

hasLostConnection() returns true if the Kad client has not received a packet for over 20 minutes. IIRC, this matches what my debug logs showed.

So the connection is dropped, but no attempt is made to reconnect the client using the known nodes (which works if you do it manually).
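A minimal sketch of the timeout rule described above, assuming only what this note states: Kad counts as having lost its connection once no Kad packet has arrived for more than 20 minutes. `KadConnectionMonitor` and its methods are hypothetical names used for illustration, not aMule's actual preferences class.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the lost-connection check (not aMule's API).
// Assumption: "lost" means more than `timeout` without any Kad packet.
class KadConnectionMonitor {
public:
    using Clock = std::chrono::steady_clock;

    explicit KadConnectionMonitor(Clock::time_point start,
                                  std::chrono::minutes timeout = std::chrono::minutes(20))
        : lastPacket_(start), timeout_(timeout) {
        assert(timeout_.count() > 0);  // the timeout must be positive
    }

    // Record the arrival time of a Kad packet.
    void onPacketReceived(Clock::time_point now) { lastPacket_ = now; }

    // True once strictly more than the timeout has elapsed since the last packet.
    bool hasLostConnection(Clock::time_point now) const {
        return now - lastPacket_ > timeout_;
    }

private:
    Clock::time_point lastPacket_;
    std::chrono::minutes timeout_;
};
```

The check only reports the loss; the caller decides what happens next, and in the excerpt above that caller simply stops Kad.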
(0001779)
lao (reporter)
2006-01-13 23:33

I've tried to patch this (see the uploaded file, which apparently I can't delete) by simply calling StartKad(), but according to the logs, the disconnect/reconnect never actually seems to execute.

That doesn't seem to work: the connection stays active and keeps reporting varying network information, but it seems to stagnate, and searches return very few (0-5) results.

It's possible, though unlikely, that I never actually hit the timeout in the last 24 hours.

I'll investigate further when I find some time, unless someone beats me to it.
(0001781)
lao (reporter)
2006-01-14 18:33

Conclusion.

In short, the above patch can be applied and fixes this issue. When the code actually gets executed, it works as expected.

The Kad search problems I noticed, which made me think something was wrong, appear to be unrelated to this. Kad seems to generally return no (or very few) results for single-word searches; I see the same in various older versions I tested.

Is this something known/expected with Kad? Or is it worth a new ticket?
(0001857)
lao (reporter)
2006-02-05 09:07
edited on: 2006-02-05 09:14

Uploaded a new patch that retries the reconnect if it fails.

Could be prettier, though. :)

EDIT: fixed a typo.
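The patch contents are not reproduced in this report. As a hypothetical illustration of the retry idea only (not the code in kadretry.patch), a reconnect can be kept pending and re-attempted on later timer ticks until a start succeeds or an attempt budget runs out. `KadReconnector`, `tryStart`, and `maxAttempts` are assumed names.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical retry helper: NOT the contents of kadretry.patch.
// Assumption: tryStart() attempts to start Kad and returns whether Kad is
// running afterwards; tick() is called once per core-timer tick.
class KadReconnector {
public:
    KadReconnector(std::function<bool()> tryStart, int maxAttempts)
        : tryStart_(std::move(tryStart)), maxAttempts_(maxAttempts) {
        assert(maxAttempts_ > 0);  // at least one attempt must be allowed
    }

    // Call when the lost-connection check has just stopped Kad.
    void onConnectionLost() {
        pending_ = true;
        attempts_ = 0;
    }

    // Call once per timer tick: retries until a start succeeds or the
    // attempt budget is used up.
    void tick() {
        if (!pending_ || attempts_ >= maxAttempts_) return;
        ++attempts_;
        if (tryStart_()) pending_ = false;
    }

    bool pending() const { return pending_; }
    int attempts() const { return attempts_; }

private:
    std::function<bool()> tryStart_;
    int maxAttempts_;
    int attempts_ = 0;
    bool pending_ = false;
};
```

The attempt cap is a choice made for this sketch, so that a bootstrap that keeps failing does not retry forever; a real implementation might also wait several ticks between attempts.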

(0002002)
Kry (manager)
2006-05-24 04:04

Kind of late, but reviewing.
(0002003)
Kry (manager)
2006-05-24 04:10

For now, I just added

    if (thePrefs::Reconnect()) {
        StartKad();
    }

to the core timer code.

I don't see much point in your other code, but feel free to point out my stupidity.
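A simplified model of the core-timer flow with this change, assuming the `thePrefs::Reconnect()` check sits directly after `StopKad()` inside the lost-connection branch (the note only says it was added "to the core timer code"). The Kad and preference calls are replaced by plain fields in a hypothetical `KadState` struct so the control flow can be exercised on its own.

```cpp
#include <cassert>

// Hypothetical model: fields stand in for the real Kad and preference calls.
struct KadState {
    bool running = false;
    bool lostConnection = false;  // stands in for getPrefs()->hasLostConnection()
    bool reconnectPref = false;   // stands in for thePrefs::Reconnect()
    int starts = 0;               // how many times start() was called

    void start() {
        running = true;
        lostConnection = false;
        ++starts;
    }
    void stop() { running = false; }
};

// One core-timer tick: stop Kad on a lost connection, then start it again
// if "Reconnect on loss" is enabled.
void onCoreTimer(KadState& kad) {
    if (kad.running) {
        // (Kademlia processing would happen here.)
        if (kad.lostConnection) {
            kad.stop();
            if (kad.reconnectPref) {
                kad.start();
            }
        }
    }
}
```

With the preference off, Kad stays stopped after a lost connection, as it always did before this change.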
(0002097)
hramrach (reporter)
2006-08-22 21:17

I see this problem (and it can be easily reproduced/tested this way) when I use suspend to disk/hibernation.

After resume, Kademlia stops, while ed2k resumes normal operation (although all peers get disconnected, of course).
(0002125)
Kry (manager)
2006-09-26 05:08

Solved, I guess.

- Issue History
Date Modified    | Username | Field            | Change
2006-01-11 20:41 | lao      | New Issue        |
2006-01-12 00:22 | pcmaster | Note Added       | 0001762
2006-01-12 01:18 | lao      | Note Added       | 0001765
2006-01-13 22:49 | lao      | File Added       | kadreconnect.patch
2006-01-13 22:59 | lao      | Note Added       | 0001778
2006-01-13 23:17 | lao      | Note Deleted     | 0001778
2006-01-13 23:33 | lao      | Note Added       | 0001779
2006-01-14 00:50 | Kry      | Status           | new => assigned
2006-01-14 00:50 | Kry      | Assigned To      | => Kry
2006-01-14 18:33 | lao      | Note Added       | 0001781
2006-02-05 09:05 | lao      | File Added       | kadretry.patch
2006-02-05 09:07 | lao      | Note Added       | 0001857
2006-02-05 09:14 | lao      | File Added       | kadretry.patch
2006-02-05 09:14 | lao      | Note Edited      | 0001857
2006-05-24 04:04 | Kry      | Note Added       | 0002002
2006-05-24 04:10 | Kry      | Note Added       | 0002003
2006-08-22 21:17 | hramrach | Note Added       | 0002097
2006-09-26 05:08 | Kry      | Status           | assigned => resolved
2006-09-26 05:08 | Kry      | Fixed in Version | => SVN
2006-09-26 05:08 | Kry      | Resolution       | open => fixed
2006-09-26 05:08 | Kry      | Note Added       | 0002125

