[Encore] forked task stops after a few weeks
Daniel Jung
jung at uib.no
Tue Feb 23 06:56:39 MST 2010
Hi all
I need help finding out how to track down why a repeatedly forked task
suddenly aborts.
I have a verb that checks for purloined objects (players) and moves them
home, then forks/suspends and runs again. Every once in a while, the
process stops, and I don't know why. The code has a debugging feature
which prints errors (code line numbers, callers, stacks, comments ... )
to a debugger; it also counts and prints the ticks left (which are
replenished by suspending anyway) to rule out dying from too few ticks.
I think I noticed that with the fork delay set to 5 minutes, the task
aborted sooner than it does now with the delay set to 10 minutes. Last
time, the task ran smoothly for 17 days (one instance every ten
minutes), moving guests and players home every now and then, and then it
stopped.
A rough sketch of the structure:
--------------------------
$housekeeper:run_players()
--------------------------
 9: for p in (players())
10:   $command_utils:suspend_if_needed(0);
11:   try
        /* if p is "not connected" */
        /* check p's home for validity etc. */
        /* PRINT DEBUG on error */
        /* move p to its home or suitable place */
        /* PRINT DEBUG on actual moving, or on error */
52:   except error (ANY)
        /* PRINT DEBUG on error */
57:   endtry
58: endfor
    /* PRINT DEBUG telling fork */
61: fork (delay)
62:   this:(verb)(@args);
63: endfork
65: return result;
The last debug entry is this one, exactly like the preceding ones.
**************** DEBUG *****************
Thu Feb 18 00:09:36 2010 CST
Daniel: debug index 2477
----------------------------------------
running $housekeeper:run_players, line 60
running $housekeeper:run_players, line 62
----------------------------------------
about to fork, ticks left: 10450
****************************************
There are no indications, no error reports, and no weird objects to
handle (as far as I can see) which could have caused the task to drop
out. The indexing began at 0 when I started the first task.
I see that the property gets fairly crowded with 2477 indices, but when
I restart the procedure, it adds the new ones without complaint, and
we're at index 2874 now (meaning: writing to the long property doesn't
seem to have stopped the task).
The ticks left vary between 4012 and 14334.
The server log has nothing for the time this occurred (and the ten
following minutes):
Feb 18 00:04:33: DISCONNECTED: #-509150 on port 7878 from
spider36.yandex.ru, port 61218
Feb 18 00:09:06: ACCEPT: #-509151 on port 7878 from
66-199-234-66.reverse.ezzi.net, port 34716
Feb 18 00:09:07: DISCONNECTED: #-509151 on port 7878 from
66-199-234-66.reverse.ezzi.net, port 34716
Feb 18 00:10:22: ACCEPT: #-509152 on port 7878 from
msnbot-65-55-207-94.search.msn.com, port 56255
Feb 18 00:10:22: DISCONNECTED: #-509152 on port 7878 from
msnbot-65-55-207-94.search.msn.com, port 56255
Feb 18 00:12:30: ACCEPT: #-509153 on port 7878 from
66-199-234-66.reverse.ezzi.net, port 41118
Feb 18 00:12:50: CLIENT DISCONNECTED: #-509153 on port 7878 from
66-199-234-66.reverse.ezzi.net, port 41118
Feb 18 00:14:22: ACCEPT: #-509154 on port 7878 from
msnbot-65-55-207-72.search.msn.com, port 25311
Feb 18 00:14:22: DISCONNECTED: #-509154 on port 7878 from
msnbot-65-55-207-72.search.msn.com, port 25311
Feb 18 00:15:09: ACCEPT: #-509155 on port 7878 from
b3091297.crawl.yahoo.net, port 50798
Feb 18 00:15:09: DISCONNECTED: #-509155 on port 7878 from
b3091297.crawl.yahoo.net, port 50798
Feb 18 00:19:36: CLIENT DISCONNECTED: mzr (#1462) on port 8200 from
124.160.46.113, port 31601
Feb 18 00:29:06: ACCEPT: #-509156 on port 7878 from spider36.yandex.ru,
port 64753
Feb 18 00:29:06: DISCONNECTED: #-509156 on port 7878 from
spider36.yandex.ru, port 64753
There is a checkpoint at 00:36, but there is one every hour anyway.
Checkpoints don't seem to harm the tasks, and the next instances should
have run at 00:19 and 00:29, before that checkpoint.
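One way to notice such a drop-out sooner might be to ask the server
whether a run_players task is still queued. The following is only a
minimal sketch, not something from the running code: the verb name
$housekeeper:is_running is hypothetical, it assumes that run_players is
defined on $housekeeper itself and was invoked under that name, and it
must run with wizard permissions (or as the task owner), because
queued_tasks() otherwise lists only the caller's own tasks.

```moo
/* Hypothetical verb $housekeeper:is_running (this none this). */
/* Assumption: run_players is defined on $housekeeper itself. */
for t in (queued_tasks())
  /* Each entry is {task-id, start-time, x, y, programmer, */
  /* verb-loc, verb-name, line, this}; t[6] is the object  */
  /* defining the verb and t[7] is the verb name.          */
  if (t[6] == $housekeeper && t[7] == "run_players")
    return 1;
  endif
endfor
return 0;
```

If this returns 0 more than one delay period after the last debug
entry, the chain has broken somewhere between the fork and the start of
the next run.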
Thank you for any input.
Greetings,
- Daniel