Automatic restart of unresponsive Engine

Overview

The Engine periodically resets a watchdog timer to indicate that it is running correctly. By default, the timer expires ten minutes after its last reset. If the Engine fails to reset the timer before those ten minutes elapse, the timer triggers a restart of the Engine.

Internal faults or very complex queries involving millions of events may render the Engine unresponsive. In these cases, the watchdog timer forces the Engine to restart and thus recover from potentially blocking situations.
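
The expiry rule described above can be illustrated with a minimal shell sketch. It is not the Engine's actual implementation: the function name watchdog_expired and the timestamp arguments are hypothetical, and only the 600-second default comes from this page.

```shell
#!/bin/sh
# Default timeout from the text: ten minutes, expressed in seconds.
WATCHDOG_TIMEOUT=600

# Succeeds (exit status 0) when the timer has expired, that is, when at
# least WATCHDOG_TIMEOUT seconds have passed since the last reset.
# Both arguments are timestamps in seconds (hypothetical inputs).
watchdog_expired() {
    last_reset=$1
    now=$2
    [ $(( now - last_reset )) -ge "$WATCHDOG_TIMEOUT" ]
}

# Last reset at second 0, current time at second 700: the timer expired.
if watchdog_expired 0 700; then
    echo "restart"
else
    echo "running"
fi
```

In this sketch, a process that keeps resetting the timer never reaches the restart branch; only a process that stays silent for the whole timeout does.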

Changing the timeout value

To change the default value of the watchdog timer:

  1. Log in to the CLI of the Engine.
  2. Optional: Verify the current value of the watchdog timer by typing in:
    nxinfo config -w | grep watchdog
    • The result is the configured value of the watchdog timer in seconds. For the default value of ten minutes, the command displays 600:
    <watchdog_timeout>600</watchdog_timeout>
  3. Set the new value of the watchdog timer. For instance, to double the default from ten minutes to twenty minutes (1200 s), type in:
    nxinfo config -s tweak.watchdog_timeout=1200
  4. Restart the Engine for the new watchdog value to take effect:
    sudo systemctl restart [email protected]
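
Because the setting is expressed in seconds, it can help to derive the value from minutes before typing the command. The following is a small sketch under that assumption: the helper names minutes_to_seconds and watchdog_set_command are hypothetical, and the script only prints the nxinfo command from step 3 without running it.

```shell
#!/bin/sh
# Convert a timeout in whole minutes to the value in seconds that the
# watchdog setting expects (for example, 20 minutes -> 1200 seconds).
minutes_to_seconds() {
    echo $(( $1 * 60 ))
}

# Build the nxinfo command from step 3 for a timeout given in minutes.
# Hypothetical helper: it prints the command and does not execute it.
watchdog_set_command() {
    echo "nxinfo config -s tweak.watchdog_timeout=$(minutes_to_seconds "$1")"
}

# Print the command for a twenty-minute timeout.
watchdog_set_command 20
```

After checking the printed command, run it on the Engine CLI and restart the Engine as described in step 4.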