Why cron jobs fail silently (and how to catch them)

Cron jobs are the workhorses of backend infrastructure. They run database cleanups, send scheduled emails, generate reports, take backups, and keep data fresh. Most of the time they work perfectly. When they don't, you usually find out too late.

The fundamental problem is asymmetry: when a cron job succeeds, you know because you see the result. When it fails, often nothing happens — no log, no email, no error page. The failure mode is silence.

Why cron jobs go silent

Non-zero exit with no notification

Unix cron mails a job's output to the crontab owner, or to the address set in MAILTO, but only if the server has a working mail transfer agent. Most production servers don't. The mail is also triggered by output, not by exit status: in most cron implementations, a job that exits with code 1 without printing anything generates no mail at all. From cron's perspective, the job simply "ran".

The job never started

Syntax errors in crontab, wrong filesystem paths, permission issues, or the cron daemon itself crashing prevent jobs from running at all. Because cron has no "job didn't run" notification mechanism — it can't know what was supposed to run and didn't — these failures are completely invisible.

The job ran but produced no useful output

Many scripts swallow errors internally: a Python script with a broad except: pass, a bash script without set -e, or a database command that silently rolls back. The job exits 0, cron is satisfied, but nothing useful happened.
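
To make the difference concrete, here is a minimal Python sketch. The names clean_rows, run_swallowing, and run_reporting are hypothetical and exist only for illustration; the point is that the first wrapper hides the failure behind exit code 0, while the second logs it and returns a non-zero code.

```python
import sys


def clean_rows(rows):
    """Hypothetical job step: fails loudly on a row it cannot process."""
    cleaned = []
    for row in rows:
        if row is None:
            raise ValueError("unprocessable row")
        cleaned.append(row)
    return cleaned


def run_swallowing(rows):
    """Anti-pattern: the error disappears and the exit code stays 0."""
    try:
        clean_rows(rows)
    except Exception:
        pass
    return 0


def run_reporting(rows):
    """Log the error to stderr and turn it into a non-zero exit code."""
    try:
        clean_rows(rows)
    except Exception as exc:
        print(f"cleanup failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

A cron entry point would typically end with sys.exit(run_reporting(rows)), so the exit code reflects what actually happened.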

The job ran slowly and was killed

Some schedulers kill jobs that exceed a timeout. Others, classic cron among them, let the previous instance keep running while the next one fires, so slow runs pile up. Neither situation typically produces a useful alert.
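
One way to make both situations explicit is to have the job guard itself. The sketch below assumes a Unix host and uses hypothetical helper names (try_lock, run_with_timeout): a non-blocking file lock lets a new run detect that the previous one is still going, and a subprocess timeout turns a hung command into a distinct exit code.

```python
import fcntl
import os
import subprocess


def try_lock(lock_path):
    """Return an open fd holding an exclusive lock, or None if another run holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd  # keep it open while the job runs; closing it releases the lock


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd; return its exit code, or 124 if it was killed for running too long."""
    try:
        return subprocess.run(cmd, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        return 124  # same convention as GNU timeout(1)
```

If try_lock returns None, the job can exit non-zero without pinging, so an overlapping or stuck run shows up as a missed check-in instead of passing unnoticed.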

Closing the detection gap

Cron was designed as a scheduler, not a monitor. It fires tasks on a schedule and moves on. There is no built-in mechanism for "I expected job X to complete successfully and it didn't."

To close this gap, you need an external system that expects a completion signal from the job and alerts when the signal doesn't arrive. This is the dead man's switch pattern: the job checks in when it finishes, and the absence of a check-in triggers the alert.
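
The monitor's side of this pattern is small. As a rough sketch (is_overdue and its parameters are hypothetical, not any particular service's API), the check amounts to comparing the current time against the last check-in plus the expected period and a grace window:

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_ping, period, grace, now=None):
    """True when the next check-in is late by more than the grace window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now > last_ping + period + grace
```

For a nightly job, period would be timedelta(days=1), and grace covers normal variation in run time so that a slightly slow run does not raise a false alarm.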

The pattern: ping only on success

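The most reliable fix is simple: add one line at the end of your job that only runs if everything above it succeeded. In bash, set -euo pipefail makes the script exit at the first command that fails, with a few exceptions (for example, commands tested in an if condition or on the left side of && or ||):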

bash

#!/bin/bash
set -euo pipefail

# Your job code here
/scripts/backup.sh

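# Only reached if backup.sh exited 0.
# --max-time and --retry keep a slow or flaky network from hanging or dropping the ping.
curl -fsS --max-time 10 --retry 3 https://api.tymo.site/p/YOUR_TOKEN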

set -euo pipefail means: exit immediately on error (-e), treat unset variables as errors (-u), and propagate pipe failures (-o pipefail). Combined with the ping pattern, the job either pings on success or stays silent — and silence triggers an alert.

In Python, the structure is equivalent:

python

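import sys
import urllib.request

PING_URL = "https://api.tymo.site/p/YOUR_TOKEN"

def do_backup():
    # Placeholder: replace with the real job. It must raise on failure.
    raise NotImplementedError("replace with the real job")

def main():
    do_backup()  # raises on failure

try:
    main()
    # Only reached if main() did not raise.
    # The timeout keeps a hung request from stalling the job.
    urllib.request.urlopen(PING_URL, timeout=10)
except Exception as e:
    print(f"Job failed: {e}", file=sys.stderr)
    sys.exit(1)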
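
A single network blip at ping time would make a healthy job look dead. One possible refinement, sketched here with a hypothetical ping helper (the retry count, timeout, and delay are illustrative defaults, and the opener parameter exists only so the function can be exercised without a network), is to retry the check-in a few times before giving up:

```python
import sys
import time
import urllib.request


def ping(url, attempts=3, timeout=10.0, delay=2.0, opener=urllib.request.urlopen):
    """Try to check in; return True on success, False once every attempt has failed."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=timeout):
                return True
        except OSError as exc:  # URLError, HTTPError and socket timeouts derive from OSError
            print(f"ping attempt {attempt} failed: {exc}", file=sys.stderr)
            if attempt < attempts:
                time.sleep(delay)
    return False
```

Returning a boolean instead of raising lets the caller decide whether a failed check-in should change the job's exit code, and keeps "the job failed" separate from "the ping failed" in the logs.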

What MAILTO can't catch

Setting MAILTO in your crontab gets you an email with whatever your jobs write to stdout and stderr. But the most dangerous failures are the ones that exit 0 and produce no output while failing to do useful work: a backup that connected to the database but wrote an empty file, a report that ran but processed stale data, a cleanup that hit an error on the first record, caught the exception, and carried on without doing the work.

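Monitoring the expected arrival of a completion signal catches this class of failure, as long as the job checks its own result (for example, that the backup file is not empty) before it sends the signal. Monitoring output doesn't. That's why heartbeat monitoring and log monitoring are complementary, not interchangeable: they catch different things.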
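
For the empty-backup case, a result check before the ping might look like the following sketch. The function name backup_looks_valid and its thresholds are assumptions for illustration; a real job would pick checks that match what "useful work" means for it.

```python
import os
import time


def backup_looks_valid(path, min_bytes=1, max_age_seconds=24 * 3600):
    """Return True if the file exists, has at least min_bytes, and was modified recently."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return False
    fresh = time.time() - info.st_mtime <= max_age_seconds
    return info.st_size >= min_bytes and fresh
```

The job would send its completion signal only when this returns True, so an empty or stale backup results in a missed check-in and therefore an alert.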