Postfix, Amavis, and ClamAV Extremely Slow
I was recently experiencing longer and longer delays for incoming mail delivered by Postfix to Dovecot. In the end, it turned out to be a simple permission problem: ClamAV was not able to read Amavis' files, and obviously waited for a timeout for every single mail.
In fact, multiple (admittedly stupid) errors contributed more or less to the overall problem.
ClamAV Not In Amavis Group
The biggest issue was caused by clamd
not being able to read the mail files created by Amavis. The log output looked like this:
Dec 2 17:36:30 mail amavis[10284]: (10284-01) (!)run_av (ClamAV-clamd) FAILED - unexpected , output="/var/amavis/tmp/amavis-20191202T173630-10284-9LpO_9G2/parts: lstat() failed: Permission denied. ERROR\n"
Dec 2 17:36:30 mail amavis[10284]: (10284-01) (!)ClamAV-clamd av-scanner FAILED: CODE(0x55c9fd7d88c0) unexpected , output="/var/amavis/tmp/amavis-20191202T173630-10284-9LpO_9G2/parts: lstat() failed: Permission denied. ERROR\n" at (eval 85) line 950.
On our server, amavisd-new
is running as user amavis
and group amavis
, while clamd
runs as user clamav
and group clamav
. Putting user clamav
into the group amavis
fixed the issue completely. Instead of 2 minutes per mail, the virus scan was no accomplished in fractions of a second.
All Primary Virus Scanners Failed
Another error message I had seen a lot of times but ignored out of laziness was this:
Dec 2 17:36:30 mail amavis[10284]: (10284-01) (!)WARN: all primary virus scanners failed, considering backups
It turned out, that I had missed one local modification in the Amavis configuration file /etc/amavid.conf
after a post upgrade. It defines an array @av_scanners
which contains candidates for all kind of commercial(?) virus scanners. The free ClamAV was commented out.
The error message means that amavisd-new
tried out all candidates but all of the failed. Sure, because none of them is installed. The error message went away after uncommenting the ClamAV entry:
@av_scanners = (
...
['ClamAV-clamd',
\&ask_daemon, ["CONTSCAN {}\n", "/var/run/clamav/clamd.sock"],
qr/\bOK$/m, qr/\bFOUND$/m,
qr/^.*?: (?!Infected Archive)(.*) FOUND$/m ],
...
);
You have to make sure that the path to the ClamAV socket file (/var/run/clamav/clamd.sock
) is correct. In doubt, consult the configuration variable LocalSocket
in /etc/clamd.conf
:
...
LocalSocket /var/run/clamav/clamd.sock
...
In fact, I don't know how serious the "all primary virus scanners failed" issue was but the warning from the log files is now gone.
Too Many Undeliverable Mails
The above issues were probably older but since we are a relatively small organization, delivery times of 2 minutes per mail was nothing that immediately caused a big failure. That changed however, after a cronjob started failing some time ago.
Unfortunately, the user that was running the cronjob was not set up to receive mails, and so postfix queued them for the default of 5 days, regularly retrying delivery. Now all the regular mails were buried among cron error mails and took literally days to be delivered.
Therefore: Make sure that all users that can run cronjobs can also receive mail but setting up an alias or an entry in the Postfix virtual table.
In fact, there are many reasons why mails can sit in the Postfix mail queue until they ultimately bounce. It is a good idea to regularly run mailq
and check for abnormally filled output queues to prevent such trouble.
Leave a comment