Files
netdisco/lib/App/Netdisco/Backend/Worker/Manager.pm
Oliver Gorwits 9a72d7e74a Avoid lock/defer of jobs deined via ACL
This commit adds a table 'device_skip' that is used to restrict job queue
searches to avoid jobs that are not permitted on this backend via *_no ACLs,
or jobs on devices that have previously encountered multiple SNMP timeouts.

When the backend loads or a device is added, a row is added to the table if
that device should not be polled on this backend (together with the job
actions which are to be skipped/denied). When a device SNMP connect fails a
counter in the same row (or a new row) is incremented.

There is also a new report 'SNMP Connect Failures' to show the devices with
non-zero SNMP connect failure counters. A configurable limit in the setting
'max_deferrals' is used to set the threshold of no longer polling the device.

To reset the deferrals/failures count, restart the Netdisco backend (which
regenerates 'device_skip' cache entries).

Squashed commit of the following:

commit b5e32c219d
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 20:55:14 2017 +0100

    show all failed connections in report

commit ffce3cee84
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 20:12:39 2017 +0100

    only resolve fqdn once

commit cc4f680f01
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 20:10:20 2017 +0100

    Revert "only resolve fqdn once"

    This reverts commit 3d136a54de.

commit d8d082b30e
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 20:09:05 2017 +0100

    a report to show SNMP failures

commit 3d136a54de
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 19:37:58 2017 +0100

    only resolve fqdn once

commit 4550b8a84c
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 17:27:43 2017 +0100

    skipover now implicit from deferrals/actionset; fix sql where logic with better correlation

commit b51edbccd2
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 16:11:29 2017 +0100

    only abort lock if action matches badactions

commit 415559b24f
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 13:56:42 2017 +0100

    set skipover true when adding to actionset

commit 1086f2c467
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 13:50:56 2017 +0100

    fix empty actionset

commit 31962580b8
Merge: 9b2e993e 6808133b
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 13:25:08 2017 +0100

    Merge branch 'og-device_skip' of github.com:netdisco/netdisco into og-device_skip

commit 6808133bdb
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 13:19:54 2017 +0100

    in-job checks for acls are required for netdisco-do foreground actions

commit 3944dd7813
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 13:18:30 2017 +0100

    avoid extra device lookup

commit 9b2e993e0f
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 12:31:36 2017 +0100

    also delete device_skip rows when deleting device

commit b55854e91d
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 11:34:27 2017 +0100

    actions in device_skip table are now an array/set

commit 5e126eef07
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 09:36:33 2017 +0100

    typo

commit 44266f2767
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 09:14:25 2017 +0100

    *able checks within jobs should not be necessary with skiplist

commit e7c22e7d11
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 08:58:57 2017 +0100

    increment deferrals field when job is deferred

commit 88ae9c00ba
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 08:40:27 2017 +0100

    turn connect fail into defer

commit eac1857043
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Tue May 23 08:26:59 2017 +0100

    rename failures column to be deferrals

commit 96ed444bbb
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Mon May 22 22:52:51 2017 +0100

    set up list of jobs the backend instance should skip

commit 3a0019296d
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Mon May 22 22:01:50 2017 +0100

    separate out is_*able last_* checks

commit cf8589aba2
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Sun May 21 22:35:38 2017 +0100

    change from ignore to skip name

commit ed193356f8
Author: Oliver Gorwits <oliver@cpan.org>
Date:   Sun May 21 14:52:33 2017 +0100

    device_ignore table to track devices to skip in polling
2017-05-27 08:50:08 +01:00

91 lines
2.6 KiB
Perl

package App::Netdisco::Backend::Worker::Manager;
use Dancer qw/:moose :syntax :script/;
use List::Util 'sum';
use App::Netdisco::Util::Backend;
use Role::Tiny;
use namespace::clean;
use App::Netdisco::JobQueue
qw/jq_locked jq_getsome jq_getsomep jq_lock jq_prime_skiplist/;
sub worker_begin {
my $self = shift;
my $wid = $self->wid;
return debug "mgr ($wid): no need for manager... skip begin"
if setting('workers')->{'no_manager'};
debug "entering Manager ($wid) worker_begin()";
# rebuild device skip hints
jq_prime_skiplist;
# requeue jobs locally
debug "mgr ($wid): searching for jobs booked to this processing node";
my @jobs = jq_locked;
if (scalar @jobs) {
info sprintf "mgr (%s): found %s jobs booked to this processing node",
$wid, scalar @jobs;
$self->{queue}->enqueuep(100, @jobs);
}
}
sub worker_body {
my $self = shift;
my $wid = $self->wid;
if (setting('workers')->{'no_manager'}) {
prctl sprintf 'netdisco-backend: worker #%s manager: inactive', $wid;
return debug "mgr ($wid): no need for manager... quitting"
}
while (1) {
prctl sprintf 'netdisco-backend: worker #%s manager: gathering', $wid;
my $num_slots = 0;
$num_slots = parse_max_workers( setting('workers')->{tasks} )
- $self->{queue}->pending();
debug "mgr ($wid): getting potential jobs for $num_slots workers (HP)";
# get some high priority jobs
# TODO also check for stale jobs in Netdisco DB
foreach my $job ( jq_getsomep($num_slots) ) {
# mark job as running
next unless jq_lock($job);
info sprintf "mgr (%s): job %s booked out for this processing node",
$wid, $job->job;
# copy job to local queue
$self->{queue}->enqueuep(100, $job);
}
$num_slots = parse_max_workers( setting('workers')->{tasks} )
- $self->{queue}->pending();
debug "mgr ($wid): getting potential jobs for $num_slots workers (NP)";
# get some normal priority jobs
# TODO also check for stale jobs in Netdisco DB
foreach my $job ( jq_getsome($num_slots) ) {
# mark job as running
next unless jq_lock($job);
info sprintf "mgr (%s): job %s booked out for this processing node",
$wid, $job->job;
# copy job to local queue
$self->{queue}->enqueue($job);
}
debug "mgr ($wid): sleeping now...";
prctl sprintf 'netdisco-backend: worker #%s manager: idle', $wid;
sleep( setting('workers')->{sleep_time} || 1 );
}
}
1;