Monitoring unsupported items and unknown triggers in Zabbix

If Zabbix cannot retrieve an item from a monitored host for any reason, the item switches to the unsupported state and every trigger that depends on it becomes unknown. Zabbix's standard tools cannot alert on this situation, so a workaround is needed.
Templates for Zabbix:

<?xml version="1.0" encoding="UTF-8"?>
<zabbix_export version="1.0" date="26.05.11" time="14.50">
  <hosts>
    <host name="Template_unsupported_items">
      <proxy_hostid>0</proxy_hostid>
      <useip>1</useip>
      <dns></dns>
      <ip>127.0.0.1</ip>
      <port>10050</port>
      <status>3</status>
      <useipmi>0</useipmi>
      <ipmi_ip>127.0.0.1</ipmi_ip>
      <ipmi_port>623</ipmi_port>
      <ipmi_authtype>0</ipmi_authtype>
      <ipmi_privilege>2</ipmi_privilege>
      <ipmi_username></ipmi_username>
      <ipmi_password></ipmi_password>
      <groups>
        <group>Templates</group>
      </groups>
      <triggers>
        <trigger>
          <description>{ITEM.LASTVALUE} Triggers are in state UNKNOWN</description>
          <type>0</type>
          <expression>{Template_unsupported_items:unknownTriggersCount.last(0)}&gt;0</expression>
          <url></url>
          <status>0</status>
          <priority>3</priority>
          <comments>You will not receive any notification for triggers that are in UNKNOWN state. Check the trigger expression is valid and item on which this trigger is based, is receiving valid data.</comments>
        </trigger>
        <trigger>
          <description>{ITEM.LASTVALUE} Unsupported items detected</description>
          <type>0</type>
          <expression>{Template_unsupported_items:unsupportedItemsCount.last(0)}&gt;0</expression>
          <url></url>
          <status>0</status>
          <priority>3</priority>
          <comments>Checks the number of unsupported items for this host</comments>
        </trigger>
      </triggers>
      <items>
        <item type="2" key="unknownTriggersCount" value_type="3">
          <description>Triggers in error count</description>
          <ipmi_sensor></ipmi_sensor>
          <delay>60</delay>
          <history>90</history>
          <trends>365</trends>
          <status>0</status>
          <data_type>0</data_type>
          <units></units>
          <multiplier>0</multiplier>
          <delta>0</delta>
          <formula>0</formula>
          <lastlogsize>0</lastlogsize>
          <logtimefmt></logtimefmt>
          <delay_flex></delay_flex>
          <authtype>0</authtype>
          <username></username>
          <password></password>
          <publickey></publickey>
          <privatekey></privatekey>
          <params></params>
          <trapper_hosts>127.0.0.1</trapper_hosts>
          <snmp_community></snmp_community>
          <snmp_oid></snmp_oid>
          <snmp_port>161</snmp_port>
          <snmpv3_securityname></snmpv3_securityname>
          <snmpv3_securitylevel>0</snmpv3_securitylevel>
          <snmpv3_authpassphrase></snmpv3_authpassphrase>
          <snmpv3_privpassphrase></snmpv3_privpassphrase>
          <valuemapid>0</valuemapid>
          <applications/>
        </item>
        <item type="2" key="unknownTriggersDetails" value_type="4">
          <description>Triggers in error details</description>
          <ipmi_sensor></ipmi_sensor>
          <delay>60</delay>
          <history>90</history>
          <trends>365</trends>
          <status>0</status>
          <data_type>0</data_type>
          <units></units>
          <multiplier>0</multiplier>
          <delta>0</delta>
          <formula>0</formula>
          <lastlogsize>0</lastlogsize>
          <logtimefmt></logtimefmt>
          <delay_flex></delay_flex>
          <authtype>0</authtype>
          <username></username>
          <password></password>
          <publickey></publickey>
          <privatekey></privatekey>
          <params></params>
          <trapper_hosts>127.0.0.1</trapper_hosts>
          <snmp_community></snmp_community>
          <snmp_oid></snmp_oid>
          <snmp_port>161</snmp_port>
          <snmpv3_securityname></snmpv3_securityname>
          <snmpv3_securitylevel>0</snmpv3_securitylevel>
          <snmpv3_authpassphrase></snmpv3_authpassphrase>
          <snmpv3_privpassphrase></snmpv3_privpassphrase>
          <valuemapid>0</valuemapid>
          <applications/>
        </item>
        <item type="2" key="unsupportedItemsCount" value_type="3">
          <description>Unsupported items count</description>
          <ipmi_sensor></ipmi_sensor>
          <delay>60</delay>
          <history>90</history>
          <trends>365</trends>
          <status>0</status>
          <data_type>0</data_type>
          <units></units>
          <multiplier>0</multiplier>
          <delta>0</delta>
          <formula>0</formula>
          <lastlogsize>0</lastlogsize>
          <logtimefmt></logtimefmt>
          <delay_flex></delay_flex>
          <authtype>0</authtype>
          <username></username>
          <password></password>
          <publickey></publickey>
          <privatekey></privatekey>
          <params></params>
          <trapper_hosts>127.0.0.1</trapper_hosts>
          <snmp_community></snmp_community>
          <snmp_oid></snmp_oid>
          <snmp_port>161</snmp_port>
          <snmpv3_securityname></snmpv3_securityname>
          <snmpv3_securitylevel>0</snmpv3_securitylevel>
          <snmpv3_authpassphrase></snmpv3_authpassphrase>
          <snmpv3_privpassphrase></snmpv3_privpassphrase>
          <valuemapid>0</valuemapid>
          <applications/>
        </item>
        <item type="2" key="unsupportedItemsDetails" value_type="4">
          <description>Unsupported items details</description>
          <ipmi_sensor></ipmi_sensor>
          <delay>60</delay>
          <history>90</history>
          <trends>365</trends>
          <status>0</status>
          <data_type>0</data_type>
          <units></units>
          <multiplier>0</multiplier>
          <delta>0</delta>
          <formula>0</formula>
          <lastlogsize>0</lastlogsize>
          <logtimefmt></logtimefmt>
          <delay_flex></delay_flex>
          <authtype>0</authtype>
          <username></username>
          <password></password>
          <publickey></publickey>
          <privatekey></privatekey>
          <params></params>
          <trapper_hosts>127.0.0.1</trapper_hosts>
          <snmp_community></snmp_community>
          <snmp_oid></snmp_oid>
          <snmp_port>161</snmp_port>
          <snmpv3_securityname></snmpv3_securityname>
          <snmpv3_securitylevel>0</snmpv3_securitylevel>
          <snmpv3_authpassphrase></snmpv3_authpassphrase>
          <snmpv3_privpassphrase></snmpv3_privpassphrase>
          <valuemapid>0</valuemapid>
          <applications/>
        </item>
      </items>
      <templates/>
      <graphs/>
      <macros/>
    </host>
    <host name="Template_unsupported_items_collector">
      <proxy_hostid>0</proxy_hostid>
      <useip>1</useip>
      <dns></dns>
      <ip>127.0.0.1</ip>
      <port>10050</port>
      <status>3</status>
      <useipmi>0</useipmi>
      <ipmi_ip>127.0.0.1</ipmi_ip>
      <ipmi_port>623</ipmi_port>
      <ipmi_authtype>0</ipmi_authtype>
      <ipmi_privilege>2</ipmi_privilege>
      <ipmi_username></ipmi_username>
      <ipmi_password></ipmi_password>
      <groups>
        <group>Templates</group>
      </groups>
      <triggers>
        <trigger>
          <description>Error in processing unsupported items data</description>
          <type>0</type>
          <expression>{Template_unsupported_items_collector:unsupItemsCheck.pl[].regexp(Failed 0)}=0</expression>
          <url></url>
          <status>0</status>
          <priority>3</priority>
          <comments></comments>
        </trigger>
      </triggers>
      <items>
        <item type="10" key="unsupItemsCheck.pl[]" value_type="4">
          <description>Collect unsupported items/triggers</description>
          <ipmi_sensor></ipmi_sensor>
          <delay>60</delay>
          <history>30</history>
          <trends>365</trends>
          <status>0</status>
          <data_type>0</data_type>
          <units></units>
          <multiplier>0</multiplier>
          <delta>0</delta>
          <formula>0</formula>
          <lastlogsize>0</lastlogsize>
          <logtimefmt></logtimefmt>
          <delay_flex></delay_flex>
          <authtype>0</authtype>
          <username></username>
          <password></password>
          <publickey></publickey>
          <privatekey></privatekey>
          <params></params>
          <trapper_hosts></trapper_hosts>
          <snmp_community></snmp_community>
          <snmp_oid></snmp_oid>
          <snmp_port>161</snmp_port>
          <snmpv3_securityname></snmpv3_securityname>
          <snmpv3_securitylevel>0</snmpv3_securitylevel>
          <snmpv3_authpassphrase></snmpv3_authpassphrase>
          <snmpv3_privpassphrase></snmpv3_privpassphrase>
          <valuemapid>0</valuemapid>
          <applications/>
        </item>
      </items>
      <templates/>
      <graphs/>
      <macros/>
    </host>
  </hosts>
  <dependencies/>
</zabbix_export>

You can also try downloading the template.
Attach the Template_unsupported_items template to every host whose unsupported items and unknown triggers you want to monitor.
Attach the Template_unsupported_items_collector template to any one host, for example the Zabbix server. This template runs a script on the server that queries the database and updates the data for every host linked to Template_unsupported_items.
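
To make the collector's job concrete, here is a minimal Python sketch of the aggregation step the script performs: start every host that has the template at zero, then count unsupported item keys only for those hosts. The function name and the sample host names and item keys are hypothetical and used purely for illustration; the real script reads both lists from the Zabbix database.

```python
def summarize_unsupported(template_hosts, unsupported_rows):
    """Count unsupported items per host, ignoring hosts without the template.

    template_hosts:   host names linked to Template_unsupported_items
    unsupported_rows: (host, item_key) pairs for unsupported items
    """
    # Every templated host starts at zero so that a "0" is still reported
    # once its items recover.
    counts = {host: 0 for host in template_hosts}
    details = {host: "" for host in template_hosts}

    for host, item_key in unsupported_rows:
        if host not in counts:
            continue  # host is not linked to the template, nothing to report
        counts[host] += 1
        details[host] += f"{item_key}, "  # same "key, key, " layout as the script

    return counts, details


# Hypothetical sample data
example_counts, example_details = summarize_unsupported(
    ["web01", "db01"],
    [
        ("web01", "vfs.fs.size[/data,free]"),
        ("web01", "net.if.in[eth9]"),
        ("other", "agent.ping"),
    ],
)
```

Hosts with no unsupported items keep a count of 0, which is what lets the trigger in Template_unsupported_items return to OK.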

#!/usr/bin/perl -w
## Unsupported Items and Triggers check
## Andrew Dike
## andrew@frontlineops.net
## 26/5/11
## Version 0.52

use DBI;
use strict;

# Configure these parameters
my $zabbixDB = "zabbix";
my $dbUser = "zabbix";
my $dbPassword = "password";
my $dataFile = "/etc/zabbix/externalscripts/zsend_data";
my $zabbixAgentConfig = "/etc/zabbix/zabbix_agentd.conf";
my $zabbixSender = "/usr/local/bin/zabbix_sender";

############################
my %unsupportedItemsCount;
my %unsupportedItemsDetails;
my %unknownTriggersCount;
my %unknownTriggersDetails;


open(STDERR, ">&STDOUT");
# Connect to database
# Change mysql to 'Pg' for PostgreSQL (PostgreSQL and other databases are untested)
my $dbh = DBI->connect("dbi:mysql:$zabbixDB", $dbUser, $dbPassword)
  or die "Connection Error: $DBI::errstr\n";

# Get the list of monitored hosts that have the unsupported-items check active
my $sql = "select distinct h.host from hosts h join items i on h.hostid = i.hostid where h.status = 0 and i.key_ like 'unsupported%' and i.status = 0";
my $sth = $dbh->prepare($sql);
$sth->execute
  or die "SQL Error: $DBI::errstr\n";


while (my @row = $sth->fetchrow_array)
{
  my $host = $row[0];
  $unsupportedItemsCount{$host} = 0;
  $unsupportedItemsDetails{$host} = "";
  $unknownTriggersCount{$host} = 0;
  $unknownTriggersDetails{$host} = "";
}   

# Get the list of unsupported items (i.status = 3) on monitored hosts
$sql = "SELECT DISTINCT h.host, i.key_ FROM items i, hosts h WHERE i.hostid=h.hostid AND h.status=0 AND i.status=3";
$sth = $dbh->prepare($sql);
$sth->execute
  or die "SQL Error: $DBI::errstr\n";

# Store the count and keys of unsupported items for each host in the hashes
while (my @row = $sth->fetchrow_array)
{
  my ($host, $itemKey) = @row;

  # Skip hosts that are not linked to the unsupported-items template
  next unless exists $unsupportedItemsCount{$host};

  $unsupportedItemsCount{$host}++;
  $unsupportedItemsDetails{$host} .= "$itemKey, ";
}
# Get list of failed triggers and items that are making them fail
$sql = "select h.host, t.description, i.key_ from items i left join hosts h on i.hostid = h.hostid left join functions f on i.itemid = f.itemid left join triggers t on f.triggerid = t.triggerid where t.status=0 and i.status=3 and h.status=0";

$sth = $dbh->prepare($sql);
$sth->execute
  or die "SQL Error: $DBI::errstr\n";

while (my @row = $sth->fetchrow_array)
{
  my ($host, $triggerDesc, $itemKey) = @row;

  # Skip hosts that are not linked to the unsupported-items template
  next unless exists $unsupportedItemsCount{$host};

  $unknownTriggersCount{$host}++;
  $unknownTriggersDetails{$host} .= "Trigger \[$triggerDesc\] in error because item \[$itemKey\] is in error, ";
}

# Write the data to upload to the Zabbix server to a file
open(my $data, ">", $dataFile)
    or die "Cannot open $dataFile for writing: $!";

foreach my $host (keys %unknownTriggersCount)
{
  print {$data} "\"$host\" unknownTriggersCount $unknownTriggersCount{$host}\n";
}

foreach my $host (keys %unknownTriggersDetails)
{
  print {$data} "\"$host\" unknownTriggersDetails '$unknownTriggersDetails{$host}'\n";
}

foreach my $host (keys %unsupportedItemsCount)
{
  print {$data} "\"$host\" unsupportedItemsCount $unsupportedItemsCount{$host}\n";
}

foreach my $host (keys %unsupportedItemsDetails)
{
  print {$data} "\"$host\" unsupportedItemsDetails '$unsupportedItemsDetails{$host}'\n";
}

# Make sure everything is flushed to disk before zabbix_sender reads the file
close($data)
    or die "Cannot close $dataFile: $!";
# Send data to zabbix server
my $zabbixSenderOut = `$zabbixSender -c $zabbixAgentConfig -i $dataFile`;
print $zabbixSenderOut;
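
The file handed to zabbix_sender contains one line per host and key in the form "host" key value, as written by the print statements above. The following Python sketch builds the same lines; the function name and the sample host are hypothetical, and the hosts are sorted only to make the output predictable (the Perl script writes them in hash order).

```python
def sender_lines(key, values, quote_value=False):
    """Build zabbix_sender input lines in the layout the script writes.

    key:         trapper item key, e.g. "unsupportedItemsCount"
    values:      mapping of host name -> value for that key
    quote_value: wrap the value in single quotes, as the script does
                 for the *Details keys
    """
    lines = []
    for host in sorted(values):
        value = f"'{values[host]}'" if quote_value else str(values[host])
        lines.append(f'"{host}" {key} {value}')
    return lines


# Hypothetical sample data
count_lines = sender_lines("unsupportedItemsCount", {"web01": 2, "db01": 0})
detail_lines = sender_lines(
    "unsupportedItemsDetails", {"web01": "net.if.in[eth9], "}, quote_value=True
)
```

Each key in the file must match a trapper item key from Template_unsupported_items, otherwise the server has no item to store the value in.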

The script is expected to be located in /etc/zabbix/externalscripts/.
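
The trigger in Template_unsupported_items_collector uses regexp(Failed 0) on the script's output and fires when the result is 0, that is, when the text "Failed 0" is missing. A small Python sketch of that check follows. The sample output strings are assumptions about what zabbix_sender prints; the exact wording depends on the Zabbix version, so check it against your own installation before relying on the trigger.

```python
import re


def collector_trigger_fires(script_output, pattern="Failed 0"):
    """Mimic the trigger expression regexp(Failed 0) = 0.

    Returns True (problem) when the pattern is absent from the output.
    """
    return re.search(pattern, script_output) is None


# Assumed sample outputs, not captured from a real server
ok_output = 'Info from server: "Processed 8 Failed 0 Total 8 Seconds spent 0.000123"'
failed_output = 'Info from server: "Processed 6 Failed 2 Total 8 Seconds spent 0.000123"'
db_error_output = "Connection Error: Access denied for user"
```

Because the script redirects STDERR to STDOUT, a database or file error message also ends up in the item value, and the absence of "Failed 0" raises the trigger in that case too.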
