webstat-collector/README.rst

168 lines
5.2 KiB
ReStructuredText
Raw Normal View History

2020-09-06 04:42:26 +02:00
====================
Website checker demo
====================
2020-09-06 04:42:26 +02:00
.. image:: https://github.com/vladan/aiven-site-checker/workflows/build/badge.svg
:target: https://github.com/vladan/aiven-site-checker/actions?query=workflow%3Abuild+branch%3Amaster
.. image:: https://github.com/vladan/aiven-site-checker/workflows/documentation/badge.svg
2020-09-06 11:27:56 +02:00
:target: https://vladan.github.io/aiven-site-checker/
2020-09-06 03:54:48 +02:00
2020-09-05 01:50:54 +02:00
CHWEB is a website checking tool.
It sends HTTP requests to sites with the intent to check their status /
availibility, and if the ``regex`` param is specified, it runs it against the
response body. The retreived status check is sent to a Kafka topic. When
read by the consumer, the status check is written in a PostgreSQL database.
2020-09-05 01:50:54 +02:00
ATM in its very early stages meant to demo `aiven <https://aiven.io>`_'s
platform, using their `kafka <https://aiven.io/kafka>`_ and `postgresql
<https://aiven.io/postgresql>`_ services.
2020-09-06 09:29:57 +02:00
Quickstart
==========
Easiest way to start the application is to clone it, install it and run the
console scripts via the command line::
2020-09-05 01:50:54 +02:00
2020-09-06 02:02:02 +02:00
pip install .
2020-09-05 01:50:54 +02:00
chweb_collect -c config.yaml &
chweb_consume -c config.yaml &
2020-09-06 09:29:57 +02:00
Sites
-----
You can specify the sites you want checked in the yaml config file. They are
stored in the ``sites`` key and are represented as a list of objects with
``url`` and ``check_interval`` as mandatory fields, and regex as an optional
field that can freely omitted which checks the body of the response against the
regex expression::
2020-09-06 10:59:08 +02:00
- url: "https://example.com"
regex: "domain" # a regex matching the body of the response
2020-09-06 09:29:57 +02:00
check_interval: 5
- url: "https://example.com"
2020-09-06 10:59:08 +02:00
regex: "aaaaaaaa" # a regex not matching the body of the response
2020-09-06 09:29:57 +02:00
check_interval: 8
- url: "https://example.com/404"
check_interval: 13
Logging configuration
---------------------
.. highlight:: yaml
The ``logging`` section must be present. A simple example of a console logger
as seen in ``config.yaml``::
logging:
version: 1
formatters:
standard:
format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
error:
format: "%(levelname)s <PID %(process)d:%(processName)s> %(name)s.%(funcName)s(): %(message)s"
handlers:
console:
class: logging.StreamHandler
level: DEBUG
formatter: standard
stream: ext://sys.stdout
root:
level: DEBUG
handlers: [console]
propogate: yes
Postgres
--------
Postgres settings with aivens certs, passwords, etc. You'll get them once you
setup the postgres service on https://console.aiven.io/. My postgres section
in the config looks like this::
postgres:
dbhost: "pg-2e0f365c-vladanovic-4654.aivencloud.com"
dbport: 23700
dbname: "defaultdb"
dbuser: "avnadmin"
dbpass: ""
dbcert: "./certs/pg.pem"
* ``dbhost`` and ``dbport``, ``dbuser`` and ``dbpass`` are straightforward.
* ``defaultdb`` is the database present by default, when the service is
created. You can create another database if it rocks your boat.
* ``dbcert`` is the cert which opens in a modal popup in the console. You need
to copy it manually to a file on a path you later state in the config.
Kafka
-----
Kafka is also a service easily provisioned through aivens console. After it's
set up you get a config section similar to this one::
kafka:
servers:
- "kafka-f7ae38e-vladanovic-4654.aivencloud.com:23702"
topic: "sitestats"
cafile: "./certs/ca.pem"
cert: "./certs/service.cert"
key: "./certs/service.key"
passwd: ""
* ``servers`` is a list because that's how the library is initialized, which
makes sense if you have multiple brokers.
* ``topic`` is the kafka topic messages are sent to. You need to define it in
aivens console as well.
* ``cafile``, ``cert`` and ``key`` are the ssl certificates you get when aivens
kafka service is ready.
* ``password`` is not needed afaik, but you can give it a go.
Full config file
================
while reading about the config values might be good, here's a better
all-in-one config file, which ofc you'd need to update with your own values::
kafka:
servers:
- "kafka-f7ae38e-vladanovic-4654.aivencloud.com:23702"
topic: "sitestats"
cafile: "./certs/ca.pem"
cert: "./certs/service.cert"
key: "./certs/service.key"
passwd: ""
postgres:
dbhost: "pg-2e0f365c-vladanovic-4654.aivencloud.com"
dbport: 23700
dbname: "defaultdb"
dbuser: "avnadmin"
dbpass: ""
dbcert: "./certs/pg.pem"
sites:
- url: "https://dsadakjhkjsahkjh.com"
regex: "domain"
check_interval: 5
- url: "https://example.com"
regex: "aaaaaaaaaaaaa"
check_interval: 8
- url: "https://example.com/404"
check_interval: 13
logging:
version: 1
formatters:
standard:
format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
error:
format: "%(levelname)s <PID %(process)d:%(processName)s> %(name)s.%(funcName)s(): %(message)s"
handlers:
console:
class: logging.StreamHandler
level: DEBUG
formatter: standard
stream: ext://sys.stdout
root:
level: DEBUG
handlers: [console]
propogate: yes