No description
Find a file
2020-09-07 22:03:45 +02:00
.github/workflows Ignore jekyll in documentation build 2020-09-07 22:03:45 +02:00
docs/source Build and publish documentation 2020-09-06 04:42:26 +02:00
reports First raw version of the stats collector 2020-09-04 00:24:21 +02:00
src/chweb Update config and SSL, i.e. make it work on aiven 2020-09-06 07:44:31 +02:00
tests Mock SSLContext in the unit tests 2020-09-07 22:03:45 +02:00
.gitignore First raw version of the stats collector 2020-09-04 00:24:21 +02:00
config.yaml Update config and SSL, i.e. make it work on aiven 2020-09-06 07:44:31 +02:00
LICENCE First raw version of the stats collector 2020-09-04 00:24:21 +02:00
README.rst Explain a bit more what the application does 2020-09-07 22:03:45 +02:00
requirements.txt First raw version of the stats collector 2020-09-04 00:24:21 +02:00
setup.cfg Refactor, rename and fix pydantic errors 2020-09-04 18:22:49 +02:00
setup.py Build and publish documentation 2020-09-06 04:42:26 +02:00
tox.ini Build and publish documentation 2020-09-06 04:42:26 +02:00

====================
Website checker demo
====================

.. image:: https://github.com/vladan/aiven-site-checker/workflows/build/badge.svg
   :target: https://github.com/vladan/aiven-site-checker/actions?query=workflow%3Abuild+branch%3Amaster

.. image:: https://github.com/vladan/aiven-site-checker/workflows/documentation/badge.svg
   :target: https://vladan.github.io/aiven-site-checker/

CHWEB is a website checking tool.

It sends HTTP requests to sites with the intent to check their status /
availibility, and if the ``regex`` param is specified, it runs it against the
response body. The retreived status check is sent to a Kafka topic. When
read by the consumer, the status check is written in a PostgreSQL database.

ATM in its very early stages meant to demo `aiven <https://aiven.io>`_'s
platform, using their `kafka <https://aiven.io/kafka>`_ and `postgresql
<https://aiven.io/postgresql>`_ services.


Quickstart
==========

Easiest way to start the application is to clone it, install it and run the
console scripts via the command line::

    pip install .
    chweb_collect -c config.yaml &
    chweb_consume -c config.yaml &

Sites
-----

You can specify the sites you want checked in the yaml config file. They are
stored in the ``sites`` key and are represented as a list of objects with
``url`` and ``check_interval`` as mandatory fields, and regex as an optional
field that can freely omitted which checks the body of the response against the
regex expression::

  - url: "https://example.com"
    regex: "domain" # a regex matching the body of the response
    check_interval: 5
  - url: "https://example.com"
    regex: "aaaaaaaa" # a regex not matching the body of the response
    check_interval: 8
  - url: "https://example.com/404"
    check_interval: 13

Logging configuration
---------------------

.. highlight:: yaml

The ``logging`` section must be present. A simple example of a console logger
as seen in ``config.yaml``::

    logging:
      version: 1
      formatters:
        standard:
          format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        error:
          format: "%(levelname)s <PID %(process)d:%(processName)s> %(name)s.%(funcName)s(): %(message)s"
      handlers:
        console:
          class: logging.StreamHandler
          level: DEBUG
          formatter: standard
          stream: ext://sys.stdout
      root:
        level: DEBUG
        handlers: [console]
        propogate: yes

Postgres
--------

Postgres settings with aivens certs, passwords, etc. You'll get them once you
setup the postgres service on https://console.aiven.io/. My postgres section
in the config looks like this::

    postgres:
      dbhost: "pg-2e0f365c-vladanovic-4654.aivencloud.com"
      dbport: 23700
      dbname: "defaultdb"
      dbuser: "avnadmin"
      dbpass: ""
      dbcert: "./certs/pg.pem"

* ``dbhost`` and ``dbport``, ``dbuser`` and ``dbpass`` are straightforward.
* ``defaultdb`` is the database present by default, when the service is
  created. You can create another database if it rocks your boat.
* ``dbcert`` is the cert which opens in a modal popup in the console. You need
  to copy it manually to a file on a path you later state in the config.

Kafka
-----

Kafka is also a service easily provisioned through aivens console. After it's
set up you get a config section similar to this one::

    kafka:
      servers:
      - "kafka-f7ae38e-vladanovic-4654.aivencloud.com:23702"
      topic: "sitestats"
      cafile: "./certs/ca.pem"
      cert: "./certs/service.cert"
      key: "./certs/service.key"
      passwd: ""

* ``servers`` is a list because that's how the library is initialized, which
  makes sense if you have multiple brokers.
* ``topic`` is the kafka topic messages are sent to. You need to define it in
  aivens console as well.
* ``cafile``, ``cert`` and ``key`` are the ssl certificates you get when aivens
  kafka service is ready.
* ``password`` is not needed afaik, but you can give it a go.

Full config file
================

while reading about the config values might be good, here's a better
all-in-one config file, which ofc you'd need to update with your own values::

    kafka:
      servers:
      - "kafka-f7ae38e-vladanovic-4654.aivencloud.com:23702"
      topic: "sitestats"
      cafile: "./certs/ca.pem"
      cert: "./certs/service.cert"
      key: "./certs/service.key"
      passwd: ""
    postgres:
      dbhost: "pg-2e0f365c-vladanovic-4654.aivencloud.com"
      dbport: 23700
      dbname: "defaultdb"
      dbuser: "avnadmin"
      dbpass: ""
      dbcert: "./certs/pg.pem"
    sites:
    - url: "https://dsadakjhkjsahkjh.com"
      regex: "domain"
      check_interval: 5
    - url: "https://example.com"
      regex: "aaaaaaaaaaaaa"
      check_interval: 8
    - url: "https://example.com/404"
      check_interval: 13
    logging:
      version: 1
      formatters:
        standard:
          format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        error:
          format: "%(levelname)s <PID %(process)d:%(processName)s> %(name)s.%(funcName)s(): %(message)s"
      handlers:
        console:
          class: logging.StreamHandler
          level: DEBUG
          formatter: standard
          stream: ext://sys.stdout
      root:
        level: DEBUG
        handlers: [console]
        propogate: yes