
Running MongoDB with Ops Manager


Database administration goes beyond ensuring smooth day-to-day operations: it also means keeping historical performance data that provides baselines for capacity planning, tracking real-time performance during load spikes, automating large clusters of nodes, and having a backup plan for the database.

There are many automation tools that can perform some of these tasks, such as Ansible, Salt, and Puppet, but MongoDB Ops Manager goes beyond their capabilities. Besides, one needs to know what the database state is at any given time and which updates need to be applied so that the system stays up to date.

What is MongoDB Ops Manager?

Ops Manager is a management application for MongoDB, created by the MongoDB engineers to simplify and speed up deployment, monitoring, backups, and scaling. It is only available with the MongoDB Enterprise Advanced license.

Database usage grows over time as more users come on board, and the exposure of the data involved grows with it. A database can be subject to risks such as network disruptions and hacking, which can affect business operations. The database management team needs to track these changing numbers so that the database stays patched and able to serve its workload. MongoDB Ops Manager extends your capabilities to improve database operations in the following ways:

  1. Data Loss Protection
  2. Easy Task Automation
  3. Providing Information on Query Rates
  4. Overall Performance Visibility via a GUI
  5. Elastic Deployment Management
  6. Integration with Cloud Applications

In general, Ops Manager helps in Automation, Monitoring, and Backups.

Ops Manager Automation Features

Managing a large cluster deployment by yourself can become tedious, especially when you execute the same instructions over and over and, depending on demand, scale up or down. Some of these tasks might otherwise require hiring database specialists. The Ops Manager GUI lets you perform them with just a few clicks. You can use it to add or remove nodes in your cluster according to demand, and MongoDB rebalances automatically to the new topology with minimal or no downtime. 

Some of the operations you performed manually (such as deploying a new cluster, upgrading nodes, adding replica set members and shards) are orchestrated and automated by Ops Manager. The next time you need the procedure, a single click executes all the tasks. There is also an Ops Manager RESTful API that enables programmatic management.
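For example, assuming you have already generated an API key pair in Ops Manager, a request against its public API could look roughly like the following (the host name, project ID, and keys are placeholders to replace with your own):

$ curl --user "{PUBLIC-KEY}:{PRIVATE-KEY}" --digest \
      --header "Accept: application/json" \
      "https://opsmanager.example.com:8080/api/public/v1.0/groups/{PROJECT-ID}/automationConfig"

This is only a sketch; check the Ops Manager API documentation for the exact endpoints and authentication details of your version.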

With this type of automation, you can reduce your operational costs and overhead.

MongoDB Monitoring with Ops Manager

Monitoring is an important feature for any database system, both for resource allocation and for notifications on database health. Without any idea of how your database is performing, the chances of hitting a technical hitch are high, and the consequences can be catastrophic. MongoDB Ops Manager provides complete performance visibility in graphical form, real-time reporting, and alerting on key performance indicators such as hardware resources.

For capacity planning, Ops Manager offers a historical performance view from which operational baselines can be derived.

Monitoring is enabled on the MongoDB hosts themselves. It collects data from all nodes in the deployment, and an Agent transmits these statistics to Ops Manager, which builds a real-time report of the deployment status. 

From the reports, you can easily see slow and fast queries and figure out how to optimize them for better overall performance.

The Ops Manager provides custom dashboards and charts for tracking many databases on key health metrics that include CPU utilization and memory. 

Enabling alerts in Ops Manager is important, as you want to know when key database metrics go out of range. Alert configuration varies in terms of the parameters affecting individual hosts, agents, replica sets, and backups. Ops Manager offers four major notification channels to keep you ahead of any potential technical hitches: an incident management system, SMS, email, or Slack.

You can also use the Ops Manager RESTful API to feed the data to platforms such as APM tools to view the health metrics.

MongoDB Backups with Ops Manager

Data loss is one of the most painful setbacks that can impact the operation of any business. With Ops Manager, however, the data is protected. Database downtime can happen at any time, for example due to power blackouts or network disconnections. Organizations that use MongoDB Ops Manager are fortunate, since it continuously maintains backups, either as scheduled snapshots or for point-in-time recovery. If the MongoDB deployment fails at some point, the most recent backup will be only moments behind the last database state before the failure, reducing data loss. 

The tool also lets you query backups directly to find the correct point for a restore. You can also use this to understand how data structures have changed over time.

Ops Manager backup only works with a sharded cluster or a replica set; for a standalone mongod process, you will need to convert it into a single-member replica set first.
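A minimal sketch of that conversion, assuming a default /etc/mongod.conf: add a replication section with a replSetName of your choosing (for example rs0) to the configuration file, then restart mongod and initiate the replica set:

$ sudo systemctl restart mongod
$ mongo --eval 'rs.initiate()'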

How Backup and Restoration Work with Ops Manager

After backup is enabled for a MongoDB deployment, the backup performs an initial sync of the deployment’s data, much as if it were creating a new, invisible member of a replica set. An agent sends the initial sync and oplog data over HTTPS back to Ops Manager. Operations that occur during the backup process are recorded in the oplog, which is also sent so the backup includes the latest updates.

The backup then tails each replica set’s oplog to maintain an on-disk standalone database (the head database), which Ops Manager keeps for each backed-up replica set. This head database stays consistent with the original primary up to the last oplog entry supplied by the agent.

For a sharded cluster, a restore can be made from checkpoints between snapshots while for a replica set a restore can be made from selected points in time. 

For a snapshot restoration, the Ops Manager will read directly from the snapshot storage. 

When using a point-in-time or checkpoint restore, Ops Manager restores a full snapshot from the snapshot storage and then applies the stored oplogs up to the specified point. Ops Manager delivers the snapshot and oplog updates over HTTPS.

How much oplog you keep per backup will determine how much time a checkpoint and point-in-time restore can cover.
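As a related, hedged check on the source replica set itself, rs.printReplicationInfo() in the mongo shell reports how much oplog window the members currently retain:

$ mongo --eval 'rs.printReplicationInfo()'

The Ops Manager oplog store retention is configured separately, so treat this only as an indication of how much history is available on the source side.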

Integration with Cloud Applications

Not all MongoDB deployments run on the same cluster host. The many cloud platforms out there (such as Red Hat OpenShift, Kubernetes, and Pivotal Cloud Foundry) can make integration with other tools complicated. Ops Manager, however, integrates with this variety of cloud application deployment platforms, making it consistent and elegant to run and deploy workloads wherever they need to be, ensuring the same database configuration across environments and controlling them from a single platform.

Conclusion

Managing a large MongoDB cluster deployment is not an easy task. Ops Manager is an automation tool that offers a visualized database state and an alerting system, key features in providing information about the health of the database. It does, however, require an Enterprise license, which for some organizations can be out of budget.

ClusterControl provides an alternative, offering many of the same features and functions as Ops Manager at a much lower cost. You can learn more about what ClusterControl does for MongoDB here.


Announcing ClusterControl 1.7.6: HA Stack Deployments in the Cloud with HAProxy


We’re excited to announce the 1.7.6 release of ClusterControl - the only database management system you’ll ever need to take control of your open source database infrastructure. 

This new edition expands our commitment to cloud integration by allowing you to deploy a SQL database stack to the cloud provider of your choice with the HAProxy load balancer pre-configured. This makes it even simpler and faster to get a highly available deployment of the most popular open source databases into the cloud with just a couple of clicks.

In addition to this new function we also have improved our new MySQL Freeze Frame system by adding the ability to snapshot the process list before a cluster failure.

Release Highlights

Simple Cloud Deployment of HA Database Stack with Integrated HAProxy

  • Improvements to the cloud deployment GUI to allow deployment and configuration of HAProxy along with the database stack to the cloud provider of your choosing. 

MySQL Freeze Frame (BETA)

  • Now snapshots the MySQL process list before a cluster failure.

Additional Misc Improvements

  • CMON Upgrade operations are logged in a log file.
  • Many improvements and fixes to PostgreSQL Backup, Restore, and Verify Backup. 
  • A number of legacy ExtJS pages have been migrated to AngularJS.

View Release Details and Resources

Release Details

Cloud Deployment of HA Database Stack with Integrated HAProxy 

In ClusterControl 1.6 we introduced the ability to directly deploy a database cluster to the cloud provider of your choosing. This made the deployment of highly available database stacks simpler than it had ever been before. Now with the new release we are adding the ability to deploy an HAProxy Load Balancer right alongside the database in a complete, pre-configured full stack.

Load balancers are an essential part of traffic management and performance, and you can now deploy a pre-integrated database/load balancer stack using our easy-to-use wizard.

PostgreSQL Improvements

Over the course of the last few months, we have been releasing several patches which culminated in the release of ClusterControl 1.7.6. You can review the changelog to see all of them. Here are some of the highlights...

  • Addition of Read/Write Splitting for HAProxy for PostgreSQL
  • Improvements to the Backup Verification process
  • Improvements to the Restore & Recovery functions
  • Several fixes and improvements regarding Point-in-Time Recovery
  • Bug fixes regarding the Log & Configuration files
  • Bug fixes regarding process monitoring & dashboards

PostgreSQL Load Balancing in the Cloud Made Easy


We’ve mentioned many times the advantages of using a Load Balancer in your database topology. It could be for redirecting traffic to healthy database nodes, distributing the traffic across multiple servers to improve performance, or just having a single endpoint configured in your application for an easier configuration and failover process.

Now with the new ClusterControl 1.7.6 version, you can not only deploy your PostgreSQL cluster directly in the cloud, but also deploy Load Balancers in the same job. For this, ClusterControl supports AWS, Google Cloud, and Azure as cloud providers. Let’s take a look at this new feature.

Creating a New Database Cluster

For this example, we’ll assume that you have an account with one of the supported cloud providers mentioned, and configured your credentials in a ClusterControl 1.7.6 installation.

If you don’t have it configured, you must go to ClusterControl -> Integrations -> Cloud Providers -> Add Cloud Credentials.

Here, you must choose the cloud provider and add the corresponding information.

This information depends on the cloud provider itself. For more information, you can check our official documentation.

You don’t need to access your cloud provider management console to create anything, you can deploy your Virtual Machines, Databases, and Load Balancers directly from ClusterControl. Go to the deploy section and select “Deploy in the Cloud”.

Specify vendor and version for your new database cluster. In this case, we’ll use PostgreSQL 12.

Add the number of nodes, cluster name, and database information like credentials and server port.

Choose the cloud credentials, in this case, we’ll use an AWS account. If you don’t have your account added into ClusterControl yet, you can follow our documentation for this task.

Now you must specify the virtual machine configuration, like operating system, size, and region.

In the next step, you can add Load Balancers to your Database Cluster. For PostgreSQL, ClusterControl supports HAProxy as Load Balancer. You need to select the number of Load Balancer nodes, instance size, and the Load Balancer information. 

This Load Balancer information is:

  • Listen Port (Read/Write): Port for read/write traffic.
  • Listen Port (Read-Only): Port for read-only traffic.
  • Policy: It can be:
    • leastconn: The server with the lowest number of connections receives the connection
    • roundrobin: Each server is used in turns, according to their weights
    • source: The source IP address is hashed and divided by the total weight of the running servers to designate which server will receive the request

Now you can review the summary and deploy it.

ClusterControl will create the virtual machines, install the software, and configure it, all in the same job and in an unattended way.

You can monitor the creation process in the ClusterControl activity section. When it finishes, you will see your new cluster in the ClusterControl main screen.

If you want to check the Load Balancer nodes, you can go to ClusterControl -> Nodes -> HAProxy node, and check the current status.

You can also monitor your HAProxy servers from ClusterControl by checking the Dashboard section.

Now that you are done, you can check your cloud provider management console, where you will find the Virtual Machines created according to your selected ClusterControl job options.

Conclusion

As you can see, having a Load Balancer in front of your PostgreSQL cluster in the cloud is really easy using the new ClusterControl “Deploy in the Cloud” feature, where you can deploy your Databases and Load Balancer nodes in the same job.

Managing Your Open Source Databases from Your iPad


With the current COVID-19 situation ongoing, plenty of people have started to work from home. Among those are people whose job is to manage database systems. The lockdowns announced all over the world mean that kids are staying at home too. Homeschooling is now a thing, and in many cases it comes with some sort of online learning activities. This creates pressure on the resources available at home. Who should be using the laptops: Moms and Dads working from home, or their kids for their online classes? People often experience an “every laptop and tablet counts” situation. How can you do your job with only an iPad available? Can you manage your database system with its help? Let’s take a look at this problem.

Connectivity

The main issue to solve would most likely be connectivity.

If you can use one of the supported VPN methods, good for you. If not, you can search the App Store for additional VPN clients. Hopefully you’ll be able to find something suitable, for example OpenVPN Connect.

One way or another, as soon as you can connect to your VPN, you can start working. There are a couple of ways to approach it. One is the traditional way involving SSH access. Technically speaking, a 13” iPad with a Smart Keyboard can be quite a nice replacement for a laptop. Still, for the smaller 10” screens, you have to accept some compromises.

For connecting over SSH we used Terminus. Here’s how it looks.

With the on-screen keyboard, the work space is quite limited. On the other hand, you can achieve everything you could have achieved using your laptop. It’s just more time-consuming and more annoying.

In full screen mode it’s slightly better but the font is really small. Sure, you can increase its size:

But then you end up scrolling through the kilometers of text. Doable but far from comfortable. You can clearly see that managing databases in such a way is quite hard, especially if we are talking about emergency situations where you have to act quickly.

Luckily, there’s another approach where you can rely on the database management platform to help you in your tasks. ClusterControl is an example of such a solution.

We are not going to lie: like every UI, ClusterControl works better on larger screens, but it still works quite well:

It can help you to deal with the regular tasks like monitoring the replication status.

You can scroll through the metrics and see if there are any problems with your database environment.

With just a couple of clicks you can perform management tasks that otherwise would require executing numerous CLI commands.

You can manage your backups, edit the backup schedule, create new backups, restore, verify. All with just a couple of clicks.

As you can see, an iPad might be quite a powerful tool for dealing with database management tasks. Even with the limited screen real estate, by using proper tools like ClusterControl you can achieve almost the same outcome.

SysAdmin Working from Home? Tips to Automate MySQL, MariaDB, Postgres & MongoDB

Wednesday, April 22, 2020 - 16:00 to 16:30

Are you a SysAdmin who is now responsible for your company's database operations? Then this is the webinar for you. Learn from a Senior DBA the basics you need to know to keep things up and running, and how automation can help.

Tips for Managing MongoDB Remotely


Working remotely due to the COVID-19 pandemic has increased the importance of isolated infrastructures; more specifically, ones that can only be accessed through an internal network, yet in a way that allows authorized people from the outside world to access the system anytime, anywhere. 

In this article, we will share some basic steps that you must implement with MongoDB to ensure secure access while administering the database.

Securing MongoDB

Before accessing the MongoDB database remotely, you must perform a “hardening” of the environment. Set the following on the infrastructure side:

Enable MongoDB Authentication 

This feature is mandatory to enable, regardless of whether we access the MongoDB database from the internal network or from an external network. Before enabling authorization, you must first create an admin user in MongoDB. You can run the commands below to create the admin user on one of your MongoDB servers:

$ mongo

> use admin

> db.createUser(

      {

          user: "admin",

          pwd: "youdontknowmyp4ssw0rd",

          roles: [ "root" ]

      }

  );

The above command will create a new user called admin with root privileges. You can enable the MongoDB auth feature by opening the /etc/mongod.conf file and adding the following lines:

  security:

   authorization: 'enabled'

Do not forget to restart your MongoDB service to apply the changes. This setting restricts access to the database so that only those with valid credentials are able to log in.
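On a systemd-based system, restarting and then logging in with the admin user created above would look something like this:

$ sudo systemctl restart mongod
$ mongo -u admin -p --authenticationDatabase admin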

Setup Roles and Privileges

To prevent the misuse of access to MongoDB, we can implement role-based access by creating several roles and their privileges.

Make sure you have a list of users who need to access the database and understand each individual’s needs and responsibilities. Create roles and assign the privileges to those roles. After that, you can assign each user to a role based on their responsibilities.
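As a small, hypothetical sketch, the following creates a read-only user for a reporting team on a database named reporting (both names are illustrative), using the built-in read role for brevity; custom roles can likewise be created with db.createRole():

$ mongo -u admin -p --authenticationDatabase admin --eval '
    db.getSiblingDB("reporting").createUser({
      user: "report_reader",
      pwd: "choose-a-strong-password",
      roles: [ { role: "read", db: "reporting" } ]
    })'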

This approach helps us minimize the abuse of authority and identify the role and user immediately when something unwanted happens.

Configure an SSL / TLS Connection

MongoDB supports SSL/TLS connections for securing data in transit. To implement this, you have to generate your own SSL key; you can generate it using openssl. To enable SSL/TLS support, edit the /etc/mongod.conf file and add the following parameters:

  net:

      tls:

         mode: requireTLS

         certificateKeyFile: /etc/mongo/ssl/mongodb.pem

After adding these parameters, you need to restart the MongoDB service. If you have a MongoDB replica set architecture, you need to apply them on each node. SSL is also needed when clients access MongoDB, whether from the application side or from a client directly.

For production use, you should use valid certificates generated and signed by a single certificate authority. You or your organization can generate and maintain certificates as an independent certificate authority, or use certificates generated by third-party TLS/SSL vendors. Avoid using a self-signed certificate unless you are on a trusted network.
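For a quick test setup only (not production, per the note above), a self-signed certificate and the combined .pem file referenced in the configuration can be generated roughly like this; the subject and paths are assumptions to adapt:

$ openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
      -keyout mongodb.key -out mongodb.crt -subj "/CN=mongodb.example.com"
$ cat mongodb.key mongodb.crt | sudo tee /etc/mongo/ssl/mongodb.pem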

Restrict the Database Port

Make sure that only the MongoDB port is opened on the firewall server or firewall appliance and that no other ports are open.
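For example, on a host that uses ufw, and assuming the default MongoDB port 27017 and an internal application subnet of 10.0.0.0/24 (both assumptions), that could look like:

$ sudo ufw allow from 10.0.0.0/24 to any port 27017 proto tcp
$ sudo ufw status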

Securing the MongoDB Connection

Remote connections over the public internet present risks for the data transmitted between local users and the database server and vice versa. Attackers can intercept the connection, in what is known as a MITM (Man-in-the-Middle) attack. Securing the connection is essential when we manage or administer the database remotely; some things we can apply to protect our access to the database are as follows:

Private Network Access

A VPN (Virtual Private Network) is one of the fundamental tools for securely accessing our infrastructure from outside. A VPN is a private network that uses public networks to reach remote sites. VPN setup requires hardware to be prepared on the private network side; besides that, the client also needs VPN software that supports access to the private network.

Besides using a VPN, another way to access the MongoDB server is by forwarding the database port via SSH, better known as SSH tunneling. 
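A minimal sketch, assuming a bastion/SSH host reachable as bastion.example.com and a MongoDB node at the private address 10.0.0.12 (both hypothetical), would be:

$ ssh -N -L 27017:10.0.0.12:27017 youruser@bastion.example.com
$ mongo --host 127.0.0.1 --port 27017 -u admin -p --authenticationDatabase admin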

Use SSL / TLS from the Client to the Database Server

In addition to implementing secure access using a VPN or SSH tunneling, we can use the SSL/TLS that was previously configured on the MongoDB side. You just need the SSL key you have and can try connecting to the database using it.
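With the TLS configuration shown earlier, a client connection could then be attempted along these lines (the CA file path and host name are assumptions; recent shells use the --tls flags, while older ones use the equivalent --ssl flags):

$ mongo --tls --tlsCAFile /etc/mongo/ssl/ca.crt \
      --host mongodb.example.com -u admin -p --authenticationDatabase admin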

Enable Database Monitoring

It is essential to enable a monitoring service to understand the current state of the databases. The monitoring server can be installed under a public domain with SSL/TLS enabled, so browser access automatically uses HTTPS.

Conclusion

Working from home can be really enjoyable: you can interact with your kids and monitor your database at the same time. Follow the above guidelines to make sure you do not get attacked or have data stolen when accessing your database remotely.

My Favorite PostgreSQL Extensions - Part One


This is a continuation of my previous blog entry, in which I touched upon the topic of PostgreSQL extensions. PostgreSQL extensions are a plug-and-play set of enhancements that add an extra feature set to a PostgreSQL cluster. Some of these features are as simple as reading or writing to an external database, while others could be a sophisticated solution to implement database replication, monitoring, etc.

PostgreSQL has evolved over the years from a simple open source ORDBMS to a powerful database system with over 30 years of active development, offering reliability, performance, and full ACID compliance. With PostgreSQL 12 released a few months ago, this database software is only getting bigger, better, and faster. 

Occasionally, extensions need to be added to a PostgreSQL cluster to achieve enhanced functionality that is unavailable in the native code, either because it was not developed due to time constraints or due to insufficient evidence of edge-case database problems. I am going to discuss a few of my favourite extensions, in no particular order, with some demos; these extensions are used by developers and DBAs alike. 

Some of these extensions may need to be included in the shared_preload_libraries server parameter as a comma-separated list so that they are preloaded at server start. Although most of the extensions are included in the contrib module of the source code, some have to be downloaded from an external website dedicated only to PostgreSQL extensions, called the PostgreSQL Extension Network.

In this two-part blog series we will discuss extensions used to access data (postgres_fdw) and to shrink or archive databases (pg_partman). Additional extensions will be discussed in the second part.

postgres_fdw

The postgres_fdw is a foreign data wrapper extension that can be used to access data stored in external PostgreSQL servers. This extension is similar to an older extension called dblink but it differs from its predecessor by offering standards-compliant syntax and better performance. 

The important components of postgres_fdw are a server, a user mapping, and a foreign table. There is a minor overhead added to the actual cost of executing queries against remote servers, namely the communication overhead. The postgres_fdw extension can also communicate with remote servers running versions as old as PostgreSQL 8.3, thus being backward compatible with earlier versions.

Demo

The demo will exhibit a connection from PostgreSQL 12 to a PostgreSQL 11 database. The pg_hba.conf settings have already been configured for the servers to talk to each other. The extension's control files have to be loaded into the PostgreSQL shared home directory before creating the extension from inside a PostgreSQL cluster. 

Remote Server:

$ /usr/local/pgsql-11.3/bin/psql -p 5432 -d db_replica postgres

psql (11.3)

Type "help" for help.



db_replica=# create table t1 (sno integer, emp_id text);

CREATE TABLE



db_replica=# \dt t1

        List of relations

 Schema | Name | Type  |  Owner

--------+------+-------+----------

 public | t1   | table | postgres



db_replica=# insert into t1 values (1, 'emp_one');

INSERT 0 1

db_replica=# select * from t1;

 sno | emp_id

-----+---------

   1 | emp_one

(1 row)

Source Server:

$ /database/pgsql-12.0/bin/psql -p 5732 postgres

psql (12.0)

Type "help" for help.

postgres=# CREATE EXTENSION postgres_fdw;

CREATE EXTENSION



postgres=# CREATE SERVER remote_server

postgres-# FOREIGN DATA WRAPPER postgres_fdw

postgres-# OPTIONS (host '192.168.1.107', port '5432', dbname 'db_replica');

CREATE SERVER



postgres=# CREATE USER MAPPING FOR postgres

postgres-# SERVER remote_server

postgres-# OPTIONS (user 'postgres', password 'admin123');

CREATE USER MAPPING



postgres=# CREATE FOREIGN TABLE remote_t1

postgres-# (sno integer, emp_id text)

postgres-# server remote_server

postgres-# options (schema_name 'public', table_name 't1');

CREATE FOREIGN TABLE



postgres=# select * from remote_t1;

 sno | emp_id

-----+---------

   1 | emp_one

(1 row)



postgres=# insert into remote_t1 values (2,'emp_two');

INSERT 0 1



postgres=# select * from remote_t1;

 sno | emp_id

-----+---------

   1 | emp_one

   2 | emp_two

(2 rows)

The WRITE operation from the source server is reflected in the remote server table immediately. A similar extension called oracle_fdw also exists, which enables READ and WRITE access between PostgreSQL and Oracle tables. In addition, there is another extension called file_fdw which enables data access from flat files on disk. Please refer to the official documentation of postgres_fdw, published here, for more information and details.

pg_partman

As databases and tables grow, there is always a need to shrink databases, archive data that is not needed, or at least partition tables into various smaller fragments. This is so the query optimizer only visits the parts of the table that satisfy query conditions, instead of scanning the whole table heap. 

PostgreSQL has been offering partitioning features for a long time, including Range, List, Hash, and Sub-partitioning techniques. However, it requires a lot of administration and management effort, such as defining child tables that inherit properties of a parent table to become its partitions, creating trigger functions to redirect data into a partition, creating triggers to call those functions, etc. This is where pg_partman comes into play, wherein all of these hassles are taken care of automatically.

Demo

I will show a quick demo of setting things up and inserting sample data. You will see how the data inserted into the main table gets automatically redirected to the partitions by just setting up pg_partman. It is important for the partition key column to be not null.

db_replica=# show shared_preload_libraries;

 shared_preload_libraries

--------------------------

 pg_partman_bgw

(1 row)



db_replica=# CREATE SCHEMA partman;

CREATE SCHEMA

db_replica=# CREATE EXTENSION pg_partman SCHEMA partman;

CREATE EXTENSION

db_replica=# CREATE ROLE partman WITH LOGIN;

CREATE ROLE

db_replica=# GRANT ALL ON SCHEMA partman TO partman;

GRANT

db_replica=# GRANT ALL ON ALL TABLES IN SCHEMA partman TO partman;

GRANT

db_replica=# GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA partman TO partman;

GRANT

db_replica=# GRANT EXECUTE ON ALL PROCEDURES IN SCHEMA partman TO partman;

GRANT

db_replica=# GRANT ALL ON SCHEMA PUBLIC TO partman;

GRANT

db_replica=# create table t1  (sno integer, emp_id varchar, date_of_join date not null);

db_replica=# \d

        List of relations

 Schema | Name | Type  |  Owner

--------+------+-------+----------

 public | t1   | table | postgres

(1 row)



db_replica=# \d t1

                         Table "public.t1"

    Column    |       Type        | Collation | Nullable | Default

--------------+-------------------+-----------+----------+---------

 sno          | integer           |           |          |

 emp_id       | character varying |           |          |

 date_of_join | date              |           | not null |

db_replica=# SELECT partman.create_parent('public.t1', 'date_of_join', 'partman', 'yearly');

 create_parent

---------------

 t

(1 row)



db_replica=# \d+ t1

                                             Table "public.t1"

    Column    |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description

--------------+-------------------+-----------+----------+---------+----------+--------------+-------------

 sno          | integer           |           |          |         | plain    |              |

 emp_id       | character varying |           |          |         | extended |              |

 date_of_join | date              |           | not null |         | plain    |              |

Triggers:

    t1_part_trig BEFORE INSERT ON t1 FOR EACH ROW EXECUTE PROCEDURE t1_part_trig_func()

Child tables: t1_p2015,

              t1_p2016,

              t1_p2017,

              t1_p2018,

              t1_p2019,

              t1_p2020,

              t1_p2021,

              t1_p2022,

              t1_p2023



db_replica=# select * from t1;

 sno | emp_id | date_of_join

-----+--------+--------------

(0 rows)



db_replica=# select * from t1_p2019;

 sno | emp_id | date_of_join

-----+--------+--------------

(0 rows)



db_replica=# select * from t1_p2020;

 sno | emp_id | date_of_join

-----+--------+--------------

(0 rows)



db_replica=# insert into t1 values (1,'emp_one','01-06-2019');

INSERT 0 0

db_replica=# insert into t1 values (2,'emp_two','01-06-2020');

INSERT 0 0

db_replica=# select * from t1;

 sno | emp_id  | date_of_join

-----+---------+--------------

   1 | emp_one | 2019-01-06

   2 | emp_two | 2020-01-06

(2 rows)



db_replica=# select * from t1_p2019;

 sno | emp_id  | date_of_join

-----+---------+--------------

   1 | emp_one | 2019-01-06

(1 row)



db_replica=# select * from t1_p2020;

 sno | emp_id  | date_of_join

-----+---------+--------------

   2 | emp_two | 2020-01-06

(1 row)

This is a simple partitioning technique, but each of the above partitions can be further divided into sub-partitions. Please check the official documentation of pg_partman, published here, for more of the features and functions it offers.
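As an illustrative sketch only, sub-partitioning is typically set up with partman.create_sub_parent(); the call below mirrors the create_parent() call used earlier to split each yearly partition by month, but the exact arguments vary between pg_partman versions, so consult the documentation for yours before running it:

$ psql -d db_replica -c "SELECT partman.create_sub_parent('public.t1', 'date_of_join', 'partman', 'monthly');"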

Conclusion

Part two of this blog will discuss other PostgreSQL extensions like pgAudit, pg_repack and HypoPG.

Tips for Reducing Production Database Infrastructure Costs


The database tier is one of the most important layers in a system architecture. It must be set up correctly from the beginning because it is stateful and therefore harder to scale than other tiers. If growth is exponential, a poor initial decision can lead to an outrageous total cost of ownership (TCO), which could inhibit database scaling and eventually affect business growth.

In this blog post, we are going to look into some tips on how to reduce the overall TCO of our production database infrastructure.

Use Open-Source Software & Tools

Using open source software is the very first step to lowering database infrastructure cost. Almost every piece of commercial software on the market has an equivalent in the open-source world. The most flexible and cost-effective way to optimize your database strategy is to use the right tool for the right job.

It is possible to build the whole database tier with open source software and tools, for example:

  • Infrastructure: OpenStack, CloudStack
  • Hypervisor: Virtualbox, KVM, Xen, QEMU
  • Firewall: PFSense, OPNsense, Untangle, Simplewall
  • Containerization: Docker, rkt, lxc, OpenVZ
  • Operating system: Ubuntu Server, CentOS, Debian, CoreOS
  • Relational DBMS: MySQL, MariaDB, PostgreSQL, Hive, SQLite
  • Document-based DBMS: MongoDB, Couchbase, CouchDB
  • Column-based DBMS: Cassandra, ClickHouse, HBase
  • Key-value DBMS: Redis, memcached
  • Time-series DBMS: InfluxDB, OpenTSDB, Prometheus, TimeScaleDB
  • Database backup tool: Percona Xtrabackup, MariaDB Backup, mydumper, pgbackrest
  • Database monitoring tool: PMM, Monyog, Zabbix, Nagios, Cacti, Zenoss, Munin
  • Database management tool: PHPMyAdmin, HeidiSQL, PgAdmin, DBeaver
  • Database load balancer: ProxySQL, HAProxy, MySQL Router, Pgbouncer, pg-pool, MaxScale
  • Topology manager: Orchestrator, MaxScale, MHA, mysqlrpladmin
  • Configuration management tool: Ansible, Puppet, Chef, Salt
  • Keyring server: Vault, CyberArk Conjur, Keywhiz
  • Service discovery: etcd, consul, Zookeeper
  • ETL tools: Talend, Kettle, Jaspersoft

As listed above, there is a plethora of open source software and tools in various categories available that you can choose from. Although the software is available ‘for free’, many offer a dual licensing model - community or commercial, where the latter comes with extended features and technical support. 

There are also free companion and helper tools that are created and maintained as open-source projects which can improve the usability, efficiency, availability and productivity of a product. For example, for MySQL you can have PHPmyAdmin, Percona Xtrabackup, Orchestrator, ProxySQL and gh-ost, amongst many others. For PostgreSQL we have for example Slony-I, pgbouncer, pgAdmin and pgBackRest. All of these tools are free to use and are driven by community. 

Using open source software also frees us from vendor lock-in, making us independent of a single vendor for products and services. We are free to use other vendors without substantial switching costs.

Run on Virtual Machines or Containers

Hardware virtualization allows us to make use of all of the resources available in a server. Despite the performance overhead due to physical resource sharing by the guest hosts, it gives us a cheaper alternative to have multiple instances running simultaneously without the cost of multiple physical servers. It is easier to manage, reusable for different purposes like testing and understanding how well our application and database communicate and scale across multiple hosts. 

Running your production database on bare-metal servers is the best option if performance matters. Most of the time, the performance overhead on hardware virtualization can be minimized if we plan proper isolation of the guest hosts with fair load distribution and if we allocate sufficient resources to avoid starvation when sharing resources.

Containers are better placed, at least theoretically, to achieve lower TCO (total cost of ownership) than traditional hardware virtualization. Containers are an operating system-level virtualization, so multiple containers can share the OS. Hardware virtualization uses a hypervisor to create virtual machines and each of those VMs has its own operating system. If you are running on virtualization with the same operating system over guest OSes, that could be a good justification to use container virtualization instead. You can pack more on to a server that is running containers on one version of an OS compared to a server running a hypervisor with multiple copies of an OS.

For databases, almost all popular DBMS container images are available for free on Docker Hub.
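For instance, throwaway test instances can be started directly from the official images (the container names, tags, and passwords below are placeholders):

$ docker run -d --name mysql-test -e MYSQL_ROOT_PASSWORD=Sup3rS3cret -p 3306:3306 mysql:8.0
$ docker run -d --name postgres-test -e POSTGRES_PASSWORD=Sup3rS3cret -p 5432:5432 postgres:12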

There are also tons of articles and guidelines on how to run your open source database on Docker containers, for example this one which I like (because I wrote it! :-) ), MySQL Docker Containers: Understanding the Basics.

Embrace Automation

Automation can greatly reduce cost by shrinking the DBA/DevOps team size with all sorts of automation tools. Managing the database infrastructure lifecycle involves many risky and repetitive tasks which require expertise and experience. Hiring talented candidates, or building up a team to support the infrastructure can take a significant amount of time, and it comes with a handsome cost for salary, benefits and employee welfare. 

Human beings have feelings. They have bad days, personal problems, pressure for results, many types of distractions, and so on. It’s common to forget a step, or misfire a destructive command especially on a daily repetitive task. A well-defined configuration creates a stable process. The machine will never miss a single step.

Repetitive tasks like database deployment, configuration management, backup, restore and software upgrade can be automated with infrastructure provisioning tools like Terraform, Heat (OpenStack) or CloudFormation (AWS) together with configuration management tools like Ansible, Chef, Salt or Puppet. However, there are always missing parts and pieces that need to be covered by a collection of custom scripts or commands like failover, resyncing, recovery, scaling and many more. Rundeck, an open source runbook automation tool can be used to manage all the custom scripts, which can bring us closer to achieving full automation.

A fully automated database infrastructure requires all important components to work in-sync together like monitoring, alerting, notification, management, scaling, security and deployment. ClusterControl is a pretty advanced automation tool to deploy, manage, monitor and scale your MySQL, MariaDB, PostgreSQL and MongoDB servers. It supports handling of complex topologies with all kinds of database clustering and replication technologies offered by the supported DBMS. ClusterControl has all the necessary tools to replace specialized DBAs to maintain your database infrastructure. We believe that existing sysadmins or devops teams alongside ClusterControl would be enough to handle most of the operational burden of your database infrastructure.

Utilize Automatic Scaling

Automatic scaling is something that can help you reduce the cost if you are running on multiple database nodes in a database cluster or replication chain. If you are running on cloud infrastructure with on-demand or pay-per-use subscription, you probably want to turn off underutilized instances to avoid accumulating unnecessary usage charges. If you are running on AWS, you may use Amazon CloudWatch to detect and shut down unused EC2 instances, as shown in this guide. For GCP, there is a way to auto-schedule nodes using Google Cloud Scheduler.

There are a number of ways to make database automatic scaling possible. We could use Docker containers with the help of orchestration tools like Kubernetes, Apache Mesos or Docker Swarm. For Kubernetes, there are a number of database operators available that we can use to deploy or scale a cluster. Some of them are:

Automatic database scaling is fairly trivial with the ClusterControl CLI. It's a command line client that you can use to control, manage, and monitor your database cluster, and it can perform basically anything that the ClusterControl UI is capable of. For example, adding a new MySQL slave node is just a command away:

$ s9s cluster --add-node --cluster-id=42 --nodes='192.168.0.93?slave' --log

Removing a database node is also trivial:

$ s9s cluster --remove-node --cluster-id=42 --nodes='192.168.0.93' --log

The above commands can be automated with a simple bash script, which you can combine with infrastructure automation tools like Terraform or CloudFormation to decommission unused instances. If you are running on supported clouds (AWS, GCP and Azure), the ClusterControl CLI can also be used to create a new EC2 instance in the default AWS region with a command line:

$ s9s container --create aws-apsoutheast1-mysql-db1 --log

Or you could also remove the instance created in AWS directly:

$ s9s container --delete aws-apsoutheast1-mysql-db1 --log

The above CLI makes use of the ClusterControl Cloud module where one has to configure the cloud credentials first under ClusterControl -> Integrations -> Cloud Providers -> Add Cloud Credentials. Note that the "container" command in ClusterControl means a virtual machine or a host that sits on top of a virtualization platform, not a container on top of OS-virtualization like Docker or LXC.
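To tie this together, here is a small, hypothetical bash wrapper (the cluster ID, node address, and schedule are assumptions) that reuses the s9s commands shown above so that cron can scale a read replica in and out around business hours:

#!/bin/bash
# Hypothetical sketch: add or remove one read replica with the s9s CLI.
# CLUSTER_ID and NODE must match your own environment.
CLUSTER_ID=42
NODE="192.168.0.93"

case "$1" in
  scale-out)
    s9s cluster --add-node --cluster-id=$CLUSTER_ID --nodes="${NODE}?slave" --log
    ;;
  scale-in)
    s9s cluster --remove-node --cluster-id=$CLUSTER_ID --nodes="$NODE" --log
    ;;
  *)
    echo "Usage: $0 {scale-out|scale-in}"
    exit 1
    ;;
esac

Scheduled from cron, for example "0 20 * * 1-5" for scale-in and "0 7 * * 1-5" for scale-out, this keeps the extra replica running only when it is actually needed.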


Using SSH Tunneling as a VPN Alternative


Using a VPN connection is the most secure way to access a network when working remotely, but as this configuration can require hardware, time, and knowledge, you will probably want to know about alternatives. Using SSH is also a secure way to access a remote network, without extra hardware, and it is less time-consuming and requires less effort than configuring a VPN server. In this blog, we’ll see how to configure SSH tunneling to access your databases in a secure way.

What is SSH?

SSH (Secure Shell) is a program/protocol that allows you to access a remote host/network, run commands, or share information. You can configure different encrypted authentication methods, and it uses TCP port 22 by default, but it’s recommended to change it for security reasons.

How to Use SSH?

The most secure way to use it is by creating an SSH key pair. With this, you not only need the password but also the private key to be able to access the remote host.

Also, you should have a host with only the SSH server role, and keep it as isolated as possible, so in case of an external attack, it won’t affect your local servers. Something like this:

Let’s see first, how to configure the SSH server.

Server configuration

Most Linux installations have the SSH server installed by default, but there are some cases where it could be missing (minimal ISO), so to install it you just need to install the following packages:

RedHat-based OS

$ yum install openssh-clients openssh-server

Debian-based OS

$ apt update; apt install openssh-client openssh-server

Now that you have the SSH server installed, you can configure it to only accept connections using a key.

vi /etc/ssh/sshd_config

PasswordAuthentication no

Make sure you change it after having the public key in place, otherwise you won’t be able to log in.

You can also change the port and deny root access to make it more secure:

Port 20022

PermitRootLogin no

You must check if the selected port is open in the firewall configuration to be able to access it.
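For example, to open the custom port chosen above (20022 here), something like the following would be needed, depending on your distribution. With ufw:

$ sudo ufw allow 20022/tcp

Or with firewalld:

$ sudo firewall-cmd --permanent --add-port=20022/tcp
$ sudo firewall-cmd --reload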

This is a basic configuration. There are different parameters to change here to improve the SSH security, so you can follow the documentation for this task.

Client configuration

Now, let’s generate the key pair for the local user “remote” to access the SSH server. There are different types of keys; in this case, we’ll generate an RSA key.

$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/remote/.ssh/id_rsa):

Created directory '/home/remote/.ssh'.

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/remote/.ssh/id_rsa.

Your public key has been saved in /home/remote/.ssh/id_rsa.pub.

The key fingerprint is:

SHA256:hT/36miDBbRa3Povz2FktC/zNb8ehAsjNZOiX7eSO4w remote@local

The key's randomart image is:

+---[RSA 3072]----+

|                 |

|        ..  .    |

|       o.+.=.    |

|        *o+.o..  |

|       +S+o+=o . |

|      . o +==o+  |

|         =oo=ooo.|

|        .E=*o* .+|

|         ..BB ooo|

+----[SHA256]-----+

This will generate the following files in a directory called “.ssh” inside the user’s home directory:

$ whoami

remote

$ pwd

/home/remote/.ssh

$ ls -la

total 20

drwx------ 2 remote remote 4096 Apr 16 15:40 .

drwx------ 3 remote remote 4096 Apr 16 15:27 ..

-rw------- 1 remote remote 2655 Apr 16 15:26 id_rsa

-rw-r--r-- 1 remote remote  569 Apr 16 15:26 id_rsa.pub

The “id_rsa” file is the private key (keep it as secure as possible), and the “id_rsa.pub” is the public one that must be copied to the remote host to access it. For this, run the following command as the corresponding user:

$ whoami

remote

$ ssh-copy-id -p 20022 remote@35.166.37.12

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/remote/.ssh/id_rsa.pub"

/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

remote@35.166.37.12's password:



Number of key(s) added:        1



Now try logging into the machine, with:   "ssh -p '20022' 'remote@35.166.37.12'"

and check to make sure that only the key(s) you wanted were added.

In this example, I’m using port 20022 for SSH, and my remote host is 35.166.37.12. I also have the same user (remote) created on both the local and remote hosts. You can use another user on the remote host; in that case, you should change the user to the correct one in the ssh-copy-id command:

$ ssh-copy-id -p 20022 user@35.166.37.12

This command will copy the public key to the authorized_keys file in the remote .ssh directory. So, in the SSH Server you should have this now:

$ pwd

/home/remote/.ssh

$ ls -la

total 20

drwx------ 2 remote remote 4096 Apr 16 15:40 .

drwx------ 3 remote remote 4096 Apr 16 15:27 ..

-rw------- 1 remote remote  422 Apr 16 15:40 authorized_keys

-rw------- 1 remote remote 2655 Apr 16 15:26 id_rsa

-rw-r--r-- 1 remote remote  569 Apr 16 15:26 id_rsa.pub

Now, you should be able to access the remote host:

$ ssh -p 20022 remote@35.166.37.12

But this is not enough to access your database node, as you are only on the SSH server so far.

SSH Database Access

To access your database node you have two options. The classic way is that, once you are on the SSH server, you can access the node from there since you are on the same network, but for this you need to open two or three connections.

First, the SSH connection established to the SSH Server:

$ ssh -p 20022 remote@35.166.37.12

Then, the SSH connection to the Database Node:

$ ssh remote@192.168.100.120

And finally, the database connection, which in the case of MySQL is:

$ mysql -h localhost -P3306 -udbuser -p

And for PostgreSQL:

$ psql -h localhost -p 5432 -Udbuser postgres

If you have the database client installed in the SSH Server, you can avoid the second SSH connection and just run the database connection directly from the SSH Server:

$ mysql -h 192.168.100.120 -P3306 -udbuser -p

or:

$ psql -h 192.168.100.120 -p 5432 -Udbuser postgres

But this can be annoying, as you are used to connecting to the database directly from your computer when in the office, so let’s see how to use SSH tunneling for this.

SSH Tunneling

Following the same example, we have:

  • SSH Server Public IP Address: 35.166.37.12
  • SSH Server Port: 20022
  • Database Node Private IP Address: 192.168.100.120
  • Database Port: 3306/5432
  • SSH user (local and remote): remote
  • Database user: dbuser

Command Line

So, if you run the following command in your local machine:

$ ssh -L 8888:192.168.100.120:3306 remote@35.166.37.12 -p 20022 -N

This will open port 8888 on your local machine, which will access the remote database node on port 3306 via the SSH server on port 20022, using the “remote” user.

So, to make it more clear, after running this command, you can access the remote database node, running this in your local machine:

$ mysql -h localhost -P8888 -udbuser -p
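The same approach works for PostgreSQL; assuming you pick local port 8889 (an arbitrary choice) and the database port 5432 from the list above:

$ ssh -L 8889:192.168.100.120:5432 remote@35.166.37.12 -p 20022 -N

$ psql -h localhost -p 8889 -Udbuser postgres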

Graphic Tools

If you are using a graphic tool to manage databases, most probably it has the option to use SSH Tunneling to access the database node. 

Let’s see an example using MySQL Workbench:

And the same for PgAdmin:

As you can see, the information asked for here is pretty similar to that used for the command-line SSH tunneling connection.

Conclusion

Security is important for all companies, so if you are working from home, you must keep data as secure as when you are working in the office. As we mentioned, the best solution for this is probably a VPN connection to access the databases, but if for some reason that is not possible, you need an alternative to avoid handling data over the internet in an insecure way. As you can see, configuring SSH tunneling to access your databases is not rocket science, and it is probably the best alternative in this case.

An Overview of MongoDB User Management


Database user management is a particularly important part of data security, as we must understand who is accessing the database and set the access rights of each user. If a database does not have proper user management, user access is going to get very messy and difficult to maintain as time goes on.

MongoDB is a NoSQL database and document store. Applying the RBAC (Role-Based Access Control) concept is key to implementing proper user management to manage user credentials.

What is Role Based Access Control (RBAC)?

RBAC is an approach that restricts system access to authorized users only. In an organization, roles are created for various job functions; in the database, we then create the access rights to carry out the operations assigned to a particular role. 

Staff members (or other system users) are assigned certain roles and through them are assigned permissions to perform computer system functions. Users are not given permissions directly, but only get them through their role (or roles). Managing individual user rights becomes a matter of simply placing the appropriate role into the user's account; this simplifies general operations (such as adding users or changing user departments).

Three main rules are set for RBAC:

  • Role Assignment: A subject can exercise a permission only if the subject has selected or been assigned a role.
  • Role Authorization: A subject's active role must be authorized for that subject. Together with rule 1, this rule ensures that users can take on only roles for which they are authorized.
  • Permission Authorization: A subject can exercise a permission only if the permission is authorized for the subject's active role. Together with rules 1 and 2, this rule ensures that users can exercise only permissions for which they are authorized.

This blog will briefly review Role Based Access Control in the MongoDB database.

MongoDB User Roles

MongoDB has several types of roles in the database, those are...

Built-in Roles

MongoDB provides access to data and actions through role-based authorization and ships with built-in roles that provide several levels of access to the database.

A role grants the privileges to perform certain actions on a given resource. MongoDB's built-in roles fall into several categories:

  • Database User: database user roles can manipulate data in non-system collections. Examples of database user roles are: read, readWrite.
  • Database Administration: database administration roles deal with the administrative management of databases, such as user administration, schema, and the objects in them. Examples of database administration roles are: dbAdmin, userAdmin, dbOwner.
  • Cluster Administration: cluster administration roles administer the entire MongoDB system, including its replica sets and shards. Examples of cluster administration roles are: clusterAdmin, clusterManager.
  • Backup and Restoration: these roles are specific to functions related to database backup in MongoDB. Examples of these roles are: backup, restore.
  • All-Database Roles: these roles live in the admin database and have access to all databases except local and config. Examples are: readAnyDatabase, readWriteAnyDatabase, userAdminAnyDatabase.
  • Superuser: this role has the ability to grant access to every user, to every privilege, in all databases. Example of this role: root

User Defined Roles

In addition to built-in roles, we can create our own roles according to our needs and decide which privileges to give to them. To create roles, you can use the db.createRole() command. Besides creating roles, there are several other functions to manage existing roles, such as db.dropRole(), which deletes an existing role from the database, and db.getRole(), which retrieves all the information about a specific role.
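For illustration, with a hypothetical user-defined role named appReadOnly in the admin database, inspecting and then removing it could look like this:

$ mongo admin -u admin -p --eval 'printjson(db.getRole("appReadOnly", { showPrivileges: true }))'

$ mongo admin -u admin -p --eval 'db.dropRole("appReadOnly")'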

Privilege Actions in MongoDB

Privilege actions in MongoDB are actions that can be performed by a user on a resource. MongoDB has several action categories, namely:

  • Database Management Actions, related to database administration commands such as changePassword, createCollection, and createIndex. 
  • Query and Write Actions, related to executing data manipulation on a collection. For example, the insert action allows the insert command, which inserts new documents.
  • Deployment Management Actions, relating to changes in database configuration. Some actions that fall into this category are cpuProfiler, storageDetails, and killOp. 
  • Replication Actions, relating to the execution of database replication resources, such as replSetConfigure and replSetHeartbeat.
  • Server Administration Actions, related to server administration commands in MongoDB, such as the logRotate action used to rotate the database logs at the operating system level.
  • Sharding Actions, related to sharding commands such as addShard, which adds new shard nodes.
  • Session Actions, related to session resources in a database, such as listSessions and killAnySession.
  • Diagnostic Actions, related to diagnostics of resources, such as dbStats to find out the current state of the database.
  • Free Monitoring Actions, related to monitoring in the database.

Managing MongoDB User & Roles

You can create a user and then assign the user to built-in roles, for example as follows:

db.createUser( {

user: "admin",

pwd: "thisIspasswordforAdmin",

roles: [ { role: "root", db: "admin" } ]

} );

In the script above, the admin user is created with the defined password and the built-in root role, which falls into the Superuser category.

Besides that, you can assign more than one role to a user; here is an example:

db.createUser(

{user:'businessintelligence', 

pwd:'BIpassw0rd', 

roles:[{'role':'read', 'db':'oltp'}, { 'role':'readWrite', 'db':'olapdb'}]

});

The businessintelligence user has two roles: the read role on the oltp database and the readWrite role on the olapdb database.

User-defined roles are created with the db.createRole() command. You must determine the purpose of the role so that you can decide which actions it should include. The following is an example of creating a role for monitoring the MongoDB database:

use admin

db.createRole(

   {

     role: "RoleMonitoring",

     privileges: [

       { resource: { cluster: true }, actions: [ "serverStatus" ] }

     ],

     roles: []

   }

)

Then we can assign the user-defined role to a new user using the following command:

db.createUser( {

user: "monuser",

pwd: "thisIspasswordforMonitoring",

roles: [ { role: "RoleMonitoring", db: "admin" } ]

} );

Meanwhile, to assign the role to an existing user, you can use the following command:

db.grantRolesToUser(
  "existingmonuser",
  [
    { role: "RoleMonitoring", db: "admin" }
  ]
)

To revoke a role from an existing user, you can use the following command:

db.revokeRolesFromUser(
  "oldmonguser",
  [
    { role: "RoleMonitoring", db: "admin" }
  ]
)

By using user-defined roles, we can create roles that contain exactly the actions we want to allow, for example a role that only permits users to delete documents in a specific database, as sketched below.
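
As a minimal sketch of that last idea, the following Python snippet creates such a restricted role through the same role commands shown above. It assumes pymongo is installed; the salesdb database, orders collection, and the cleanupuser user are hypothetical names used only for illustration:

# Minimal sketch, assuming pymongo and a hypothetical "salesdb.orders" collection.
from pymongo import MongoClient

client = MongoClient("mongodb://admin:thisIspasswordforAdmin@localhost:27017/admin")
admin_db = client["admin"]

# Create a role that can only remove (delete) documents in salesdb.orders.
admin_db.command("createRole", "ordersCleaner",
                 privileges=[{
                     "resource": {"db": "salesdb", "collection": "orders"},
                     "actions": ["remove"]
                 }],
                 roles=[])

# Grant the new role to a hypothetical existing user.
admin_db.command("grantRolesToUser", "cleanupuser",
                 roles=[{"role": "ordersCleaner", "db": "admin"}])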

Conclusion

The application of access rights can improve security. Mapping roles and users in the database makes it easy for you to manage user access.

Make sure all of this information regarding roles and rights is documented properly, with restricted access to the document. This helps you share the information with other DBAs or support personnel and is handy for audits and troubleshooting.

 

Tips for Managing PostgreSQL Remotely


A wide range of resources is available for managing your PostgreSQL database clusters remotely. With the right tools, managing them remotely is not a difficult task. 

Using fully-managed services for PostgreSQL offers an observability that can deliver most of what you need to manage your database. They provide you with an alerting system, metrics, automation of time-consuming system administration tasks, managing your backups, etc.

When running on-prem it’s a different challenge. That's what we'll cover in this blog. We'll share tips on managing your PostgreSQL database cluster remotely.

Database Observability

The term observability might not be familiar to some folks. Observability is not a passing fad; it is the current trend in managing databases (and even PaaS or SaaS applications). It overlaps with monitoring, but it goes further: it covers the ability to determine the state of your database's health and performance, and it has proactive and reactive capabilities that act based on the current status of your database nodes. 

A good example of this is ClusterControl. When ClusterControl detects warnings based on the checks for a given configuration, it sends alerts to the configured channels. These can be set up and customized by the system or database administrator. 

If your primary database has degraded and is unable to process transactions (either reads or writes), ClusterControl reacts accordingly and triggers a failover so that a new node can take over the incoming traffic. While this occurs, ClusterControl notifies the engineers of what happened by raising alarms and sending alerts. Logs are also centralized, so investigation and diagnostic tasks can be done in one place, helping you reach a result quickly.

While ClusterControl is not a complete observability package, it is a powerful tool in this area. There are also tools architected specifically for containerized environments, such as Rancher combined with Datadog.

How Does This Help You In Managing Remotely?

One basic principle of management is to have peace of mind. If a problem occurs, the tools you use for observability must be able to notify you via email, SMS, or a paging application (like PagerDuty) so you know the status of your database cluster.

You can receive alerts such as the one below...

It is very important that it notifies you when changes occur. You can then improve and analyze the state of your infrastructure and avoid any impacts that can affect the business.

Database Automation

It is very important that the most time-consuming tasks are automated. Automation reduces the amount of manual work required. So what does it mean to automate your PostgreSQL database clusters? 

Failover

Failover is the automatic response to an unexpected incident, such as a hardware failure, a system crash, power loss on your primary node, or a network outage in the data center. Your failover capability must be tested regularly and follow industry-standard practices, and the detection of an internal failure must be verified as real before a failover is actually triggered.

In ClusterControl, when an incident occurs it triggers the failover mechanism, promotes the most up-to-date standby node, and then raises alarms, as seen below...

It then works through the failover in the background; you can follow the progress as shown below,

leaving the result as it finishes below...

Backup Scheduling

Backups are a very important part of Disaster Recovery Planning (DRP). Backups are your backbone when your cluster data goes adrift after a split brain or network partition. There are occasions where pg_rewind can also help, but automating your backups is always essential to avoid large data loss and to keep your RPO and RTO low.

In ClusterControl you can take or schedule a backup without any special tools or custom scripts to automate it. Everything is built in, and it is up to your organization to decide when backups take place and what your backup policies are, including retention. Most importantly, backups should not interfere with your production environment and should not lock up your nodes while they run.

Backup verification also plays a very important role here: your backup must be valid and a reliable copy when a crisis takes place. It is also worth adding a mechanism to store your backups not only on premises or in your data center, but also securely elsewhere, for example in the cloud on AWS S3 or Google Cloud Storage.
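
If you are handling this part by hand rather than through a platform, a minimal sketch of an off-site copy might look like the following. It assumes the AWS SDK for Python (boto3), AWS credentials configured in the environment, and a hypothetical bucket name:

# Minimal sketch: dump a database with pg_dump and upload the file to S3.
# Assumes boto3 is installed, AWS credentials come from the environment,
# and the bucket name below is hypothetical.
import subprocess
from datetime import datetime

import boto3

db_name = "mydb"
backup_file = f"/var/backups/{db_name}-{datetime.utcnow():%Y%m%d%H%M%S}.dump"

# Custom-format dump; pg_dump reads connection settings from the PG* environment variables.
subprocess.run(["pg_dump", "-Fc", "-f", backup_file, db_name], check=True)

# Upload the dump to an off-site bucket.
s3 = boto3.client("s3")
s3.upload_file(backup_file, "my-offsite-backup-bucket", backup_file.lstrip("/"))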

With ClusterControl, all of this can be handled easily in a single platform by following the GUI, as shown below.

This allows you to pick the backup method you prefer and store the backup in the cloud, spreading your copies across more than one location for extra retention and assurance. You also have the option to verify the backup once it has finished, to confirm that it is a valid one. You can additionally choose to encrypt your backup, which is a very important practice when storing data at rest and complying with security regulations.

Database Security

Security is usually the primary concern when it comes to managing your PostgreSQL database cluster remotely. Who will be able to access the database remotely, or should access be local only? How do you add security restrictions, and how do you manage users and let a security analyst review their permissions? It is very important to have controls in place and a clear picture of your architecture, so you can identify where the loopholes are and what needs to be improved or tightened.

ClusterControl provides an overview and management of your PostgreSQL users, as well as a visual editor for your pg_hba.conf file, which controls how users are authenticated. 

For User Management, it provides an overview of the users and their privileges in the database cluster, and it allows you to modify a user's privileges if they do not comply with your security and company guidelines. Managing remotely requires that all of your users have specific permissions and roles, with access limited to where and when it is needed, to avoid damage to your database.

It is also very important to review and verify that there are no lapses in user authentication in PostgreSQL: when a user is allowed to connect to the servers, and with what scope. It is best if this is visualized, as shown below.

This allows you to easily spot overlooked authentication rules and close possible loopholes that an attacker might exploit due to weak authentication rules.

Using SSL and encryption adds more security and robustness when your database is accessed remotely. If you are accessing your database from outside your organization's premises, it is best to encapsulate your traffic, for example by logging in through a VPN. You can check out our blog on Multi-DC PostgreSQL: Setting Up a Standby Node at a Different Geo-Location Over a VPN.
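
As a small illustration of the SSL side, a remote client can insist on an encrypted, verified channel. The sketch below assumes psycopg2 and uses a hypothetical host name, user, and certificate path:

# Minimal sketch: connect to a remote PostgreSQL server over verified TLS.
# Host name, credentials and certificate path are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="pg-primary.example.com",
    port=5432,
    dbname="mydb",
    user="app_user",
    password="app_password",
    sslmode="verify-full",          # require TLS and verify the server certificate
    sslrootcert="/etc/ssl/certs/pg-root-ca.pem",
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()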

Centralized Database Logs

Centralizing and aggregating logs gives you a very convenient way to investigate issues and to run security analysis tools to understand how your database clusters behave. This is very beneficial when managing remote databases. Common approaches include Logstash with the ELK stack, or the powerful open source log management platform Graylog.

Why is it Important to Centralize Your Database Logs? 

When you need to investigate a cluster-wide problem and see what has been going on across your database clusters, proxies, or load balancers, it is very convenient to look in one place. Rich and powerful tools like the ones mentioned above let you search dynamically and in real time, and they also provide metrics and graphs, which is very convenient for analysis.
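
If you do not run a full log stack yet, even a tiny forwarder can ship PostgreSQL log lines to a central syslog-compatible collector. The sketch below uses only Python's standard library and assumes a hypothetical collector address and log file path:

# Minimal sketch: tail a PostgreSQL log file and forward new lines to a central
# syslog-compatible collector over UDP. Collector address and log path are hypothetical.
import logging
import logging.handlers
import time

forwarder = logging.getLogger("pg-forwarder")
forwarder.setLevel(logging.INFO)
forwarder.addHandler(logging.handlers.SysLogHandler(address=("logs.example.com", 514)))

logfile = "/var/log/postgresql/postgresql-12-main.log"

with open(logfile, "r") as f:
    f.seek(0, 2)                 # start at the end of the file, like `tail -f`
    while True:
        line = f.readline()
        if not line:
            time.sleep(1)        # wait for new log lines
            continue
        forwarder.info(line.rstrip())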

With ClusterControl, there is a convenient way to access the logs. Although the logs are not collected and stored centrally, it offers you an overview and the ability to read the logs. See below...

 

You may also review the jobs to see what ClusterControl detected and how it acted, either through the Alarms or by going through the Jobs, as shown below.

Conclusion

Managing your PostgreSQL database clusters remotely can be daunting, especially when it comes to security, monitoring, and failover. If you have the right tools, industry standards, and best practices for implementation, security, and observability, then you can have peace of mind when you manage your database, regardless of your location.

My Favorite PostgreSQL Extensions - Part Two


This is the second part of my blog “My Favorite PostgreSQL Extensions”, in which I introduced you to two PostgreSQL extensions, postgres_fdw and pg_partman. In this part I will explore three more.

pgAudit

The next PostgreSQL extension of interest exists to satisfy auditing requirements from various government, financial, and other certifying bodies such as ISO, BSI, and FISCAM. The standard logging facility which PostgreSQL offers natively with log_statement = all is useful for monitoring, but it does not provide the level of detail required to comply with, or face, an audit. The pgAudit extension focuses on the details of what happened under the hood while the database was satisfying an application request.

An audit trail or audit log is created and updated by a standard logging facility provided by PostgreSQL, which provides detailed session and/or object audit logging. The audit trail created by pgAudit can get enormous in size depending on audit settings, so care must be observed to decide on what and how much auditing is required beforehand. A brief demo in the following section shows how pgAudit is configured and put to use.

The log trail is created within the PostgreSQL database cluster log found in the PGDATA/log location, with the audit log messages prefixed with an “AUDIT: “ label to distinguish them from regular database background messages. 

Demo

The official documentation of pgAudit explains that there exists a separate version of pgAudit for each major version of PostgreSQL in order to support new functionality introduced in every PostgreSQL release. The version of PostgreSQL in this demo is 11, so the version of pgAudit will be from the 1.3.X branch. The pgaudit.log is the fundamental parameter to be set that controls what classes of statements will be logged. It can be set with a SET for a session level or within the postgresql.conf file to be applied globally. 

postgres=# set pgaudit.log = 'read, write, role, ddl, misc';

SET



cat $PGDATA/pgaudit.log

pgaudit.log = 'read, write, role, ddl, misc'



db_replica=# show pgaudit.log;

         pgaudit.log

------------------------------

 read, write, role, ddl, misc

(1 row)



2020-01-29 22:51:49.289 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,3,1,MISC,SHOW,,,show pgaudit.log;,<not logged>



db_replica=# create table t1 (f1 integer, f2 varchar);

CREATE TABLE



2020-01-29 22:52:08.327 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,4,1,DDL,CREATE TABLE,,,"create table t1 (f1 integer, f2 varchar);",<not logged>



db_replica=#  insert into t1 values (1,'one');

INSERT 0 1

db_replica=#  insert into t1 values (2,'two');

INSERT 0 1

db_replica=#  insert into t1 values (3,'three');

INSERT 0 1

2020-01-29 22:52:19.261 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,5,1,WRITE,INSERT,,,"insert into t1 values (1,'one');",<not logged>

20-01-29 22:52:38.145 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,6,1,WRITE,INSERT,,,"insert into t1 values (2,'two');",<not logged>

2020-01-29 22:52:44.988 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,7,1,WRITE,INSERT,,,"insert into t1 values (3,'three');",<not logged>



db_replica=# select * from t1 where f1 >= 2;

 f1 |  f2

----+-------

  2 | two

  3 | three

(2 rows)



2020-01-29 22:53:09.161 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,9,1,READ,SELECT,,,select * from t1 where f1 >= 2;,<not logged>



db_replica=# grant select on t1 to usr_replica;

GRANT



2020-01-29 22:54:25.283 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,13,1,ROLE,GRANT,,,grant select on t1 to usr_replica;,<not logged>



db_replica=# alter table t1 add f3 date;

ALTER TABLE



2020-01-29 22:55:17.440 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,23,1,DDL,ALTER TABLE,,,alter table t1 add f3 date;,<not logged>



db_replica=# checkpoint;

CHECKPOINT



2020-01-29 22:55:50.349 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,33,1,MISC,CHECKPOINT,,,checkpoint;,<not logged>



db_replica=# vacuum t1;

VACUUM



2020-01-29 22:56:03.007 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,34,1,MISC,VACUUM,,,vacuum t1;,<not logged>



db_replica=# show log_statement;

 log_statement

---------------

 none



2020-01-29 22:56:14.740 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,36,1,MISC,SHOW,,,show log_statement;,<not logged>

The log entries shown in the demo above would normally only be written to the server background logfile when the parameter log_statement is set; in this case it is not configured, yet the audit messages are written by virtue of the pgaudit.log parameter, as evidenced in the demo. There are more powerful options available to fulfill all your database auditing requirements within PostgreSQL, which can be configured by following the official documentation of pgAudit here or on the GitHub repository.

pg_repack

This is a favourite extension among many PostgreSQL engineers who are directly involved in managing and keeping up the general health of a PostgreSQL cluster. The reason for that will be discussed a little later, but this extension offers the functionality to remove database bloat within a PostgreSQL database, which is one of the nagging concerns of very large PostgreSQL database clusters requiring database re-organization. 

As a PostgreSQL database undergoes constant and heavy WRITES (updates & deletes), the old data is marked as deleted while the new version of the row gets inserted, but the old data is not actually wiped from the data block. This requires a periodic maintenance operation called vacuuming, an automated procedure that executes in the background and clears all the “marked as deleted” rows. This process is sometimes referred to as garbage collection in colloquial terms. 

The vacuuming process generally yields to regular database operations during busier times. Vacuuming in the least restrictive manner, in favour of database operations, leaves behind a large number of “marked as deleted” rows, causing databases to grow out of proportion, which is referred to as “database bloat”. There is a forceful vacuuming process called VACUUM FULL, but it acquires an exclusive lock on the database object being processed, stalling database operations on that object.
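
Before reaching for VACUUM FULL or pg_repack, it helps to get a feel for how much dead-row buildup a table has accumulated. A minimal sketch, assuming psycopg2, hypothetical connection settings, and the standard pg_stat_user_tables statistics view, could look like this:

# Minimal sketch: report dead-tuple ratios from pg_stat_user_tables to spot bloat candidates.
# Connection settings are hypothetical.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="mydb", user="postgres")
with conn.cursor() as cur:
    cur.execute("""
        SELECT relname,
               n_live_tup,
               n_dead_tup,
               round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 10;
    """)
    for relname, live, dead, dead_pct in cur.fetchall():
        print(f"{relname}: {dead} dead of {live} live rows ({dead_pct}% dead)")
conn.close()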


It is for this reason that pg_repack is a hit among PostgreSQL DBAs and engineers: it does the job of a normal vacuuming process but offers the efficiency of VACUUM FULL without acquiring an exclusive lock on the database object; in short, it works online. The official documentation here explains more about the other methods of reorganizing a database, but a quick demo below will put things in the right light for better understanding. One requirement is that the target table must have at least one column defined as a PRIMARY KEY, which is the general norm in most production database setups.

Demo

The basic demo shows the installation and usage of pg_repack in a test environment. This demo uses version 1.4.5 of pg_repack, which is the latest version of this extension at the time of publishing this blog. The demo table t1 initially has 1,000,000 rows, which then undergo a massive delete operation that removes every 5th row of the table. An execution of pg_repack shows the size of the table before and after.

mydb=# CREATE EXTENSION pg_repack;

CREATE EXTENSION



mydb=# create table t1 (no integer primary key, f_name VARCHAR(20), l_name VARCHAR(20), d_o_b date);

CREATE TABLE

mydb=# insert into t1 (select generate_series(1,1000000,1),'a'||

mydb(# generate_series(1,1000000,1),'a'||generate_series(1000000,1,-1),

mydb(# cast( now() - '1 year'::interval * random()  as date ));

INSERT 0 1000000



mydb=# SELECT pg_size_pretty( pg_total_relation_size('t1'));

 pg_size_pretty

----------------

 71 MB

(1 row)



mydb=# CREATE or replace FUNCTION delete5() RETURNS void AS $$

mydb$# declare

mydb$# counter integer := 0;

mydb$# BEGIN

mydb$#

mydb$#  while counter <= 1000000

mydb$# loop

mydb$# delete from t1 where no=counter;

mydb$# counter := counter + 5;

mydb$# END LOOP;

mydb$# END;

mydb$# $$ LANGUAGE plpgsql;

CREATE FUNCTION

The delete5 function deletes 200,000 rows from the t1 table using a counter that increments in steps of 5:

mydb=# select delete5();

 delete5

------



(1 row)

mydb=# SELECT pg_size_pretty( pg_total_relation_size('t1'));

 pg_size_pretty

----------------

 71 MB

(1 row)



$ pg_repack -t t1 -N -n -d mydb -p 5433

INFO: Dry run enabled, not executing repack

INFO: repacking table "public.t1"



$ pg_repack -t t1 -n -d mydb -p 5433

INFO: repacking table "public.t1"



mydb=# SELECT pg_size_pretty( pg_total_relation_size('t1'));

 pg_size_pretty

----------------

 57 MB

(1 row)

As shown above, the original size of the table does not change after executing the delete5 function, which shows that the deleted rows still occupy space in the table. The execution of pg_repack clears those ‘marked as deleted’ rows from the t1 table, bringing its size down to 57 MB. Another good thing about pg_repack is the dry-run option with the -N flag, which lets you check what would be executed during an actual run.

HypoPG

The next interesting extension is analogous to a popular concept called invisible indexes in proprietary database servers. The HypoPG extension enables a DBA to see the effect of introducing a hypothetical index (one which does not exist) and whether it would improve the performance of one or more queries, hence the name HypoPG.

The creation of a hypothetical index does not require any CPU or disk resources; however, it consumes a connection’s private memory. As the hypothetical index is not stored in any database catalog tables, there is no impact in the form of table bloat. For this reason, a hypothetical index cannot be used in an EXPLAIN ANALYZE statement, while a plain EXPLAIN is a good way to assess whether a potential index would be used by a given problematic query. Here is a quick demo to explain how HypoPG works.

Demo

I am going to create a table containing 100000 rows using generate_series and execute a couple of simple queries to show the difference in cost estimates with and without hypothetical indexes.

olap=# CREATE EXTENSION hypopg;

CREATE EXTENSION



olap=# CREATE TABLE stock (id integer, line text);

CREATE TABLE



olap=# INSERT INTO stock SELECT i, 'line ' || i FROM generate_series(1, 100000) i;

INSERT 0 100000



olap=# ANALYZE STOCK;

ANALYZE



olap=#  EXPLAIN SELECT line FROM stock WHERE id = 1;

                       QUERY PLAN

---------------------------------------------------------

 Seq Scan on stock  (cost=0.00..1791.00 rows=1 width=10)

   Filter: (id = 1)

(2 rows)

olap=# SELECT * FROM hypopg_create_index('CREATE INDEX ON stock (id)') ;

 indexrelid |       indexname

------------+-----------------------

      25398 | <25398>btree_stock_id

(1 row)



olap=# EXPLAIN SELECT line FROM stock WHERE id = 1;

                                     QUERY PLAN

------------------------------------------------------------------------------------

 Index Scan using <25398>btree_stock_id on stock  (cost=0.04..8.06 rows=1 width=10)

   Index Cond: (id = 1)

(2 rows)



olap=# EXPLAIN ANALYZE SELECT line FROM stock WHERE id = 1;

                                             QUERY PLAN

----------------------------------------------------------------------------------------------------

 Seq Scan on stock  (cost=0.00..1791.00 rows=1 width=10) (actual time=0.028..41.877 rows=1 loops=1)

   Filter: (id = 1)

   Rows Removed by Filter: 99999

 Planning time: 0.057 ms

 Execution time: 41.902 ms

(5 rows)



olap=# SELECT indexname, pg_size_pretty(hypopg_relation_size(indexrelid))

olap-#   FROM hypopg_list_indexes() ;

       indexname       | pg_size_pretty

-----------------------+----------------

 <25398>btree_stock_id | 2544 kB

(1 row)



olap=# SELECT pg_size_pretty(pg_relation_size('stock'));

 pg_size_pretty

----------------

 4328 kB

(1 row)

The above exhibit shows how the estimated total cost can be reduced from 1791 to 8.06 by adding an index to the “id” field of the table to optimize a simple query. It also proves that the index is not actually used when the query is executed with EXPLAIN ANALYZE, which runs the query for real. There is also a way to find out approximately how much disk space the index would occupy, using the hypopg_relation_size function of the extension, as shown with hypopg_list_indexes above. 

HypoPG has a few other functions to manage hypothetical indexes, and in addition it also offers a way to find out whether partitioning a table would improve the performance of queries fetching a large dataset. This is the hypothetical partitioning option of the HypoPG extension, and more about it can be found in the official documentation.
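
Hypothetical indexes also lend themselves to quick what-if loops outside psql. The sketch below assumes psycopg2, the hypopg extension already installed, and the stock table from the demo; it compares the planner's estimated cost for a query before and after a candidate hypothetical index:

# Minimal sketch: compare estimated plan costs with and without a hypothetical index.
# Assumes the hypopg extension is installed and the demo "stock" table exists.
import json
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="olap", user="postgres")
query = "SELECT line FROM stock WHERE id = 1"

def plan_cost(cur, sql):
    # EXPLAIN (FORMAT JSON) returns the plan as JSON; total cost sits on the top plan node.
    cur.execute(f"EXPLAIN (FORMAT JSON) {sql}")
    plan = cur.fetchone()[0]
    if isinstance(plan, str):          # some driver setups return the JSON as text
        plan = json.loads(plan)
    return plan[0]["Plan"]["Total Cost"]

with conn.cursor() as cur:
    before = plan_cost(cur, query)
    cur.execute("SELECT * FROM hypopg_create_index('CREATE INDEX ON stock (id)')")
    after = plan_cost(cur, query)
    cur.execute("SELECT hypopg_reset()")   # drop all hypothetical indexes again
    print(f"estimated cost without index: {before}, with hypothetical index: {after}")
conn.close()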

Conclusion

As stated in part one, PostgreSQL has evolved over the years, only getting bigger, better and faster, with rapid development both in the native source code and in plug-and-play extensions. The open source PostgreSQL can be a very suitable option for plenty of IT shops that are running one of the major proprietary database servers and want to reduce their IT CAPEX and OPEX. 

There are plenty of PostgreSQL extensions offering features ranging from monitoring to high availability, and from scaling to dumping binary datafiles into human-readable format. Hopefully the demonstrations above have shed some light on the potential and power of a PostgreSQL database.

Amazon RDS for PostgreSQL Alternatives - ClusterControl for PostgreSQL


Amazon RDS for PostgreSQL is a managed service for PostgreSQL available as part of Amazon Web Services. It comes with a handful of management functions that are intended to reduce the workload of managing the databases. Let’s take a look at this functionality and see how it compares with options available in ClusterControl.

PostgreSQL Deployment

PostgreSQL RDS

PostgreSQL RDS supports numerous versions of PostgreSQL, starting from 9.5.2 up to 12.2:

For Aurora it is 9.6.8 to 11.6:

You can pick if the cluster should be highly available or not at the deployment time.

ClusterControl

ClusterControl supports PostgreSQL in versions 9.6, 10, 11 and 12:

You can deploy a master and multiple slaves using streaming replication.

ClusterControl supports asynchronous and semi-synchronous replication. You can deploy the rest of the high availability stack (i.e. load balancers) at any point in time.

PostgreSQL Backup Management

PostgreSQL RDS

Amazon RDS supports snapshots as the way of taking backups. You can rely on the automated backups or take backups manually at any time.

Restoration is done as a separate cluster. Point-in-time recovery is possible with up to one second granularity. Backups can also be encrypted.

ClusterControl

ClusterControl supports several backup methods for PostgreSQL.

It is possible to store the backup locally or upload it to the cloud. Point-in-time recovery is supported for most of the backup methods.

When restoring, it is possible to do it on an existing cluster, create a new cluster or restore it on a standalone host. It is possible to schedule a backup verification job. Backups can be encrypted.

PostgreSQL Database Monitoring

PostgreSQL RDS

RDS comes with features that provide visibility into your database operations.

Using Performance Insights, you can check the state of the nodes in CloudWatch:

ClusterControl

ClusterControl provides insight into the database operations using the Overview section:

It is also possible to enable agent-based monitoring for more detailed dashboards:

PostgreSQL Scalability

PostgreSQL RDS

In a couple of clicks you can scale your RDS cluster by adding replicas to RDS or readers to Aurora:

ClusterControl

ClusterControl provides an easy way to scale up your PostgreSQL cluster by adding a new replica:

PostgreSQL High Availability (HA)

PostgreSQL RDS

Aurora clusters can benefit from a load balancer deployed in front of them. Regular RDS clusters do not have this feature available.

In an Aurora cluster it is possible to promote readers to become the master. For RDS clusters you can fail over to a read replica, but the promoted replica then becomes a standalone node without any replicas of its own. You would have to deploy new replicas after the failover completes.

It is possible to deploy highly available clusters for both RDS and Aurora. Failed master nodes are handled automatically, by promotion of one of the available replicas.

ClusterControl

ClusterControl can be used to deploy a full high availability stack that consists of master - slave database cluster, load balancers (HAProxy) and keepalived to provide VIP across load balancers.

It is possible to promote a slave. If the master is unavailable, one of the slaves will be promoted as a new master and remaining slaves will be slaved off the new master.

PostgreSQL Configuration Management

PostgreSQL RDS

In PostgreSQL RDS configuration management can be performed using parameter groups. You can create custom groups with your custom configuration and then assign them to new or existing instances.

This lets you share the same configuration across multiple instances or across whole clusters. There is a separate parameter group for Aurora and RDS. Some of the configuration settings cannot be configured, especially the ones related to backups and replication.

ClusterControl

ClusterControl provides a way of managing the configuration of the PostgreSQL nodes. You can change given parameter on some or all of the nodes:

It is also possible to make the configuration change by directly modifying the configuration files:

In ClusterControl you have full control over the configuration.

Conclusion

These are the main features that can be compared between ClusterControl and Amazon RDS for PostgreSQL.

There are also other features that ClusterControl provides that are not available in RDS: Query Monitoring, User Management, & Operational Reports to name a few. 

If you are interested in trying them out, you can download ClusterControl for free and see for yourself how it can help you with managing PostgreSQL clusters.

PGTune Alternatives - ClusterControl PostgreSQL Configuration


If you are new to PostgreSQL the most common challenge you face is about how to tune up your database environment. 

When PostgreSQL is installed it automatically produces a basic postgresql.conf file. This configuration file is normally kept inside the data directory depending on the operating system you are using. For example, in Ubuntu PostgreSQL places the configurations (pg_hba.conf, postgresql.conf, pg_ident.conf) inside /etc/postgresql directory. Before you can tune your PostgreSQL database, you first have to locate the postgresql.conf files. 

But what are the right settings to use, and what values should be set initially? Using external tools such as PGTune (and alternatives like ClusterControl) will help you solve this specific problem. 

What is PGTune?

PGTune is a configuration wizard which was originally created by Greg Smith from 2ndQuadrant. It's based on a Python script which is, unfortunately, no longer supported. (It does not support newer versions of PostgreSQL.) It then transitioned into pgtune.leopard.in.ua (which is based on the original PGTune) and is now a configuration wizard you can use for your PG database configuration settings.

PGTune is used to calculate configuration parameters for PostgreSQL based on the maximum performance for a given hardware configuration. It isn't a silver bullet though, as many settings depend not only on the hardware configuration, but also on the size of the database, the number of clients and the complexity of queries. 

How to Use PGTune

The old version of PGTune was based on a Python script which you could invoke via a shell command (on Ubuntu):

root@debnode4:~/pgtune-master# $PWD/pgtune -L -T Mixed -i /etc/postgresql/9.1/main/postgresql.conf | sed -e '/#.*/d' | sed '/^$/N;/^\n/D' 

stats_temp_directory = '/var/run/postgresql/9.1-main.pg_stat_tmp'

datestyle = 'iso, mdy'

default_text_search_config = 'pg_catalog.english'

default_statistics_target = 100

maintenance_work_mem = 120MB

checkpoint_completion_target = 0.9

effective_cache_size = 1408MB

work_mem = 9MB

wal_buffers = 16MB

checkpoint_segments = 32

shared_buffers = 480MB

The new one is much easier and more convenient, since you can access it directly in your browser. Just go to https://pgtune.leopard.in.ua/. A good example looks like this:

All you need to do is specify the following fields:

  • DB Version - the version of your PostgreSQL. It supports PostgreSQL versions 9.2, 9.3, 9.4, 9.5, 9.6, 10, 11, and 12.
  • OS Type - the type of OS (Linux, OS X, Windows).
  • DB Type - the kind of transactional processing your database will handle (Web Application, OLTP, Data Warehousing, Desktop Application, Mixed Type of Applications).
  • Total Memory (RAM) - the total memory that your PG instance will handle, specified in GiB.
  • Number of CPUs - the number of CPUs that PostgreSQL can use (CPUs = threads per core * cores per socket * sockets).
  • Number of Connections - the maximum number of PostgreSQL client connections.
  • Data Storage - the type of data storage device, chosen from SSD, HDD, or SAN-based storage.

Then hit the Generate button. Alternatively, you can also apply the values with ALTER SYSTEM statements, which write them to postgresql.auto.conf, but they won't take effect until you restart PostgreSQL.
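
As a minimal sketch of that alternative, the snippet below assumes psycopg2 and reuses the illustrative values from the sample output above; note that parameters such as shared_buffers still require a server restart to take effect:

# Minimal sketch: apply PGTune-style suggestions with ALTER SYSTEM (written to postgresql.auto.conf).
# The values are taken from the sample output above and are illustrative only.
import psycopg2

settings = {
    "shared_buffers": "480MB",
    "effective_cache_size": "1408MB",
    "work_mem": "9MB",
    "maintenance_work_mem": "120MB",
    "checkpoint_completion_target": "0.9",
}

conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres")
conn.autocommit = True            # ALTER SYSTEM cannot run inside a transaction block
with conn.cursor() as cur:
    for name, value in settings.items():
        cur.execute(f"ALTER SYSTEM SET {name} = %s", (value,))
    cur.execute("SELECT pg_reload_conf();")   # picks up reloadable parameters; others need a restart
conn.close()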

How Does It Set the Values?

The algorithm for this tool can be found here in configuration.js. It shares the same algorithm as the old PGTune, starting here at pgtune#L477. For example, PostgreSQL versions older than 9.5 support checkpoint_segments, while PG >= 9.5 uses min_wal_size and max_wal_size. 

Whether checkpoint_segments or min_wal_size/max_wal_size is set depends on the PostgreSQL version and on the DB type of the application's transactions. See the snippet below:

if (dbVersion < 9.5) {

  return [

    {

      key: 'checkpoint_segments',

      value: ({

        [DB_TYPE_WEB]: 32,

        [DB_TYPE_OLTP]: 64,

        [DB_TYPE_DW]: 128,

        [DB_TYPE_DESKTOP]: 3,

        [DB_TYPE_MIXED]: 32

      }[dbType])

    }

  ]

} else {

  return [

    {

      key: 'min_wal_size',

      value: ({

        [DB_TYPE_WEB]: (1024 * SIZE_UNIT_MAP['MB'] / SIZE_UNIT_MAP['KB']),

        [DB_TYPE_OLTP]: (2048 * SIZE_UNIT_MAP['MB'] / SIZE_UNIT_MAP['KB']),

        [DB_TYPE_DW]: (4096 * SIZE_UNIT_MAP['MB'] / SIZE_UNIT_MAP['KB']),

        [DB_TYPE_DESKTOP]: (100 * SIZE_UNIT_MAP['MB'] / SIZE_UNIT_MAP['KB']),

        [DB_TYPE_MIXED]: (1024 * SIZE_UNIT_MAP['MB'] / SIZE_UNIT_MAP['KB'])

      }[dbType])

    },

    {

      key: 'max_wal_size',

      value: ({

        [DB_TYPE_WEB]: (4096 * SIZE_UNIT_MAP['MB'] / SIZE_UNIT_MAP['KB']),

        [DB_TYPE_OLTP]: (8192 * SIZE_UNIT_MAP['MB'] / SIZE_UNIT_MAP['KB']),

        [DB_TYPE_DW]: (16384 * SIZE_UNIT_MAP['MB'] / SIZE_UNIT_MAP['KB']),

        [DB_TYPE_DESKTOP]: (2048 * SIZE_UNIT_MAP['MB'] / SIZE_UNIT_MAP['KB']),

        [DB_TYPE_MIXED]: (4096 * SIZE_UNIT_MAP['MB'] / SIZE_UNIT_MAP['KB'])

      }[dbType])

    }

  ]

}

In short, it checks whether dbVersion < 9.5 and then determines the suggested values for checkpoint_segments or min_wal_size/max_wal_size based on the dbType value chosen in the web UI form.

You can learn more about how the algorithm decides which values to suggest by looking at the configuration.js script.

PostgreSQL Configuration Tuning with ClusterControl

If you are using ClusterControl to create, build, or import a cluster, it automatically performs an initial tuning based on the given hardware specs. For example, creating a cluster with the job spec below,

{

  "command": "create_cluster",

  "group_id": 1,

  "group_name": "admins",

  "job_data": {

    "api_id": 1,

    "cluster_name": "pg_11",

    "cluster_type": "postgresql_single",

    "company_id": "1",

    "datadir": "/var/lib/postgresql/11/",

    "db_password": "dbapgadmin",

    "db_user": "dbapgadmin",

    "disable_firewall": true,

    "disable_selinux": true,

    "generate_token": true,

    "install_software": true,

    "nodes": [

      {

        "hostname": "192.168.30.40",

        "hostname_data": "192.168.30.40",

        "hostname_internal": "",

        "port": "5432"

      },

      {

        "hostname": "192.168.30.50",

        "hostname_data": "192.168.30.50",

        "hostname_internal": "",

        "port": "5432",

        "synchronous": false

      }

    ],

    "port": "5432",

    "ssh_keyfile": "/home/vagrant/.ssh/id_rsa",

    "ssh_port": "22",

    "ssh_user": "vagrant",

    "sudo_password": "",

    "user_id": 1,

    "vendor": "default",

    "version": "11"

  },

  "user_id": 1,

  "user_name": "paul@severalnines.com"

}

provides the following tuning, as shown below:

[root@ccnode ~]# s9s job --log  --job-id 84919 | sed -n '/stat_statements/,/Writing/p'

192.168.30.40:5432: Enabling stat_statements plugin.

192.168.30.40:5432: Setting wal options.

192.168.30.40:5432: Performance tuning.

192.168.30.40: Detected memory: 1999MB.

192.168.30.40:5432: Selected workload type: mixed

Using the following fine-tuning options:

  checkpoint_completion_target: 0.9

  effective_cache_size: 1535985kB

  maintenance_work_mem: 127998kB

  max_connections: 100

  shared_buffers: 511995kB

  wal_keep_segments: 32

  work_mem: 10239kB

Writing file '192.168.30.40:/etc/postgresql/11/main/postgresql.conf'.

192.168.30.50:5432: Enabling stat_statements plugin.

192.168.30.50:5432: Setting wal options.

192.168.30.50:5432: Performance tuning.

192.168.30.50: Detected memory: 1999MB.

192.168.30.50:5432: Selected workload type: mixed

Using the following fine-tuning options:

  checkpoint_completion_target: 0.9

  effective_cache_size: 1535985kB

  maintenance_work_mem: 127998kB

  max_connections: 100

  shared_buffers: 511995kB

  wal_keep_segments: 32

  work_mem: 10239kB

Writing file '192.168.30.50:/etc/postgresql/11/main/postgresql.conf'.
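
The numbers above are consistent with widely used memory heuristics (roughly 25% of RAM for shared_buffers, about 75% for effective_cache_size, and a small fraction for maintenance_work_mem). The sketch below only illustrates those common rules of thumb; it is not ClusterControl's actual implementation:

# Illustrative only: common PostgreSQL memory heuristics that produce values similar
# to the tuning output above. This is NOT ClusterControl's implementation.
def suggest_memory_settings(total_ram_kb):
    return {
        "shared_buffers": f"{total_ram_kb // 4}kB",            # ~25% of RAM
        "effective_cache_size": f"{total_ram_kb * 3 // 4}kB",  # ~75% of RAM
        "maintenance_work_mem": f"{total_ram_kb // 16}kB",     # a small fraction of RAM
    }

# Roughly 2 GB of detected memory, similar to the job log above.
print(suggest_memory_settings(2047980))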

Additionally, it also tunes your system and kernel parameters, such as:

192.168.30.50:5432: Tuning OS parameters.

192.168.30.50:5432: Setting vm.swappiness = 1.

Conclusion

The ClusterControl tuning parameters are also based on the algorithm shared in pgtune#L477. It's not fancy, and you can change the values to whatever you like, but the initial settings give you a starting point that is ready enough to handle a production load.

pgDash Alternatives - PostgreSQL Database Monitoring with ClusterControl


Database monitoring and alerting is a particularly important part of database operations, as we must understand the current state of the database. If you don’t have good database monitoring in place, you will not be able to find problems in the database quickly. This could then result in downtime. 

One tool available for monitoring is pgDash, a SaaS application for monitoring and alerting for the PostgreSQL database. 

pgDash Installation Procedure

You can register for pgDash via the website, or you can download a self-hosted version provided by RapidLoop.

The installation process of pgDash is simple: we just need to download the required packages and configure them on the host/database server side. 

You can run the process as follows:

[postgres@n5 ~]$ curl -O -L https://github.com/rapidloop/pgmetrics/releases/download/v1.9.0/pgmetrics_1.9.0_linux_amd64.tar.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

100   647  100   647    0     0    965      0 --:--:-- --:--:-- --:--:--   964

100 3576k  100 3576k    0     0   189k      0  0:00:18  0:00:18 --:--:--  345k

[postgres@n5 ~]$ tar xvf pgmetrics_1.9.0_linux_amd64.tar.gz

pgmetrics_1.9.0_linux_amd64/LICENSE

pgmetrics_1.9.0_linux_amd64/README.md

pgmetrics_1.9.0_linux_amd64/pgmetrics

[postgres@n5 ~]$ curl -O -L https://github.com/rapidloop/pgdash/releases/download/v1.5.1/pgdash_1.5.1_linux_amd64.tar.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

100   644  100   644    0     0   1370      0 --:--:-- --:--:-- --:--:--  1367

100 2314k  100 2314k    0     0   361k      0  0:00:06  0:00:06 --:--:--  560k

[postgres@n5 ~]$ tar xvf pgdash_1.5.1_linux_amd64.tar.gz

pgdash_1.5.1_linux_amd64/LICENSE

pgdash_1.5.1_linux_amd64/README.md

pgdash_1.5.1_linux_amd64/pgdash

[postgres@n5 ~]$ ./pgmetrics_1.9.0_linux_amd64/pgmetrics --no-password -f json ccdb | ./pgdash_1.5.1_linux_amd64/pgdash -a NrxaHk3JH2ztLI06qQlA4o report myserver1

Apart from pgDash you will need another package, pgmetrics, to be installed for monitoring. pgmetrics is an open source utility that collects the information and statistics from the database that pgDash needs, while pgdash's job is to send that information to the dashboard. 

If you want to add more databases to the monitoring platform, you would need to repeat the above process for each database.

Although the installation of pgDash is simple, the process is repetitive and can become a concern as more databases need to be monitored. You will most likely need to write an automation script for that, along the lines of the sketch below.
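
A minimal sketch of such a script, reusing only the two commands shown above, could look like the following. The database names, server labels, and API key are hypothetical placeholders:

# Minimal sketch: run the pgmetrics | pgdash pipeline (as shown above) for several databases.
# Database names, server labels and the API key are hypothetical placeholders.
import subprocess

API_KEY = "YOUR_PGDASH_API_KEY"
targets = [
    ("ccdb", "myserver1"),
    ("appdb", "myserver2"),
]

for dbname, server_label in targets:
    pgmetrics = subprocess.run(
        ["./pgmetrics_1.9.0_linux_amd64/pgmetrics", "--no-password", "-f", "json", dbname],
        capture_output=True, check=True,
    )
    subprocess.run(
        ["./pgdash_1.5.1_linux_amd64/pgdash", "-a", API_KEY, "report", server_label],
        input=pgmetrics.stdout, check=True,
    )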

pgDash Metrics

There are 3 main features under pgDash Dashboard, those are: 

  • Dashboard: consists of sub-menus such as: Overview, Database, Queries, Backend, Locks, Tablespace, Replication, WAL Files, BG Writers, Vacuum, Roles, Configuration.
  • Tools: consists of sub-menus, such as Index Management, Tablespace Management, Diagnostics, and Top-K. 
  • Alerts: consist of sub-menus such as Alerts & Change Alerts.

PostgreSQL Monitoring by ClusterControl 

ClusterControl's monitoring uses SSH and a direct connection from the controller node to the target database node to gather the information displayed on the dashboard.

ClusterControl also has an Agent Based Monitoring feature that can easily be activated. You can see it below...

ClusterControl then carries out the installation of Prometheus, node exporters, and PostgreSQL exporters on the target databases to collect the information the dashboard needs to display metrics.

If Agent Based Monitoring is active, any new target database will be automatically added and monitored by Agent Based Monitoring.

ClusterControl Dashboards

Here you can see information in the PostgreSQL Cluster Overview and System Information screens, with details such as the database version, transaction ID, last checkpoint, and the date and time since the server has been alive. This information is depicted below:

On the System Information page, we can get information such as Load Average, Memory Usage, and Swap Usage; see the picture below:

  • Database: information such as database name, database size, number of tables, indexes, and tablespaces.
  • Queries: monitor Calls, Disk Write, Disk Read, and Buffer Hit for queries. You can also search for any query that ran within a specific time period.
  • Backend: monitor the current state of database backends, with critical details such as backends waiting for locks, other waiting backends, transactions open too long, and backends idling in transaction. You can also see all the backends running in the database.
  • Locks: check the number of total locks, locks not granted, and blocked queries. 
  • Tablespace: information related to tablespaces, i.e. tablespace size and usage of disk and inodes.
  • Replications: monitor the replication status of your PostgreSQL database, covering replication slots, incoming replication, outgoing replication, replication publications, and replication subscriptions.
  • WAL Files: information related to WAL (Write Ahead Log) along with statistics, e.g. WAL file counts, WAL generation rate, and WAL files generated each hour. 
  • BG Writers: information related to database checkpoints, buffers written, and parameters related to the background writer. 
  • Vacuum Progress: information related to vacuums running in the database, along with vacuum parameters. 
  • Roles: information about the roles that exist in the database, including privileges.
  • Configuration: the PostgreSQL database parameters.

Inside Tools, there are sub-menus such as Index Management, which provides information on unused indexes, bloated indexes, and indexes with a low cache hit ratio. Tablespace Management provides information related to tablespaces and the other objects under them.

Diagnostics helps you understand potential issues through the Top 10 Most Bloated Tables, Top 10 Most Bloated Indexes, a list of inactive replication slots, the Top 10 Longest Running Transactions, and so on.

ClusterControl groups its metrics under separate menus: Overview, Nodes, Dashboard, Query Monitor, and Performance; see the picture below:

When Agent Based Monitoring is enabled, all of the statistics and other database-related information are stored in a time series database (Prometheus). You can see this information in ClusterControl as depicted below:
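
Because the metrics land in Prometheus, they can also be queried outside the ClusterControl UI through the standard Prometheus HTTP API. The sketch below assumes the requests library, a hypothetical Prometheus address, and the pg_up gauge exposed by the PostgreSQL exporter:

# Minimal sketch: query the Prometheus HTTP API for a PostgreSQL exporter metric.
# The Prometheus address is hypothetical; pg_up reports whether the server is reachable.
import requests

PROMETHEUS = "http://192.168.30.10:9090"

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": "pg_up"})
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    instance = result["metric"].get("instance", "unknown")
    value = result["value"][1]          # [timestamp, value]
    print(f"{instance}: pg_up={value}")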

In the Query Monitor, you can find the Top Queries, Running Queries, Query Outliers, and Query Statistics menus. They provide information about running queries, top queries, and database statistics. You can also configure the detection of slow queries and queries that do not use indexes. 

Under Performance, there are sub-menus such as DB Growth, which shows database and table size statistics, and Schema Analyzer, which reports redundant indexes and tables without a primary key.

PostgreSQL Alerting

There are two parts of alerting...

  • Alert Rules: alert rules play a major role; you define limits on parameters that can trigger an alarm to the DBA. 
  • Third Party Integrations: integration channels to incident management and communication/collaboration platforms such as PagerDuty, OpsGenie, Slack, or email.

pgDash has many database parameters you can use in alert rules, divided into several layers: Server, Database, Table, Index, Tablespace, and Query. You can see this in pgDash as depicted below...

As for third party integration channels, pgDash supports several, such as Slack, PagerDuty, VictorOps, xMatters, and email, or you can create your own webhooks so alerts can be consumed by other services.

The following is what pgDash's third party integrations look like:

In contrast to pgDash, ClusterControl has broader and more general event alert options, covering alerts related to the host, network, cluster, and the database itself. The following are examples of the event options that can be selected:

ClusterControl can cover several database clusters in one event alert. Third party integration from ClusterControl supports several incident management and communication/collaboration tools such as PagerDuty, VictorOps, Telegram, OpsGenie, Slack, and ServiceNow, and you can also create your own webhook. 

In the alert rules section, both pgDash and ClusterControl have advantages and disadvantages. The advantage of pgDash is that you can configure very detailed database alerts, while the drawback is that you have to repeat these settings for each database (although there is a feature to import the configuration from another database).

ClusterControl lacks such detailed database event alerts and only offers general database events, but it can send alerts not only about the database itself, but also about nodes, clusters, networks, and so on. Besides that, you can apply these alerts to several database clusters at once.

In the third party integration section, pgDash and ClusterControl both support various third party incident management and communication channels. In fact, both of them can register their own webhooks so alerts can be consumed by other services (e.g. Grafana).


MySQL Workbench Alternatives - ClusterControl Configuration Management


MySQL configuration management consists of two major components: MySQL configuration files and the runtime configuration. Runtime configuration changes can be applied through MySQL clients; session variables require no special privilege, while global variables require the SUPER privilege. Applying the same changes to the MySQL configuration file is also necessary to make them persistent across MySQL restarts, otherwise the default values will be loaded during startup.
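
As a minimal sketch of the runtime side of that split, the snippet below assumes pymysql and hypothetical connection details; the comment notes why the configuration file still has to be edited separately:

# Minimal sketch: change a global variable at runtime (requires the SUPER privilege,
# or SYSTEM_VARIABLES_ADMIN on MySQL 8.0). Connection details are hypothetical.
import pymysql

conn = pymysql.connect(host="192.168.0.21", user="root", password="secret")
with conn.cursor() as cur:
    cur.execute("SET GLOBAL max_connections = 500")   # takes effect immediately at runtime
    cur.execute("SHOW GLOBAL VARIABLES LIKE 'max_connections'")
    print(cur.fetchone())
conn.close()

# Remember: without also adding "max_connections = 500" under [mysqld] in the
# configuration file, the default value comes back after the next restart.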

In this blog post, we are going to look at ClusterControl Configuration Management as an alternative to MySQL Workbench configuration management.

MySQL Workbench Configuration Management

MySQL Workbench is a graphical client for working with MySQL servers and databases for server versions 5.x and higher. It is freely available and commonly being used by SysAdmins, DBAs and developers to perform SQL development, data modelling, MySQL server administration and data migration.

You can use MySQL Workbench to perform MySQL/MariaDB configuration management on a remote MySQL server. However, there are some initial steps required to enable this feature. From MySQL Workbench, select an existing connection profile and choose Configure Remote Management. You will be presented with a step-by-step configuration wizard to help you to set up remote management for the connection profile:

At the start, a connection attempt is made to determine the server version and operating system of the target machine. This allows connection settings to be validated and allows the wizard to pick a meaningful configuration preset. If this attempt fails you can still continue to the next step, where you can customize the settings further to suit the remote server environment.

Once the remote connection configuration is complete, double-click the connection profile to connect to the MySQL instance. Then go to Instance -> Options File to open the configuration manager section. You should see something similar to the following screenshot:

All existing configuration variables from the configuration file are pre-loaded into this configuration manager, so you can see which options have been enabled and with what values. Configurations are categorized into a number of sections - General, Logging, InnoDB, Networking and so on - which really helps us focus on the specific features we want to tweak or enable.

Once you are satisfied with the changes, and before clicking "Apply", make sure you choose the correct MySQL group section from the dropdown menu (right next to the Discard button). Once applied, you should see the configuration is applied to the MySQL server where a new line will appear (if it didn't exist) in the MySQL configuration file.

Note that clicking on the "Apply" button will not push the corresponding change into the MySQL runtime. One has to restart the MySQL server to load the new configuration changes, by going to Instance -> Startup/Shutdown. This takes a hit on your database uptime.

To see all the loaded system status and variables, go to Management -> Status and System Variables:

ClusterControl Configuration Management

ClusterControl Configuration Manager can be accessed under Manage -> Configurations. ClusterControl pulls a number of important configuration files and displays them in a tree structure. A centralized view of these files is key to efficiently understanding and troubleshooting distributed database setups. The following screenshot shows ClusterControl's configuration file manager which listed out all related configuration files for this cluster in one single view with syntax highlighting:

As you can see from the screenshot above, ClusterControl understands the MySQL "!include" directive and will follow all configuration files associated with it. For instance, there are two MySQL configuration files being pulled from host 192.168.0.21: /etc/my.cnf and /etc/my.cnf.d/secrets-backup.cnf. You can open multiple configuration files in separate editor tabs, which makes it easier to compare their content side by side. ClusterControl also shows the last file modification time, taken from the OS timestamp, at the bottom right of the text editor.

ClusterControl eliminates the repetitiveness when changing a configuration option of a database cluster. Changing a configuration option on multiple nodes can be performed via a single interface and will be applied to the database node accordingly. When you click on "Change/Set Parameter", you can select the database instances that you would want to change and specify the configuration group, parameter and value:

You can add a new parameter into the configuration file or modify an existing parameter. The parameter will be applied to the chosen database nodes' runtime and into the configuration file if the option passes the variable validation process. Some variables might require a follow-up step like server restart or configuration reload, which will then be advised by ClusterControl.

All services configured by ClusterControl use a base configuration template available under /usr/share/cmon/templates on the ClusterControl node. You can directly modify the file to suit your deployment policy however, this directory will be replaced after a package upgrade. To make sure your custom configuration template files persist across upgrades, store your template files under /etc/cmon/templates directory. When ClusterControl loads up the template file for deployment, files under /etc/cmon/templates will always have higher priority over the files under /usr/share/cmon/templates. If two files having identical names exist on both directories, the one located under /etc/cmon/templates will be used.

Go to Performance -> DB Variables to check the runtime configuration for all servers in the cluster:

Notice a line highlighted in red in the screenshot above? That means the configuration is not identical in all nodes. This provides more visibility on the configuration difference among hosts in a particular database cluster.

MySQL Workbench vs ClusterControl: Advantages and Disadvantages

Every product has its own set of advantages and disadvantages. ClusterControl, since it understands cluster and topology, is the better configuration manager for managing multiple database nodes at once. It supports multiple MySQL vendors like MariaDB and Percona, as well as all Galera Cluster variants. It also understands the database load balancer configuration formats for HAProxy, MariaDB MaxScale, ProxySQL and Keepalived. Since ClusterControl requires passwordless SSH configuration when importing or deploying the cluster, configuration management requires no remote setup like Workbench does, and it works out of the box once the hosts are managed by ClusterControl. MySQL configuration changes performed by ClusterControl are loaded into the runtime automatically (for all supported variables) as well as written into the MySQL configuration files for persistence. In terms of disadvantages, ClusterControl configuration management does not come with configuration descriptions, which could help us anticipate what would happen if we changed a configuration option. It also does not support all platforms that MySQL can run on, only certain Linux distributions such as CentOS, RHEL, Debian and Ubuntu.

MySQL Workbench supports remote management on many operating systems, like Windows, FreeBSD, MacOS, Open Solaris and Linux. MySQL Workbench is available for free and can also be used with other MySQL vendors like Percona and MariaDB (although not listed here, it does work with some older MariaDB versions). It also supports managing installations from the TAR bundle. It allows some customizations of the configuration file path, start/stop service commands and MySQL group section naming. One of the neat features is that MySQL Workbench uses dropdown menus for fixed values, which can be a huge help in reducing the risk of misconfiguration by a user, as shown in the following screenshot:

On the downside, MySQL Workbench does not support multi-host configuration management; you have to perform the config change on every host separately. It also does not push configuration changes into the runtime without an explicit MySQL restart, which can compromise database service uptime.

The following table summarizes the significant differences from all the points mentioned above:

| Configuration Aspect | MySQL Workbench | ClusterControl |
|---|---|---|
| Supported OS for MySQL server | Linux, Windows, FreeBSD, Open Solaris, Mac OS | Linux (Debian, Ubuntu, RHEL, CentOS) |
| MySQL vendor | Oracle, Percona | Oracle, Percona, MariaDB, Codership |
| Support other software | - | HAProxy, ProxySQL, MariaDB MaxScale, Keepalived |
| Configuration/Variable description | Yes | No |
| Config file syntax highlighting | No | Yes |
| Drop down configuration values | Yes | No |
| Multi-host configuration | No | Yes |
| Auto push configuration into runtime | No | Yes |
| Configuration templating | No | Yes |
| Cost | Free | Subscription required for configuration management |

We hope this blog post helps you determine which tool is most suitable to manage your MySQL servers' configurations. You can also try our new Configuration Files Management tool (currently in alpha).

NoSQL Data Streaming with MongoDB & Kafka


Developers describe Kafka as a "Distributed, fault-tolerant, high throughput, pub-sub, messaging system." Kafka is well-known as a partitioned, distributed, and replicated commit log service. It also provides the functionality of a messaging system, but with a unique design. On the other hand, MongoDB is known as "The database for giant ideas." MongoDB is capable of storing data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB is designed for high availability and scalability, with built-in replication and auto-sharding.

MongoDB is classified under "Databases", while Kafka belongs to the "Message Queue" category of the tech stack. Developers consider "high throughput", "distributed" and "scalable" to be Kafka's key strengths, whereas "document-oriented storage", "NoSQL" and "ease of use" are considered the primary reasons why MongoDB is favored.

Data Streaming in Kafka

In today’s data ecosystem, there is no single system that can provide all of the required perspectives to deliver real insight into the data. Deriving better visualization of data insights requires mixing a huge volume of information from multiple data sources. At the same time, we want answers immediately; if the time taken to analyze data insights exceeds tens of milliseconds, the value is lost or irrelevant. Applications such as fraud detection, high-frequency trading, and recommendation engines cannot afford to wait. This is often described as analyzing the inflow of data before it is written to the database of record, with zero tolerance for data loss, which makes the challenge even more daunting.

Kafka helps you ingest and quickly and reliably move large amounts of data from multiple data sources, and then redirect it to the systems that need it by filtering, aggregating, and analyzing en route. Kafka offers high throughput, reliability, and replication, and provides a scalable method to communicate streams of event data from one or more Kafka producers to one or more Kafka consumers. Examples of events include:

  • Air pollution data captured on a periodic basis
  • A consumer adding an item to the shopping cart in an online store
  • A Tweet posted with a specific hashtag

Streams of Kafka events are captured and organized into predefined topics. The Kafka producer chooses a topic to send a given event to, and consumers select which topics they pull events from. For example, a stock market financial application could pull stock trades from one topic and company financial information from another in order to look for trading opportunities.
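As a rough illustration of this producer/consumer flow, the console tools that ship with recent Kafka distributions can be used to publish and read events on an assumed "stock-trades" topic (the broker address, topic name, and event payload below are placeholders):

# Publish an example trade event to an assumed "stock-trades" topic
$ echo '{"symbol": "ABC", "price": 102.5, "qty": 100}' | \
    kafka-console-producer.sh --bootstrap-server localhost:9092 --topic stock-trades

# Read the same topic back from the beginning
$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic stock-trades --from-beginning

Note that older Kafka releases name the producer flag --broker-list instead of --bootstrap-server, so check the version you are running.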

Together, MongoDB and Kafka make up the heart of many modern data architectures. Kafka is designed for boundless streams of data that are sequentially written into commit logs, and real-time data movement between MongoDB and Kafka is done through the use of Kafka Connect.

Figure 1: MongoDB and Kafka working together

The official MongoDB Connector for Kafka was developed and is supported by MongoDB Inc. engineers. It is also verified by Confluent (who pioneered the enterprise-ready event streaming platform), conforming to the guidelines set forth by Confluent’s Verified Integrations Program. The connector enables MongoDB to be configured as both a sink and a source for Kafka, letting you easily build robust, reactive data pipelines that stream events between applications and services in real time.

Figure 2: The connector enables MongoDB to be configured as both a sink and a source for Kafka.

MongoDB Sink Connector

The MongoDB Sink connector allows us to write events from Kafka to our MongoDB instance. It converts the value from the Kafka Connect SinkRecords into a MongoDB document and will do an insert or upsert depending on the configuration you chose. It expects the database to be created upfront; the targeted MongoDB collections are created if they don’t exist.
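As a minimal sketch, registering the sink connector against a Kafka Connect worker's REST API (assumed to listen on port 8083; the connection URI, database, collection, and topic names are placeholders) could look roughly like this:

$ curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '
  {
    "name": "mongo-sink",
    "config": {
      "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
      "topics": "stock-trades",
      "connection.uri": "mongodb://mongodb1:27017",
      "database": "s9s",
      "collection": "trades"
    }
  }'

Refer to the official connector documentation for the full list of supported sink properties, such as the insert/upsert behavior mentioned above.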

MongoDB Kafka Source Connector

The MongoDB Kafka Source Connector moves data from a MongoDB replica set into a Kafka cluster. The connector configures and consumes change stream event documents and publishes them to a topic. Change streams, a feature introduced in MongoDB 3.6, generate event documents that contain changes to data stored in MongoDB in real time and provide guarantees of durability, security, and idempotency. You can configure change streams to observe changes at the collection, database, or deployment level. The connector uses configurable settings to create the change streams and customize the output saved to the Kafka cluster. It publishes the change data events to a Kafka topic whose name consists of the database and collection name from which the change originated.
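A source connector can be registered in a similar way (again, the Connect REST port, replica set name, database, and collection are placeholders):

$ curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '
  {
    "name": "mongo-source",
    "config": {
      "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
      "connection.uri": "mongodb://mongodb1:27017/?replicaSet=rs0",
      "database": "s9s",
      "collection": "inventory"
    }
  }'

With a configuration like this, change events for the chosen collection would end up in a topic named after the database and collection (for example s9s.inventory), matching the topic naming behavior described above.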

MongoDB & Kafka Use Cases

eCommerce Websites

Consider an eCommerce website whose inventory data is stored in MongoDB. When the stock of a product goes below a certain threshold, the company would like to place an automatic order to increase the stock. The ordering process is handled by other systems outside of MongoDB, and using Kafka as the platform for such event-driven systems is a great example of the power of MongoDB and Kafka when used together.

Website Activity Tracking

Site activity such as pages visited or adverts rendered is captured into Kafka topics – one topic per data type. Those topics can then be consumed by multiple functions such as monitoring, real-time analysis, or archiving for offline analysis. The data can also be loaded into an operational database such as MongoDB, where it can be analyzed alongside data from other sources.

Internet of Things (IoT)

IoT applications must cope with massive numbers of events that are generated by a multitude of devices. Kafka plays a vital role in providing the fan-in and real-time collection of all of that sensor data. A common use case is telematics, where diagnostics from a vehicle's sensors must be received and processed back at base. Once captured in Kafka topics, the data can be processed in multiple ways, including stream processing or Lambda architectures. It is also likely to be stored in an operational database such as MongoDB, where it can be combined with other stored data to perform real-time analytics and support operational applications such as triggering personalized offers.

Conclusion

MongoDB is a well-known non-relational database published under a free-and-open-source license. It is primarily a document-oriented database, intended for use with semi-structured data like text documents, and is the most popular modern database built for handling huge volumes of heterogeneous data.

Kafka is a widely popular distributed streaming platform that thousands of companies like New Relic, Uber, and Square use to build scalable, high-throughput, and reliable real-time streaming systems.  

Together MongoDB and Kafka play vital roles in our data ecosystem and many modern data architectures.

Manage Engine Applications Manager Alternatives - ClusterControl Database Monitoring


If you’re looking for a monitoring system, you probably read about many different options with different features and different costs based on these features.

Manage Engine Applications Manager is an application performance management solution that proactively monitors business applications and helps businesses ensure their revenue-critical applications meet end-user expectations. 

ClusterControl is an agentless management and automation software for database clusters. It helps deploy, monitor, manage, and scale your database server/cluster directly from the ClusterControl UI or using the ClusterControl CLI.

In this blog, we’ll take a look at some of the features of these products so you’ll be able to have an overview to help choose the correct one based on your requirements.

Database Monitoring Features Comparison

Manage Engine Applications Manager

There are three different versions of the product:

  • Free: Supports monitoring up to 5 apps or servers
  • Professional: Supports integrated performance monitoring for a heterogeneous set of applications
  • Enterprise: Supports large deployments with its distributed monitoring capability

It can be installed on both Windows and Linux operating systems, and it can monitor not only Databases but also Applications, Mail Servers, Virtualization, and more.

ClusterControl

Like the previous one, there are three different versions of the product:

  • Free Community: Great for deployment & monitoring. No limit on the number of servers but there is a limit on the available features
  • Advanced: For high availability and scalability requirements
  • Enterprise: With enterprise-grade and security features

It can be installed only on Linux operating systems, and it’s only for Database and Load Balancer servers.

The Installation Process

Manage Engine Applications Manager Installation Process

The installation process can be hard for a standard user, as the documentation doesn’t have a step-by-step guide and it’s not clear about the packages required.

Let’s see an example of this installation on CentOS 8.

It’s not mentioned in the documentation (at least I didn’t find it), but it requires the following packages: tar, unzip, and hostname. You need to install them yourself; otherwise, since the installer won’t install them, you’ll receive an error message like:

/opt/ManageEngine_ApplicationsManager_64bit.bin: line 686: tar: command not found
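On CentOS 8, something like the following should pull in those missing prerequisites before re-running the installer (package names may differ slightly on other distributions):

$ sudo dnf install -y tar unzip hostname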

Then, you need to run the installer with the ”-i console” flag using a privileged user (non-root):

$ sudo /opt/ManageEngine_ApplicationsManager_64bit.bin -i console

During the installation process, you can choose the Professional or Enterprise edition for your trial period. After your 30-day free trial ends, your installation will automatically convert to the free edition unless you have a commercial license:

===============================================================================

Edition Selection

-----------------

  ->1- Professional Edition

    2- Enterprise Edition(Distributed Setup)

    3- Free Edition

ENTER THE NUMBER FOR YOUR CHOICE, OR PRESS <ENTER> TO ACCEPT THE DEFAULT::

It supports different languages that you can choose here:

===============================================================================

Language Selection

------------------

  ->1- English

    2- Simplified Chinese

    3- Japanese

    4- Vietnamese

    5- French

    6- German

    7- European Spanish

    8- korean

    9- Hungarian

   10- Traditional Chinese

ENTER THE NUMBER FOR YOUR CHOICE, OR PRESS <ENTER> TO ACCEPT THE DEFAULT::

You can also add a license (if you have one), specify the web server and SSL port, local database (for this it supports PostgreSQL or Microsoft SQL Server), installation path, and if you want to register for technical support. You’ll see a summary before starting the installation process:

===============================================================================

Pre-Installation Summary

------------------------

Please Review the Following Before Continuing:

Product Name:

    ManageEngine Applications Manager14

Install Folder:

    /opt/ManageEngine/AppManager14

Link Folder:

    /root

Type Of Installation:

    PROFESSIONAL EDITION

DB Back-end :

    pgsql

Web Server Port :

    "9090"

Disk Space Information (for Installation Target):

    Required:  549,437,924 Bytes

    Available: 13,418,307,584 Bytes

PRESS <ENTER> TO CONTINUE:

When you receive the “Installation Complete” message, you’ll be ready to start it running the “startApplicationsManager.sh” script located in the installation path:

$ cd /opt/ManageEngine/AppManager14

$ sudo ./startApplicationsManager.sh

##########################################################################

 Note:It is recommended to start the product in nohup mode.

 Usage : nohup sh startApplicationsManager.sh &

##########################################################################

AppManager Info: Temporary image files are removed

This evaluation copy is valid for 29 days

[Tue May 05 01:28:31 UTC 2020] Starting Applications Manager "Primary" Server Modules, please wait ...

[Tue May 05 01:28:34 UTC 2020] Process : Site24x7IntegrationProcess [ Started ]

[Tue May 05 01:28:34 UTC 2020] Process : AMScriptProcess [ Started ]

[Tue May 05 01:28:35 UTC 2020] Process : AMExtProdIntegrationProcess [ Started ]

[Tue May 05 01:28:35 UTC 2020] Process : AuthMgr [ Started ]

[Tue May 05 01:28:35 UTC 2020] Process : AMDataCleanupProcess [ Started ]

[Tue May 05 01:28:35 UTC 2020] Process : DBUserStorageServer [ Started ]

[Tue May 05 01:28:35 UTC 2020] Process : NmsPolicyMgr [ Started ]

[Tue May 05 01:28:36 UTC 2020] Process : StartRelatedServices [ Started ]

[Tue May 05 01:28:36 UTC 2020] Process : AMUrlMonitorProcess [ Started ]

[Tue May 05 01:28:36 UTC 2020] Process : NMSMServer [ Started ]

[Tue May 05 01:28:36 UTC 2020] Process : NmsAuthManager [ Started ]

[Tue May 05 01:28:36 UTC 2020] Process : WSMProcess [ Started ]

[Tue May 05 01:28:38 UTC 2020] Process : APMTracker [ Started ]

[Tue May 05 01:28:38 UTC 2020] Process : RunJSPModule [ Started ]

[Tue May 05 01:28:38 UTC 2020] Process : StandaloneApplnProcess [ Started ]

[Tue May 05 01:28:38 UTC 2020] Process : AMRBMProcess [ Started ]

[Tue May 05 01:28:38 UTC 2020] Process : ApplnStandaloneBE [ Started ]

[Tue May 05 01:28:38 UTC 2020] Process : AMDistributionProcess [ Started ]

[Tue May 05 01:28:38 UTC 2020] Process : OAuthRefreshAccessToken [ Started ]

[Tue May 05 01:28:38 UTC 2020] Process : DiscoveryProcess [ Started ]

[Tue May 05 01:28:38 UTC 2020] Process : AMCAMProcess [ Started ]

[Tue May 05 01:28:39 UTC 2020] Process : NMSSAServer [ Started ]

[Tue May 05 01:28:39 UTC 2020] Process : AMServerStartUp [ Started ]

[Tue May 05 01:28:42 UTC 2020] Process : Collector [ Started ]

[Tue May 05 01:28:42 UTC 2020] Process : DBServer [ Started ]

[Tue May 05 01:28:43 UTC 2020] Process : MapServerBE [ Started ]

[Tue May 05 01:28:43 UTC 2020] Process : NmsConfigurationServer [ Started ]

[Tue May 05 01:28:44 UTC 2020] Process : AMFaultProcess [ Started ]

[Tue May 05 01:28:44 UTC 2020] Process : AMEventProcess [ Started ]

[Tue May 05 01:28:56 UTC 2020] Process : AMServerFramework [ Started ]

[Tue May 05 01:29:07 UTC 2020] Process : AMDataArchiverProcess [ Started ]

[Tue May 05 01:29:08 UTC 2020] Process : MonitorsAdder [ Started ]

[Tue May 05 01:29:11 UTC 2020] Process : EventFE [ Started ]

[Tue May 05 01:29:11 UTC 2020] Process : AlertFE [ Started ]

[Tue May 05 01:29:11 UTC 2020] Process : NmsMainFE [ Started ]

Verifying connection with web server... verified

Applications Manager started successfully.

Please connect your client to the web server on port: 9090

Now you can access the UI using the default user and password (admin/admin):

ClusterControl Installation Process

There are different installation methods, as mentioned in the documentation. In the case of a manual installation, the required packages are specified in the same documentation, and there is a step-by-step guide for the whole process.

Let’s see an example of this installation on CentOS 8 using the automatic installation script.

$ wget http://www.severalnines.com/downloads/cmon/install-cc

$ chmod +x install-cc

$ sudo ./install-cc   # omit sudo if you run as root

The installation script will attempt to automate the following tasks:

  • Install and configure a local MySQL server (used by ClusterControl to store monitoring data)
  • Install and configure the ClusterControl controller package via package manager
  • Install ClusterControl dependencies via package manager
  • Configure Apache and SSL
  • Configure ClusterControl API URL and token
  • Configure ClusterControl Controller with minimal configuration options
  • Enable the CMON service on boot and start it up

Running the mentioned script, you’ll receive a question about sending diagnostic data:

$ sudo ./install-cc

!!

Only RHEL/Centos 6.x|7.x|8.x, Debian 7.x|8.x|9.x|10.x, Ubuntu 14.04.x|16.04.x|18.04.x LTS versions are supported

Minimum system requirements: 2GB+ RAM, 2+ CPU cores

Server Memory: 1024M total, 922M free

MySQL innodb_buffer_pool_size set to 512M



Severalnines would like your help improving our installation process.

Information such as OS, memory and install success helps us improve how we onboard our users.

None of the collected information identifies you personally.

!!

=> Would you like to help us by sending diagnostics data for the installation? (Y/n):

Then, it’ll start installing the required packages. The next question is about the hostname that will be used:

=> The Controller hostname will be set to 192.168.100.131. Do you want to change it? (y/N):

When the local database is installed, the installer will secure it by asking you to set a root password:

=> Starting database. This may take a couple of minutes. Do NOT press any key.

Redirecting to /bin/systemctl start mariadb.service

=> Securing the MySQL Server ...

=> !! In order to complete the installation you need to set a MySQL root password !!

=> Supported special password characters: ~!@#$%^&*()_+{}<>?

=> Press any key to proceed ...

And a CMON user password, which will be used by ClusterControl:

=> Set a password for ClusterControl's MySQL user (cmon) [cmon]

=> Supported special characters: ~!@#$%^&*()_+{}<>?

=> Enter a CMON user password:

That’s it. In this way, you’ll have everything in place without installing or configuring anything manually.

=> ClusterControl installation completed!

Open your web browser to http://192.168.100.131/clustercontrol and

enter an email address and new password for the default Admin User.

Determining network interfaces. This may take a couple of minutes. Do NOT press any key.

Public/external IP => http://10.10.10.10/clustercontrol

Installation successful. If you want to uninstall ClusterControl then run install-cc --uninstall.

The first time you access the UI, you will need to register for the 30-day free trial period.

After your 30-day free trial ends, your installation will automatically convert to the community edition unless you have a commercial license.

Database Monitoring Usage Comparison

Manage Engine Applications Manager

To start using it, you need to add a new monitor using the corresponding section, where you have different options to be used. As we mentioned in the features section, it allows you to monitor different things like Applications, Databases, Virtualization, and even more.

Let’s say you want to monitor a MySQL instance. For this, you’ll need to add the MySQL Java Connector (JDBC driver) to the Applications Manager directory "/opt/ManageEngine/AppManager14/working/mysql/MMMySQLDriver" and then restart the Applications Manager software. Then, you need to create a user to access the database.

To add this monitor, you must specify the display name, hostname/IP address, database port, credentials, and database to be monitored. The database service must be running, and the database and user must be created previously.
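The exact privileges Applications Manager needs are listed in the vendor documentation; as a rough sketch, a dedicated monitoring user with typical read-only monitoring grants could be created like this (user name, host pattern, and password are placeholders):

$ mysql -u root -p -e "CREATE USER 'appmanager'@'%' IDENTIFIED BY 'StrongPassword';
  GRANT SELECT, PROCESS, REPLICATION CLIENT ON *.* TO 'appmanager'@'%';"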

ClusterControl

To add the first database node to be monitored, you must go to the deploy/import section. ClusterControl requires SSH access to the remote node for both deploy and import actions.
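For example, passwordless SSH from the ClusterControl host to the database node is typically prepared along these lines (the target IP address is a placeholder):

$ ssh-keygen -t rsa
$ ssh-copy-id root@192.168.100.132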

For the import process, you’ll need to use a database admin user, and specify vendor, version, database port, and Hostname/IP address of the node/nodes.

For deployment, you just need to specify the user that will be created during the installation process. ClusterControl will also install the database software and required packages in this process, so you don’t need to perform any manual configuration or installation.

You can also choose between different database vendors and versions, and a basic configuration like database port and datadir.

Finally, you must define the topology to be deployed. The available topologies depend on the selected technology.

Monitoring Your Database

Database Monitoring with Manage Engine Applications Manager

Let’s see an example of monitoring a MySQL database. In this case, you can see first, an overview of the database node, with some basic metrics.

You can go to the Database tab, to see specific information about the database that you’re monitoring:

If you take a look at the Replication section, in this case, it says “Replication is not enabled”:

But actually, there is a master-slave replication setup up and running... There is nothing related to this issue in the documentation, so, as it’s not working, let’s continue to the next section, “Performance”, where you’ll find a list of the top queries.

Then, the “Session” section, where you’ll have the current sessions:

And finally, information about the database configuration:

Database Monitoring with ClusterControl

Like the previous case, let’s see an example of monitoring a MySQL database. In this case, you can see first, an overview of the database node, with some basic metrics.

You have different dashboards here, that you can customize based on your requirements. Then, in the “Node” section, you can see host/database metrics, top process, and configuration for each node.

If you go to the “Dashboards” section, you’ll have more detailed information about your database, load balancer, or host, with more useful metrics.

You can also check the “Topology View” section, where you can see the status of all the environment, or even perform actions over the nodes.

In the “Query Monitor” section, you can see the Top Queries, Running Queries, and Query Outliers.

Then, in the “Performance” section, you have information about your database performance, configuration variables, schema analyzer, transaction log, and even more.

In the same section, you can check the database growth, which shows the Data Size and Index Size for each database.

You can check the “Log” section, to monitor not only the ClusterControl log but also the Operating System and Database logs, so you don’t need to access the server to check this.

Database Alarms & Notifications

Manage Engine Applications Manager Notifications

A good monitoring system requires alarms to alert you in case of failure. This system has its own alarm system where you must configure actions to be run when the alarm is generated.

You can integrate it with another Manage Engine product called AlarmsOne, to centralize it. This is a separate product, so it has its own price/plan.

ClusterControl Notifications

ClusterControl also has an alarm system based on Advisors. It ships with predefined advisors that can be modified if needed, but in general this is not necessary, so you don’t need to perform any manual task. You can also use the Developer Studio tool to manage existing scripts or create new ones.

It has integration with 3rd party tools like Slack or PagerDuty, so you can receive notifications there too.

Conclusion

According to the features mentioned above, we can say Applications Manager is a good option to monitor both applications and databases in a basic way. It supports different languages, and it supports not only Linux but also Windows as the operating system. The installation process, however, can be very challenging for inexperienced users as it requires many manual actions and configurations, the documentation is not well written, and the monitoring options and metrics are basic.

On the other hand, we can say ClusterControl is an all-in-one management system with a lot of features, but only for databases and load balancer servers, and only available for the Linux operating system. In this case, the installation is really easy using the automatic installation script (it doesn’t require extra manual configuration or installation), the documentation has step-by-step guides, and it’s a complete monitoring system with dashboards and several metrics that could be useful for you. 

You can perform not only monitoring tasks but also deployment, scale, management, and even more. The monitoring features of ClusterControl are also free and part of the Community Edition.

MySQL Workbench Alternatives - ClusterControl Database User Management


MySQL user and privilege management is very critical for authentication, authorization and accounting purposes. Since MySQL 8.0, there are now two types of database user privileges:

  1. Static privileges - The common global, schema and administrative privileges like SELECT, ALTER, SUPER and USAGE, built into the server.
  2. Dynamic privileges - New in MySQL 8.0. Privileges provided by components that can be registered and unregistered at runtime, giving finer-grained control over global privileges. For example, instead of assigning the SUPER privilege just for configuration management purposes, that particular user is better granted the SYSTEM_VARIABLES_ADMIN privilege only (see the sketch after this list).
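As a minimal sketch of the dynamic privilege syntax (the user name and host are placeholders), note that dynamic privileges can only be granted at the global (*.*) level:

$ mysql -u root -p -e "GRANT SYSTEM_VARIABLES_ADMIN ON *.* TO 'config_admin'@'localhost';"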

Creating a database schema with its respective user is the very first step to start using MySQL as your database server. Most applications that use MySQL as the datastore require this task to be done before the application can work as intended. To be used with an application, a MySQL user is commonly configured to have full privileges (ALL PRIVILEGES) at the schema level, meaning the database user used by the application has the freedom to perform any action on the assigned database.
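A minimal sketch of that initial step, with the schema name, user, host pattern, and password all being placeholders, looks like this:

$ mysql -u root -p -e "CREATE DATABASE shop_db;
  CREATE USER 'shop_app'@'10.0.0.%' IDENTIFIED BY 'StrongPassword';
  GRANT ALL PRIVILEGES ON shop_db.* TO 'shop_app'@'10.0.0.%';"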

In this blog post, we are going to compare and contrast MySQL database user management features between MySQL Workbench and ClusterControl.

MySQL Workbench - Database User Management

For MySQL Workbench, you can find all the user management stuff under Administration -> Management -> User and Privileges. You should see a list of existing users on the left-side while on the right-side is the authentication and authorization configuration section for the selected user:

MySQL supports over 30 static privileges and it is not easy to understand and remember them all. MySQL Workbench has a number of preset administrative roles, which is very helpful when assigning sufficient privileges to a database user.  For example, if you would like to create a MySQL user specifically to perform backup activities using mysqldump, you may pick the BackupAdmin role and the related global privileges will be assigned to the user accordingly:
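Once such a backup user exists, it would typically be used with mysqldump along these lines (the user name and output path are placeholders):

$ mysqldump -u backup_admin -p --single-transaction --all-databases > /backups/full_backup.sql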

To create a new database user, click on the "Add Account" button and supply necessary information under the "Login" tab. You may add some more resource restrictions under the "Account Limit" tab. If the user is only for a database schema and not intended for any administrative role (strictly for application usage), you may skip the "Administrative Roles" tab and just configure the "Schema Privileges". 

Under the "Schema Privileges" section, one can pick a database schema (or define the matching pattern) by clicking "Add Entry". Then, press the "Select ALL" button to allow all rights (except GRANT OPTION) which is similar to "ALL PRIVILEGES" option statement:

A database user will not be created in the MySQL server until you have applied the changes, by clicking the "Apply" button.

ClusterControl - Database and Proxy User Management

ClusterControl database and user management is a bit more straightforward than what MySQL Workbench offers. While MySQL Workbench is more developer friendly, ClusterControl is focused more on what SysAdmins and DBAs are interested in, more like common administration stuff for those who are already familiar with MySQL roles and privileges.

To create a database user, go to Manage -> Schemas and Users -> Users -> Create New User. You will be presented with the following user creation wizard:

Creating a user in ClusterControl requires you to fill in all necessary fields on one page, unlike MySQL Workbench, which involves a number of clicks to achieve a similar result. ClusterControl also supports creating a user with the "REQUIRE SSL" syntax, to force the particular user to connect only via an SSL-encrypted channel.
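Under the hood, the statements generated for such an SSL-enforced user would look roughly like the following (user name, host, password, and schema are placeholders):

$ mysql -u root -p -e "CREATE USER 'report_user'@'%' IDENTIFIED BY 'StrongPassword' REQUIRE SSL;
  GRANT SELECT ON shop_db.* TO 'report_user'@'%';"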

ClusterControl provides an aggregated view of all database users in a cluster, eliminating the need to log in to every individual server to look for a particular user:

A simple rollover on the privileges box reveals all privileges that have been assigned to this user. ClusterControl also provides a list of inactive users, user accounts that have not been used since the last server restart:

The above list gives us a clear summary of which users are worth keeping, allowing us to manage users more efficiently. DBAs can then ask the developers whether an inactive database user is still needed; otherwise, the user account can be locked or dropped.

If you have a ProxySQL load balancer in between, you might know that ProxySQL has its own MySQL user management so that connections can be passed through it. There are a number of different settings and variables compared to the common MySQL user configuration, e.g. default hostgroup, default schema, transaction persistence, fast forward and many more. ClusterControl provides a graphical user interface for managing ProxySQL database users, improving the experience and efficiency of managing your proxy and database users at once:

When creating a new database user via ProxySQL management page, ClusterControl will automatically create the corresponding user on both ProxySQL and MySQL. However, when dropping a MySQL user from ProxySQL, the corresponding database user will remain on the MySQL server.
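For comparison, adding the same user manually on the ProxySQL side through its admin interface would look roughly like this (the default admin port 6032 is assumed; the credentials and hostgroup value are placeholders):

$ mysql -h 127.0.0.1 -P 6032 -u admin -padmin -e "
  INSERT INTO mysql_users (username, password, default_hostgroup) VALUES ('shop_app', 'StrongPassword', 10);
  LOAD MYSQL USERS TO RUNTIME;
  SAVE MYSQL USERS TO DISK;"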

Advantages & Disadvantages

ClusterControl supports multiple database vendors, so you will get a similar user experience dealing with other database servers. ClusterControl also supports creating a database user on multiple hosts at once, where it will make sure the created user exists on all database servers in the cluster. ClusterControl has a cleaner way of listing out user accounts, where you can see all necessary information right on the listing page. However, user management requires an active subscription and is not available in the community edition. It also does not support all platforms that MySQL can run on; only certain Linux distributions like CentOS, RHEL, Debian and Ubuntu are supported.

The strongest advantage of MySQL Workbench is that it is free, and can be used together with schema management and administration. It's built to be more friendly to developers and DBAs and has the advantage of being built and backed by the Oracle team, who owns and maintains MySQL server. It also provides much clearer guidance with descriptions on most of the input fields, especially in the critical parts like authentication and privilege management. The preset administrative roles are a neat way of granting a set of privileges to a user based on the work the user must carry out on the server. On the downside, MySQL Workbench is not a cluster-friendly tool since every management connection is tailored to one endpoint MySQL server. Thus, it doesn't provide a centralized view of all users in the cluster. It also doesn't support creating users with SSL enforcement.

Neither of these tools supports the new MySQL 8.0 dynamic privileges syntax, e.g. BACKUP_ADMIN, BINLOG_ADMIN, SYSTEM_VARIABLES_ADMIN, etc.

The following table highlights notable features for both tools for easy comparison:

User Management Aspect | MySQL Workbench | ClusterControl
--- | --- | ---
Supported OS for MySQL server | Linux, Windows, FreeBSD, Open Solaris, Mac OS | Linux (Debian, Ubuntu, RHEL, CentOS)
MySQL vendor | Oracle, Percona | Oracle, Percona, MariaDB, Codership
Support user management for other software | - | ProxySQL
Multi-host user management | No | Yes
Aggregated view of users in a database cluster | No | Yes
Show inactive users | No | Yes
Create user with SSL | No | Yes
Privilege and role description | Yes | No
Preset administrative role | Yes | No
MySQL 8.0 dynamic privileges | No | No
Cost | Free | Subscription required for management features


We hope these blog posts help you determine which tool best suits your needs for managing your MySQL databases and users.

Comparing Amazon RDS Point-in-Time Recovery to ClusterControl


The Amazon Relational Database Service (AWS RDS) is a fully-managed database service which can support multiple database engines. Among those supported are PostgreSQL, MySQL, and MariaDB. ClusterControl, on the other hand, is a database management and automation software which also supports backup handling for PostgreSQL, MySQL, and MariaDB open source databases. 

While RDS has been widely embraced by many companies, some might not be familiar with how their Point-in-time Recovery (PITR) works and how it can be used. 

Several of the database engines used by Amazon RDS have special considerations when restoring from a specific point in time, and in this blog we'll cover how it works for PostgreSQL, MySQL, and MariaDB. We'll also compare how it differs with the PITR function in ClusterControl.

What is Point-in-Time Recovery (PITR)

If you are not yet familiar with Disaster Recovery Planning (DRP) or Business Continuity Planning (BCP), you should know that PITR is one of the important standard practices for database management. As mentioned in our previous blog, Point-in-Time Recovery (PITR) involves restoring the database to any given moment in the past. To be able to do this, we need to restore a full backup; PITR then takes place by applying all the changes that happened up to the specific point in time you want to recover to.

Point-in-time Recovery (PITR) with AWS RDS

AWS RDS handles PITR differently than the traditional way common to an on-prem database. The end result shares the same concept, but with AWS RDS the full backup is a snapshot; RDS then applies the transaction logs (which are stored in S3) up to the requested point in time, and launches a new (different) database instance. 

The common way requires you to use either a logical backup (using pg_dump, mysqldump, mydumper) or a physical backup (Percona XtraBackup, Mariabackup, pg_basebackup, pgBackRest) as your full backup before you apply the PITR. 

AWS RDS will require you to launch a new DB instance, whereas the traditional approach allows you to flexibly restore to the same database node where the backup was taken, to a different (existing) DB instance that needs recovery, or to a fresh DB instance.

Upon creation of your AWS RDS instance, automated backups are turned on. Amazon RDS automatically performs a full daily snapshot of your data; the snapshot schedule can be set during creation at your preferred backup window. While automated backups are turned on, AWS also captures transaction logs to Amazon S3 every 5 minutes, recording all your DB updates. Once you initiate a point-in-time recovery, transaction logs are applied to the most appropriate daily backup in order to restore your DB instance to the specific requested time.

How To Apply a PITR with AWS RDS

Applying PITR can be done in three different ways. You can use the AWS Management Console, the AWS CLI, or the Amazon RDS API once the DB instance is available. You must also take into consideration that the transaction logs are captured every five minutes and then stored in AWS S3.

Once you restore a DB instance, the default DB security group (SG) is applied to the new DB instance. If you need a custom DB SG, you can explicitly define this using the AWS Management Console, the AWS CLI modify-db-instance command, or the Amazon RDS API ModifyDBInstance operation after the DB instance is available.
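As a sketch, attaching a custom security group to the restored instance with the AWS CLI could look like this (the instance identifier and security group ID are placeholders):

$ aws rds modify-db-instance \
    --db-instance-identifier database-s9s-mysql-pitr \
    --vpc-security-group-ids sg-0123456789abcdef0 \
    --apply-immediately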

PITR requires that you identify the latest restorable time for a DB instance. To do this, you can use the AWS CLI describe-db-instances command and look at the value returned in the LatestRestorableTime field for the DB instance. For example,

[root@ccnode ~]# aws rds describe-db-instances --db-instance-identifier database-s9s-mysql|grep LatestRestorableTime

            "LatestRestorableTime": "2020-05-08T07:25:00+00:00",

Applying PITR with AWS Console

To apply PITR in the AWS Console, log in to the AWS Console → go to Amazon RDS → Databases → select (or click) your desired DB instance, then click Actions. See below:

Once you attempt to restore via PITR, the console UI will show you the latest restorable time you can set. You can use the latest restorable time or specify your desired target date and time. See below:

It's quite easy to follow but it requires you to pay attention and fill in the desired specifications you need for the new instance to be launched.

Applying PITR with AWS CLI

Using the AWS CLI can be quite handy especially if you need to incorporate this with your automation tools for your CI/CD pipeline. To do this, you can start simply with,

[root@ccnode ~]# aws rds restore-db-instance-to-point-in-time \

>     --source-db-instance-identifier  database-s9s-mysql \

>     --target-db-instance-identifier  database-s9s-mysql-pitr \

>     --restore-time 2020-05-08T07:30:00+00:00

{

    "DBInstance": {

        "DBInstanceIdentifier": "database-s9s-mysql-pitr",

        "DBInstanceClass": "db.t2.micro",

        "Engine": "mysql",

        "DBInstanceStatus": "creating",

        "MasterUsername": "admin",

        "DBName": "s9s",

        "AllocatedStorage": 18,

        "PreferredBackupWindow": "00:00-00:30",

        "BackupRetentionPeriod": 7,

        "DBSecurityGroups": [],

        "VpcSecurityGroups": [

            {

                "VpcSecurityGroupId": "sg-xxxxx",

                "Status": "active"

            }

        ],

        "DBParameterGroups": [

            {

                "DBParameterGroupName": "default.mysql5.7",

                "ParameterApplyStatus": "in-sync"

            }

        ],

        "DBSubnetGroup": {

            "DBSubnetGroupName": "default",

            "DBSubnetGroupDescription": "default",

            "VpcId": "vpc-f91bdf90",

            "SubnetGroupStatus": "Complete",

            "Subnets": [

                {

                    "SubnetIdentifier": "subnet-exxxxx",

                    "SubnetAvailabilityZone": {

                        "Name": "us-east-2a"

                    },

                    "SubnetStatus": "Active"

                },

                {

                    "SubnetIdentifier": "subnet-xxxxx",

                    "SubnetAvailabilityZone": {

                        "Name": "us-east-2c"

                    },

                    "SubnetStatus": "Active"

                },

                {

                    "SubnetIdentifier": "subnet-xxxxxx",

                    "SubnetAvailabilityZone": {

                        "Name": "us-east-2b"

                    },

                    "SubnetStatus": "Active"

                }

            ]

        },

        "PreferredMaintenanceWindow": "fri:06:01-fri:06:31",

        "PendingModifiedValues": {},

        "MultiAZ": false,

        "EngineVersion": "5.7.22",

        "AutoMinorVersionUpgrade": true,

        "ReadReplicaDBInstanceIdentifiers": [],

        "LicenseModel": "general-public-license",

        "OptionGroupMemberships": [

            {

                "OptionGroupName": "default:mysql-5-7",

                "Status": "pending-apply"

            }

        ],

        "PubliclyAccessible": true,

        "StorageType": "gp2",

        "DbInstancePort": 0,

        "StorageEncrypted": false,

        "DbiResourceId": "db-XXXXXXXXXXXXXXXXX",

        "CACertificateIdentifier": "rds-ca-2019",

        "DomainMemberships": [],

        "CopyTagsToSnapshot": false,

        "MonitoringInterval": 0,

        "DBInstanceArn": "arn:aws:rds:us-east-2:042171833148:db:database-s9s-mysql-pitr",

        "IAMDatabaseAuthenticationEnabled": false,

        "PerformanceInsightsEnabled": false,

        "DeletionProtection": false,

        "AssociatedRoles": []

    }

}

Both of these approaches take time to create or prepare the database instance before it becomes available and viewable in the list of database instances in your AWS RDS console.

AWS RDS PITR Limitations

When using AWS RDS you are tied to them as a vendor, and moving your operations out of their system can be troublesome. Here are some things you have to consider:

  • The level of vendor lock-in when using AWS RDS
  • Your only option to recover via PITR is to launch a new instance running on RDS
  • There is no way to recover using the PITR process to an external node outside of RDS
  • It requires you to learn and be familiar with their tools and security framework

How To Apply A PITR with ClusterControl

ClusterControl performs PITR in a simple, straightforward fashion (but requires you to enable or set up the prerequisites so PITR can be used). As discussed earlier, PITR for ClusterControl works differently than AWS RDS. Here is a list of where PITR can be applied using ClusterControl (as of version 1.7.6):

  • Applies after a full backup, based on the backup method solutions we support for PostgreSQL, MySQL, and MariaDB databases.
    • For PostgreSQL, only the pg_basebackup backup method is supported and compatible with PITR
    • For MySQL or MariaDB, only the xtrabackup/mariabackup backup method is supported and compatible with PITR
  • For MySQL or MariaDB databases, PITR applies only if the source node of the full backup is the target node to be recovered
  • MySQL or MariaDB databases require that you have binary logging enabled
  • For PostgreSQL databases, PITR applies only to the active master/primary and requires that you enable WAL archiving
  • PITR can only be applied when restoring an existing full backup

Backup management in ClusterControl applies to environments where the databases are self-managed and requires SSH access, which is totally different from AWS RDS. Although they achieve the same result, which is to recover data, the backup solutions present in ClusterControl are not applicable to AWS RDS. ClusterControl also does not support RDS for management and monitoring.

Using ClusterControl for PITR in PostgreSQL

As mentioned earlier in the prerequisites, to leverage PITR you must enable WAL archiving. This can be achieved by clicking the gear icon as shown below:

Since PITR can be applied right after a full backup, you can only find this feature under the Backup list, where you can attempt to restore an existing backup. The following sequence of screenshots shows how to do it:

Then restore it on the same host from which the backup was taken,

Then just specify the date and time,

Once you specify the date and time, ClusterControl will restore the backup and then apply the PITR once the restore is done. You can verify this by inspecting the job activity logs, as shown below,

Using ClusterControl for PITR in MySQL/MariaDB

PITR for MySQL or MariaDB does not differ much from the approach described above for PostgreSQL. However, there's no WAL archiving equivalent, nor a button or option you can set to enable the PITR functionality. Since MySQL and MariaDB require binary logs for PITR to be applied, in ClusterControl this can be handled under the Manage tab. See below:

Then specify the log_bin variable with the corresponding boolean value. For example,

Once log_bin is set on the node, ensure that the full backup is taken on the same node where you will also apply the PITR process, as stated earlier in the prerequisites. Alternatively, you can also just edit the configuration files (/etc/my.cnf or /etc/mysql/my.cnf) and add log_bin=ON under the [mysqld] section, for example.

When binary logs are enabled and a full backup is available, you can then do the PITR process the same way as in the PostgreSQL UI, but with different fields to fill in. You can specify the date and time, or restore based on the binlog file and position. See below:
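Conceptually, what happens at this stage is similar to replaying the binary logs on top of the restored backup up to the chosen target, roughly like the following manual equivalent (binlog file names and the stop time are placeholders):

$ mysqlbinlog --stop-datetime="2020-05-08 07:30:00" \
    /var/lib/mysql/binlog.000012 /var/lib/mysql/binlog.000013 | mysql -u root -p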

ClusterControl PITR Limitations

In case you’re wondering what you can and cannot do for PITR in ClusterControl, here's the list below:

  • There's currently no s9s CLI support for the PITR process, so it's not possible to automate it or integrate it into your CI/CD pipeline
  • No PITR support for external nodes
  • No PITR support when the source of the backup is different from the target node
  • There's no periodic notification of the latest point in time to which you can apply PITR

Conclusion

Both tools have different approaches and different solutions for the target environment. The key takeaway is that AWS RDS has its own PITR which is faster, but it is applicable only if your database is hosted on RDS, and you are tied to vendor lock-in.

ClusterControl allows you to freely apply the PITR process in whatever data center or on-premises environment you use, as long as the prerequisites are taken into consideration. Its goal is to recover the data. Regardless of its limitations, what matters is how you use the solution in accordance with the architectural environment you are running.
